Mastering Looker Studio : How to use Regexp to Transform Your Reports

Looker Studio offers a powerful platform for generating insightful reports, but sometimes the data you’re working with might need some fine-tuning to extract the precise information you’re looking for. This is where regular expressions (regex) come into play. Regex formulas allow you to manipulate and extract data patterns efficiently. In this blog, we’ll explore the essential regex expressions to include when building Looker Studio reports.

What is Regexp?

Regular expressions, often abbreviated as regex or regexp, are sequences of characters that define a search pattern. They are used for pattern matching within strings. In simpler terms, regular expressions allow you to search for strings matching a specific pattern, or to replace parts of strings based on matching criteria.

In Looker Studio, regular expressions can be used in various contexts, particularly in the context of creating filters, calculated fields, or data validation rules. Here’s how regular expressions can be used in Looker Studio:

  1. Filtering Data: Looker allows you to filter data based on specific criteria. Regular expressions can be used within these filters to match patterns within the data. For example, you might use a regular expression to filter out email addresses that match a certain pattern, or to filter data based on specific formats such as dates, phone numbers, or product codes.

  2. Calculated Fields: Looker enables users to create calculated fields based on existing data. Regular expressions can be used within these calculated fields to extract or manipulate data based on patterns. For instance, you might use a regular expression to extract the domain from a list of email addresses, or to clean up messy data by removing unwanted characters.

  3. Data Validation: Regular expressions can also be used for data validation purposes in Looker. For example, you might use a regular expression to ensure that user input adheres to a specific format or pattern before allowing it to be submitted.

In Looker, there are 4 main Regexp expressions you can use: regexp_replace, regexp_contains, regexp_extract, regexp_match. 

We take a look below with examples of how you can use each one in your own Looker Studio reports. 

Regexp Expression Example 1: Regexp_Replace - Full Page URL Clean

Regexp_replace is a key function which replaces all occurrences of text which matches the regular expression with the replacement string. 

One key use of regexp_replace is to remove query parameters from a full page url, effectively executing a page clean. This will allow you to see true stats against a particular page. 

In the example above, anything after a ‘?’ treats the page as if it was different however this isn’t true. You ideally would want to see visitors to a page without the query parameter. To perform this, use the following regexp:

REGEXP_REPLACE(REGEXP_REPLACE(Full page URL, ‘\\?.+’,”),’^http(|s):\\/\\/’,”)

Regexp Expression 2: Regexp_Contains - Grouping Sources Together

By combining regexp_contains with sorting options in Looker Studio, you can effectively organise and present your data based on specific patterns or criteria identified within your data.

In the example above, this case allows you to group together Session Sources. Sometimes, traffic from a particular source, e.g. Instagram, may come through in slightly different formats. (IG, insta, I.Instagra.com). Of course, we know this traffic has come from the same place however being able to group these together is vital to report on more accurately. 

By using regexp_contains, you can group together any of the variations and look at them in a whole – as seen in the table on the right. 

Use the code below to start sorting your data: 

CASE
WHEN REGEXP_CONTAINS(Session source, “IG|insta”) THEN “Instagram”
ELSE “Other”
End

Regexp Expression 3: Regexp_Extract - Pulling out the data you want

Using regexp_extract can be particularly useful when you are only wanting to look at certain bits of data. 

In the example above, we wanted to look at which Google Tag Manager blogs separately. To do this, we used regex_extract to be able to withdraw the part of the url we wanted to view. 

The expression we used is below however you can adapt it your own: 

REGEXP_EXTRACT(Full page URL,’/google-tag-manager/(.*?)(?:\\?|$)’)

Regexp Expression 4: Regexp_match - Turning Country into Continents

In similar cases to regexp_contains, regexp_match can also help to sort data. 

Regexp_match is more used when you would like a boolean return e.g. True or False. Instead of searching to see if the string contains what you are looking for, it needs to match it completely. 

One example of where this can proove useful is in the example above. Here we are looking at the Country of users however wanting to sort into some European countries, North America and then others. 

CASE
   WHEN REGEXP_MATCH
(country, “(USA|Canada|Mexico)”) THEN “North America”
   WHEN REGEXP_MATCH(country, “(United Kingdom|France|Germany|Italy)” ) THEN “Europe”
   ELSE “Other”
END

As you can see, using regular expressions can help transform your data into actionable insights. Sometimes you don;t always get the data presented to you in a way in which is useful, so being able to manipulate it is crucial. 

As always, we’re here to help so do contact us if you need any support.