How to use regular expressions (RegEx)

Created: 26.01.2023

Updated: 06.04.2023

Author: Polina A.

Regular expressions (also known as "regexp" or "regex") are a formal language for working with text: searching for substrings within it and performing manipulations on them. Metacharacters are symbols that have a special meaning when writing regular expressions.

When working with the Gravity Field system, regular expressions can be used in some of the targeting conditions in campaigns and in audiences. A search is performed using a pattern string or "mask," consisting of characters and metacharacters, which define the search rule.

Before applying regular expressions to targeting conditions, you can check their accuracy using the website https://regex101.com/.
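As a complement to regex101, a pattern can also be checked locally against a few strings that should and should not match. The sketch below is only an illustration: the helper check_pattern, the mask ^/catalog/\d+$, and the sample paths are hypothetical and are not part of Gravity Field. It uses Python's re module, whose syntax may differ in details from the engine that evaluates targeting conditions.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return the examples that behave differently from what was expected."""
    compiled = re.compile(pattern)
    problems = []
    for sample in should_match:
        if compiled.search(sample) is None:
            problems.append(("missed", sample))
    for sample in should_not_match:
        if compiled.search(sample) is not None:
            problems.append(("unexpected", sample))
    return problems


# Hypothetical mask: page paths such as /catalog/123
problems = check_pattern(
    r"^/catalog/\d+$",
    should_match=["/catalog/1", "/catalog/2023"],
    should_not_match=["/catalog/", "/catalog/abc", "/blog/catalog/7"],
)
print(problems)  # an empty list means every example behaved as expected
```

Listing negative examples next to positive ones helps catch masks that are too permissive before they are used in a campaign or audience.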

Example of a Regular Expression

There are various link formats:

http://site.com
https://site.com/
http://site.com/page/
site.com/page.html

Using the regular expression (https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/? you can find links of all these formats within an arbitrary string: all four variants listed above will be matched.
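The claim can be checked with a short script. This is a minimal sketch in Python (an assumption; the article does not name a language), using the pattern with balanced parentheses. The names URL_PATTERN, samples, and text are chosen for illustration, and the free text is made up.

```python
import re

# The pattern from the article, written as a Python raw string.
URL_PATTERN = re.compile(
    r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?"
)

# The four link formats listed above.
samples = [
    "http://site.com",
    "https://site.com/",
    "http://site.com/page/",
    "site.com/page.html",
]

# Each format is matched in full by the pattern.
for link in samples:
    assert URL_PATTERN.fullmatch(link) is not None, link

# Hypothetical free text that contains all four links.
text = "Docs: " + " and ".join(samples)

# finditer scans the whole string and yields every non-overlapping match.
found = [match.group(0) for match in URL_PATTERN.finditer(text)]
print(found)
```

The surrounding words ("Docs:", "and") are skipped because they contain no dot followed by at least two lowercase letters, which the pattern requires.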

How to read a regular expression?

(https?:\/\/) — everything within parentheses forms a matching group. In the example, there are 4 such groups;
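\ — the escape symbol. The following symbols need to be escaped when they must be matched literally: . ^ $ * + ? { } [ ] \ | ( ), as they are special symbols in the regular expression language. In the example, / is escaped as well, which some regex flavors require when / is used as the pattern delimiter;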
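? — a quantifier that makes the preceding element optional (it may occur 0 or 1 times). In https? it applies to the letter s, so both http and https match; after the first group, (https?:\/\/)?, it applies to the whole group, so the protocol prefix may be missing altogether. A ? acts as the "lazy" modifier only when it follows another quantifier (for example, +? or *?): a regular expression uses "greedy" matching by default (i.e., it tries to match as much as possible), and the lazy form matches as little as possible instead;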
([\da-z\.-]+) — the second matching group. Square brackets define a character class: each position matches any single character from the listed set;
\d — denotes digits (0-9);
a-z — a range of lowercase Latin letters. The - at the end of the brackets is not part of a range, so it matches a literal hyphen, and \. matches a literal dot;
+ — a quantifier requiring the preceding element to occur one or more times (from 1 to an unlimited number of times);
([a-z\.]{2,6}) — the third matching group. {2,6} sets the allowed number of repetitions: at least 2 and at most 6. The range is bounded because this group checks the top-level domain, which this pattern assumes to be 2 to 6 characters long, so an unbounded quantifier is not needed;
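([\/\w\.-]*)*\/? — the fourth matching group, ([\/\w\.-]*), followed by * (the group itself may repeat) and by \/?, an optional trailing slash. \w matches any letter from a to z in both lower and upper case, any digit, and the underscore (_);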
* — a quantifier requiring the preceding element to occur zero or more times (from 0 to an unlimited number of times).
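The elements described in the list above can be tried out one by one. The following is a minimal sketch in Python (an assumption; other engines may differ in details), with variable names chosen for illustration. The fourth group is left out of the output because it sits under a second *, and what a repeated group finally captures can differ between engines.

```python
import re

URL_PATTERN = re.compile(
    r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?"
)

# Groups 1 to 3: protocol prefix, host name, top-level domain.
match = URL_PATTERN.fullmatch("https://site.com/page.html")
scheme, host, tld = match.group(1), match.group(2), match.group(3)
print(scheme, host, tld)  # https:// site com

# The first group is optional, so it is None when the prefix is absent.
no_scheme = URL_PATTERN.fullmatch("site.com").group(1)
print(no_scheme)  # None

# ? after a character: the character may occur 0 or 1 times.
assert re.fullmatch(r"https?", "http")
assert re.fullmatch(r"https?", "https")

# Greedy versus lazy: ? after another quantifier makes it match
# as little as possible.
greedy = re.match(r"a+", "aaa").group(0)  # 'aaa'
lazy = re.match(r"a+?", "aaa").group(0)   # 'a'

# + needs at least one occurrence, * also accepts none.
assert re.fullmatch(r"\d+", "") is None
assert re.fullmatch(r"\d*", "") is not None

# {2,6}: from 2 to 6 repetitions.
assert re.fullmatch(r"[a-z]{2,6}", "c") is None
assert re.fullmatch(r"[a-z]{2,6}", "com")
assert re.fullmatch(r"[a-z]{2,6}", "toolong") is None

# Character class with a trailing literal hyphen and an escaped dot.
assert re.fullmatch(r"[\da-z\.-]+", "my-site.1")

# \w also matches the underscore.
assert re.fullmatch(r"\w+", "page_1")
```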
