How to use regular expressions (RegEx)
Created: 26.01.2023
Updated: 06.04.2023
Author: Polina A.
Regular expressions (also known as "regexp" or "regex") are a formal language for working with text: searching for substrings within it and performing manipulations on them. Metacharacters are special symbols used when writing regular expressions.
When working with the Gravity Field system, regular expressions can be used in some of the targeting conditions in campaigns and in audiences. A search is performed using a pattern string or "mask," consisting of characters and metacharacters, which define the search rule.
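As a rough illustration of how a mask combines ordinary characters with metacharacters, here is a minimal Python sketch. The mask and the URLs are hypothetical and are not taken from Gravity Field; the regular expression dialect used by the platform may differ slightly from Python's.

```python
import re

# Hypothetical mask: the literal text "/catalog/" followed by one or more digits.
# "\d" and "+" are metacharacters; everything else is matched literally.
MASK = re.compile(r"/catalog/\d+")

def url_matches(url):
    """Return True if the mask is found anywhere in the URL."""
    return MASK.search(url) is not None

print(url_matches("https://site.com/catalog/42"))     # True
print(url_matches("https://site.com/catalog/shoes"))  # False
```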
Before applying regular expressions to targeting conditions, you can check their accuracy using the website .
There are various link formats:
http://site.com
https://site.com/
http://site.com/page/
site.com/page.html
The regular expression (https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/? finds every link variant within an arbitrary string, so all four variants listed above are matched.
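One way to check this claim locally is the short Python sketch below. It assumes the pattern with balanced parentheses and uses the `re.ASCII` flag so that `\d` and `\w` are limited to ASCII characters; the helper name `find_link` and the surrounding sentence text are illustrative only.

```python
import re

# The link pattern with balanced parentheses.
# re.ASCII (an assumption) restricts \d and \w to ASCII characters.
URL_PATTERN = re.compile(
    r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?",
    re.ASCII,
)

links = [
    "http://site.com",
    "https://site.com/",
    "http://site.com/page/",
    "site.com/page.html",
]

def find_link(text):
    """Return the first link-like substring found in text, or None."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None

# Embed each link in an arbitrary sentence and extract it again.
for link in links:
    print(link, "->", find_link("Visit " + link + " today"))
```

Each of the four formats is extracted unchanged from the surrounding sentence, and a string that contains no dot at all yields no match.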
(https?:\/\/) - everything within parentheses forms a capturing group. The full expression contains 4 such groups;
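\ - the escape character. The following characters must be escaped when they are meant literally: . ^ $ * + ? { } [ ] \ | ( ), because they have a special meaning in the regular expression language. The pattern also escapes / as \/, which is required in engines that delimit patterns with slashes, such as JavaScript regex literals;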
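? - a quantifier that makes the preceding element optional (zero or one occurrence). In https? it applies to the letter s, so both http and https match; after the closing parenthesis of (https?:\/\/) it applies to the whole group, so a link may start with or without a scheme. A ? acts as a "lazy" modifier only when it directly follows another quantifier (for example, +? or *?): it switches that quantifier from the default "greedy" matching, which takes as much as possible, to taking as little as possible;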
([\da-z\.-]+) - the second capturing group. Square brackets define a character class: each position matches any single character from the set listed inside the brackets;
\d - denotes digits (0-9);
a-z - a range of lowercase letters from a to z. The - at the end of the class is not part of a range, so it matches a literal hyphen;
+ - a quantifier that repeats the preceding element one or more times, with no upper limit;
([a-z\.]{2,6}) - the third capturing group. {2,6} sets the allowed number of repetitions, here from 2 to 6. The pattern assumes that a top-level domain is 2 to 6 characters long, so an unbounded quantifier is unnecessary (some newer top-level domains are longer);
([\/\w\.-]*)*\/? - the fourth capturing group, ([\/\w\.-]*)*, followed by an optional trailing slash, \/?. \w matches any letter from a to z in lower or upper case, any digit, and the underscore;
* - a quantifier that repeats the preceding element zero or more times, with no upper limit.
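The individual building blocks described above can be tried in isolation. The following Python sketch is only an illustration: the helper `fully_matches` and the sample strings are made up for this example, and `re.ASCII` is assumed so that `\d` and `\w` cover ASCII characters only.

```python
import re

def fully_matches(pattern, text):
    """Return True if the whole text matches the pattern (ASCII-only classes)."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None

# "?" makes the preceding element optional: the "s" may appear zero or one time.
print(fully_matches(r"https?", "http"))    # True
print(fully_matches(r"https?", "https"))   # True
print(fully_matches(r"https?", "httpss"))  # False

# "?" after another quantifier makes it lazy: match as little as possible.
print(re.match(r"\d+", "2023").group(0))   # 2023 (greedy)
print(re.match(r"\d+?", "2023").group(0))  # 2 (lazy)

# Character class with "+": one or more digits, lowercase letters, dots, hyphens.
print(fully_matches(r"[\da-z\.-]+", "my-site2"))  # True
print(fully_matches(r"[\da-z\.-]+", "My_Site"))   # False (uppercase, underscore)

# {2,6}: between 2 and 6 repetitions.
print(fully_matches(r"[a-z\.]{2,6}", "com"))      # True
print(fully_matches(r"[a-z\.]{2,6}", "c"))        # False (too short)
print(fully_matches(r"[a-z\.]{2,6}", "museums"))  # False (7 characters)

# \w covers letters, digits and the underscore; "*" also accepts zero characters.
print(fully_matches(r"[\/\w\.-]*", "/page_1.html"))  # True
print(fully_matches(r"[\/\w\.-]*", ""))              # True
print(fully_matches(r"[\/\w\.-]*", "/page?id=1"))    # False ("?" and "=" not in class)

# Capturing groups of the full pattern.
URL_PATTERN = re.compile(
    r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?",
    re.ASCII,
)
match = URL_PATTERN.search("https://site.com/page.html")
print(match.group(1))  # https://
print(match.group(2))  # site
print(match.group(3))  # com

# When the optional scheme group does not take part in the match, it is None.
no_scheme = URL_PATTERN.search("site.com/page.html")
print(no_scheme.group(1))  # None
```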