How to use regular expressions (RegEx)

Created: 26.01.2023

Updated: 06.04.2023

Author: Polina A.

Example of a Regular Expression

There are various link formats:

  1. http://site.com

  2. https://site.com/

  3. http://site.com/page/

  4. site.com/page.html

The regular expression (https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/? finds links in these formats anywhere within an arbitrary string, so all four variants listed above are matched.
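To check this claim, the pattern can be tried against the four sample links. The snippet below is a minimal sketch in Python, which is an assumption since the article does not name a language; the helper name find_links and the sample sentence are made up for illustration.

```python
import re

# The pattern discussed above; a raw string keeps the backslashes intact.
LINK_RE = re.compile(r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?")

def find_links(text):
    """Return every substring of text that the pattern matches (hypothetical helper)."""
    return [m.group(0) for m in LINK_RE.finditer(text)]

samples = [
    "http://site.com",
    "https://site.com/",
    "http://site.com/page/",
    "site.com/page.html",
]

for sample in samples:
    # Each sample is expected to be matched as a whole.
    print(sample, "->", find_links(sample))

# Links embedded in a longer string are found as well.
print(find_links("Docs: https://site.com/ and site.com/page.html"))
```

Note that the pattern is permissive: any lowercase text of the form name.ext, such as a file name, is matched too, so results may need extra filtering.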

How to read a regular expression?

  1. (https?:\/\/) — everything within parentheses forms a matching group. In the example, there are 4 such groups;

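  2. \ — the escape character. The characters . ^ $ * + ? { } [ ] \ | ( ) have a special meaning in the regular expression language and must be escaped to be matched literally. The / is escaped here as well, because many tools use it as the pattern delimiter;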

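  3. ? — the optional quantifier: the element before it may occur zero times or once. In https? it makes the letter s optional, so both http and https match; after the first group, (https?:\/\/)?, it makes the whole protocol prefix optional. Quantifiers are "greedy" by default (they match as much as possible); ? acts as a "lazy" modifier only when it directly follows another quantifier, as in +? or *?, which does not occur in this pattern;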

  4. ([\da-z\.-]+) — the second matching group. Square brackets define a character class: any single character from the listed set is matched at that position;

  5. \d — denotes digits (0-9);

  6. a-z — a range of lowercase Latin letters. The \. and the - at the end of the brackets are literal characters (a dot and a hyphen) that are allowed as well;

  7. + — a quantifier requiring the preceding element to occur one or more times;

  8. ([a-z\.]{2,6}) — the third matching group. {2,6} sets the allowed number of repetitions, here from 2 to 6. The pattern assumes that a top-level domain is 2 to 6 characters long, so unlimited repetition is not needed;

  9. ([\/\w\.-]*)*\/? — the fourth matching group, followed by an optional trailing slash. \w matches any letter from a to z in lower or upper case, a digit, or an underscore;

  10. * — a quantifier allowing the preceding element to occur zero or more times.
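The elements described in this list can be observed one by one. The following is a minimal Python sketch (the language and the helper names split_link and fits are assumptions, not part of the article): it prints what the first three groups capture and shows how each quantifier behaves on small inputs.

```python
import re

# The pattern discussed above.
LINK_RE = re.compile(r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?")

def split_link(link):
    """Return what groups 1-3 capture for a link, or None if it does not match."""
    m = LINK_RE.fullmatch(link)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)

def fits(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Groups: protocol prefix, name, top-level domain.
print(split_link("https://site.com/page/"))  # ('https://', 'site', 'com')
print(split_link("site.com/page.html"))      # (None, 'site', 'com'): group 1 is optional

# ? makes the preceding element optional (zero or one occurrence).
print(fits(r"https?", "http"), fits(r"https?", "https"))  # True True

# + needs at least one character, * also accepts none.
print(fits(r"[\da-z\.-]+", ""), fits(r"[\/\w\.-]*", ""))  # False True

# {2,6} accepts between 2 and 6 characters.
print(fits(r"[a-z\.]{2,6}", "c"), fits(r"[a-z\.]{2,6}", "com"))  # False True

# \w covers letters, digits and the underscore, but not the hyphen.
print(fits(r"\w+", "Page_1"), fits(r"\w+", "page-1"))  # True False

# Greedy versus lazy: ? after another quantifier makes it match as little as possible.
print(re.match(r"\d+", "2023").group(), re.match(r"\d+?", "2023").group())  # 2023 2
```

When the protocol prefix is absent, group 1 does not take part in the match and Python reports it as None rather than as an empty string.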
