Regular Expressions
✅ What is a Regular Expression?
A regular expression (regex for short) is a sequence of characters that defines a search pattern. It is mainly used for string matching and manipulation, such as validation, extraction, and replacement.
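The three typical uses named above can be sketched with Python's standard re module. Python is an assumption here, since these notes are not tied to one language, and the sample strings are made up for illustration.

```python
import re

# Validation: does the entire string match the pattern?
is_zip = re.fullmatch(r"\d{5}", "90210") is not None  # True

# Extraction: collect every substring that matches.
numbers = re.findall(r"\d+", "Order 66 shipped 3 items")  # ['66', '3']

# Replacement: substitute each match with new text.
redacted = re.sub(r"\d", "#", "PIN 1234")  # 'PIN ####'

print(is_zip, numbers, redacted)
```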
🔍 Basic Components of Regex
- Literals: match exact characters.
  Example: the pattern cat matches the "cat" inside "concatenate".
- Metacharacters: special characters with specific meanings:
  . → Any single character except newline
  ^ → Start of string
  $ → End of string
  * → 0 or more repetitions of the preceding item
  + → 1 or more repetitions of the preceding item
  ? → 0 or 1 occurrence of the preceding item
  | → OR operator (alternation)
  \ → Escape special characters (e.g., \. matches a literal dot)
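A minimal sketch of the literal and metacharacter rules above, using Python's re module (an assumed choice of language); each comment shows the result the call returns for the made-up sample strings.

```python
import re

# Literal: "cat" is found inside "concatenate", starting at index 3.
literal = re.search(r"cat", "concatenate")
print(literal.group(), literal.start())  # cat 3

# "." matches any single character except a newline.
dots = re.findall(r"c.t", "cat cot cut c\nt")  # ['cat', 'cot', 'cut']

# "^" and "$" anchor the match to the start and end of the string.
starts_with_cat = re.search(r"^cat", "catalog") is not None  # True
only_cat = re.search(r"^cat$", "concatenate") is not None    # False

# "*", "+" and "?" repeat the item just before them.
star = re.fullmatch(r"ab*c", "ac") is not None  # True: zero b's is allowed
plus = re.fullmatch(r"ab+c", "ac") is not None  # False: needs at least one b
optional = [re.fullmatch(r"colou?r", w) is not None for w in ("color", "colour")]  # [True, True]

# "|" means OR.
either = re.findall(r"cat|dog", "cat and dog")  # ['cat', 'dog']

# "\" escapes a metacharacter: "\." matches only a real dot.
escaped = re.search(r"3\.14", "3x14") is not None  # False
```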
🔢 Character Classes
[abc] → Matches a, b, or c
[^abc] → Matches any character except a, b, or c
[a-z] → Matches lowercase letters
[0-9] → Matches digits
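The character classes above behave as follows in Python's re module (a sketch; the language and the sample strings are assumptions). Note that ranges are case-sensitive.

```python
import re

# [abc]: any one of the listed characters (the space is skipped).
listed = re.findall(r"[abc]", "a cab")  # ['a', 'c', 'a', 'b']

# [^abc]: any single character that is NOT listed.
excluded = re.findall(r"[^abc]", "abcxyz")  # ['x', 'y', 'z']

# Ranges; "+" joins consecutive matching characters into one result.
# [a-z] is case-sensitive, so the capital "H" and "W" are not matched.
lower = re.findall(r"[a-z]+", "Hello World 2024")   # ['ello', 'orld']
digits = re.findall(r"[0-9]+", "Hello World 2024")  # ['2024']
```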
Predefined classes:
\d → Digit (0-9; Unicode-aware engines such as Python 3 also match other Unicode digits)
\D → Non-digit
\w → Word character (letters, digits, underscore)
\W → Non-word character
\s → Whitespace
\S → Non-whitespace
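A short sketch of the predefined classes with Python's re module (assumed language, made-up inputs). The last two lines illustrate the Unicode behaviour of \d and how the re.ASCII flag restricts it to 0-9.

```python
import re

text = "Item_7 costs 15 USD"

digits = re.findall(r"\d+", text)  # ['7', '15']
words = re.findall(r"\w+", text)   # ['Item_7', 'costs', '15', 'USD'] (underscore counts)

# \D: delete every non-digit, e.g. to normalise a phone number.
phone = re.sub(r"\D", "", "(555) 123-4567")  # '5551234567'

# \W: punctuation and spaces are non-word characters.
non_word = re.findall(r"\W", "hi, there!")  # [',', ' ', '!']

# \s: split on any run of whitespace (spaces, tabs, newlines).
parts = re.split(r"\s+", "a  b\tc\nd")  # ['a', 'b', 'c', 'd']

# In Python 3, \d also matches non-ASCII digits unless re.ASCII is set.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_match = re.findall(r"\d", arabic_three)                # one match
ascii_match = re.findall(r"\d", arabic_three, flags=re.ASCII)  # []
```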
🔁 Quantifiers
a* → 0 or more "a"
a+ → 1 or more "a"
a? → 0 or 1 "a"
a{3} → Exactly 3 "a"
a{2,4} → Between 2 and 4 "a"
a{2,} → 2 or more "a"
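To see the quantifiers side by side, this sketch (Python's re module, an assumed choice) tests each pattern against the same made-up strings and keeps only the ones matched in full.

```python
import re

patterns = ["a*", "a+", "a?", "a{3}", "a{2,4}", "a{2,}"]
samples = ["", "a", "aa", "aaa", "aaaaa"]

# For each quantifier, keep the samples that the pattern matches in full.
results = {p: [s for s in samples if re.fullmatch(p, s)] for p in patterns}

for p, matched in results.items():
    print(f"{p:7} {matched}")

# Without fullmatch, quantifiers are greedy: a{2,4} takes as many a's as it can.
longest = re.match(r"a{2,4}", "aaaaa").group()  # 'aaaa'
```

For example, a{2,4} accepts "aa" and "aaa" but not "aaaaa" when the whole string must match, while a search inside "aaaaa" still finds the first four a's.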
🧩 Groups and Capturing
Parentheses group several characters into one unit, so a quantifier can apply to all of them. A capturing group also remembers the text it matched, which you can reuse later in a backreference or read from calling code.
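(abc) → Capturing group
(?:abc) → Non-capturing group
(?P<name>abc) → Named capturing group (Python syntax; JavaScript, Java and .NET write (?<name>abc))
\1, \2 → Backreferences to captured groups (they re-match the exact text the group captured)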
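The group types above can be sketched with Python's re module (assumed language); the dates, addresses and sentences are illustrative only.

```python
import re

# Capturing groups: group(1), group(2) hold the matched text.
m = re.search(r"(\d{4})-(\d{2})", "Released 2023-07")
year, month = m.group(1), m.group(2)  # '2023', '07'

# Non-capturing group: (?:ab)+ repeats "ab" as a unit but stores nothing.
nc = re.fullmatch(r"(?:ab)+", "ababab")
nc_groups = nc.groups()  # ()

# Named groups (Python syntax): read the result by name instead of number.
named = re.search(r"(?P<user>\w+)@(?P<host>[\w.]+)", "contact: ana@example.com")
user = named.group("user")  # 'ana'
host = named.group("host")  # 'example.com'

# Backreference: \1 must repeat exactly what group 1 matched.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")  # ['is', 'test']
```

A typical use of the backreference pattern is spotting accidentally doubled words in text.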
✅ Common Regex Patterns
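- Email Validation (simplified; does not accept every valid address)
  ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
- Phone Number (US, format (555) 123-4567, space optional)
  ^\(\d{3}\)\s?\d{3}-\d{4}$
- URL
  ^https?:\/\/(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/.*)?$
- IP Address (IPv4)
  ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
- Date (YYYY-MM-DD; checks the format only, so 2023-13-45 also passes)
  ^\d{4}-\d{2}-\d{2}$
- Password (8+ characters, at least 1 uppercase letter and 1 digit; only letters and digits are allowed)
  ^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$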
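The patterns above can be collected into a small lookup table. This is a sketch in Python; the PATTERNS dictionary and the is_valid helper are hypothetical names. It uses re.fullmatch because in Python "$" also matches just before a trailing newline, so a plain search would accept "2023-07-15\n".

```python
import re

# Patterns copied from the list above (raw strings keep backslashes intact).
PATTERNS = {
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "phone_us": r"^\(\d{3}\)\s?\d{3}-\d{4}$",
    "url": r"^https?:\/\/(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/.*)?$",
    "ipv4": r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$",
    "date": r"^\d{4}-\d{2}-\d{2}$",
    "password": r"^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$",
}


def is_valid(kind: str, value: str) -> bool:
    """Hypothetical helper: True if the whole value matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], value) is not None


print(is_valid("ipv4", "192.168.0.1"))        # True
print(is_valid("ipv4", "256.1.1.1"))          # False: 256 is out of range
print(is_valid("date", "2023-13-45"))         # True: format only, not a real date
print(is_valid("password", "Secret123!"))     # False: "!" is not allowed
```

The date and password results show the limits noted in the list: these patterns check shape, not meaning, so a real date check still needs a date parser.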
⚡ Advanced Features
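- Lookahead / Lookbehind
  (?=...) → Positive lookahead
  (?!...) → Negative lookahead
  (?<=...) → Positive lookbehind
  (?<!...) → Negative lookbehind
  Lookarounds check what comes after or before the current position without consuming any characters. Python's re module requires lookbehind patterns to be fixed-width.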
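Example:
(?=.*\d)(?=.*[A-Z])
at the start of a pattern ensures the string contains at least one digit and at least one uppercase letter. Both lookaheads test from the same position, which is how the password pattern combines several requirements.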
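Each lookaround can be sketched in Python's re module (assumed language; inputs are made up). has_digit_and_upper is a hypothetical helper built on the stacked-lookahead example, and the final try/except shows the fixed-width restriction on lookbehind.

```python
import re

# Positive lookahead: digits followed by "px"; "px" itself is not consumed.
px = re.findall(r"\d+(?=px)", "width: 120px; height: 40em")  # ['120']

# Negative lookahead: replace "foo" only where it is NOT followed by "bar".
swapped = re.sub(r"foo(?!bar)", "X", "foobar foobaz")  # 'foobar Xbaz'

# Positive lookbehind: digits preceded by "$"; the "$" is not part of the match.
prices = re.findall(r"(?<=\$)\d+", "cost: $30, qty: 4")  # ['30']

# Negative lookbehind: whole numbers NOT preceded by "$".
others = re.findall(r"(?<!\$)\b\d+", "cost: $30, qty: 4")  # ['4']


def has_digit_and_upper(s: str) -> bool:
    """Hypothetical helper: stacked lookaheads anchored at the start."""
    return re.search(r"^(?=.*\d)(?=.*[A-Z])", s) is not None


# Python's re only accepts fixed-width lookbehind patterns.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind = True
except re.error:
    variable_lookbehind = False  # this branch runs with the standard re module
```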