Regular Expressions

✅ What is a Regular Expression?

A regular expression (regex for short) is a sequence of characters that defines a search pattern. It is mainly used for string matching and manipulation, such as validation, extraction, and replacement.


🔍 Basic Components of Regex

  1. Literals
    Match exact characters.
    Example:

    • cat matches "cat" in "concatenate".
  2. Metacharacters
    Special characters with specific meanings:

    • . → Any single character except newline
    • ^ → Start of string (or of each line in multiline mode)
    • $ → End of string (or of each line in multiline mode)
    • * → 0 or more repetitions of the preceding element
    • + → 1 or more repetitions of the preceding element
    • ? → 0 or 1 occurrence of the preceding element
    • | → OR operator
    • \ → Escape special characters
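
The metacharacters above can be tried out with a short sketch. It assumes Python's built-in re module; the sample strings and variable names are made up for illustration.

```python
import re

text = "cat cot cut"

# . stands for any one character (except a newline)
dot_matches = re.findall(r"c.t", text)

# ^ and $ anchor the pattern to the start and end of the string
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None

# | matches either alternative
either = re.findall(r"cat|cut", text)

# ? makes the preceding character optional
optional_u = re.findall(r"colou?r", "color colour")

# \ turns the metacharacter . into a literal dot
literal_dots = re.findall(r"\.", "v1.2.3")

print(dot_matches, starts_with_cat, ends_with_cat)
print(either, optional_u, literal_dots)
```

Writing the patterns as raw strings (r"...") keeps Python from interpreting the backslashes before the regex engine sees them.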

🔢 Character Classes

  • [abc] → Matches a, b, or c
  • [^abc] → Matches any character except a, b, or c
  • [a-z] → Matches any one lowercase letter from a to z
  • [0-9] → Matches any one digit

Predefined classes:

  • \d → Digit (0-9)
  • \D → Non-digit
  • \w → Word character (letters, digits, underscore)
  • \W → Non-word character
  • \s → Whitespace
  • \S → Non-whitespace
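
As a rough illustration of the classes above, the sketch below runs a few of them against one sample string. It assumes Python's re module; the sample text is invented.

```python
import re

sample = "Order 66 shipped_to Bob!"

# \d+ collects runs of digits
digits = re.findall(r"\d+", sample)

# \w+ collects runs of letters, digits, and underscores
words = re.findall(r"\w+", sample)

# [a-z]+ collects runs of lowercase letters only
lowercase_runs = re.findall(r"[a-z]+", sample)

# A negated class: anything that is not a letter, digit, or whitespace
others = re.findall(r"[^a-zA-Z0-9\s]", sample)

# \s matches each whitespace character
spaces = re.findall(r"\s", sample)

print(digits, words, lowercase_runs, others, len(spaces))
```

Note that the underscore counts as a word character for \w but falls outside [a-z], which is why "shipped_to" stays whole in one result and splits in the other.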

🔁 Quantifiers

  • a* → 0 or more a
  • a+ → 1 or more a
  • a? → 0 or 1 a
  • a{3} → Exactly 3 a
  • a{2,4} → Between 2 and 4 a (inclusive)
  • a{2,} → 2 or more a
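
One way to see the quantifiers side by side is to test each against strings of increasing length. This is a minimal sketch assuming Python's re module; the helper name matching is hypothetical.

```python
import re

# Candidate strings: the empty string, then one to five a's
candidates = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

def matching(pattern: str) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

for pattern in ["a*", "a+", "a?", "a{3}", "a{2,4}", "a{2,}"]:
    print(pattern, matching(pattern))
```

re.fullmatch is used here so that a pattern only counts when it covers the whole string; with re.search, a{3} would also be found inside "aaaaa".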

🧩 Groups and Capturing

Groups treat part of a pattern as a single unit. A capturing group also saves the text it matched, so you can reuse it later in a backreference or read it from the calling code.

  • (abc) → Capturing group
  • (?:abc) → Non-capturing group
  • (?P<name>abc) → Named capturing group (Python syntax; many other flavors write (?<name>abc))
  • \1, \2 → Backreferences to the first and second captured groups
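
The group types above might be used as follows. This is a sketch assuming Python's re module; the sample strings and the group name year are illustrative only.

```python
import re

# A named group and a numbered group in the same pattern
m = re.search(r"(?P<year>\d{4})-(\d{2})", "Released 2024-05")
whole = m.group(0)           # the entire match
year = m.group("year")       # access by name
year_by_number = m.group(1)  # a named group is also numbered
month = m.group(2)

# A non-capturing group repeats "ab" as a unit without saving it
repeats = re.findall(r"(?:ab)+", "ababab ab")

# A backreference: \1 must repeat exactly what group 1 matched
doubled = re.search(r"(\w+) \1", "it is is fine")
doubled_word = doubled.group(1)

print(whole, year, month, repeats, doubled_word)
```

Because the non-capturing pattern contains no capturing groups, re.findall returns the full matches rather than group contents.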

✅ Common Regex Patterns

  1. Email Validation

    ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
  2. Phone Number (US)

    ^\(\d{3}\)\s?\d{3}-\d{4}$
  3. URL

    ^https?:\/\/(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/.*)?$
  4. IP Address (IPv4)

    ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
  5. Date (YYYY-MM-DD, format check only)

    ^\d{4}-\d{2}-\d{2}$
  6. Password (8+ letters or digits, at least 1 uppercase letter and 1 digit)

    ^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$
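
The patterns above can be wrapped in a small checker to see what they accept and reject. This is a sketch assuming Python's re module; the names PATTERNS and is_valid and all sample inputs are hypothetical.

```python
import re

# Patterns copied from the list above
PATTERNS = {
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "phone": r"^\(\d{3}\)\s?\d{3}-\d{4}$",
    "url": r"^https?:\/\/(www\.)?[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/.*)?$",
    "ipv4": r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$",
    "date": r"^\d{4}-\d{2}-\d{2}$",
    "password": r"^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$",
}

def is_valid(kind: str, text: str) -> bool:
    """Return True when text matches the pattern registered under kind."""
    return re.match(PATTERNS[kind], text) is not None

print(is_valid("email", "user@example.com"))  # True
print(is_valid("phone", "555-123-4567"))      # False: parentheses are required
print(is_valid("ipv4", "256.1.1.1"))          # False: 256 is out of range
print(is_valid("date", "2024-02-30"))         # True: only the shape is checked
print(is_valid("password", "Passw0rd!"))      # False: ! is not a letter or digit
```

These patterns are deliberate simplifications. The date pattern, for instance, accepts February 30, so a real application would likely pair it with a proper date parser.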

⚡ Advanced Features

  • Lookahead / Lookbehind
    • (?=...) → Positive lookahead
    • (?!...) → Negative lookahead
    • (?<=...) → Positive lookbehind
    • (?<!...) → Negative lookbehind

Example:
(?=.*\d)(?=.*[A-Z]) checks that the text ahead contains at least one digit and one uppercase letter, without consuming any characters.
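
The four lookaround forms can be tried out with the sketch below. It assumes Python's re module; the sample strings are invented for illustration.

```python
import re

# Positive lookaheads: both conditions are checked from the same position
check = re.compile(r"(?=.*\d)(?=.*[A-Z])")
has_both = check.match("abcD9") is not None
missing_upper = check.match("abcd9") is not None

# Lookarounds consume nothing, so the successful match is empty
consumed = check.match("abcD9").end()

# Negative lookahead: "foo" only when it is not followed by "bar"
not_before_bar = re.findall(r"foo(?!bar)", "foobar foobaz")

# Positive lookbehind: digits only when preceded by a dollar sign
prices = re.findall(r"(?<=\$)\d+", "cost $45, qty 3")

# Negative lookbehind: "happy" only when not preceded by "un"
plain_happy = re.findall(r"(?<!un)happy", "unhappy happy")

print(has_both, missing_upper, consumed)
print(not_before_bar, prices, plain_happy)
```

Since lookarounds only assert a condition, they are usually combined with a consuming pattern, as in the password example, where [A-Za-z\d]{8,} does the actual matching.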