AskDB

Regex Explained: How Regular Expressions Work

Regular expressions match patterns in text using a specialized syntax. Understanding how the regex engine works helps you write patterns that are both correct and efficient.

How the Engine Works

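In the backtracking engines that most languages use, the engine walks your pattern left to right while scanning the input string character by character. When it reaches a quantifier (*, +, ?), it decides how many characters to consume: a greedy quantifier takes as many as it can, and if the rest of the pattern then fails, the engine backtracks and tries a shorter match.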
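A minimal sketch of that decision, using Python's re module (Python and the sample string are assumptions chosen for illustration; the article itself is engine-neutral). The greedy a* first takes every "a", then gives one back so the rest of the pattern can match:

```python
import re

# a* is greedy: it first consumes all three "a" characters, then backtracks
# one step so that the literal "ab" that follows can still match.
m = re.match(r"(a*)ab", "aaab")

star_part = m.group(1)   # what a* kept after backtracking
whole = m.group(0)       # the full match
print(repr(star_part), repr(whole))
```

The captured group shows what the quantifier ended up with after backtracking, not what it grabbed on its first attempt.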

Literals and Metacharacters

abc         # Matches "abc" literally
.           # Any character (except newline)
\*          # Escaped literal asterisk
\n          # Newline character
\t          # Tab character
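As a quick illustration, the entries above can be tried in Python's re module (the language choice and all sample strings are assumptions, not part of the original article). Raw strings (r"...") keep Python from interpreting the backslashes before the regex engine sees them:

```python
import re

text = "2*3=6\tok\nnext.line"   # made-up sample input

# An unescaped dot matches any character except a newline.
dots = re.findall(r".", "a\nb")

# An escaped asterisk matches only a literal "*".
stars = re.findall(r"\*", text)

# \t and \n in the pattern match tab and newline characters.
tabs_newlines = re.findall(r"\t|\n", text)

# re.escape builds a literal pattern from arbitrary text.
literal = re.escape("1+1=2")
found = re.search(literal, "so 1+1=2 here") is not None
```

re.escape is handy when the text to match comes from user input and may contain metacharacters.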

Character Classes

[abc]       # a, b, or c
[^abc]      # NOT a, b, or c
[a-z]       # a through z
[0-9]       # 0 through 9
[A-Za-z]    # Any letter
\d          # Digit = [0-9]
\w          # Word char = [A-Za-z0-9_]
\s          # Whitespace = [ \t\n\r\f\v]
\D          # NOT digit
\W          # NOT word char
\S          # NOT whitespace
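The classes above can be exercised with a short Python sketch (Python's re module and the sample text are assumptions for illustration). One engine-specific caveat worth knowing: for str patterns Python's shorthand classes are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII ranges listed above.

```python
import re

text = "Order A7 shipped_to room 42, floor 3"   # made-up sample input

digits = re.findall(r"\d+", text)         # runs of digits
words = re.findall(r"\w+", text)          # runs of word characters
capitals = re.findall(r"[A-Z]", text)     # explicit range
not_lower_or_space = re.findall(r"[^a-z\s]+", text)   # negated class

# Python's \d also matches non-ASCII digits unless re.ASCII is set.
arabic_three = "\u0663"   # ARABIC-INDIC DIGIT THREE
unicode_hit = re.findall(r"\d", arabic_three)
ascii_hit = re.findall(r"\d", arabic_three, flags=re.ASCII)
```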

Quantifiers

*           # 0 or more (greedy)
+           # 1 or more (greedy)
?           # 0 or 1 (optional)
{3}         # Exactly 3
{2,4}       # Between 2 and 4
{3,}        # 3 or more

*?          # 0 or more (lazy)
+?          # 1 or more (lazy)
??          # 0 or 1 (lazy)
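A short Python sketch of the counting quantifiers (the re module and the sample strings are assumptions for illustration). re.fullmatch requires the whole string to match, which makes the counts easy to check:

```python
import re

# {3}: exactly three digits.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None

# {2,4}: five "a" characters is one too many for the whole string to match.
too_many = re.fullmatch(r"a{2,4}", "aaaaa") is not None

# ?: the "u" is optional.
optional = re.findall(r"colou?r", "color colour")

# {3,}: runs of three or more digits.
at_least_three = re.findall(r"\d{3,}", "12 123 12345")

# Greedy + takes the whole run; lazy +? stops after the minimum of one.
greedy_run = re.match(r"a+", "aaaa").group(0)
lazy_run = re.match(r"a+?", "aaaa").group(0)
```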

Greedy vs Lazy

Input: <div>hello</div><div>world</div>

Greedy: <div>.*</div>
Match:  <div>hello</div><div>world</div>

Lazy:   <div>.*?</div>
Match:  <div>hello</div>

A greedy quantifier matches as much as possible while still letting the overall pattern succeed; a lazy quantifier matches as little as possible. Add ? after a quantifier to make it lazy.
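The same comparison can be reproduced in Python (the re module is an assumption here; the input and both patterns are the ones from the example above):

```python
import re

html = "<div>hello</div><div>world</div>"

# Greedy: .* runs to the end of the input, then backs up to the last </div>.
greedy_first = re.search(r"<div>.*</div>", html).group(0)

# Lazy: .*? stops at the first </div> it can reach.
lazy_first = re.search(r"<div>.*?</div>", html).group(0)

# With the lazy pattern, findall picks up each element separately.
lazy_all = re.findall(r"<div>.*?</div>", html)
```

Here greedy_first is the entire input, while lazy_first is only the first element.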

Anchors

^           # Start of string (or of each line in multiline mode)
$           # End of string (or of each line in multiline mode)
\b          # Word boundary
\B          # NOT word boundary
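A small Python sketch of how anchors behave (the re module, the re.MULTILINE flag name, and the sample text are specific to Python and chosen for illustration). Anchors match positions rather than characters, and whether ^ and $ apply per line depends on a flag in most engines:

```python
import re

text = "cat\nconcat cat"   # two lines of made-up input

# Without MULTILINE, ^ matches only at the very start of the string.
start_default = re.findall(r"^c\w+", text)

# With MULTILINE, ^ also matches right after each newline.
start_multi = re.findall(r"^c\w+", text, flags=re.MULTILINE)

# $ follows the same rule for line ends.
end_default = re.findall(r"cat$", text)
end_multi = re.findall(r"cat$", text, flags=re.MULTILINE)

# \b requires a word boundary, so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", text)

# \B requires the opposite: only the "cat" inside "concat" qualifies.
inside_words = re.findall(r"\Bcat", text)
```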

Groups and Backreferences

(abc)       # Capture group
(?:abc)     # Non-capturing group
(abc|def)   # Alternation
\1          # Backreference to group 1

Example: (\w+) \1
Matches: "hello hello" (repeated word)
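A hedged Python sketch of the repeated-word pattern (Python's re module and the sample sentences are assumptions). The \b anchors in the second pattern are an addition to the original example, used so that the match cannot start in the middle of a longer word:

```python
import re

# The pattern from the example: a word, a space, then the same word again.
m = re.search(r"(\w+) \1", "hello hello world")
doubled = m.group(0)   # the whole repeated pair
word = m.group(1)      # what group 1 captured

# Collapse an accidentally repeated word using a backreference in the replacement.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", "it was was fine")

# A non-capturing group with alternation: findall returns whole matches
# because the pattern contains no capture group.
runs = re.findall(r"(?:abc|def)+", "abcdefabc xdef")
```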

Lookahead and Lookbehind

(?=abc)     # Positive lookahead (followed by abc)
(?!abc)     # Negative lookahead (NOT followed by abc)
(?<=abc)    # Positive lookbehind (preceded by abc)
(?<!abc)    # Negative lookbehind (NOT preceded by abc)

Example: \d+(?=px)
Matches: "100" in "100px" (without consuming "px")
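The lookarounds can be tried in Python as follows (the re module and the sample strings are assumptions; lookbehind support and its restrictions vary between engines). Lookarounds check the surrounding text without making it part of the match:

```python
import re

css = "width: 100px; height: 50em; margin: 8px"   # made-up sample input

# Positive lookahead: digits that are followed by "px".
px_values = re.findall(r"\d+(?=px)", css)

# The lookahead does not consume "px": the match ends right after the digits.
m = re.search(r"\d+(?=px)", "100px")
match_end = m.end()

# Negative lookahead: also rejecting a following digit stops the engine
# from backtracking to a shorter number such as "10" in "100px".
not_px = re.findall(r"\d+(?!\d|px)", css)

price = "cost $40, 15 items"
# Positive lookbehind: digits preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", price)
# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
plain = re.findall(r"(?<!\$)\b\d+", price)
```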

Performance Tips

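  • Avoid nested quantifiers like (a+)+, which can cause catastrophic backtracking
  • Use non-capturing groups (?:...) when you do not need to extract the matched text
  • Be specific: \d+ or [0-9]+ gives the engine far fewer ways to match than .+, so it backtracks less
  • Use anchors ^...$ when matching the entire string
  • Prefer a lazy quantifier, or a negated class such as [^<]* instead of .*, when a greedy match would overshoot and backtrack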
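The tips above can be sketched in Python (the re module and all sample strings are assumptions; the snippet deliberately uses short inputs, because the nested pattern can take exponential time on a long run of "a" followed by a non-matching character):

```python
import re

# (a+)+ and a+ accept the same strings; the nested form only adds more
# ways for the engine to split a run of "a" when the overall match fails.
risky = re.compile(r"^(a+)+$")
safe = re.compile(r"^a+$")
samples = ["a", "aaaa", "aab", ""]
same_results = all(bool(risky.match(s)) == bool(safe.match(s)) for s in samples)

# A capturing group changes what findall returns; a non-capturing group does not.
captured = re.findall(r"(ab)+", "ababab")
uncaptured = re.findall(r"(?:ab)+", "ababab")

# A specific class stops at the closing quote; .* overshoots to the last one.
quotes = 'say "hi" and "bye"'
specific = re.findall(r'"[^"]*"', quotes)
overshoot = re.findall(r'".*"', quotes)

# Matching the entire string: fullmatch behaves like wrapping the pattern in anchors.
entire = re.fullmatch(r"\d+", "123abc") is None
```

Rewriting (a+)+ as a+ is the simplest fix when the inner group is not needed; other engines offer atomic groups or possessive quantifiers for the same purpose.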

Test Your Regex

Use the Regex Tester to test patterns with live match highlighting and capture groups.