Regex Explained: How Regular Expressions Work
Regular expressions pattern-match text using a specialized syntax. Understanding how the regex engine works helps you write patterns that are both correct and efficient.
How the Engine Works
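Most regex engines, including those in Python, JavaScript, and Java, are backtracking engines. The engine reads your pattern left to right and tries to match it against the input string character by character. When it reaches a quantifier (*, +, ?), it first consumes as many characters as the quantifier allows (or as few, if the quantifier is lazy). If the rest of the pattern then fails to match, the engine backtracks: it adjusts the quantifier by one character at a time and tries again.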
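To make the backtracking idea concrete, here is a minimal sketch of a toy matcher in Python. It supports only literal characters, ".", and a greedy "*" with no escapes. The names match_here, match_star, and toy_search are made up for this illustration; real engines such as Python's re module are far more elaborate.

```python
def match_here(pattern: str, text: str) -> bool:
    """Return True if pattern matches at the very start of text."""
    if not pattern:
        return True  # an empty pattern matches anything
    # "x*": a greedy star applied to the single preceding character
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    # a literal character or "." must match the next input character
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(ch: str, rest: str, text: str) -> bool:
    """Greedy ch*: consume as much as possible, then give back one at a time."""
    count = 0
    while count < len(text) and ch in (".", text[count]):
        count += 1
    # Backtrack: retry the rest of the pattern with fewer characters consumed.
    while count >= 0:
        if match_here(rest, text[count:]):
            return True
        count -= 1
    return False


def toy_search(pattern: str, text: str) -> bool:
    """Try to match the pattern at every start position in text."""
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))
```

For example, match_here("a*a", "aaa") succeeds only because the star first takes all three characters and then gives one back so the final "a" in the pattern has something to match.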
Literals and Metacharacters
abc # Matches "abc" literally
. # Any character (except newline)
\* # Escaped literal asterisk
\n # Newline character
\t # Tab character
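A short sketch of these building blocks using Python's standard re module. The sample strings and variable names are invented for illustration.

```python
import re

# A literal pattern matches itself.
literal = re.search(r"abc", "xxabcxx").group()

# "." matches any character except a newline, unless re.DOTALL is set.
dot_default = re.findall(r"a.c", "abc a-c a\nc")
dot_all = re.findall(r"a.c", "abc a-c a\nc", re.DOTALL)

# Escaping a metacharacter makes it match literally.
stars = re.findall(r"\*", "2 * 3 * 4")

# \t matches a tab character.
fields = re.split(r"\t", "name\tage\tcity")

# Unescaped, "+" is a quantifier, so "1+1=2" does not match the text "1+1=2".
unescaped = re.search(r"1+1=2", "so 1+1=2 holds")
# re.escape builds a pattern that matches arbitrary text literally.
escaped = re.search(re.escape("1+1=2"), "so 1+1=2 holds")

print(literal, dot_default, len(dot_all), stars, fields)
print(unescaped, escaped.group())
```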
Character Classes
[abc] # a, b, or c
[^abc] # NOT a, b, or c
[a-z] # a through z
[0-9] # 0 through 9
[A-Za-z] # Any ASCII letter
\d # Digit = [0-9]
\w # Word char = [A-Za-z0-9_]
\s # Whitespace = [ \t\n\r\f\v]
\D # NOT digit
\W # NOT word char
\S # NOT whitespace
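The classes above can be tried out with Python's re module, as in this small sketch. The sample text is made up. One caveat that applies to Python specifically: for str patterns the shorthand classes are Unicode-aware by default, so \d matches more than [0-9] unless the re.ASCII flag is set.

```python
import re

text = "Order #42: 3 apples, 15 pears_total\t(ok)"

digits = re.findall(r"\d+", text)               # runs of digits
words = re.findall(r"\w+", text)                # runs of word characters
vowels = re.findall(r"[aeiou]", "regex")        # any listed character
non_vowels = re.findall(r"[^aeiou]", "regex")   # negated class
parts = re.split(r"\s+", "a  b\tc\nd")          # split on any whitespace run

# "\u0663" is ARABIC-INDIC DIGIT THREE: a digit, but not in [0-9].
unicode_digits = re.findall(r"\d", "\u06635")
ascii_digits = re.findall(r"\d", "\u06635", re.ASCII)

print(digits, words, vowels, non_vowels, parts)
print(len(unicode_digits), ascii_digits)
```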
Quantifiers
* # 0 or more (greedy)
+ # 1 or more (greedy)
? # 0 or 1 (optional)
{3} # Exactly 3
{2,4} # Between 2 and 4
{3,} # 3 or more
*? # 0 or more (lazy)
+? # 1 or more (lazy)
?? # 0 or 1 (lazy)
Greedy vs Lazy
Input: <div>hello</div><div>world</div>
Greedy: <div>.*</div>
Match: <div>hello</div><div>world</div>
Lazy: <div>.*?</div>
Match: <div>hello</div>
Greedy matches as much as possible. Lazy matches as little as possible. Add ? after a quantifier to make it lazy.
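The same comparison can be reproduced in Python. This is a small sketch using the standard re module; the variable names and the digit string are illustrative.

```python
import re

html = "<div>hello</div><div>world</div>"

# Greedy: .* runs to the end, then backs up to the LAST </div>.
greedy = re.search(r"<div>.*</div>", html).group()

# Lazy: .*? expands one character at a time and stops at the FIRST </div>.
lazy = re.search(r"<div>.*?</div>", html).group()

# With a lazy quantifier, findall picks up each element separately.
contents = re.findall(r"<div>(.*?)</div>", html)

# Counted quantifiers follow the same rule: {2,4} takes up to 4, {2,4}? takes 2.
numbers = "1 22 333 4444 55555"
greedy_counts = re.findall(r"\d{2,4}", numbers)
lazy_counts = re.findall(r"\d{2,4}?", numbers)

print(greedy)
print(lazy)
print(contents, greedy_counts, lazy_counts)
```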
Anchors
^ # Start of string/line
$ # End of string/line
\b # Word boundary
\B # NOT word boundary
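Whether ^ and $ mean "string" or "line" depends on the engine's multiline setting. The sketch below shows the behavior in Python, where the re.MULTILINE flag switches them to per-line matching; the log text is invented for illustration.

```python
import re

log = "error: disk full\nwarning: error rate high\nerror: timeout"

# By default ^ matches only at the start of the whole string.
first_only = re.findall(r"^error", log)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
per_line = re.findall(r"^error", log, re.MULTILINE)
line_ends = re.findall(r"\w+$", log, re.MULTILINE)

# \b matches between a word character and a non-word character.
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")

# \B matches where there is no boundary: "cat" preceded by a word character.
inside_word = re.findall(r"\Bcat", "cat concat cats")

print(first_only, per_line, line_ends, whole_word, inside_word)
```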
Groups and Backreferences
(abc) # Capture group
(?:abc) # Non-capturing group
(abc|def) # Alternation
\1 # Backreference to group 1
Example: (\w+) \1
Matches: "hello hello" (repeated word)
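A brief Python sketch of groups in practice. The word boundaries around the repeated-word pattern are an addition here, so that it does not match inside longer words; the sample sentences are made up.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
dup_word = repeated.group(1)
dup_span = repeated.group(0)

# The same pattern can collapse duplicated words in a substitution.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", "hello hello world")

# findall returns group contents when the pattern has a capture group,
# so a quantified capture group yields only its last repetition.
capturing = re.findall(r"(ab)+", "ababab ab")
non_capturing = re.findall(r"(?:ab)+", "ababab ab")

# Alternation inside a non-capturing group.
animals = re.findall(r"\b(?:cat|dog)s?\b", "cats and dogs, a cat, a dogma")

print(dup_word, dup_span, fixed)
print(capturing, non_capturing, animals)
```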
Lookahead and Lookbehind
(?=abc) # Positive lookahead (followed by abc)
(?!abc) # Negative lookahead (NOT followed by abc)
(?<=abc) # Positive lookbehind (preceded by abc)
(?<!abc) # Negative lookbehind (NOT preceded by abc)
Example: \d+(?=px)
Matches: "100" in "100px" (without consuming "px")
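The following Python sketch exercises all four lookaround forms. It also shows a common pitfall with negative lookahead: the quantifier before it can backtrack to a shorter match that satisfies the assertion. The CSS and price strings are invented for illustration.

```python
import re

css = "width: 100px; height: 50%; margin: 8px"

# Positive lookahead: digits followed by "px"; "px" is not part of the match.
px_values = re.findall(r"\d+(?=px)", css)

# Negative lookahead pitfall: \d+ backtracks from "100" to "10",
# and "10" is followed by "0p", not "px", so it matches.
naive_non_px = re.findall(r"\d+(?!px)", css)

# Also forbidding a following digit stops the number from being split.
non_px = re.findall(r"\d+(?!\d|px)", css)

prices = "cost $25, qty 3, fee $7"

# Positive lookbehind: digits preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
plain_numbers = re.findall(r"(?<!\$)\b\d+", prices)

print(px_values, naive_non_px, non_px, dollar_amounts, plain_numbers)
```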
Performance Tips
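- Avoid nested quantifiers like (a+)+, which can cause catastrophic backtracking
- Use non-capturing groups (?:...) when you do not need to extract the matched text
- Be specific: \d+ or [0-9]+ leaves the engine far fewer ways to match and backtrack than .+
- Use anchors ^...$ when matching the entire string
- Use lazy quantifiers when the match should stop at the first delimiter; a negated class such as [^<]* often avoids the backtracking altogether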
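A rough Python sketch of several of these tips. The helper time_match is a hypothetical name introduced here, and the timings it reports depend on the machine and engine version, so treat them as indicative only. The input length is kept small on purpose: the nested pattern roughly doubles its work with each extra character.

```python
import re
import time


def time_match(pattern: str, text: str) -> float:
    """Return the seconds taken by one re.search call (hypothetical helper)."""
    start = time.perf_counter()
    re.search(pattern, text)
    return time.perf_counter() - start


# Neither pattern can match because of the trailing "b", but the nested
# version explores a huge number of ways to split the run of "a"s first.
almost = "a" * 18 + "b"
nested = r"^(a+)+$"
flat = r"^a+$"
print(f"nested: {time_match(nested, almost):.4f}s")
print(f"flat:   {time_match(flat, almost):.6f}s")

# Both patterns accept and reject exactly the same strings.
same_result = all(
    bool(re.search(nested, s)) == bool(re.search(flat, s))
    for s in ["a", "aaaa", "aab", "", "b"]
)

# re.fullmatch anchors the pattern at both ends of the string.
year_ok = re.fullmatch(r"\d{4}", "2024") is not None
year_bad = re.fullmatch(r"\d{4}", "2024x") is not None

# A negated class cannot run past the delimiter, so no backtracking is needed.
tags = re.findall(r"<([^<>]+)>", "<b>bold</b>")
```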
Test Your Regex
Use the Regex Tester to test patterns with live match highlighting and capture groups.