re — Regular Expressions

The re module provides regular expression matching operations for pattern searching, validation, and text manipulation.

import re

Basic Matching

import re

# Search for pattern anywhere in string
match = re.search(r'\d+', 'Age: 25 years')
if match:
    print(match.group())  # '25'
    print(match.start())  # 5
    print(match.end())    # 7

# Match at beginning of string
match = re.match(r'Hello', 'Hello World')
if match:
    print(match.group())  # 'Hello'

# Full string match
match = re.fullmatch(r'\d{3}-\d{4}', '123-4567')

Finding All Matches

import re

text = "Call 555-1234 or 555-5678"

# Find all matches
numbers = re.findall(r'\d{3}-\d{4}', text)
print(numbers)  # ['555-1234', '555-5678']

# Find with groups
pairs = re.findall(r'(\d{3})-(\d{4})', text)
print(pairs)  # [('555', '1234'), ('555', '5678')]

# Iterator of match objects
for m in re.finditer(r'\d{3}-\d{4}', text):
    print(f"Found {m.group()} at position {m.start()}")

Substitution

import re

text = "Hello World"

# Simple replace
result = re.sub(r'World', 'Python', text)
print(result)  # 'Hello Python'

# Replace with function
def double(match):
    return str(int(match.group()) * 2)

re.sub(r'\d+', double, 'Price: 10, Tax: 5')
# 'Price: 20, Tax: 10'

# Limit replacements
re.sub(r'\d+', 'X', '1 2 3 4', count=2)
# 'X X 3 4'

Splitting

import re

# Split by pattern
re.split(r'[,;\s]+', 'one, two; three four')
# ['one', 'two', 'three', 'four']

# Split with limit
re.split(r'\s+', 'a b c d', maxsplit=2)
# ['a', 'b', 'c d']

Common Patterns

import re

# Email validation
email_pattern = r'^[\w.-]+@[\w.-]+\.\w{2,}$'
re.match(email_pattern, 'user@example.com')  # Match

# URL
url_pattern = r'https?://[\w.-]+(?:/[\w.-]*)*'

# Phone number
phone_pattern = r'\+?\d{1,3}[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

# IP address
ip_pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'

Compiled Patterns

import re

# Compile for reuse (faster in loops)
pattern = re.compile(r'\b\w+@\w+\.\w+\b', re.IGNORECASE)
emails = pattern.findall('Contact us at info@site.com or help@site.com')

Pattern Syntax Quick Reference

Pattern	Meaning
`.`	Any character (except newline)
`\d`	Digit `[0-9]`
`\w`	Word character `[a-zA-Z0-9_]`
`\s`	Whitespace
`*`	0 or more
`+`	1 or more
`?`	0 or 1
`{n,m}`	n to m repetitions
`^` / `$`	Start / end of string
`[abc]`	Character class
`(...)`	Capture group
`(?:...)`	Non-capturing group
`\b`	Word boundary

Flags

re.IGNORECASE  # Case-insensitive matching (re.I)
re.MULTILINE   # ^ and $ match line boundaries (re.M)
re.DOTALL      # . matches newline too (re.S)
re.VERBOSE     # Allow comments in pattern (re.X)

# Combine flags
pattern = re.compile(r"""
    \d{3}   # area code
    [-.]    # separator
    \d{4}   # number
""", re.VERBOSE)

Common Pitfalls

Greedy vs lazy — .* is greedy, .*? is lazy (minimal match)
Raw strings — always use r'pattern' to avoid backslash issues
match vs search — match only checks the beginning of string
Catastrophic backtracking — nested quantifiers like (a+)+ can hang

Official Documentation

re — Regular expression operations

API Reference

Important Functions

Function	Description
`re.compile(pattern, flags=0)`	Compile a regular expression pattern into a regular expression object.
`re.search(pattern, string, flags=0)`	Scan through string looking for the first location where pattern produces a match.
`re.match(pattern, string, flags=0)`	Determine if the RE matches at the beginning of the string.
`re.fullmatch(pattern, string, flags=0)`	Determine if the RE matches the entire string.
`re.split(pattern, string, maxsplit=0, flags=0)`	Split string by the occurrences of pattern.
`re.findall(pattern, string, flags=0)`	Return all non-overlapping matches of pattern in string, as a list of strings.
`re.sub(pattern, repl, string, count=0, flags=0)`	Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern.

Match Object Attributes

Attribute/Method	Description
`Match.group([group1, ...])`	Returns one or more subgroups of the match.
`Match.groups()`	Return a tuple containing all the subgroups of the match.
`Match.start([group])`	Return the indices of the start of the substring matched by group.
`Match.end([group])`	Return the indices of the end of the substring matched by group.
`Match.span([group])`	Return a 2-tuple `(m.start(group), m.end(group))`.