re — Regular Expressions

The re module provides regular expression matching operations for pattern searching, validation, and text manipulation.

import re

Basic Matching

import re

# Search for pattern anywhere in string
match = re.search(r'\d+', 'Age: 25 years')
if match:
    print(match.group())  # '25'
    print(match.start())  # 5
    print(match.end())    # 7

# Match at beginning of string
match = re.match(r'Hello', 'Hello World')
if match:
    print(match.group())  # 'Hello'

# Full string match (the pattern must cover the entire string)
match = re.fullmatch(r'\d{3}-\d{4}', '123-4567')
if match:
    print(match.group())  # '123-4567'
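
The three functions above differ only in where the pattern is allowed to match. A minimal sketch on one input (the sample string is made up for illustration) makes the difference visible:

```python
import re

text = "Order 66 shipped"

# re.match only tries at position 0, so a leading non-digit means no match.
at_start = re.match(r'\d+', text)

# re.search scans forward and returns the first run of digits.
anywhere = re.search(r'\d+', text)

# re.fullmatch requires the pattern to consume the whole string.
whole = re.fullmatch(r'\d+', text)
digits_only = re.fullmatch(r'\d+', '66')

print(at_start)             # None
print(anywhere.group())     # '66'
print(whole)                # None
print(digits_only.group())  # '66'
```

All three return None when there is no match, which is why the examples guard with `if match:` before calling methods on the result.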

Finding All Matches

import re

text = "Call 555-1234 or 555-5678"

# Find all matches
numbers = re.findall(r'\d{3}-\d{4}', text)
print(numbers)  # ['555-1234', '555-5678']

# Find with groups
pairs = re.findall(r'(\d{3})-(\d{4})', text)
print(pairs)  # [('555', '1234'), ('555', '5678')]

# Iterator of match objects
for m in re.finditer(r'\d{3}-\d{4}', text):
    print(f"Found {m.group()} at position {m.start()}")
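
When a pattern has groups, reading them by name is often clearer than by position. The sketch below reuses the phone text from above; the group names `prefix` and `line` are chosen here for illustration only.

```python
import re

text = "Call 555-1234 or 555-5678"

# Named groups (?P<name>...) let each part be read by name.
phone = re.compile(r'(?P<prefix>\d{3})-(?P<line>\d{4})')

records = [m.groupdict() for m in phone.finditer(text)]
print(records)
# [{'prefix': '555', 'line': '1234'}, {'prefix': '555', 'line': '5678'}]

# With exactly one group, findall returns that group, not the whole match.
prefixes = re.findall(r'(\d{3})-\d{4}', text)
print(prefixes)  # ['555', '555']
```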

Substitution

import re

text = "Hello World"

# Simple replace
result = re.sub(r'World', 'Python', text)
print(result)  # 'Hello Python'

# Replace with function
def double(match):
    return str(int(match.group()) * 2)

re.sub(r'\d+', double, 'Price: 10, Tax: 5')
# 'Price: 20, Tax: 10'

# Limit replacements
re.sub(r'\d+', 'X', '1 2 3 4', count=2)
# 'X X 3 4'
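
Replacement strings can also refer back to captured groups, and `re.subn` reports how many replacements were made. A short sketch, with sample strings invented for illustration:

```python
import re

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'Hello World')
print(swapped)  # 'World Hello'

# \g<name> refers to a named group.
iso = re.sub(r'(?P<m>\d{2})/(?P<d>\d{2})/(?P<y>\d{4})',
             r'\g<y>-\g<m>-\g<d>', 'Due 12/31/2024')
print(iso)  # 'Due 2024-12-31'

# re.subn returns a (new_string, number_of_replacements) tuple.
result, n = re.subn(r'\d+', 'X', '1 2 3 4')
print(result)  # 'X X X X'
print(n)       # 4
```

The replacement is written as a raw string so that `\1` and `\g<y>` reach the re module unchanged.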

Splitting

import re

# Split by pattern
re.split(r'[,;\s]+', 'one, two; three four')
# ['one', 'two', 'three', 'four']

# Split with limit
re.split(r'\s+', 'a b c d', maxsplit=2)
# ['a', 'b', 'c d']
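
Two details of `re.split` are easy to miss: a capturing group in the pattern keeps the separators, and a separator at either edge yields an empty string. A small sketch with made-up inputs:

```python
import re

# A capturing group in the pattern keeps the separators in the result.
parts = re.split(r'([,;])', 'a,b;c')
print(parts)  # ['a', ',', 'b', ';', 'c']

# A separator at the start or end produces an empty string there.
edges = re.split(r',', ',a,b,')
print(edges)  # ['', 'a', 'b', '']

# Filtering removes the empty pieces when they are not wanted.
clean = [p for p in edges if p]
print(clean)  # ['a', 'b']
```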

Common Patterns

import re

# Email validation (rough shape check, not a full RFC 5322 validator)
email_pattern = r'^[\w.-]+@[\w.-]+\.\w{2,}$'
re.match(email_pattern, 'user@example.com')  # returns a Match object

# URL
url_pattern = r'https?://[\w.-]+(?:/[\w.-]*)*'

# Phone number
phone_pattern = r'\+?\d{1,3}[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

# IPv4 address (shape only; octets above 255, such as 999, still match)
ip_pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
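
The IP pattern above checks only the shape of the text, so it accepts values such as 999.1.1.1. One possible way to tighten it, sketched here with a hypothetical helper named `is_ipv4` (not part of the re module), is to let the regex check the shape and plain Python check the numeric range:

```python
import re

ip_pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'

def is_ipv4(candidate):
    """Hypothetical helper: shape check by regex, range check in Python."""
    # re.ASCII restricts \d to 0-9.
    if not re.fullmatch(r'(?:\d{1,3}\.){3}\d{1,3}', candidate, re.ASCII):
        return False
    return all(int(octet) <= 255 for octet in candidate.split('.'))

print(re.findall(ip_pattern, 'host 999.1.1.1 up'))  # ['999.1.1.1']
print(is_ipv4('999.1.1.1'))    # False
print(is_ipv4('192.168.0.1'))  # True
```

For strict parsing, the standard library's `ipaddress` module is an alternative to a regex.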

Compiled Patterns

import re

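# Compile once for reuse. The re module also caches recently used patterns,
# so the speedup is usually modest; the main gain is a named, reusable object.
pattern = re.compile(r'\b\w+@\w+\.\w+\b', re.IGNORECASE)
emails = pattern.findall('Contact us at info@site.com or help@site.com')
print(emails)  # ['info@site.com', 'help@site.com']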
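
A compiled pattern exposes the same operations as methods, which keeps loop bodies short. The sketch below assumes a small list of input lines invented for illustration:

```python
import re

# Compile once, outside the loop.
email = re.compile(r'\b[\w.-]+@[\w.-]+\.\w{2,}\b')

lines = [
    'Contact: info@site.com',
    'No address here',
    'Support: help@site.com, sales@site.com',
]

found = []
for line in lines:
    # Pattern objects offer findall, search, match, sub, split and so on.
    found.extend(email.findall(line))

print(found)  # ['info@site.com', 'help@site.com', 'sales@site.com']
```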

Pattern Syntax Quick Reference

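Pattern   Meaning
.         Any character except newline (unless re.DOTALL is set)
\d        Digit: [0-9] with re.ASCII, any Unicode decimal digit otherwise (str patterns)
\w        Word character: [a-zA-Z0-9_] with re.ASCII, Unicode letters and digits plus _ otherwise
\s        Whitespace
*         0 or more
+         1 or more
?         0 or 1
{n,m}     n to m repetitions
^ / $     Start / end of string (of each line with re.MULTILINE)
[abc]     Character class
(...)     Capture group
(?:...)   Non-capturing group
\b        Word boundary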
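
A few rows of the table are easier to remember after seeing them run. The inputs in this sketch are made up for illustration:

```python
import re

# . matches any character except a newline
dot = re.findall(r'a.c', 'abc a\nc a-c')
print(dot)  # ['abc', 'a-c']

# {n,m} bounds the repetition count
bounded = re.findall(r'\b\d{2,3}\b', '1 22 333 4444')
print(bounded)  # ['22', '333']

# (...) captures, (?:...) only groups
captured = re.search(r'(?:ab)+(c)', 'ababc').groups()
print(captured)  # ('c',)

# \b matches only at word edges
words = re.findall(r'\bcat\b', 'cat concat cats')
print(words)  # ['cat']
```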

Flags

import re

re.IGNORECASE  # Case-insensitive matching (re.I)
re.MULTILINE   # ^ and $ match line boundaries (re.M)
re.DOTALL      # . matches newline too (re.S)
re.VERBOSE     # Ignore pattern whitespace and allow comments (re.X)

# Combine flags with |
pattern = re.compile(r"""
    \d{3}   # area code
    [-.]    # separator
    \d{4}   # number
""", re.VERBOSE | re.IGNORECASE)
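
The effect of each flag is easiest to see by running the same pattern with and without it. A minimal sketch, using a two-line sample string invented for illustration:

```python
import re

text = "first line\nSECOND line"

# Without MULTILINE, ^ anchors only at the start of the string.
plain = re.findall(r'^\w+', text)
multi = re.findall(r'^\w+', text, re.MULTILINE)
print(plain)  # ['first']
print(multi)  # ['first', 'SECOND']

# Without DOTALL, . stops at the newline.
no_dotall = re.search(r'first.*SECOND', text)
dotall = re.search(r'first.*SECOND', text, re.DOTALL)
print(no_dotall)       # None
print(dotall.group())  # 'first line\nSECOND'

# Flags combine with the | operator.
both = re.findall(r'^second', text, re.MULTILINE | re.IGNORECASE)
print(both)  # ['SECOND']
```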

Common Pitfalls
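
The list of pitfalls for this section is not present in the source. The sketch below shows a few that are commonly encountered with the functions covered above; the selection is an assumption, not the original author's list.

```python
import re

# 1. Without a raw string, '\b' is a backspace character, not a word boundary.
not_raw = re.findall('\bcat\b', 'cat')
raw = re.findall(r'\bcat\b', 'cat')
print(not_raw)  # []
print(raw)      # ['cat']

# 2. Quantifiers are greedy; a trailing ? makes them match as little as possible.
html = '<b>one</b><b>two</b>'
greedy = re.findall(r'<b>.*</b>', html)
lazy = re.findall(r'<b>.*?</b>', html)
print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# 3. re.match anchors at the start only; trailing text is still accepted.
prefix_ok = bool(re.match(r'\d{3}', '123abc'))
whole_ok = bool(re.fullmatch(r'\d{3}', '123abc'))
print(prefix_ok)  # True
print(whole_ok)   # False

# 4. $ also matches just before a trailing newline.
dollar = bool(re.search(r'^\d+$', '42\n'))
strict = bool(re.fullmatch(r'\d+', '42\n'))
print(dollar)  # True
print(strict)  # False
```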

Official Documentation

re — Regular expression operations

API Reference

Important Functions

Function                                         Description
re.compile(pattern, flags=0)                     Compile a regular expression pattern into a regular expression object.
re.search(pattern, string, flags=0)              Scan through string looking for the first location where pattern produces a match.
re.match(pattern, string, flags=0)               Determine if the RE matches at the beginning of the string.
re.fullmatch(pattern, string, flags=0)           Determine if the RE matches the entire string.
re.split(pattern, string, maxsplit=0, flags=0)   Split string by the occurrences of pattern.
re.findall(pattern, string, flags=0)             Return all non-overlapping matches of pattern in string, as a list of strings or tuples.
re.sub(pattern, repl, string, count=0, flags=0)  Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.

Match Object Attributes

Attribute/Method            Description
Match.group([group1, ...])  Return one or more subgroups of the match.
Match.groups()              Return a tuple containing all the subgroups of the match.
Match.start([group])        Return the index of the start of the substring matched by group.
Match.end([group])          Return the index of the end of the substring matched by group.
Match.span([group])         Return a 2-tuple (m.start(group), m.end(group)).
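
The methods in the table can be tried on a single match object. A short sketch, with an input string made up for illustration:

```python
import re

m = re.search(r'(\d{3})-(\d{4})', 'Call 555-1234 now')

whole = m.group()       # group 0 is the entire match
first = m.group(1)      # a single subgroup
pair = m.group(1, 2)    # several subgroups at once, as a tuple
every = m.groups()      # all subgroups

print(whole)      # '555-1234'
print(first)      # '555'
print(pair)       # ('555', '1234')
print(every)      # ('555', '1234')
print(m.start())  # 5
print(m.end())    # 13
print(m.span(2))  # (9, 13)
```

With no argument, `start`, `end` and `span` refer to group 0, the whole match.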