PYnative

Python Regex Exercises: 30 Coding Problems with Solutions

Updated on: May 24, 2026

Python regular expressions (Regex) are a powerful way to search, extract, validate, and manipulate text in Python. From cleaning datasets and parsing logs to validating user input, mastering Python's built-in re module lets you express these tasks as compact, reusable patterns instead of long chains of string methods.

This Regex exercise set helps you build hands-on experience with pattern matching through 30 exercises, progressing from basic matching to advanced text extraction and manipulation.

Each coding challenge includes a Practice Problem, Hint, Solution code, and detailed Explanation, ensuring you don’t just copy code, but genuinely practice and understand how and why it works.

  • All solutions have been fully tested on Python 3.
  • Read Python Regex: Python’s RE module for pattern matching with regular expressions.
  • Use our Online Code Editor to solve these exercises in real time.

What you’ll practice:

  • Basic Matching & Quantifiers: Character classes, sets, and repetition (*, +, ?, {m,n})
  • Anchors & Word Boundaries: Using ^, $, \b, and \B
  • Data Validation & Cleaning: Validating IDs, formatting, and standardizing data
  • Search & Extraction: Using re.search(), re.findall(), and re.finditer()
  • String Manipulation: Performing advanced replacements with re.sub()

Who is this for?
Beginner to intermediate Python developers with basic knowledge of Python strings who want practical experience with the re module.

Table of contents

  • Exercise 1: Check Allowed Characters
  • Exercise 2: Match Zero or More
  • Exercise 3: Match One or More
  • Exercise 4: Match Optional Characters
  • Exercise 5: Match Exact Occurrences
  • Exercise 6: Match Range of Occurrences
  • Exercise 7: Find Underscore Joined Lowercase
  • Exercise 8: PascalCase Match
  • Exercise 9: Match Start and End
  • Exercise 10: Match Word at Start
  • Exercise 11: Match Word at End
  • Exercise 12: Find a Specific Letter
  • Exercise 13: Find Letter in Middle
  • Exercise 14: Match Adjacent Words
  • Exercise 15: Filter by Starting Letter
  • Exercise 16: Validate Alphanumeric ID
  • Exercise 17: Check Starting Number
  • Exercise 18: Number at End
  • Exercise 19: Clean IP Addresses
  • Exercise 20: Convert Date Format
  • Exercise 21: Extract 1-3 Digit Numbers
  • Exercise 22: Search Literal Strings
  • Exercise 23: Find Pattern Location
  • Exercise 24: Find All Substrings
  • Exercise 25: Iterate Matches
  • Exercise 26: Extract Date from URL
  • Exercise 27: Extract All Numbers
  • Exercise 28: Extract Email Addresses
  • Exercise 29: Swap Characters
  • Exercise 30: Replace Multiple Delimiters

Exercise 1: Check Allowed Characters

Problem Statement: Write a Python program to verify that a string contains only alphanumeric characters (a-z, A-Z, and 0-9).

Purpose: This exercise helps you practice using regular expressions to validate input strings. Checking for allowed characters is a foundational technique used in form validation, data sanitisation, and security-sensitive input handling.

Given Input: text = "Hello123"

Expected Output: Valid: contains only alphanumeric characters

Hint:
  • Import the re module at the top of your program.
  • Use re.fullmatch() to check whether the entire string matches a pattern, not just part of it.
  • The pattern [a-zA-Z0-9]+ matches one or more alphanumeric characters.
  • re.fullmatch() returns a match object if the whole string matches, or None if it does not.
Solution:

import re

text = "Hello123"

if re.fullmatch(r"[a-zA-Z0-9]+", text):
    print("Valid: contains only alphanumeric characters")
else:
    print("Invalid: contains non-alphanumeric characters")

Explanation:

  • import re: Loads Python’s built-in regular expression module, which is required for all re functions.
  • [a-zA-Z0-9]: A character class that matches any single uppercase letter, lowercase letter, or digit.
  • +: A quantifier meaning one or more of the preceding character class, so the string must not be empty.
  • re.fullmatch(): Requires the pattern to match the entire string from start to finish. This is stricter than re.search(), which would match even a partial substring.
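To see why re.fullmatch() is the right tool, the short sketch below compares it with re.search() and with the older ^...$ idiom, using a few made-up test strings. One subtlety worth knowing: $ also matches just before a trailing newline, so a pattern anchored with ^ and $ accepts "Hello123\n", while re.fullmatch() rejects it.

```python
import re

pattern = r"[a-zA-Z0-9]+"

# re.search() succeeds as soon as any substring matches, here "Hello"
partial = re.search(pattern, "Hello 123!")
print(partial.group())  # Hello

# re.fullmatch() demands that the whole string matches
print(re.fullmatch(pattern, "Hello 123!"))  # None

# $ matches just before a trailing newline, so this anchored pattern still succeeds
print(bool(re.match(r"^[a-zA-Z0-9]+$", "Hello123\n")))  # True

# re.fullmatch() treats the newline as an extra, unmatched character
print(bool(re.fullmatch(pattern, "Hello123\n")))  # False
```

This is one reason to prefer re.fullmatch() for validation: it has no newline loophole.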

Exercise 2: Match Zero or More

Problem Statement: Write a Python program to match a string that has an a followed by zero or more bs (e.g., a, ab, abb).

Purpose: This exercise introduces the * quantifier, one of the most commonly used tools in regular expressions. Understanding zero-or-more matching is essential for parsing optional repeated elements in text processing and pattern recognition.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

See: Python Regex Metacharacters and Operators

Expected Output:

a      -> Match
ab     -> Match
abb    -> Match
abbb   -> Match
b      -> No match
ba     -> No match

Hint:
  • The pattern ab* means: the letter a, followed by zero or more bs.
  • Use re.fullmatch() so that strings like ba or abc are not incorrectly accepted.
  • Loop over a list of test strings and print whether each one matches or not.
Solution:

import re

pattern = r"ab*"
test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

  • ab*: Matches a literal a followed by zero or more occurrences of b. The * quantifier means the b is entirely optional but can repeat any number of times.
  • re.fullmatch(): Ensures the entire string is evaluated against the pattern. Without it, re.search() would match the a inside ba and produce a false positive.
  • f"{s:<6}": A format specifier that left-aligns the string in a field of width 6, making the output easier to read in a column.
  • Why b and ba fail: b has no leading a, and ba has the letters in the wrong order, so neither satisfies the ab* pattern.
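The false positive mentioned above is easy to reproduce. This sketch runs the same pattern through re.search(), re.match(), and re.fullmatch() on two extra strings. The helper compare() is hypothetical, written only for this illustration.

```python
import re

def compare(pattern, s):
    """Return what re.search, re.match and re.fullmatch each capture (None if no match)."""
    results = []
    for func in (re.search, re.match, re.fullmatch):
        m = func(pattern, s)
        results.append(m.group() if m else None)
    return tuple(results)

for s in ["ba", "abc"]:
    print(f"{s:<4} search/match/fullmatch -> {compare(r'ab*', s)}")
```

For ba, only re.search() succeeds, because it finds the a at index 1. For abc, both re.search() and re.match() return ab, and only re.fullmatch() rejects the string.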

Exercise 3: Match One or More

Problem Statement: Write a Python program to match a string that has an a followed by one or more bs (e.g., ab, abb, but not a).

Purpose: This exercise demonstrates the + quantifier, which enforces that at least one occurrence of a character must be present. It is a small but critical distinction from * and is widely used when a repeated element is required rather than optional.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

Expected Output:

a      -> No match
ab     -> Match
abb    -> Match
abbb   -> Match
b      -> No match
ba     -> No match

Hint:
  • The pattern ab+ means: the letter a, followed by one or more bs.
  • This is identical in structure to Exercise 2, but swapping * for + makes the b mandatory.
  • Use re.fullmatch() to reject strings like ba where the order is incorrect.
Solution:

import re

pattern = r"ab+"
test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

  • ab+: Matches the letter a followed by one or more bs. The + quantifier requires at least one b to be present, unlike * which allows zero.
  • Why a now fails: The lone a matched in Exercise 2 because * allowed zero bs. Here, + demands at least one, so a alone is rejected.
  • re.fullmatch(): Continues to play an important role by preventing partial matches. Without it, re.search(r"ab+", "abXYZ") would incorrectly return a match.
  • Key distinction: * means zero or more; + means one or more. This single character difference changes whether the repeated element is optional or required.
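The * versus + distinction is also visible when you extract matches from a longer string with re.findall(). The sample text is made up for this sketch.

```python
import re

text = "ab abbb a abb"

# ab* accepts a lone "a"; ab+ requires at least one "b"
print(re.findall(r"ab*", text))  # ['ab', 'abbb', 'a', 'abb']
print(re.findall(r"ab+", text))  # ['ab', 'abbb', 'abb']
```

Both quantifiers are greedy, so each match takes every consecutive b that follows the a.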

Exercise 4: Match Optional Characters

Problem Statement: Write a Python program to match a string that has an a followed by zero or one b (i.e., exactly a or ab, nothing else).

Purpose: This exercise introduces the ? quantifier, which marks a character as optional but non-repeating. It is commonly used when parsing elements that may or may not appear, such as an optional sign in a number, an optional prefix, or an optional suffix in a word.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

Expected Output:

a      -> Match
ab     -> Match
abb    -> No match
abbb   -> No match
b      -> No match
ba     -> No match

Hint:
  • The pattern ab? means: the letter a, followed by zero or one b.
  • The ? quantifier does not allow repetition. It only permits the character to appear once at most.
  • Use re.fullmatch() so that abb and longer strings are correctly rejected.
Solution:

import re

pattern = r"ab?"
test_strings = ["a", "ab", "abb", "abbb", "b", "ba"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

  • ab?: Matches the letter a followed by an optional single b. The ? quantifier means zero or one occurrence, so only a and ab are valid.
  • Why abb fails: The ? quantifier allows at most one b. Two or more bs exceed the limit, so abb and abbb do not match when using re.fullmatch().
  • Comparison with * and +: All three quantifiers are closely related. ? is 0-1, * is 0 to infinity, and + is 1 to infinity. Choosing the right one depends on how many repetitions are acceptable.
  • Common use case: The ? quantifier is frequently used in real-world patterns, for example https? matches both http and https in URL validation.
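A minimal sketch of the https? idea follows. It assumes you only want to check the scheme prefix of a URL, so it is not a full URL validator. The example URLs are placeholders.

```python
import re

urls = ["http://example.com", "https://example.com", "httpss://example.com", "ftp://example.com"]

for url in urls:
    # s? allows at most one "s" after "http"; re.match() anchors at the start
    ok = re.match(r"https?://", url) is not None
    print(f"{url:<22} -> {'Match' if ok else 'No match'}")
```

The first two URLs match. httpss:// fails because ? permits only one s, and ftp:// fails because the literal http is missing.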

Exercise 5: Match Exact Occurrences

Problem Statement: Write a Python program to match a string that has an a followed by exactly three bs (i.e., only abbb is a valid match).

Purpose: This exercise introduces curly-brace quantifiers, which allow you to specify an exact number of repetitions. Exact-count matching is useful in tasks such as validating fixed-length codes, parsing structured data fields, and enforcing strict formatting rules.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

Expected Output:

a      -> No match
ab     -> No match
abb    -> No match
abbb   -> Match
abbbb  -> No match
b      -> No match

Hint:
  • Use curly braces to specify an exact count: b{3} means exactly three bs.
  • The full pattern ab{3} matches an a followed by exactly three bs.
  • Use re.fullmatch() to ensure strings with more than three bs (like abbbb) are rejected.
  • You can also use the range form {m,n} to match between m and n repetitions if a range is ever needed.
Solution:

import re

pattern = r"ab{3}"
test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

  • b{3}: A curly-brace quantifier that matches the character b repeated exactly three times. It is equivalent to writing bbb explicitly, but is more readable and easier to adjust.
  • ab{3}: The {3} applies only to the immediately preceding element, which is b. The a is still a single literal character.
  • Why abbbb fails: re.fullmatch() requires the entire string to match the pattern. Four bs exceed the exact count of three, so the match fails.
  • Range variant: {m,n} matches between m and n repetitions inclusive. For example, b{2,4} would match bb, bbb, or bbbb. Omitting n, as in b{2,}, means two or more: it works like + but with a minimum of two instead of one.
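The curly-brace form is the general one, and *, +, and ? are shorthands for {0,}, {1,}, and {0,1}. The sketch below checks these equivalences on a handful of sample strings. Agreement on these samples is an illustration, not a proof.

```python
import re

samples = ["a", "ab", "abb", "abbb", "abbbb"]

# Each curly-brace form next to the shorthand it is equivalent to
pairs = [
    (r"ab{3}", r"abbb"),
    (r"ab{0,}", r"ab*"),
    (r"ab{1,}", r"ab+"),
    (r"ab{0,1}", r"ab?"),
]

for braces, shorthand in pairs:
    same = all(
        bool(re.fullmatch(braces, s)) == bool(re.fullmatch(shorthand, s))
        for s in samples
    )
    print(f"{braces:<8} vs {shorthand:<5} -> {'equivalent' if same else 'different'} on these samples")
```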

Exercise 6: Match Range of Occurrences

Problem Statement: Write a Python program to match a string that has an a followed by two to three bs (i.e., abb or abbb).

Purpose: This exercise introduces the range form of curly-brace quantifiers, {m,n}, which lets you set a lower and upper bound on repetitions. Range quantifiers are useful when validating fields that must fall within a length window, such as short codes, postal abbreviations, or bounded identifiers.

Given Input: test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

Expected Output:

a      -> No match
ab     -> No match
abb    -> Match
abbb   -> Match
abbbb  -> No match
b      -> No match

Hint:
  • The pattern ab{2,3} means: the letter a followed by two to three bs.
  • The {m,n} quantifier is inclusive on both ends, so both abb (two bs) and abbb (three bs) are valid.
  • Use re.fullmatch() to ensure strings with only one b or more than three bs are correctly rejected.
  • Do not put spaces inside the curly braces: write {2,3}, not {2, 3}. Python does not recognise the spaced form as a quantifier and matches it as literal text instead.
Solution:

import re

pattern = r"ab{2,3}"
test_strings = ["a", "ab", "abb", "abbb", "abbbb", "b"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

  • b{2,3}: A range quantifier that matches b repeated at least twice and at most three times. Both bounds are inclusive, so two and three are both accepted.
  • Why ab fails: A single b falls below the minimum of two required by {2,3}, so ab does not satisfy the pattern.
  • Why abbbb fails: Four bs exceed the upper bound of three. Because re.fullmatch() requires the entire string to be consumed, the extra b causes the match to fail.
  • Relationship to other quantifiers: {2,3} is the bounded middle ground between exact matching ({3}) and open-ended matching ({2,}, which means two or more). Choosing the right form depends on how tightly you need to constrain the input.
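The space pitfall from the hints fails silently: no error is raised, and the pattern simply stops doing what you meant. This sketch shows the difference.

```python
import re

# Correct: {2,3} is a range quantifier applied to b
print(bool(re.fullmatch(r"ab{2,3}", "abb")))        # True

# Spaced: "{2, 3}" is not a valid quantifier, so it is matched as literal text
print(bool(re.fullmatch(r"ab{2, 3}", "abb")))       # False
print(bool(re.fullmatch(r"ab{2, 3}", "ab{2, 3}")))  # True
```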

Exercise 7: Find Underscore Joined Lowercase

Problem Statement: Write a Python program to find sequences of lowercase letters joined with an underscore (e.g., hello_world).

Purpose: This exercise practises matching multi-part patterns that involve a separator character between word segments. Recognising underscore-joined identifiers is directly applicable to parsing Python variable names, snake_case tokens in configuration files, and structured log fields.

Given Input: test_strings = ["hello_world", "foo_bar", "hello", "hello_", "_world", "Hello_world", "hello_World"]

Expected Output:

hello_world  -> Match
foo_bar      -> Match
hello        -> No match
hello_       -> No match
_world       -> No match
Hello_world  -> No match
hello_World  -> No match

Hint:
  • The pattern needs to match one or more lowercase letters, then a literal underscore, then one or more lowercase letters.
  • Use [a-z]+ to match a sequence of lowercase letters only. This deliberately excludes uppercase letters and digits.
  • Combine the parts as [a-z]+_[a-z]+ and use re.fullmatch() to reject strings with a leading or trailing underscore or any uppercase letters.
Solution:

import re

pattern = r"[a-z]+_[a-z]+"
test_strings = ["hello_world", "foo_bar", "hello", "hello_", "_world", "Hello_world", "hello_World"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<12} -> {'Match' if result else 'No match'}")

Explanation:

  • [a-z]+: A character class that matches one or more lowercase ASCII letters. The + ensures at least one letter must appear on each side of the underscore.
  • _: A literal underscore acting as the required separator between the two lowercase word segments.
  • Why hello_ and _world fail: The pattern requires at least one lowercase letter both before and after the underscore. A trailing or leading underscore with nothing on the other side leaves one side unsatisfied.
  • Why Hello_world and hello_World fail: The character class [a-z] matches only lowercase letters. An uppercase letter anywhere in the string causes re.fullmatch() to return no match.
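The exercise pattern accepts exactly two segments, so a snake_case name such as hello_big_world is rejected. A possible extension, assuming you want any number of lowercase segments, repeats an underscore-plus-letters group. The sample text below is invented for this sketch.

```python
import re

# The exercise pattern allows only two segments
print(re.fullmatch(r"[a-z]+_[a-z]+", "hello_big_world"))  # None

text = "use max_value and min_val_limit but not Max_Value or plain"

# One lowercase segment, then one or more "_segment" groups; (?:...) is non-capturing
pattern = r"\b[a-z]+(?:_[a-z]+)+\b"

print(re.findall(pattern, text))  # ['max_value', 'min_val_limit']
```

Because the group is non-capturing, re.findall() returns the full matched names rather than only the last group.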

Exercise 8: PascalCase Match

Problem Statement: Write a Python program to match words that consist of one uppercase letter followed by one or more lowercase letters (e.g., Hello, World, Python).

Purpose: This exercise practises combining character classes to enforce a strict positional rule: one thing here, something else there. Matching PascalCase or title-case words is a common requirement when parsing names, class identifiers, or capitalised tokens in natural language processing.

Given Input: test_strings = ["Hello", "World", "python", "HELLO", "Hello123", "H", "Ha"]

Expected Output:

Hello    -> Match
World    -> Match
python   -> No match
HELLO    -> No match
Hello123 -> No match
H        -> No match
Ha       -> Match

Hint:
  • Split the pattern into two parts: one character class for the single uppercase letter, and another for the one or more lowercase letters that follow.
  • Use [A-Z] to match exactly one uppercase letter, and [a-z]+ to match one or more lowercase letters.
  • Use re.fullmatch() to reject strings like Hello123 where digits appear after the lowercase letters, and H where no lowercase letters follow.
Solution:

import re

pattern = r"[A-Z][a-z]+"
test_strings = ["Hello", "World", "python", "HELLO", "Hello123", "H", "Ha"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<8} -> {'Match' if result else 'No match'}")

Explanation:

  • [A-Z]: Matches exactly one uppercase ASCII letter. No quantifier is attached, so it cannot match zero or two uppercase letters.
  • [a-z]+: Matches one or more lowercase letters immediately after the uppercase one. The + ensures the string cannot end at just the capital letter.
  • Why H fails: The + on [a-z] requires at least one lowercase letter to follow the capital. A lone uppercase letter does not satisfy the pattern.
  • Why HELLO fails: After matching the first H with [A-Z], the pattern expects lowercase letters. The remaining characters ELLO are uppercase, so [a-z]+ finds nothing to match and the overall match fails.
  • Why Hello123 fails: re.fullmatch() requires the entire string to be consumed. After matching Hello, the digits 123 remain unmatched, causing the full match to fail.
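The same building block extends to true PascalCase identifiers. Used with re.findall(), it splits an identifier into its capitalised words. Repeated inside a group, it validates a whole identifier. The names below are illustrative.

```python
import re

# Pull every capitalised word out of a PascalCase identifier
print(re.findall(r"[A-Z][a-z]+", "HelloWorldPython"))  # ['Hello', 'World', 'Python']

# A whole PascalCase identifier is one or more capitalised words in a row
pascal = r"(?:[A-Z][a-z]+)+"
for name in ["HelloWorld", "helloWorld", "HELLO", "Hello"]:
    print(f"{name:<10} -> {'Match' if re.fullmatch(pascal, name) else 'No match'}")
```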

Exercise 9: Match Start and End

Problem Statement: Write a Python program to match a string that starts with a, ends with b, and has any characters in between (e.g., a123b, axyzb).

Purpose: This exercise introduces the dot . wildcard and shows how re.fullmatch() anchors a pattern to both the start and the end of the string, which is the job the ^ and $ anchors do when you use re.search(). Matching by a known start and end while allowing arbitrary content in between is a practical technique used in file extension checks, protocol parsing, and delimiter-bounded field extraction.

Given Input: test_strings = ["a123b", "axyzb", "ab", "a b", "ab ", "b123a", "a123"]

Refer: Python regex re.match() for pattern matching

Expected Output:

a123b  -> Match
axyzb  -> Match
ab     -> Match
a b    -> Match
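ab     -> No match
b123a  -> No match
a123   -> No match

The second ab row is the string "ab " with a trailing space. The column padding hides the space in the printed output.

Hint: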
  • The dot . in a regular expression matches any single character except a newline by default.
  • Combine . with * to allow zero or more characters between a and b. This means the string ab with nothing in between is also a valid match.
  • The pattern a.*b used with re.fullmatch() will anchor it to the full string automatically, so explicit ^ and $ anchors are not needed in this case.
Solution:

import re

pattern = r"a.*b"
test_strings = ["a123b", "axyzb", "ab", "a b", "ab ", "b123a", "a123"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<6} -> {'Match' if result else 'No match'}")

Explanation:

  • .: The dot wildcard matches any single character except a newline. It does not represent a literal period. To match a literal dot, escape it as \. or place it in a character class, [.].
  • .*: Combines the dot with the * quantifier to match zero or more of any character. This allows the middle section of the string to be empty, one character, or arbitrarily long.
  • Why ab matches: The .* portion matches zero characters, so a immediately followed by b satisfies the pattern.
  • Why "ab " (with a trailing space) fails: re.fullmatch() requires the entire string to be consumed. The pattern must end with b, but this string ends with a space, so the match fails. re.search() or re.match() with the same pattern would still match the ab at the start, which shows how much stricter re.fullmatch() is for boundary checking.
  • Greedy behaviour: By default .* is greedy and will match as many characters as possible while still allowing the overall pattern to succeed. In this pattern it consumes everything up to the last b in the string.
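Greediness matters once you search inside a longer string instead of validating a whole one. This sketch contrasts greedy .* with lazy .*? and shows the re.DOTALL flag, which lets the dot match newlines too. The strings are made up for illustration.

```python
import re

text = "a1b a2b"

# Greedy .* runs to the last possible b
print(re.search(r"a.*b", text).group())   # a1b a2b

# Lazy .*? stops at the first b that lets the match succeed
print(re.search(r"a.*?b", text).group())  # a1b
print(re.findall(r"a.*?b", text))         # ['a1b', 'a2b']

# The dot skips newlines unless re.DOTALL is passed
print(bool(re.fullmatch(r"a.*b", "a\nb")))             # False
print(bool(re.fullmatch(r"a.*b", "a\nb", re.DOTALL)))  # True
```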

Exercise 10: Match Word at Start

Problem Statement: Write a Python program to match a specific word only if it appears at the very beginning of a string.

Purpose: This exercise introduces the caret anchor ^, which asserts that a match must occur at the start of the string. Start-of-string anchoring is essential in command parsing, log processing, and any situation where the position of a token in a line carries meaning.

Given Input: test_strings = ["Hello world", "Hello", "Say Hello", "hello world", "HelloWorld"]

Refer: Python regex search

Expected Output:

Hello world  -> Match
Hello        -> Match
Say Hello    -> No match
hello world  -> No match
HelloWorld   -> No match

Hint:
  • Place ^ at the start of the pattern to anchor it to the beginning of the string.
  • Use a word boundary \b after the word to ensure you are matching the whole word and not just a prefix (e.g., so that HelloWorld is not accepted as a match for Hello).
  • Use re.match() or re.search() with ^ rather than re.fullmatch(), because the string may contain additional content after the target word.
Solution:

import re

pattern = r"^Hello\b"
test_strings = ["Hello world", "Hello", "Say Hello", "hello world", "HelloWorld"]

for s in test_strings:
    result = re.search(pattern, s)
    print(f"{s:<12} -> {'Match' if result else 'No match'}")

Explanation:

  • ^: The start-of-string anchor. It does not consume any characters; it simply asserts that the next part of the pattern must begin at position zero of the string.
  • Hello: A literal sequence of five characters. The match is case-sensitive by default, so hello (all lowercase) does not satisfy this part of the pattern.
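  • \b: A word boundary assertion. It matches the position between a word character and a non-word character, or between a word character and the start or end of the string, ensuring that Hello is treated as a complete word. This is why the string Hello on its own still matches. Without \b, HelloWorld would also match because Hello appears at the start.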
  • Why re.search() is used here instead of re.fullmatch(): The goal is only to check the beginning of the string. The string may legitimately contain more content after the word, as in Hello world. Using re.fullmatch() would incorrectly reject those valid strings.
  • Why Say Hello fails: Although Hello appears in the string, it is not at the start. The ^ anchor fails at position zero because the string begins with S, not H.
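By default, ^ refers to the start of the whole string. When a string holds several lines, such as a chunk of a log file, the re.MULTILINE flag makes ^ match at the start of every line. The sample log below is invented for this sketch.

```python
import re

log = "Hello a\nSay Hello\nHello b"

# Without flags, ^ matches only at the very start of the string
print([m.start() for m in re.finditer(r"^Hello\b", log)])                # [0]

# With re.MULTILINE, ^ also matches right after every newline
print([m.start() for m in re.finditer(r"^Hello\b", log, re.MULTILINE)])  # [0, 18]
```

For a single-line string, re.match(r"Hello\b", s) gives the same answer as the re.search() call in the solution, because re.match() only tries position zero.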

Exercise 11: Match Word at End

Problem Statement: Write a Python program to match a specific word only if it appears at the end of a string, optionally followed by a single punctuation mark.

Purpose: This exercise introduces the dollar anchor $ and combines it with an optional character class to handle real-world strings that may end with punctuation. End-of-string anchoring is commonly used in sentence parsing, command validation, and log line analysis where the final token carries meaning.

Given Input: test_strings = ["I love Python", "Python is great", "I love Python!", "python", "I love Python."]

Expected Output:

I love Python   -> Match
Python is great -> No match
I love Python!  -> Match
python          -> No match
I love Python.  -> Match

Hint:
  • Place $ at the end of the pattern to anchor it to the end of the string.
  • To allow optional trailing punctuation, add [.,!?]? just before the $. The ? makes the punctuation character optional.
  • Use a word boundary \b before the target word to avoid matching it as a suffix of a longer word.
  • Use re.search() rather than re.fullmatch(), as the word may be preceded by other content in the string.
Solution:

import re

pattern = r"\bPython[.,!?]?$"
test_strings = ["I love Python", "Python is great", "I love Python!", "python", "I love Python."]

for s in test_strings:
    result = re.search(pattern, s)
    print(f"{s:<15} -> {'Match' if result else 'No match'}")

Explanation:

  • \b: A word boundary assertion placed before Python to ensure the match begins at a word edge and does not accidentally match a longer token like CPython.
  • Python: A literal, case-sensitive sequence. The lowercase string python does not match because regular expressions are case-sensitive by default.
  • [.,!?]?: A character class covering common punctuation marks, made optional by ?. This allows the word to be followed by at most one punctuation character before the end of the string.
  • $: The end-of-string anchor. It asserts that nothing may follow the matched content, so Python is great fails because Python is not at the end.
  • Why re.search() is used: The target word is preceded by other content in most test strings. re.search() scans the entire string for a match at any position, while still respecting the $ anchor to enforce the end-of-string constraint.
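Two practical refinements are worth knowing. Like ^, the $ anchor has a quirk: it also matches just before a final newline, which matters for lines read from a file. The \Z anchor matches only at the true end of the string. The re.IGNORECASE flag handles the lowercase python case. The test strings are illustrative.

```python
import re

line = "I love Python\n"  # e.g. a line read from a file, newline still attached

# $ also matches just before a final newline
print(bool(re.search(r"\bPython[.,!?]?$", line)))   # True

# \Z matches only at the true end of the string
print(bool(re.search(r"\bPython[.,!?]?\Z", line)))  # False

# re.IGNORECASE lets the lowercase test string match as well
print(bool(re.search(r"\bPython[.,!?]?$", "python", re.IGNORECASE)))  # True
```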

Exercise 12: Find a Specific Letter

Problem Statement: Write a Python program to find all words in a string that contain the letter z.

Purpose: This exercise practises using re.findall() to extract multiple matches from a string in a single call. Scanning text for words that contain a specific character is a foundational technique in search tools, spell checkers, and vocabulary analysis.

Given Input: text = "The pizza was amazing but the fizz and buzz were too loud"

Expected Output: ['pizza', 'amazing', 'fizz', 'buzz']

Refer: Python regex find all matches

Hint:
  • A word containing z can be broken down as: zero or more word characters, then a z, then zero or more word characters.
  • Use \w to match any word character (letters, digits, and underscores) and combine it with * on each side of the z.
  • Add word boundaries \b on both sides of the pattern to ensure you extract complete words rather than partial substrings.
  • Use re.findall() to return all matches as a list in a single call.
Solution:

import re

text = "The pizza was amazing but the fizz and buzz were too loud"
pattern = r"\b\w*z\w*\b"

matches = re.findall(pattern, text)
print(matches)

Explanation:

  • \w*: Matches zero or more word characters on either side of the z. Using * rather than + ensures the pattern also captures words where z appears at the very start or end, such as fizz or buzz.
  • z: The literal character being searched for. Because the match is case-sensitive by default, uppercase Z would not be captured. To include both cases you could use [zZ] or pass the re.IGNORECASE flag.
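  • \b on both sides: Word boundary assertions anchor each match to the edges of a word. In this pattern the greedy \w* on each side already stretches every match to those edges, so the boundaries mainly make the intent explicit. They become essential when the parts around the letter are narrower: for example, without \b the pattern z\w* would return zza from inside pizza.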
  • re.findall(): Scans the entire string from left to right and returns a list of all non-overlapping matches. This is more concise than manually looping over words and checking each one with re.search().
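As the explanation notes, uppercase Z is skipped unless you ask for case-insensitive matching. This sketch uses an invented sentence to show re.IGNORECASE with the same pattern.

```python
import re

text = "Zebras graze at the zoo"

# Case-sensitive: the capitalised Zebras is skipped
print(re.findall(r"\b\w*z\w*\b", text))                 # ['graze', 'zoo']

# re.IGNORECASE makes z match both z and Z
print(re.findall(r"\b\w*z\w*\b", text, re.IGNORECASE))  # ['Zebras', 'graze', 'zoo']
```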

Exercise 13: Find Letter in Middle

Problem Statement: Write a Python program to find words that contain at least one z that is neither the first nor the last character of the word.

Purpose: This exercise builds on the previous one by adding positional constraints within a word. Requiring at least one character on both sides of a target letter is a practical technique used in linguistic pattern matching, morphological analysis, and filtering tokens by internal structure.

Given Input: text = "The pizza was amazing but the fizz and buzz were too loud"

Expected Output: ['pizza', 'amazing', 'fizz', 'buzz']

Hint:
  • The difference from Exercise 12 is that at least one word character must appear before the z and at least one must appear after it.
  • Replace the \w* on each side with \w+ to enforce a minimum of one character on both sides of the z.
  • Keep the \b word boundaries on the outside so that only complete words are returned.
Solution:

import re

text = "The pizza was amazing but the fizz and buzz were too loud"
pattern = r"\b\w+z\w+\b"

matches = re.findall(pattern, text)
print(matches)

Explanation:

  • \w+ before z: Requires at least one word character to precede the matched z. This eliminates words whose only z is the first letter, such as zoo, since there would be nothing to satisfy the + quantifier before it.
  • \w+ after z: Requires at least one word character to follow the matched z. This eliminates words whose only z is the final letter, such as quiz.
  • Contrast with Exercise 12: Swapping \w* for \w+ on both sides is the only change needed. The * quantifier (zero or more) allowed the matched z at any position; the + quantifier (one or more) forces it into an interior position.
  • Why fizz and buzz are still included: Each word contains two zs. The regex engine backtracks until it uses the first z as the interior one, with i or u before it and the second z after it, so both words satisfy \w+z\w+. With this input the result is therefore the same as in Exercise 12; the difference only shows up for words such as zoo or quiz. Excluding every word that starts or ends with z needs an extra constraint on the first and last characters.
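The sketch below runs three patterns on a made-up string, so the difference between them becomes visible. The third pattern is one possible stricter variant, not part of the original exercise. It also rejects words that start or end with z by requiring the first and last characters to be [^\Wz], meaning "a word character other than z".

```python
import re

text = "zoo quiz pizza fizz"

# Exercise 12: z anywhere in the word
print(re.findall(r"\b\w*z\w*\b", text))              # ['zoo', 'quiz', 'pizza', 'fizz']

# Exercise 13: at least one z with a word character on each side
print(re.findall(r"\b\w+z\w+\b", text))              # ['pizza', 'fizz']

# Stricter variant: first and last characters must be word characters other than z
print(re.findall(r"\b[^\Wz]\w*z\w*[^\Wz]\b", text))  # ['pizza']
```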

Exercise 14: Match Adjacent Words

Problem Statement: Write a Python program to match if two consecutive words in a sentence both start with the letter P.

Purpose: This exercise practises matching multi-token patterns separated by whitespace. Detecting adjacent words that share a property is useful in natural language processing tasks such as identifying repeated initials, alliterative phrases, and consecutive proper nouns.

Given Input: test_strings = ["Peter Parker is here", "Paul and Peter met", "Pretty Please", "Python Programming is fun", "No match here"]

Expected Output:

Peter Parker is here      -> Match: Peter Parker
Paul and Peter met        -> No match
Pretty Please             -> Match: Pretty Please
Python Programming is fun -> Match: Python Programming
No match here             -> No match

Hint:
  • A word starting with P can be matched with P\w*: a literal uppercase P followed by zero or more word characters.
  • Between the two words there will be one or more whitespace characters. Use \s+ to match that gap.
  • Combine the two word patterns and the whitespace into a single pattern: P\w*\s+P\w*.
  • Use re.search() so the pair can be found anywhere within a longer sentence.
Solution:

import re

pattern = r"P\w*\s+P\w*"
test_strings = [
    "Peter Parker is here",
    "Paul and Peter met",
    "Pretty Please",
    "Python Programming is fun",
    "No match here"
]

for s in test_strings:
    result = re.search(pattern, s)
    if result:
        print(f"{s:<25} -> Match: {result.group()}")
    else:
        print(f"{s:<25} -> No match")

Explanation:

  • P\w*: Matches any word that begins with an uppercase P, followed by zero or more word characters. Using * rather than + means the single letter P on its own would also be a valid match.
  • \s+: Matches one or more whitespace characters between the two words. Using + rather than a literal space handles edge cases such as multiple spaces or a tab character separating the words.
  • result.group(): Returns the exact substring that was matched. This makes the output more informative by showing precisely which consecutive pair was found.
  • Why Paul and Peter met fails: Although both Paul and Peter start with P, they are not consecutive. The word and sits between them, breaking the adjacency required by the pattern.
  • Case sensitivity note: The pattern only matches words starting with uppercase P. To also match lowercase p, you could use [Pp]\w* or pass the re.IGNORECASE flag to re.search().
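One limitation appears with three or more P-words in a row. re.findall() returns only non-overlapping matches, so a word already used in one pair cannot start the next pair. A common workaround wraps the pattern in a lookahead that consumes no characters and captures the pair in a group. The sample name is invented for this sketch.

```python
import re

text = "Peter Parker Pan"

# findall() returns non-overlapping matches, so "Parker Pan" is never tried
print(re.findall(r"P\w*\s+P\w*", text))          # ['Peter Parker']

# A lookahead consumes no characters, so every starting position is tested;
# findall() then returns the captured group for each hit
print(re.findall(r"(?=\b(P\w*\s+P\w*))", text))  # ['Peter Parker', 'Parker Pan']
```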

Exercise 15: Filter by Starting Letter

Problem Statement: Write a Python program to find all words starting with either a or e in a given string.

Purpose: This exercise practises using a character class to match several possible starting characters. Filtering words by their first letter is a common requirement in text analysis, vocabulary sorting, concordance building, and educational language tools.

Given Input: text = "an eagle soared above the endless empty arena every afternoon"

Expected Output: ['an', 'eagle', 'above', 'endless', 'empty', 'arena', 'every', 'afternoon']

Hint
  • A word starting with a or e can be matched with a character class at the front: [ae] covers both options in a single concise expression.
  • Follow the character class with \w* to capture the rest of the word after the initial letter.
  • Add a \b word boundary at the start so the pattern matches only at the beginning of a word and not in the middle of one.
  • Use re.findall() to collect all matching words from the string at once.
Solution & Explanation
import re

text = "an eagle soared above the endless empty arena every afternoon"
pattern = r"\b[ae]\w*"

matches = re.findall(pattern, text)
print(matches)

Explanation:

  • \b: A word boundary placed at the start of the pattern ensures that matching begins only at the edge of a word. Without it, a match could start at an a or e in the interior of a longer word, returning fragments such as ared from soared or e from the.
  • [ae]: A character class that matches either the lowercase letter a or the lowercase letter e. This is more concise than the alternation operator (a|e) for single-character options and integrates naturally with the rest of the pattern.
  • \w*: Matches zero or more word characters following the initial letter, capturing the full remainder of the word. Using * means single-letter words like a are also captured if they appear in the text.
  • re.findall(): Returns every non-overlapping match as a plain list of strings. Because the pattern contains no capturing groups, each element in the list is the full matched word rather than a tuple.
  • Extending the pattern: To match words starting with any vowel, expand the character class to [aeiou]. To make the match case-insensitive and include uppercase initials, either use [aeAE] or pass re.IGNORECASE as a flag to re.findall().
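A short sketch of the two extensions described above, using made-up sample sentences: the re.IGNORECASE flag for capitalised initials, and a five-vowel character class.

```python
import re

# Made-up sample with capitalised words.
text = "An Eagle and an owl"

# Without a flag, [ae] only matches lowercase initials.
lower_only = re.findall(r"\b[ae]\w*", text)
# re.IGNORECASE also lets [ae] match A and E.
any_case = re.findall(r"\b[ae]\w*", text, re.IGNORECASE)
# Expanding the class to all five vowels.
vowels = re.findall(r"\b[aeiou]\w*", "an owl under every ice floe")

print(lower_only)  # ['and', 'an']
print(any_case)    # ['An', 'Eagle', 'and', 'an']
print(vowels)      # ['an', 'owl', 'under', 'every', 'ice']
```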

Exercise 16: Validate Alphanumeric ID

Problem Statement: Write a Python program to match a string that contains only uppercase letters, lowercase letters, numbers, and underscores, with no spaces or special characters allowed.

Purpose: This exercise practises building strict allowlist patterns for input validation. Alphanumeric-plus-underscore strings are the standard format for identifiers in most programming languages, database column names, and API keys. Being able to validate this format reliably is a foundational defensive-programming skill.

Given Input: test_strings = ["user_123", "User_Name", "invalid id", "bad-char!", "_leadingUnderscore", "ALL_CAPS_99"]

Expected Output:

user_123           -> Valid
User_Name          -> Valid
invalid id         -> Invalid
bad-char!          -> Invalid
_leadingUnderscore -> Valid
ALL_CAPS_99        -> Valid
Hint
  • The shorthand \w matches any word character. For ASCII text, that is exactly the set of letters (upper and lower), digits, and the underscore, which makes it a natural fit for this pattern.
  • Use \w+ with re.fullmatch() to require the entire string to consist of one or more such characters, with nothing else permitted.
  • No additional character class is needed because \w already excludes spaces, hyphens, exclamation marks, and all other special characters.
Solution & Explanation
import re

pattern = r"\w+"
test_strings = ["user_123", "User_Name", "invalid id", "bad-char!", "_leadingUnderscore", "ALL_CAPS_99"]

for s in test_strings:
    result = re.fullmatch(pattern, s)
    print(f"{s:<18} -> {'Valid' if result else 'Invalid'}")

Explanation:

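  • \w: A shorthand character class that matches any letter, digit, or underscore. For ASCII input (or when the re.ASCII flag is passed) it is equivalent to [a-zA-Z0-9_]; by default, Python 3 also treats Unicode letters and digits, such as é, as word characters. It does not match spaces, hyphens, punctuation, or any other special character.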
  • \w+: Requires at least one word character. An empty string would not match, which is typically the correct behaviour for an identifier validator.
  • re.fullmatch(): Enforces that every character in the string belongs to \w. A single disallowed character anywhere in the string, such as the space in invalid id or the hyphen in bad-char!, causes the entire match to fail.
  • Why _leadingUnderscore is valid: The underscore is part of the \w character class, so a leading underscore is perfectly acceptable. This aligns with Python’s own identifier rules, where _name is a valid variable name.
  • Alternative approach: You could write the explicit character class [a-zA-Z0-9_]+ instead of \w+. Both are equivalent in standard ASCII contexts, but \w is shorter and more idiomatic.
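A quick sketch of where \w and the explicit class differ, and of one gap between this pattern and Python's identifier rules. The sample strings café_1 and 9lives are made up for illustration.

```python
import re

# By default, \w on a str pattern is Unicode-aware, so é counts as a word character.
print(bool(re.fullmatch(r"\w+", "café_1")))            # True
# re.ASCII restricts \w to [a-zA-Z0-9_].
print(bool(re.fullmatch(r"\w+", "café_1", re.ASCII)))  # False
print(bool(re.fullmatch(r"[a-zA-Z0-9_]+", "café_1")))  # False

# \w+ also accepts a leading digit, which Python identifiers do not allow.
print(bool(re.fullmatch(r"\w+", "9lives")), "9lives".isidentifier())  # True False
```

If the validator must reject non-ASCII letters, passing re.ASCII (or writing the explicit class) is the safer choice.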

Exercise 17: Check Starting Number

Problem Statement: Write a Python program to verify if a string starts with a specific number.

Purpose: This exercise practises anchoring a numeric pattern at the start of a string. Detecting a specific leading number is useful in tasks such as validating version strings, parsing log lines that begin with a status code, and routing input based on a numeric prefix.

Given Input: test_strings = ["42 is the answer", "42", "The answer is 42", "420 wide", "142 steps"], target number: 42

Expected Output:

42 is the answer -> Match
42               -> Match
The answer is 42 -> No match
420 wide         -> No match
142 steps        -> No match
Hint
  • Use the ^ anchor to assert that the number must appear at the very beginning of the string.
  • Add a word boundary \b after the number so that 42 does not incorrectly match the start of 420 or 4200.
  • Use re.search() or re.match(). Both respect the ^ anchor, but re.match() implicitly starts at the beginning of the string even without ^.
Solution & Explanation
import re

target = "42"
pattern = rf"^{target}\b"
test_strings = ["42 is the answer", "42", "The answer is 42", "420 wide", "142 steps"]

for s in test_strings:
    result = re.search(pattern, s)
    print(f"{s:<16} -> {'Match' if result else 'No match'}")

Explanation:

  • rf"^{target}\b": An f-string prefixed with both r and f. The r prefix treats backslashes as raw characters (needed for \b), and the f prefix allows the {target} variable to be interpolated into the pattern at runtime. This makes the pattern reusable for any target number without editing the regex directly.
  • ^: Anchors the match to the very start of the string. The answer is 42 fails because, although it contains 42, the number does not appear at position zero.
  • \b after the number: A word boundary that prevents the pattern from matching 42 at the start of 420. Without this, 420 wide would incorrectly be reported as a match because its first two characters are 42.
  • Why 142 steps fails: The ^ anchor requires the match to start at position zero. The string begins with 1, not 4, so the pattern fails before consuming any characters.
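The check can be wrapped in a reusable function. This is a minimal sketch: starts_with_number is a hypothetical helper name, it uses re.match (which only tries position zero, so ^ is unnecessary), and it applies re.escape as a precaution. It also shows one edge case of the \b approach: a decimal point is not a word character, so 42.5 still passes.

```python
import re

def starts_with_number(s, target):
    """Hypothetical helper: True if s begins with target followed by a word boundary."""
    # re.match only tries position 0, so no ^ is needed.
    # re.escape is a precaution in case target ever contains metacharacters.
    return re.match(rf"{re.escape(target)}\b", s) is not None

print(starts_with_number("42 is the answer", "42"))  # True
print(starts_with_number("420 wide", "42"))          # False
# Caveat: . is not a word character, so a decimal still passes the check.
print(starts_with_number("42.5 kg", "42"))           # True
```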

Exercise 18: Number at End

Problem Statement: Write a Python program to check if a string ends with a number.

Purpose: This exercise practises anchoring a numeric pattern at the end of a string using the $ anchor combined with \d. Detecting a trailing number is useful when processing filenames with version suffixes, log entries that end with a numeric code, or any structured string where a numeric tail carries meaning.

Given Input: test_strings = ["version 2", "file_backup_3", "hello", "order 99b", "track5", "2024"]

Expected Output:

version 2     -> Ends with a number
file_backup_3 -> Ends with a number
hello         -> Does not end with a number
order 99b     -> Does not end with a number
track5        -> Ends with a number
2024          -> Ends with a number
Hint
  • Use \d to match any digit character (0-9).
  • Place $ at the end of the pattern to assert that the digit must be the very last character of the string.
  • Use \d+ rather than \d if you want to match one or more trailing digits as a group, though for this check either form works since you only need to confirm the string ends with at least one digit.
Solution & Explanation
import re

pattern = r"\d+$"
test_strings = ["version 2", "file_backup_3", "hello", "order 99b", "track5", "2024"]

for s in test_strings:
    result = re.search(pattern, s)
    if result:
        print(f"{s:<13} -> Ends with a number")
    else:
        print(f"{s:<13} -> Does not end with a number")

Explanation:

  • \d: A shorthand character class that matches any single decimal digit, equivalent to [0-9]. It does not match letters, underscores, or any other character.
  • \d+: Matches one or more consecutive digits. Using + means the pattern captures the whole trailing run of digits (for example, all of 2024 rather than only its final 4), which is useful if you later want to extract the value via result.group().
  • $: Anchors the match so the digit sequence must appear at the very end of the string. Combined with re.search(), the engine scans the string for a digit run that terminates exactly at the last position.
  • Why order 99b fails: The string ends with the letter b, not a digit. Even though digits appear earlier in the string, the $ anchor requires the final character to satisfy \d.
  • Extracting the number: If you need the trailing number itself rather than just confirming its presence, replace the print statement with print(result.group()) to display the matched digit sequence.
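One subtlety of $ is worth knowing when strings come from files: in Python, $ matches at the end of the string and also just before a single trailing newline. The sketch below, using a made-up line, contrasts it with \Z (absolute end of string) and then extracts and converts the trailing number.

```python
import re

line = "track5\n"  # e.g. a line read from a file with its newline still attached

# $ matches at the end of the string and also just before a final newline.
print(bool(re.search(r"\d+$", line)))   # True
# \Z matches only at the absolute end of the string.
print(bool(re.search(r"\d+\Z", line)))  # False

# Extracting and converting the trailing number.
result = re.search(r"\d+$", "file_backup_3")
if result:
    print(int(result.group()))  # 3
```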

Exercise 19: Clean IP Addresses

Problem Statement: Write a Python program to remove leading zeros from each segment of an IP address (e.g., convert 192.168.001.001 to 192.168.1.1).

Purpose: This exercise introduces re.sub() with a callable replacement function, a powerful technique that goes beyond simple string substitution. Normalising IP address segments is a practical data-cleaning task encountered in network log processing, configuration file parsing, and input sanitisation pipelines.

Given Input: ip_addresses = ["192.168.001.001", "010.000.000.001", "255.255.255.000", "192.168.1.1"]

Expected Output:

192.168.001.001 -> 192.168.1.1
010.000.000.001 -> 10.0.0.1
255.255.255.000 -> 255.255.255.0
192.168.1.1     -> 192.168.1.1

Refer: Python re.sub() regex replace

Hint
  • Use re.sub() with a pattern that matches each numeric segment and a replacement function that converts the matched string to an integer using int() and then back to a string with str(). Converting to int automatically strips any leading zeros.
  • The pattern \d+ will match each individual numeric segment between the dots, since the dots themselves are not digits and are therefore skipped.
  • Pass a lambda as the replacement argument to re.sub(): lambda m: str(int(m.group())).
Solution & Explanation
import re

ip_addresses = ["192.168.001.001", "010.000.000.001", "255.255.255.000", "192.168.1.1"]

def remove_leading_zeros(ip):
    return re.sub(r"\d+", lambda m: str(int(m.group())), ip)

for ip in ip_addresses:
    cleaned = remove_leading_zeros(ip)
    print(f"{ip:<15} -> {cleaned}")

Explanation:

  • re.sub(pattern, repl, string): Finds every non-overlapping match of pattern in string and replaces each one with the value returned by repl. When repl is a callable rather than a plain string, it receives the match object as its argument and its return value is used as the replacement text.
  • \d+: Matches each run of one or more consecutive digits. Because the dot separators in the IP address are not digits, re.sub() naturally processes each of the four segments independently without needing to split the string manually.
  • lambda m: str(int(m.group())): The replacement function receives a match object m for each segment. m.group() returns the matched text (e.g., "001"), int() converts it to an integer (dropping leading zeros), and str() converts it back to a string for substitution.
  • Why 192.168.1.1 is unchanged: The segments 192, 168, 1, and 1 have no leading zeros, so converting them to int and back to str produces the same value. re.sub() handles already-clean inputs safely.
  • Alternative without a lambda: You could split the string on ".", apply str(int(seg)) to each part in a list comprehension, and rejoin the parts with ".". The re.sub() approach is more concise and generalises better to less-regular input formats.
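Here is a sketch of the split-and-join alternative described above, followed by a caveat for the regex version. The function name remove_leading_zeros_split and the log line are hypothetical examples.

```python
import re

def remove_leading_zeros_split(ip):
    # Split on the literal dot, normalise each segment, and rejoin.
    # Assumes every segment is numeric; int() raises ValueError otherwise.
    return ".".join([str(int(seg)) for seg in ip.split(".")])

print(remove_leading_zeros_split("010.000.000.001"))  # 10.0.0.1

# Caveat for the regex version: applied to a whole line of text, \d+ rewrites
# every digit run, not just IP segments (made-up log line).
log_line = "port 0080 from 010.000.000.001"
print(re.sub(r"\d+", lambda m: str(int(m.group())), log_line))  # port 80 from 10.0.0.1
```

So the regex approach is best applied to a string that has already been isolated as an IP address.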

Exercise 20: Convert Date Format

Problem Statement: Write a Python program to convert a date string from yyyy-mm-dd format to dd-mm-yyyy format.

Purpose: This exercise introduces capturing groups in re.sub(), one of the most practical regex techniques for restructuring text. Reformatting dates is a ubiquitous data-wrangling task in ETL pipelines, report generation, and any system that exchanges data between regions with different date conventions.

Given Input: dates = ["2024-01-15", "1999-12-31", "2000-07-04", "2024-11-05"]

Expected Output:

2024-01-15 -> 15-01-2024
1999-12-31 -> 31-12-1999
2000-07-04 -> 04-07-2000
2024-11-05 -> 05-11-2024
Hint
  • Wrap each part of the date in a capturing group using parentheses: one group for the year, one for the month, and one for the day.
  • In the replacement string, refer to the captured groups using backreferences: \1 for the first group (year), \2 for the second (month), and \3 for the third (day).
  • To swap the format, write the replacement string as \3-\2-\1, which places the day first, then the month, then the year.
Solution & Explanation
import re

dates = ["2024-01-15", "1999-12-31", "2000-07-04", "2024-11-05"]
pattern = r"(\d{4})-(\d{2})-(\d{2})"
replacement = r"\3-\2-\1"

for date in dates:
    converted = re.sub(pattern, replacement, date)
    print(f"{date} -> {converted}")

Explanation:

  • (\d{4}): The first capturing group. It matches exactly four consecutive digits and captures them as group 1, representing the year portion of the date.
  • (\d{2}): Used twice: once for the month (group 2) and once for the day (group 3). Each matches exactly two consecutive digits, corresponding to the zero-padded month and day values in the source format.
  • Backreferences in the replacement string: \1, \2, and \3 refer to the text captured by the first, second, and third groups respectively. Writing \3-\2-\1 reorders them to day-month-year without any manual string slicing.
  • Zero-padding is preserved: Because the groups capture the raw digit strings rather than converting them to integers, leading zeros in the month and day (e.g., 01, 07) are carried over unchanged into the output. This is the correct behaviour for date formatting.
  • Named groups as an alternative: For improved readability in complex patterns, you can use named groups: (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}) and reference them in the replacement as \g<day>-\g<month>-\g<year>. Named groups make the intent of each captured segment self-documenting.
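Below is a minimal sketch of the named-group variant. It also shows that re.sub() converts every date embedded in a longer, made-up string while leaving the surrounding text unchanged.

```python
import re

# Named-group version of the exercise pattern.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
replacement = r"\g<day>-\g<month>-\g<year>"

print(re.sub(pattern, replacement, "2024-01-15"))  # 15-01-2024
# re.sub converts every date in a longer string and leaves other text intact.
print(re.sub(pattern, replacement, "from 2024-01-15 to 2024-11-05"))
# from 15-01-2024 to 05-11-2024
```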

Exercise 21: Extract 1-3 Digit Numbers

Problem Statement: Write a Python program to search a string and extract all numbers that are between 1 and 3 digits long.

Purpose: This exercise practises combining numeric patterns with range quantifiers and word boundaries to extract only the numbers that satisfy a length constraint. Selective numeric extraction is commonly needed in data parsing tasks where very large numbers (such as timestamps or IDs) must be excluded from results that are intended to capture shorter codes, counts, or scores.

Given Input: text = "There are 3 cats, 12 dogs, 500 fish, 1000 birds, and 42 turtles in the sanctuary"

Expected Output: ['3', '12', '500', '42']

Hint
  • Use \d{1,3} to match a run of one to three digits.
  • Wrap the pattern in word boundaries \b on both sides. Without them, \d{1,3} would match the first three digits of a longer number like 1000 and incorrectly include it in the results.
  • Use re.findall() to return all qualifying matches as a list of strings in a single call.
Solution & Explanation
import re

text = "There are 3 cats, 12 dogs, 500 fish, 1000 birds, and 42 turtles in the sanctuary"
pattern = r"\b\d{1,3}\b"

matches = re.findall(pattern, text)
print(matches)

Explanation:

  • \d{1,3}: A range quantifier that matches a consecutive run of digits with a minimum length of one and a maximum of three. On its own, without boundaries, this is a greedy sub-pattern that will match within any larger digit sequence.
  • \b on both sides: Word boundaries are essential here. They assert that the digit run must be bordered by a non-word character (or the start or end of the string) on each side. This prevents 1000 from being partially matched as 100 and correctly excludes it from the results entirely.
  • Why 1000 is excluded: The number 1000 has four digits. The opening \b matches before the 1, and \d{1,3} consumes at most three digits (100). The closing \b then sits between two digits, which is not a word boundary, so the attempt fails. Backtracking to 10 or 1 fails for the same reason, and every later start position inside 1000 fails the opening \b, so no part of the token is matched.
  • re.findall(): Returns each match as a plain string rather than a match object. Because the pattern contains no capturing groups, the full matched text is returned for each hit, giving a clean list of number strings that can be converted to integers with list(map(int, matches)) if needed.
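To see why the boundaries matter, the sketch below runs both versions of the pattern on the exercise text and then converts the results to integers.

```python
import re

text = "There are 3 cats, 12 dogs, 500 fish, 1000 birds, and 42 turtles in the sanctuary"

without_bounds = re.findall(r"\d{1,3}", text)
with_bounds = re.findall(r"\b\d{1,3}\b", text)

print(without_bounds)                # ['3', '12', '500', '100', '0', '42']
print(with_bounds)                   # ['3', '12', '500', '42']
print(list(map(int, with_bounds)))   # [3, 12, 500, 42]
```

Without boundaries, 1000 is split into the fragments 100 and 0.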

Exercise 22: Search Literal Strings

Problem Statement: Write a Python program to search for a set of specific literal strings within a larger text and report which ones are found and where.

Purpose: This exercise introduces the alternation operator |, which allows a single pattern to match any one of several fixed strings. Searching for multiple literals simultaneously is more efficient than running separate searches and is widely used in keyword filtering, content moderation, and log scanning.

Given Input: text = "The quick brown fox jumps over the lazy dog", target words: fox and dog

Expected Output:

Found "fox" at index 16-19
Found "dog" at index 40-43
Hint
  • Use the alternation operator | to combine the target words into a single pattern: fox|dog.
  • Wrap the alternation in word boundaries to avoid partial matches inside longer words (e.g., matching dog inside hotdog). Group the alternatives with parentheses so the boundaries apply to both words: \b(fox|dog)\b.
  • Use re.finditer() rather than re.findall() so you have access to the match object and can call .start() and .end() on each result to retrieve the exact position.
Solution & Explanation
import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"\b(fox|dog)\b"

for match in re.finditer(pattern, text):
    print(f'Found "{match.group()}" at index {match.start()}-{match.end()}')

Explanation:

  • fox|dog: The alternation operator | instructs the regex engine to attempt matching fox first and, if that fails at the current position, to attempt dog. The alternatives are evaluated left to right. Any number of alternatives can be chained with additional | characters.
  • Parentheses around the alternatives: The grouping (fox|dog) ensures the | operator applies only between fox and dog, not to any surrounding pattern elements. Without parentheses, a pattern like \bfox|dog\b would be parsed as (\bfox) or (dog\b), producing incorrect boundary behaviour.
  • re.finditer(): Returns an iterator of match objects rather than a list of strings. This gives access to positional metadata for each hit without storing all matches in memory at once, which matters for large texts.
  • match.start() and match.end(): Return the start index (inclusive) and end index (exclusive) of the matched substring within the original string. For fox in this text, start() returns 16 and end() returns 19, meaning the match spans characters at positions 16, 17, and 18.
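When the target words come from a list rather than being typed into the pattern, the alternation can be built programmatically. This is a sketch under that assumption: re.escape protects each literal, and the non-capturing group (?:...) groups the alternatives without creating a capture group.

```python
import re

text = "The quick brown fox jumps over the lazy dog"
targets = ["fox", "dog"]  # any list of literal words

# re.escape protects each literal; (?:...) groups the alternatives without capturing.
pattern = r"\b(?:" + "|".join(map(re.escape, targets)) + r")\b"
print(pattern)  # \b(?:fox|dog)\b

for match in re.finditer(pattern, text):
    print(match.group(), match.span())
# fox (16, 19)
# dog (40, 43)
```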

Exercise 23: Find Pattern Location

Problem Statement: Write a Python program to find a literal string in a text and return its exact starting and ending index position.

Purpose: This exercise focuses on using re.search() to locate a single pattern and then extracting precise position information from the resulting match object. Knowing the exact span of a match is essential in text editors, syntax highlighters, and any tool that needs to annotate or replace a specific region of a string.

Given Input: text = "The quick brown fox jumps over the lazy dog", target: "brown fox"

Expected Output: Found "brown fox" at start=10, end=19

Hint
  • Use re.search() with the literal target string as the pattern. Because you are searching for a fixed phrase with no special regex characters, no escaping is needed for this input, but using re.escape() around the target is a good habit to protect against inputs that contain characters like . or *.
  • Check that the result is not None before accessing the match object.
  • Use .start() and .end() on the match object, or use .span() to get both values as a tuple in a single call.
Solution & Explanation
import re

text = "The quick brown fox jumps over the lazy dog"
target = "brown fox"
pattern = re.escape(target)

match = re.search(pattern, text)
if match:
    print(f'Found "{match.group()}" at start={match.start()}, end={match.end()}')
else:
    print(f'"{target}" not found in the text')

Explanation:

  • re.escape(target): Escapes any characters in the target string that have special meaning in a regular expression, such as ., *, +, or (. For the input "brown fox" this makes no visible difference, but it is the correct practice whenever the search term comes from user input or an external source where special characters cannot be guaranteed absent.
  • re.search(): Scans the entire string from left to right and returns the first match object it finds, or None if the pattern is not present. Unlike re.match(), it does not restrict the search to the start of the string.
  • match.start() and match.end(): Return the zero-based start and end positions of the matched substring. The end index is exclusive, following Python’s standard slice convention. For "brown fox", which begins at position 10, the end value is 19 because the substring occupies indices 10 through 18.
  • match.span() as an alternative: Calling match.span() returns the tuple (start, end) in a single call. This is convenient when you need to pass the position to another function or unpack it with start, end = match.span().
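The sketch below unpacks match.span() and slices the original text with it to recover the matched substring. It also shows, with a made-up input, why re.escape() matters: an unescaped . matches any character.

```python
import re

text = "The quick brown fox jumps over the lazy dog"
target = "brown fox"

match = re.search(re.escape(target), text)
if match:
    start, end = match.span()   # same values as match.start(), match.end()
    print(start, end)           # 10 19
    print(text[start:end])      # brown fox

# Why re.escape matters: an unescaped . matches any character.
print(re.search("3.5", "ratio 3:5") is not None)             # True
print(re.search(re.escape("3.5"), "ratio 3:5") is not None)  # False
```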

Exercise 24: Find All Substrings

Problem Statement: Write a Python program to find all occurrences of a specific substring within a string using re.findall().

Purpose: This exercise demonstrates how re.findall() handles repeated occurrences of the same substring and builds familiarity with its return behaviour. Counting and collecting all occurrences of a substring is a routine operation in text analysis, frequency counting, and search-and-highlight features.

Given Input: text = "cat and cattle and catfish and catch and tomcat", target: "cat"

Expected Output:

Occurrences of "cat": ['cat', 'cat', 'cat', 'cat', 'cat']
Total count: 5
Hint
  • Pass the literal target string directly to re.findall(). It will return a list containing one entry per occurrence.
  • For this exercise the search is intentionally substring-level, not whole-word, so do not add \b word boundaries. The goal is to find "cat" wherever it appears, including inside words like cattle, catfish, and tomcat.
  • Use len() on the returned list to get the total count without any additional counting logic.
Solution & Explanation
import re

text = "cat and cattle and catfish and catch and tomcat"
target = "cat"
pattern = re.escape(target)

matches = re.findall(pattern, text)
print(f'Occurrences of "{target}": {matches}')
print(f"Total count: {len(matches)}")

Explanation:

  • re.escape(target): Escapes the target string to neutralise any regex metacharacters it might contain. For "cat" this has no effect, but it makes the code robust against targets like "c.t", which without escaping would be interpreted as a regex pattern rather than a literal string.
  • No word boundaries by design: This exercise deliberately omits \b to perform a raw substring search. All five occurrences of "cat" are captured regardless of whether they appear as standalone words (cat), prefixes (cattle, catfish, catch), or suffixes (tomcat). Compare this with Exercise 12, where \b was used to restrict matches to complete words.
  • re.findall() return value: When the pattern contains no capturing groups, re.findall() returns a list of the matched strings themselves. Every element is identical here because the pattern is a fixed literal, but the list length directly encodes the frequency of the substring.
  • len(matches): A straightforward way to obtain the occurrence count without a separate loop or counter variable. It is equivalent to text.count(target) for plain substring counting, but the regex approach scales to more complex patterns where str.count() is not applicable.
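A quick check of the equivalence with str.count(), plus the whole-word count for comparison, on the exercise text:

```python
import re

text = "cat and cattle and catfish and catch and tomcat"

# For a plain literal, both count non-overlapping occurrences.
print(len(re.findall("cat", text)), text.count("cat"))  # 5 5
# Adding word boundaries restricts the result to the standalone word.
print(re.findall(r"\bcat\b", text))  # ['cat']
```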

Exercise 25: Iterate Matches

Problem Statement: Write a Python program to find the occurrence and position of all matches of a substring within a string using re.finditer().

Purpose: This exercise demonstrates how re.finditer() differs from re.findall() by returning full match objects rather than plain strings. Having access to the position of every occurrence, alongside the matched text itself, is essential in tools that need to annotate, highlight, or replace matches at precise locations within a document.

Given Input: text = "cat and cattle and catfish and catch and tomcat", target: "cat"

Expected Output:

Match 1: "cat" found at position 0-3
Match 2: "cat" found at position 8-11
Match 3: "cat" found at position 19-22
Match 4: "cat" found at position 31-34
Match 5: "cat" found at position 44-47

Refer: Python regex capturing groups

Hint
  • Use re.finditer() in place of re.findall(). It returns an iterator of match objects, each of which carries both the matched text and its position.
  • Use enumerate() on the iterator to get a running match number alongside each match object, which makes the output easier to read.
  • Access match.group() for the matched text, and match.start() and match.end() for the positional span.
Solution & Explanation
import re

text = "cat and cattle and catfish and catch and tomcat"
target = "cat"
pattern = re.escape(target)

for i, match in enumerate(re.finditer(pattern, text), start=1):
    print(f'Match {i}: "{match.group()}" found at position {match.start()}-{match.end()}')

Explanation:

  • re.finditer(): Returns a lazy iterator of match objects rather than materialising all results into a list at once. This is more memory-efficient than re.findall() for large texts, because each match object is produced and processed one at a time.
  • enumerate(..., start=1): Wraps the iterator to produce (counter, match_object) pairs. The start=1 argument makes the counter begin at 1 instead of 0, which reads more naturally in human-facing output like Match 1, Match 2, and so on.
  • match.group(): Returns the exact text that was matched. In this exercise every match is the same string "cat", but for variable patterns this method would return different text for each hit.
  • Contrast with Exercise 24: Both exercises search the same text for the same target. Exercise 24 uses re.findall() to retrieve matched strings and a total count. This exercise uses re.finditer() to retrieve matched strings together with their exact positions. The two functions are complementary: use re.findall() when you only need the values, and re.finditer() when you also need positional information or want to process matches one at a time without building a full list in memory.
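As an example of using positions to annotate text, the sketch below wraps every occurrence of the target in markers. The helper name highlight and its bracket markers are hypothetical choices for illustration.

```python
import re

def highlight(text, target, left="[", right="]"):
    """Hypothetical helper: wrap every occurrence of target in markers."""
    parts = []
    last = 0
    for m in re.finditer(re.escape(target), text):
        parts.append(text[last:m.start()])       # text before this match
        parts.append(left + m.group() + right)   # the marked-up match
        last = m.end()
    parts.append(text[last:])                     # text after the final match
    return "".join(parts)

print(highlight("cat and cattle and tomcat", "cat"))
# [cat] and [cat]tle and tom[cat]
```

For a fixed wrapper like this, re.sub(re.escape(target), r"[\g<0>]", text) gives the same result; the explicit loop is handy when each match needs its own processing.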

Exercise 26: Extract Date from URL

Problem Statement: Write a Python program to extract the year, month, and day components from a URL string formatted as https://example.com/yyyy/mm/dd/article-slug.

Purpose: This exercise practises using multiple capturing groups to pull structured data out of a predictably formatted string. Extracting date segments from URLs is a common task in web scraping, content management systems, and analytics pipelines where publication dates are embedded in permalink structures.

Given Input: urls = ["https://example.com/2026/05/22/my-article", "https://news.site.org/2019/11/03/breaking-story", "https://blog.example.com/2023/07/30/summer-update"]

Expected Output:

URL: https://example.com/2026/05/22/my-article
  Year: 2026 | Month: 05 | Day: 22

URL: https://news.site.org/2019/11/03/breaking-story
  Year: 2019 | Month: 11 | Day: 03

URL: https://blog.example.com/2023/07/30/summer-update
  Year: 2023 | Month: 07 | Day: 30
Hint
  • Use three capturing groups, one for each date component: (\d{4}) for the four-digit year, (\d{2}) for the two-digit month, and (\d{2}) for the two-digit day.
  • Separate the groups with a literal forward slash / to match the URL path structure.
  • Use re.search() to locate the date pattern anywhere within the URL string, then unpack the three groups from the match object using match.groups().
Solution & Explanation
import re

urls = [
    "https://example.com/2026/05/22/my-article",
    "https://news.site.org/2019/11/03/breaking-story",
    "https://blog.example.com/2023/07/30/summer-update"
]

pattern = r"/(\d{4})/(\d{2})/(\d{2})/"

for url in urls:
    match = re.search(pattern, url)
    if match:
        year, month, day = match.groups()
        print(f"URL: {url}")
        print(f"  Year: {year} | Month: {month} | Day: {day}\n")

Explanation:

  • Leading and trailing / in the pattern: The forward slashes outside the capturing groups anchor each date segment within the URL path structure. This prevents the pattern from accidentally matching a four-digit number that appears in a different part of the URL, such as a port number or a numeric slug.
  • (\d{4}), (\d{2}), (\d{2}): Three capturing groups that isolate the year, month, and day respectively. The fixed-width quantifiers mirror the expected format exactly, so a three-digit year or single-digit month would not match.
  • match.groups(): Returns all captured groups as a tuple in the order they appear in the pattern. Unpacking directly into year, month, day gives each value a meaningful name without needing to index into the tuple manually.
  • Zero-padding is preserved: Because the groups capture raw digit strings rather than converting to integers, leading zeros in the month and day (e.g., 05, 03) are retained in the output. This matches the source format and avoids an unintended change in representation.
  • Named groups as an alternative: The pattern could be rewritten as /(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/, after which values can be accessed by name with match.group("year") and so on. Named groups improve readability when a pattern has many components.
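Below is a minimal sketch of the named-group version. It reads values by name and with match.groupdict(), and it adds an else branch for URLs without a date; the second URL is a made-up non-matching example.

```python
import re

pattern = r"/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"

for url in ["https://example.com/2026/05/22/my-article",
            "https://example.com/about"]:  # second URL is a made-up non-matching case
    match = re.search(pattern, url)
    if match:
        print(match.group("year"), match.groupdict())
    else:
        print(f"No date found in {url}")
# 2026 {'year': '2026', 'month': '05', 'day': '22'}
# No date found in https://example.com/about
```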

Exercise 27: Extract All Numbers

Problem Statement: Write a Python program to separate and extract all numeric values from a mixed string of text and digits.

Purpose: This exercise practises using re.findall() with a digit pattern to strip numbers out of unstructured mixed content. Extracting numeric values from prose is a frequent requirement in data entry parsing, invoice processing, scientific text mining, and any pipeline that ingests human-written content containing figures.

Given Input: text = "In 2024 there were 1200 participants across 3 events, with scores of 98.5, 76, and 100"

Expected Output: ['2024', '1200', '3', '98.5', '76', '100']

Hint
  • To capture both integers and decimal numbers, your pattern needs to handle an optional fractional part: one or more digits, optionally followed by a dot and further digits.
  • The pattern \d+\.?\d* matches a run of digits, an optional dot, and zero or more digits after the dot. This covers integers like 2024 and decimals like 98.5.
  • Use re.findall() to collect all matches as a list of strings in one call.
▼ Solution & Explanation
import re

text = "In 2024 there were 1200 participants across 3 events, with scores of 98.5, 76, and 100"
pattern = r"\d+\.?\d*"

matches = re.findall(pattern, text)
print(matches)

Explanation:

  • \d+: Matches one or more consecutive digit characters before any decimal point. This handles pure integers like 2024, 1200, 3, 76, and 100, and also anchors the start of a decimal number such as 98.5.
  • \.?: Matches a literal dot zero or one time. The backslash is necessary because an unescaped . in a regex pattern matches any character. The ? makes it optional so that integers without a fractional part are still matched.
  • \d*: Matches zero or more digits after the optional dot. Because * is used rather than +, a trailing dot with no digits after it is still captured, so a sentence-ending "100." is returned as '100.'. To exclude such cases, make the dot and the fractional digits an all-or-nothing unit with \d+(?:\.\d+)? (equivalently, the alternation \d+\.\d+|\d+, with the decimal branch listed first).
  • Results are strings: re.findall() always returns strings, not numeric types. To work with the values arithmetically, convert them with float(n) or int(n) as appropriate: [float(n) for n in matches].
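The sketch below compares three variants on a hypothetical copy of the input that ends with a full stop after 100. It shows what an unescaped dot does, how the solution pattern keeps a trailing dot, and how the stricter \d+(?:\.\d+)? avoids it. The int-or-float conversion at the end is one possible choice, not part of the original solution.

```python
import re

# Hypothetical variant of the exercise input: note the period after 100
text = "In 2024 there were 1200 participants across 3 events, with scores of 98.5, 76, and 100."

unescaped = re.findall(r"\d+.?\d*", text)    # bare . matches ANY character
loose = re.findall(r"\d+\.?\d*", text)       # the solution's pattern
strict = re.findall(r"\d+(?:\.\d+)?", text)  # dot kept only when digits follow

# Convert each string to int or float depending on whether it has a decimal point
values = [float(n) if "." in n else int(n) for n in strict]

print(unescaped)  # ['2024 ', '1200 ', '3 ', '98.5', '76,', '100.']
print(loose)      # ['2024', '1200', '3', '98.5', '76', '100.']
print(strict)     # ['2024', '1200', '3', '98.5', '76', '100']
print(values)     # [2024, 1200, 3, 98.5, 76, 100]
```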

Exercise 28: Extract Email Addresses

Problem Statement: Write a Python program to extract all valid email addresses from a large block of unstructured text.

Purpose: This exercise practises constructing a multi-part pattern that mirrors the structural rules of a real-world format. Email extraction is one of the most common practical applications of regular expressions, appearing in contact harvesting tools, data cleaning pipelines, and communication platform integrations.

Refer: Regex Special Sequences and Character classes

Given Input:

text = """Please reach out to support@example.com for help.
You can also contact the team at admin.team@company.org or sales@shop.co.uk.
Invalid addresses like @nodomain and user@ should be ignored.
For billing queries write to billing_dept+invoices@finance.example.net."""

Expected Output: ['support@example.com', 'admin.team@company.org', 'sales@shop.co.uk', 'billing_dept+invoices@finance.example.net']

▼ Hint
  • An email address has three structural parts: the local part before the @, the @ symbol itself, and the domain part after it.
  • The local part can contain letters, digits, dots, underscores, hyphens, and plus signs. Match it with a character class such as [\w.+\-]+.
  • The domain part consists of one or more labels separated by dots, where each label contains letters, digits, or hyphens. The final label, the top-level domain (e.g., com, org, or the uk in co.uk), must have at least two letters.
  • Use re.findall() to collect all matches. Write any repeated sub-part as a non-capturing group (?:...): if the pattern contains capturing groups, re.findall() returns the captured group text instead of each complete email address.
▼ Solution & Explanation
import re

text = """Please reach out to support@example.com for help.
You can also contact the team at admin.team@company.org or sales@shop.co.uk.
Invalid addresses like @nodomain and user@ should be ignored.
For billing queries write to billing_dept+invoices@finance.example.net."""

pattern = r"[\w.+\-]+@[\w\-]+(?:\.[\w\-]+)*\.[a-zA-Z]{2,}"

matches = re.findall(pattern, text)
print(matches)

Explanation:

  • [\w.+\-]+: Matches the local part of the email address. The character class includes word characters (\w, which covers letters, digits, and underscores), dots, plus signs, and hyphens. The + quantifier requires at least one character, which rejects entries like @nodomain that have nothing before the @.
  • @: A literal at-sign acting as the required separator between the local part and the domain. Its presence is mandatory, so bare strings without @ are never matched.
  • [\w\-]+: Matches the first label of the domain (e.g., example, company, shop). This rejects user@ because there is nothing after the @ to satisfy the + quantifier.
  • (?:\.[\w\-]+)*: A non-capturing group that matches zero or more additional dot-separated domain labels. The ?: prefix means the group only serves to apply the * quantifier to the dot-plus-label sequence, without creating a capture that would change re.findall()’s return value. This handles subdomains such as finance.example in the final test address.
  • \.[a-zA-Z]{2,}: Matches the mandatory top-level domain: a literal dot followed by at least two letters. This enforces that the address ends with a recognisable TLD (e.g., .com, .org, .uk, .net) and rejects fragments that trail off without one.
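To see why the ?: prefix matters, the sketch below runs the solution pattern and a copy with a plain capturing group over a short, made-up sample string. With the capturing group, re.findall() returns only the group's text (a domain-label fragment) rather than the address. re.finditer() still exposes the full match through group(0).

```python
import re

# Made-up sample text for illustration
text = "Contact sales@shop.co.uk or billing@finance.example.net today."

# Solution pattern: non-capturing group, so findall returns each full match
non_capturing = r"[\w.+\-]+@[\w\-]+(?:\.[\w\-]+)*\.[a-zA-Z]{2,}"
# Same pattern with a capturing group: findall returns only the group's text
capturing = r"[\w.+\-]+@[\w\-]+(\.[\w\-]+)*\.[a-zA-Z]{2,}"

full = re.findall(non_capturing, text)
groups_only = re.findall(capturing, text)
# finditer still gives access to the whole match via group(0)
recovered = [m.group(0) for m in re.finditer(capturing, text)]

print(full)         # ['sales@shop.co.uk', 'billing@finance.example.net']
print(groups_only)  # label fragments, not full addresses
print(recovered)    # ['sales@shop.co.uk', 'billing@finance.example.net']
```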

Exercise 29: Swap Characters

Problem Statement: Write a Python program to replace all whitespace characters with an underscore, and all underscores with a whitespace, in a single pass over the string.

Purpose: This exercise introduces the technique of performing two simultaneous character substitutions without one replacement interfering with the other. A correct swap needs a strategy that distinguishes the original characters from those that have already been substituted. The same problem comes up in slug generation, identifier normalisation, and format conversion pipelines.

Given Input: test_strings = ["hello world", "hello_world", "the quick_brown fox_jumps", "no_change"]

Expected Output:

hello world                -> hello_world
hello_world                -> hello world
the quick_brown fox_jumps  -> the_quick brown_fox jumps
no_change                  -> no change
▼ Hint
  • Running two separate re.sub() calls in sequence will not work: the second call will undo part of the first. You need to handle both substitutions in a single pass.
  • Use a pattern that matches either a space or an underscore: [ _].
  • Pass a lambda as the replacement argument. Inside the lambda, check what character was matched using m.group() and return the opposite character.
▼ Solution & Explanation
import re

test_strings = [
    "hello world",
    "hello_world",
    "the quick_brown fox_jumps",
    "no_change"
]

def swap(s):
    return re.sub(r"[ _]", lambda m: "_" if m.group() == " " else " ", s)

for s in test_strings:
    print(f"{s:<26} -> {swap(s)}")

Explanation:

  • Why two sequential re.sub() calls fail: If you first replace spaces with underscores and then replace underscores with spaces, the second call converts both the original underscores and the newly inserted ones back to spaces, producing a result with no underscores at all. The single-pass approach avoids this because re.sub() scans the original string once and builds the result separately, so an inserted character is never matched again.
  • [ _]: A character class that matches either a single space or a single underscore. Each character is handled individually as the regex engine scans left to right, so mixed strings like the quick_brown fox_jumps are processed correctly in one pass.
  • Lambda as the replacement: When the replacement argument is a callable, re.sub() calls it with a match object for each match. The lambda inspects m.group() and returns the opposite character. Because the returned text goes into a new result string rather than back into the string being scanned, a substituted character is never re-evaluated.
  • Alternative using a translation table: For plain character-for-character swaps without regex, Python’s str.translate(str.maketrans(" _", "_ ")) performs the same operation in a single pass and is slightly more efficient for this specific case. The regex approach is shown here because it scales to more complex conditional substitutions that str.translate() cannot handle.
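The sketch below puts the three approaches side by side. It uses a dictionary lookup in the regex replacement, a str.translate() table, and, for contrast, the broken two-pass version built from chained str.replace() calls. The function names and the SWAP mapping are illustrative, not part of the original solution.

```python
import re

# Illustrative mapping: each character maps to its swap partner
SWAP = {" ": "_", "_": " "}

def swap_regex(s):
    # One pass: each matched character is looked up in SWAP
    return re.sub(r"[ _]", lambda m: SWAP[m.group()], s)

# Translation table mapping " " -> "_" and "_" -> " "
TABLE = str.maketrans(" _", "_ ")

def swap_translate(s):
    # str.translate maps every character through TABLE in one pass
    return s.translate(TABLE)

def swap_two_pass(s):
    # Broken on purpose: the second replace undoes the first
    return s.replace(" ", "_").replace("_", " ")

for s in ["hello world", "the quick_brown fox_jumps"]:
    print(repr(swap_regex(s)), repr(swap_translate(s)), repr(swap_two_pass(s)))
```

The two-pass version returns "hello world" unchanged, which is exactly the failure described above.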

Exercise 30: Replace Multiple Delimiters

Problem Statement: Write a Python program to replace all occurrences of spaces, commas, and dots in a string with a colon.

Purpose: This exercise demonstrates how a single re.sub() call with a character class can replace multiple different delimiters simultaneously, removing the need for chained str.replace() calls. Normalising mixed delimiters into a single consistent separator is a standard data-cleaning step in CSV processing, configuration parsing, and token splitting.

Given Input: test_strings = ["one two three", "one,two,three", "one.two.three", "one, two. three", "no.delimiters,here today"]

Expected Output:

one two three              -> one:two:three
one,two,three              -> one:two:three
one.two.three              -> one:two:three
one, two. three            -> one::two::three
no.delimiters,here today   -> no:delimiters:here:today

Refer:

  • Python regex split
  • Python Regex replace
▼ Hint
  • Use a character class that lists all three target delimiters: [ ,.]. Inside a character class, the dot is treated as a literal character and does not need to be escaped.
  • Pass the plain string ":" as the replacement argument to re.sub(). Every matched delimiter, regardless of which one it is, will be replaced with a colon.
  • Note that two adjacent delimiters, such as the comma followed by a space in one, two. three, will produce two consecutive colons in the output, because each delimiter is replaced independently.
▼ Solution & Explanation
import re

test_strings = [
    "one two three",
    "one,two,three",
    "one.two.three",
    "one, two. three",
    "no.delimiters,here today"
]

pattern = r"[ ,.]"

for s in test_strings:
    result = re.sub(pattern, ":", s)
    print(f"{s:<26} -> {result}")

Explanation:

  • [ ,.]: A character class that matches any one of three characters: a space, a comma, or a dot. Inside a character class, the dot loses its wildcard meaning and is treated as a literal period, so no backslash is needed. Each character in the class is an independent alternative; the regex engine replaces whichever one it encounters at each position.
  • Plain string replacement: The second argument to re.sub() is the string ":". Because every matched delimiter is replaced with the same fixed value, no lambda or backreference is needed, keeping the call simple and readable.
  • Why one, two. three produces double colons: The comma and the space are two separate characters, and each is an independent match. The comma is replaced with : and the space immediately after it is also replaced with :, producing ::. This is the correct and expected behaviour for a per-character replacement. If you want to collapse consecutive delimiters into a single colon, change the pattern to [ ,.]+ to match one or more delimiters as a group.
  • Advantage over chained str.replace(): Replacing three delimiters with str.replace() requires three separate calls: s.replace(" ", ":").replace(",", ":").replace(".", ":"). The re.sub() approach handles all three in a single pass over the string and stays a single call as the set of delimiters grows, since adding a delimiter means adding one character to the class. For a few fixed characters, chained str.replace() calls are not necessarily slower, so the main gain is conciseness and extensibility.
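The sketch below checks the points above on the one, two. three input. The per-character pattern gives double colons, while [ ,.]+ collapses runs of delimiters. The same class also works with re.split(), as covered in the referenced regex split tutorial. The chained str.replace() version produces the same result as the per-character re.sub().

```python
import re

s = "one, two. three"

per_char = re.sub(r"[ ,.]", ":", s)    # every delimiter replaced on its own
collapsed = re.sub(r"[ ,.]+", ":", s)  # a run of delimiters becomes one colon
parts = re.split(r"[ ,.]+", s)         # same class used to split instead of replace
chained = s.replace(" ", ":").replace(",", ":").replace(".", ":")

print(per_char)             # one::two::three
print(collapsed)            # one:two:three
print(parts)                # ['one', 'two', 'three']
print(chained == per_char)  # True
```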

Filed Under: Python, Python Exercises, Python RegEx


About Vishal

I’m Vishal Hule, the Founder of PYnative.com. As a Python developer, I enjoy assisting students, developers, and learners. Follow me on Twitter.

Copyright © 2018–2026 pynative.com