Python Regex Capturing Groups

In this article, you will learn how to capture regex groups in Python. By capturing groups, we can match several distinct patterns inside the same target string.

What is Group in Regex?
Example to Capture Multiple Groups
- Access Each Group Result Separately
Regex Capture Group Multiple Times
Extract Range of Groups Matches

What is Group in Regex?

A group is a part of a regex pattern enclosed in parentheses (). We create a group by placing a sub-pattern between ( and ). For example, the regular expression (cat) creates a single group containing the letters ‘c’, ‘a’, and ‘t’.

For example, in a real-world case, you may want to capture emails and phone numbers. You would write two groups: the first matches the email, and the second matches the phone number.

Capturing groups are also a way to treat multiple characters as a single unit, so that a quantifier or other operator applies to the whole group.

For example, the expression ((\w)(\s\d)) contains three such groups:

((\w)(\s\d))
(\w)
(\s\d)

We can specify as many groups as we wish. Each sub-pattern inside a pair of parentheses will be captured as a group. Capturing groups are numbered by counting their opening parentheses from left to right.
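The numbering rule can be checked with a small sketch. The sample string "a 1" is an assumption chosen only because it fits the pattern ((\w)(\s\d)); it does not come from the article.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
pattern = re.compile(r"((\w)(\s\d))")

# "a 1" is a hypothetical sample string that fits the pattern.
match = pattern.search("a 1")

print(pattern.groups)    # number of capturing groups in the pattern: 3
print(match.group(1))    # outer group ((\w)(\s\d)): a 1
print(match.group(2))    # first inner group (\w): a
print(repr(match.group(3)))  # second inner group (\s\d): ' 1'
```

The outer group opens first, so it is group 1 even though it closes last.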

Capturing groups are a handy feature of regular expression matching that allows us to query the Match object to find out the part of the string that matched against a particular part of the regular expression.

Any sub-pattern in plain parentheses () is a capture group; the (?:...) form is the non-capturing exception. Using the group(group_number) method of the regex Match object, we can extract the matching value of each group.

We will see how to capture single as well as multiple groups.
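Before moving to multiple groups, here is a minimal sketch of the single-group case. It reuses the target string from the next section; the variable name single is only illustrative.

```python
import re

target_string = "The price of PINEAPPLE ice cream is 20"

# One pair of parentheses means one capturing group.
single = re.search(r"(\d+)", target_string)

print(single.group(1))   # 20
print(single.groups())   # ('20',)
```

Even with a single group, groups() still returns a tuple, here with one element.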

Example to Capture Multiple Groups

Let’s assume you have the following string:

target_string = "The price of PINEAPPLE ice cream is 20"

Suppose you want to match the following two regex groups inside that string:

To match an UPPERCASE word
To match a number

To extract the uppercase word and the number from the target string, we must first write two regular expression patterns:

Pattern to match the uppercase word (PINEAPPLE)
Pattern to match the number (20).

The first group pattern to search for an uppercase word: [A-Z]+

[A-Z] is a character class. It matches any single uppercase letter from A to Z.
The + metacharacter then means one or more occurrences of an uppercase letter.

The second group pattern to search for the price: \d+

\d matches any decimal digit (0 to 9 in ASCII text).
The + metacharacter then means the number must contain at least one digit, with no upper limit.
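To see what each pattern matches on its own, the sketch below runs them through re.findall() against the same target string. It also shows why the full example wraps the uppercase pattern in \b word boundaries: without them, the capital T of "The" is matched too.

```python
import re

target_string = "The price of PINEAPPLE ice cream is 20"

# Every run of uppercase letters, including the "T" of "The".
uppercase_runs = re.findall(r"[A-Z]+", target_string)
print(uppercase_runs)   # ['T', 'PINEAPPLE']

# With word boundaries, only whole uppercase words are matched.
uppercase_words = re.findall(r"\b[A-Z]+\b", target_string)
print(uppercase_words)  # ['PINEAPPLE']

# Every run of digits.
numbers = re.findall(r"\d+", target_string)
print(numbers)          # ['20']
```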

Extract matched group values

Finally, we can use the groups() and group() methods of the Match object to get the matched values.

Now let’s move to the example.

Example

import re

target_string = "The price of PINEAPPLE ice cream is 20"

# two groups, each enclosed in its own pair of parentheses
result = re.search(r"(\b[A-Z]+\b).+(\b\d+)", target_string)

# Extract matching values of all groups
print(result.groups())
# Output ('PINEAPPLE', '20')

# Extract match value of group 1
print(result.group(1))
# Output PINEAPPLE

# Extract match value of group 2
print(result.group(2))
# Output 20

Let’s understand the above example

First of all, I used a raw string to specify the regular expression pattern. In an ordinary Python string the backslash starts an escape sequence (for example, "\b" is a backspace character), so we use a raw string to pass the backslashes to the regex engine unchanged.
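The difference is easy to verify. The sketch below compares the same pattern written as an ordinary string and as a raw string; the variable names plain_pattern and raw_pattern are only illustrative.

```python
import re

target_string = "The price of PINEAPPLE ice cream is 20"

# In an ordinary string, "\b" is a single backspace character.
plain_pattern = "\b[A-Z]+\b"
# In a raw string, r"\b" stays as backslash + b, the regex word boundary.
raw_pattern = r"\b[A-Z]+\b"

print(len("\b"), len(r"\b"))                      # 1 2
print(re.search(plain_pattern, target_string))    # None
print(re.search(raw_pattern, target_string).group())  # PINEAPPLE
```

The ordinary-string version looks for literal backspace characters around the word, so it finds nothing.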

Now let’s take a closer look at the regular expression syntax to define and isolate the two patterns we are looking to match. We need two things.

First, we need to enclose each of the two patterns inside a pair of parentheses. So (\b[A-Z]+\b) is the first group, and (\b\d+) is the second group in between parentheses. Therefore each pair of parentheses is a group.

Note:

The parentheses are not part of the matched text. They only mark a group.
The \b indicates a word boundary.

Secondly, we need to consider the larger context in which these groups reside. This means that we also care about the location of each of these groups inside the entire target string and that’s why we need to provide context or borders for each group.

Next, look at the first group. It takes only uppercase letters enclosed by word boundaries. A word boundary \b is a zero-width position between a word character and a non-word character, not the whitespace itself. The first group therefore matches PINEAPPLE.

I have also added .+ between the two groups. The dot represents any character except a newline, and the plus sign means that the preceding pattern repeats one or more times. This means that before the second group there is a run of characters we can ignore; the second group then takes only digits that start at a word boundary, so it matches 20.
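One consequence of .+ is worth a quick sketch: it is greedy, so when the string holds several numbers the second group ends up on the last one. The sample string below is hypothetical, and the lazy variant .+? is shown only for comparison; it is not part of the original example.

```python
import re

# Hypothetical string with two word/number pairs.
text = "PINEAPPLE 20 MANGO 30"

# Greedy .+ consumes as much as possible, so group 2 is the last number.
greedy = re.search(r"(\b[A-Z]+\b).+(\b\d+)", text)
print(greedy.groups())   # ('PINEAPPLE', '30')

# Lazy .+? consumes as little as possible, so group 2 is the first number.
lazy = re.search(r"(\b[A-Z]+\b).+?(\b\d+)", text)
print(lazy.groups())     # ('PINEAPPLE', '20')
```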

Next, we passed the combined pattern to the re.search() method to find the match.

The groups() method

Finally, using the groups() method of a Match object, we can extract all the group matches at once. It returns them as a tuple.
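Because groups() returns an ordinary tuple, it can be unpacked directly into variables. A minimal sketch, with word and price as illustrative names:

```python
import re

target_string = "The price of PINEAPPLE ice cream is 20"
result = re.search(r"(\b[A-Z]+\b).+(\b\d+)", target_string)

# Unpack the tuple of group matches into separate names.
word, price = result.groups()

# Group values are always strings; convert when a number is needed.
print(word)         # PINEAPPLE
print(int(price))   # 20
```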

Access Each Group Result Separately

We can use the group() method to extract each group result separately by specifying a group index in between parentheses. Capturing groups are numbered by counting their opening parentheses from left to right. In our case, we used two groups.

Please note that unlike string indexing, which always starts at 0, group numbering always starts at 1.

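The group with the number 0 is always the entire match, that is, the whole portion of the target string that the pattern matched. If you call the group() method with no arguments or with 0 as an argument, you get the entire match. In our example that is "PINEAPPLE ice cream is 20", not the full target string, because the pattern starts matching at PINEAPPLE.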

To get access to the text matched by each regex group, pass the group’s number to the group(group_number) method.

So the first group is group 1, the second group is group 2, and so on.

Example

# Extract first group
print(result.group(1))

# Extract second group
print(result.group(2))

# Entire match (group 0)
print(result.group(0))

This is the simple way to access each group, as long as the pattern matched.
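When the pattern does not match, re.search() returns None, and calling group() on it raises an AttributeError. The sketch below guards against that; extract_word_and_number is a hypothetical helper name, not something defined earlier in the article.

```python
import re


def extract_word_and_number(text):
    """Return (word, number) as strings, or None when nothing matches.

    Hypothetical helper written for this illustration.
    """
    match = re.search(r"(\b[A-Z]+\b).+(\b\d+)", text)
    if match is None:
        # No match: there is no Match object to call group() on.
        return None
    return match.group(1), match.group(2)


print(extract_word_and_number("The price of PINEAPPLE ice cream is 20"))
# ('PINEAPPLE', '20')
print(extract_word_and_number("no uppercase word or number here"))
# None
```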

Regex Capture Group Multiple Times

In earlier examples, we used the search() method. It returns only the first match of the pattern. But what if a string contains multiple occurrences of a regex group and you want to extract all of them?

In this section, we will learn how to capture all matches to a regex group. To capture all matches to a regex group we need to use the finditer() method.

The finditer() method finds all matches and returns an iterator yielding match objects matching the regex pattern. Next, we can iterate each Match object and extract its value.

Note: The findall() method returns a list of strings (or a list of tuples when the pattern has several groups), not Match objects, so the group() and groups() methods cannot be applied to its result. If you try, you will get AttributeError: 'list' object has no attribute 'groups'.

So use finditer() when you need a Match object for every match of the groups.

Example

import re

target_string = "The price of ice-creams PINEAPPLE 20 MANGO 30 CHOCOLATE 40"

# two groups, each enclosed in its own pair of parentheses
# group 1: an uppercase word
# group 2: the number that follows it
# you can compile the pattern or pass it directly to re.finditer()
pattern = re.compile(r"(\b[A-Z]+\b).(\b\d+\b)")

# find all matches to groups
for match in pattern.finditer(target_string):
    # extract words
    print(match.group(1))
    # extract numbers
    print(match.group(2))

Output

PINEAPPLE
20
MANGO
30
CHOCOLATE
40
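For comparison, here is a sketch of what findall() returns for the same pattern, and of something only the Match objects from finditer() can provide, such as the position of each group. The names pairs and spans are illustrative.

```python
import re

target_string = "The price of ice-creams PINEAPPLE 20 MANGO 30 CHOCOLATE 40"
pattern = re.compile(r"(\b[A-Z]+\b).(\b\d+\b)")

# findall() returns plain tuples of strings, one tuple per match.
pairs = pattern.findall(target_string)
print(pairs)
# [('PINEAPPLE', '20'), ('MANGO', '30'), ('CHOCOLATE', '40')]

# finditer() yields Match objects, so extra details such as the
# (start, end) position of group 1 are available.
spans = [match.span(1) for match in pattern.finditer(target_string)]
for start, end in spans:
    print(start, end, target_string[start:end])
```

If the tuples are all you need, findall() is enough; finditer() is the choice when you need the Match methods.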

Extract Range of Groups Matches

One more thing you can do with the group() method is pass several group numbers at once. The matches are then returned as a tuple, in the order you asked for them. This is useful when we want to extract several groups in a single call.

For example, group(1, 5) returns a tuple with the matches of group 1 and group 5 only. To get a contiguous range, such as the first five groups, list every number, as in group(1, 2, 3, 4, 5), or slice the tuple returned by groups().

Let’s try this as well.

Example

import re

target_string = "The price of PINEAPPLE ice cream is 20"
# two patterns, each enclosed in its own pair of parentheses
result = re.search(r".+(\b[A-Z]+\b).+(\b\d+)", target_string)

print(result.group(1, 2))
# Output ('PINEAPPLE', '20')
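With more than two groups, the difference between picking individual groups and taking a range becomes visible. The sketch below uses a hypothetical date string and a three-group pattern that are not part of the original example.

```python
import re

# Hypothetical input with three groups: year, month, day.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Posted on 2023-01-27")

# group(1, 3) returns groups 1 and 3 only, not the range 1 to 3.
print(date_match.group(1, 3))     # ('2023', '27')

# For a contiguous range, slice the tuple returned by groups().
# Tuple index 0 holds group 1, so [:2] gives groups 1 and 2.
print(date_match.groups()[:2])    # ('2023', '01')
```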
