In this article, you will learn how to use regular expressions to perform search and replace operations on strings in Python.
Python’s re module offers the sub() and subn() methods to search and replace patterns in a string. Using these methods, we can replace one or more occurrences of a regex pattern in the target string with a substitute string.
After reading this article, you will be able to perform the following regex replacement operations in Python.
Operation | Description |
---|---|
re.sub(pattern, replacement, string) | Finds and replaces all occurrences of pattern with replacement |
re.sub(pattern, replacement, string, count=1) | Finds and replaces only the first occurrence of pattern with replacement |
re.sub(pattern, replacement, string, count=n) | Finds and replaces the first n occurrences of pattern with replacement |
Before moving further, let’s see the syntax of the sub() method.
How to use the re.sub() method
To use re.sub() for regex replacement, we first need to understand its syntax.
Syntax of re.sub()
re.sub(pattern, replacement, string[, count, flags])
The regular expression pattern, replacement, and target string are the mandatory arguments. The count and flags are optional.
- pattern: The regular expression pattern to find inside the target string.
- replacement: The replacement to insert for each occurrence of the pattern. The replacement can be a string or a function.
- string: The variable pointing to the target string (in which we want to perform the replacement).
- count: The maximum number of pattern occurrences to replace. If specified, count must be a non-negative integer. By default, count is set to zero, which means the re.sub() method replaces all pattern occurrences in the target string.
- flags: Finally, the last argument is optional and refers to regex flags. By default, no flags are applied.
There are many flag values we can use. For example, re.I is used for performing case-insensitive searching and replacing.
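As a small illustration of the flags argument, the sketch below replaces a word regardless of its letter case. The sample text and the replacement word are made up for this example.

```python
import re

# Hypothetical sample text, not taken from the article
text = "Python is fun. PYTHON is popular. python is readable."

# flags=re.I makes the pattern match regardless of letter case
result = re.sub(r"python", "Regex", text, flags=re.I)
print(result)
# Output 'Regex is fun. Regex is popular. Regex is readable.'
```

Passing flags by keyword keeps it from being mistaken for the count argument, which comes before it in the signature.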
Return value
It returns the string obtained by replacing the pattern occurrences in the string with the replacement string. If the pattern isn’t found, the string is returned unchanged.
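The unchanged-string case can be checked with a short sketch. The string below is an assumption chosen so that the pattern finds nothing to replace.

```python
import re

# Hypothetical string that contains no digits
original = "Jessa knows testing"

# \d+ matches one or more digits, so nothing matches here
result = re.sub(r"\d+", "#", original)
print(result)
# Output 'Jessa knows testing'

# The returned string is equal to the original
print(result == original)
# Output True
```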
Now, let’s test this.
Regex example to replace all whitespace with an underscore
Now, let’s see how to use re.sub() with the help of a simple example. Here, we will perform two replacement operations
- Replace all the whitespace with an underscore
- Remove all whitespace
Let’s see the first scenario first.
Pattern to replace: \s
In this example, we will use the \s regex special sequence, which matches any whitespace character, including [ \t\n\x0b\r\f].
Let’s assume you have the following string and you want to replace all the whitespace with an underscore.
target_string = "Jessa knows testing and machine learning"
Example
import re
target_str = "Jessa knows testing and machine learning"
res_str = re.sub(r"\s", "_", target_str)
# String after replacement
print(res_str)
# Output 'Jessa_knows_testing_and_machine_learning'
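The string in this example contains only plain spaces. The following sketch, using a made-up string, suggests that \s also covers tabs, newlines, and carriage returns.

```python
import re

# Hypothetical string mixing a tab, a newline, a space, and a carriage return
mixed_str = "Jessa\tknows\ntesting and\rmachine"

# Each single whitespace character is replaced with one underscore
res_str = re.sub(r"\s", "_", mixed_str)
print(res_str)
# Output 'Jessa_knows_testing_and_machine'
```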
Regex to remove whitespaces from a string
Now, let’s move to the second scenario, where you can remove all whitespace from a string using regex. This regex remove operation includes the following four cases.
- Remove all spaces, including single or multiple spaces (pattern to remove: \s+)
- Remove leading spaces (pattern to remove: ^\s+)
- Remove trailing spaces (pattern to remove: \s+$)
- Remove both leading and trailing spaces (pattern to remove: ^\s+|\s+$)
Example 1: Remove all spaces
import re
target_str = " Jessa Knows Testing And Machine Learning \t ."
# \s+ to remove all whitespace
# + indicates one or more occurrences of a whitespace character
res_str = re.sub(r"\s+", "", target_str)
# String after replacement
print(res_str)
# Output 'JessaKnowsTestingAndMachineLearning.'
Example 2: Remove leading spaces
import re
target_str = " Jessa Knows Testing And Machine Learning \t ."
# ^\s+ remove only leading spaces
# caret (^) matches only at the start of the string
res_str = re.sub(r"^\s+", "", target_str)
# String after replacement
print(res_str)
# Output 'Jessa Knows Testing And Machine Learning .'
Example 3: Remove trailing spaces
import re
target_str = " Jessa Knows Testing And Machine Learning \t\n"
# \s+$ removes only trailing spaces
# dollar ($) matches spaces only at the end of the string
res_str = re.sub(r"\s+$", "", target_str)
# String after replacement
print(res_str)
# Output ' Jessa Knows Testing And Machine Learning'
Example 4: Remove both leading and trailing spaces
import re
target_str = " Jessa Knows Testing And Machine Learning \t\n"
# ^\s+ remove leading spaces
# \s+$ removes trailing spaces
# | operator to combine both patterns
res_str = re.sub(r"^\s+|\s+$", "", target_str)
# String after replacement
print(res_str)
# Output 'Jessa Knows Testing And Machine Learning'
Substitute multiple whitespaces with single whitespace using regex
import re
target_str = "Jessa Knows Testing And Machine Learning \t \n"
# \s+ to match all whitespaces
# replace them using single space " "
res_str = re.sub(r"\s+", " ", target_str)
# string after replacement
print(res_str)
# Output 'Jessa Knows Testing And Machine Learning ' (the trailing whitespace run becomes a single space)
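Note that a run of whitespace at the very start or end of the string is also replaced with a single space, so the result can still begin or end with a space. One possible way to avoid that, sketched here with the standard str.strip() method and a made-up string, is to trim the result after collapsing.

```python
import re

# Hypothetical string with leading, inner, and trailing whitespace runs
target_str = "  Jessa   Knows \t Testing \n"

# Collapse every run of whitespace into one space
collapsed = re.sub(r"\s+", " ", target_str)
print(repr(collapsed))
# Output ' Jessa Knows Testing '

# Trim the single space left at each end
trimmed = collapsed.strip()
print(repr(trimmed))
# Output 'Jessa Knows Testing'
```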
Limit the maximum number of pattern occurrences to be replaced
The count argument of the re.sub() method is optional. It sets the maximum number of replacements to make inside the string. By default, count is set to zero, which means the re.sub() method replaces all pattern occurrences in the target string.
Replace only the first occurrence of a pattern
By setting count=1 inside re.sub(), we can replace only the first occurrence of a pattern in the target string with another string.
Replace the first n occurrences of a pattern
Set the count value to the number of replacements you want to perform.
Now let’s see the example.
Example
import re
# original string
target_str = "Jessa knows testing and machine learning"
# replace only first occurrence
res_str = re.sub(r"\s", "-", target_str, count=1)
# String after replacement
print(res_str)
# Output 'Jessa-knows testing and machine learning'
# replace the first three occurrences
res_str = re.sub(r"\s", "-", target_str, count=3)
print(res_str)
# Output 'Jessa-knows-testing-and machine learning'
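If count is larger than the number of matches, re.sub() simply replaces every match. A quick sketch, reusing the string from the example with an arbitrarily chosen count of 10:

```python
import re

target_str = "Jessa knows testing and machine learning"

# The string has only five whitespace characters,
# so count=10 ends up replacing all of them
res_str = re.sub(r"\s", "-", target_str, count=10)
print(res_str)
# Output 'Jessa-knows-testing-and-machine-learning'
```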
Regex replacement function
We saw how to find and replace a regex pattern with a fixed string in the earlier examples. In this example, we will see how to replace a pattern with the output of a function.
For example, suppose you want to replace all uppercase letters with lowercase letters. To achieve this, we need the following two things
- A regular expression pattern that matches all uppercase letters
- A replacement function that converts the matched uppercase letters to lowercase
Pattern to replace: [A-Z]
This pattern will match any uppercase letter inside a target string.
Replacement function
You can pass a function to re.sub(). When you execute re.sub(), your function receives a match object as its argument. It can perform the replacement operation by extracting the matched value from the match object.
If the replacement is a function, it is called for every non-overlapping occurrence of the pattern. The function takes a single match object argument and returns the replacement string.
So in our case, we will do the following
- First, we need to create a function to replace uppercase letters with lowercase letters
- Next, we need to pass this function as the replacement argument to re.sub()
- Whenever re.sub() matches the pattern, it will send the corresponding match object to the replacement function
- Inside the replacement function, we will use the group() method to extract an uppercase letter and convert it into a lowercase letter
Example:
import re
# replacement function to convert an uppercase letter to lowercase
def convert_to_lower(match_obj):
    # group() returns the text matched by the whole pattern
    return match_obj.group().lower()
# Original string (named target_str to avoid shadowing the built-in str)
target_str = "Emma LOves PINEAPPLE DEssert and COCONUT Ice Cream"
# pass replacement function to re.sub()
res_str = re.sub(r"[A-Z]", convert_to_lower, target_str)
# String after replacement
print(res_str)
# Output 'emma loves pineapple dessert and coconut ice cream'
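A replacement function can also compute a new value from the match instead of only changing its case. The sketch below, with a hypothetical double_number function and made-up text, doubles every integer in a string.

```python
import re

# Hypothetical replacement function, named for this illustration only
def double_number(match_obj):
    # Convert the matched digits to an int, double it, and return a string
    return str(int(match_obj.group()) * 2)

# Made-up sample text
price_str = "3 apples and 12 oranges"

# \d+ matches each run of digits; the function is called once per match
res_str = re.sub(r"\d+", double_number, price_str)
print(res_str)
# Output '6 apples and 24 oranges'
```

The function must return a string, which is why the doubled number is passed through str() before it is returned.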
Regex replace group/multiple regex patterns
We saw how to find and replace a single regex pattern in the earlier examples. In this section, we will learn how to search and replace multiple patterns in the target string.
To understand this, take the following string as an example
student_names = "Emma-Kelly Jessa Joy Scott-Joe Jerry"
Here, we want to find and replace two distinct patterns at the same time.
We want to replace each whitespace and hyphen(-) with a comma (,) inside the target string. To achieve this, we must first write two regular expression patterns.
- Pattern 1: \s matches all whitespace
- Pattern 2: - matches a hyphen (-)
Example
import re
# Original string
student_names = "Emma-Kelly Jessa Joy Scott-Joe Jerry"
# replace two patterns at the same time
# use OR (|) to separate the two patterns
res = re.sub(r"(\s)|(-)", ",", student_names)
print(res)
# Output 'Emma,Kelly,Jessa,Joy,Scott,Joe,Jerry'
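Because both patterns here match a single character, the same result can also be obtained with a character class. This is an alternative sketch, not the approach used in the example above.

```python
import re

student_names = "Emma-Kelly Jessa Joy Scott-Joe Jerry"

# [\s-] matches one whitespace character or one hyphen;
# a hyphen placed last inside the brackets is treated literally
res = re.sub(r"[\s-]", ",", student_names)
print(res)
# Output 'Emma,Kelly,Jessa,Joy,Scott,Joe,Jerry'
```

The alternation form with | remains the more general choice when the patterns are longer than one character.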
Replace multiple regex patterns with different replacement
To understand this take the example of the following string
target_string = "EMMA loves PINEAPPLE dessert and COCONUT ice CREAM"
The above string contains a combination of uppercase and lowercase words.
Here, we want to match and replace two distinct patterns with two different replacements.
- Replace each uppercase word with its lowercase form
- Replace each lowercase word with its uppercase form
So we will first capture two groups and then replace each group using a replacement function, as described in the replacement function section above.
Group 1: ([A-Z]+)
- To capture and replace each uppercase word with its lowercase form.
- The [A-Z] character class matches any single uppercase character from A to Z.
Group 2: ([a-z]+)
- To capture and replace each lowercase word with its uppercase form.
- The [a-z] character class matches any single lowercase character from a to z.
Note: Whenever you want to capture a group, always write it in parentheses ( ).
Example:
import re
# replacement function to convert uppercase words to lowercase
# and lowercase words to uppercase
def convert_case(match_obj):
    # group 1 is set when an uppercase word matched
    if match_obj.group(1) is not None:
        return match_obj.group(1).lower()
    # group 2 is set when a lowercase word matched
    if match_obj.group(2) is not None:
        return match_obj.group(2).upper()
# Original string (named target_str to avoid shadowing the built-in str)
target_str = "EMMA loves PINEAPPLE dessert and COCONUT ice CREAM"
# group 1 [A-Z]+ matches uppercase words
# group 2 [a-z]+ matches lowercase words
# pass replacement function 'convert_case' to re.sub()
res_str = re.sub(r"([A-Z]+)|([a-z]+)", convert_case, target_str)
# String after replacement
print(res_str)
# Output 'emma LOVES pineapple DESSERT AND coconut ICE cream'
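As a variation on the same idea, the groups can be given names so the replacement function checks which named group matched instead of testing group numbers. This is only a sketch: the group names upper and lower and the function name swap_word_case are chosen for this illustration.

```python
import re

# Hypothetical replacement function using named groups
def swap_word_case(match_obj):
    # lastgroup holds the name of the group that matched
    if match_obj.lastgroup == "upper":
        return match_obj.group().lower()
    return match_obj.group().upper()

target_str = "EMMA loves PINEAPPLE dessert and COCONUT ice CREAM"

# (?P<name>...) gives each capturing group a name
pattern = r"(?P<upper>[A-Z]+)|(?P<lower>[a-z]+)"
res_str = re.sub(pattern, swap_word_case, target_str)
print(res_str)
# Output 'emma LOVES pineapple DESSERT AND coconut ICE cream'
```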
RE’s subn() method
The re.subn() method performs the same task as the re.sub() method, but the result it returns is a bit different.
The re.subn() method returns a tuple of two elements.
- The first element of the result is the new version of the target string after all the replacements have been made.
- The second element is the number of replacements it has made.
Let’s test this with an example.
Example
import re
target_string = "Emma loves PINEAPPLE, COCONUT, BANANA ice cream"
result = re.subn(r"[A-Z]{2,}", "MANGO", target_string)
print(result)
# Output ('Emma loves MANGO, MANGO, MANGO ice cream', 3)
Note: The resulting string is the same one re.sub() would return for this pattern, only this time it is the first element of a tuple. The second element, after the comma, is the number of replacements made, which is three.
We can also use the count argument of the subn() method. The value of the second element of the result tuple should then change accordingly.
So let’s test this.
Example
import re
target_string = "Emma loves PINEAPPLE, COCONUT, BANANA ice cream"
result = re.subn(r"[A-Z]{2,}", "MANGO", target_string, count=2)
print(result)
# Output ('Emma loves MANGO, MANGO, BANANA ice cream', 2)
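Since re.subn() returns a tuple, the two elements can be unpacked into separate variables. The variable names new_string and num_replaced below are chosen for this sketch.

```python
import re

target_string = "Emma loves PINEAPPLE, COCONUT, BANANA ice cream"

# Unpack the (new_string, number_of_replacements) tuple
new_string, num_replaced = re.subn(r"[A-Z]{2,}", "MANGO", target_string)

print(new_string)
# Output 'Emma loves MANGO, MANGO, MANGO ice cream'
print(num_replaced)
# Output 3
```

The count can be handy for checking whether anything was replaced at all, since a value of 0 means the pattern was not found.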