Python Glob: Filename Pattern Matching

Updated on: June 17, 2021

The glob module, part of the Python Standard Library, is used to find the files and folders whose names follow a specific pattern. The searching rules are similar to the Unix Shell path expansion rules.

After reading this article, you will learn:

  • How to find all files that match the specified pattern
  • How to search files recursively using the glob() function
  • How to use iglob() to iterate over matching filenames
  • How to search files using wildcard characters

The following functions are available in the glob module. We’ll cover each of them in turn.

Function                 Description
glob.glob(pathname)      Returns a list of files that match the path specified in the function argument
glob.iglob(pathname)     Returns an iterator that yields the matching file names one at a time
glob.escape(pathname)    Escapes the glob special characters (*, ?, and [) so that they are matched literally
Python glob module functions

Table of contents

  • Python glob() Method to Search Files
  • glob() to Search Files Recursively
  • Glob to Search Files Using Wildcard Characters
    • Match Any Character in File Name Using asterisk (*):
    • Search all files and folders in given directory
    • Match Single character in File Name Using Question Mark(?):
    • Match File Name using a Range of Characters
  • iglob() for Looping through the Files
  • Search for Filenames with Special Characters using escape() method
  • glob() Files with Multiple Extensions
  • Using glob() with regex
  • glob for finding text in files
  • Sorting the glob() output
  • Deleting files using glob()
  • scandir() vs glob()

Python glob() Method to Search Files

Using the glob module, we can search for exact file names or match part of a name using patterns built from wildcard characters.

These patterns are similar to regular expressions but much simpler.

  • Asterisk (*): matches zero or more characters
  • Question mark (?): matches exactly one character
  • Square brackets ([]): match any one character from the set or range listed inside them

We need to import Python’s built-in glob module to use the glob() function.

Syntax of glob() function

glob.glob(pathname, *, recursive=False)

The glob.glob() function returns a list of files or folders that match the pattern given in the pathname argument. It takes two main arguments: pathname and the keyword-only recursive flag.

  • pathname: The pattern to search for. It can be absolute (a complete path starting from the root or drive, such as E:/performance/pynative/*.txt) or relative (resolved against the current working directory, such as sales/*.txt). Either form may contain UNIX shell-style wildcards.
  • recursive: If set to True, the ** pattern matches any files and zero or more directories and subdirectories. The default is False.
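
The difference between an absolute and a relative pattern can be sketched as follows. This is only an illustration: the temporary directory and the file names are assumptions made so that the snippet can run anywhere.

```python
import glob
import os
import tempfile

# Build a throwaway directory with two text files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sales_march.txt", "profit_march.txt"):
        open(os.path.join(tmp, name), "w").close()

    # Absolute pattern: the directory part is spelled out in full.
    absolute_matches = sorted(glob.glob(os.path.join(tmp, "*.txt")))

    # Relative pattern: resolved against the current working directory.
    previous_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        relative_matches = sorted(glob.glob("*.txt"))
    finally:
        os.chdir(previous_cwd)

print(relative_matches)  # ['profit_march.txt', 'sales_march.txt']
print(all(os.path.isabs(p) for p in absolute_matches))  # True
```

The results mirror the pattern: a relative pattern yields relative paths, and an absolute pattern yields absolute paths.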

Example:  Search all .txt files present in the current working directory

Let’s assume the following test files are present in the current working directory.

sales_march.txt
profit_march.txt
sales_april.txt
profit_april.txt

import glob

# relative path to search all text files
files = glob.glob("*.txt")
print(files)

Output:

['profit_april.txt', 'profit_march.txt', 'sales_april.txt', 'sales_march.txt']

Example 2: Search files using an absolute path

You can also use an absolute path to search for files.

import glob

# absolute path to search all text files inside a specific folder
path = r'E:/performance/pynative/*.txt'
print(glob.glob(path))

glob() to Search Files Recursively

Set recursive=True and use the ** pattern to search inside all subdirectories. This is helpful if we are not sure exactly which folder contains the file we are looking for.

The default value of the recursive flag is False, in which case ** behaves like a single * and matches only one directory level. The flag has an effect only when the pattern contains **. For example, the pattern 'sales/**/abc.jpeg' with recursive=True finds abc.jpeg in sales and in every subfolder below it, whereas 'sales/abc.jpeg' matches only that one path regardless of the flag.

Recursive search requires Python 3.5 or later, the version in which the glob module gained support for the ** pattern. With recursive=True, ** matches any files and zero or more directories and subdirectories.

Example to search .txt files under all subdirectories of the current directory.

import glob

# path to search file
path = '**/*.txt'
for file in glob.glob(path, recursive=True):
    print(file)

Output:

profit_april.txt
profit_march.txt
sales_april.txt
sales_march.txt
sales\march_profit_2020.txt
sales\march_sales_2020.txt

Note: If the pattern contains ** and recursive=True is set, glob() walks the directory and all of its subdirectories. On a large directory tree, this can take a long time.
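
To see how the recursive flag and ** interact, here is a small sketch. The folder layout (sales/abc.jpeg and sales/2020/abc.jpeg) is hypothetical and is created in a temporary directory only for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: sales/abc.jpeg and sales/2020/abc.jpeg
    nested = os.path.join(tmp, "sales", "2020")
    os.makedirs(nested)
    open(os.path.join(tmp, "sales", "abc.jpeg"), "w").close()
    open(os.path.join(nested, "abc.jpeg"), "w").close()

    # No ** in the pattern: recursive=True changes nothing.
    plain = glob.glob(os.path.join(tmp, "sales", "abc.jpeg"), recursive=True)

    # ** with recursive=True: matches sales itself and every level below it.
    deep = glob.glob(os.path.join(tmp, "sales", "**", "abc.jpeg"), recursive=True)

    # ** without recursive=True behaves like a single * (exactly one level).
    shallow = glob.glob(os.path.join(tmp, "sales", "**", "abc.jpeg"))

print(len(plain), len(deep), len(shallow))  # 1 2 1
```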

Glob to Search Files Using Wildcard Characters

We can use glob() with wildcard characters to search for a folder or file in a multi-level directory. The following wildcards are supported. Let us see them with examples.

Wildcard   Matches                                  Example
*          Zero or more characters                  *.pdf matches all files with the pdf extension
?          Any single character                     sales/??.jpeg matches all jpeg files in the sales folder whose names are exactly two characters long
[]         Any one character in the sequence        [psr]* matches files starting with the letter p, s, or r
[!]        Any one character not in the sequence    [!psr]* matches files not starting with the letter p, s, or r
Glob wildcard characters
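
The wildcard rules can be tried out without touching the disk by using the fnmatch module, which implements the same shell-style matching for a single name. The snippet below is a minimal sketch, and the file names in it are made up for illustration.

```python
import fnmatch

# Hypothetical file names used only to exercise each wildcard.
names = ["p.jpeg", "bar.jpeg", "report.pdf", "sales.txt", "2.txt"]

star = fnmatch.filter(names, "*.jpeg")         # any run of characters
question = fnmatch.filter(names, "?.jpeg")     # exactly one character
in_set = fnmatch.filter(names, "[psr]*")       # first character is p, s, or r
not_in_set = fnmatch.filter(names, "[!psr]*")  # first character is anything else

print(star)        # ['p.jpeg', 'bar.jpeg']
print(question)    # ['p.jpeg']
print(in_set)      # ['p.jpeg', 'report.pdf', 'sales.txt']
print(not_in_set)  # ['bar.jpeg', '2.txt']
```

Note that fnmatch works on plain strings, while glob() applies the same rules to each component of a real path on disk.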

Match Any Character in File Name Using asterisk (*):

The asterisk (*) wildcard matches zero or more characters in a file or folder name. It can be used in any component of the path, so a single pattern can span several directory levels.

The following example returns all the files with a .txt extension inside the sales subdirectory.

Example:

import glob

# path to search all txt files 
path = "sales/*.txt"
for file in glob.glob(path):
    print(file)

Output:

sales\march_profit_2020.txt
sales\march_sales_2020.txt

Search all files and folders in given directory

Here we will see the following three scenarios:

  1. Match every pathname inside the current directory, i.e., print all folders and files present in the current directory
  2. Match every file and folder inside a given directory
  3. Match every file and folder whose name starts with the word ‘march’

import glob

# using glob to match every pathname
print('Inside current directory')
for item in glob.glob("*"):
    print(item)

# Match every files and folder from a given folder
print('Inside Sales folder')
for item in glob.glob("sales/*"):
    print(item)

print('All files starts with word march')
for item in glob.glob("sales/march*"):
    print(item)

Output:

Inside current directory
sales
glob_demo.py
profit_april.txt
profit_march.txt
sales_april.txt
sales_march.txt

Inside Sales folder
sales\bar.jpeg
sales\chart.jpeg
sales\march_profit_2020.txt
sales\march_sales_2020.txt
sales\p.jpeg

All files starts with word march
sales\march_profit_2020.txt
sales\march_sales_2020.txt

Match Single character in File Name Using Question Mark(?):

The question mark (?) wildcard matches exactly one character. It is useful for matching a set of filenames that are almost identical and differ in only one or a few characters.

The following example matches .jpeg files with one-character and three-character names, and a .txt file whose name is 'cha' followed by exactly two characters.

import glob

# path to search single character filename
path = "sales/?.jpeg"
for file in glob.glob(path):
    print(file)

# path to search three-character filename
path = "sales/???.jpeg"
for file in glob.glob(path):
    print(file)

# search for files whose name starts with 'cha' followed by exactly two characters
path = "sales/cha??.txt"
for file in glob.glob(path):
    print(file)

Output:

sales\p.jpeg
sales\bar.jpeg
sales\chart.txt

Match File Name using a Range of Characters

We can match a set or range of characters by enclosing them in square brackets ([]).

The range can consist of letters or digits. The following example shows how to use glob to match .txt files whose names start with a letter from a to f, and files whose names consist of a single digit from 2 to 5 followed by an extension.

import glob

print(glob.glob("sales/[a-f]*.txt"))

print(glob.glob("sales/[2-5].*"))

Output:

['sales\\bar.txt', 'sales\\chart.txt']
['sales\\2.txt']

iglob() for Looping through the Files

The glob.iglob() function works exactly like glob() except that it returns an iterator that yields the matching file names one at a time instead of building a list.

Syntax:

glob.iglob(pathname, *, recursive=False)

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

Why use iglob():

In some scenarios, the number of files or folders to match is high, and loading them all at once with glob() could use a lot of memory. With iglob(), the matching filenames are produced through an iterator, so you can process them one at a time or stop early without holding the full list of results.

In other words, iglob() returns an iterator (a generator in the example below) that produces results as you loop over it.

The following example loops over the result of iglob() and then compares the types returned by glob() and iglob().

Example

import glob

# using iglob
for item in glob.iglob("*.txt"):
    print(item)

# check type
print('glob()')
print(type(glob.glob("*.txt")))

print('iglob()')
print(type(glob.iglob("*.txt")))

Output:

profit_april.txt
profit_march.txt
sales_april.txt
sales_march.txt

glob()
<class 'list'>
iglob()
<class 'generator'>
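
Because iglob() returns an iterator, it can be combined with itertools.islice() to take only the first few matches. The sketch below assumes a temporary folder with 100 hypothetical log files; how much work is deferred internally may vary between Python versions, but the calling code never has to build the full list of results.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create 100 hypothetical log files.
    for i in range(100):
        open(os.path.join(tmp, f"log_{i:03d}.txt"), "w").close()

    matches = glob.iglob(os.path.join(tmp, "*.txt"))

    # Take only the first five results from the iterator.
    first_five = list(itertools.islice(matches, 5))

    # The iterator remembers its position, so this counts the remaining 95.
    remaining = sum(1 for _ in matches)

print(len(first_five), remaining)  # 5 95
```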

Search for Filenames with Special Characters using escape() method

In addition to character and numeric ranges, the glob module provides the escape() function for matching file names that contain glob special characters.

Syntax:

glob.escape(pathname)

As its name suggests, this function escapes the special characters in the pathname passed as the argument. The special characters are *, ?, and [. Each one is wrapped in square brackets so that glob() treats it as a literal character rather than as a wildcard. Other characters, such as _, #, and $, have no special meaning in glob patterns and are returned unchanged.

We can use this function together with glob() whenever a search string may contain such characters. The example below builds a pattern for each character in a set; because _, $, and # are not glob wildcards, escape() leaves them as they are and the patterns still match as expected.

import glob

print("All JPEG's files")
print(glob.glob("*.jpeg"))

print("JPEGs files with special characters in their name")
# set of special characters _, $, #
char_seq = "_$#"
for char in char_seq:
    esc_set = "*" + glob.escape(char) + "*" + ".jpeg"
    for file in (glob.glob(esc_set)):
        print(file)

Output

All JPEG's files
['abc.jpeg', 'y_.jpeg', 'z$.jpeg', 'x#.jpeg'] 

JPEGs files with special characters in their name
y_.jpeg 
z$.jpeg 
x#.jpeg
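
The effect of escape() is easier to see with a name that contains a real glob metacharacter. The following sketch uses two hypothetical files, report[1].txt and report1.txt, created in a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files: one name contains literal square brackets.
    for name in ("report[1].txt", "report1.txt"):
        open(os.path.join(tmp, name), "w").close()

    # Unescaped, [1] is a character class that matches the single character "1".
    unescaped = glob.glob(os.path.join(tmp, "report[1].txt"))

    # Escaped, the opening bracket is matched literally.
    escaped = glob.glob(os.path.join(tmp, glob.escape("report[1].txt")))

    found_unescaped = [os.path.basename(p) for p in unescaped]
    found_escaped = [os.path.basename(p) for p in escaped]

print(glob.escape("report[1].txt"))  # report[[]1].txt
print(found_unescaped)               # ['report1.txt']
print(found_escaped)                 # ['report[1].txt']
```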

glob() Files with Multiple Extensions

We can search for files with different extensions using the glob module. For example, suppose we want to find all files with a .pdf or .jpeg extension in a given folder.

import glob

print("All pdf and jpeg files")
extensions = ('*.pdf', '*.jpeg')
files_list = []
for ext in extensions:
    files_list.extend(glob.glob(ext))
print(files_list)

Output

['christmas_envelope.pdf', 'reindeer.pdf', '1.jpeg', '2.jpeg', '4.jpeg', '3.jpeg', 'abc.jpeg']
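
The same idea can be wrapped in a small helper. The function name glob_extensions and the sample files are hypothetical; this is one possible sketch rather than a feature of the glob module.

```python
import glob
import itertools
import os
import tempfile

def glob_extensions(folder, extensions):
    """Return sorted paths in `folder` whose names end with one of `extensions`."""
    # glob.escape() protects any wildcard characters in the folder name itself.
    patterns = (os.path.join(glob.escape(folder), "*" + ext) for ext in extensions)
    matches = itertools.chain.from_iterable(glob.iglob(p) for p in patterns)
    return sorted(set(matches))

with tempfile.TemporaryDirectory() as tmp:
    for name in ("reindeer.pdf", "1.jpeg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in glob_extensions(tmp, (".pdf", ".jpeg"))]

print(found)  # ['1.jpeg', 'reindeer.pdf']
```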

Using glob() with regex

The glob() function internally uses the fnmatch module, which supports only the following four rules for pattern matching:

  • * matches everything
  • ? matches any single character
  • [seq] matches any character in seq
  • [!seq] matches any character not in seq

If we want to match files with more flexible rules, we can combine glob with regular expressions.

Consider a folder with jpeg files for employees, where we want to find the photo of the employee whose number matches the user input. We can pass the folder name to glob() and then filter the results with a regex search.

import glob
import re

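num = input('Enter the employee number ')
# [a-z_]+ matches the employee name prefix
# {file_num} is the employee number; re.escape() keeps the input literal
# \. requires the extension dot right after the number, so 3 does not match 30
regex = r'[a-z_]+{file_num}\..*'.format(file_num=re.escape(num))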

# search emp jpeg in employees folder
for file in glob.glob("2020/*"):
    if re.search(regex, file):
        print('Employee Photo:', file)

Output:

Enter the employee number 3
Employee Photo: 2020\emp_3.jpeg
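
A variant of this approach matches the regex against the file name only and anchors it at both ends. The helper name find_employee_photos and the emp_<number>.jpeg naming scheme are assumptions made for this sketch.

```python
import glob
import os
import re
import tempfile

def find_employee_photos(folder, employee_number):
    """Return file names such as emp_3.jpeg for the given number (hypothetical scheme)."""
    # re.escape() keeps user input from being interpreted as regex syntax.
    pattern = re.compile(r"[a-z_]+" + re.escape(str(employee_number)) + r"\.jpeg")
    names = (os.path.basename(p) for p in glob.iglob(os.path.join(folder, "*.jpeg")))
    # fullmatch() anchors both ends, so 3 does not also match 30 or 13.
    return sorted(n for n in names if pattern.fullmatch(n))

with tempfile.TemporaryDirectory() as tmp:
    for name in ("emp_3.jpeg", "emp_30.jpeg", "emp_13.jpeg", "logo.jpeg"):
        open(os.path.join(tmp, name), "w").close()
    photos = find_employee_photos(tmp, 3)

print(photos)  # ['emp_3.jpeg']
```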

glob for finding text in files

We generally use the glob module to find matching file names, but it is also handy as the first step in finding text inside files.

Often we want to replace a specific word in a set of files, or find the files that contain some exact text, such as a user id.

We can follow the steps below to get the files that contain specific text:

  • Use glob to list all files in a directory and its subdirectories that match a file search pattern.
  • Next, read each file and search for the matching text. (You can use regex if you want to find a specific pattern in the file.)

Example: Search word profit in files

import glob

# Look all txt files of current directory and its sub-directories
path = '**/*.txt'
search_word = 'profit'
# list to store files that contain matching word
final_files = []
for file in glob.glob(path, recursive=True):
    try:
        with open(file) as fp:
            # read the file as a string
            data = fp.read()
            if search_word in data:
                final_files.append(file)
    except (OSError, UnicodeDecodeError) as err:
        print('Exception while reading file', file, err)
print(final_files)

Output:

['sales\\data_2021.txt']
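
For large files, it may be preferable to scan line by line and stop at the first hit instead of reading the whole file into memory. The generator below is a sketch under a few assumptions: the name files_containing is hypothetical, and the files are assumed to be UTF-8 text.

```python
import glob
import os
import tempfile

def files_containing(pattern, word):
    """Yield paths matching `pattern` whose text contains `word`."""
    for path in glob.iglob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match
        try:
            with open(path, encoding="utf-8") as fp:  # assumes UTF-8 text
                # any() stops reading at the first matching line.
                if any(word in line for line in fp):
                    yield path
        except (OSError, UnicodeDecodeError):
            continue  # unreadable or non-text file

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sales"))
    with open(os.path.join(tmp, "sales", "data_2021.txt"), "w", encoding="utf-8") as fp:
        fp.write("revenue\nprofit 120\n")
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as fp:
        fp.write("nothing here\n")
    pattern = os.path.join(tmp, "**", "*.txt")
    hits = [os.path.relpath(p, tmp) for p in files_containing(pattern, "profit")]

print(hits)  # one entry: data_2021.txt inside the sales folder
```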

Sorting the glob() output

The glob() function returns its results in arbitrary order, so we can sort the list simply by passing it to the sorted() function.

import glob


path = "*.txt"
print(sorted(glob.glob(path)))

Output:

['profit_april.txt', 'profit_march.txt', 'sales_april.txt', 'sales_march.txt']

We can sort the files by modification time by combining glob() with the getmtime() function from the os.path module.

import glob
import os

# List all files and folders in the current directory
files = glob.glob("*")

# Sort by modification time (mtime), oldest first and then newest first
files_ascending = sorted(files, key=os.path.getmtime)
print(files_ascending)
files_descending = sorted(files, key=os.path.getmtime, reverse=True)
print(files_descending)

Output:

['sales_april.txt', 'sales_march.txt', 'profit_april.txt', 'profit_march.txt', 'sales', 'glob_demo.py']
['glob_demo.py', 'sales', 'profit_march.txt', 'profit_april.txt', 'sales_april.txt', 'sales_march.txt']
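
Sorting by modification time can be checked in a reproducible way by assigning the timestamps explicitly with os.utime(). The file names and timestamp values below are arbitrary and serve only to make the order predictable.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files with explicitly assigned modification times.
    for name, mtime in (("old.txt", 1_000_000), ("new.txt", 3_000_000), ("mid.txt", 2_000_000)):
        path = os.path.join(tmp, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # (access time, modification time)

    files = glob.glob(os.path.join(tmp, "*.txt"))
    oldest_first = [os.path.basename(p) for p in sorted(files, key=os.path.getmtime)]
    newest_first = [os.path.basename(p)
                    for p in sorted(files, key=os.path.getmtime, reverse=True)]

print(oldest_first)  # ['old.txt', 'mid.txt', 'new.txt']
print(newest_first)  # ['new.txt', 'mid.txt', 'old.txt']
```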

Deleting files using glob()

We can remove files from a directory by iterating over the list returned by glob() and calling os.remove() on each file.

import glob
import os

# delete all pdf files
for pdf in (glob.glob("2020/*.pdf")):
    # Removing the pdf file from the directory
    print("Removing ", pdf)
    os.remove(pdf)

Output:

Removing  2020\june.pdf
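
Since deletion cannot be undone, it can help to preview the matches first. The helper below is a hypothetical sketch (remove_matching and its dry_run flag are not part of the glob module) that only reports the files unless dry_run is set to False.

```python
import glob
import os
import tempfile

def remove_matching(pattern, dry_run=True):
    """Delete files matching `pattern`; with dry_run=True only report them."""
    selected = []
    for path in glob.glob(pattern):
        if not os.path.isfile(path):
            continue  # os.remove() cannot delete directories
        if not dry_run:
            os.remove(path)
        selected.append(path)
    return selected

with tempfile.TemporaryDirectory() as tmp:
    for name in ("june.pdf", "july.pdf", "keep.txt"):
        open(os.path.join(tmp, name), "w").close()
    pattern = os.path.join(tmp, "*.pdf")

    preview = remove_matching(pattern)        # nothing is deleted yet
    after_preview = sorted(os.listdir(tmp))
    remove_matching(pattern, dry_run=False)   # actually delete
    after_delete = sorted(os.listdir(tmp))

print(len(preview))   # 2
print(after_preview)  # ['july.pdf', 'june.pdf', 'keep.txt']
print(after_delete)   # ['keep.txt']
```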

scandir() vs glob()

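Both os.scandir() and glob() can be used to list the contents of a directory, but they work differently. os.scandir() does no pattern matching: it returns an iterator of os.DirEntry objects for every entry in a single directory, and any filtering by name is up to the caller. glob() applies the wildcard pattern for us and can span several directory levels.

os.scandir() returns an iterator, whereas glob() builds the complete list of matches in memory. When the number of matches is large, iglob() offers the same pattern matching as glob() without building the list.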
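
The two approaches can be compared side by side. In this sketch, the pattern filter for os.scandir() is applied by hand with fnmatch.fnmatch(); the file names are hypothetical.

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.pdf"):
        open(os.path.join(tmp, name), "w").close()

    # os.scandir() yields every entry; the pattern filter is applied by hand.
    with os.scandir(tmp) as entries:
        via_scandir = sorted(
            entry.name for entry in entries
            if entry.is_file() and fnmatch.fnmatch(entry.name, "*.txt")
        )

    # glob.glob() applies the pattern itself and returns full paths.
    via_glob = sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.txt")))

print(via_scandir)              # ['a.txt', 'b.txt']
print(via_scandir == via_glob)  # True
```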

About Vishal

Founder of PYnative.com. I am a Python developer and I love to write articles to help developers.
