This article covers the ways to remove columns that contain missing (NA) values from a pandas DataFrame.
Data in a dataset can be missing or unavailable for many reasons, so we often need to clean the data before we start analyzing it.
The DataFrame.dropna() function
We can use this pandas function to remove the columns that contain Not Available (NA) values from a DataFrame.
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters:
- axis: It determines the axis to remove. Set it to 1 or 'columns' to remove columns containing missing values. By default, it removes rows with NA from the DataFrame.
- how: It takes the following inputs:
  - 'any': This is the default case. Drop the column if it has at least one missing value.
  - 'all': Drop the column only if all of its values are NA.
- thresh: It keeps a column only if it contains at least the required number of non-NA values, and drops it otherwise. It takes an int as input.
- subset: While dropping columns, it specifies the list of row labels to be considered when looking for NA.
- inplace: It specifies whether to return a new DataFrame or update the existing one. It is a boolean flag that defaults to False.
Returns:
It returns the DataFrame with the NA entries dropped, or None if inplace=True.
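To see how the how and thresh parameters described above differ when applied to columns, here is a minimal sketch. The DataFrame and its column names (a, b, c) are made up for illustration and are not part of the article's examples.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'a' is complete, 'b' has one NA, 'c' is all NA
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [np.nan, np.nan, np.nan],
})

# how='any' (default): drop every column with at least one NA
any_cols = list(df.dropna(axis="columns").columns)

# how='all': drop only the columns that are entirely NA
all_cols = list(df.dropna(axis="columns", how="all").columns)

# thresh=3: keep only the columns with at least 3 non-NA values
thresh_cols = list(df.dropna(axis="columns", thresh=3).columns)

print(any_cols)     # ['a']
print(all_cols)     # ['a', 'b']
print(thresh_cols)  # ['a']
```

Each call returns a new DataFrame, so df itself is unchanged after all three calls.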
Drop column where at least one value is missing
Sometimes we cannot process a dataset that has missing values. To drop the columns that contain NA, we can pass axis='columns' to DataFrame.dropna() so that it deletes columns instead of rows.
By default (how='any'), it removes every column where one or more values are missing.
Example:
In the example below, it drops the 'marks' column because it contains NaN.

import pandas as pd
import numpy as np
student_dict = {"name": ["Joe", "Sam", "Harry"], "age": [20, 21, 19], "marks": [85.10, np.nan, 91.54]}
# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)
# drop column with NaN
student_df = student_df.dropna(axis='columns')
print(student_df)
Output:
Before dropping column NA:
    name  age  marks
0    Joe   20  85.10
1    Sam   21    NaN
2  Harry   19  91.54

After dropping column NA:
    name  age
0    Joe   20
1    Sam   21
2  Harry   19
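Before dropping anything, it can help to check which columns would be affected. The following is a small sketch, not part of the original example, that uses DataFrame.isna() on the same student data to count missing values per column and list the columns that dropna(axis='columns') would remove. The variable names na_counts and cols_to_drop are only illustrative.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", "Harry"],
    "age": [20, 21, 19],
    "marks": [85.10, np.nan, 91.54],
})

# Number of missing values in each column
na_counts = student_df.isna().sum()
print(na_counts)

# Columns with at least one missing value, i.e. the ones
# that dropna(axis='columns') removes with the default how='any'
cols_to_drop = student_df.columns[student_df.isna().any()].tolist()
print(cols_to_drop)  # ['marks']
```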
Drop column where all values are missing
We can drop an empty column from a DataFrame using DataFrame.dropna().
We need to use the how parameter as follows:
- If how='all', it drops the column where all the values are NA.
- By default, how='any', it removes the columns where one or more values are NA.
Example
The example below drops only the 'age' column, where all values are NaN. The other columns are not dropped even though they contain NaN.
import pandas as pd
import numpy as np
student_dict = {"name": ["Joe", "Sam", np.nan, "Harry"], "age": [np.nan, np.nan, np.nan, np.nan],
"marks": [85.10, np.nan, np.nan, 91.54]}
# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)
# drop only the columns where all values are NaN
student_df = student_df.dropna(axis='columns', how='all')
print(student_df)
Output:
Before dropping column NA:
    name  age  marks
0    Joe  NaN  85.10
1    Sam  NaN    NaN
2    NaN  NaN    NaN
3  Harry  NaN  91.54

After dropping column NA:
    name  marks
0    Joe  85.10
1    Sam    NaN
2    NaN    NaN
3  Harry  91.54
Drop column with the number of NA
While cleaning the dataset, we may want to keep the columns that have at least some data available and drop the rest.
We need to use the thresh=no_of_nonNA_values parameter of DataFrame.dropna() to specify the number of non-NA values that a column must contain. Columns with fewer non-NA values are dropped.
Example
In the example below, we keep the columns where at least three values are available and drop the columns that do not meet this condition.

import pandas as pd
import numpy as np
student_dict = {"name": ["Joe", "Sam", np.nan, "Harry"], "age": [np.nan, np.nan, np.nan, np.nan],
"marks": [85.10, np.nan, np.nan, 91.54]}
# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)
# keep column with 3 or more non-NA values
student_df = student_df.dropna(axis='columns', thresh=3)
print(student_df)
Output:
Before dropping column NA:
    name  age  marks
0    Joe  NaN  85.10
1    Sam  NaN    NaN
2    NaN  NaN    NaN
3  Harry  NaN  91.54

After dropping column NA:
    name
0    Joe
1    Sam
2    NaN
3  Harry
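Since thresh takes an absolute count, one common variation is to derive it from a fraction of the row count. The sketch below assumes a made-up cutoff of 50% (min_fraction is a hypothetical name, not a pandas parameter) and reuses the same student data.

```python
import math

import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", np.nan, "Harry"],
    "age": [np.nan, np.nan, np.nan, np.nan],
    "marks": [85.10, np.nan, np.nan, 91.54],
})

# Hypothetical cutoff: keep columns that are at least 50% filled
min_fraction = 0.5

# thresh expects an int, so round the required count up
min_non_na = math.ceil(min_fraction * len(student_df))

result = student_df.dropna(axis="columns", thresh=min_non_na)
print(min_non_na)            # 2
print(list(result.columns))  # ['name', 'marks']
```

Here 'name' has three values and 'marks' has two, so both meet the required count of two, while the empty 'age' column is dropped.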
Drop NA from defined rows
Suppose we want to drop a column only if it contains null values in some particular rows. For example, we may need to drop a column if it does not have data in its initial rows.
In such a case, we can use subset=[row1, row2] of DataFrame.dropna() to specify the list of row index labels, so that it drops only the columns that have missing values in those rows, i.e., row1 and row2 in this case.
Example
Let's see how to delete a column only if it contains an empty value in row 0 or row 2. Otherwise, the column is kept.

import pandas as pd
import numpy as np
student_dict = {"name": ["Joe", "Sam", "Harry"], "age": [np.nan, np.nan, np.nan], "marks": [85.10, np.nan, 91.54]}
# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)
# drop the columns that have NaN in row 0 or row 2 (here, 'age')
student_df = student_df.dropna(axis='columns', subset=[0, 2])
print(student_df)
Output:
Before dropping column with NA:
    name  age  marks
0    Joe  NaN  85.10
1    Sam  NaN    NaN
2  Harry  NaN  91.54

After dropping column with NA:
    name  marks
0    Joe  85.10
1    Sam    NaN
2  Harry  91.54
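When axis='columns' is used, the values in subset are row index labels, so they do not have to be integers. As a rough sketch, assuming a made-up DataFrame (scores_df) indexed by student name:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a string index instead of 0, 1, 2
scores_df = pd.DataFrame(
    {
        "math": [90.0, np.nan, 75.0],
        "physics": [np.nan, 82.0, 68.0],
        "art": [88.0, 79.0, np.nan],
    },
    index=["joe", "sam", "harry"],
)

# Only the rows labelled 'joe' and 'sam' are checked for NA
result = scores_df.dropna(axis="columns", subset=["joe", "sam"])
print(list(result.columns))  # ['art']
```

The 'art' column is kept because its only missing value is in the 'harry' row, which is not part of the subset.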
Drop column with missing values in place
We can either drop columns from the existing DataFrame or create a copy of it with the columns removed. The inplace flag of DataFrame.dropna() controls this.
- If inplace=True, it updates the DataFrame and returns None.
- If inplace=False, it returns an updated copy of the DataFrame.
Example
As shown in the example below, we drop the column from the existing DataFrame without reassigning the result to a variable.
import pandas as pd
import numpy as np
student_dict = {"name": ["Joe", "Sam", "Harry"], "age": [20, 21, 19], "marks": [85.10, np.nan, 91.54]}
# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)
# drop the columns with NaN in place
student_df.dropna(axis='columns', inplace=True)
print(student_df)
Output:
Before dropping column with NA:
    name  age  marks
0    Joe   20  85.10
1    Sam   21    NaN
2  Harry   19  91.54

After dropping column with NA:
    name  age
0    Joe   20
1    Sam   21
2  Harry   19
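The difference between the two inplace settings can also be checked directly. The sketch below is an illustration on the same student data. The variable names cleaned and returned are only for demonstration.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", "Harry"],
    "age": [20, 21, 19],
    "marks": [85.10, np.nan, 91.54],
})

# inplace=False (default): the original is untouched, a new DataFrame is returned
cleaned = student_df.dropna(axis="columns")
print(list(student_df.columns))  # ['name', 'age', 'marks']
print(list(cleaned.columns))     # ['name', 'age']

# inplace=True: the original DataFrame itself is modified;
# as described above, the call is expected to return None
returned = student_df.dropna(axis="columns", inplace=True)
print(returned)
print(list(student_df.columns))  # ['name', 'age']
```

Because the in-place call does not give back a DataFrame to work with, writing student_df = student_df.dropna(..., inplace=True) would replace the DataFrame with that return value, which is a common mistake.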