
Drop columns with NA in pandas DataFrame

Updated on: March 9, 2023

This article covers the common cases for removing columns that contain missing or NA values from a pandas DataFrame.

Data in a dataset can be missing or unavailable for many reasons. It is very common to have to clean the data before we start analyzing it.

Also, See:

  • Drop columns in pandas DataFrame
  • Drop duplicates in pandas DataFrame

Table of contents

  • The DataFrame.dropna() function
  • Drop column where at least one value is missing
  • Drop column where all values are missing
  • Drop column with the number of NA
  • Drop NA from defined rows
  • Drop column with missing values in place

The DataFrame.dropna() function

We can use this pandas function to remove the columns that contain Not Available (NA) values from the DataFrame.

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:

  • axis: It determines the axis to remove. Set it to 1 or 'columns' to remove columns containing missing values. By default (0 or 'index'), it removes rows with NA from the DataFrame.
  • how: It takes the following inputs:
    'any': This is the default case. Drop the column if it has at least one value missing.
    'all': Drop the column only if all of its values are NA.
  • thresh: It takes an int, the minimum number of non-NA values a column must contain to be kept. Columns with fewer non-NA values are dropped. Recent pandas versions do not allow thresh and how to be passed together.
  • subset: While dropping columns, it specifies the list of row labels to check for NA. Rows outside this list are ignored.
  • inplace: It specifies whether to update the existing DataFrame or return a new one. It is a boolean flag with default False.

Returns:

It returns the DataFrame with the NA entries dropped, or None if inplace=True.
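
As a quick illustration of the axis parameter described above, the following sketch compares the default row-wise behavior with axis='columns'. The small DataFrame df and its column names are made up for this illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only column "b" has a missing value (in row 1)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, np.nan, 6.0]})

# Default axis=0: rows that contain NA are removed
rows_dropped = df.dropna()

# axis='columns' (or axis=1): columns that contain NA are removed
cols_dropped = df.dropna(axis="columns")

print(rows_dropped.shape)  # (2, 2)
print(cols_dropped.shape)  # (3, 1)
```

Both calls return a new DataFrame and leave df itself unchanged, because inplace defaults to False.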

Drop column where at least one value is missing

Sometimes we cannot process a dataset that has missing values. To drop the columns that contain NA, we can pass axis='columns' to DataFrame.dropna().

By default (how='any'), it removes every column in which one or more values are missing.

Example:

In the example below, it drops the 'marks' column because it contains NaN.

import pandas as pd
import numpy as np

student_dict = {"name": ["Joe", "Sam", "Harry"], "age": [20, 21, 19], "marks": [85.10, np.nan, 91.54]}

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print("Before dropping column NA:")
print(student_df)

# Drop every column that contains at least one NaN
student_df = student_df.dropna(axis='columns')

print("\nAfter dropping column NA:")
print(student_df)

Output:

Before dropping column NA:
    name  age  marks
0    Joe   20  85.10
1    Sam   21    NaN
2  Harry   19  91.54

After dropping column NA:
    name  age
0    Joe   20
1    Sam   21
2  Harry   19
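
Before dropping anything, it can help to check which columns would be affected. The sketch below is one possible way to do that with isna() and any(); the variable names has_na and cols_with_na are chosen here for illustration.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", "Harry"],
    "age": [20, 21, 19],
    "marks": [85.10, np.nan, 91.54],
})

# Boolean Series indexed by column name: True where the column has at least one NA
has_na = student_df.isna().any()

# Names of the columns that dropna(axis='columns') would remove
cols_with_na = student_df.columns[has_na].tolist()
print(cols_with_na)  # ['marks']
```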

Drop column where all values are missing

We can drop an empty column from a DataFrame using DataFrame.dropna().

We need to use the how parameter as follows:

  • If how='all', it drops the column where all the values are NA.
  • By default (how='any'), it removes the columns where one or more values are NA.

Example

The example below shows that it drops only the 'age' column, where all values are NaN. Other columns are not dropped even if they contain NaN.

import pandas as pd
import numpy as np

student_dict = {"name": ["Joe", "Sam", np.nan, "Harry"], "age": [np.nan, np.nan, np.nan, np.nan],
                "marks": [85.10, np.nan, np.nan, 91.54]}

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print("Before dropping column NA:")
print(student_df)

# Drop only the columns in which every value is NaN
student_df = student_df.dropna(axis='columns', how='all')

print("\nAfter dropping column NA:")
print(student_df)

Output:

Before dropping column NA:
    name  age  marks
0    Joe  NaN  85.10
1    Sam  NaN    NaN
2    NaN  NaN    NaN
3  Harry  NaN  91.54

After dropping column NA:
    name  marks
0    Joe  85.10
1    Sam    NaN
2    NaN    NaN
3  Harry  91.54
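
After the empty 'age' column is removed in the example above, row 2 still holds no data. A minimal sketch, assuming we also want to discard such fully empty rows, chains a second dropna() call along the row axis. The variable name cleaned is only illustrative.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", np.nan, "Harry"],
    "age": [np.nan, np.nan, np.nan, np.nan],
    "marks": [85.10, np.nan, np.nan, 91.54],
})

# First drop the columns that are entirely NA, then the rows that are entirely NA
cleaned = (
    student_df
    .dropna(axis="columns", how="all")
    .dropna(axis="index", how="all")
)

print(cleaned)
```

Here the row for 'Sam' is kept because it still has a name, even though its marks value is missing.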

Drop column with the number of NA

While cleaning the dataset, we may want to keep only the columns that have at least a certain amount of data and drop the rest.

We need to use the parameter thresh=no_of_nonNA_values of DataFrame.dropna() to specify the minimum number of non-NA values a column must have. Columns with fewer non-NA values are dropped.

Example

In the example below, we keep the columns where three or more values are available and drop the columns that do not meet this condition.

import pandas as pd
import numpy as np

student_dict = {"name": ["Joe", "Sam", np.nan, "Harry"], "age": [np.nan, np.nan, np.nan, np.nan],
                "marks": [85.10, np.nan, np.nan, 91.54]}

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print("Before dropping column NA:")
print(student_df)

# Keep only the columns with 3 or more non-NA values
student_df = student_df.dropna(axis='columns', thresh=3)

print("\nAfter dropping column NA:")
print(student_df)

Output:

Before dropping column NA:
    name  age  marks
0    Joe  NaN  85.10
1    Sam  NaN    NaN
2    NaN  NaN    NaN
3  Harry  NaN  91.54

After dropping column NA:
    name
0    Joe
1    Sam
2    NaN
3  Harry
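
Since thresh takes an absolute count, a threshold expressed as a share of the rows has to be converted first. The sketch below shows one way to do that. The 50% cutoff and the names min_fraction and min_count are assumptions made for this illustration, not part of the pandas API.

```python
import math

import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", np.nan, "Harry"],
    "age": [np.nan, np.nan, np.nan, np.nan],
    "marks": [85.10, np.nan, np.nan, 91.54],
})

# Hypothetical rule: keep a column only if at least half of its values are present
min_fraction = 0.5
min_count = math.ceil(min_fraction * len(student_df))  # 2 for 4 rows

result = student_df.dropna(axis="columns", thresh=min_count)
print(result.columns.tolist())  # ['name', 'marks']
```

With this rule, 'marks' survives because two of its four values are available, while the empty 'age' column is dropped.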

Drop NA from defined rows

Suppose we are interested in dropping the column only if it contains null values in some particular rows. For example, consider when we need to drop a column if it does not have data in its initial rows.

In such a case, we can use subset=[row1, row2] of DataFrame.dropna() to specify the list of row labels to check. It then drops only the columns that have missing values in these rows, i.e., row1 and row2 in this case.

Example

Let’s see how to delete a column only if it contains a missing value in row 0 or row 2. Otherwise, the column is kept.

import pandas as pd
import numpy as np

student_dict = {"name": ["Joe", "Sam", "Harry"], "age": [np.nan, np.nan, np.nan], "marks": [85.10, np.nan, 91.54]}

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print("Before dropping column with NA:")
print(student_df)

# Drop the columns that have NaN in row 0 or row 2 (only 'age' here);
# 'marks' is kept because its NaN is in row 1, which is not checked
student_df = student_df.dropna(axis='columns', subset=[0, 2])

print("\nAfter dropping column with NA:")
print(student_df)

Output:

Before dropping column with NA:
    name  age  marks
0    Joe  NaN  85.10
1    Sam  NaN    NaN
2  Harry  NaN  91.54

After dropping column with NA:
    name  marks
0    Joe  85.10
1    Sam    NaN
2  Harry  91.54
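
When axis='columns' is used, the values passed to subset are matched against the row labels of the index, not against row positions. The following sketch assumes a DataFrame whose index holds the student names (a setup invented for this illustration) to show the same idea with string labels.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data above, indexed by student name
student_df = pd.DataFrame(
    {"age": [np.nan, np.nan, np.nan], "marks": [85.10, np.nan, 91.54]},
    index=["Joe", "Sam", "Harry"],
)

# Check only the rows labeled 'Joe' and 'Harry' for missing values
result = student_df.dropna(axis="columns", subset=["Joe", "Harry"])
print(result.columns.tolist())  # ['marks']
```

The 'marks' column is kept because its only NaN is in the 'Sam' row, which is not part of the subset.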

Drop column with missing values in place

We can drop columns either directly in the existing DataFrame or in a new copy of it. The inplace flag of DataFrame.dropna() controls this.

  • If inplace=True, it updates the existing DataFrame and returns None.
  • If inplace=False (the default), it returns an updated copy and leaves the original DataFrame unchanged.

Example

As shown in the example below, we drop the column from the existing DataFrame in place, without reassigning the result to a variable.

import pandas as pd
import numpy as np

student_dict = {"name": ["Joe", "Sam", "Harry"], "age": [20, 21, 19], "marks": [85.10, np.nan, 91.54]}

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print("Before dropping column NA:")
print(student_df)

# Drop the 'marks' column (it contains NaN) directly in student_df
student_df.dropna(axis='columns', inplace=True)

print("\nAfter dropping column NA:")
print(student_df)

Output:

Before dropping column NA:
    name  age  marks
0    Joe   20  85.10
1    Sam   21    NaN
2  Harry   19  91.54

After dropping column NA:
    name  age
0    Joe   20
1    Sam   21
2  Harry   19
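
A common pitfall with inplace=True is assigning the result back to a variable, which stores None instead of a DataFrame. The short sketch below contrasts the two modes. The names new_df, original_cols, and ret are illustrative only.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "name": ["Joe", "Sam", "Harry"],
    "age": [20, 21, 19],
    "marks": [85.10, np.nan, 91.54],
})

# inplace=False (default): a new DataFrame is returned, the original is untouched
new_df = student_df.dropna(axis="columns")
original_cols = student_df.columns.tolist()  # still includes 'marks'

# inplace=True: student_df itself is modified and the call returns None
ret = student_df.dropna(axis="columns", inplace=True)

print(ret)                          # None
print(student_df.columns.tolist())  # ['name', 'age']
```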

Filed Under: Pandas, Python


About Vishal

Founder of PYnative.com I am a Python developer and I love to write articles to help developers. Follow me on Twitter. All the best for your future Python endeavors!
