PYnative

Python Programming

  • Learn Python
  • Exercises
  • Quizzes
  • Code Editor
  • Tricks
Home » Python » Pandas » Set index in pandas DataFrame

Set index in pandas DataFrame

Updated on: March 9, 2023 | Leave a Comment

In this article, we learn how to set the index of the Pandas DataFrame using existing columns or a list of labels. We cover all the cases of replacing the existing row labels or assign new ones.

DataFrame is the tabular structure in the Python pandas library. It represents each row and column by the label. Row label is called an index, whereas column label is called column index/header.

By default, while creating DataFrame, Python pandas assign a range of numbers (starting at 0) as a row index. Row indexes are used to identify each row. We can set a new row index or replace the existing ones using DataFrame.set_index() function, which we discuss further in more detail.

Table of contents

  • The DataFrame.set_index() function
  • Set index using a column
  • Set index using a list
  • Set index using multiple columns
    • Set multi-index using a list and column
    • Set multi-index using two Python series
  • Set index using a Python range
  • Set index but keep column
  • Set index by keeping old index
  • Set index in place
  • Set index using a column with duplicates
  • Set index by column number

The DataFrame.set_index() function

This function is used to re-assign a row label using the existing column of the DataFrame. It can assign one or multiple columns as a row index. Let’s see how to use DataFrame.set_index() function to set row index or replace existing.

Syntax

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Parameters

  1. keys: It takes a single or list of column labels to set as an index. It also takes a list of new labels as input.
  2. drop: It is a flag to specify if columns to be used as the new index should be deleted From DataFrame or not.
    By default value is True, i.e., assign the column as an index and delete it.
  3. append: It is a flag to specify whether to append columns to the existing index. By default, it is False, i.e., it replaces the current index rather than appending.
  4. inplace: It is used to specify whether to return a new DataFrame or update an existing one. It is a boolean flag with default False.
  5. verify_integrity: It is a boolean flag,
    – If True, then it checks the new index for duplicates and throws ValueError.
    – If False, then it defers the check until necessary.
    Note: Setting to False will improve the performance of this method.

Return

DataFrame with the changed row labels or None if inplace=True.

Set index using a column

How to set index in pandas DataFrame

  1. Create pandas DataFrame

    We can create a DataFrame from a CSV file or dict.

  2. Identify the columns to set as index

    We can set a specific column or multiple columns as an index in pandas DataFrame. Create a list of column labels to be used to set an index.
    ['col_label1', 'col_label2'...]

  3. Use DataFrame.set_index() function

    We need to pass the column or list of column labels as input to the DataFrame.set_index() function to set it as an index of DataFrame. By default, these new index columns are deleted from the DataFrame.
    df = df.set_index(['col_label1', 'col_label2'…])

  4. Set the index in place

    We can use the parameter inplace to set the index in the existing DataFrame rather than create a new copy.
    df.set_index(inplace=True)

Example

Let’s see how we can set a specific column as an index in the DataFrame. In the below example, we have default index as a range of numbers replaced with set index using first column ‘Name’ of the student DataFrame.

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set index using column
student_df = student_df.set_index('Name')
print(student_df)

Output

Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
        Age  Marks
 Name             
 Joe     20  85.10
 Nat     21  77.80
 Harry   19  91.54

Set index using a list

As we have seen, we can pass column labels of the DataFrame to assign it as an index of the DataFrame. We can also give a list of labels which can be strings or numbers to DataFrame.set_index() function to set a new index in the DataFrame.

First, we create a Python Index object from a list of labels and then pass it as input to the DataFrame.set_index() function.

Example

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

index = pd.Index(['s1', 's2', 's3'])
student_df = student_df.set_index(index)
print(student_df)

Output

Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
      Name  Age  Marks
 s1    Joe   20  85.10
 s2    Nat   21  77.80
 s3  Harry   19  91.54

Set index using multiple columns

Python pandas have DataFrame with multiple columns or rows as an index, and they are also called multi-index DataFrame. If we want to set multiple columns as row labels, we can use DataFrame.set_index() function.

Example

In the below example, we pass a list of existing column labels ‘Name’ and ‘Marks’ to set a multi-level index in the student DataFrame.

Note: It throws KeyError for unknown column labels.

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set multi-index
student_df = student_df.set_index(['Name', 'Marks'])
print(student_df)

Output

 Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
              Age
 Name  Marks     
 Joe   85.10   20
 Nat   77.80   21
 Harry 91.54   19

Set multi-index using a list and column

If there is a case where we want to create a two-level row index of the DataFrame, where one level is the new list of labels and another level is created from the existing column.

We can use DataFrame.set_index() to set the multi-level index of pandas DataFrame using a combination of a new list and the existing column.

We need to create a Python Index object from a list of new labels and pass that Index object and an existing column label as input to the DataFrame.set_index() function to create a two-level index.

Example

Here, we are passing two parameters to the DataFrame.set_index() function. The first parameter is the Python Index created using multiple strings of size matches to the length of DataFrame. The second parameter is the existing column label ‘Name’ of student DataFrame.

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

index = pd.Index(['s1', 's2', 's3'])
student_df = student_df.set_index([index, 'Name'])
print(student_df)

Output

Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
            Age  Marks
    Name             
 s1 Joe     20  85.10
 s2 Nat     21  77.80
 s3 Harry   19  91.54

Set multi-index using two Python series

When we want to replace the existing index with the multiple new series rather than the existing columns, we can create such a multi-index DataFrame by assigning new series using DataFrame.set_index() function.

Example

Let’s see how we can pass two Python series of numbers as a first and second-level index of the DataFrame.

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set multi-index
s = pd.Series([1, 2, 3])
student_df = student_df.set_index([s, s ** 2])
print(student_df)

Output

Before set index:
    Name  Age  Marks
0    Joe   20  85.10
1    Nat   21  77.80
2  Harry   19  91.54

After set index:
      Name  Age  Marks
1 1    Joe   20  85.10
2 4    Nat   21  77.80
3 9  Harry   19  91.54

Set index using a Python range

Suppose we need to set a sequence of numbers as an index of the DataFrame such that it should start at any number. For example, we want to assign a roll number to the student DataFrame beginning from 1.

It is not feasible to pass all the numbers as a list to the DataFrame.set_index() function. In such a case, we can use the Python range() function.

We can create pandas Index using range() function and pass it to the DataFrame.set_index() function.

Example

Let’s see how we can use Python range() function with DataFrame.set_index() to assign a sequential index to the DataFrame.

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict, index=['s1', 's2', 's3'])
print(student_df)

# set index
index = pd.Index(range(1, 4, 1))
student_df = student_df.set_index(index)
print(student_df)

Output

Before set index: 
      Name  Age  Marks
 s1    Joe   20  85.10
 s2    Nat   21  77.80
 s3  Harry   19  91.54

After set index:
     Name  Age  Marks
 1    Joe   20  85.10
 2    Nat   21  77.80
 3  Harry   19  91.54

Set index but keep column

By default, DataFrame.set_index() function takes column name as input which should be used as an index of the DataFrame. After setting the new index, it deletes the column which is used.

If we do not want to delete such a column from DataFrame, then we need to use the drop parameter of DataFrame.set_index(). It is a boolean flag such that,

  • If drop=True (default case), it deletes the column and uses it as an index.
  • If drop=False, it does not delete the column and uses it as an index.

Example

In the below example, we use the ‘Name’ column as an index of the student DataFrame without deleting it.

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set index, keep column
student_df = student_df.set_index('Name', drop=False)
print(student_df)

Output

Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
         Name  Age  Marks
 Name                    
 Joe      Joe   20  85.10
 Nat      Nat   21  77.80
 Harry  Harry   19  91.54

Set index by keeping old index

DataFrame.set_index() is used to set a new index to the DataFrame. It is also used to extend the existing DataFrame, i.e., we can update the index by append to the existing index.

We need to use the append parameter of the DataFrame.set_index() function to append the new index to the existing one. By default, the value of append is False.

Example

In the below example, each row has an index like s1, s2, and so on, but we update the index by appending values in the ‘Name’ column.

import pandas as pd

student_dict = {'Name':['Joe','Nat','Harry'], 'Age':[20,21,19], 'Marks':[85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict, index = ['s1','s2','s3'])
print(student_df)

# set index by append
student_df = student_df.set_index('Name', append=True)
print(student_df)

Output

Before set index: 
Name  Age  Marks
 s1    Joe   20  85.10
 s2    Nat   21  77.80
 s3  Harry   19  91.54

After set index:
           Age  Marks
    Name             
 s1 Joe     20  85.10
 s2 Nat     21  77.80
 s3 Harry   19  91.54

Set index in place

In the above examples, whenever we executed DataFrame.set_index() operation, pandas created a new copy of DataFrame because the modification is not-in place.

Specify inplace=True to set index in the existing DataFrame rather than creating a copy of it.

  • If inplace=True then it updates the existing DataFrame and does not return anything.
  • If inplace=False then it creates a new DataFrame with updated changes and returns it.

Note: You don’t need to assign the result back to a variable as we are performing modifications in place.

Example

import pandas as pd

student_dict = {'Name': ['Joe', 'Nat', 'Harry'], 'Age': [20, 21, 19], 'Marks': [85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set index in place
student_df.set_index('Name', inplace=True)
print(student_df)

Output

Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
        Age  Marks
 Name             
 Joe     20  85.10
 Nat     21  77.80
 Harry   19  91.54

Set index using a column with duplicates

As we have discussed, we need to pass a column name that needs to be used to set row index in the DataFrame.

But, the column may contain duplicate values. By default, DataFrame.set_index() allows duplicate index. If we want to change this behavior, then we can use the verify_integrity parameter of DataFrame.set_index().

  • If verify_integrity=True, then it checks the new index for duplicates and throws ValueError.
  • If verify_integrity=False, then it defers the check until necessary.

Note: Setting to False will improve the performance of this method.

Example

In the below example, we set verify_integrity=True and use the ‘Name’ column to set an index that contains duplicate values.

import pandas as pd

student_dict = {'Name':['Joe','Nat','Joe'], 'Age':[20,21,19], 'Marks':[85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set index error case
student_df = student_df.set_index('Name', verify_integrity=True)
print(student_df)

Output

ValueError: Index has duplicate keys: Index(['Joe'], dtype='object', name='Name')

Set index by column number

If we need to set single or multiple columns as an index of the DataFrame, but we do not know the column labels to pass to DataFrame.set_index() function. In such a case, we can use the columns parameter of the DataFrame to retrieve the column index position.

We need to create a list of columns using column position df.columns[[0,1]] and pass it to the DataFrame.set_index() function.

Example

In the below example, we set column 0 and 2 as an index of the student DataFrame.

import pandas as pd

student_dict = {'Name':['Joe','Nat','Harry'], 'Age':[20,21,19], 'Marks':[85.10, 77.80, 91.54]}

# create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print(student_df)

# set index
cols = list(student_df.columns[[0,2]])
student_df = student_df.set_index(cols)
print(student_df)

Output

Before set index:
     Name  Age  Marks
 0    Joe   20  85.10
 1    Nat   21  77.80
 2  Harry   19  91.54

After set index:
              Age
 Name  Marks     
 Joe   85.10   20
 Nat   77.80   21
 Harry 91.54   19

Filed Under: Pandas, Python

Did you find this page helpful? Let others know about it. Sharing helps me continue to create free Python resources.

TweetF  sharein  shareP  Pin

About Vishal

Founder of PYnative.com I am a Python developer and I love to write articles to help developers. Follow me on Twitter. All the best for your future Python endeavors!

Related Tutorial Topics:

Pandas Python

Python Exercises and Quizzes

Free coding exercises and quizzes cover Python basics, data structure, data analytics, and more.

  • 15+ Topic-specific Exercises and Quizzes
  • Each Exercise contains 10 questions
  • Each Quiz contains 12-15 MCQ
Exercises
Quizzes

Leave a Reply Cancel reply

your email address will NOT be published. all comments are moderated according to our comment policy.

Use <pre> tag for posting code. E.g. <pre> Your entire code </pre>

Posted In

Pandas Python
TweetF  sharein  shareP  Pin

  Python Pandas

  • Pandas DataFrame
  • Pandas DataFrame from Dict
  • Pandas DataFrame from List
  • Pandas DataFrame head() and tail()
  • Pandas Drop Columns
  • Pandas Drop Duplicates
  • Pandas Drop Columns with NA
  • Pandas Rename columns
  • DataFrame to Python dictionary
  • Pandas Set Index
  • Pandas ReSet Index

About PYnative

PYnative.com is for Python lovers. Here, You can get Tutorials, Exercises, and Quizzes to practice and improve your Python skills.

Explore Python

  • Learn Python
  • Python Basics
  • Python Databases
  • Python Exercises
  • Python Quizzes
  • Online Python Code Editor
  • Python Tricks

Follow Us

To get New Python Tutorials, Exercises, and Quizzes

  • Twitter
  • Facebook
  • Sitemap

Legal Stuff

  • About Us
  • Contact Us

We use cookies to improve your experience. While using PYnative, you agree to have read and accepted our Terms Of Use, Cookie Policy, and Privacy Policy.

Copyright © 2018–2023 pynative.com