In this lesson, you will learn how to convert a Python list to a pandas DataFrame. It covers creating a DataFrame from different kinds of lists: a single list, multiple lists, and nested lists. It also shows how a list can be added to the DataFrame as a row or as a column.
A list is a simple Python data structure that stores an ordered sequence of values. A list can hold heterogeneous elements, i.e., values of different types. To analyze such a list, we can convert it into a pandas DataFrame. Converting the list into a 2-dimensional, labeled structure makes it more convenient to process.
A DataFrame can be created from a list using the DataFrame constructor. This article discusses each of these cases in detail.
Table of contents
- Create DataFrame from list using constructor
- Create DataFrame from list with a customized column name
- Create DataFrame from list with a customized index
- Create DataFrame from list by changing data type
- Create DataFrame from hierarchical lists as rows
- Create DataFrame from hierarchical lists as columns
- Create DataFrame from multiple lists
Create DataFrame from list using constructor
The DataFrame constructor can create a DataFrame from different Python data structures such as dict, list, tuple, and ndarray. A set is not accepted, because a set is unordered.
In the example below, we create a DataFrame object from a list of heterogeneous data. By default, each list element becomes one row in a single column of the DataFrame, and the row index is a range of integers starting at 0.
Example
import pandas as pd
# Create list
fruits_list = ['Apple', 10, 'Orange', 55.50]
print(fruits_list)
# Create DataFrame from list
fruits_df = pd.DataFrame(fruits_list)
print(fruits_df)
Output:
['Apple', 10, 'Orange', 55.5]
        0
0   Apple
1      10
2  Orange
3    55.5
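As a quick check on the result above, the following minimal sketch inspects the shape and the column type of the DataFrame. It reuses the same fruits_list as the example. Because the list mixes strings and numbers, pandas stores the column with the generic object dtype.

```python
import pandas as pd

fruits_list = ['Apple', 10, 'Orange', 55.50]
fruits_df = pd.DataFrame(fruits_list)

# Each list element becomes one row of a single column labelled 0
print(fruits_df.shape)           # (4, 1)
print(list(fruits_df.columns))   # [0]

# Mixed strings and numbers are stored with the generic object dtype
print(fruits_df[0].dtype)        # object
```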
Create DataFrame from list with a customized column name
While creating a DataFrame from a list, we can give the resulting DataFrame a customized column label. By default, the column labels are a range of integers, i.e., 0, 1, 2…n.
We can specify the column labels with the columns=[col_labels] parameter of the DataFrame constructor.
Example
In the example below, we create a DataFrame from a list of fruit names and provide the column label “Fruits”.
import pandas as pd
# Create list
fruits_list = ['Apple', 'Banana', 'Orange','Mango']
print(fruits_list)
# Create DataFrame from list
fruits_df = pd.DataFrame(fruits_list, columns=['Fruits'])
print(fruits_df)
Output:
['Apple', 'Banana', 'Orange', 'Mango']
   Fruits
0   Apple
1  Banana
2  Orange
3   Mango
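Once a column has a label, it can be selected by that label. The sketch below also illustrates, as an aside not covered in the example above, that the number of labels has to match the number of columns in the data. The extra 'Price' label is a hypothetical name used only to trigger the error.

```python
import pandas as pd

fruits_list = ['Apple', 'Banana', 'Orange', 'Mango']
fruits_df = pd.DataFrame(fruits_list, columns=['Fruits'])

# Select the column by its label; the result is a pandas Series
fruits = fruits_df['Fruits']
print(fruits.tolist())   # ['Apple', 'Banana', 'Orange', 'Mango']

# A flat list yields one column, so two labels do not fit
try:
    pd.DataFrame(fruits_list, columns=['Fruits', 'Price'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```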
Create DataFrame from list with a customized index
Just as we changed the column label, we can also customize the row index. A meaningful row index identifies each row uniquely and makes it easier to access rows by their index label.
We can specify the row index with the index=[row_index1, row_index2] parameter of the DataFrame constructor. By default, the row index is a range of integers, i.e., 0, 1, 2…n.
Example
Let’s see how we can provide the custom row index while creating DataFrame from the List.
import pandas as pd
# Create list
fruits_list = ['Apple', 'Banana', 'Orange','Mango']
print(fruits_list)
# Create DataFrame from list
fruits_df = pd.DataFrame(fruits_list, index=['Fruit1', 'Fruit2', 'Fruit3', 'Fruit4'])
print(fruits_df)
Output:
['Apple', 'Banana', 'Orange', 'Mango']
             0
Fruit1   Apple
Fruit2  Banana
Fruit3  Orange
Fruit4   Mango
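To illustrate how a custom index makes rows easier to access, here is a minimal sketch that uses the .loc indexer with the labels from the example. The column label 0 is the default that pandas assigns when no column name is given.

```python
import pandas as pd

fruits_list = ['Apple', 'Banana', 'Orange', 'Mango']
fruits_df = pd.DataFrame(fruits_list, index=['Fruit1', 'Fruit2', 'Fruit3', 'Fruit4'])

# .loc selects by label: row 'Fruit2', default column 0
second_fruit = fruits_df.loc['Fruit2', 0]
print(second_fruit)   # Banana

# A list of row labels returns a smaller DataFrame
subset = fruits_df.loc[['Fruit1', 'Fruit4']]
print(subset)
```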
Create DataFrame from list by changing data type
While converting a Python list to a DataFrame, we may need to change the data type of the values.
We can change the data type of the list elements using the dtype parameter of the DataFrame constructor.
Example
Suppose we have a list of fruit prices stored as strings, which pandas reports as type object. While creating the DataFrame, we want the values to be float64. In such a case, we use the dtype parameter as shown in the example below.
import pandas as pd
# Create list
price_list = ['50', '100', '60', '20']
print(price_list)
# Create DataFrame from list
price_df = pd.DataFrame(price_list)
print("Data type before : ", price_df.dtypes)
# Create DataFrame from list with type change
price_df = pd.DataFrame(price_list, dtype='float64')
print("Data type after : ", price_df.dtypes)
print(price_df)
Output:
['50', '100', '60', '20']
Data type before :  0    object
dtype: object
Data type after :  0    float64
dtype: object
       0
0   50.0
1  100.0
2   60.0
3   20.0
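The dtype parameter is applied when the DataFrame is built. As an alternative that the example above does not use, the following sketch converts the column after construction with astype(), and shows pd.to_numeric() for data that may contain unparseable entries. The column name 'Price' and the 'n/a' value are assumptions made for illustration.

```python
import pandas as pd

price_list = ['50', '100', '60', '20']
price_df = pd.DataFrame(price_list, columns=['Price'])

# Convert the string column to float64 after the DataFrame exists
price_df['Price'] = price_df['Price'].astype('float64')
print(price_df['Price'].dtype)   # float64

# errors='coerce' turns values that cannot be parsed into NaN
mixed = pd.Series(['50', 'n/a', '60'])   # hypothetical messy input
converted = pd.to_numeric(mixed, errors='coerce')
print(converted.isna().tolist())   # [False, True, False]
```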
Create DataFrame from hierarchical lists as rows
Data may be spread across multiple lists or stored in a list of lists, also called a nested or multi-dimensional list. In such a case, we can pass the list of lists to the DataFrame constructor to convert it into a DataFrame. By default, each inner list becomes a row in the resulting DataFrame.
Example
In the example below, we have a list that contains a list of fruit names and a list of their prices. The DataFrame constructor adds each inner list as a separate row in the resulting DataFrame.
import pandas as pd
# Create list
fruits_list = [['Apple', 'Banana', 'Orange', 'Mango'],[120, 40, 80, 500]]
print(fruits_list)
# Create DataFrame from list
fruits_df = pd.DataFrame(fruits_list)
print(fruits_df)
Output:
[['Apple', 'Banana', 'Orange', 'Mango'], [120, 40, 80, 500]]
       0       1       2      3
0  Apple  Banana  Orange  Mango
1    120      40      80    500
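The row-wise default fits naturally when each inner list is already one record. The sketch below assumes the data is arranged as [name, price] pairs, which is a different layout from the example above. The variable name fruit_rows and the column labels are chosen only for illustration.

```python
import pandas as pd

# Each inner list is one record: [name, price]
fruit_rows = [['Apple', 120], ['Banana', 40], ['Orange', 80], ['Mango', 500]]

# Every record becomes a row; columns= labels the two fields
fruits_df = pd.DataFrame(fruit_rows, columns=['Name', 'Price'])
print(fruits_df)
print(fruits_df.shape)   # (4, 2)
```

With this layout, each column holds values of a single type, so no transposition is needed.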
Create DataFrame from hierarchical lists as columns
As in the previous section, we have a multi-dimensional list, but this time we do not want each inner list to be added to the DataFrame as a row. Instead, we want each list to become a separate column. For that, we use the transpose() function.
In the example below, we have a list of two lists, one for the fruit names and another for the fruit prices, and we want to add each list as a separate column in the DataFrame.
import pandas as pd
# Create list
fruits_list = [['Apple', 'Banana', 'Orange', 'Mango'],[120, 40, 80, 500]]
print(fruits_list)
# Create DataFrame from list
fruits_df = pd.DataFrame(fruits_list).transpose()
print(fruits_df)
Output:
[['Apple', 'Banana', 'Orange', 'Mango'], [120, 40, 80, 500]]
        0    1
0   Apple  120
1  Banana   40
2  Orange   80
3   Mango  500
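One side effect of transposing is worth a quick check. Before the transpose, every column mixes a string with a number, so the transposed price column is still stored as object. The sketch below, which assumes the column names 'Name' and 'Price' for readability, uses infer_objects() to let pandas choose a more specific dtype.

```python
import pandas as pd

fruits_list = [['Apple', 'Banana', 'Orange', 'Mango'], [120, 40, 80, 500]]

fruits_df = pd.DataFrame(fruits_list).transpose()
fruits_df.columns = ['Name', 'Price']   # assumed labels for illustration

# The prices are Python ints held in an object column
dtype_before = fruits_df['Price'].dtype
print(dtype_before)   # object

# infer_objects() picks a better dtype for object columns where it can
fruits_df = fruits_df.infer_objects()
dtype_after = fruits_df['Price'].dtype
print(dtype_after)    # int64
```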
Create DataFrame from multiple lists
A common use case is having multiple separate lists that need to be added as different columns in the DataFrame. This can be done in the following two ways:
- using zip(list1, list2...)
- using dict { 'col1' : list1, 'col2' : list2}
Example
The example below demonstrates the use of the zip() function to combine multiple lists into one list and pass it to the DataFrame constructor.
import pandas as pd
# Create multiple lists
fruits_list = ['Apple', 'Banana', 'Orange', 'Mango']
price_list = [120, 40, 80, 500]
# Create DataFrame
fruits_df = pd.DataFrame(list(zip(fruits_list, price_list )), columns = ['Name', 'Price'])
print(fruits_df)
Output:
     Name  Price
0   Apple    120
1  Banana     40
2  Orange     80
3   Mango    500
The example below demonstrates the use of a Python dictionary for the same purpose. Here, the column names are the keys of the dict, and the lists to be added to the DataFrame are its values.
import pandas as pd
# Create multiple lists
fruits_list = ['Apple', 'Banana', 'Orange', 'Mango']
price_list = [120, 40, 80, 500]
# Create dict
fruits_dict = {'Name': fruits_list,
'Price': price_list}
print(fruits_dict)
# Create DataFrame from dict
fruits_df = pd.DataFrame(fruits_dict)
print(fruits_df)
Output:
{'Name': ['Apple', 'Banana', 'Orange', 'Mango'], 'Price': [120, 40, 80, 500]}
     Name  Price
0   Apple    120
1  Banana     40
2  Orange     80
3   Mango    500
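The two approaches behave differently when the lists do not have the same length, which may matter when choosing between them. The sketch below uses a hypothetical short_price_list with one price missing. zip() stops at the shortest list, while the dict form raises a ValueError.

```python
import pandas as pd

fruits_list = ['Apple', 'Banana', 'Orange', 'Mango']
short_price_list = [120, 40, 80]   # hypothetical data with one price missing

# zip() stops at the shortest list, so 'Mango' is silently dropped
zipped_df = pd.DataFrame(list(zip(fruits_list, short_price_list)),
                         columns=['Name', 'Price'])
print(len(zipped_df))   # 3

# The dict form refuses lists of different lengths
try:
    pd.DataFrame({'Name': fruits_list, 'Price': short_price_list})
    dict_failed = False
except ValueError as err:
    dict_failed = True
    print("ValueError:", err)
```

If silent truncation is a concern, comparing the list lengths before calling zip() is a simple safeguard.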