How to Make a Column Index in Pandas Dataframe – with Examples

In this short Pandas tutorial, you will learn how to make column index in a dataframe. Standardly, when creating a dataframe, whether from a dictionary or by reading a file (e.g., reading a CSV file, or opening an Excel file) an index column is created. For this reason, we need to either set the specific column we want to be index when creating the file or make one of the columns index later (e.g., after we’ve read a CSV file). Namely, if we want a specific column to be the index in our dataframe. 

Table of Contents

Outline of this Tutorial

For this purpose, the standard functionality of Pandas, this tutorial will teach you how to set which column you want to make index. First of all, you will learn the general syntax of the set_index() method and get the answer to the question, “How do I create an index in Pandas??” followed by the general syntax of the set_index() method. In the following two sections, you will learn how to make a specific column index when reading data from 1) a CSV file and 2) an Excel file. 

Following this, the next two sections will get into more details on how to use set_index() to make 1) a column in the dataframe index, and 2) how to create multiple indexes (MultiIndex) from two columns. First, of all, you need to know the column names (or get column names of the Pandas dataframe) to set a column index.

Make a Column index: Syntax of the set_index() method

Here is how to make a specific column index in Pandas dataframe:

your_df.set_index('Column_to_make_index')Code language: Python (python)

In the code example above, ‘Column_to_make_index’ is the name of the column you want to make the index. As often, when working with Pandas, this will not make the column you chose permanently the dataframes index (to make it permanent you should use the inplace parameter).

Column made as index in Pandas dataframe
  • Save

One neat thing to remember is that set_index() can take multiple columns as the first argument. Here is how to set multiple columns index in the dataframe:

your_df.set_index(['Col1', 'Col2'])Code language: Python (python)

As you may have understood now, Pandas set_index()method can take a string, list, series, or dataframe to make index of your dataframe. Have a look at the documentation for more information.

As you will learn later, index column can also be set while creating or importing data. This, among other things, is what you will learn later in this post.

How do I create an index in Pandas??

To create an index, from a column, in Pandas dataframe you use the set_index() method. For example, if you want the column “Year” to be index you type df.set_index(“Year”)<. Now, the set_index()method will return the modified dataframe as a result. Therefore, you should use the inplace parameter to make the change permanent. The next two sections will teach you to make a column index while importing data.

Make a Column Index when Reading a CSV file with Pandas

To make a specific column index when reading a csv file, we use the index_col parameter. First, let us have a look if we import data from the .csv file with Pandas read_csv method:

import pandas as pd

csv_url = https://raw.githubusercontent.com/marsja/jupyter/master/SimData/FifthDayData.csv'
df = pd.read_csv(csv_url)Code language: Python (python)
Dataframe without setting column as index Pandas
  • Save

The index column was created when importing the data from the file. In this situation, if we want to make a specific column in the CSV file a column, we can change the code above to this:

# Make column index while reading CSV
df = pd.read_csv('https://raw.githubusercontent.com/marsja/jupyter/master/SimData/FifthDayData.csv',
                index_col=0)
df.head()Code language: Python (python)

As a result, we set the “ID” column as index. Here is what the first five rows in the new dataframe, in which we set the index to a column, look like:

The column "ID" was set as index when reading the csv file
  • Save

As you can see, making the “ID” column index was pretty simple. In the next section, you will get an example of importing an Excel file and setting a column as index.  Note it is also possible to use an integer as input as the parameter. For example, instead of using “ID”, in the example above, using “0” would give us the same result (without “”, of course). The next step might be to rename a column in Pandas dataframe (or multiple columns).

How to Set Column as Index when Reading an Excel file with Pandas

Here is how to read an Excel (.xlsx) file using Pandas and making a column of your choice index:

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/example_concat.xlsx?raw=true'

# Make column index while reading xlsx
df = pd.read_excel(xlsx_url, csv_url
                  index_col='ID')
df.head(Code language: Python (python)

Again, we used the index_col parameter. As a result, this is how the first five rows look like:

Here is the dataframe in which we made "ID" column when reading an Excel file
  • Save

Note, if you want to play around with the Excel file, you can download it from here. Additionally, you can now add an empty column to the dataframe. In the next section, we will look at how to make a column index in an existing dataframe. As an illustration, we will first create a dataframe from a Python dictionary. 

How to Make a Column Index in an Existing Dataframe

In this section, we will look at how to make a column index when we already have a dataframe. First, however, we will create a dataframe from a Python dictionary:

import pandas as pd

data = {'ID':[1, 2, 3, 4, 5],
       'Day':['Sun', 'Mon', 'Tue', 
              'Sun', 'Sat'],
       'Pandas':['GroupBy', 'Histogram',
                'Correlation Matrix', 'Rename Column',
                'Slicing']}

# Create Dataframe from dict
df = pd.DataFrame(data)Code language: Python (python)

In the example above, we first created a dictionary. Each key in the dictionary will become a column. Furthermore, data (or cells) will be the values in the dictionary (i.e., in lists). Here is what the resulting Pandas dataframe looks like:

Pandas dataframe where we are going to make the "ID" column index
  • Save

If you want to, you can add a column to the Pandas dataframe. That is a column that you want to set as index (see next section).

Set a Column Index

Here is how to use Pandas dataframe set_index() method to make a column index:

# make column index
df.set_index('ID', inplace=True)Code language: Python (python)

As previously mentioned, and as you can see in the code example above, we used the inplace parameter and set it to True. Remember, we must do this to make the change (e.g., setting the index) permanent. Here is what the resulting dataframe looks like:

Here is the dataframe after we set the "ID" column as index pandas
  • Save

Note, if you have data containing dates in e.g., string format, it is possible to convert this column to datetime. You can make a datetime index of this column.

Set Multiple Columns Index

In this example, we will make two columns as index in the Pandas dataframe. As you may have understood already, this can be done by merely adding a list of column names to the set_index() method:

df.set_index(['ID', 'Pandas'], inplace=True)Code language: Python (python)

All in all, setting two or more columns index will end up with a MultiIndex dataframe. Look at the image below for an example where we printed the indexes we made above.

indexes (columns)
  • Save

However, It is worth noting that the code example above will not work if you use the previous example. That is, this will only work if we did not set “ID” as index before trying to set both “ID” and “Pandas” as indexes.

That was it, now you know how to set a column as index. There are, of course, other things you can do now. For instance, you can also use reset_index() to revert what you have done in this post (i.e., undo making a column index). Furthermore, you can go on and rename columns in the dataframe, calculate descriptive statistics, creating dummy variables with the get_dummies() method, to name a few things. 

Conclusion

In this Pandas tutorial, you have learned how to make a column index in the dataframe. First, you had a look at the syntax of set_index(). Later on, you learned how to make a column in a data file and index when reading 1) a .csv file, and 2) an Excel file. Finally, in the two last sections, you learned how to 1) make a single-column index and 2) how to make multiple (here two) columns index in an existing dataframe. 

I hope you learned something valuable! If you did, please leave a comment and share the post (press the buttons below) on social media. Finally, you can also comment if something is not working in the post or if you want me to cover something in a future post.

Resources

Here are some great Python tutorials that you might find handy:

  • Save

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top
Share via
Copy link
Powered by Social Snap