Pandas Tutorial: Renaming Columns in Pandas Dataframe

Renaming columns in Pandas can be a lifesaver when working with data. Sometimes, we might receive data from a colleague or download it from the internet, and the column names might not be informative or consistent with our analysis. This could lead to confusion and errors in data interpretation, so it’s essential to know how to rename columns in a pandas dataframe.

For instance, imagine you are working on a project where you have to analyze data collected from multiple sources. Each source might use different column names, making comparing and combining datasets difficult. Renaming the columns can help standardize the names, making it easier to work with the data.

Thankfully, pandas provides a straightforward way to rename columns in a dataframe, allowing us to manipulate data quickly and efficiently.

This Pandas tutorial will cover the basics of renaming columns in pandas. We will start by exploring how to rename a single column, then move on to renaming multiple columns simultaneously. We will also learn to rename columns while importing data and change column names to lowercase. By the end of this post, you will have a solid understanding of how to rename columns in pandas and how to apply these skills to your data analysis projects.

Outline

The structure of this post is designed to provide you with a comprehensive guide on how to rename columns in a Pandas dataframe. First, we look at the general syntax of the rename method that we can use to change the name of a variable.

Second, we outline the prerequisites for following this post’s instructions. We then look at the example data we will work with throughout the guide.

The heart of the post lies in teaching you how to rename a column in a Pandas dataframe. Example 1 demonstrates a step-by-step approach to renaming columns, emphasizing practical implementation. In Example 2, we showcase an alternative method of renaming columns.

We also explore the ability to rename columns while importing data, offering insights into streamlining the data preprocessing pipeline. Additionally, we address how to rename grouped columns in a Pandas dataframe, a skill that can improve data organization.

Moreover, we look at changing column names to lowercase, enhancing the consistency and readability of the data. To accommodate different learning preferences, we provide a video tutorial that visually walks you through renaming columns using Pandas.

This post equips you with the knowledge to rename columns effectively within a Pandas dataframe.

How to Rename a Column in Pandas Dataframe

Renaming a column in a Pandas dataframe is simple. To do so, we can use the rename method:

df.rename(columns={'OldName': 'NewName'},
          inplace=True)

In the code example above, the column “OldName” will be renamed “NewName”. Furthermore, setting inplace=True modifies the dataframe itself; without it, rename returns a new dataframe and leaves the original unchanged.
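To make the role of inplace more concrete, here is a minimal sketch. The toy dataframe and its values are invented for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical toy data; the column names are placeholders.
df = pd.DataFrame({'OldName': [1, 2, 3], 'Other': ['a', 'b', 'c']})

# Without inplace=True, rename returns a new dataframe and df keeps its names.
renamed = df.rename(columns={'OldName': 'NewName'})
original_cols = list(df.columns)
print(original_cols)
print(list(renamed.columns))

# Assigning the result back to df has the same effect as inplace=True.
df = df.rename(columns={'OldName': 'NewName'})
print(list(df.columns))
```

Whether to use inplace=True or reassignment is largely a matter of taste; reassignment tends to fit better when several dataframe methods are chained.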

Why Bother with Renaming Columns with Pandas?

Now, when working with a dataset, whether big data or a smaller data set, the columns may have names that need to be changed. For instance, if we have scraped our data from HTML tables using Pandas read_html, the column names may not be suitable for displaying our data later. Renaming columns is therefore often part of pre-processing our data.

Prerequisites

Now, before learning how to rename columns, we need to have Python 3.x and Pandas installed. Both can be installed with a scientific Python distribution, such as Anaconda or ActivePython. Alternatively, Pandas can be installed, like many Python packages, using pip: pip install pandas. Refer to the blog post about installing Python packages for more information.

If we install Pandas and get a message that there is a newer version of Pip, we can upgrade pip using pip, conda, or Anaconda navigator.

How to Rename Columns in Pandas?

So how do you change column names in Pandas? Well, after importing the data, we can change the column names in the Pandas dataframe either by using df.rename(columns={'OldName': 'NewName'}, inplace=True) or by assigning a list to the columns attribute: df.columns = list_of_new_names.

Example Data

This tutorial will read an Excel file with Pandas to import data. More information about importing data from Excel files can be found in the Pandas read excel tutorial, previously posted on this blog.

import pandas as pd

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/play_data.xlsx?raw=true'

df = pd.read_excel(xlsx_url, index_col=0)

In the code chunk above, we used the index_col argument because the first column in the Excel file we imported is the index column. If we want to get the column names from the Pandas dataframe we can use df.columns:

df.columns
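For readers who cannot download the Excel file, the sketch below builds a small stand-in dataframe. The column names are inferred from the renaming examples in this tutorial, and the values are invented, so treat it as an approximation of the real data rather than a copy.

```python
import pandas as pd

# Stand-in for the Excel data: column names inferred from this tutorial,
# values invented for illustration only.
toy = pd.DataFrame({
    'Subject ID': [1, 2, 3],
    'First Name': ['Ann', 'Bob', 'Cem'],
    'Day': ['Mon', 'Mon', 'Tue'],
    'Age': [23, 31, 27],
    'RT': [0.52, 0.61, 0.48],
    'Gender': ['F', 'M', 'M'],
})

# df.columns is an Index object; tolist() gives a plain Python list.
print(toy.columns)
column_names = toy.columns.tolist()
print(column_names)
```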

Now, it is also possible that our data is stored in other formats such as CSV, SPSS, Stata, or SAS. Check out the post on how to use Pandas read_csv to learn more about importing data from .csv files.

How to Rename a Column in Pandas

In the first example, we will learn how to rename a single column in a Pandas dataframe. Note that in the code snippet below, we use df.rename to change the name of the column “Subject ID”, and we use inplace=True to make the change permanent.

# inplace=True to affect the dataframe
df.rename(columns = {'Subject ID': 'SubID'}, 
          inplace=True)

df.head()

In the code chunk above, we used the Pandas library to rename a column in a dataframe. The dataframe is referred to as df. We used the rename function with the columns argument to specify the current column name and the new name. Specifically, we renamed the Subject ID column to SubID by passing a dictionary with the old column name as the key and the new name as the value: {'Subject ID': 'SubID'}.

To ensure that the change is made in the original dataframe and not a copy, we set the inplace argument to True. This argument allows us to modify the dataframe directly without creating a copy.

To summarize, this code is an example of using Pandas to rename a column in a dataframe, allowing us to customize the column names to suit our needs better.
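One detail worth knowing is what happens when the old name is misspelled. The sketch below uses an invented two-column dataframe and the errors parameter of rename. By default, keys that match no column are skipped without any message.

```python
import pandas as pd

# Hypothetical toy data, not the tutorial's Excel file.
toy = pd.DataFrame({'Subject ID': [1, 2], 'RT': [0.52, 0.61]})

# 'Subject Id' (lowercase d) matches no column, so nothing is renamed
# and no error is raised: the default is errors='ignore'.
unchanged = toy.rename(columns={'Subject Id': 'SubID'})
print(list(unchanged.columns))

# With errors='raise', the same typo raises a KeyError instead.
try:
    toy.rename(columns={'Subject Id': 'SubID'}, errors='raise')
    typo_caught = False
except KeyError as exc:
    typo_caught = True
    print('KeyError:', exc)
```

Passing errors='raise' can be a useful safeguard in scripts where a silently skipped rename would cause confusing failures later on.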

In the next section, we will look at how to use Pandas’ rename function to rename multiple columns.

How To Rename Columns in Pandas: Example 1

To rename columns in a Pandas dataframe we do as follows:

  1. Get the column names by using df.columns (if we don’t know the names)
  2. Use df.rename and pass a dictionary that maps the old column names to the new ones.

Here is a working example of renaming columns in Pandas:

import pandas as pd

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/play_data.xlsx?raw=true'
df = pd.read_excel(xlsx_url, index_col=0)
print(df.columns)

df.rename(columns={'RT': 'ResponseTime',
                   'First Name': 'Given Name',
                   'Subject ID': 'SubID'},
          inplace=True)

In the code chunk above, we imported the pandas library as pd and used it to read an Excel file from a URL. Note that we stored the URL in the variable xlsx_url and passed it as an argument to the pd.read_excel() function. We also set index_col=0 to use the first column of the Excel file as the index of the resulting dataframe, which we assigned to the variable df. We then printed the column names of the dataframe using df.columns.

Next, we used the df.rename() method to rename three columns of the dataframe. We passed a dictionary as an argument to the columns parameter. The keys of the dictionary represent the old column names, while the values represent the new column names. We used the inplace=True parameter to modify the original dataframe, rather than returning a new one. The columns that were renamed were 'RT' to 'ResponseTime', 'First Name' to 'Given Name', and 'Subject ID' to 'SubID'.
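As a small variation, the same mapping can also be passed positionally together with axis='columns'. The sketch below assumes an invented three-column dataframe and compares the two spellings.

```python
import pandas as pd

# Hypothetical toy data using three of the tutorial's column names.
toy = pd.DataFrame({
    'Subject ID': [1, 2],
    'First Name': ['Ann', 'Bob'],
    'RT': [0.52, 0.61],
})

mapping = {'RT': 'ResponseTime',
           'First Name': 'Given Name',
           'Subject ID': 'SubID'}

# Two equivalent ways of saying "apply this mapping to the column labels".
by_keyword = toy.rename(columns=mapping)
by_axis = toy.rename(mapping, axis='columns')

print(list(by_keyword.columns))
print(list(by_axis.columns))
```

The order of the keys in the dictionary does not matter; each column keeps its position and only its label changes.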

Renaming Columns in Pandas Example 2

Another way to rename many columns in a Pandas dataframe is to assign a list of new column names to df.columns:

import pandas as pd

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/play_data.xlsx?raw=true'
df = pd.read_excel(xlsx_url, index_col=0)

# New column names
new_cols = ['SubID', 'Given Name', 'Day', 'Age', 'ResponseTime', 'Gender']

# Renaming the columns
df.columns = new_cols

df.head()

In the code chunk above, we first create a variable xlsx_url that links to an Excel file stored on GitHub. Then, we use the pd.read_excel() function from the Pandas package to read the Excel file and store it as a dataframe in the variable df. The index_col=0 argument specifies that the first column of the Excel file should be used as the index column of the dataframe.

Next, we create a list called new_cols containing the new column names we want to assign to the dataframe columns. We then use the df.columns attribute to access the column names of the dataframe, and assign the new column names to the dataframe columns using the = operator and the new_cols list.
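Because assigning to df.columns matches names by position, the list has to contain exactly one name per column, in the dataframe's column order. The sketch below, using an invented three-column dataframe, shows what happens when the list is too short.

```python
import pandas as pd

# Hypothetical toy data, not the tutorial's Excel file.
toy = pd.DataFrame({
    'Subject ID': [1, 2],
    'First Name': ['Ann', 'Bob'],
    'RT': [0.52, 0.61],
})

# One new name per column, in order: this works.
toy.columns = ['SubID', 'Given Name', 'ResponseTime']
renamed_cols = list(toy.columns)
print(renamed_cols)

# Two names for three columns: pandas raises a ValueError.
try:
    toy.columns = ['SubID', 'Given Name']
    mismatch_caught = False
except ValueError as exc:
    mismatch_caught = True
    print('ValueError:', exc)
```

If only a few columns need new names, the dictionary approach from Example 1 is usually less error-prone, since it does not depend on column order.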

Renaming Columns while Importing Data

In this section, we will learn how to rename columns while reading the Excel file. This is also very simple: we create a list of new names before calling read_excel and pass it to the names parameter. Note that we need to include a name in the list for the index column.

import pandas as pd

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/play_data.xlsx?raw=true'

# New column names
new_cols = ['Index', 'Subject_ID', 'Given Name', 'Day', 'Age', 'ResponseTime', 'Gender']

df = pd.read_excel(xlsx_url, names=new_cols, index_col=0)

df.columns

In the code chunk above, we first create a list called new_cols that contains the new column names we want to assign to our dataframe. Because the names are assigned by position, the list needs one name for every column in the file, including the index column.

Next, we use the pd.read_excel function to read the Excel file located at xlsx_url and create a pandas dataframe. We also specify the names parameter as the list of new column names we just created, and set the index column as the first column in the dataframe using the index_col parameter.

By doing this, we can ensure that the column names match our desired naming convention and that the correct column is used as the index column.

Importantly, when changing the name of the columns while reading the data, we need to know the number of columns before we load the data.
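The same idea can be tried without downloading anything. The sketch below swaps in read_csv and an in-memory CSV string as a stand-in for the Excel file; the data is invented. Note that with read_csv, header=0 is needed so that the existing header row is replaced rather than read as data.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the Excel file; values are invented.
# The first column has no name, like an exported index column.
csv_text = """,Subject ID,First Name,RT
0,1,Ann,0.52
1,2,Bob,0.61
"""

# One name per column in the file, including the index column.
new_cols = ['Index', 'Subject_ID', 'Given Name', 'ResponseTime']

# header=0 marks the first row as the old header, which names then replaces.
toy = pd.read_csv(io.StringIO(csv_text), names=new_cols, header=0, index_col=0)

print(list(toy.columns))
print(toy.index.name)
```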

In the following example, we will learn how to rename grouped columns in Pandas dataframe.

Renaming Grouped Columns in Pandas Dataframe

In this section, we will rename grouped columns in Pandas dataframe. First, we will use Pandas groupby method (if needed, check the post about Pandas groupby method for more information). Second, we will rename the grouped columns using Python list comprehension and df.columns, among other dataframe methods.

import pandas as pd

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/play_data.xlsx?raw=true'
df = pd.read_excel(xlsx_url, index_col=0)

# String names select pandas' built-in aggregations
grouped = df.groupby('Day').agg({'RT': ['mean', 'median', 'std']})
grouped

The grouped dataframe has column names on two levels: RT on the first level, and the name of each aggregation on the second. In the next code chunk, we will flatten these into single-level names.

grouped.columns = ['_'.join(x) for x in grouped.columns.ravel()]
grouped

Note that in the code above, we called the ravel method on the column index and joined the two labels of each column, such as ('RT', 'mean'), with an underscore, which gives flat names such as RT_mean. Now, there are other ways we can change the names of columns in a Pandas dataframe. For instance, if we only want to change the column names to lowercase, we can use str.lower.

df.rename(columns=str.lower).head()

Importantly, rename returns a new dataframe here. If we want the change to be permanent, we must add the inplace=True argument or assign the result back to df.
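Coming back to the grouped dataframe from earlier in this section: a possible alternative is named aggregation, which produces flat column names directly so that no renaming step is needed. The sketch below uses invented data and compares it with a join-based flattening that iterates over the column index directly instead of calling ravel.

```python
import pandas as pd

# Hypothetical toy data with the tutorial's Day and RT columns.
toy = pd.DataFrame({
    'Day': ['Mon', 'Mon', 'Tue', 'Tue'],
    'RT': [0.5, 0.7, 0.4, 0.8],
})

# Named aggregation: each keyword becomes a flat column name.
named = toy.groupby('Day').agg(
    RT_mean=('RT', 'mean'),
    RT_median=('RT', 'median'),
    RT_std=('RT', 'std'),
)

# Join-based flattening for comparison; iterating over the two-level
# column index yields one tuple of labels per column.
nested = toy.groupby('Day').agg({'RT': ['mean', 'median', 'std']})
nested.columns = ['_'.join(col) for col in nested.columns]

print(list(named.columns))
print(list(nested.columns))
```

Named aggregation also lets us pick names that differ from the label-joining pattern, for example mean_rt instead of RT_mean.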

Changing Column Names in Pandas Dataframe to Lowercase

To change all column names to lowercase, we can use the following code:

df.rename(columns=str.lower)
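The columns argument accepts any function that takes a name and returns a new one, so str.lower is only one option. The sketch below uses an invented dataframe to show a lambda that also replaces spaces with underscores, plus the .str accessor on df.columns as another route to lowercase names.

```python
import pandas as pd

# Hypothetical toy data using three of the tutorial's column names.
toy = pd.DataFrame({
    'Subject ID': [1, 2],
    'First Name': ['Ann', 'Bob'],
    'RT': [0.52, 0.61],
})

# The approach from this section: returns a new dataframe.
lowered = toy.rename(columns=str.lower)

# Any one-argument function works; this one also swaps spaces for underscores.
snake = toy.rename(columns=lambda name: name.lower().replace(' ', '_'))

# The .str accessor on the column index changes the names of toy itself.
toy.columns = toy.columns.str.lower()

print(list(lowered.columns))
print(list(snake.columns))
print(list(toy.columns))
```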

Now, this is one way to preprocess data in Python with pandas. In another post on this blog, we can learn about data cleaning in Python with Pandas and Pyjanitor.

Video Guide: Pandas Rename Column(s)

If you prefer to learn audiovisually, here is a YouTube tutorial covering how to change the variable names in a Pandas dataframe.

Conclusion: How to use Pandas to Rename a Column

In this post, we have explored how to use Pandas to rename a column in a dataframe. Renaming columns is essential in data analysis, making the dataset more readable and understandable. Sometimes, we may need to rename columns if we have data from different sources with different naming conventions or if we receive data from other researchers who used different naming schemes.

We discussed the importance of renaming variables and reviewed the prerequisites needed to perform this task in Pandas. Next, we explored different methods to rename columns in Pandas, including renaming single and multiple columns using dictionaries.

We also covered renaming columns while importing data and renaming grouped columns in a dataframe. Additionally, we learned how to change column names to lowercase using Pandas.

Renaming columns is an easy process in Pandas that can significantly improve the readability and organization of data. With the knowledge gained from this post, readers can confidently and efficiently rename columns in their datasets to meet their specific needs.

In conclusion, renaming columns is critical for anyone working with data. Using Pandas to rename columns, data analysts and scientists can make their datasets more organized and easily understood. With the examples and techniques this post covers, readers can now master the art of renaming columns in their Pandas dataframes.

So, there you have it! Now you know how to use Pandas to rename a column and can easily handle any dataset. Do not forget to share this post with your colleagues and friends, as well as comment below if you have any questions or suggestions for future tutorials.

2 thoughts on “Pandas Tutorial: Renaming Columns in Pandas Dataframe”

  1. import pandas as pds

    xlsx_url='SharepointlinkThatHasExcelFile'

    df = pd.read_excel(xlsx_url, index_col=0)
    I got this error
    HTTPError: HTTP Error 403: Forbidden

    1. Hey Michelle!

      I don’t know the URL or file you are trying to read using Pandas. However, at the university where I work, you must log in to Sharepoint to access files. It might be possible to do this using Python, but not with the Pandas package alone (you might need to log in using another package). But I might be wrong, especially in your case (e.g., the file might be open to the internet), and the error might be caused by Sharepoint blocking the request. In any case, the error suggests that the server is refusing your request.

      Best,

      Erik
