Your Guide to Reading Excel (xlsx) Files in Python

In this brief Python tutorial, we will learn how to read Excel (xlsx) files using Python. Specifically, we will read xlsx files in Python using the Python module openpyxl. First, we start by the simplest example of reading a xlsx file in Python. Second, we will learn how to read multiple Excel files using Python.

In previous posts, we have learned how to use Pandas read_excel method to import xlsx files with Python. As previously mentioned, however, we will use another package called openpyxl in this post. In the next paragraph, we will learn how to install openpyxl.

Table of Contents

Openpyxl Syntax

Basically, here’s the simplest form of using openpyxl for reading a xlsx file in Python:

import openpyxl
from pathlib import Path

xlsx_file = Path('SimData', 'play_data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file) 

# Read the active sheet:
sheet = wb_obj.activeCode language: Python (python)
how to read xlsx files in python
  • Save

It is, of course, also possible to learn how to read, write, and append to files in Python (e.g., text files). Make sure to check that post out, as well.

Prerequisites: Python and Openpyxl

Now, before we will learn what Openpyxl is, we need to make sure that we have both Python 3 and the module openpyxl installed. One easy way to install Python is to download a Python distribution such as Anaconda or ActivePython. Openpyxl, on the other hand, can, as with many Python packages, be installed using both pip and conda. Now, using pip we type the following in a command prompt or terminal window, pip install openpyxl and using conda we type this; conda install openpyxl. Note, sometimes, when we install Python packages with pip, we may notice that we don’t have the latest version of pip. Luckily, it is quite easy to upgrade pip to the latest version using pip.

learn all about reading excel files in python
  • Save
Example file 1 (xlsx)
What is the use of Openpyxl in Python?

Openpyxl is a Python module that reads and writes Excel (with extension xlsx/xlsm/xltx/xltm) files. Furthermore, this module enables a Python script to modify Excel files. For instance, if we want to go through thousands of rows but just read specific data points and make small changes to these points, we can do this based on some criteria with openpyxl.

How do I read an Excel (xlsx) File in Python?

Now, the general method for reading xlsx files in Python (with openpyxl) is to import openpyxl (import openpyxl) and then read the workbook: wb = openpyxl.load_workbook(PATH_TO_EXCEL_FILE). In this post, we will learn more about this, of course.

How to Read an Excel (xlsx) File in Python

Now, in this section, we will be reading an xlsx file in Python using openpyxl. In a previous section, we have already been familiarized with the general template (syntax) for reading an Excel file using openpyxl and we will now get into this module in more detail. Note we will also work with the Path method from the Pathlib module.

1. Import the Needed Modules

In the first step, to reading an xlsx file in Python, we need to import the modules we need. That is, we will import Path and openpyxl:

import openpyxl
from pathlib import PathCode language: Python (python)
reading excel (xlsx) files in python
  • Save

2. Setting the Path to the Excel (xlsx) File

In the second step, we will create a variable using Path. Furthermore, this variable will point at the location and filename of the Excel file we want to import with Python:

# Setting the path to the xlsx file:
xlsx_file = Path('SimData', 'play_data.xlsx')</code></pre>Code language: Python (python)
reading xlsx files in python
  • Save

Note, “SimData” is a subdirectory to that of the Python script (or notebook). That is, if we were to store the Excel file in a completely different directory, we need to put in the full path. For example, xlsx_file = Path(Path.home(), 'Documents', 'SimData', 'play_data.xlsx')if the data is stored in the Documents in our home directory.

3. Read the Excel File (Workbook)

In the third step, we are going to use Python to read the xlsx file. Now, we are using the load_workbook() method:

wb_obj = openpyxl.load_workbook(xlsx_file)Code language: Python (python)
how to read excel in python
  • Save

4. Read the Active Sheet from the Excel file

Now, in the fourth step, we are going to read the active sheet using the active method:

sheet = wb_obj.activeCode language: Python (python)
python read xlsx files and print the sheet
  • Save

Note, if we know the sheet name we can also use this to read the sheet we want: play_data = wb_obj['play_data']

5. Work or Manipulate the Excel Sheet

In the final and fifth steps, we can work or manipulate the Excel sheet imported with Python. For example, if we want to get the value from a specific cell, we can do as follows:

print(sheet["C2"].value)Code language: Python (python)

Another example of what we can do with the spreadsheet in Python is we can iterate through the rows and print them:

for row in sheet.iter_rows(max_row=6):
    for cell in row:
        print(cell.value, end=" ")
    print()Code language: Python (python)

Note that we used the max_row and set it to 6 to print the six first rows from the Excel file.

6. Bonus: Determining the Number of Rows and Columns in the Excel File

In the sixth, and bonus step, we are going to find out how many rows and columns we have in the example Excel file we have imported with Python:

print(sheet.max_row, sheet.max_column)Code language: Python (python)
learning how to read xlsx files in python
  • Save

Reading an Excel (xlsx) File to a Python Dictionary

Now, before we learn how to read multiple xlsx files, we will import data from Excel into a Python dictionary. It’s quite simple, but for the example below, we need to know the column names before we start. If we want to find out the column names, we can run the following code (or open the Excel file):

import openpyxl
from pathlib import Path

xlsx_file = Path('SimData', 'play_data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file)
sheet = wb_obj.active

col_names = []
for column in sheet.iter_cols(1, sheet.max_column):
    col_names.append(column[0].value)
   
    
print(col_names)Code language: Python (python)

Creating a Dictionary from an Excel File

In this section, we will finally read the Excel file using Python and create a dictionary.

data = {}

for i, row in enumerate(sheet.iter_rows(values_only=True)):
    if i == 0:
        data[row[1]] = []
        data[row[2]] = []
        data[row[3]] = []
        data[row[4]] = []
        data[row[5]] = []
        data[row[6]] = []

    else:
        data['Subject ID'].append(row[1])
        data['First Name'].append(row[2])
        data['Day'].append(row[3])
        data['Age'].append(row[4])
        data['RT'].append(row[5])
        data['Gender'].append(row[6])Code language: Python (python)

Now, let’s walk through the code example above. First, we create a Python dictionary (data). Second, we loop through each row (using iter_rows), and we only go through the rows where there are values. Second, we have an if statement where we check if it’s the first row and add the keys to the dictionary. That is, we set the column names as keys. Third, we append the data to each key (column name) in the other statement.

How to Read Multiple Excel (xlsx) Files in Python

In this section, we will learn how to read multiple xlsx files in Python using openpyxl. Additionally to openpyxl and Path, we are also going to work with the glob module.

1. Import the Modules

In the first step, we are going to import the modules Path, glob, and openpyxl:

import glob
import openpyxl
from pathlib import PathCode language: Python (python)

The glob module and the os module can be used to check if your file is empty with Python.

2. Read all xlsx Files in the Directory to a List

Second, we will read all the .xlsx files in a subdirectory into a list. Now, we use the glob module together with Path:

xlsx_files = [path for path in Path('XLSX_FILES').rglob('*.xlsx')]Code language: Python (python)

3. Create Workbook Objects (i.e., read the xlsx files)

Third, we can now read all the xlsx files using Python. Again, we will use the load_workbook method. However, this time we will loop through each file we found in the subdirectory,

wbs = [openpyxl.load_workbook(wb) for wb in xlsx_files]Code language: Python (python)

Now, in the code examples above, we are using Python list comprehension (twice, in both step 2 and 3). First, we create a list of all the xlsx files in the “XLSX_FILES” directory. Second, we loop through this list and create a list of workbooks. Of course, we could add this to the first line of code above.

4. Work with the Imported Excel Files

We can now work with the imported excel files in the fourth step. For example, we can get the first file by adding “[0]” to the list. If we want to know the sheet names of this file we do like this:wbs[0].sheetnames . Many of the things we can do, and have done in the previous example on reading xlsx files in Python, can be done when we’ve read multiple Excel files.

Notice this is one great example of how to use this programming language. Other examples are, for instance, to use it for renaming files in Python.

Conclusion: Reading Excel (xlsx) Files in Python

In conclusion, Openpyxl is a powerful Python library that enables developers to read, write, and manipulate Excel files using Python. This tutorial covered how to read an Excel (xlsx) file in Python using Openpyxl. We started by importing the necessary modules, setting the path to the Excel file, and then reading the file and its active sheet. We then explored how to work with the sheet and even determine the number of rows and columns in the file.


Moreover, we also learned how to read an Excel file to a Python dictionary and create a dictionary from an Excel file. Finally, we learned how to read multiple Excel files in a directory by importing the required modules, reading all the xlsx files in the directory to a list, creating workbook objects, and working with the imported Excel files.


By mastering these techniques, you can easily read and manipulate Excel files using Python, which can be incredibly useful in various data processing applications. So go ahead, try it out, and unlock the full potential of Openpyxl in your Python projects!

It is, of course, possible to import data from various other file formats. For instance, read the post about parsing json files in Python to learn more about reading JSON files.

  • Save

11 thoughts on “Your Guide to Reading Excel (xlsx) Files in Python”

  1. Amir Katz (work)

    This section ‘Reading an Excel (xlsx) FIle to a Python Dictionary’ has a typo in its name, should be ‘File…’
    Great article!

  2. your examples use xlsx_file = Path(‘SimData’, ‘play_data.xlsx’)
    Since I do not have a copy of that spreadsheet, I find it difficult to adapt your examples to the use of my own spreadsheet.
    May I have a copy of ‘play_data.xlsx’

    1. Hey Robert,

      Thank you, for your comment. Typically I do link to the example data used in my tutorials. I’ll add a link to the Excel file later today. If you need practicing reading xlsx files ASAP here’s the link: Example .xlsx file. Have a nice day,

      Erik

  3. This is a great guide! I’ve been struggling to get my head around reading xlsx files in Python, but this guide has made it much easier.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top
Share via
Copy link
Powered by Social Snap