Pandas Read CSV Tutorial: How to Read and Write

This Pandas tutorial will show you, by examples, how to use Pandas read_csv() method to import data from .csv files. In the first section, we will go through how to read a CSV file, how to read specific columns from a CSV, how to read multiple CSV files and combine them to one dataframe. Finally, we will also learn how to convert data according to specific datatypes (e.g., using Pandas read_csv dtypes parameter).

In the last section, we will continue by learning how to use Pandas to write CSV files. That is, we will learn how to export dataframes to CSV files.

Table of Contents

Pandas Import CSV from the Harddrive

In the first example, of this Pandas tutorial, we will just use read_csv to load CSV files, to dataframe, that are in the same directory as the script. If we have the file in another directory we have to remember to add the full path to the file.

Naturally, Pandas can be used to import data from a range of different file types. For instance, Excel (xlsx), and JSON files can be read into Pandas dataframes. Learn more about importing data in Pandas:

How to Read CSV File in Python Pandas

In this section, we are going to learn how to read CSV files from the harddrive. First, however, we are going to answer the question How do I open a CSV file in pandas?

How do I import a CSV file into Pandas using Python?

Here are two simple steps to learn how to read a CSV file in Pandas:
1) Import the Pandas package:
import pandas as pd

2)Use the pd.read_csv() method:
df = pd.read_csv('yourCSVfile.csv')

Note, the first parameter should be the file path to your CSV file.

In this tutorial, we will learn how to work with comma-separated (CSV) files in Python and Pandas. We will get an overview of how to use Pandas to load CSV to dataframes and how to write dataframes to CSV.

df = pd.read_csv('amis.csv')
df.head()Code language: Python (python)

If we are interested in reading files, in general, using Python we can use the open() method. This way, we can read many file formats (e.g., .txt) in Python.

Pandas Read CSV Example
  • Save
Dataframe

Additionally, we can use the index_col argument to make a column index in the Pandas dataframe. Finally, the data can be downloaded here but in the following examples, we are going to use Pandas read_csv to load data from a URL.

Pandas Read CSV from a URL Examples

Can we import a CSV file from a URL using Pandas? Yes, and in this section, we are going to learn how to read a CSV file in Python using Pandas, just like in the previous example. However, in the next read_csv example, we are going to read the same dataset but this time from a URL. It’s very simple we just put the URL in as the first parameter. Here areto three simple steps that will help us read a CSV from a URL:

  1. Again, we need to import Pandas
  2. Create a string variable with the URL
  3. Now use Pandas read_csv together with the URL (see example below)

Example 1: Read CSV from a URL

In the example code below, we follow the three easy steps to import a CSV file into a Pandas dataframe:

import pandas as pd

# String with URL:
url_csv = 'https://vincentarelbundock.github.io/Rdatasets/csv/boot/amis.csv'
# First example to read csv from URL
df = pd.read_csv(url_csv)
df.head()Code language: Python (python)
  • Save

In the image above, we can see that we get a column named ‘Unnamed: 0’. Furthermore, we can see that it contains numbers. Thus, when using Pandas, we can use this column as the index column. In the next code example, we are doing exactly this; we are going to use Pandas read_csv and the index_col parameter.

Example 2: Read CSV from a URL with index_col

This parameter can take an integer or a sequence. In our case, we are going to use the integer 0 and we will get a nicer dataframe:

url_csv = 'https://vincentarelbundock.github.io/Rdatasets/csv/boot/amis.csv'

# Pandas Read CSV from URL Example:
df = pd.read_csv(url_csv, index_col=0)
df.head()Code language: Python (python)
Pandas read_csv using index_cols
  • Save

The index_col parameter also can take a string as input and we will now use a different datafile. In the next example we will read a CSV into a Pandas dataframe and use the idNum column as index.

csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
df = pd.read_csv(csv_url, index_col='idNum')
df.iloc[:, 0:6].head()Code language: Python (python)
  • Save

Note, to get the above output we used Pandas iloc to select the first 7 rows. This was done to get an output that could be easier illustrated. That said, we are now continuing to the next section where we are going to read certain columns to a dataframe from a CSV file.

A final note before going futher with reading CSV files: It is also possible to use Pandas to read the iex cloud api with Python to import stock data.

Pandas Read CSV usecols

In some cases, we don’t want to parse every column in the CSV file. To only read certain columns we can use the parameter usecols. Note, if we want the first column to be the index column and we want to parse the three first columns we need to have a list with 4 elements (compare my read_excel usecols example here).

Pandas Read CSV Example: Specifying Columns to Import

Here’s an example when we use Pandas read_csv() and only read the three first columns:

cols = [0, 1, 2, 3]
df = pd.read_csv(url_csv,
                   index_col=0, usecols=cols)
df.head()Code language: Python (python)
  • Save
read_csv usecols

Note, we actually did read 4 columns but set the first column as the index column. Of course, using read_csv usecols make more sense if we had a CSV file with more columns. We can use usecols with a list of strings, as well. In the next example, we return to the larger file we used previously. Here’s how to use the column names in the datafile:

csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
df = pd.read_csv(csv_url, index_col='idNum',
                   usecols=['idNum', 'date', 'problem', 'MDC'])
df.head()Code language: Python (python)
usecols example
  • Save
usecols with list of strings

Pandas Import CSV files and Remove Unnamed Column

In some of the previous read_csv example, we get an unnamed column. In previous sections, of this Pandas read CSV tutorial, we have solved this by setting this column as the index columns, or used usecols to select specific columns from the CSV file. However, we may not want to do that for some reason. Here’s one example of how to use pd.read_csv to get rid of the column “Unnamed:0”:

csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
cols = pd.read_csv(csv_url, nrows=1).columns
df = pd.read_csv(csv_url, usecols=cols[1:])
df.iloc[:, 0:6].head()Code language: Python (python)
  • Save

How to Drop a Column from Pandas dataframe

It’s of course also possible to remove the unnamed columns after we have loaded the CSV to a dataframe. To remove the unnamed columns we can use two different methods; loc and drop, together with other Pandas dataframe methods. When using the drop method we can use the inplace parameter and get a dataframe without unnamed columns.

df.drop(df.columns[df.columns.str.contains('unnamed', case=False)],
          axis=1, inplace=True)

# The following line will give us the same result as the line above
# df = df.loc[:, ~df.columns.str.contains('unnamed', case=False)]

df.iloc[:, 0:7].head()Code language: PHP (php)
  • Save

To explain the code example above; we select the columns without columns containing the string ‘unnamed’. Furthermore, we used the case parameter so that the contains method is not case-sensitive. Thus, we will get columns named “Unnamed” and “unnamed”. In the first row, using Pandas drop, we are also using the inplace parameter so that it changes our dataframe. The axis parameter, however, is used to drop columns instead of indices (i.e., rows).

Pandas Read CSV and Missing Values

In the next Pandas read .csv example, we will learn how to handle missing values in a Pandas dataframe. If we have missing data in our CSV file and it’s coded in a way that makes it impossible for Pandas to find them we can use the parameter na_values. In the example below, the amis.csv file has been changed and there are some cells with the string “Not Available”.

  • Save
CSV file

That is, we are going to change “Not Available” to something that we easily can remove when carrying out data analysis later.

df = pd.read_csv('Simdata/MissingData.csv', index_col=0,
                   na_values="Not Available")
df.head()Code language: Python (python)
  • Save

Reading a CSV file and Skipping Rows

What if our data file(s) contain information on the first x rows and we need to skip rows when using Pandas read_csv? For instance, how can we skip the first three rows in a file looking like this:

  • Save

We will now learn how to use Pandas read_csv and skip x amount of rows. Luckily, it’s very simple we just use the skiprows parameter. In the following example, we are setting skiprows to 3 to skip the first 3 rows.

Pandas read_csv skiprows example:

How do we use Pandas skiprow parameter? Here’s a Pandas read_csv example, where we skip the three first rows:

df = pd.read_csv('Simdata/skiprow.csv', index_col=0, skiprows=3)
df.head()Code language: Python (python)
  • Save

Note we can obtain the same result as above using the header parameter (i.e., data = pd.read_csv(‘Simdata/skiprow.csv’, header=3)).

How to Read Certain Rows using Pandas

Can we read specific rows from a CSV file using Pandas read_csv method? If we don’t want to read every row in the CSV file we ca use the parameter nrows. In the next example, below we read the first 8 rows of a CSV file.

df = pd.read_csv(url_csv, nrows=8)
dfCode language: Python (python)

If we want to select random rows we can load the complete CSV file and use Pandas sample to randomly select rows (learn more about this by reading the Pandas Sample tutorial).

Pandas read_csv dtype

We can also set the data types for the columns. Although, in the amis dataset all columns contain integers we can set some of them to string data type. This is exactly what we will do in the next Pandas read_csv pandas example. We will use the dtype parameter and put in a dictionary:

url_csv = 'https://vincentarelbundock.github.io/Rdatasets/csv/boot/amis.csv'
df = pd.read_csv(url_csv, dtype={'speed':int, 'period':str, 'warning':str, 'pair':int})
df.info()Code language: Python (python)
  • Save

It’s, of course, possible to force other datatypes such as integer and float. All we have to do is change str to float, for instance (given that we have decimal numbers in that column, of course).

Load Multiple Files to a Dataframe

If we have data from many sources such as experiment participants we may have them in multiple CSV files. If the data, from the different CSV files, are going to be analyzed together we may want to load them all into one dataframe. In the next examples, we are going to use Pandas read_csv to read multiple files.

Example 1: Reading Multiple CSV Files using os fnmatch

First, we are going to use Python os and fnmatch to list all files with the word “Day” of the file type CSV in the directory “SimData”. Next, we are using Python list comprehension to load the CSV files into dataframes (stored in a list, see the type(dfs) output).

import os, fnmatch

csv_files = fnmatch.filter(os.listdir('./SimData'), '*Day*.csv')
dfs = [pd.read_csv('SimData/' + os.sep + csv_file)
       for csv_file in csv_files]

type(dfs)
# Output: listCode language: Python (python)

Finally, we use the method concat to concatenate the dataframes in our list. In the example files, there is a column called ‘Day’ so that each day (i.e., CSV file) is unique.

df = pd.concat(dfs, sort=False) df.Day.unique()
  • Save

Example 2: Reading Multiple CSV Files using glob

The second method we are going to use is a bit simpler; using Python glob. If we compare the two methods (os + fnmatch vs. glob) we can see that in the list comprehension we don’t have to put the path. This is because glob will have the full path to our files. Handy!

import glob

csv_files = glob.glob('SimData/*Day*.csv')
dfs = []

for csv_file in csv_files:
    temp_df = pd.read_csv(csv_file)
    temp_df['DataF'] = csv_file.split('\\')[1]
    dfs.append(temp_df)Code language: Python (python)

If we don’t have a column, in each CSV file, identifying which dataset it is (e.g., data from different days) we could apply the filename in a new column of each dataframe:

import glob

csv_files = glob.glob('SimData/*Day*.csv')
dfs = []

for csv_file in csv_files:
    temp_df = pd.read_csv(csv_file)
    temp_df['DataF'] = csv_file.split('\\')[1]
    dfs.append(temp_df)Code language: Python (python)

There are, of course, times when we need to rename multiple files (e.g., CSV files before loading them into Pandas dataframes). Luckily, to rename a file in Python we can use os.rename(). This method can be used regardless if we need to rename CSV or .txt files.

Now we know how to import multiple CSV files and, in the next section, we will learn how to use Pandas to write to a CSV file.

How to Write CSV files in Pandas

In this section, we will learn how to export dataframes to CSV files. We will start by creating a dataframe with some variables but first, we start by importing the modules Pandas:

import pandas as pdCode language: Python (python)

Before we go on and learn how to use Pandas to write a CSV file, we will create a dataframe. We will create the dataframe using a dictionary. The keys will be the column names and the values will be lists containing our data:

df = pd.DataFrame({'Names':['Andreas', 'George', 'Steve',
                           'Sarah', 'Joanna', 'Hanna'],
                  'Age':[21, 22, 20, 19, 18, 23]})
df.head()Code language: Python (python)
  • Save

Saving Pandas Dataframe to CSV

Now we are ready to learn how to save Pandas dataframe to CSV. It’s quite simple, we write the dataframe to CSV file using Pandas to_csv method. In the example below we don’t use any parameters but the path_or_buf which is, in our case, the file name.

df.to_csv('NamesAndAges.csv')Code language: Python (python)

Here’s how the exported dataframe look like:

  • Save

As can be seen in the image above we get a new column when we are not using any parameters. This column is the index column from our Pandas dataframe. When working with Pandas to_csv, we can use the parameter index and set it to False to get rid of this column.

df.to_csv('NamesAndAges.csv', index=False)Code language: PHP (php)

How to Write Multiple Dataframes to one CSV file

If we have many dataframes and we want to export them all to the same CSV file it is, of course, possible. In the Pandas to_csv example below we have 3 dataframes. We are going to use Pandas concat with the parameters keys and names.

This is done to create two new columns, named Group and Row Num. The important part is Group which will identify the different dataframes. In the last row of the code example we use Pandas to_csv to write the dataframes to CSV.

df1 = pd.DataFrame({'Names': ['Andreas', 'George', 'Steve',
                           'Sarah', 'Joanna', 'Hanna'],
                   'Age':[21, 22, 20, 19, 18, 23]})
df2 = pd.DataFrame({'Names': ['Pete', 'Jordan', 'Gustaf',
                           'Sophie', 'Sally', 'Simone'],
                   'Age':[22, 21, 19, 19, 29, 21]})
df3 = pd.DataFrame({'Names': ['Ulrich', 'Donald', 'Jon',
                           'Jessica', 'Elisabeth', 'Diana'],
                   'Age':[21, 21, 20, 19, 19, 22]})


df = pd.concat([df1, df2, df3], keys =['Group1', 'Group2', 'Group3'],
               names=['Group', 'Row Num']).reset_index()

df.to_csv('MultipleDfs.csv', index=False)Code language: Python (python)
  • Save

In the CSV file we get 4 columns. The keys parameter with the list ([‘Group1’, ‘Group2’, ‘Group3’]) will enable identification of the different dataframes we wrote. We also get the column “Row Num” which will contain the row numbers for each dataframe:

Conclusion

In this tutorial we have learned about importing CSV files into Pandas dataframe. More specifically, we have learned how to:

  • Load CSV files to dataframe using  Pandas read_csv
    • locally
    • from the WEB
  • Read certain columns
  • Remove unnamed columns
  • Handle missing values
  • Skipping rows and reading certain rows
  • Changing datatypes using dtypes
  • Reading many CSV files
  • Saving dataframes to CSV using Pandas to_csv
  • Save

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top
Share via
Copy link
Powered by Social Snap