In this Pandas tutorial, we are going to learn how to convert a NumPy array to a DataFrame object. Now, you may already know that it is possible to create a dataframe in a range of different ways. For example, it is possible to create a Pandas dataframe from a dictionary.
As Pandas dataframe objects are already 2-dimensional data structures, it is quite easy to create a dataframe from a 2-dimensional array. Much like when converting a dictionary, we convert a NumPy array with the pd.DataFrame() constructor.
In the next two sections, you will learn about the NumPy array and the Pandas dataframe. After that, you will get the answer to the question “How do you convert an array to a DataFrame in Python?”, where you will see the simplest way to create a dataframe from an array. The section after that goes into more detail about the syntax of the dataframe constructor. Finally, we will look at a couple of examples of converting NumPy arrays to dataframes. In these last sections, you will see how to name the columns, set the index, and so on.
Multidimensional arrays are a means of storing values in several dimensions. For example, an array in two dimensions can be likened to a matrix and an array in three dimensions can be likened to a cube. In Python, multidimensional arrays are usually created using the NumPy library.
Storing data in this way can make it easier to organize large amounts of data in a structure that is easier to work with. A NumPy array in two dimensions can be likened to a grid, where each box contains a value. See the image above.
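To make the grid and cube analogy concrete, here is a minimal sketch. The variable names (grid, cube) and the values are made up for illustration; ndim and shape are standard attributes of every NumPy array.

```python
import numpy as np

# A 2-dimensional array: 2 rows and 3 columns, like a small grid or matrix
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# A 3-dimensional array: 2 "layers", each with 2 rows and 3 columns
cube = np.zeros((2, 2, 3))

print(grid.ndim, grid.shape)  # number of dimensions and size along each one
print(cube.ndim, cube.shape)
print(grid[1, 2])             # the value in the second row, third column
```

Only the 2-dimensional case maps directly onto the rows and columns of a dataframe, which is why the examples in this post stick to 2-dimensional arrays.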
A dataframe is similar to an Excel sheet, i.e. a table of rows and columns. A typical Pandas dataframe may look as follows:
For most purposes, your observations (customers, patients, etc.) make up the rows, and the columns describe the observations (e.g., variables such as age, gender, income, health status). A Pandas dataframe is simply a two-dimensional table. As you may know, there are plenty of ways to create a dataframe. Most of the time, we import our data from a file. For example, we can read a CSV file to a Pandas dataframe or read the data from Excel files.
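As a rough illustration of rows as observations and columns as variables, the sketch below builds a tiny dataframe from a dictionary. The values are invented purely for the example, and the name patients is just a placeholder.

```python
import pandas as pd

# Made-up example data: each key becomes a column (a variable),
# each position in the lists becomes a row (an observation)
data = {
    'age': [34, 51, 27],
    'gender': ['F', 'M', 'F'],
    'income': [42000, 58000, 39000],
}
patients = pd.DataFrame(data)

print(patients)
print(patients.shape)  # (number of rows, number of columns)
```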
Now that we have an idea of what NumPy arrays and Pandas dataframes are, it may be obvious that converting one to the other is something that is very easy to do.
How do you convert an array to a DataFrame in Python?
To convert an array to a dataframe with Python you need to 1) have your NumPy array (e.g., np_array), and 2) use the pd.DataFrame() constructor like this:
df = pd.DataFrame(np_array, columns=['Column1', 'Column2'])
Remember that if you use the columns parameter, you need to give one name for each column in your NumPy array.
Pandas DataFrame() Constructor Syntax
In this section, we will have a look at the syntax, as well as the parameters, of the DataFrame() constructor. As you may be aware by now, this is what we will use to create a dataframe from a NumPy array. Typically, we import Pandas as pd and then call pd.DataFrame(). The constructor's parameters are described below.
The first parameter is data. This is where we will put the NumPy array that we want to convert to a dataframe. Strictly speaking, data is optional (leaving it out gives an empty dataframe), but it is the one parameter we always need when converting an array. Note that if your data is stored in a Python dictionary, for instance, it is also possible to use this as input here. The other parameters of the DataFrame class are as follows:
- index : Index or array-like
Index to use for the resulting dataframe. If we don’t use this parameter, it will default to RangeIndex.
- columns : Index or array-like
Column labels to use for the resulting dataframe. Again, if we don’t use this parameter it will default to RangeIndex (0, 1, 2, …, n).
- dtype : dtype, default None
If we want data to be of a certain data type, dtype is the parameter to use. Only a single dtype is allowed.
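- copy : bool, default False
If True, the data from the input is copied. The default has changed between Pandas versions (recent releases use None and decide based on the type of input), so check the documentation for the version you have installed.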
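The parameters listed above can be tried out in a short sketch. The array values and the labels are made up for illustration, and inspect.signature is used here only to print the parameter names of whichever Pandas version is installed.

```python
import inspect

import numpy as np
import pandas as pd

# List the constructor's parameter names for the installed pandas version
param_names = list(inspect.signature(pd.DataFrame).parameters)
print(param_names)

# A small, purely numeric array (illustrative values)
numbers = np.array([[1, 2], [3, 4], [5, 6]])

# Use the optional parameters discussed above, except copy
df_params = pd.DataFrame(
    numbers,
    index=['a', 'b', 'c'],   # one label per row
    columns=['x', 'y'],      # one label per column
    dtype=float,             # a single dtype applied to all the data
)
print(df_params)
print(df_params.dtypes)
```

Leaving out index and columns would give the default numeric labels instead.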
Our NumPy Array
Before having a look at the examples, we will create an array. First, we import NumPy, and then we pass a nested list to np.array() to create a 2-dimensional array:
import numpy as np
# Creating the array to convert
numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
In the next sections, we will go through a couple of examples on how to transform a NumPy array into a Pandas dataframe.
How to Convert a NumPy Array to Pandas dataframe: Example 1
Here’s a very simple example to convert an array to a dataframe:
import pandas as pd
df = pd.DataFrame(numpy_array)
In the code above, we first import Pandas. Second, we use the DataFrame class, and here we only use the data parameter (i.e., our NumPy array, numpy_array). The resulting dataframe will look like this:
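To inspect the result yourself, a minimal sketch along these lines should work. It recreates the example array so that it runs on its own, and only uses standard attributes (shape, columns, index).

```python
import numpy as np
import pandas as pd

# Same example array as earlier in the post
numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
df = pd.DataFrame(numpy_array)

print(df)                   # the dataframe with default labels
print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # default column labels
print(df.index.tolist())    # default row labels
```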
As you can see, if we’re not using the columns parameter, we will get numbers as column names (see the previous section for the parameters). Often, this is not a result to strive for, as the later data analysis may be hard to carry out if we don’t know which variables the different numbers refer to. We can either rename the columns in the Pandas dataframe afterwards or set the names when creating the dataframe. If you need to know the current names, you can list the column names using Pandas.
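The first option, renaming after the fact, could look something like the sketch below. It assumes the default column labels 0 and 1 and reuses the names digits and words from this post's examples.

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
df = pd.DataFrame(numpy_array)

# List the current (default) column names
print(df.columns.tolist())

# Map each old label to a new, more descriptive name
df = df.rename(columns={0: 'digits', 1: 'words'})
print(df.columns.tolist())
```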
In the next example, we will have a look at transforming the NumPy array to a dataframe using the columns parameter.
Convert a NumPy Array to Pandas Dataframe with Column Names
If you want to convert an array to a dataframe and create column names you’ll just do as follows:
df = pd.DataFrame(numpy_array, columns=['digits', 'words'])
In the image below, you will see the resulting dataframe. It is important to know that the input to the columns parameter needs to contain as many names as there are columns in the array. For example, the NumPy array that we converted has 2 columns, and thus we need to add two column names. If we, on the other hand, had an array with 3 columns, we would need to pass e.g. a list with three column names.
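What happens when the number of names does not match can be checked with a small sketch. The third name, extra, is a deliberately wrong addition for the sake of the demonstration; Pandas is expected to raise a ValueError in this case.

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
n_columns = numpy_array.shape[1]  # number of columns in the array

# Three names for a two-column array: one too many
too_many_names = ['digits', 'words', 'extra']

try:
    pd.DataFrame(numpy_array, columns=too_many_names)
    mismatch_error = None
except ValueError as error:
    mismatch_error = error
    print(f'Expected {n_columns} names, got {len(too_many_names)}: {error}')
```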
Notice that the index consists of numbers (0-3 in our dataframe). In the next example, we are going to work with the index parameter to change the index column. If your data contain dates, you can convert a column to datetime data type after you have created your dataframe.
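Converting types after creation is also relevant for the example array itself. Because the nested list mixes integers and strings, NumPy stores every value as a string, so the digits column does not hold numbers at first. The sketch below shows one possible way to convert it with astype(); treat it as an illustration rather than the only approach.

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
print(numpy_array.dtype)  # a Unicode string dtype, not an integer one

df = pd.DataFrame(numpy_array, columns=['digits', 'words'])
first_before = df.loc[0, 'digits']
print(type(first_before))  # the values are strings at this point

# Convert the digits column to integers so that arithmetic works
df['digits'] = df['digits'].astype(int)
print(df['digits'].sum())
```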
Create a Pandas Dataframe from a NumPy Array with Custom Indexes
Here’s how to make a custom index column when converting the array to a dataframe:
df = pd.DataFrame(numpy_array, index=['day1', 'day2', 'day3', 'day4'], columns=['digits', 'words'])
Notice how we used the index parameter and passed a list of index labels. Again, as when adding column names, we need e.g. a list with as many elements as there are rows in the array. In our example array, we have 4 rows and we, therefore, need to put in e.g. a list that has four elements. Here’s the converted NumPy array:
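One benefit of a custom index is that rows can be looked up by label. The following sketch recreates the dataframe from the example and uses .loc for the lookup.

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
df = pd.DataFrame(numpy_array,
                  index=['day1', 'day2', 'day3', 'day4'],
                  columns=['digits', 'words'])

print(df)
print(df.loc['day2'])           # the whole row labelled 'day2'
print(df.loc['day2', 'words'])  # a single value from that row
```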
Note that we can also make a column the index of the Pandas dataframe after we have created it. For example, if we use the set_index() method with the column we want as index, we’re set. Now you have your data stored in a dataframe object and can start exploring your data. For example, Pandas has methods that enable you to create histograms and scatter matrix plots, and to add columns to the dataframe.
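As a minimal sketch of the set_index() approach, the code below turns the words column into the index. Which column to use is of course up to you; words is picked here only because its values are unique.

```python
import numpy as np
import pandas as pd

numpy_array = np.array([[1, 'yo'], [4, 'bro'], [4, 'low'], [1, 'NumPy']])
df = pd.DataFrame(numpy_array, columns=['digits', 'words'])

# set_index() returns a new dataframe; the original df is left unchanged
df_indexed = df.set_index('words')
print(df_indexed)
```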
In this Pandas tutorial, you have learned how to transform an array into a dataframe. First, you learned about NumPy arrays and Pandas dataframe objects. After that, we had a look at the syntax of the DataFrame class, which we can use to create dataframe objects. Finally, we had a look at three examples in which we converted NumPy arrays to Pandas dataframes. To summarize, here are the two simple steps for converting an array to a dataframe:
- Importing NumPy and Pandas
- Using pd.DataFrame() on your array
Hope you learned something valuable. If you did, please share the post on your social media accounts.