In this Python data visualization tutorial, we will work with the Pandas scatter_matrix method to explore trends in data. Previously, we have learned how to create scatter plots with Seaborn and histograms with Pandas, for instance. In this post, we’ll focus on scatter matrices (pair plots) using Pandas. Note that Pandas uses Matplotlib to draw the scatter matrix.
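As a minimal sketch of what a scatter matrix call can look like, the example below uses made-up data; the column names and values are assumptions for illustration only, and the Agg backend is selected just so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical example data: three numeric columns with random values
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, 50),
    "weight": rng.normal(70, 8, 50),
    "age": rng.integers(18, 65, 50),
})

# One panel per pair of columns; the diagonal shows a histogram of each column
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
```

The call returns a grid of Matplotlib axes, one per pair of columns, so the usual Matplotlib tools can be used to adjust labels or save the figure.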
In this Python tutorial, we will learn how to get the absolute value in Python. First, we will use the function abs() to do this. In this section, we will go through a couple of examples of how to get the absolute value. Second, we will import data with Pandas and use the abs method to get the absolute values in a Pandas dataframe.
For those of us who prefer audio-visual tutorials, there’s also a YouTube video explaining the content of this absolute value in Python tutorial (check the end of the post).
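To make the two approaches concrete, here is a small sketch: the built-in abs() on single numbers, and the abs method on a dataframe. The dataframe is built inline with made-up values rather than imported from a file, so the column names are assumptions.

```python
import pandas as pd

# Built-in abs() works on integers and floats
print(abs(-7))     # 7
print(abs(-3.5))   # 3.5

# Hypothetical dataframe with negative values in both columns
df = pd.DataFrame({"change": [-2.5, 1.0, -0.75], "count": [-1, 4, -3]})

# DataFrame.abs() returns a new dataframe with the absolute value of every element
abs_df = df.abs()
print(abs_df)
```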
Python Absolute Value Tutorial
In this short Python Pandas tutorial, we will learn how to convert a Pandas dataframe to a NumPy array. Specifically, we will learn how easy it is to transform a dataframe to an array using the two methods values and to_numpy, respectively. Furthermore, we will also learn how to import data from an Excel file and change this data to an array.
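A quick sketch of both approaches, using a small made-up dataframe instead of an Excel file (the column names are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with one integer and one float column
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.5, 6.0]})

# Option 1: the values attribute
arr_values = df.values

# Option 2: the to_numpy method
arr_numpy = df.to_numpy()

print(type(arr_numpy))
print(arr_numpy.shape)
```

Both give the same two-dimensional array here. Because the columns have mixed numeric types, NumPy picks a common type (float) for the whole array.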
In this brief Python Pandas tutorial, we will go through the steps of creating a dataframe from a dictionary. Specifically, we will learn how to convert a dictionary to a Pandas dataframe in 3 simple steps. First, however, we will take a quick look at the syntax. After that, we will go through the easy steps and some extra things. At the end, there’s a YouTube video and a link to the Jupyter Notebook containing all the example code from this post.
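The basic syntax can be sketched as follows, assuming a dictionary whose keys are column names and whose values are equal-length lists (the data here is made up for illustration):

```python
import pandas as pd

# Hypothetical dictionary: keys become column names, lists become column values
data = {
    "name": ["Ann", "Bob", "Cy"],
    "score": [88, 92, 79],
}

# Pass the dictionary straight to the DataFrame constructor
df = pd.DataFrame(data)

# DataFrame.from_dict gives the same result for this kind of dictionary
df_from_dict = pd.DataFrame.from_dict(data)

print(df)
```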
In this short post, we will learn 6 methods to get the column names from a Pandas dataframe. One of the nice things about Pandas dataframes is that each column has a name (i.e., the variables in the dataset). We can use these names to access specific columns without having to know their column numbers.
To access the column names of a Pandas dataframe, we can use the columns attribute (note that it is an attribute, not a method, so there are no parentheses). For example, if our dataframe is called df, we just type print(df.columns) to get all the columns of the Pandas dataframe.
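Here is a short sketch of a few common ways to get the names, using a made-up dataframe. These are not necessarily the six methods covered in the post.

```python
import pandas as pd

# Hypothetical dataframe with three named columns
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 0.9]})

# The columns attribute returns an Index object holding the names
print(df.columns)

# Convert the Index to a plain Python list
column_names = list(df.columns)

# Iterating over a dataframe yields its column names, so list(df) also works
also_names = list(df)

print(column_names)
```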
In this post, we are going to learn how to plot histograms with Pandas in Python. Specifically, we are going to learn 3 simple steps to make a histogram with Pandas. Now, plotting a histogram is a good way to explore the distribution of our data.
Note, at the end of this post there’s a YouTube tutorial explaining the simple steps to plot a Histogram with Pandas.
First of all, and quite obviously, we need to have Python 3.x and Pandas installed to be able to create a histogram with Pandas. Python and Pandas will already be installed if we have a scientific Python distribution, such as Anaconda or ActivePython. Alternatively, Pandas can be installed, like many Python packages, using pip: pip install pandas.
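With Pandas (and Matplotlib, which Pandas uses for plotting) installed, a histogram can be sketched as below. The data and the column name are made up, and the Agg backend is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import pandas as pd

# Hypothetical data: ten made-up scores
df = pd.DataFrame({"score": [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]})

# Plot a histogram of one column with 5 bins; returns a Matplotlib Axes
ax = df["score"].plot.hist(bins=5)
ax.set_xlabel("score")
```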
In this short post, we will learn how to save Seaborn plots to a range of different file formats. More specifically, we will learn how to use the plt.savefig method to save plots made with Seaborn to:
- Portable Network Graphics (PNG)
- Portable Document Format (PDF)
- Encapsulated Postscript (EPS)
- Tagged Image File Format (TIFF)
- Scalable Vector Graphics (SVG)
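As a rough sketch of how saving to several formats can look: the example below uses a plain Matplotlib scatter plot as a stand-in for a Seaborn plot (an assumption made to keep the example dependency-light). Since Seaborn draws onto Matplotlib figures, the same plt.savefig call applies. The file names and the temporary output folder are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Stand-in plot; a Seaborn call such as sns.scatterplot would draw on the same figure
fig, ax = plt.subplots()
ax.scatter([1, 2, 3, 4], [2, 4, 1, 3])

# Hypothetical output folder; the file extension decides the output format
out_dir = tempfile.mkdtemp()
saved_files = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"plot.{ext}")
    plt.savefig(path, dpi=300)  # dpi matters for raster formats such as PNG
    saved_files.append(path)

print(saved_files)
```

EPS and TIFF follow the same pattern with the "eps" and "tiff" extensions; TIFF output may need the Pillow package to be installed.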
In this post, we will learn how to use Pandas drop_duplicates() to remove duplicate records and combinations of columns from a Pandas dataframe. That is, we will delete duplicate data and only keep the unique values.
This Pandas tutorial will cover the following: what’s needed to follow the tutorial, importing Pandas, and how to create a dataframe from a dictionary. After this, we will get into how to use Pandas drop_duplicates() to drop duplicate rows and duplicate columns.
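A minimal sketch of the idea, with a made-up dataframe created from a dictionary (the names and cities are assumptions):

```python
import pandas as pd

# Hypothetical data: the first and third rows are identical
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cy"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
})

# Drop rows that are duplicated across all columns, keeping the first occurrence
unique_rows = df.drop_duplicates()

# Drop rows that repeat a value in the city column only
unique_cities = df.drop_duplicates(subset=["city"], keep="first")

print(unique_rows)
print(unique_cities)
```

By default the original row labels are kept, so the result can be followed by reset_index(drop=True) if a fresh index is wanted.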
In this post, we will learn how to use Pandas get_dummies() method to create dummy variables in Python. Dummy variables (or binary/indicator variables) are often used in statistical analyses as well as in more simple descriptive statistics. Towards the end of the post, there’s a link to a Jupyter Notebook containing all Pandas get_dummies() examples.
Dummy Coding for Regression Analysis
One statistical analysis in which we may need to create dummy variables is regression analysis. In fact, regression analysis requires numerical variables, which means that when we, whether doing research or just analyzing data, wish to include a categorical variable in a regression model, supplementary steps are required to make the results interpretable.
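To illustrate, here is a small sketch of dummy coding a categorical column with get_dummies(). The data and column names are made up. Setting drop_first=True drops one level so it acts as the reference category, which is a common choice for regression models.

```python
import pandas as pd

# Hypothetical data: one categorical predictor and one numeric outcome
df = pd.DataFrame({
    "group": ["a", "b", "c", "a"],
    "y": [1.0, 2.0, 3.0, 1.5],
})

# Create 0/1 indicator columns for group; level "a" becomes the reference category
dummies = pd.get_dummies(df, columns=["group"], drop_first=True, dtype=int)

print(dummies)
```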
In this post, we are going to learn how to read Stata (.dta) files in Python.
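As a minimal sketch, Pandas can read a .dta file with read_stata. So that the example is self-contained, it first writes a small made-up dataframe to a temporary Stata file; the file name and columns are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data, written to a temporary .dta file so there is something to read
original = pd.DataFrame({"id": [1, 2, 3], "income": [10.5, 20.0, 35.25]})
path = os.path.join(tempfile.mkdtemp(), "example.dta")
original.to_stata(path, write_index=False)

# Read the Stata file into a Pandas dataframe
df = pd.read_stata(path)

print(df.head())
```

With a real dataset, only the read_stata line is needed, pointed at the path of the .dta file.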
As previously described (in the read .sav files in Python post), Python is a general-purpose language that can also be used for data analysis and data visualization. One example of data visualization can be found in this post.
One potential downside, however, is that Python is not really user-friendly for data storage. This has, of course, led to our data often being stored using Excel, SPSS, SAS, or similar software. See, for instance, the posts about reading .sav and SAS files in Python: