Author: Erik Marsja

PhD in Psychology, Linköping University. Main interest is experimental and cognitive psychology. Enjoy programming in Python and R.

How to use Pandas read_html to Scrape Data from HTML Tables

In this Pandas tutorial, we will go through the steps of using the Pandas read_html function to scrape data from HTML tables. First, in the simplest example, we will use Pandas to read HTML from a string. Second, we will go through a couple of examples in which we scrape data from Wikipedia tables with read_html. In a previous post about exploratory data analysis in Python, we also used Pandas to read data from HTML tables.
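As a minimal sketch of the first case (reading HTML from a string), something like the following should work. The table content is made up for illustration, and we assume an HTML parser such as lxml is installed alongside Pandas. The string is wrapped in StringIO because recent Pandas versions warn when a literal HTML string is passed directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet, made up for this illustration
html = """
<table>
  <thead>
    <tr><th>Name</th><th>Score</th></tr>
  </thead>
  <tbody>
    <tr><td>Anna</td><td>10</td></tr>
    <tr><td>Bert</td><td>7</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of dataframes, one per <table> it finds
tables = pd.read_html(StringIO(html))

# Our string contains a single table, so we take the first element
df = tables[0]
print(df)
```

Note that read_html always returns a list, even when there is only one table, so we index into it to get the dataframe.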

How to use Pandas Scatter Matrix (Pair Plot) to Visualize Trends in Data

In this Python data visualization tutorial, we will work with the Pandas scatter_matrix function to explore trends in data. Previously, we have learned how to create scatter plots with Seaborn and histograms with Pandas, for instance. In this post, we’ll focus on scatter matrices (pair plots) using Pandas. Note that Pandas uses Matplotlib to draw the scatter matrix.
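To give an idea of what this looks like, here is a small sketch using randomly generated data. The column names and the data are assumptions for illustration only, and the Agg backend is selected just so the example runs without a display (in a Jupyter Notebook that line can be dropped).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data: 50 rows and three numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["x", "y", "z"])

# One scatter plot per pair of columns, histograms on the diagonal
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
```

The function returns an array of Matplotlib axes, one for each pair of columns, which we can use to tweak the individual plots further.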

How to get Absolute Value in Python with abs() and Pandas

In this Python tutorial, we will learn how to get the absolute value in Python. First, we will use the function abs() to do this. In this section, we will go through a couple of examples of how to get the absolute value. Second, we will import data with Pandas and use the abs method to get the absolute values in a Pandas dataframe.
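A short sketch of both approaches could look like this. The numbers and the dataframe are made up for illustration (no data file is imported here).

```python
import pandas as pd

# The built-in abs() works on integers and floats
print(abs(-3))
print(abs(-2.5))

# Made-up dataframe with some negative values
df = pd.DataFrame({"a": [-1, 2, -3], "b": [0.5, -1.5, 2.0]})

# The abs method returns a new dataframe with absolute values
abs_df = df.abs()
print(abs_df)
```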

For those of us who prefer audio-visual tutorials, there’s also a YouTube video explaining the content of this absolute value in Python tutorial (check the end of the post).

Python Absolute Value Tutorial

How to Convert a Pandas DataFrame to a NumPy Array

In this short Python Pandas tutorial, we will learn how to convert a Pandas dataframe to a NumPy array. Specifically, we will learn how easy it is to transform a dataframe to an array using the values attribute and the to_numpy method, respectively. Furthermore, we will also learn how to import data from an Excel file and change this data to an array.
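As a quick sketch, with a small made-up dataframe instead of data from an Excel file, the two approaches look like this:

```python
import pandas as pd

# Made-up dataframe for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# to_numpy is a method, so it is called with parentheses
arr = df.to_numpy()

# values is an attribute, so there are no parentheses
arr_values = df.values

print(arr)
```

Both give the same two-dimensional array here, with one row per dataframe row.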

How to Convert a Python Dictionary to a Pandas DataFrame

In this brief Python Pandas tutorial, we will go through the steps of creating a dataframe from a dictionary. Specifically, we will learn how to convert a dictionary to a Pandas dataframe in 3 simple steps. First, however, we will just look at the syntax. After we have had a quick look at the syntax on how to create a dataframe from a dictionary we will learn the easy steps and some extra things. In the end, there’s a YouTube Video and a link to the Jupyter Notebook containing all the example code from this post.
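To get a feel for the syntax, here is a minimal sketch with a made-up dictionary. It is only an illustration and not necessarily the exact steps from the post.

```python
import pandas as pd

# Made-up dictionary: keys become column names, lists become columns
data = {"name": ["Anna", "Bert", "Cleo"], "age": [31, 25, 42]}

# Pass the dictionary to the DataFrame constructor
df = pd.DataFrame(data)

# DataFrame.from_dict gives the same result for this kind of dictionary
df2 = pd.DataFrame.from_dict(data)

print(df)
```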

How to Get the Column Names from a Pandas Dataframe – Print and List

In this short post, we will learn 6 methods to get the column names from a Pandas dataframe. One of the nice things about Pandas dataframes is that each column has a name (i.e., the variables in the dataset). We can use these names to access specific columns without having to know their position in the dataframe.

To access the column names of a Pandas dataframe, we can use the columns attribute. For example, if our dataframe is called df, we just type print(df.columns) to get all the column names of the Pandas dataframe.
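Here is a small sketch of this, using a made-up dataframe. Note that columns is written without parentheses.

```python
import pandas as pd

# Made-up dataframe for illustration
df = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.9]})

# Prints an Index object holding the column names
print(df.columns)

# Two ways to get the names as a plain Python list
names = list(df.columns)
also_names = df.columns.tolist()
print(names)
```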

How to Plot a Histogram with Pandas in 3 Simple Steps

In this post, we are going to learn how to plot histograms with Pandas in Python. Specifically, we are going to learn 3 simple steps to make a histogram with Pandas. Now, plotting a histogram is a good way to explore the distribution of our data.
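As a rough sketch of what plotting a histogram with Pandas can look like (not necessarily the exact three steps from the post), we can do something like the following. The data and the column name height are made up, and the Agg backend is only selected so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: 200 values drawn from a normal distribution
rng = np.random.default_rng(1)
df = pd.DataFrame({"height": rng.normal(170, 10, size=200)})

# Plot the distribution of one column, using 20 bins
fig, ax = plt.subplots()
df["height"].plot.hist(bins=20, ax=ax)
ax.set_xlabel("height")
```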

Note, at the end of this post there’s a YouTube tutorial explaining the simple steps to plot a Histogram with Pandas.

Prerequisites

First of all, and quite obviously, we need to have Python 3.x and Pandas installed to be able to create a histogram with Pandas. Python and Pandas will already be installed if we have a scientific Python distribution, such as Anaconda or ActivePython. Alternatively, Pandas can be installed, like many Python packages, using Pip: pip install pandas.

How to Read and Write Stata (.dta) Files in R with Haven

In this post, we are going to learn how to read Stata (.dta) files in the R statistical environment. Specifically, we will learn 1) how to import .dta files in R using Haven, and 2) how to write dataframes to a .dta file.

Data Import in R: Reading Stata Files

Now, R is, as we all know, a superb statistical programming environment. When it comes to importing and storing data, we can store our data in the native .rda format. However, we may have a collaborator who uses other statistical software (e.g., Stata) and/or stores their data in different formats (e.g., .dta files).

Now, this is when R shows us its brilliance: as R users, we can load data from a range of file formats, e.g., SAS (.sas7bdat), Stata (.dta), Excel (e.g., .xlsx), and CSV (.csv). On this site there are other tutorials on how to import data from some of these formats:

Before we go on and learn how to read Stata files in R, we will answer the questions: