Press "Enter" to skip to content

Tag: Pandas

How to Read & Write SPSS Files in Python using Pandas

In this post we are going to learn 1) how to read SPSS (.sav) files in Python, and 2) how to write to SPSS (.sav) files using Python. 

Python is a great general-purpose language as well as for carrying out statistical analysis and data visualization. However, Python is not really user-friendly for data storage. Thus, often our data will be archived using Excel, SPSS or similar software.

Python MANOVA Made Easy using Statsmodels

In previous posts, we learned how to use Python to detect group differences on a single dependent variable. However, there may be situations in which we are interested in several dependent variables. In these situations, the simple ANOVA model is inadequate.

One way to examine multiple dependent variables using Python would, of course, be to carry out multiple ANOVA. That is, one ANOVA for each of these dependent variables. However, the more tests we conduct on the same data, the more we inflate the family-wise error rate (the greater chance of making a Type I error).

This is where MANOVA comes in handy. MANOVA, or Multivariate Analysis of Variance, is an extension of Analysis of Variance (ANOVA). However, when using MANOVA we have two, or more, dependent variables.

MANOVA and ANOVA is similar when it comes to some of the assumptions. That is, the data have to be:

  • normally distributed dependent variables
  • equal covariance matrices)

In this post will learn how carry out MANOVA using Python (i.e., we will use Pandas and Statsmodels). Here, we are going to use the Iris dataset which can be downloaded here.

The Easiest Data Cleaning Method using Python & Pandas

In this post we are going to learn how to do simplify our data preprocessing work using the Python package Pyjanitor. More specifically, we are going to learn how to:

  • Add a column to a Pandas dataframe
  • Remove missing values
  • Remove an empty column
  • Cleaning up column names

That is, we are going to learn how clean Pandas dataframes using Pyjanitor. In all Python data manipulation examples, here we are also going to see how to carry out them using only Pandas functionality.

How to Read and Write JSON Files using Python and Pandas

In this post, we will learn how to read and write JSON files using Python. In the first part, we are going to use the Python package json to create a JSON file and write a JSON file. After that, we are going to use Pandas json method to load JSON files into Pandas dataframe. Here, we will learn how to read from a JSON file locally and from an URL as well as how to read a nested JSON file using Pandas.

Finally, as a bonus, we will also learn how to manipulate data in Pandas dataframes, rename columns, and plot the data using Seaborn.

Exploratory Data Analysis in Python Using Pandas, SciPy, and Seaborn

In this post we are going to learn how to explore data using Python, Pandas, and Seaborn. The data we are going to explore is data from a Wikipedia article. In this post we are actually going to learn how to parse data from a URL using Python Pandas. Furthermore, we are going to explore the scraped data by grouping it and by Python data visualization. More specifically, we will learn how to count missing values, group data to calculate the mean, and then visualize relationships between two variables, among other things.

In previous posts we have used Pandas to import data from Excel and CSV files. In this post, however, we are going to use Pandas read_html, because it has support for reading data from HTML from URLs (https or http). To read HTML Pandas use one of the Python libraries LXML, Html5Lib, or BeautifulSoup4. This means that you have to make sure that at least one of these libraries are installed. In the specific Pandas read_html example here, we use BeautifulSoup4 to parse the html tables from the Wikipedia article.

Pandas Read CSV Tutorial: How to Read and Write

In this tutorial, we will learn how to work with comma separated (CSV) files in Python and Pandas. We will get an overview of how to use Pandas to load CSV to dataframes and how to write dataframes to CSV.

In the first section, we will go through, with examples, how to read a CSV file, how to read specific columns from a CSV, how to read multiple CSV files and combine them to one dataframe, and, finally, how to convert data according to specific datatypes (e.g., using Pandas read_csv dtypes). In the last section we will continue by learning how to write CSV files. That is, we will learn how to export dataframes to CSV files.

How to use Pandas Sample to Select Rows and Columns

In this tutorial we will learn how to use Pandas sample to randomly select rows and columns from a Pandas dataframe. There are some reasons for randomly sample our data; for instance, we may have a very large dataset and want to build our models on a smaller sample of the data. Other examples are when carrying out bootstrapping or cross-validation. Here we will learn how to; select rows at random, set a random seed, sample by group, using weights, and conditions, among other useful things.

Data Manipulation with Pandas: A Brief Tutorial

Learn three data manipulation techniques with Pandas in this guest post by Harish Garg, a software developer and data analyst, and the author of Mastering Exploratory Analysis with pandas.

Modifying a Pandas DataFrame Using the inplace Parameter

In this section, you’ll learn how to modify a DataFrame using the inplace parameter. You’ll first read a real dataset into Pandas. You’ll then see how the inplace parameter impacts a method execution’s end result. You’ll also execute methods with and without the inplace parameter to demonstrate the effect of inplace.

A Basic Pandas Dataframe Tutorial for Beginners

In this Pandas tutorial we will learn how to work with Pandas dataframes. More specifically, we will learn how to read and write Excel (i.e., xlsx) and CSV files using Pandas.

We will also learn how to add a column to Pandas dataframe object, and how to remove a column. Finally, we will learn how to subset and group our dataframe.

If you are not familiar with installing Python packages I have recorded a YouTube video explaining how to install Pandas. There’s also a playlist with videos towards the end of the post with videos of all topics covered in this post.