
Month: November 2018

Exploratory Data Analysis in Python Using Pandas, SciPy, and Seaborn

In this post we will learn how to explore data using Python, Pandas, and Seaborn. The data comes from a Wikipedia article, so we will first learn how to parse data from a URL using Pandas. We will then explore the scraped data by grouping it and visualizing it. More specifically, we will learn how to count missing values, group data to calculate the mean, and visualize relationships between two variables, among other things.
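As a minimal sketch of two of the steps mentioned above, counting missing values and calculating group means, consider the snippet below. The dataframe and its column names (`group`, `score`, `hours`) are made up for illustration and are not the Wikipedia data used in the post.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for a scraped table.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "score": [1.0, 3.0, np.nan, 4.0, 6.0],
    "hours": [2.0, np.nan, 5.0, 7.0, 9.0],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Group the rows and calculate the mean score per group.
# Missing values are skipped by default when computing the mean.
group_means = df.groupby("group")["score"].mean()

print(missing_per_column)
print(group_means)
```

Note that `mean` ignores missing values by default, so it is worth counting them first to know how many observations each mean is based on.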

In previous posts we have used Pandas to import data from Excel and CSV files. In this post, however, we are going to use Pandas read_html, because it supports reading HTML tables from URLs (https or http). To read HTML, Pandas uses one of the Python libraries lxml, html5lib, or BeautifulSoup4. This means that you have to make sure that at least one of these libraries is installed. In the specific Pandas read_html example here, we use BeautifulSoup4 to parse the HTML tables from the Wikipedia article.
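To give a rough idea of how read_html behaves, here is a small sketch that parses an inline HTML string instead of a URL, so it runs without network access. The table content is invented for illustration, and the snippet assumes that one of the parser libraries mentioned above is installed.

```python
from io import StringIO

import pandas as pd

# A made-up HTML table; in the post the HTML comes from a Wikipedia URL instead.
html = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Aland</td><td>100</td></tr>
  <tr><td>Borduria</td><td>250</td></tr>
</table>
"""

# read_html returns a list with one dataframe per table found in the HTML.
tables = pd.read_html(StringIO(html))
df = tables[0]

print(df)
```

When reading from a web page, a URL string can be passed in place of the `StringIO` object, and the list will typically contain several dataframes, one for each table on the page.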

Pandas Read CSV Tutorial: How to Read and Write

In the first section, we will go through, with examples, how to read a CSV file, how to read specific columns from a CSV, how to read multiple CSV files and combine them into one dataframe, and, finally, how to convert data according to specific datatypes (e.g., using the dtype parameter of Pandas read_csv).
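The reading steps listed above could look roughly like the following sketch. The CSV content is made up and read from in-memory strings so that the example is self-contained; with real files, a file path would be passed to read_csv instead.

```python
from io import StringIO

import pandas as pd

# Two hypothetical CSV "files" held in memory.
csv_a = "id,name,score\n1,Ann,3.5\n2,Bob,4.0\n"
csv_b = "id,name,score\n3,Cy,2.5\n"

# Read only specific columns and set their datatypes while reading.
frames = [
    pd.read_csv(
        StringIO(text),
        usecols=["id", "score"],
        dtype={"id": "int32", "score": "float64"},
    )
    for text in (csv_a, csv_b)
]

# Combine the dataframes into one, renumbering the index.
combined = pd.concat(frames, ignore_index=True)

print(combined)
print(combined.dtypes)
```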

In the last section, we will continue by learning how to use Pandas to write CSV files. That is, we will learn how to export dataframes to CSV files.
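As a small illustration of exporting a dataframe, the sketch below writes a made-up dataframe to a CSV file in a temporary directory and reads it back. The file name `scores.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical dataframe to export.
df = pd.DataFrame({"id": [1, 2], "score": [3.5, 4.0]})

# Without a path, to_csv returns the CSV content as a string.
csv_text = df.to_csv(index=False)

# With a path, to_csv writes the file to disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(csv_text)
```

Setting `index=False` leaves out the row index, which otherwise ends up as an extra unnamed column when the file is read again.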

How to use Pandas Sample to Select Rows and Columns

In this tutorial, we will learn how to use Pandas sample to randomly select rows and columns from a Pandas dataframe. There are several reasons for randomly sampling our data; for instance, we may have a very large dataset and want to build our models on a smaller sample of the data. Other examples are bootstrapping and cross-validation. Here we will learn how to select rows at random, set a random seed, sample by group, sample using weights, and sample rows that meet a condition, among other useful things.
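The sampling variants listed above might be sketched as follows. The dataframe is made up, the seed value 42 is arbitrary, and sampling by group through `groupby(...).sample` assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical dataframe with two groups.
df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 6,
    "value": range(10),
})

# Select three rows at random; random_state makes the draw reproducible.
rows = df.sample(n=3, random_state=42)

# Select half of the rows.
half = df.sample(frac=0.5, random_state=42)

# Select one column at random.
cols = df.sample(n=1, axis=1, random_state=42)

# Sample two rows from each group.
per_group = df.groupby("group").sample(n=2, random_state=42)

# Use a column as sampling weights; rows with weight 0 are never drawn.
weighted = df.sample(n=3, weights="value", random_state=42)

# Sample only among rows that meet a condition.
conditional = df[df["value"] > 4].sample(n=2, random_state=42)
```

Calling `df.sample(n=3, random_state=42)` a second time returns the same rows, which is what setting a seed is for.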

Pandas Excel Tutorial: How to Read and Write Excel files

In this Pandas Excel tutorial, we will learn how to work with Excel files in Python. It will provide an overview of how to use Pandas to load xlsx files and write spreadsheets to Excel.

In the first section, we will go through, with examples, how to use Pandas read_excel to read an Excel file, read specific columns from a spreadsheet, and read multiple spreadsheets and combine them into one dataframe. Furthermore, we are going to learn how to read many Excel files, and how to convert data according to specific datatypes (e.g., using Pandas dtypes).


When we have done this, we will continue by learning how to use Pandas to write Excel files, including how to name the sheets and how to write to multiple sheets.
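A compact round trip can illustrate both the writing and the reading steps. In this sketch the workbook is written to an in-memory buffer rather than to a file on disk, the data and the sheet names `Q1` and `Q2` are invented, and the openpyxl library is assumed to be installed.

```python
from io import BytesIO

import pandas as pd

# Hypothetical dataframes, one per sheet.
sales_q1 = pd.DataFrame({"id": [1, 2], "amount": [10.5, 20.0]})
sales_q2 = pd.DataFrame({"id": [3], "amount": [7.25]})

# Write to multiple named sheets in one workbook.
buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    sales_q1.to_excel(writer, sheet_name="Q1", index=False)
    sales_q2.to_excel(writer, sheet_name="Q2", index=False)

# Read all sheets back; sheet_name=None returns a dict of dataframes.
buffer.seek(0)
sheets = pd.read_excel(
    buffer,
    sheet_name=None,
    usecols=["id", "amount"],
    dtype={"id": "int64", "amount": "float64"},
    engine="openpyxl",
)

# Combine the sheets into one dataframe.
combined = pd.concat(list(sheets.values()), ignore_index=True)

print(combined)
```

To work with a file on disk, a path such as a hypothetical `sales.xlsx` can be used in place of the buffer in both `ExcelWriter` and `read_excel`.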

Data Manipulation with Pandas: A Brief Tutorial

Learn three data manipulation techniques with Pandas in this guest post by Harish Garg, a software developer and data analyst, and the author of Mastering Exploratory Analysis with pandas.

Modifying a Pandas DataFrame Using the inplace Parameter

In this section, you’ll learn how to modify a DataFrame using the inplace parameter. You’ll first read a real dataset into Pandas. You’ll then see how the inplace parameter affects the result of a method call. You’ll also execute methods with and without the inplace parameter to demonstrate its effect.
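The difference can be sketched with a tiny made-up DataFrame (not the real dataset used in the guest post), using `rename` as the example method:

```python
import pandas as pd

# Hypothetical DataFrame for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# Without inplace, the method returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns={"age": "years"})
columns_after_copy = list(df.columns)

# With inplace=True, df itself is modified and the method returns None.
result = df.rename(columns={"age": "years"}, inplace=True)
columns_after_inplace = list(df.columns)

print(columns_after_copy)
print(result)
print(columns_after_inplace)
```

Because a call with `inplace=True` returns None, assigning its result back to the variable (for example `df = df.rename(..., inplace=True)`) would discard the DataFrame.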