Inspired by my post for the JEPS Bulletin (Python programming in Psychology), where I try to show how Python can be used from collecting to analyzing and visualizing data, I have started to learn more data exploring techniques for Psychology experiments (e.g., response time and accuracy). Here are some methods, using Python, for visualization of distributed data that I have learned; kernel density estimation, cumulative distribution functions, delta plots, and conditional accuracy functions. These graphing methods let you explore your data in a way just looking at averages will not (e.g., Balota & Yap, 2011).
Required Python packages
I used the following Python packages; Pandas for data storing/manipulation, NumPy for some calculations, Seaborn for most of the plotting, and Matplotlib for some tweaking of the plots. Any script using these functions should import them:
Python is gaining popularity in many fields of science. This means that there also are many applications and libraries specifically for use in Psychological research. For instance, there are packages for collecting data & analysing brain imaging data. In this post, I have collected some useful Python packages for researchers within the field of Psychology and Neuroscience. I have used and tested some of them but others I have yet to try.
In this post I will describe, shortly, how to use InLine scripts in E-prime to save your data in comma-separated values (CSV) files. For those who are not familiar with E-prime it is an experiment generating software based on visual basic (i.e., it has its own scripting language called e-basic). Its main purpose is to make building experiment easy/easier. It offers a drag-and-drop graphical user interface (GUI) which is fairly easy to use (although I prefer OpenSesame and PsychoPy – which both offers drag-and-drop GUIs). See the Wikipedia article if you want to know more about e-prime.
This guide will assume that you have worked with e-prime before. That is, you should already have a, more or less, ready experiment that you can add the scripts to. In the guide I use a Simon task created in e-prime as an example.
PsychoPy, as I have previously written about (e.g., Free and Useful Software and PsychoPy tutorial) is really a great Python tool for creating Psychology experiments. You can write Python code by either using “code view” or import the package in your favourite IDE. Furthermore, you can use the builder mode and just drag and drop different items and PsychoPy will create a Python Script for you. If needed inline scripts (in Python, of course) can be inserted. That is, you can combine drag-and-drop building with some coding. In this post I have collected some tutorial videos that can be useful for someone unfamiliar with PsychoPy.
I created a playlist with 4 youtube tutorials. In the first video you will learn how to create a classical psychology experiment; the stroop task. The second is more into psycholinguistics and you will learn how to create a language experiment using PsychoPy. In the third you get to know how to create text input fields in PsychoPy. In this tutorial inline Python coding is used so you will also get to know how you may use programming. In the forth video you will get acquainted with using video stimuli in PsychoPy.
Programming for Psychology & Vision Science Tutorial series
Recently I found the following playlist on youtube and it is amazing. In this series of tutorial videos you will learn how to use PsychoPy as a Python package. For instance, it is starting at a very basic level; importing the visual module to create windows. In this video he uses my favourite Python IDE Spyder. The videos are actually screencasts from the course Programming for Psychology & Vision Science and contains 10 videos. The first 5 videos covers drawing stimuli on the screen (i.e., drawing to a video, gratings, shapes, images, dots). Watching these videos you will also learn how to collect responses, providing input, and saving your data.
That was all for now. If you know more good video tutorial for PsychoPy please leave a comment. Preferably the tutorials should cover coding but just building experiments with the builder mode is also fine. I may update my playlist with more PsychoPy tutorials (the first playlist).
Previously I have shown how to analyze data collected using within-subjects designs using rpy2 (i.e., R from within Python) and Pyvttbl. In this post I will extend it into a factorial ANOVA using Python (i.e., Pyvttbl). In fact, we are going to carry out a Two-way ANOVA but the same method will enable you to analyze any factorial design. I start with importing the Python libraries that are going to be use. Continue reading →
An important advantage of the two-way ANOVA is that it is more efficient compared to the one-way. There are two assignable sources of variation – supp and dose in our example – and this helps to reduce error variation thereby making this design more efficient. Two-way ANOVA (factorial) can be used to, for instance, compare the means of populations that are different in two ways. It can also be used to analyse the mean responses in an experiment with two factors. Unlike One-Way ANOVA, it enables us to test the effect of two factors at the same time. One can also test for independence of the factors provided there are more than one observation in each cell. The only restriction is that the number of observations in each cell has to be equal (there is no such restriction in case of one-way ANOVA).
The current post will focus on how to carry out between-subjects ANOVA using Python. As mentioned in an earlier post (Repeated measures ANOVA with Python) ANOVAs are commonly used in Psychology.
We start with some brief introduction on theory of ANOVA. If you are more interested in the four methods to carry out one-way ANOVA with Python click here. ANOVA is a means of comparing the ratio of systematic variance to unsystematic variance in an experimental study. Variance in the ANOVA is partitioned in to total variance, variance due to groups, and variance due to individual differences.
A common method in experimental psychology is within-subjects designs. One way to analysis the data collected using within-subjects designs are using repeated measures ANOVA. I recently wrote a post on how to conduct a repeated measures ANOVA using Python and rpy2. I wrote that post since the great Python package statsmodels do not include repeated measures ANOVA. However, the approach using rpy2 requires R statistical environment installed. Recently, I found a python library called pyvttbl whith which you can do within-subjects ANOVAs. Pyvttbl enables you to create multidimensional pivot tables, process data and carry out statistical tests. Using the method anova on pyvttbl’s DataFrame we can carry out repeated measures ANOVA using only Python. Continue reading →
After data collection, most Psychology researchers use different ways to summarise the data. In this tutorial we will learn how to do descriptive statistics in Python. Python, being a programming language, enables us many ways to carry out descriptive statistics.
One useful library for data manipulation and summary statistics is Pandas. Actually, Pandas offers an API similar to Rs. I think that the dataframe in R is very intuitive to use and Pandas offers a DataFrame method similar to Rs. Also, many Psychology researchers may have experience of R.
Thus, in this tutorial you will learn how to do descriptive statistics using Pandas, but also using NumPy, and SciPy. We start with using Pandas for obtaining summary statistics and some variance measures. After that we continue with the central tenancy measures (e.g., mean and median) using Pandas and NumPy. The harmonic, geometric, and trimmed mean cannot be calculated using Pandas or NumPy. For these measures of central tendency we will use SciPy. Towards the end we learn how get some measures of variability (e.g., variance using Pandas).
frompandas importDataFrame asdf
Simulate response time data
Many times in experimental psychology response time is the dependent variable. I to simulate an experiment in which the dependent variable is response time to some arbitrary targets. The simulated data will, further, have two independent variables (IV, “iv1” have 2 levels and “iv2” have 3 levels). The data are simulated as the same time as a dataframe is created and the first descriptive statistics is obtained using the method describe.
Pandas will output summary statistics by using this method. Output is a table, as you can see below.
Typically, a researcher is interested in the descriptive statistics of the IVs. Therefore, I group the data by these. Using describe on the grouped date aggregated data for each level in each IV. As can be seen from the output it is somewhat hard to read. Note, the method unstack is used to get the mean, standard deviation (std), etc as columns and it becomes somewhat easier to read.
Often we want to know something about the “average” or “middle” of our data. Using Pandas and NumPy the two most commonly used measures of central tenancy can be obtained; the mean and the median. The mode and trimmed mean can also be obtained using Pandas but I will use methods from SciPy.
There are at least two ways of doing this using our grouped data. First, Pandas have the method mean;
But the method aggregate in combination with NumPys mean can also be used;
Both methods will give the same output but the aggregate method have some advantages that I will explain later.
Geometric & Harmonic mean
Sometimes the geometric or harmonic mean can be of interested. These two descriptive statistics can be obtained using the method apply with the methods gmean and hmean (from SciPy) as arguments. That is, there is no method in Pandas or NumPy that enables us to calculate geometric and harmonic means.
Trimmed means are, at times, used. Pandas or NumPy seems not to have methods for obtaining the trimmed mean. However, we can use the method trim_mean from SciPy . By using apply to our grouped data we can use the function (‘trim_mean’) with an argument that will make 10 % av the largest and smallest values to be removed.
Most of the time I probably would want to see all measures of central tendency at the same time. Luckily, aggregate enables us to use many NumPy and SciPy methods. In the example below the standard deviation (std), mean, harmonic mean, geometric mean, and trimmed mean are all in the same output. Note that we will have to add the trimmed means afterwards.
That is all. Now you know how to obtain some of the most common descriptive statistics using Python. Pandas, NumPy, and SciPy really makes these calculation almost as easy as doing it in graphical statistical software such as SPSS. One great advantage of the methods apply and aggregate is that we can input other methods or functions to obtain other types of descriptives.
Update: Recently, I learned some methods to explore response times visualizing the distribution of different conditions: Exploring response time distributions using Python.
I am sorry that the images (i.e., the tables) are so ugly. If you happen to know a good way to output tables and figures from Python (something like Knitr & Rmarkdown) please let me know.
In this post we will learn how to reversepandasdataframe. We start by changing the first column with the last column and continue with reversing the order completely. After we have learned how to do that we continue by reversing the order of the rows. That is, pandas data frame can be reversed such that the last column becomes the first or such that the last row becomes the first.