A common method in experimental psychology is the **within-subjects design**. One way to analyse the data collected using within-subjects designs is **repeated measures ANOVA**. I recently wrote a post on how to conduct a repeated measures ANOVA using Python and rpy2. I wrote that post since the great Python package statsmodels does not include repeated measures ANOVA. However, the approach using rpy2 requires the R statistical environment to be installed. Recently, I found a Python library called pyvttbl with which you can do within-subjects ANOVAs. Pyvttbl enables you to create multidimensional pivot tables, process data, and carry out statistical tests. Using the method anova on pyvttbl’s DataFrame we can carry out repeated measures ANOVA using only Python.

## Descriptive Statistics using Python

## Descriptive Statistics

After data collection, most **Psychology researchers** use different ways to summarise the data. In this tutorial we will learn how to do **descriptive statistics** in **Python**. Python, being a programming language, offers many ways to carry out descriptive statistics.

One useful library for data manipulation and summary statistics is Pandas. Pandas offers an API similar to R's. I think that the dataframe in R is very intuitive to use, and Pandas offers a DataFrame object similar to R's. Also, many Psychology researchers may have experience with R.

Thus, in this tutorial you will learn how to do descriptive statistics using Pandas, but also using NumPy and SciPy. We start by using Pandas to obtain summary statistics and some variance measures. After that we continue with the central tendency measures (e.g., mean and median) using Pandas and NumPy. The harmonic, geometric, and trimmed means cannot be calculated using Pandas or NumPy. For these measures of central tendency we will use SciPy. Towards the end we learn how to get some measures of variability (e.g., variance) using Pandas.

```python
import numpy as np
import pandas as pd  # needed for pd.Series further down
from pandas import DataFrame as df
from scipy.stats import trim_mean, kurtosis
from scipy.stats.mstats import mode, gmean, hmean
```

### Simulate response time data

Response time is often the dependent variable in **experimental psychology**. I will simulate an experiment in which the dependent variable is response time to some arbitrary targets. The simulated data will also have two independent variables (IVs): “iv1” has 2 levels and “iv2” has 3 levels. The data are simulated at the same time as the dataframe is created, and the first descriptive statistics are obtained using the method *describe*.

```python
N = 20
P = ["noise", "quiet"]
Q = [1, 2, 3]

# Mean response time for each combination of iv2 (rows) and iv1 (columns)
values = [[998, 511], [1119, 620], [1300, 790]]

mus = np.concatenate([np.repeat(value, N) for value in values])

# range() works in both Python 2 and 3 (the original used xrange)
data = df(data={'id': list(range(N)) * (len(P) * len(Q)),
                'iv1': np.concatenate([np.array([p] * N) for p in P] * len(Q)),
                'iv2': np.concatenate([np.array([q] * (N * len(P))) for q in Q]),
                'rt': np.random.normal(mus, scale=112.0, size=N * len(P) * len(Q))})
```

#### Descriptive statistics using Pandas

```python
data.describe()
```

Pandas will output summary statistics by using this method. Output is a table, as you can see below.

Typically, a researcher is interested in the descriptive statistics for each level of the IVs. Therefore, I group the data by these. Using describe on the grouped data summarises the data for each level of each IV. As can be seen from the output, it is somewhat hard to read. Note that the method *unstack* is used to get the mean, standard deviation (std), etc. as columns, which makes the output somewhat easier to read.

```python
grouped_data = data.groupby(['iv1', 'iv2'])
grouped_data['rt'].describe().unstack()
```

#### Central tendency

Often we want to know something about the “*average*” or “*middle*” of our data. Using Pandas and NumPy, the two most commonly used measures of central tendency can be obtained: the mean and the median. The mode can also be obtained using Pandas, but I will use methods from SciPy for the mode and the trimmed mean.

#### Mean

There are at least two ways of doing this using our grouped data. First, Pandas has the method mean:

```python
grouped_data['rt'].mean().reset_index()
```

But the method *aggregate* in combination with NumPy's mean can also be used:

```python
grouped_data['rt'].aggregate(np.mean).reset_index()
```

Both methods will give the same output, but the aggregate method has some advantages that I will explain later.

#### Geometric & Harmonic mean

Sometimes the *geometric* or *harmonic* mean can be of interest. These two descriptive statistics can be obtained using the method apply with the functions *gmean* and *hmean* (from SciPy) as arguments. SciPy is needed here because there is no method in Pandas or NumPy that calculates geometric and harmonic means.

##### Geometric

```python
grouped_data['rt'].apply(gmean, axis=None).reset_index()
```

##### Harmonic

```python
grouped_data['rt'].apply(hmean, axis=None).reset_index()
```
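To make clear what these two functions compute, here is a minimal sketch on a small made-up array (the values are illustrative and not taken from the simulated data). It checks SciPy's *gmean* and *hmean* against their textbook definitions written with plain NumPy.

```python
import numpy as np
from scipy.stats import gmean, hmean

# Hypothetical response times in ms (illustrative values only)
rts = np.array([400.0, 500.0, 800.0])

# Geometric mean: exponential of the mean of the logs
geometric = np.exp(np.mean(np.log(rts)))

# Harmonic mean: number of values divided by the sum of the reciprocals
harmonic = len(rts) / np.sum(1.0 / rts)

print(np.isclose(geometric, gmean(rts)))
print(np.isclose(harmonic, hmean(rts)))

# For positive data: harmonic <= geometric <= arithmetic mean
print(harmonic <= geometric <= rts.mean())
```

Both means only make sense for positive values, which is usually the case for response times.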

#### Trimmed mean

Trimmed means are, at times, used. Pandas and NumPy do not seem to have methods for obtaining the *trimmed mean*. However, we can use the function *trim_mean* from SciPy. By using apply on our grouped data we can call the function (‘trim_mean’) with an argument that removes 10 % of the largest and 10 % of the smallest values.

```python
trimmed_mean = grouped_data['rt'].apply(trim_mean, .1)
trimmed_mean.reset_index()
```
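As a small illustration of what the trimming does, the sketch below uses ten made-up response times, one of which is an outlier. With a cut proportion of 0.1, *trim_mean* drops one value from each end here, which is the same as sorting the data and averaging the middle eight values by hand.

```python
import numpy as np
from scipy.stats import trim_mean

# Ten hypothetical response times (ms); the last one is an outlier
rts = np.array([480, 495, 500, 505, 510, 515, 520, 530, 540, 2000], dtype=float)

# proportiontocut=0.1 removes 10 % of the observations from each end,
# i.e. one value from the bottom and one from the top in this case
trimmed = trim_mean(rts, 0.1)

# The same result computed by hand: sort, drop the extremes, average
by_hand = np.sort(rts)[1:-1].mean()

print(rts.mean(), trimmed, by_hand)
```

The ordinary mean is pulled upwards by the outlier, whereas the trimmed mean stays close to the bulk of the data.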

Output from the mean values above (trimmed, harmonic, and geometric means):

#### Median

The *median* can also be obtained using two methods:

```python
grouped_data['rt'].median().reset_index()
grouped_data['rt'].aggregate(np.median).reset_index()
```

#### Mode

There is a method (i.e., pandas.DataFrame.mode()) for getting the mode for a DataFrame object. However, it cannot be used on the grouped data so I will use mode from SciPy:

```python
grouped_data['rt'].apply(mode, axis=None).reset_index()
```

Most of the time I would probably want to see several measures at the same time. Luckily, aggregate enables us to use many NumPy and SciPy functions. In the example below the median, standard deviation (*std*), mean, and trimmed mean are all in the same output. Note that we will have to add the trimmed means afterwards.

```python
descr = grouped_data['rt'].aggregate([np.median, np.std, np.mean]).reset_index()
descr['trimmed_mean'] = pd.Series(trimmed_mean.values, index=descr.index)
descr
```
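One possible alternative, assuming pandas 0.25 or later, is named aggregation, where each keyword becomes an output column. This is only a sketch on a small made-up data set (the frame `toy` and its values are hypothetical), but it shows that a lambda wrapping *trim_mean* can sit next to the built-in statistics in a single call.

```python
import numpy as np
import pandas as pd
from scipy.stats import trim_mean

# Small made-up data set with one grouping variable (illustrative only)
toy = pd.DataFrame({
    'iv1': ['noise'] * 5 + ['quiet'] * 5,
    'rt': [900.0, 950.0, 1000.0, 1050.0, 1600.0,
           480.0, 500.0, 520.0, 540.0, 560.0],
})

# Named aggregation: keyword = output column, value = function or its name
summary = toy.groupby('iv1')['rt'].agg(
    median='median',
    std='std',
    mean='mean',
    # Cut 20 % from each end, i.e. one value per end with five observations
    trimmed_mean=lambda x: trim_mean(x, 0.2),
).reset_index()

print(summary)
```

The same pattern should carry over to the grouped response time data above by grouping on both IVs instead of one.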

### Measures of variability

Central tendency (e.g., the mean & median) is not the only type of summary statistic that we want to calculate. We will probably also want to have a look at a measure of the variability of the data.

#### Standard deviation

```python
grouped_data['rt'].std().reset_index()
```

#### Inter quartile range

Note that unstack() is used here as well, to get the quantiles as columns so that the output is easier to read.

```python
grouped_data['rt'].quantile([.25, .5, .75]).unstack()
```
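The quantiles give the quartiles, and the interquartile range itself is the difference between the third and the first quartile. A minimal sketch on made-up data (the frame `toy` is hypothetical) could look like this:

```python
import pandas as pd

# Made-up response times for two conditions (illustrative only)
toy = pd.DataFrame({
    'iv1': ['noise'] * 5 + ['quiet'] * 5,
    'rt': [900.0, 950.0, 1000.0, 1050.0, 1100.0,
           480.0, 500.0, 520.0, 540.0, 560.0],
})

# First and third quartile per group, with the quantiles as columns
quartiles = toy.groupby('iv1')['rt'].quantile([.25, .75]).unstack()

# Interquartile range: third quartile minus first quartile
quartiles['iqr'] = quartiles[0.75] - quartiles[0.25]

print(quartiles)
```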

#### Variance

```python
grouped_data['rt'].var().reset_index()
```
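One detail worth knowing when mixing Pandas and NumPy: the Pandas methods var and std divide by n - 1 by default (ddof=1), whereas NumPy's var and std divide by n (ddof=0) when applied to a plain array. The sketch below uses made-up values to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical response times (illustrative values only)
rts = pd.Series([480.0, 500.0, 520.0, 540.0, 560.0])

sample_var = rts.var()                    # pandas default: ddof=1
population_var = np.var(rts.to_numpy())   # NumPy default: ddof=0

print(sample_var, population_var)

# Asking NumPy for ddof=1 gives the same value as pandas
print(np.isclose(np.var(rts.to_numpy(), ddof=1), sample_var))
```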

That is all. Now you know how to obtain some of the most common descriptive statistics using Python. Pandas, NumPy, and SciPy really make these calculations **almost** as easy as doing them in graphical statistical software such as SPSS. One great advantage of the methods apply and aggregate is that we can pass in other methods or functions to obtain other types of descriptives.

Update: Recently, I learned some methods to explore response times by visualizing the distributions in different conditions: Exploring response time distributions using Python.

I am sorry that the images (i.e., the tables) are so ugly. If you happen to know a good way to output tables and figures from Python (something like Knitr & Rmarkdown) please let me know.

## Six ways to reverse pandas dataframe

In this post we will learn how to **reverse** a pandas dataframe. We start by swapping the first column with the last column and continue with reversing the column order completely. After we have learned how to do that, we continue by reversing the order of the rows. That is, a pandas dataframe can be reversed such that the last column becomes the first or such that the last row becomes the first.
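The excerpt does not show the six approaches themselves, so the following is only a minimal sketch of the three operations described above, using a small made-up dataframe (`frame` and its column names are hypothetical) and positional slicing with iloc.

```python
import pandas as pd

# Small made-up dataframe (illustrative only)
frame = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                      'c': [7, 8, 9], 'd': [10, 11, 12]})

# Swap the first and the last column by reordering the column labels
cols = list(frame.columns)
cols[0], cols[-1] = cols[-1], cols[0]
swapped = frame[cols]

# Reverse the order of all columns
reversed_columns = frame.iloc[:, ::-1]

# Reverse the order of the rows and renumber the index
reversed_rows = frame.iloc[::-1].reset_index(drop=True)

print(list(swapped.columns))
print(list(reversed_columns.columns))
print(reversed_rows['a'].tolist())
```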

## Why Spyder is the Best Python IDE for Science

Spyder is the **best Python IDE** that I have tested so far for doing **data analysis**, but also for plain programming. In this post I will start by briefly describing the IDE. Following the description of this top IDE, the text will continue with a discussion of my favorite features. You will also find out how to install Spyder on Ubuntu 14.04, and at the end of the post you will find a comparison of Rodeo (a newer, more RStudio-like IDE) and Spyder.

When I started programming in Python I used IDLE, which is the IDE that you get with your installation of Python (e.g., on Windows computers). I actually used IDLE for some time, until I started to learn R and found the RStudio IDE. I thought that RStudio was great (and it still is!). However, after learning R and RStudio I started to look for a better Python IDE.

## Every Psychologist Should Learn Programming

The aim of this post is to show you why you, as a psychology student or researcher (or any other kind of researcher or student), should learn to program. The post is structured as follows. First, I discuss why you should learn programming and then give some examples of when programming skills are useful. I continue by suggesting two programming languages that I think all Psychology students and researchers should learn.

## R resources for Psychologists

**Good resources** for learning R as a Psychologist are hard to find. By that I mean that there are so many great sites and blogs around the internet for learning R that it may be hard to find learning resources that target Psychology researchers. Recently I wrote about 4 **good R books** targeted at **Psychology students** and researchers (i.e., R books for Psychologists). There are, of course, other good resources for Psychological researchers to learn **R programming**. Therefore, this post will list some of the **best blogs** and **sites** to learn R. The post will be divided into two categories: **general** and Psychology-focused R sites and blogs. For those who are not familiar with R, I will start with a brief introduction to what R is (if you know R already, click here to skip to the links).

## What is R?

R is a **free** and **open source** programming language and environment. **Data analysis** in R is carried out by writing **scripts** and **functions**. R is a complete, interactive, object-oriented language.

In the R statistical environment you are able to carry out a variety of statistical and graphical techniques. For instance, linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and many more can be carried out using both frequentist and Bayesian paradigms.

You may want to start with R commander. It provides a menu which may make learning R a bit easier at the beginning. R can be downloaded here: The Comprehensive R Archive Network.

## R resources

R has a broad and helpful community and, therefore, there are many good resources for learning the language.

## Installing Rodeo IDE on Linux

I recently wrote a post on the **RStudio** like **Python** **IDE** **Rodeo** (RStudio-like Python IDEs – Rodeo and Spyder). In that post, I installed and tested Rodeo 0.44. However, Rodeo 1.0 was released in October. Rodeo 1.0 cannot be installed using Pip. Therefore, I wrote a bash script for **downloading** and **unzipping** Rodeo. Note, the script below will now install Rodeo 2 and is tested on my Ubuntu 16.04 machines.

## What is Rodeo?

Rodeo is, as previously mentioned, a Python IDE very similar to RStudio. It is intended for **Data Science**. If you are coming from **R** and plan to add Python to your stack, Rodeo is probably going to be very familiar to you. Given that you have used RStudio, that is. I would still say that Spyder may be a better IDE for doing Data Science in Python. Why? Because, to date, there are many more features in **Spyder** compared to Rodeo. Update: now you will get a .deb file when using wget (see below). Thus, we can use dpkg to install Rodeo.

## Installing Rodeo

```bash
#!/bin/sh

# Download the 64-bit Linux package and install it with dpkg
wget -O rodeo.deb https://www.yhat.com/products/rodeo/downloads/linux_64
sudo dpkg -i rodeo.deb
```

The above code will download the 64-bit Linux .deb package for Rodeo and install it using dpkg. As Jo writes in the comments, it seems like Rodeo is released for 64-bit systems only; there is no 32-bit Rodeo.

Note, you can just copy the code and paste it into a command window. If you, however, save it as a shell script (e.g., install_rodeo.sh) you need to make it executable: *chmod +x install_rodeo.sh*. To download and install Rodeo:

```bash
sh install_rodeo.sh
```

I have tested installing Rodeo IDE on **Ubuntu 14.04 and 16.04** with the above script. If you don’t have Jupyter and Matplotlib installed, you may need to install them as well:

```bash
pip install matplotlib jupyter
```

## Installing Rodeo on Ubuntu/Debian

If you are using Ubuntu or Debian you can add Yhat's repository (source: Yhat Downloads):

```bash
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 33D40BC6
sudo add-apt-repository "deb http://rodeo-deb.yhat.com/ rodeo main"
sudo apt update
sudo apt -y install rodeo
```

That's all. Now you should know how to easily download and install Rodeo 1.0 on your Linux machine. Please let me know if you need to do more than I described in this post.

## RStudio-like Python IDEs – Rodeo and Spyder

In this post I will discuss two **Python** **Integrated Development Environments (IDEs)**: **Rodeo** and **Spyder**. Both IDEs might be useful for researchers used to working with **R** and **RStudio** (a very good and popular IDE for R) because they offer functionality and graphical interfaces similar to RStudio's. That is, Rodeo and Spyder can be seen as the RStudio for Python.

## Arduino plus a vibrating coin motor

Recently we bought 10 coin-like vibrating motors from Precision Microdrives. In the following video you will see one of them in use. It is connected to an Arduino Uno microcontroller.

Update:

Now we have built some wrist bands/straps using the vibrating motors, velcro, silver tape, Arduino, batteries, and more.

Hopefully, we will have a couple of interesting research projects that make use of them in the near future (currently we have two planned). Right now the plan is to use PsychoPy to control them in our experiments. In fact, we have one project planned to start in two weeks and another being piloted this week. One is more applied cognitive psychology/human factors and the other is more basic research. Very exciting. I may update the blog with a new post when we have tested more.

## What programming language should I learn – 2

I recently asked **which programming language I should learn next year (i.e., 2016)**. In this post I will evaluate the alternatives that I got by asking the question in different places around the internet. The post will end with the choice I made and how to install the language.

To summarize my earlier post, I mainly use programming for creating Psychology experiments and, thus, need a powerful language. Furthermore, in Psychological experiments stimuli are typically presented (e.g., sounds, images, text, or video). Responses need to be collected from the keyboard, mouse, and specially built equipment (e.g., via USB; Arduino). For some experiments the timing of the presentation and the collection of responses might be important. The language should, of course, be free, open source, and work on computers running Windows, Linux, and OS X. However, mobile platforms such as smartphones and tablets might also be interesting in the future. Note that all languages considered are more or less general purpose languages and might, therefore, be attractive to anyone who wants to extend their stack and learn a new programming language in 2016.

One of the most valuable answers I got was that I should look for a **functional language**.