The aim of this post is to show you why you, as a psychology student or researcher (or any other kind of researcher or student), should learn to program. The post is structured as follows. First, I discuss why you should learn programming and give some examples of when programming skills are useful. I then suggest two programming languages that I think all Psychology students and researchers should learn.
Why should Psychologists learn programming?
Everyone should learn computer programming. It should be taught to our kids, in our schools. There are so many benefits of programming that I will only write about the ones I find most important. Writing code (i.e., programming) can be seen as applied mathematics and science. Some programming languages make you think algorithmically. Programming teaches you an iterative approach to solving problems and testing out your ideas. Most of the time it is challenging, fun, and, on top of this, it can make your life easier!
For instance, using simple scripts you can automate things such as extracting information from your PDFs. This information may be used to rename the PDFs and, if you’d like, organize them in folders. Apart from making your everyday life easier (e.g., by automating everyday tasks), there are of course more reasons for learning programming. This post is going to focus on some of these reasons, with an emphasis on when psychologists have use for coding skills.
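To give a feel for the PDF housekeeping mentioned above, here is a minimal sketch using only Python's standard library. Note the assumptions: the function names (`organize_pdfs`, `info_from_filename`) are made up for this illustration, and the actual extraction step is only a stand-in that guesses an author and a year from the file name. Reading text or metadata from inside a PDF would require a dedicated PDF library, which is not shown here.

```python
import re
import shutil
from pathlib import Path


def info_from_filename(pdf):
    """Stand-in extractor (an assumption for this sketch): look for a name
    followed by a four-digit year in the file name, e.g. 'smith 2015 draft.pdf'.
    Returns (author, year), or None if no such pattern is found."""
    match = re.search(r"([A-Za-z]+)\D*((?:19|20)\d{2})", pdf.stem)
    if match is None:
        return None
    return match.group(1).capitalize(), int(match.group(2))


def organize_pdfs(folder, extract_info=info_from_filename):
    """Rename each PDF in `folder` to '<author>_<year>.pdf' and move it
    into a subfolder named after the year. Returns the new paths."""
    folder = Path(folder)
    moved = []
    for pdf in sorted(folder.glob("*.pdf")):
        info = extract_info(pdf)
        if info is None:
            continue  # leave files we cannot identify untouched
        author, year = info
        target_dir = folder / str(year)
        target_dir.mkdir(exist_ok=True)
        target = target_dir / f"{author}_{year}.pdf"
        if target.exists():
            continue  # never overwrite an existing file
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

Calling `organize_pdfs("path/to/my/papers")` would sort the PDFs in that (hypothetical) folder. Because the extractor is passed in as a function, you could later swap the file-name guesser for something that reads the PDF itself without changing the rest of the script.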
I am a Ph.D. student with a focus on Cognitive Psychology (BSc and MSc in Cognitive Science). Naturally, my views on programming for researchers and students are colored by my discipline. However, I think that everyone should know about programming. Many of the researchers and Ph.D. students I know who conduct experiments (e.g., in Psychology and Neuroscience) do some programming. Some do easier stuff such as using SPSS syntax and Mplus. Others use more advanced coding in Matlab, E-prime, Python, or C.
If you are planning on graduate studies (i.e., aiming for a Master’s or a Ph.D.) in Psychology or another of the cognitive sciences, programming is almost essential. I learned this when doing my Master’s thesis. There was no time for my supervisor to create an experiment in E-prime for me. After my Master’s, I started to look for Ph.D. positions, and many of the ads required knowledge of Matlab, Python, or R.
When are programming skills useful?
Many Psychology researchers, and students doing projects and their theses, use some software for collecting data. Although there are many graphical interfaces (e.g., E-prime & Presentation), you will most likely need to use some scripting language to solve some issues. For instance, the graphical interfaces will probably not be able to do pseudo-randomization. I would say studying cognition or perception at the graduate level will require you to have some scripting or coding skills.
There is also an emerging interest in doing psychological research on social media (e.g., how different personality types are expressed in Facebook behavior). The flexibility of programming languages may also open doors to research projects that cannot be reached with common statistical software (e.g., Stata & SAS). Data can be spread across thousands of text documents or available around the web (e.g., social media).
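To make the pseudo-randomization example concrete, here is a minimal sketch in plain Python. It assumes one common constraint, namely that the same condition may not appear more than `max_repeats` times in a row, and it simply reshuffles until that constraint holds. The function names and the condition labels are made up for this illustration. Your own experiment may need other constraints.

```python
import random


def longest_run(sequence):
    """Return the length of the longest run of identical neighbours."""
    longest = current = 0
    previous = object()  # sentinel that equals nothing in the sequence
    for item in sequence:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def pseudo_randomize(trials, max_repeats=2, seed=None, max_attempts=10_000):
    """Shuffle `trials` until no condition repeats more than
    `max_repeats` times in a row (simple rejection sampling)."""
    rng = random.Random(seed)  # a seed makes the order reproducible
    order = list(trials)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if longest_run(order) <= max_repeats:
            return order
    raise RuntimeError("No valid order found; try relaxing max_repeats.")


# Example: 10 congruent and 10 incongruent trials (hypothetical labels)
trials = ["congruent"] * 10 + ["incongruent"] * 10
order = pseudo_randomize(trials, max_repeats=2, seed=1)
```

Reshuffling until the constraint is met is not the most efficient approach, but it is easy to read and works fine for the trial counts of a typical experiment.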
There is lots of software available for data analysis: spreadsheets like Excel; batch-oriented, procedure-based systems like SAS; and point-and-click GUI-based systems like SPSS, Stata, and Statistica. An open-source programming language (e.g., R or Python) is typically free of charge and offers greater flexibility. Using a programming language, you do data analysis by writing functions and scripts.
Although the learning curve may be steeper, it is a very natural and expressive method for conducting data analysis. There are several positive aspects of writing scripts: a script documents all your work, and you can automate sequences of tasks. That is, you can easily follow what you did last year (i.e., it is in your script) and save time if you are running a similar analysis on many experiments.
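As an illustration of running the same analysis on many experiments, here is a minimal sketch using only Python's standard library. It assumes each experiment is stored as a CSV file with the columns `condition` and `rt` (reaction time). Both the column names and the function names are hypothetical, so adapt them to your own data.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean


def mean_rt_by_condition(csv_path):
    """Return the mean reaction time per condition for one experiment."""
    rts = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            rts[row["condition"]].append(float(row["rt"]))
    return {condition: mean(values) for condition, values in rts.items()}


def analyse_all(data_dir):
    """Run the same analysis on every CSV file in `data_dir`."""
    return {
        path.name: mean_rt_by_condition(path)
        for path in sorted(Path(data_dir).glob("*.csv"))
    }
```

With a (hypothetical) folder called `data`, `analyse_all("data")` would return one summary per file. The script itself is the documentation of what was done, and adding a new experiment only means dropping another file into the folder.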
Another very important aspect of doing analysis in an open-source and free programming language is that it makes reproducibility easier. Your script documents every step of your process and anyone can download and install the software needed to run your analysis. I would say that if you are a proponent of Open Science you should learn a programming language for doing your analysis.
Here are some example blog posts on data analysis using Python:
- Repeated Measures ANOVA using Python
- Exploratory Data Analysis in Python
- Four Ways to Carry out One-Way ANOVA using Python
What language should I learn?
When people discuss beginner programming languages and which languages are easier and quicker to learn, Python inevitably comes up. It was created by Guido van Rossum and is managed by the non-profit Python Software Foundation. The language is open source and free, even for commercial applications. Python is usually used and referred to as a scripting language. It is a high-level and general-purpose programming language. Thanks to its flexibility, Python is one of the most popular programming languages (e.g., number 3 on the TIOBE Index for November 2019). It has full support for both object-oriented programming and structured programming.
Why should I learn Python?
Why Python? First, it is open source and free. My personal experience is that the Python language is just far better than Matlab’s weird language. Furthermore, it integrates better with other languages (e.g., C/C++). Most important, however, is that there is a variety of both general-purpose and specialized Python libraries. This means you can do data collection (e.g., scrape the web or program psychological experiments) using Python. You can also analyze your data using both common and more advanced statistical methods. Note that it may be more complicated to install Python. However, there are scientifically focused distributions that have a lot of the libraries that you will want to use. Personally, I have only used Anaconda and Python(x, y) but there is also Canopy.
A few Useful Python Libraries for Psychologists
There are, of course, so many useful Python libraries that it would take at least a separate post to list them all. However, here are a few I have found useful.
Python for Data Collection:
- OpenSesame is an easy-to-install package that works on Linux, OS-X, Windows, and Android (tablets, phones, and computers). The application offers a graphical user interface (GUI) for creating experiments. It requires minimal coding but lets you write in-line scripts. See OpenSesame Tutorial: Using Image Stimuli for an example.
- PsychoPy is also simple to install and cross-platform (Linux, OS-X, and Windows). It promises precision timing, and it has a lot of different types of stimuli ready to use. PsychoPy offers both an Application Programming Interface (API) and a graphical interface for drag-and-drop creation of experiments. You can, of course, combine both.
Python for Data Analysis:
- Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Make sure to check out my Python Pandas dataframe tutorial!
- Statsmodels is a module that allows users to explore data, estimate statistical models, and perform statistical tests. Some of the features of Statsmodels are linear regression, generalized linear models, functions for plotting, and tools for outputting results in tables in formats such as text, LaTeX, and HTML. See the post on how to carry out Repeated Measures ANOVA using Statsmodels for an example of how to use Statsmodels.
Note that if you install Anaconda, Canopy, or Python(x, y) you will get Pandas and Statsmodels and many more useful libraries. Python(x, y) also includes Spyder IDE.
If you are interested in Bayesian statistics:
- PyStan is a Python interface to Stan: you specify your model and send it to Stan from Python. Stan is a package for Bayesian statistics using the No-U-Turn sampler.
- PyMC is a module that implements Bayesian statistical models and fitting algorithms. It includes Markov chain Monte Carlo. PyMC includes methods for summarizing output, plotting, goodness-of-fit, and convergence diagnostics.
Make sure to check out the guest post, Probabilistic Programming in Python (Bayesian Data Analysis), for examples using PyMC3.
Python for Data Visualization:
- Matplotlib is a data visualization library that is quite easy to use and the plots are very modifiable.
- Seaborn is based on Matplotlib and is easier to use. Using this Python package you can create most of the common plots such as bar plots, histograms, scatter plots, and many more. See this Data Visualization in Python Tutorial to learn about 9 plots you should master. A more in-depth tutorial on how to make scatter plots using Seaborn can be found here.
R Statistical Environment
What is R? R is a free and open-source programming language. It is a complete, interactive, object-oriented language with a focus on data analysis. In the R statistical environment you can carry out a variety of statistical and graphical techniques. For instance, linear and non-linear modeling, classical statistical tests, time-series analysis, classification, and many more can be carried out using both frequentist and Bayesian paradigms. Many new and exciting methods are developed in R, meaning that if you learn R you have the possibility to use pioneering techniques. There are too many good resources and packages to list here. You can find some resources on learning R in my post: R resources for Psychologists.
Which language should I learn (Python vs. R)?
Python and R are both among the most popular languages for data analysis. As previously mentioned, while Python is a general-purpose language with an easy-to-understand syntax, R’s functionality is developed with statisticians in mind (“language for statisticians by statisticians”). If your primary interest is doing data analysis, R is probably the language to begin with. Python, on the other hand, enables you to create experiments and may be good to start with if you need to program your experiments. The number of data science libraries for Python is increasing, so in the future it may become the one for statistics as well. Whichever you choose, I suggest that you learn one at a time.
Note, if you learn both languages you can actually run R in Python. This will enable you to use the rich statistical packages from R within the elegant language of Python. Very handy!
How can you learn to program?
Reproducible Computational Environment
Now, if you have decided to learn how to program, either in Python or R, or maybe both, you may want to know about some cool tools for creating fully reproducible code. That is, if you want to make it easier for other researchers to reproduce your experiments (e.g., coded in PsychoPy), and your figures and data analysis (e.g., coded in the R statistical environment), you can use Code Ocean or Binder. These two tools create a Docker image of your computational environment, exactly like the one you used when coding. If you want to learn how to use Binder, see the following two tutorials:
- How to Use Binder and R for Reproducible Research
- How to Use Binder and Python for Reproducible Research
In the above tutorials, you will learn how to use git and Binder to create a fully reproducible computational environment for your Python or R code and analysis.
Learning programming is, of course, not easy. However, there are really good resources available, and with the right motivation, you can learn how to program. If you are early in your career (Master’s, Ph.D., or post-doc), the time you spend now on learning programming will be rewarding in many ways. It is both challenging and fun and may change your way of thinking. It will also make your research easier to reproduce. Join the Open Science movement and upload your scripts on, for instance, GitHub or Bitbucket. This will enable other researchers to run your exact analysis. What do you think? Is programming essential for a Psychologist, or of no use at all?