This post aims to show why you, as a psychology student or researcher (or any other kind of researcher or student), should learn to program. The post is structured as follows. First, I discuss why you should learn to program and then give some examples of when programming skills are helpful. I then suggest two programming languages that all psychology students and researchers should learn.
Table of Contents
- Why should Psychologists learn to Program?
- When are programming skills useful?
- What language should I learn?
- How can you learn to program?
- Reproducible Computational Environment
Why should Psychologists learn to Program?
Everyone should learn computer programming, and it should be taught in our schools. There are many benefits of programming, and I will only write about the ones that I find most important. Writing code (i.e., programming) can be seen as applied mathematics and science. Some programming languages make you think algorithmically, and programming teaches you an iterative approach to solving problems and testing out your ideas. Most of the time, it is challenging and fun, and on top of this, it can make your life easier!
For instance, using simple scripts, you can automate tasks such as extracting information from your PDFs. This information can then be used to rename the PDFs and organize them in folders if you'd like. Apart from making your everyday life easier (e.g., by automating everyday tasks), there are, of course, more reasons to learn programming. This post focuses on some of these reasons, with an emphasis on when psychologists use coding skills.
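As a small illustration of this kind of automation, here is a minimal sketch that sorts PDFs into one folder per publication year. Pulling text out of the PDFs themselves would require a third-party library (e.g., pypdf), so to keep the sketch self-contained it assumes, hypothetically, that your files already follow a naming convention like Author_Year_Title.pdf. The function name organize_pdfs_by_year is also just an illustration.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming convention: Author_Year_Title.pdf (e.g., Smith_2019_Attention.pdf)
YEAR_PATTERN = re.compile(r"_(\d{4})_")


def organize_pdfs_by_year(folder):
    """Move each PDF in `folder` into a subfolder named after the year in its file name.

    Files without a year in their name are left untouched.
    Returns a dict mapping original file names to their new paths.
    """
    folder = Path(folder)
    moved = {}
    # sorted() builds the full list first, so moving files does not disturb the loop
    for pdf in sorted(folder.glob("*.pdf")):
        match = YEAR_PATTERN.search(pdf.name)
        if match is None:
            continue
        target_dir = folder / match.group(1)
        target_dir.mkdir(exist_ok=True)
        new_path = target_dir / pdf.name
        shutil.move(str(pdf), str(new_path))
        moved[pdf.name] = new_path
    return moved
```

Calling organize_pdfs_by_year("papers") would, for example, move papers/Smith_2019_Attention.pdf to papers/2019/Smith_2019_Attention.pdf. Once the script exists, tidying a folder of hundreds of articles takes seconds.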
I am a Ph.D. student focusing on Cognitive Psychology (BSc and MSc in Cognitive Science). Naturally, my views on programming for researchers and students are colored by my discipline. However, I think that everyone should know about programming. Many of the researchers and Ph.D. students I know who conduct experiments (e.g., in Psychology and Neuroscience) do some programming. Some do simpler things, such as writing SPSS syntax and Mplus scripts. Others use more advanced coding in Matlab, E-prime, Python, or C.
If you are planning on graduate studies (i.e., aiming for a Master’s or a Ph.D.) in Psychology or other cognitive sciences, programming is almost essential. I learned this when doing my Master’s thesis: there was no time for my supervisor to create an experiment in E-prime for me. After my Master’s, I started looking for Ph.D. positions, and many ads required knowledge of Matlab, Python, or R.
When are programming skills useful?
Many psychology researchers, and students working on projects and theses, use software to collect data. Although there are many graphical interfaces (e.g., E-prime & Presentation), you will most likely need some scripting language to solve certain issues. For instance, the graphical interfaces may not support the pseudo-randomization your design requires (e.g., making sure that the same condition never appears more than three times in a row). Studying cognition or perception at the graduate level will require you to have some scripting or coding skills.
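To make pseudo-randomization concrete, here is a minimal sketch of one common constraint: no condition may repeat more than a given number of times in a row. It uses simple rejection sampling (shuffle, check, reshuffle), which is one of several possible approaches; the function names and the default limit of three repeats are assumptions for illustration.

```python
import random


def max_run_length(sequence):
    """Return the length of the longest run of identical consecutive items."""
    longest = current = 0
    previous = object()  # sentinel that is not equal to any trial label
    for item in sequence:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def pseudo_randomize(trials, max_repeats=3, seed=None, max_attempts=10_000):
    """Shuffle `trials` until no condition occurs more than `max_repeats` times in a row.

    Rejection sampling: reshuffle until the constraint holds.
    Raises RuntimeError if no valid order is found within `max_attempts` shuffles.
    """
    rng = random.Random(seed)
    order = list(trials)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if max_run_length(order) <= max_repeats:
            return order
    raise RuntimeError("No valid order found; relax the constraint or raise max_attempts")
```

For example, pseudo_randomize(["congruent"] * 20 + ["incongruent"] * 20, seed=1) returns a trial order with the same 40 trials and no run longer than three. Passing a seed makes the order reproducible; leaving it out gives a new order for each participant. For very strict constraints, rejection sampling can become slow, and a constructive algorithm may be a better choice.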
There is also an emerging interest in doing psychological research on social media (e.g., how different personality types are expressed in Facebook behavior). The flexibility of programming languages may also open doors to research projects that are out of reach with common statistical software (e.g., Stata & SAS), for example, when the data are spread across thousands of text documents or scattered around the web (e.g., social media).
- Read the page about how to use Python for data collection to get examples of when programming can be used in Psychology.
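As an example of working with data spread across many text documents, here is a minimal sketch that counts a set of target words (e.g., first-person pronouns) in every .txt file in a folder and writes one row per document to a CSV file. It assumes, hypothetically, one document per text file; the function names and the simple lowercase-letters tokenizer are illustrative choices, not a standard method.

```python
import csv
import re
from collections import Counter
from pathlib import Path

# Very simple tokenizer: runs of lowercase letters and apostrophes
WORD_PATTERN = re.compile(r"[a-z']+")


def count_words(folder, target_words):
    """Count `target_words` in each .txt file in `folder`; one dict (row) per document."""
    targets = [w.lower() for w in target_words]
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        words = WORD_PATTERN.findall(path.read_text(encoding="utf-8").lower())
        counts = Counter(words)
        row = {"document": path.name, "total_words": len(words)}
        row.update({w: counts[w] for w in targets})
        rows.append(row)
    return rows


def write_rows(rows, out_file):
    """Write the per-document counts to a CSV file (does nothing if there are no rows)."""
    if not rows:
        return
    with open(out_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

The resulting CSV file can be opened in R, Python, or any statistics package, so the same script scales from ten documents to ten thousand.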
There is a lot of software available for data analysis: spreadsheets like Excel; batch-oriented, procedure-based systems like SAS; and point-and-click, GUI-based systems like SPSS, Stata, and Statistica. Open-source programming languages (e.g., R or Python) are typically free of charge and offer greater flexibility. Using a programming language, you analyze data by writing functions and scripts.
Although the learning curve may be steeper, it is a natural and expressive method for data analysis. Writing scripts has several advantages: it documents all your work, and it lets you automate sequences of tasks. That is, you can easily follow what you did last year (it is in your script) and save time if you run a similar analysis on many experiments.
- For a collection of all the data analysis-related tutorials, check out this page.
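To show what "run a similar analysis on many experiments" can look like, here is a minimal sketch that applies the same analysis (mean reaction time per condition) to every CSV file in a folder. It uses only the standard library and assumes, hypothetically, that each file has columns named condition and rt; the function names are illustrative.

```python
import csv
import statistics
from pathlib import Path


def mean_rt_by_condition(csv_file):
    """Mean reaction time per condition for one experiment file.

    Assumes (hypothetically) columns named 'condition' and 'rt'.
    """
    rts = {}
    with open(csv_file, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rts.setdefault(row["condition"], []).append(float(row["rt"]))
    return {condition: statistics.mean(values) for condition, values in rts.items()}


def analyze_all(folder):
    """Apply exactly the same analysis to every CSV file in `folder`."""
    return {
        path.name: mean_rt_by_condition(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

Adding a new experiment then only means dropping its data file into the folder and rerunning the script, and the script itself is the record of what was computed.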
Another essential advantage of analyzing data with a free, open-source programming language is that it makes reproducibility easier. Your script documents every step of your process, and anyone can download and install the software needed to run your analysis. I would say that if you are a proponent of Open Science, you should learn a programming language for doing your analysis.
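One small habit that helps reproducibility is fixing the random seed whenever an analysis involves randomness. The sketch below computes a percentile bootstrap confidence interval for a mean; because the random number generator is seeded, anyone rerunning the script gets exactly the same interval. The function name, the default of 2,000 resamples, and the seed value are assumptions for illustration, and the percentile bootstrap is only one of several bootstrap variants.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=2000, confidence=0.95, seed=2019):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Because the random number generator is seeded, rerunning the script
    reproduces exactly the same interval.
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample
    boot_means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lower = boot_means[round(alpha / 2 * n_resamples)]
    upper = boot_means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Calling bootstrap_ci(data) twice on the same data returns identical intervals, which is exactly what a reader trying to reproduce your results needs.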
Here are some example blog posts on data analysis using Python:
- Repeated Measures ANOVA using Python
- Exploratory Data Analysis in Python
- Four Ways to Carry out One-Way ANOVA using Python
What language should I learn?
When people discuss beginner programming languages and which languages are easier and quicker to learn, Python inevitably comes up. It was created by Guido van Rossum and is now managed by the non-profit Python Software Foundation. The language is open source and free, even for commercial applications. Python is often referred to as a scripting language, but it is a high-level, general-purpose programming language. Thanks to its flexibility, Python is one of the most popular programming languages (e.g., number 3 on the TIOBE Index for November 2019). It fully supports both object-oriented and structured programming.
Why should I learn Python?
Why Python? First, it is open source and free. Second, in my personal experience, Python is a far better language than Matlab’s rather idiosyncratic one. Furthermore, it integrates better with other languages (e.g., C/C++). Most importantly, however, there is a wide variety of both general-purpose and specialized Python libraries. This means you can use Python to collect data (e.g., scrape the Web or program psychological experiments) and to analyze your data using both common and more advanced statistical methods. Note that installing Python and its libraries may be more complicated. However, there are scientifically focused distributions that bundle many of the libraries you will want to use. I have only used Anaconda and Python(x, y), but there is also Canopy.
Knowing a programming language like Python can make life much easier. You can, for instance, rename files in Python without any hassle. Furthermore, it is possible to use Python to read, append, and write to a file. Another useful example is that you can use Python to read xlsx files. Apart from the three example blog posts above, there are more statistical tests you can perform in Python. For example, you can carry out two-sample t-tests and Mann-Whitney U tests.
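As a quick illustration of the last point, here is a minimal sketch of both tests using SciPy (an assumption: SciPy is not mentioned above, but it ships with Anaconda). The reaction times are made-up numbers for two independent groups. Setting equal_var=False gives Welch's version of the t-test, which does not assume equal variances.

```python
from scipy import stats

# Hypothetical reaction times (ms) from two independent groups
group_a = [512, 478, 530, 495, 541, 508, 470, 525]
group_b = [560, 545, 590, 532, 575, 601, 548, 566]

# Two-sample t-test; equal_var=False requests Welch's t-test
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Mann-Whitney U test, a rank-based (non-parametric) alternative
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Welch t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.4f}")
```

Which test is appropriate depends on your data and design; the point here is only that each test is a single function call once the data are loaded.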
A few Useful Python Libraries for Psychologists
There are, of course, so many useful Python libraries that listing them all would need at least a separate post. However, here are a few I have found useful. First of all, it is worth mentioning one essential package: pip, the standard tool for installing Python packages.
Python for Data Collection:
- OpenSesame is an easy-to-install package that works on Linux, OS X, Windows, and Android (tablets and phones). The application offers a graphical user interface (GUI) for creating experiments. It requires minimal coding but lets you write inline scripts. See OpenSesame Tutorial: Using Image Stimuli for an example.
- PsychoPy is also simple to install and cross-platform (Linux, OS-X, and Windows). It promises precision timing, and it has a lot of different types of stimuli ready to use. PsychoPy offers an Application Programming Interface (API) and a graphical interface for the drag-and-drop creation of experiments. You can, of course, combine both. For an example, see Psychomotor Vigilance Task (PVT) in PsychoPy.
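Both tools can be combined with plain Python scripts. For instance, loops in PsychoPy's Builder can read their trial conditions from a CSV or Excel file. Here is a minimal sketch that writes a fully crossed (factorial) conditions file using only the standard library; the column names and the Stroop-like levels are hypothetical examples, and the function name is illustrative.

```python
import csv
import itertools


def write_conditions_file(path, factors, repetitions=1):
    """Write a fully crossed (factorial) conditions table to a CSV file.

    `factors` maps each column name to its levels; every combination of
    levels becomes one row, written `repetitions` times.
    Returns the number of data rows written.
    """
    names = list(factors)
    combos = list(itertools.product(*(factors[name] for name in names)))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(names)  # header row with the factor names
        for _ in range(repetitions):
            writer.writerows(combos)
    return len(combos) * repetitions
```

For example, write_conditions_file("conditions.csv", {"word": ["RED", "GREEN"], "ink": ["red", "green"]}, repetitions=2) writes eight trials. Generating the file with code rather than by hand makes it easy to add a factor or change the number of repetitions later.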
Python for Data Analysis:
- Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Make sure to check out my Python Pandas dataframe tutorial!
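As a taste of what Pandas offers, here is a minimal sketch that aggregates trial-level data to one mean per participant and condition and then computes a per-participant Stroop-like effect. The data frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per trial
df = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "condition": ["congruent", "incongruent"] * 4,
    "rt": [450, 520, 470, 540, 430, 505, 410, 515],
})

# Mean reaction time per participant and condition
per_subject = df.groupby(["participant", "condition"], as_index=False)["rt"].mean()

# Reshape to wide format: one row per participant, one column per condition
wide = per_subject.pivot(index="participant", columns="condition", values="rt")
wide["stroop_effect"] = wide["incongruent"] - wide["congruent"]
print(wide)
```

The same groupby and pivot pattern works unchanged whether the data frame holds eight trials or eight hundred thousand.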
- Statsmodels is a module that allows users to explore data, estimate statistical models, and perform statistical tests. Some features of Statsmodels are linear regression, generalized linear models, functions for plotting, and tools for outputting results as tables in formats such as text, LaTeX, and HTML. See the post on how to carry out Repeated Measures ANOVA using Statsmodels for an example of how to use Statsmodels.
Note that if you install Anaconda, Canopy, or Python(x, y), you will get Pandas, Statsmodels, and many more useful libraries. Python(x, y) also includes the Spyder IDE.
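Here is a minimal sketch of a linear regression in Statsmodels using its R-style formula interface, including exporting the results table as LaTeX. The sleep and reaction-time data are made up for illustration, and the sketch assumes Pandas and Statsmodels are installed (for example, via Anaconda).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: hours of sleep and reaction time (ms)
data = pd.DataFrame({
    "sleep": [4, 5, 6, 6, 7, 7, 8, 8, 9, 9],
    "rt": [560, 540, 525, 530, 505, 500, 490, 485, 470, 475],
})

# Ordinary least squares regression specified with a formula
model = smf.ols("rt ~ sleep", data=data).fit()
print(model.summary())

# The same results table as LaTeX (as_html() and as_text() also exist)
print(model.summary().as_latex())
```

With only ten observations, Statsmodels may warn that some of the normality statistics in the summary are unreliable. That warning concerns the diagnostics, not the regression coefficients.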
If you are interested in Bayesian statistics:
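- PyStan is the Python interface to Stan: you write your model in the Stan modeling language and compile and fit it from Python. Stan is a platform for Bayesian statistical modeling whose default sampler is the No-U-Turn sampler (NUTS).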
- PyMC is a module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo (MCMC). PyMC also includes methods for summarizing output, plotting, goodness-of-fit, and convergence diagnostics.
Check out the guest post, Probabilistic Programming in Python (Bayesian Data Analysis), for examples using PyMC3.
Python for Data Visualization:
- Matplotlib is a data visualization library that is quite easy to use, and its plots are highly customizable.
- Seaborn is based on Matplotlib and is easier to use. Using this Python package, you can create common plots, such as bar plots, histograms, scatter plots, and many more. See this Data Visualization in Python Tutorial to learn about the 9 plots you should master. A more in-depth tutorial on how to make scatter plots using Seaborn can be found here.
Check out these how-tos and tutorials on collecting data, cleaning data, descriptive statistics, data analysis, and visualizing data in Python. They will teach you how to use Pandas, Seaborn, Matplotlib, and Statsmodels, among other great Python packages.
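Here is a minimal Matplotlib sketch: a labeled histogram of simulated reaction times saved as a PNG file. The data are randomly generated for illustration only, and the output file name is an assumption. The Agg backend is selected so the script also runs on machines without a display (e.g., a server).

```python
import random

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

# Hypothetical reaction-time data (ms), simulated for illustration
rng = random.Random(42)
rts = [rng.gauss(500, 50) for _ in range(200)]

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(rts, bins=20, color="steelblue", edgecolor="white")
ax.set_xlabel("Reaction time (ms)")
ax.set_ylabel("Count")
ax.set_title("Distribution of reaction times")
fig.tight_layout()
fig.savefig("rt_histogram.png", dpi=150)
plt.close(fig)
```

If you prefer Seaborn, replacing the ax.hist(...) line with sns.histplot(x=rts, bins=20, ax=ax), after import seaborn as sns, should produce a similar plot with Seaborn's default styling.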
Installing Packages in Python
Now you have learned about some useful Python packages. If you install a Python distribution (e.g., Anaconda), you will get most of the Python packages discussed in this post. Otherwise, installing, using, and upgrading Python packages is quite easy (e.g., pip can install specific versions of packages). In general, it is recommended to install the packages for each project in its own virtual environment, which you can create with Python's built-in venv module (or with conda if you use Anaconda). The related tool pipx is meant for installing command-line applications, each in its own isolated environment.
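Virtual environments are usually created from the terminal with python -m venv env. The same thing can be done from Python with the standard-library venv module, as in this minimal sketch (the function name create_project_env is hypothetical).

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated virtual environment for one project.

    Roughly equivalent to running `python -m venv <env_dir>` in a terminal.
    Returns the path to the environment's pyvenv.cfg configuration file.
    """
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"
```

Afterwards, you would activate the environment (source env/bin/activate on Linux/macOS, env\Scripts\activate on Windows) and install packages into it with pip, e.g., pip install pandas statsmodels. Packages installed this way do not interfere with other projects on the same computer.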
R Statistical Environment
What is R? R is a free and open-source programming language. It is a complete, interactive, object-oriented language focusing on data analysis. In the R statistical environment, you can carry out a variety of statistical and graphical techniques. For instance, linear and non-linear modeling, classical statistical tests, time-series analysis, classification, and much more can be carried out within both the frequentist and Bayesian paradigms. Many new and exciting methods are developed in R, meaning that if you learn R, you can use pioneering techniques. There are too many good resources and packages to list here. You can find some resources on learning R in my post: R resources for Psychologists.
You can, of course, also import data using R; if you need to learn how to read Excel (.xlsx) files in R, check out that post. R can also be used to visualize data. See, for instance, how to make a scatter plot in R using ggplot2. Finally, there are many useful functions that you may want to learn more about. See the following R tutorials, for example.
- How to use %in% in R: 7 Example Uses of the Operator
- Using R to Add a Column to Dataframe Based on Other Columns with dplyr
- How to Generate a Sequence of Numbers in R with :, seq() and rep()
Which language should I learn (Python vs. R)?
Python and R are among the most popular languages for data analysis. As previously mentioned, Python is a general-purpose language with an easy-to-understand syntax, while R’s functionality is developed with statisticians in mind (“a language for statisticians by statisticians”). If your primary interest is data analysis, R is probably the language to begin with. On the other hand, Python enables you to create experiments and may be the better place to start if you need to program your experiments. The number of data science libraries for Python is increasing, so in the future it may become the language of choice for statistics as well. Whichever you choose, I suggest that you learn one language at a time.
Note that if you learn both languages, you can run R from within Python (e.g., using the rpy2 package). This enables you to use R’s rich statistical packages within the elegant language of Python. Very handy!
How can you learn to program?
You can find more good resources and books for learning R in my posts R resources for Psychologists and R books for Psychologists. Note that you can also learn statistics online if you need to.
Reproducible Computational Environment
Once you have decided to learn to program, in Python, R, or both, you may want to know about some tools for creating fully reproducible code. That is, if you want to make it easier for other researchers to reproduce your experiments (e.g., coded in PsychoPy) as well as your figures and data analysis (e.g., coded in the R statistical environment), you can use Code Ocean or Binder. Both tools build a Docker image that recreates the computational environment you used when writing your code. If you want to learn how to use Binder, see the following two tutorials:
- How to Use Binder and R for Reproducible Research
- How to Use Binder and Python for Reproducible Research
In the above tutorials, you will learn how to use git and Binder to create a fully reproducible computational environment for your Python or R code and analysis.
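For Python projects, one of the configuration files Binder recognizes is a requirements.txt file in the root of your repository. Here is a minimal sketch that writes exact ("pinned") versions of the packages you list, read from your current installation with the standard-library importlib.metadata module. The function name is hypothetical, and in practice you would list the packages your analysis actually imports.

```python
from importlib import metadata


def write_requirements(packages, path="requirements.txt"):
    """Write pinned versions (name==version) of installed `packages` to `path`.

    Raises importlib.metadata.PackageNotFoundError if a package is not installed.
    Returns the lines that were written.
    """
    lines = [f"{name}=={metadata.version(name)}" for name in packages]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

For example, write_requirements(["pandas", "statsmodels", "seaborn"]) records the exact versions you used, so that Binder (or a colleague running pip install -r requirements.txt) installs the same ones.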
Learning to program is, of course, not easy. However, excellent resources are available, and with the right motivation, you can learn how to program. If you are early in your career (Master’s, Ph.D., or post-doc), the time you spend on learning programming now will be rewarding in many ways. It is both challenging and fun and may change your way of thinking. It will also make your research easier to reproduce. Join the Open Science movement and upload your scripts to, for instance, GitHub or Bitbucket. This will enable other researchers to run your exact analysis. What do you think? Is programming essential for a psychologist, or not?