The aim of this post is to show why you, as a psychology student or researcher (or any other kind of researcher or student), should learn to program. The post is structured as follows. First, I discuss why you should learn programming and give some examples of when programming skills are useful. I then suggest two programming languages that I think all Psychology students and researchers should learn.
Why should Psychologists learn programming?
Everyone should learn computer programming. It should be taught to our kids, in our schools. There are so many benefits of programming that I will only write about the ones I find most important. Writing code (i.e., programming) can be seen as applied mathematics and science. Some programming languages make you think algorithmically. Programming teaches you an iterative approach to solving problems and testing out your ideas. Most of the time it is challenging, fun, and, on top of this, it can make your life easier!
For instance, using simple scripts you can automate things such as extracting information from your PDFs. This information may be used to rename the PDFs and, if you’d like, organize them in folders. Apart from making your everyday life easier (e.g., by automating everyday tasks), there are of course more reasons for learning programming. This post is going to focus on some of these reasons, with an emphasis on when psychologists have use of coding skills.
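To make the PDF example above a bit more concrete, here is a minimal sketch of the renaming and organizing step. It assumes the author, year, and title have already been extracted from the PDF by some other step (not shown here, and typically done with a third-party PDF library). The function names, the naming scheme, and the one-folder-per-year layout are all illustrative choices of mine, not a standard.

```python
import re
import shutil
import tempfile
from pathlib import Path


def build_filename(author, year, title, max_words=5):
    """Return a tidy file name such as 'smith_2018_working_memory.pdf'."""
    # Keep only lower-case letters and digits so the name is safe on any OS.
    surname = re.sub(r"[^a-z0-9]", "", author.lower())
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    return "_".join([surname, str(year), *words]) + ".pdf"


def organize_pdf(pdf_path, meta, library_root):
    """Move one PDF into <library_root>/<year>/ under its new name."""
    # 'meta' is a hypothetical dict holding the already-extracted information.
    target_dir = Path(library_root) / str(meta["year"])
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / build_filename(meta["author"], meta["year"], meta["title"])
    shutil.move(str(pdf_path), str(target))
    return target


if __name__ == "__main__":
    # Demo in a temporary folder, so nothing on your disk is touched.
    with tempfile.TemporaryDirectory() as tmp:
        downloaded = Path(tmp) / "download(3).pdf"
        downloaded.write_bytes(b"%PDF-1.4 placeholder")
        meta = {"author": "Smith", "year": 2018, "title": "Working Memory and Attention"}
        new_path = organize_pdf(downloaded, meta, Path(tmp) / "library")
        print(new_path.relative_to(tmp).as_posix())
```

Run on a real folder, you would loop over all PDF files and call organize_pdf once per file. It is wise to try this on a copy of your files first.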
I am a Ph.D. student with a focus on Cognitive Psychology (BSc and MSc in Cognitive Science). Naturally, my views on programming for researchers and students are coloured by my discipline. However, I think that everyone should know programming. Many of the researchers and Ph.D. students I know who conduct experiments (e.g., in Psychology and Neuroscience) do some programming. Some do easier stuff such as using SPSS syntax and Mplus. Others use more advanced coding in Matlab, E-prime, Python, or C.
If you are planning on graduate studies (i.e., aiming for a Master’s or a Ph.D.) in Psychology, or another of the cognitive sciences, programming is almost essential. I learned this when doing my Master’s thesis. There was no time for my supervisor to create an experiment in E-prime for me. After my Master’s, I started to look for Ph.D. positions and many of the ads required knowledge of Matlab, Python, or R.
When are programming skills useful?
Many Psychology researchers, and students doing projects and theses, use some software for collecting data. Although there are many graphical interfaces (e.g., E-prime and Presentation), you will most likely need to use some scripting language to solve some issues. For instance, the graphical interfaces will probably not be able to do pseudo-randomization (e.g., shuffling trials so that no condition appears more than twice in a row). I would say that studying cognition or perception at the graduate level will require you to have some scripting or coding skills.
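As an illustration of the pseudo-randomization mentioned above, here is a minimal sketch using only the Python standard library. It assumes one common constraint, namely that the same condition may not occur more than a given number of times in a row, and it simply reshuffles until that holds. The function names, the condition labels, and the parameter max_repeats are my own choices for the example; experiment software may offer its own way of doing this.

```python
import random


def max_run_length(sequence):
    """Length of the longest run of identical neighbouring items."""
    longest = current = 0
    previous = object()  # sentinel that is equal to no item in the sequence
    for item in sequence:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def pseudo_randomize(trials, max_repeats=2, seed=None, max_attempts=10_000):
    """Shuffle trials until no item repeats more than max_repeats times in a row."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    order = list(trials)       # copy, so the caller's list is left untouched
    for _ in range(max_attempts):
        rng.shuffle(order)
        if max_run_length(order) <= max_repeats:
            return order
    raise RuntimeError("No valid order found; relax max_repeats or allow more attempts.")


if __name__ == "__main__":
    trials = ["congruent"] * 10 + ["incongruent"] * 10
    for trial in pseudo_randomize(trials, max_repeats=2, seed=1):
        print(trial)
```

Reshuffling until the constraint is met is easy to read but can be slow for very strict constraints. For a typical experiment with a few conditions it usually finds a valid order almost immediately.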
There is also an emerging interest in doing psychological research on social media (e.g., how different personality types are expressed in Facebook behaviour). The flexibility of programming languages may also open doors to research projects that cannot be reached with common statistical software (e.g., Stata and SAS). Data can be spread across thousands of text documents or available around the web (e.g., on social media).
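To give a feel for working with data spread across many text documents, here is a small sketch that scores every .txt file in a folder. As a toy measure it computes the proportion of first-person singular words in each document. The word list, the folder layout, and the function names are assumptions made for this example and not a validated instrument.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical word list, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def first_person_rate(text):
    """Proportion of words in the text that are in FIRST_PERSON."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(word in FIRST_PERSON for word in words)
    return hits / len(words)


def score_folder(folder):
    """Return {file name: first-person rate} for every .txt file in folder."""
    return {
        path.name: first_person_rate(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    }


if __name__ == "__main__":
    # Demo with two made-up documents in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "post_a.txt").write_text("I love my dog", encoding="utf-8")
        (Path(tmp) / "post_b.txt").write_text("The weather is nice", encoding="utf-8")
        for name, rate in score_folder(tmp).items():
            print(name, rate)
```

The same loop works whether the folder holds two files or thousands, which is the point of scripting this kind of task.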
There is a lot of software available for data analysis: spreadsheets like Excel; batch-oriented, procedure-based systems like SAS; and point-and-click GUI-based systems like SPSS, Stata, and Statistica. An open source programming language (e.g., R or Python) is typically free of charge and offers greater flexibility. Using a programming language, you do data analysis by writing functions and scripts.
Although the learning curve may be steeper, writing code is a very natural and expressive method for conducting data analysis. There are several positive aspects of writing scripts: they document all your work, and you can automate sequences of tasks. That is, you can easily follow what you did last year (i.e., it is in your script) and save time if you are running similar analyses on many experiments.
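The time saving described above can be sketched as follows: one function that summarizes a single experiment, and a loop that applies it to every data file in a folder. I assume here that each experiment is stored as a CSV file with a column called rt (response time). The file names, the column name, and the function names are hypothetical.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize_experiment(csv_path, column="rt"):
    """Return n, mean, and standard deviation for one column of one CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    # statistics.stdev needs at least two values.
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def summarize_all(data_dir, column="rt"):
    """Run the same summary on every CSV file in data_dir."""
    return {
        path.stem: summarize_experiment(path, column)
        for path in sorted(Path(data_dir).glob("*.csv"))
    }


if __name__ == "__main__":
    # Demo with two small made-up data files in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "exp1.csv").write_text("subject,rt\n1,400\n2,500\n3,600\n", encoding="utf-8")
        (Path(tmp) / "exp2.csv").write_text("subject,rt\n1,450\n2,550\n", encoding="utf-8")
        for name, summary in summarize_all(tmp).items():
            print(name, summary)
```

When a new experiment is finished, you drop its CSV file into the folder and rerun the script. The script also serves as a record of exactly what was computed.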
Another very important aspect of doing analysis in an open source and free programming language is that it makes reproducibility easier. Your script documents every step of your process, and anyone can download and install the software needed to run your analysis. I would say that if you are a proponent of Open Science, you should learn a programming language for doing your analysis.
What language should I learn?
When people discuss beginner programming languages and which languages are easier and quicker to learn, Python inevitably comes up. It was created by Guido van Rossum, but is managed by the non-profit organisation Python Software Foundation. The language is open source and free, even for commercial applications. Python is often used as, and referred to as, a scripting language. It is a high-level and general-purpose programming language. Thanks to its flexibility, Python is one of the most popular programming languages (e.g., number 3 on the TIOBE Index for September 2018). It has full support for both object-oriented programming and structured programming.
Why Python? First, it is open source and free. In my personal experience, Python is just far better than Matlab’s weird language. Furthermore, it integrates better with other languages (e.g., C/C++). Most important, however, is that there is a variety of both general-purpose and specialized Python libraries. This means you can do data collection (e.g., scrape the Web or program psychological experiments) using Python. You can also analyse your data using both common and more advanced statistical methods. Note that it may be more complicated to install Python. However, there are scientifically focused distributions that have a lot of the libraries that you will want to use. Personally, I have only used Anaconda and Python(x, y), but there is also Canopy.
A few useful libraries
There are, of course, so many useful Python libraries that it would take at least a separate post to list them all. However, here are a few I have found useful.
For data collection:
- OpenSesame is an easy-to-install package that works on Linux, OS-X, Windows, and Android (tablets, phones, and computers). The application offers a graphical user interface (GUI) for creating experiments. It requires minimal coding but lets you write in-line scripts.
- PsychoPy is also simple to install and cross-platform (Linux, OS-X, and Windows). It promises precision timing, and it has a lot of different types of stimuli ready to use. PsychoPy offers both an Application Programming Interface (API) and a graphical interface for drag-and-drop creation of experiments. You can, of course, combine both.
For data analysis:
- Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Be sure to check out my Python Pandas dataframe tutorial!
- Statsmodels is a module that allows users to explore data, estimate statistical models, and perform statistical tests. Some of the features of Statsmodels are: linear regression, generalized linear models, functions for plotting, and tools for outputting results in tables in formats such as text, LaTeX, and HTML.
Note that if you install Anaconda, Canopy, or Python(x, y) you will get Pandas and Statsmodels and many more useful libraries. Python(x, y) also includes Spyder IDE.
If you are interested in Bayesian statistics:
- PyStan is a Python interface to Stan. You write your model in the Stan language and fit it from Python. Stan is a package for Bayesian statistics using the No-U-Turn sampler.
- PyMC is a module that implements Bayesian statistical models and fitting algorithms. It includes Markov chain Monte Carlo. PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.
R statistical environment
R is a free and open source programming language. It is a complete, interactive, object-oriented language with a focus on data analysis. In the R statistical environment you can carry out a variety of statistical and graphical techniques. For instance, linear and non-linear modeling, classical statistical tests, time-series analysis, classification, and many more can be carried out using both frequentist and Bayesian paradigms. Many new and exciting methods are developed in R, meaning that if you learn R you have the possibility to use pioneering techniques. There are too many good resources and packages to list here. You can find some resources on learning R in my post: R resources for Psychologists.
Which language should I learn (Python vs. R)?
Python and R are both among the most popular languages for data analysis. As previously mentioned, while Python is a general-purpose language with an easy-to-understand syntax, R’s functionality is developed with statisticians in mind (“language for statisticians by statisticians”). If your primary interest is doing data analysis, R is probably the language to begin with. Python, on the other hand, enables you to create experiments and may be good to start with if you need to program your experiments. The number of data science libraries for Python is increasing, so in the future it may become the one for statistics as well. Whichever you choose, I suggest that you learn one at a time.
How can you learn to program?
Learning programming is, of course, not easy. However, there are really good resources available, and with the right motivation you can learn how to program. If you are early in your career (Master’s, Ph.D., or Post-doc), the time you spend on learning programming now will be rewarding in many ways. It is both challenging and fun, and may change your way of thinking. It will also make your research easier to reproduce. Join the Open Science movement and upload your scripts to, for instance, GitHub or Bitbucket. This will enable other researchers to run your exact analysis. What do you think? Is programming essential for a Psychologist, or not?