This topic may sound technical and boring at first, but please bear with me 🙏. It will be useful!
Have you ever run some old code only to find that it no longer works? After hours of digging into the issue, you discover that the software package you use has changed in the meantime 🧐
Or have you tried to reproduce someone else’s code, which seems to run on their machine but not on yours, and you just don’t know why?
This chapter is all about avoiding such problems in the future by stabilizing your computing environment and software. ✅
What is a computing environment?
Your computing environment is defined by your computer, the operating system and the software installed. If you update your operating system or your software, your computing environment changes. In R, for example, you can learn a lot about your computing environment by typing sessionInfo().
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
loaded via a namespace (and not attached):
[1] compiler_4.2.0 fastmap_1.1.1 cli_3.6.1 htmltools_0.5.5
[5] tools_4.2.0 yaml_2.3.7 rmarkdown_2.21 knitr_1.42
[9] jsonlite_1.8.4 xfun_0.39 digest_0.6.31 rlang_1.1.1
[13] renv_1.0.0 evaluate_0.20
It tells you the R version, the operating system, and the loaded R packages along with their versions.
Options for stabilizing your computing environment
1) Record your computing environment
Document the software versions you used. For example, if you use R, you could copy the output of sessionInfo() into your README or somewhere else where future you (and others) can find this information. This is not exactly “stabilizing”, but it makes it possible to install the same software versions again.
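As a minimal sketch, assuming you work in R, the lines below save the output of sessionInfo() to a text file that you can keep next to your code. The file name session_info.txt is only an example; pick whatever fits your project.

```r
# Capture what sessionInfo() would print, as a character vector of lines
session_lines <- capture.output(sessionInfo())

# Write those lines to a plain-text file in the current working directory
# (the file name is an assumption made for this example)
writeLines(session_lines, "session_info.txt")
```

Running this again at the end of a project gives you a dated record of exactly which versions were in use.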
2) Use one virtual machine per research project
You don’t need to know what a virtual machine is or how to set it up to be able to do this. I used to ask the wonderful IT person at my institute to set up a virtual machine for me and if your IT supporters know their job, they’ll be able to help you here.
A virtual machine is essentially a virtual computer running on another computer or server. (To the nerds out there: I know I am probably explaining it incorrectly, but for the purpose of what we want to achieve here, it’s good enough.) If you have one virtual machine for each project, you can keep the computing environment stable by not installing or updating software after you’ve finished the research project.
The downside of this strategy is that it only helps future you and your collaborators, not other researchers who want to work with the same computing environment.
3) Use one container per research project
Containers are similar to virtual machines (think little computer inside your computer). The big difference is that they are easy to share: you can send your container image (or the file describing it) to others.
Popular container tools are Docker and Apptainer (formerly Singularity). Learning to work with containers is not super easy, but it is worth the time, and the skill carries over to many other situations. So it’s a great skill to have even if you decide to quit research.
4) Other
There are many other options out there. I wrote down the three that are least dependent on the actual software you use. For R users, check out the packages logrx, rang, packrat, versions, and renv.
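To give a flavour of what such a package looks like in practice, here is a rough sketch of a typical renv workflow. It assumes renv is installed and that you call the functions from inside your project directory. The wrapper function names are made up for this example; only the renv::init(), renv::snapshot() and renv::restore() calls come from renv itself.

```r
# Hypothetical wrappers around three renv functions.
# Defining them does nothing yet; renv is only used once you call one of them.

# Set up a project-specific package library and a lockfile (renv.lock)
start_project_library <- function() {
  renv::init()
}

# Record the package versions the project currently uses in renv.lock
record_package_versions <- function() {
  renv::snapshot()
}

# Reinstall the package versions recorded in renv.lock,
# e.g. on a new computer or years later
reinstall_recorded_versions <- function() {
  renv::restore()
}
```

The idea is that you share renv.lock together with your code, so others (and future you) can rebuild the same set of R packages.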
Further reading
That’s all for this chapter. I hope it was helpful and not too technical. Happy researching! 🙌