C. Computational workflows

Learning targets
  • You will have the version control system git installed on your computer and be able to run the important commands.

  • You will be able to keep your computing environment stable.

  • You will have a roadmap on how to automate your code.

Important

This week’s assignments are a bit more difficult and will take longer to implement that the previous. Please plan for this.

Tasks

Read this chapter and watch this week’s videos. Afterwards go through the following assignments:

  • Install git (see Booklet)
  • Create a Github account (if you don’t yet have any)

Practical Task 1: Work on your own Github repository

  • Create a Git repository on GitHub or GitLab (you can call it e.g. “sandbox” or “just-trying-Git” 😉). Then clone it, edit a file, add, commit, and push your changes back to Github. Practical task 2: Contribute to someone else’s repository
  • Fork the following repository (it contains the file structure you should be already familiar with) : https://github.com/likeajumprope/BERD_course
  • Make a change to some of the files. For example, you can add a code comment (a command that is not executed) using the # symbol
  • Commit your changes.
  • Make a pull request to the original (my) repository. Make sure to summarize your changes and write a nice commit message.

Practical Task 2 (week 4): Practice using make

  • You will generate a simple plot from the existing sadata.csv file in the project folder of the repository you forked in Practical Task 1 (https://github.com/likeajumprope/BERD_course).

Step 1: Create a base R script

Create a file named plot_data_base.R in your project folder with the following content:

Read the CSV and make a simple base R plot

data <- read.csv("sadata.csv")

pdf("plot.pdf")
plot(data$date, data$SP.DYN.LE00.IN, type = "p", col = "blue",
     xlab = "Year", ylab = "Life Expectancy",
     main = "South Asia - Life Expectancy (sample)")
dev.off()

This script reads the data and creates a scatter plot saved as plot.pdf.


Step 2: Create a Makefile

In the same folder, create a file named Makefile with the following content:

plot.pdf: plot_data_base.R sadata.csv
    Rscript plot_data_base.R

clean:
    rm -f plot.pdf

⚠️ Use a real Tab (not spaces) at the beginning of the lines with Rscript and rm.


Step 3: Run and test

  1. Run the workflow:

    make plot.pdf

    → This creates plot.pdf

  2. Run it again:

    make plot.pdf

    → Nothing happens. The plot is already up to date.

  3. Add a comment to plot_data_base.R, save the file, and run again:

    make plot.pdf

    → The script will run again.

  4. Clean up:

    make clean

Questions for Reflection - What would happen if you deleted the plot.pdf file and ran make again? - What triggers make to re-run the R script? What doesn’t? - Why do we specify both the input data and the R script as dependencies in the Makefile?

Optional Tasks:

  • Create a Git repository for your current research project.
  • Think of things you would like to see improved in our course booklet (Alternatively: view the open issues) and choose one thing where you can make a contribution. Create a fork of the booklet project and then a pull request with your contribution and mark it in the respective issue.

Discuss your progress and potential issues with your accountability buddy.