Hi beautiful creature,
A couple of days ago, a helicopter dropped me off on Lundy Island, where we’re carrying out our winter fieldwork. This time it’s all about sparrows (and quirky mushrooms like the parrot waxcap), but it’s nice to chase something other than your main study subject from time to time 🚁.
So here I am, sitting in an old armchair in a house that once belonged to a man who not so long ago called himself the king of the island. He even put his profile on a coin and issued his own stamps. I imagine most of his subjects must have been sheep, which still greatly outnumber people on the island, but I can understand why someone would find pleasure in claiming all the beautiful views for themselves.
Today’s piece will be a letter to my future self, but I hope you’ll find it helpful too.
Do you remember how, back in school, teachers made you put your name, the date, and the title of the essay on the first page during exams? Or how they made you start every entry in your notebook in the same format?
You probably can’t recall most of those essays or lessons, but that habit can turn out to be surprisingly useful now.
Primary school essays and R scripts have a lot in common.
When writing your script, you should be both empathetic and egoistic. Think about:
a) Your future self, 10 years from now, desperately trying to recall what the project was even about.
b) Your fellow researcher, doing their best to understand what you’re trying to achieve with your code.
So, be kind to your future self and your collaborators.
There’s no definitive rule on how to start a script, but here’s how I usually do it:
Name (yours and any collaborators — always give credit where it’s due!)
Date
Title
Purpose of the script — why you bothered to write it in the first place.
Optional, but highly recommended: A list of packages used, including their versions.
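As a rough sketch of the list above, a script header could look something like this. The names, date, title, and purpose are placeholders made up for illustration, and the layout is only one of many reasonable options:

```r
# ------------------------------------------------------------
# Authors:  A. Researcher, B. Collaborator   (placeholder names)
# Date:     YYYY-MM-DD                       (fill in when you write the script)
# Title:    Winter sparrow survey, body mass summary   (hypothetical project)
# Purpose:  Summarise body mass by sex so the values can be
#           reported in the methods section.
# Packages: stats, utils (both ship with R; note the versions you used)
# ------------------------------------------------------------

# Load everything the script needs in one place, right under the header
library(stats)
library(utils)
```

Keeping the `library()` calls directly under the header means a reader sees the dependencies before any analysis code.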
I used to think that reporting package versions was an unnecessary hassle. Then one day, I finally understood why people do it. It took me three hours to figure out which version of an R package I needed to install to run my meta-analysis (I followed a tutorial), and I’m certain I don’t want to go through that process again.
If someone wants to replicate your analysis, they need to know exactly what you did. You’d be surprised how different the results can be when you try to replicate someone else’s work, so it’s important to minimise that risk.
If you’re lazy, you might argue that this information belongs in a README file (and don’t worry if you’re not using them yet). And yes, it should be there, but I also like to include it in the script itself, especially if you’ve just started your journey and don’t yet have a GitLab or GitHub account where you would normally upload your README files.
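One possible way to collect package versions, using only functions that ship with R, is sketched below. The package names in `pkgs` are stand-ins; you would swap in whatever your own script loads:

```r
# Print the R version, platform, and versions of all attached packages
print(sessionInfo())

# Version of a single package (here "stats", which ships with R)
print(packageVersion("stats"))

# Build a small table of the packages this script relies on
pkgs <- c("stats", "utils")   # placeholder list, replace with your own
versions <- data.frame(
  package = pkgs,
  version = vapply(pkgs, function(p) as.character(packageVersion(p)),
                   character(1)),
  row.names = NULL
)
print(versions)
```

You can paste the printed table into the header of your script as a comment, or into your README.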
Do you also remember your literature teacher encouraging you to walk readers through your arguments step by step until delivering the punchline? The same applies to your code. Ideally, your code would be self-explanatory, and writing it in a way that requires minimal comments is a valuable skill. Even so, it’s a good idea to divide your code into smaller, clearly titled segments.
At the beginning of your journey, it’s better to over-describe than under-describe. Why? Because we tend to overestimate our (or others’) ability to recall the reasoning behind certain lines of code. Even if you’re confident about what a long, intricate bunch of functions does now, you’ll probably have to Google some of them in a month.
Here’s a simple example of the comments I leave in my scripts:
To create a comment in R (a line of text that R will ignore), simply start the line with a “#” symbol. If you want to silence an entire block of text or code, you can select it in RStudio and press Ctrl + Shift + C.
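To make this concrete, here is a small, made-up illustration of titled segments, inline comments, and a silenced line. The data and variable names are hypothetical and not taken from a real project:

```r
### 1. Load data ------------------------------------------------
# Hypothetical measurements; a real script would read a file here
sparrows <- data.frame(
  sex  = c("f", "m", "f", "m"),
  mass = c(27.1, 28.4, 26.5, 29.0)   # body mass in grams
)

### 2. Summarise -------------------------------------------------
# Mean body mass per sex
mass_by_sex <- aggregate(mass ~ sex, data = sparrows, FUN = mean)
print(mass_by_sex)

# plot(sparrows$mass)   # silenced: R skips everything after the '#'
```

The segment titles make it easy to jump around the script, and the silenced `plot()` call shows how a line can stay in the file without being run.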
Finally, explore good practices for naming variables, functions, etc. A very brief but nice resource is the GitHub page with guidelines compiled by Google software engineers (Google’s R Style Guide). You can also take a look at the Tidyverse style guide.
For me, the biggest takeaway is to be consistent (Yup, future-me, I’m looking at you!). If you decide to write all your variable names in lowercase and function names in uppercase, stick to it. If you format segment titles with ### and inline comments with #, don’t mix them up.
It might seem pedantic, but there’s an odd satisfaction in reading a well-organized script rather than a sCRiPt.
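As a tiny sketch of what consistency can look like, the example below assumes one particular convention: lowercase names with underscores for both variables and functions, which is roughly what the Tidyverse style guide suggests. The measurements and the helper function are invented for illustration:

```r
# Convention assumed for this sketch: lowercase_with_underscores everywhere,
# nouns for variables and verbs for functions.
wing_length_mm <- c(76.2, 78.5, 77.1)   # hypothetical measurements

# Hypothetical helper: convert millimetres to centimetres
convert_mm_to_cm <- function(length_mm) {
  length_mm / 10
}

wing_length_cm <- convert_mm_to_cm(wing_length_mm)
print(wing_length_cm)
```

Whichever convention you pick matters less than using the same one from the first line of the script to the last.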
You may have some personal preferences that are different from those described on the GitHub page, but as long as you are sticking to your own rules and taking into account feedback from people who have to re-run your code, you should be fine.
Carpe diem,
Aga
When working with conda in R, code should always be shared together with the relevant conda environment file, so that others can reproduce the environment precisely and programmatically.
More sheep pictures please!