Hi all!
Recently, I came across an article about how microbes might have helped us develop big brains. Researchers suggested that changes in our gut microbiome composition might have led to shifts in our metabolism. Instead of storing energy in our tissues, we might have joined forces with some friendly microbes that helped increase our blood glucose levels. Since glucose is the brain’s primary fuel, this could have enabled us to develop larger brains. Fascinating idea 🦠!
I am sure that they had to use at least one R for loop in their analysis!
When you are done, grab the data here:
### Libraries ###
library(readr) # read_csv()
library(dplyr) # mutate(), used below
### Loading data ###
blue_tits <- read_csv("../data/blue_tits.csv") # remember to use the correct path
Our little dataset has given us plenty of opportunities to practice data cleaning and manipulation. However, it’s high time we expanded it — let’s create the “Year,” “Month,” “Day,” and “Check_day” columns. For this exercise, we’ll assume all the data were collected in 2024, so we can fill in the “Year” column right away. I like diversity, so feel free to create the remaining columns using your preferred method — including the cbind() function 🙂.
### Data manipulation ###
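### Let's start with 3 different ways of adding columns to your existing data frame
### Run only ONE of them: running all three would leave you with duplicated columns
# Option 1: mutate() from dplyr
blue_tits <- mutate(blue_tits, Month = NA, Day = NA, Check_day = NA)
# Option 2: base R indexing
# blue_tits[, c("Month", "Day", "Check_day")] <- NA
# Option 3: cbind()
# Month <- NA
# Day <- NA
# Check_day <- NA
# blue_tits <- cbind(blue_tits, Month, Day, Check_day)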
### Let's add one more column we will need in a second
blue_tits[,"Year"] <- 2024
Let’s start small — we will jump to more complex for loops in just a few lines of code. The general scaffold of an R for loop looks like this:
for (i in something) {
  do x
}
“Something” can be many things: a range, a vector or a list. You can think of “i” as an “iteration” or “index” — but you can give it any other name!
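To make this concrete, here is a minimal sketch of the same scaffold looping over a range, a vector and a list. The object names (running_total, shouted, nest_info, classes) are made up for illustration and are not part of our dataset.

```r
# 1) A range of numbers
running_total <- 0
for (i in 1:3) {
  running_total <- running_total + i
}

# 2) A character vector; the loop variable can have any name
shouted <- character(0)
for (species in c("blue tit", "great tit")) {
  shouted <- c(shouted, toupper(species))
}

# 3) A list, whose elements can be of different types
nest_info <- list(nest_id = "A1", eggs = 9, hatched = TRUE)
classes <- character(0)
for (item in nest_info) {
  classes <- c(classes, class(item))
}

print(running_total)
print(shouted)
print(classes)
```

In every case the loop variable takes the value of one element at a time, in order.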
Now, imagine we need to create a list of days to check on our chicks when they are 14 days old. This will be the day we schedule their ringing and take their body measurements. If we wanted to write it in pseudocode (basically, a list of steps we have to follow, written in plain language that only pretends to be code), it could look like this:
for (each row of blue tits data in row range) {
  calculate the day I should check up on chicks
}
Now, let’s translate it into something that R will understand:
for (i in 1:nrow(blue_tits)) {
  blue_tits$Check_day[i] <- blue_tits$Hatching_day[i] + 14
}
The expression in the first line,
1:nrow(blue_tits)
gives us a range of numbers from 1 to 152, the number of rows in our data. The [i] part picks one row at a time, so our R loop iterates over all 152 rows and fills the “Check_day” column with 152 values.
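One small caveat about row ranges: if a data frame ever has zero rows, 1:nrow() counts backwards from 1 to 0 instead of giving an empty range. The sketch below uses a made-up empty data frame (empty_nests is hypothetical) to compare it with seq_len(), which many R users prefer for this reason.

```r
# A hypothetical data frame with no rows at all
empty_nests <- data.frame(Hatching_day = numeric(0))

# 1:0 counts down, so a loop over it would run twice
bad_range <- 1:nrow(empty_nests)

# seq_len(0) is empty, so a loop over it does not run
safe_range <- seq_len(nrow(empty_nests))

iterations <- 0
for (i in safe_range) {
  iterations <- iterations + 1
}

print(bad_range)
print(length(safe_range))
print(iterations)
```

For our 152-row dataset both forms behave the same, so this only matters once you start filtering data before looping.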
You may remember from previous tutorials that you can achieve the same goal using a much simpler command:
blue_tits$Check_day <- blue_tits$Hatching_day + 14
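What’s more, this would be the preferred way to execute this idea. R is built to work on whole vectors at once, so the one-line version is usually faster and easier to read. While it may not be immediately obvious, R loops can also become memory-intensive, for example when they grow an object on every iteration, so it’s best to run our commands as efficiently as possible.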
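If you would like to convince yourself that both versions do the same job, here is a small sketch on a made-up data frame (toy and its values are invented for illustration, they are not the real blue tit data).

```r
# Hypothetical toy data standing in for blue_tits
toy <- data.frame(Hatching_day = c(40, 45, 52))

# Loop version: create the result vector first, then fill it in
loop_result <- rep(NA_real_, nrow(toy))
for (i in seq_len(nrow(toy))) {
  loop_result[i] <- toy$Hatching_day[i] + 14
}

# Vectorised version: one operation on the whole column
vector_result <- toy$Hatching_day + 14

print(identical(loop_result, vector_result))
```

Creating loop_result at its full length before the loop is a common way to avoid growing an object step by step.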
However, a small issue arises when the “Hatching_day” column contains NAs — they will simply carry over to the “Check_day” column. This can sometimes be confusing; for instance, if we were only provided with the final table, we might not know whether the chicks didn’t hatch or if some data were missing. So, how about introducing a checkpoint to address this?
for (each row of blue tits data in row range) {
  if (chicks from a given nest hatched) {
    add 14 days to the hatching day
  } else {
    fill the cell with "didn't hatch"
  }
}
And now, let’s make sure that R does what we want it to do. We’ll have it check whether “Hatching_day” is not NA. If that’s true (i.e. “Hatching_day” value is not NA), it means that chicks from the given nest have hatched and we can plan our ringing session.
for (i in 1:nrow(blue_tits)) {
  if (!is.na(blue_tits$Hatching_day[i])) {
    blue_tits$Check_day[i] <- blue_tits$Hatching_day[i] + 14
  } else {
    blue_tits$Check_day[i] <- "didn't hatch"
  }
}
Do you see what the data frame looks like now?
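One thing worth spotting: a column can hold only one type of data, so as soon as a text label lands in it, the numbers are stored as text too. The sketch below shows this on a made-up data frame (toy is hypothetical) and uses ifelse(), one possible vectorised alternative to the loop above.

```r
# Hypothetical toy data with one nest that did not hatch
toy <- data.frame(Hatching_day = c(40, NA, 52))

# ifelse() checks the condition for every row at once
toy$Check_day <- ifelse(!is.na(toy$Hatching_day),
                        toy$Hatching_day + 14,
                        "didn't hatch")

print(toy$Check_day)
print(class(toy$Check_day))
```

Because “Check_day” is now a character column, you would need to convert it back to numbers before doing any further arithmetic on it.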
If you’ve never worked with for loops before, congrats 🎉— that’s a lot to tackle in one day! On the other hand, if you’re ready to throw yourself in at the deep end, consider this challenge: imagine you need to share your project with friends who aren’t familiar with biology. They’d like to explore your data, but they don’t want to think about egg-laying dates the way we do. Instead, they want to know the exact day and month right away, without having to figure out, for example, that 38 corresponds to the 8th of May (reminder: in our dataset, day 1 is April 1st).
How would you do it using a for loop? How would you fill in the “Month” column?
for (i in 1:nrow(blue_tits)) {
  if (blue_tits$Laying_day[i] < 1) {
    # Day 0 and earlier: treated as March
    blue_tits$Month[i] <- 3
  } else if (blue_tits$Laying_day[i] >= 1 && blue_tits$Laying_day[i] < 31) {
    # Days 1 to 30: April
    blue_tits$Month[i] <- 4
  } else if (blue_tits$Laying_day[i] >= 31 && blue_tits$Laying_day[i] < 62) {
    # Days 31 to 61: May
    blue_tits$Month[i] <- 5
  } else {
    # Day 62 onwards: treated as June
    blue_tits$Month[i] <- 6
  }
}
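For the curious, the same month boundaries can also be expressed without a loop. This is only a sketch of one option, using base R's findInterval() on made-up laying days (laying_day and breaks are names invented for this example).

```r
# Hypothetical laying days around each month boundary
laying_day <- c(0, 1, 30, 31, 38, 61, 62)

# First day of April, May and June on our scale (day 1 = April 1st)
breaks <- c(1, 31, 62)

# findInterval() returns 0 below the first break, 1 for April days,
# 2 for May days and 3 from day 62 on, so adding 3 gives the month number
month <- 3 + findInterval(laying_day, breaks)

print(month)
```

Like the loop, this treats everything before day 1 as March and everything from day 62 as June.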
And what about the “Day” column?
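for (i in 1:nrow(blue_tits)) {
  if (blue_tits$Month[i] == 3) {
    # March has 31 days, so day 0 is March 31st
    blue_tits$Day[i] <- blue_tits$Laying_day[i] + 31
  } else if (blue_tits$Month[i] == 4) {
    # Day 1 is April 1st, so no shift is needed
    blue_tits$Day[i] <- blue_tits$Laying_day[i]
  } else if (blue_tits$Month[i] == 5) {
    # Subtract the 30 days of April
    blue_tits$Day[i] <- blue_tits$Laying_day[i] - 30
  } else {
    # Subtract the 30 days of April and the 31 days of May
    blue_tits$Day[i] <- blue_tits$Laying_day[i] - 61
  }
}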
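A handy way to double-check the month and day arithmetic is to let R's Date class do the counting. The sketch below assumes, as we did earlier, that all data come from 2024 and that day 1 is April 1st; toy and season_start are names invented for this example.

```r
# Hypothetical toy data: one laying day per month in our season
toy <- data.frame(Laying_day = c(0, 1, 38, 62))

# Day 1 on our scale is April 1st, 2024
season_start <- as.Date("2024-04-01")

# Adding a number to a Date moves it forward by that many days
toy$Date <- season_start + (toy$Laying_day - 1)

# Pull the month and the day of the month out of each date
toy$Month <- as.integer(format(toy$Date, "%m"))
toy$Day <- as.integer(format(toy$Date, "%d"))

print(toy)
```

If the loop results and the Date-based results agree for your data, you can be fairly confident the boundaries in the if/else chain are right.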
Enjoy the evening!
Aga
PS: Here is the survey in which you can tell me what R topic you find particularly confusing and why you want to learn it so that we can shape this space together!