Hi!
Imagine that you have a very long list of things. But a truly long one. And it is messy. You want to figure out whether this list contains certain words or values you are interested in. Going through such a list by hand would take you ages, and chances are you would still make mistakes 🤡.
What do you do then?
Well, let me introduce the ✨grepl()✨ function!
It can help you pick out entire words or values or just bits of them.
Some time ago, we learnt about the filter() function (check it out here), so now we can incorporate it into our workflow.
In April and May I walk through the woods and check whether the nest boxes are occupied. Did you know that blue tits lay around a dozen paper clip-sized eggs, each of them weighing a gram 🐣? It may seem tiny, but each egg makes up one-tenth of the female's body weight. What a spirit!
Since the nest boxes are scattered across 100 hectares of forest, we divided this area into sub-territories. Each territory is denoted by a different letter code: A, B, C, F, X, GA, J or H.

If, let’s say, I received a list of nest boxes and wanted to check how many of them I have to visit in section A, I could type:
### Libraries ###
library(dplyr)
library(readr)
### Loading data ###
blue_tits <- read_csv("paste_the_path_to_the_file_here")
section_A <- filter(blue_tits, grepl("A", Nestbox))
# here you are basically telling R that you want to search for the letter "A" in the "Nestbox" column
Oops, if you tried this yourself, you might have noticed that R is returning rows with nest boxes that contain the letter A anywhere in their names, e.g. GA14.
We don’t want that.
section_A <- filter(blue_tits, grepl("^A", Nestbox))
# the "^" denotes that we are searching for something that STARTS with an A.
Since I am lazy and I want to make R do the counting for me, I would type in the console:
nrow(section_A)
which returns the number of rows that meet the criteria.
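If you want to see the difference between the two patterns without loading any data, here is a minimal sketch on a small made-up vector. The nest box names below are invented for illustration and are not taken from the real dataset; it uses base R only, so no libraries are needed.

```r
# Hypothetical nest box names, made up for illustration
nestbox <- c("A1", "A14", "GA14", "B7", "GA2")

# Unanchored: matches an "A" anywhere in the name
anywhere <- grepl("A", nestbox)

# Anchored with "^": matches only names that START with "A"
starts_with_a <- grepl("^A", nestbox)

nestbox[anywhere]       # "A1" "A14" "GA14" "GA2"
nestbox[starts_with_a]  # "A1" "A14"

# TRUE counts as 1, so sum() counts the matches,
# just like nrow() does on the filtered table
n_section_a <- sum(starts_with_a)  # 2
```

The logical vector that grepl() returns is exactly what filter() uses to decide which rows to keep.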
Are you interested in finding all the nest boxes numbered 11, no matter which territory they are in? No worries, try:
boxes_no_11 <- filter(blue_tits, grepl("11", Nestbox))
Okay, okay, how can we know it won't pick up 111, 112, 113 etc. as well?
It will!
So to prevent that, type:
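boxes_no_11 <- filter(blue_tits, grepl("^[A-Z]+11$", Nestbox))
Here, the “$” sign anchors the match to the end of the name, so no letters or numbers can come after 11. That keeps out 111, 112, 113 etc.
“^[A-Z]+” tells R that the name must start with one or more capital letters and that 11 can be preceded by letters only, not by a number, so boxes like 211 stay out. The “+” matters because the GA territory has a two-letter code: with a single “[A-Z]”, GA11 would be missed.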
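To check how the anchors behave, here is a small sketch on invented nest box names (hypothetical, not from the real dataset). It compares the loose pattern, a pattern that allows exactly one leading letter, and one that allows one or more leading letters, on the assumption that territory codes such as GA can be longer than one letter.

```r
# Hypothetical nest box names, made up for illustration
nestbox <- c("A11", "B11", "GA11", "A111", "C112", "A1", "B211")

# Loose: "11" anywhere in the name
loose <- grepl("11", nestbox)

# Exactly one capital letter, then 11, then the end of the name
one_letter <- grepl("^[A-Z]11$", nestbox)

# One or more capital letters, then 11, then the end of the name
any_letters <- grepl("^[A-Z]+11$", nestbox)

nestbox[loose]        # "A11" "B11" "GA11" "A111" "C112" "B211"
nestbox[one_letter]   # "A11" "B11"
nestbox[any_letters]  # "A11" "B11" "GA11"
```

Only the last pattern keeps GA11 while still rejecting A111, C112 and B211.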
And if you did a little bit of googling, you might have noticed that in fact, there are two functions: grep() and grepl(), which stands for grep logical. If you are curious what the difference is, you can try typing this in the console:
grepl("A", blue_tits$Nestbox)
VS
grep("A", blue_tits$Nestbox)
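If you don't have the data at hand, the same comparison can be sketched on a short made-up vector (the names are hypothetical). grepl() answers "does this element match?" for every element, while grep() returns the positions of the elements that match, or the elements themselves if you set value = TRUE.

```r
# Hypothetical nest box names, made up for illustration
nestbox <- c("A1", "B7", "GA14", "C3")

# grepl(): one TRUE/FALSE per element
is_match <- grepl("A", nestbox)                  # TRUE FALSE TRUE FALSE

# grep(): integer positions of the matching elements
match_pos <- grep("A", nestbox)                  # 1 3

# grep() with value = TRUE: the matching elements themselves
match_val <- grep("A", nestbox, value = TRUE)    # "A1" "GA14"
```

Because filter() expects a TRUE/FALSE value for every row, grepl() is the one that fits inside it.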
I’m proud that you made it all the way down here. Not keeping you any longer!
Have a great evening,
Agnes
PS: Here is the survey in which you can tell me what R topic you find particularly confusing and why you want to learn it so that we can shape this space together!