Hi!
Having graduated in Zoology, I am not a big fan of zoos and wildlife parks. Recently, though, I made an exception and visited a butterfly conservatory. It’s mid-December, but with no snow in sight, the idea didn’t feel too odd to me. Plus, populations of butterflies and other insects are in decline, so I don’t get to see them very often anymore 😔. It’s such a pity, as they are truly bizarre creatures. For instance, while it's true that butterflies suck their food up with a proboscis, most of their taste buds are located on their… feet (tarsi).
The beautiful thing about R is that declining or non-existent populations are not always an issue: you can simply simulate them. Of course, there are some rules. I don’t recommend opening your Excel sheet and randomly changing numbers or tweaking them to better fit your statistics. I also don’t recommend fabricating data out of thin air. That’s cheating. And you may end up making some not-so-flattering headlines, like this one.
I’m thinking about generating a population in R based on a set of defined rules that are clear to the reader. While simulated populations may seem less exciting than real ones, they are still very useful. We often create them to see how much the “real world” deviates from a simulated reality with known parameters. We won’t go that far today, but we will learn how to simulate a data frame of the kind I have used many times in the past. Our goal will be to create some random nest boxes and assign fictional egg-laying and hatching dates, resembling those of my blue tit population 🐦.
# Restrict the pool of available letters
# (named letter_pool so that it does not mask R's built-in `letters` constant)
letter_pool <- c("A", "B", "C", "E", "J", "G", "H", "X")
# Create a vector of random letters (repeats allowed)
random_letters <- sample(letter_pool, size = 153, replace = TRUE)
# Create a vector of unique random numbers
# (without replacement the pool must hold at least 153 values, hence 1:153 and not 1:99)
random_numbers <- sample(1:153, size = 153, replace = FALSE)
# Combine letters and numbers into nest box names
nest_box_names <- paste0(random_letters, random_numbers)
Our fictional nest boxes will start with one of the letters listed above, followed by a number from the 1:153 range. To generate the letters and numbers, we’ll use the ✨sample()✨ function; its “replace” argument determines whether values can repeat. The paste0() function will help us glue the letters and numbers into a single string.
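If the “replace” argument feels abstract, here is a tiny sketch of the same idea. The three-letter pool and the object names are made up for illustration only; they are not part of the nest box code above.

```r
# Hypothetical toy pool, much smaller than the one used for the nest boxes
pool <- c("A", "B", "C")

# replace = TRUE: the same letter can be drawn more than once,
# so we can ask for more draws (5) than there are letters (3)
with_repeats <- sample(pool, size = 5, replace = TRUE)

# replace = FALSE: each value is drawn at most once,
# so size cannot be larger than the pool
without_repeats <- sample(1:5, size = 5, replace = FALSE)

# paste0() glues the two vectors together element by element, with no separator
toy_names <- paste0(with_repeats, without_repeats)
toy_names

# anyDuplicated() returns 0 when there are no repeated names
anyDuplicated(toy_names)
```

Because every number appears only once, the combined names stay unique even when a letter repeats.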
# Setting the seed
set.seed(42)
# Simulating egg-laying dates
egg_laying_dates <- rnorm(153, mean = 38, sd = 7)
# Simulating hatching dates
hatching_dates <- egg_laying_dates + rnorm(153, mean = 22, sd = 2)
A funny thing about humans is that we excel at chaos and randomness. Computers, on the other hand, cannot generate truly random values — they can only mimic randomness 💻.
Sometimes, however, this handicap turns out to be handy, for example when it’s necessary to replicate someone else’s analysis. For that, even made-up data has to be made up in exactly the same way. That's where the set.seed() function comes in. We don't have to set the seed to 42; it can be your lucky number or the first thing that comes to your mind. If you change it, though, your final data frame will look a bit different. Keep in mind that the seed only affects random draws made after it is set, so if you want the nest box names to be reproducible too, call set.seed() before the sample() calls above.
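To see what the seed actually does, here is a minimal sketch. The object names are just placeholders for this demonstration.

```r
# Same seed, same call: the draws are identical
set.seed(42)
first_draw <- rnorm(3, mean = 38, sd = 7)

set.seed(42)
second_draw <- rnorm(3, mean = 38, sd = 7)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the generator simply carries on
# from where it stopped, so these values differ from the first draw
third_draw <- rnorm(3, mean = 38, sd = 7)
identical(first_draw, third_draw)
```

This is also why the position of set.seed() in a script matters: it only fixes the random draws that come after it.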
For the egg-laying and hatching dates, we will draw values from a normal distribution using the rnorm() function, specifying our means and standard deviations. It's important to remember that chicks hatch only after all the eggs have been laid and incubated, a process that typically takes about twenty-something days.
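A quick sanity check can confirm that the simulated biology holds up. The sketch below repeats the simulation with the same parameters, but stores the laying-to-hatching gap in its own vector (the name `gap` is my own addition) so it can be inspected. Since we add two independent normal draws, the hatching dates should be centred around 38 + 22 = 60, with a standard deviation of about sqrt(7^2 + 2^2) ≈ 7.3.

```r
set.seed(42)

# Same parameters as in the simulation above
egg_laying_dates <- rnorm(153, mean = 38, sd = 7)
gap <- rnorm(153, mean = 22, sd = 2)  # days between laying and hatching
hatching_dates <- egg_laying_dates + gap

# Every chick should hatch after its egg was laid
all(hatching_dates > egg_laying_dates)

# The gap should sit around twenty-something days
summary(gap)

# Compare with the theoretical values of 60 and roughly 7.3
mean(hatching_dates)
sd(hatching_dates)
```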
# Create a final data frame
nest_data <- data.frame(
  NestBox = nest_box_names,
  EggLayingDate = round(egg_laying_dates),
  HatchingDate = round(hatching_dates)
)
Now we can include everything in one data frame. The round() function will help us make all the dates biologically sensible, since eggs are laid and chicks hatch on whole days. You can also quickly plot the two columns of dates against each other to see whether our dataset looks more or less realistic.
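One possible way to draw that quick plot with base R is sketched below. To keep the snippet self-contained it rebuilds just the two date columns in a small data frame called `dates` (a name chosen for this example); the axis labels and the dashed reference line are my own additions.

```r
set.seed(42)

# Rebuild the two date columns with the same parameters as above
egg_laying_dates <- rnorm(153, mean = 38, sd = 7)
hatching_dates <- egg_laying_dates + rnorm(153, mean = 22, sd = 2)

dates <- data.frame(
  EggLayingDate = round(egg_laying_dates),
  HatchingDate = round(hatching_dates)
)

# Scatter plot of hatching date against egg-laying date
plot(
  dates$EggLayingDate, dates$HatchingDate,
  xlab = "Egg-laying date",
  ylab = "Hatching date"
)

# Dashed reference line: hatching exactly 22 days after laying
abline(a = 22, b = 1, lty = 2)
```

If the simulation behaves, the points should form a tight band around the dashed line, with later-laying nests hatching later.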
And voilà!
Have a great evening,
Aga
PS: Here is the survey in which you can tell me what R topic you find particularly confusing and why you want to learn it so that we can shape this space together!