# Enjoy R: Do two consecutive seeds behave independently?

I’ve always wondered whether two random seeds in R provide independent results, whatever they are. In particular, I wanted to check if repeating a sampling operation with two consecutive seeds, say `set.seed(20)`

and `set.seed(21)`

, this would produce unrelated outputs as expected. Pseudo-randomness in R is based on algorithms I honestly have read nothing about, and therefore I am only checking this as a simple naif user.

Let’s say we have a vector of one million consecutive seeds, and, under each of them, we want to sample an integer from 1 to 10. Hence, every time one of these 10 numbers shows up, we can start to count the failures we observe before that specific number shows up again. Theoretically, the distribution of failures for each of the 10 numbers has to follow a geometric distribution of probability 0.1.

```
random_num <- 8217015 # from random.org
N <- 1000000
seeds <- seq_len(N) + random_num
numbers <- sapply(seeds, function(x) { set.seed(x); sample(1:10, 1) })
library(tidyverse)
mydata0 <- data_frame(seeds = seeds, numbers = numbers)
mydata0
```

```
## # A tibble: 1,000,000 x 2
## seeds numbers
##
## 1 8217016 5
## 2 8217017 9
## 3 8217018 4
## 4 8217019 6
## 5 8217020 4
## 6 8217021 2
## 7 8217022 9
## 8 8217023 7
## 9 8217024 2
## 10 8217025 5
## # ... with 999,990 more rows
```

```
mydata <- mydata0 %>%
mutate(seeds = 1:N) %>%
group_by(numbers) %>%
mutate(failures = seeds - lag(seeds) - 1) %>%
ungroup() %>%
drop_na()
mydata
```

```
## # A tibble: 999,990 x 3
## seeds numbers failures
##
## 1 5 4 1
## 2 7 9 4
## 3 9 2 2
## 4 10 5 8
## 5 11 4 5
## 6 12 4 0
## 7 14 3 0
## 8 15 6 10
## 9 16 4 3
## 10 19 1 1
## # ... with 999,980 more rows
```

```
summ_table <- mydata %>%
group_by(numbers, failures) %>%
summarise(n = n()) %>%
mutate(frequencies = n/sum(n)) %>%
ungroup()
summ_table
```

```
## # A tibble: 902 x 4
## numbers failures n frequencies
##
## 1 1 0 8801 0.08780016
## 2 1 1 9068 0.09046379
## 3 1 2 6929 0.06912479
## 4 1 3 7566 0.07547960
## 5 1 4 7109 0.07092050
## 6 1 5 6305 0.06289967
## 7 1 6 5371 0.05358194
## 8 1 7 5447 0.05434013
## 9 1 8 4529 0.04518201
## 10 1 9 4166 0.04156067
## # ... with 892 more rows
```

```
max_plot <- max(summ_table$failures)
qplot(data = summ_table, failures, frequencies, geom = 'path') +
facet_wrap(~ numbers, ncol = 2) +
scale_x_continuous(breaks = 0:max_plot) +
geom_line(data = data.frame(x = 0:max_plot,
y = dgeom(0:max_plot, .1)),
aes(x, y), color = 'red', alpha = .5) +
theme_bw() +
theme(axis.text.x = element_blank(),
panel.grid.major = element_line(colour = "darkgrey"))
```

The red curve is the theoretical geometric model, whereas the black one is the observed distribution of the failures for each of the 10 numbers. We immediately see that a given number is less likely to be sampled under two consecutive seeds. This is clearly not what a statistician trusting a software would like to observe.

Let’s see whether shuffling the seeds (so to remove contiguity) would fix the problem.

```
mydata2 <- mydata0 %>%
sample_frac() %>% # shuffling here
mutate(seeds = 1:N) %>%
group_by(numbers) %>%
mutate(failures = seeds - lag(seeds) - 1) %>%
ungroup() %>%
drop_na()
summ_table2 <- mydata2 %>%
group_by(numbers, failures) %>%
summarise(n = n()) %>%
mutate(frequencies = n/sum(n)) %>%
ungroup()
max_plot2 <- max(summ_table2$failures)
qplot(data = summ_table2, failures, frequencies, geom = 'path') +
facet_wrap(~ numbers, ncol = 2) +
scale_x_continuous(breaks = 0:max_plot2) +
geom_line(data = data.frame(x = 0:max_plot2,
y = dgeom(0:max_plot2, .1)),
aes(x, y), color = 'red', alpha = .5) +
theme_bw() +
theme(axis.text.x = element_blank(),
panel.grid.major = element_line(colour = "darkgrey"))
```

As we expected, now the red curve and the black curve seem to be completely overlapping.

Hence, in whatever application you would need to set seeds in different iterations avoid using consecutive seeds!

- Posted in: dplyr ♦ ggplot2 ♦ R ♦ rstats ♦ statistical software
- Tagged: probability theory, r, randomness, rstats, statistics

## Recent Comments