Category Archives: dplyr

Enjoy R: Do two consecutive seeds behave independently?

I’ve always wondered whether two random seeds in R provide independent results, whatever they are. In particular, I wanted to check whether repeating a sampling operation with two consecutive seeds, say set.seed(20) and set.seed(21), would produce unrelated outputs, as expected. Pseudo-randomness in R is based on algorithms I honestly have read nothing about, and …
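A minimal sketch of the kind of check described in this excerpt, using base R only. The sample size, the use of `sample.int()`, and the choice of correlation as the measure are assumptions made for illustration, not necessarily what the full post does. A correlation near zero is consistent with unrelated outputs, but it only rules out linear dependence.

```r
# Repeat the same sampling operation under two consecutive seeds.
n <- 10000  # assumed sample size, chosen for illustration

set.seed(20)
x <- sample.int(100, n, replace = TRUE)

set.seed(21)
y <- sample.int(100, n, replace = TRUE)

# Sample correlation between the two outputs, plus a test of zero correlation.
r <- cor(x, y)
p_value <- cor.test(x, y)$p.value

print(c(correlation = r, p_value = p_value))
```

A single pair of seeds says little on its own; repeating the comparison over many consecutive pairs would give a fuller picture.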


Enjoy R: Stratified sampling and its application using dplyr

author: Davide Passaretti

Simple random sampling is the most common practice when dealing with data sets that are large enough to be split into training and test sets for predictive purposes. Think of classification models. You randomly extract, say, 75% of the rows, and that’s a fair technique, at least until you are quite sure that …
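As a rough illustration of the idea in the title, here is one way a stratified 75% split could be written with dplyr. The toy data frame `dat` and its columns `id`, `class`, and `x` are hypothetical, and `slice_sample()` assumes dplyr 1.0.0 or later; the full post may take a different route.

```r
library(dplyr)

# Hypothetical toy data: 100 rows with an imbalanced two-level class label.
set.seed(20)
dat <- data.frame(
  id    = 1:100,
  class = rep(c("a", "b"), times = c(80, 20)),
  x     = rnorm(100)
)

# Stratified training set: draw 75% of the rows within each class.
train <- dat %>%
  group_by(class) %>%
  slice_sample(prop = 0.75) %>%
  ungroup()

# The test set is every row that was not drawn.
test <- anti_join(dat, train, by = "id")

# Class counts in each part.
print(count(train, class))
print(count(test, class))
```

Grouping by the class label before sampling keeps the class proportions of the full data in both parts, which a simple random 75% draw does not guarantee.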


Enjoy R: How to make a Pareto Chart using ggplot2 (and dplyr)

Hi all. The well-known choice to push ggplot2 users towards a cleaner and more correct way of plotting data has led to the lack of a secondary axis. This is at the root of the difficulty of plotting a Pareto Chart with this smart R package. In this post, I suggest a way to overcome this hurdle, by …
