Enjoy R: Looping in R

Loops are run in most applications and are supported by all languages.
In general, there are more than one way to execute the same task via looping, and the efficiency of each choice varies among languages.

This post is not intended to demonstrate any general truth about R loops, but aims to provide some insights into some common methods for looping in base-R.
In particular, four possible types of loop will be executed, all simulating the probability of obtaining as many successess as the number of trials for a binomial random variable.
Such a probability is clearly equal to πn, where π is the probability of success in each independent trial, and n denotes the size, i.e. the number of trials.
The purpose of looping is to estimate the probability of NO FAILURE for different combinations of n and π.

The No_Failure function

Let us build a function that estimates the long-term proportion of no failures for a given size and probability of success. The n_rep argument indicates the number of deviates to be drawn from the binomial distribution.

No_Failure = function(n, PI, n_rep) {
  mean(rbinom(n_rep, n, PI) == n)
}

ns = seq_len(80)
PIs = seq(0, 1, .05)
n_rep = 1000

Method (I). A standard for loop

The for loop may be the first solution everyone’s brain comes out with for a problem like this, because it really represents the way each of us reasons when the number of iteration is fixed and not conditional on anything.

FOR_fun = function() {
  outputFor = matrix(0, nrow = length(ns), ncol = length(PIs))
  for(i in seq_along(ns)) {
    for(j in seq_along(PIs)) {
      outputFor[i, j] = No_Failure(ns[i], PIs[j], n_rep)
    }
  }
  outputFor
}

Method (II). A standard while loop

Every for loop can be expressed by a while loop. Implementing a while loop requires a bit more code, and some more attention on the counter(s) to be updated.

WHILE_fun = function() {
  outputWhile = matrix(0, nrow = length(ns), ncol = length(PIs))
  i = 1
  while(i <= length(ns)) {
    j = 1
    while(j <= length(PIs)) {
      outputWhile[i, j] = No_Failure(ns[i], PIs[j], n_rep)
      j = j + 1
    }
    i = i + 1
  }
  outputWhile
}

Method (III). A double sapply

The apply family in R is rich of functions which really identify the R-way of looping.
The sapply() function is one of the most common, in that it works mainly on vectors (but also lists). It is very powerful, as it accomplishes the task with very little code.
Moreover, this function creates automatically the output in a simplified version, therefore no empty matrix should be initialized as it happened in the previous two methods.

SAPPLY_fun = function() {
  sapply(PIs, function(j) sapply(ns, No_Failure, j, n_rep))
}

Method (IV). A double Vectorize

The Vectorize() function is the one I like the most. What it does is to create another function whose arguments are the same as the initial function, but some of them can be vectorized. As far as I see, sometimes (an this is the case) vectorizing more than one argument at a time does not give the expected output.

VECTORIZE_fun = function() {
  No_Failure_vect = Vectorize(Vectorize(No_Failure, 'n'), 'PI')
  No_Failure_vect(ns, PIs, n_rep)
}

Benchmarking the performances

library(microbenchmark)
microbenchmark(FOR_fun(),
               WHILE_fun(),
               SAPPLY_fun(),
               VECTORIZE_fun())benchm

In this case, the four methods do not differ so much in terms of computational time required. The winner is on average the sapply() function, which will be used to run the simulation.

Plotting the outcome of the simulation

library(reshape2)
set.seed(1)
n_rep = 100000 # more precise
mydata = SAPPLY_fun()
dimnames(mydata) = list(n = ns, PI = PIs)
curves = melt(mydata)

library(ggplot2)
ggplot(curves, aes(PI, value, color = n, group = n)) + 
  geom_path() +
  theme_bw() +
  scale_x_continuous(breaks = PIs) +
  scale_y_continuous(breaks = PIs) +
  scale_color_gradient(low = 'green', high = 'navy') +
  xlab(expression(pi)) +
  ylab('Pr(NO FAILURE)')
plot_pi_n

The plot shows all the estimated curves representing the probability of having no failures as a function of π and for different sizes (from 1 to 80).
As n increases, it is less likely to have no failure: for n approaching infinity, the right corner of the graph will approach the point (1, 0).
This confirms that even a really rare event occurs almost certainly if the number of repetitions of the same experiment is considerably large.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: