# Enjoy R: Looping in R

Loops are a building block of most programs and are supported by virtually every programming language.

In general, there is more than one way to execute the same task via looping, and the efficiency of each choice varies among languages.

This post is not intended to demonstrate any general truth about R loops; rather, it aims to provide some insight into common methods for looping in base R.

In particular, four possible types of loop will be executed, all simulating the probability of obtaining as many successes as the number of trials for a binomial random variable.

Such a probability is clearly equal to *π^n*, where *π* is the probability of success in each independent trial, and *n* denotes the size, i.e. the number of trials.

The purpose of looping is to estimate the probability of NO FAILURE for different combinations of *n* and *π*.
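As a quick sanity check (a minimal sketch, self-contained and separate from the simulation below), the proportion of binomial draws equal to the number of trials should sit close to the analytic value *π^n*:

```r
# Sanity check: the simulated proportion of "all successes" should be
# close to the analytic value PI^n (here n = 5, PI = 0.7).
set.seed(42)
n = 5; PI = 0.7; n_rep = 100000
est = mean(rbinom(n_rep, n, PI) == n)  # proportion of draws equal to n
c(estimated = est, analytic = PI^n)    # both approximately 0.168
```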

#### The `No_Failure` function

Let us build a function that estimates the long-term proportion of no failures for a given size and probability of success. The `n_rep` argument indicates the number of deviates to be drawn from the binomial distribution.

```r
No_Failure = function(n, PI, n_rep) {
  mean(rbinom(n_rep, n, PI) == n)
}

ns = seq_len(80)
PIs = seq(0, 1, .05)
n_rep = 1000
```

#### Method (I). A standard `for` loop

The `for` loop may be the first solution that comes to mind for a problem like this, because it closely matches the way each of us reasons when the number of iterations is fixed and not conditional on anything.

```r
FOR_fun = function() {
  outputFor = matrix(0, nrow = length(ns), ncol = length(PIs))
  for (i in seq_along(ns)) {
    for (j in seq_along(PIs)) {
      outputFor[i, j] = No_Failure(ns[i], PIs[j], n_rep)
    }
  }
  outputFor
}
```

#### Method (II). A standard `while` loop

Every `for` loop can be expressed as a `while` loop. Implementing a `while` loop requires a bit more code, and some extra attention to the counter(s) to be updated.

```r
WHILE_fun = function() {
  outputWhile = matrix(0, nrow = length(ns), ncol = length(PIs))
  i = 1
  while (i <= length(ns)) {
    j = 1
    while (j <= length(PIs)) {
      outputWhile[i, j] = No_Failure(ns[i], PIs[j], n_rep)
      j = j + 1
    }
    i = i + 1
  }
  outputWhile
}
```

#### Method (III). A *double* `sapply`

The apply family in R is rich in functions that embody the R way of looping.

The `sapply()` function is one of the most common, in that it works mainly on vectors (but also on lists). It is very powerful, as it accomplishes the task with very little code.

Moreover, this function automatically simplifies its output (here, to a matrix), so no empty matrix needs to be initialized, as was done in the previous two methods.

```r
SAPPLY_fun = function() {
  sapply(PIs, function(j) sapply(ns, No_Failure, j, n_rep))
}
```
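The automatic simplification mentioned above can be seen on a toy example (illustrative only, unrelated to the simulation): when every call returns a vector of equal length, `sapply()` assembles the results into a matrix, one column per input element.

```r
# Each call returns a length-2 vector, so sapply() builds a 2 x 3 matrix
# with one column per element of 1:3.
out = sapply(1:3, function(j) c(j, j^2))
dim(out)  # 2 3
out[2, ]  # 1 4 9
```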

#### Method (IV). A *double* `Vectorize`

The `Vectorize()` function is the one I like the most. It creates a new function with the same arguments as the original one, but with some of them vectorized. As far as I can tell, sometimes (and this is the case here) vectorizing more than one argument at a time does not give the expected output.

```r
VECTORIZE_fun = function() {
  No_Failure_vect = Vectorize(Vectorize(No_Failure, 'n'), 'PI')
  No_Failure_vect(ns, PIs, n_rep)
}
```
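A minimal sketch of why the two `Vectorize()` calls are nested rather than vectorizing both arguments at once: `Vectorize()` is built on `mapply()`, which pairs vectorized arguments elementwise (with recycling) instead of forming all combinations.

```r
No_Failure = function(n, PI, n_rep) mean(rbinom(n_rep, n, PI) == n)

set.seed(1)
# Vectorizing both arguments at once pairs them elementwise:
# the result is a plain length-3 vector, not a grid.
both = Vectorize(No_Failure, c('n', 'PI'))
dim(both(1:3, c(.2, .5, .8), 1000))    # NULL: just a vector

# Nesting vectorizes the inner function over n and the outer over PI,
# so every (n, PI) combination is evaluated: a 3 x 3 matrix.
nested = Vectorize(Vectorize(No_Failure, 'n'), 'PI')
dim(nested(1:3, c(.2, .5, .8), 1000))  # 3 3
```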

#### Benchmarking the performances

```r
library(microbenchmark)
microbenchmark(FOR_fun(), WHILE_fun(), SAPPLY_fun(), VECTORIZE_fun())
```

In this case, the four methods do not differ much in terms of computational time. On average, the winner is the `sapply()` approach, which will be used to run the simulation.

### Plotting the outcome of the simulation

```r
library(reshape2)
library(ggplot2)

set.seed(1)
n_rep = 100000  # more precise
mydata = SAPPLY_fun()
dimnames(mydata) = list(n = ns, PI = PIs)
curves = melt(mydata)

ggplot(curves, aes(PI, value, color = n, group = n)) +
  geom_path() +
  theme_bw() +
  scale_x_continuous(breaks = PIs) +
  scale_y_continuous(breaks = PIs) +
  scale_color_gradient(low = 'green', high = 'navy') +
  xlab(expression(pi)) +
  ylab('Pr(NO FAILURE)')
```

The plot shows all the estimated curves representing the probability of having no failures as a function of *π* and for different sizes (from 1 to 80).

As *n* increases, it becomes less likely to have no failure: for any *π* < 1, *π^n* tends to 0 as *n* approaches infinity, so the right portion of the graph flattens toward the point (1, 0).

This confirms that even a very rare event (here, a failure) occurs almost certainly if the number of repetitions of the same experiment is considerably large.
