Enjoy R

Estimate an equation of lagged variables

When you deal with a multiple time series, you often have to estimate equations for modeling the relationships between variables.
And what you always come to do is to lag them, with the hope to find the best description of one time series, given the others.
What I wrote is a function based on the still existing function lm, but with a quicker and easier input and an output of class lm which includes some summarizing statistics.

estimate.equation <- function(d.vb, matrix.ind.vbs, list.lags, intercept = T, 
    display.statistics = T) {

    dipvrbname <- deparse(substitute(d.vb))

    orig.len <- length(d.vb)

    lagstoelim <- max(unlist(list.lags))

    if (lagstoelim != 0) 
        d.vb <- d.vb[-(1:lagstoelim)]

    suppressMessages(library(foreach))
    suppressMessages(library(car))

    coef. <- foreach(i = 1:ncol(matrix.ind.vbs), .combine = cbind) %do% {

        foreach(j = list.lags[[i]], .combine = cbind) %do% matrix.ind.vbs[(lagstoelim + 
            1 - j):(orig.len - j), i]

    }

    colnames(coef.) <- unlist(foreach(i = 1:ncol(matrix.ind.vbs)) %do% paste0(colnames(matrix.ind.vbs)[i], 
        "_Lag", list.lags[[i]]))

    if (intercept) {
        model <- lm(d.vb ~ coef.)
    } else {
        model <- lm(d.vb ~ coef. + 0)
    }

    if (display.statistics) {

        cat("Equation Estimated on", as.character(Sys.Date()))
        cat(". Time:", format(Sys.time(), "%H:%M"), "\n")
        cat("Dipendent Variable:", dipvrbname, "\nMethod: Least Squares\n")
        cat("Observations included:", length(d.vb), "out of", orig.len, "\n\n")
        cat("R-squared:", summary(model)$r.squared, "\n")
        cat("Adjusted R-squared:", summary(model)$adj.r.squared, "\n")
        cat("S.E. of regression:", summary(model)$sigma, "\n")
        cat("Sum of Squared Residuals:", deviance(model), "\n")
        cat("F-statistic:", summary(model)$fstatistic[1], "\n")
        cat("p-value F-statistic:", formatC(pf(summary(model)$fstatistic[1], 
            summary(model)$fstatistic[2], summary(model)$fstatistic[3], lower.tail = F), 
            format = "f", digits = 4), "\n")
        cat("Mean Dependent Variable:", mean(d.vb), "\n")
        cat("S.D. Dependent Variable:", sd(d.vb), "\n")
        cat("Akaike Info Criterion:", 2/length(d.vb) * (-logLik(model) + length(model$coef)), 
            "\n")
        cat("Schwarz Criterion:", 1/length(d.vb) * (-2 * logLik(model) + length(model$coef) * 
            log(length(d.vb))), "\n")
        cat("Hannan-Quinn Info Criterion:", 2/length(d.vb) * (-logLik(model) + 
            length(model$coef) * log(log(length(d.vb)))), "\n")
        cat("Log likelihood:", logLik(model), "\n")
        cat("Durbin-Watson statistic:", durbinWatsonTest(model)$dw, "\n\n")

    }

    return(model)

}

Let’s use a dataset to have a look at how the function works.

suppressMessages(library(vars))

data(Canada)

estimate.equation(d.vb = Canada[, 1], matrix.ind.vbs = Canada, list.lags = list(1:4, 
    0:4, 0:4, 0:4), intercept = T, display.statistics = T)

## Equation Estimated on 2013-07-25. Time: 20:42 
## Dipendent Variable: Canada[, 1] 
## Method: Least Squares
## Observations included: 80 out of 84 
## 
## R-squared: 0.9995 
## Adjusted R-squared: 0.9993 
## S.E. of regression: 0.2362 
## Sum of Squared Residuals: 3.347 
## F-statistic: 5789 
## p-value F-statistic: 0.0000 
## Mean Dependent Variable: 945 
## S.D. Dependent Variable: 8.816 
## Akaike Info Criterion: 0.1639 
## Schwarz Criterion: 0.7594 
## Hannan-Quinn Info Criterion: 0.4027 
## Log likelihood: 13.44 
## Durbin-Watson statistic: 1.965

## 
## Call:
## lm(formula = d.vb ~ coef.)
## 
## Coefficients:
##    (Intercept)     coef.e_Lag1     coef.e_Lag2     coef.e_Lag3  
##      -62.78355         1.11189        -0.60589         0.37358  
##    coef.e_Lag4  coef.prod_Lag0  coef.prod_Lag1  coef.prod_Lag2  
##        0.17793        -0.04555         0.11338         0.00238  
## coef.prod_Lag3  coef.prod_Lag4    coef.rw_Lag0    coef.rw_Lag1  
##       -0.02996         0.02146        -0.01864        -0.06493  
##   coef.rw_Lag2    coef.rw_Lag3    coef.rw_Lag4     coef.U_Lag0  
##        0.02604        -0.03685         0.05640        -0.87155  
##    coef.U_Lag1     coef.U_Lag2     coef.U_Lag3     coef.U_Lag4  
##        0.60206        -0.12122         0.39380         0.09083

Let’s save the equation without the statistics and have the ordinary R summary of it.

eq <- estimate.equation(d.vb = Canada[, 1], matrix.ind.vbs = Canada, list.lags = list(1:4, 
    0:4, 0:4, 0:4), intercept = T, display.statistics = F)
summary(eq)

## 
## Call:
## lm(formula = d.vb ~ coef.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4910 -0.0985  0.0037  0.1204  0.5186 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -62.78355   49.64958   -1.26  0.21093    
## coef.e_Lag1      1.11189    0.14520    7.66  1.9e-10 ***
## coef.e_Lag2     -0.60589    0.20424   -2.97  0.00432 ** 
## coef.e_Lag3      0.37358    0.19800    1.89  0.06404 .  
## coef.e_Lag4      0.17793    0.13984    1.27  0.20816    
## coef.prod_Lag0  -0.04555    0.04471   -1.02  0.31246    
## coef.prod_Lag1   0.11338    0.07018    1.62  0.11142    
## coef.prod_Lag2   0.00238    0.07259    0.03  0.97398    
## coef.prod_Lag3  -0.02996    0.06855   -0.44  0.66361    
## coef.prod_Lag4   0.02146    0.04549    0.47  0.63888    
## coef.rw_Lag0    -0.01864    0.04027   -0.46  0.64511    
## coef.rw_Lag1    -0.06493    0.05221   -1.24  0.21844    
## coef.rw_Lag2     0.02604    0.05295    0.49  0.62462    
## coef.rw_Lag3    -0.03685    0.05078   -0.73  0.47085    
## coef.rw_Lag4     0.05640    0.03968    1.42  0.16043    
## coef.U_Lag0     -0.87155    0.10750   -8.11  3.2e-11 ***
## coef.U_Lag1      0.60206    0.16573    3.63  0.00058 ***
## coef.U_Lag2     -0.12122    0.18113   -0.67  0.50588    
## coef.U_Lag3      0.39380    0.17752    2.22  0.03033 *  
## coef.U_Lag4      0.09083    0.15513    0.59  0.56042    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.236 on 60 degrees of freedom
## Multiple R-squared:  0.999,  Adjusted R-squared:  0.999 
## F-statistic: 5.79e+03 on 19 and 60 DF,  p-value: <2e-16
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

%d bloggers like this: