Description
I am working on a simulation study that looks at performance when lambda is not determined a priori, but is instead calculated by cross-validation. I am doing this as an independent verification of the results found in Taylor & Tibshirani (2018) that show using cross-validation yields valid inferential statistics. (I know that Loftus also proposed a way to deal with a lambda determined by cross-validation, but it doesn't appear to be in the package yet, and the simulations in the 2018 paper performed well enough for me.)
I see that the documentation says {glmnet} uses the 1/n parameterization, whereas {selectiveInference} uses the common parameterization. The documentation shows how to take a lambda on the common scale and transform it into something {glmnet} can use. I need to do the opposite: take the lambda that cv.glmnet() gives me and turn it into a lambda on the common scale that fixedLassoInf() wants.
Specifically, the {glmnet} documentation reads:
Note also that for "gaussian", glmnet standardizes y to have unit variance (using 1/n rather than 1/(n-1) formula) before computing its lambda sequence (and then unstandardizes the resulting coefficients); if you wish to reproduce/compare results with other software, best to supply a standardized y
While {selectiveInference} says:
Estimated lasso coefficients (e.g., from glmnet). This is of length p (so the intercept is not included as the first component). Be careful! This function uses the "standard" lasso objective... In contrast, glmnet multiplies the first term by a factor of 1/n. So after running glmnet, to extract the beta corresponding to a value lambda, you need to use beta = coef(obj,s=lambda/n)[-1]...
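As I read these two excerpts, the objectives differ only in the 1/n factor on the squared-error term. Writing them out (my own reading of the docs, not quoted from either package):

```
glmnet (gaussian):   (1/(2n)) * ||y - X beta||^2 + lambda_glmnet * ||beta||_1
selectiveInference:   (1/2)   * ||y - X beta||^2 + lambda_si     * ||beta||_1
```

If that reading is right, matching the two objectives term by term would relate the penalties by a factor of n, which is what motivates my guess below.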
For a reproducible example, see the code below. My question specifically concerns how to adjust this line: si_lambda <- glmnet_lambda. That is, what transformation do I do to go from a lambda cv.glmnet() gives me (I assign this to glmnet_lambda) into a lambda that {selectiveInference} will use (which I call si_lambda)?
Since the documentation says to divide by n when going in the other direction, my first thought was to multiply what cv.glmnet() gives me by my sample size. That runs without throwing a warning or an error, but it gives me a lambda of 188.5121, which feels wrong. Apologies if that is the answer and I'm just being dense, but I wanted to make sure I am going from one software to the other in an appropriate manner.
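Concretely, the attempt described above (my guess, not a confirmed transformation) was:

```r
# guess: undo glmnet's 1/n scaling by multiplying the
# cross-validated lambda by the sample size n
si_lambda <- glmnet_lambda * n  # gives 188.5121 with the seed used below
```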
library(glmnet)
library(selectiveInference)
library(tidyverse)
set.seed(1839)
n <- 1000 # sample size
B <- c(0, 1, 0) # intercept 0, beta1 = 1, beta2 = 0
eps_sd <- 1 # sd of the error
# make data
X <- cbind(1, replicate(length(B) - 1, rnorm(n, 0, 1)))
y <- X %*% B + rnorm(n, 0, eps_sd)
dat <- as.data.frame(X[, -1])
dat <- as_tibble(cbind(dat, y))
# get lambda by way of cross-validation
glmnet_lambda <- cv.glmnet(
  x = as.matrix(select(dat, -y)),
  y = dat$y
) %>%
  getElement("lambda.1se")
# run glmnet with that lambda
m1 <- glmnet(
  x = as.matrix(select(dat, -y)),
  y = dat$y,
  lambda = glmnet_lambda
)
# get coefs from that model, dropping intercept, per the docs
m1_coefs <- coef(m1)[-1]
# what reparameterization do I do here?
si_lambda <- glmnet_lambda
# do post-selection inference with m1
# runs with warning, so I assume parameterized incorrectly -- how to fix?
m2 <- fixedLassoInf(
  x = as.matrix(select(dat, -y)),
  y = dat$y,
  beta = m1_coefs,
  lambda = si_lambda
)
And session information:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6
[4] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
[7] tibble_3.1.2 ggplot2_3.3.3 tidyverse_1.3.1
[10] selectiveInference_1.2.5 MASS_7.3-54 adaptMCMC_1.4
[13] coda_0.19-4 survival_3.2-11 intervals_0.15.2
[16] glmnet_4.1-1 Matrix_1.3-3