\DeclareMathOperator{\E}{E} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Var}{Var} \newcommand{\SD}{\mathit{SD}} \newcommand{\SE}{\mathit{SE}} \DeclareMathOperator{\MSE}{MSE} \DeclareMathOperator{\RE}{RE} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\acov}{acov} \DeclareMathOperator{\vect}{vec} \newcommand{\bv}[1]{\boldsymbol{\mathbf{#1}}}

Asymptotic Standard Errors in CFA

Statistics
Author

Mark Lai

Published

April 11, 2020

In our lab meeting I’m going to present the article by Maydeu-Olivares (2017), which discusses standard errors (SEs) of maximum likelihood estimators (MLEs) in SEM. As I’m also working on something related, and I haven’t found a good reference (I’m sure there are some; I just haven’t done a deep search), I decided to write this demo to show how the SEs can be obtained in R and compare them to the lavaan package, which automates these computations.

Define True Model and Simulate Some Data

To keep things simple, I’ll use a one-factor model with four indicators (without mean structure):

\[\mathbf{y} = \boldsymbol{\Lambda} \boldsymbol{\eta} + \boldsymbol{\varepsilon}\]

  • \(\mathbf{y}\) = observed scores
  • \(\boldsymbol{\Lambda}\) = loading matrix (vector in this case)
  • \(\boldsymbol{\eta}\) = latent score (true score)
  • \(\boldsymbol{\varepsilon}\) = random measurement error/unique factor scores
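With the latent factor variance fixed at 1 (as in the code below), the model-implied covariance matrix of \(\mathbf{y}\) is \[\bv \Sigma = \boldsymbol{\Lambda} \boldsymbol{\Lambda}' + \boldsymbol{\Theta},\] where \(\boldsymbol{\Theta}\) is the (diagonal) covariance matrix of \(\boldsymbol{\varepsilon}\); this is what tcrossprod(LAM) + THETA computes in the simulation code.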
library(lavaan)
># This is lavaan 0.6-17
># lavaan is FREE software! Please report any bugs.
LAM <- rep(.7, 4)  # loading vector
THETA <- diag(1 - LAM^2)
N <- 100
set.seed(1)  # Set seed

I’ll simulate data that follow a multivariate normal distribution:

y <- mvnfast::rmvn(n = N, mu = rep(0, 4), sigma = tcrossprod(LAM) + THETA)
# # Check kurtosis
# apply(y, 2, e1071::kurtosis)

Define Log-Likelihood Function

The model parameters are \((\boldsymbol{\lambda}, \boldsymbol{\theta})\), which include the four loadings and the four uniquenesses, with the latent factor variance fixed to 1. Assuming normality, the sample covariance matrix, \(\mathbf{S}\) (Sy in the code), is a sufficient statistic:

# Sample covariance S
(Sy <- cov(y) * (N - 1) / N)
>#           [,1]      [,2]      [,3]      [,4]
># [1,] 0.9962082 0.4808498 0.3465193 0.3733755
># [2,] 0.4808498 1.0076039 0.4208120 0.5062814
># [3,] 0.3465193 0.4208120 0.9332648 0.3749744
># [4,] 0.3733755 0.5062814 0.3749744 0.9388328

Note that I divided by \(N\) instead of \(N - 1\); this matches lavaan’s default (using \(N - 1\) corresponds to the Wishart likelihood). When normality holds, \(\mathbf{S}\) follows a Wishart distribution with degrees of freedom \(N\). Denote \(\bv \Sigma = \bv \Sigma(\boldsymbol{\lambda}, \boldsymbol{\theta})\) as the model-implied covariance matrix. The likelihood function is \[\mathcal{L}(\bv \Sigma; \bv S) \propto \frac{\exp\left[- \frac{1}{2}\tr(\bv \Sigma^{-1} N \bv S)\right]} {|\bv \Sigma|^\frac{N}{2}}\] and so the log-likelihood function is \[\ell(\bv \Sigma; \bv S) = - \frac{N}{2}\left[\tr(\bv \Sigma^{-1} \bv S) + \log |\bv \Sigma|\right] + C,\] where \(C\) is a constant that does not involve \(\bv \Sigma\).

Defining \(\ell(\bv \Sigma; \bv S)\) in R is pretty straightforward:

loglikSigma <- function(pars, S = Sy, N = 100) {
  # pars: First four as lambdas, then thetas
  Sigma <- tcrossprod(pars[1:4]) + diag(pars[5:8])
  # determinant() returns the log determinant ($modulus) by default
  - N / 2 * (determinant(Sigma)$modulus[1] + 
       sum(diag(solve(Sigma, S))))
}

MLE

In maximum likelihood estimation, we find the parameter values that maximize the above log-likelihood function (maximizing the log-likelihood is equivalent to maximizing the likelihood because log is monotonic). As an example, consider these two sets of estimates:

# Set 1
lam1 <- c(.6, .6, .8, .7); theta1 <- c(.5, .4, .4, .5)
loglikSigma(c(lam1, theta1))
># [1] -158.8054
# Set 2
lam2 <- c(.7, .7, .8, .7); theta2 <- c(.5, .5, .4, .3)
loglikSigma(c(lam2, theta2))
># [1] -161.6866

The first set gives a higher (i.e., less negative) log-likelihood value, so it should be preferred. We could try many sets of values by hand until we find the best one, but that would be very tedious, so instead we rely on an optimizer. In R, a general-purpose optimizer is available in the optim() function (from the base stats package).

One thing to note before we start: optimizers usually find the parameters that minimize an objective function, but here we want to maximize it. One trick is to instead minimize \(-2 \times\) the log-likelihood:

minus2lSigma <- function(pars, S = Sy, N = 100) {
  # pars: First four as lambdas, then thetas
  Sigma <- tcrossprod(pars[1:4]) + diag(pars[5:8])
  N * (determinant(Sigma)$modulus[1] + 
         sum(diag(solve(Sigma, S))))
}
minus2lSigma(c(lam1, theta1))
># [1] 317.6109
minus2lSigma(c(lam2, theta2))
># [1] 323.3733

So again the first set is preferred, with a lower \(-2 \times\) log-likelihood. Now we use optim() with the “L-BFGS-B” algorithm. It requires starting values, so I put in the values of set 1 (you can verify that the results are the same with set 2 as starting values):

opt_info <- optim(c(lam1, theta1), minus2lSigma, 
                  lower = c(rep(-Inf, 4), rep(0, 4)), 
                  method = "L-BFGS-B", hessian = TRUE)
# Maximum likelihood estimates:
opt_info$par
># [1] 0.6068613 0.7783132 0.5576544 0.6467392 0.6279234 0.4018284 0.6222836
># [8] 0.5205601

So in terms of point estimates, that’s it! Let’s compare the results to lavaan:

fit1 <- lavaan::cfa(model = 'f =~ y1 + y2 + y3 + y4', 
                    std.lv = TRUE, 
                    # Note: lavaan transforms the input S automatically by 
                    # multiplying it by (N - 1) / N
                    sample.cov = `colnames<-`(cov(y), paste0("y", 1:4)), 
                    sample.nobs = 100)
cbind(optim = opt_info$par, 
      `lavaan::cfa` = lavaan::coef(fit1))
>#            optim lavaan::cfa
># f=~y1  0.6068613   0.6068697
># f=~y2  0.7783132   0.7783153
># f=~y3  0.5576544   0.5576568
># f=~y4  0.6467392   0.6467406
># y1~~y1 0.6279234   0.6279174
># y2~~y2 0.4018284   0.4018292
># y3~~y3 0.6222836   0.6222836
># y4~~y4 0.5205601   0.5205593

So they are essentially the same; the tiny differences are mainly due to different optimizers and starting values.
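As a quick extra check, the model-implied covariance matrices from the two solutions should also match; for instance (using lavInspect() to pull out lavaan’s model-implied covariance):

# Model-implied covariance from the optim estimates
Sigmahat_optim <- tcrossprod(opt_info$par[1:4]) + diag(opt_info$par[5:8])
# Model-implied covariance from lavaan
Sigmahat_lavaan <- lavaan::lavInspect(fit1, "cov.ov")
# Largest absolute discrepancy; should be essentially zero
max(abs(Sigmahat_optim - Sigmahat_lavaan))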

Asymptotic Standard Errors

Now onto standard errors, which are the main focus of Maydeu-Olivares (2017). The paper also talks about sandwich estimators, which are robust (to some degree) to nonnormality. I will first focus on the SEs based on the observed and the expected information, and then show how to obtain the MLM and MLR standard errors in (more or less) plain R.

Although in the statistics literature the observed information is generally preferred over the expected information (both of which relate to the second partial derivatives of the log-likelihood function, see here), the default in lavaan is to use the expected information, so I will demonstrate that first.

Using expected information

As described in the paper, the asymptotic covariance matrix of the MLEs, which describes how the estimates vary and covary across repeated samples, has this formula: \[\acov(\hat{\bv \lambda}, \hat{\bv \theta}) = N^{-1} (\bv \Delta' \bv \Gamma^{-1}_E \bv \Delta)^{-1},\] where \[\bv \Gamma^{-1}_E = \frac{1}{2} (\bv \Sigma^{-1} \otimes \bv \Sigma^{-1})\] is the expected information matrix (to be evaluated at the MLEs). Note that the paper also uses the duplication matrix (which potentially increases computational efficiency), but for simplicity I left it out, as we don’t have a huge matrix here.

Here, \(\bv \Delta\) is the matrix of partial derivatives (the Jacobian) of the model-implied covariances with respect to the parameters, described next.

The asymptotic standard errors are the square roots of the diagonal elements of \(\acov(\hat{\bv \lambda}, \hat{\bv \theta})\). After we obtain the MLEs, we can easily construct the model-implied covariance matrix \(\hat{\bv \Sigma}\) and let R do the inverse and the Kronecker product (there are computational shortcuts in this example, but they are not essential for this demo). The \(\bv \Delta\) matrix contains the partial derivatives. As an example, with \[\bv \Sigma = \begin{bmatrix} \lambda_1^2 + \theta_1 & & & \\ \lambda_1 \lambda_2 & \lambda_2^2 + \theta_2 & & \\ \lambda_1 \lambda_3 & \lambda_2 \lambda_3 & \lambda_3^2 + \theta_3 & \\ \lambda_1 \lambda_4 & \lambda_2 \lambda_4 & \lambda_3 \lambda_4 & \lambda_4^2 + \theta_4 \\ \end{bmatrix},\] if we stack the elements of \(\bv \Sigma\) into a vector, denoted \(\vect(\bv \Sigma)\), we have \[\vect(\bv \Sigma) = [\lambda_1^2 + \theta_1, \lambda_1 \lambda_2, \lambda_1 \lambda_3, \cdots]', \] so \[\bv \Delta = \begin{bmatrix} \frac{\partial (\lambda_1^2 + \theta_1)}{\partial \lambda_1} & \frac{\partial (\lambda_1^2 + \theta_1)}{\partial \lambda_2} & \cdots \\ \frac{\partial (\lambda_1 \lambda_2)}{\partial \lambda_1} & \frac{\partial (\lambda_1 \lambda_2)}{\partial \lambda_2} & \cdots \\ \vdots & \vdots & \ddots \\ \end{bmatrix}, \] which is a \(p^2 \times r\) matrix, where \(p\) is the number of observed variables (4 here) and \(r\) is the number of parameters (8 here). This can be obtained by hand, but we can also use the numDeriv::jacobian() function:

# Using parameter estimates from optim to construct Sigma
(Sigmahat <- tcrossprod(opt_info$par[1:4]) + diag(opt_info$par[5:8]))
>#           [,1]      [,2]      [,3]      [,4]
># [1,] 0.9962040 0.4723281 0.3384188 0.3924810
># [2,] 0.4723281 1.0075997 0.4340297 0.5033656
># [3,] 0.3384188 0.4340297 0.9332620 0.3606569
># [4,] 0.3924810 0.5033656 0.3606569 0.9388317
# Make it into a function, outputting vec(Sigma)
vecSigma <- function(pars) {
  as.vector(tcrossprod(pars[1:4]) + diag(pars[5:8]))
}
# Numeric derivative:
(Delta <- numDeriv::jacobian(vecSigma, opt_info$par))
>#            [,1]      [,2]      [,3]      [,4] [,5] [,6] [,7] [,8]
>#  [1,] 1.2137226 0.0000000 0.0000000 0.0000000    1    0    0    0
>#  [2,] 0.7783132 0.6068613 0.0000000 0.0000000    0    0    0    0
>#  [3,] 0.5576544 0.0000000 0.6068613 0.0000000    0    0    0    0
>#  [4,] 0.6467392 0.0000000 0.0000000 0.6068613    0    0    0    0
>#  [5,] 0.7783132 0.6068613 0.0000000 0.0000000    0    0    0    0
>#  [6,] 0.0000000 1.5566263 0.0000000 0.0000000    0    1    0    0
>#  [7,] 0.0000000 0.5576544 0.7783132 0.0000000    0    0    0    0
>#  [8,] 0.0000000 0.6467392 0.0000000 0.7783132    0    0    0    0
>#  [9,] 0.5576544 0.0000000 0.6068613 0.0000000    0    0    0    0
># [10,] 0.0000000 0.5576544 0.7783132 0.0000000    0    0    0    0
># [11,] 0.0000000 0.0000000 1.1153087 0.0000000    0    0    1    0
># [12,] 0.0000000 0.0000000 0.6467392 0.5576544    0    0    0    0
># [13,] 0.6467392 0.0000000 0.0000000 0.6068613    0    0    0    0
># [14,] 0.0000000 0.6467392 0.0000000 0.7783132    0    0    0    0
># [15,] 0.0000000 0.0000000 0.6467392 0.5576544    0    0    0    0
># [16,] 0.0000000 0.0000000 0.0000000 1.2934784    0    0    0    1
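Because the model-implied covariances here are simple functions of the parameters, \(\bv \Delta\) can also be written out analytically: the column for \(\lambda_j\) is \(\vect(\mathbf{e}_j \boldsymbol{\lambda}' + \boldsymbol{\lambda} \mathbf{e}_j')\) and the column for \(\theta_j\) is \(\vect(\mathbf{e}_j \mathbf{e}_j')\), where \(\mathbf{e}_j\) is the \(j\)th standard basis vector. A quick sketch to verify this against the numerical Jacobian:

# Analytic Jacobian (columns ordered as lambda_1, ..., lambda_4, theta_1, ..., theta_4)
lam_hat <- opt_info$par[1:4]
Delta_analytic <- matrix(0, nrow = 16, ncol = 8)
for (j in 1:4) {
  ej <- diag(4)[, j]
  # d Sigma / d lambda_j = e_j lambda' + lambda e_j'
  Delta_analytic[, j] <- as.vector(tcrossprod(ej, lam_hat) + tcrossprod(lam_hat, ej))
  # d Sigma / d theta_j = e_j e_j'
  Delta_analytic[, 4 + j] <- as.vector(tcrossprod(ej))
}
# Largest absolute difference from the numDeriv version; should be near zero
max(abs(Delta_analytic - Delta))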

Now, we’re ready to compute the asymptotic standard errors:

invSigmahat <- solve(Sigmahat)
# Expected Fisher information
exp_I <- invSigmahat %x% invSigmahat / 2
# Asymptotic covariance
acov_exp <- solve(crossprod(Delta, exp_I) %*% Delta) / N
# Asymptotic SE
ase_exp <- sqrt(diag(acov_exp))

Compared to lavaan:

cbind(optim = ase_exp, 
      `lavaan::cfa ('expected')` = sqrt(diag(lavaan::vcov(fit1))))
>#             optim lavaan::cfa ('expected')
># f=~y1  0.10400371               0.10400353
># f=~y2  0.10242249               0.10242227
># f=~y3  0.10139392               0.10139383
># f=~y4  0.09991077               0.09991052
># y1~~y1 0.10889211               0.10889173
># y2~~y2 0.10757581               0.10757529
># y3~~y3 0.10420294               0.10420288
># y4~~y4 0.09956021               0.09955978

So that matches pretty well!

Observed information (using Hessian)

While Maydeu-Olivares (2017) also presents the formula (in matrix form) for the asymptotic covariance estimates using the observed information, in practice such a matrix is usually obtained from the Hessian, which is typically part of the output of the optimization routine used for the MLE. For example, going back to the output of the optim() call,

# optim() minimized -2 * loglikelihood, so its Hessian is 2 times the
# observed information; divide by 2 before inverting
acov_obs <- solve(opt_info$hessian / 2)
(ase_obs <- sqrt(diag(solve(opt_info$hessian / 2))))
># [1] 0.10381614 0.10240102 0.10189012 0.09983351 0.10862886 0.10752628 0.10480306
># [8] 0.09943057
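If the Hessian had not been requested in the optim() call, we could also obtain it numerically at the MLE; for example, a quick check with numDeriv:

# Numerical Hessian of the -2 * log-likelihood at the MLE
hess_num <- numDeriv::hessian(minus2lSigma, opt_info$par)
# Should closely match ase_obs above
sqrt(diag(solve(hess_num / 2)))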

And compare to lavaan using the information = 'observed' option:

fit1_obs <- lavaan::update(fit1, information = 'observed')
cbind(optim = ase_obs, 
      `lavaan::cfa ('obs')` = sqrt(diag(lavaan::vcov(fit1_obs))))
>#             optim lavaan::cfa ('obs')
># f=~y1  0.10381614          0.10381636
># f=~y2  0.10240102          0.10240110
># f=~y3  0.10189012          0.10189033
># f=~y4  0.09983351          0.09983363
># y1~~y1 0.10862886          0.10862830
># y2~~y2 0.10752628          0.10752643
># y3~~y3 0.10480306          0.10480358
># y4~~y4 0.09943057          0.09943066

So again it’s pretty close. You can try using the formula in Maydeu-Olivares (2017) to verify the results.

MLM/MLMV
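The MLM (Satorra–Bentler) standard errors keep the normal-theory point estimates but use a sandwich-type asymptotic covariance. In my reading of the formula in Maydeu-Olivares (2017), again leaving out the duplication matrix, the code below computes \[\acov(\hat{\bv \lambda}, \hat{\bv \theta}) = N^{-1} (\bv \Delta' \bv \Gamma^{-1}_E \bv \Delta)^{-1} \bv \Delta' \bv \Gamma^{-1}_E \hat{\bv \Gamma} \bv \Gamma^{-1}_E \bv \Delta (\bv \Delta' \bv \Gamma^{-1}_E \bv \Delta)^{-1},\] where \(\hat{\bv \Gamma}\) is the asymptotic covariance matrix of the sample covariances, estimated from the fourth moments of the data (adf_I in the code below).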

# Asymptotic covariance of S (4th moment)
Wi_adf <- apply(t(y) - colMeans(y), 2, tcrossprod)
adf_I <- tcrossprod(Wi_adf - as.vector(Sy)) / N
# Sandwich estimator:
Deltat_exp_I <- crossprod(Delta, exp_I)
acov_mlm <- acov_exp %*% Deltat_exp_I %*% adf_I %*% 
  crossprod(Deltat_exp_I, acov_exp) * N
ase_mlm <- sqrt(diag(acov_mlm))

Compare to lavaan with estimator = 'MLM':

# lavaan
fit1_mlm <- lavaan::cfa(model = 'f =~ y1 + y2 + y3 + y4', 
                        std.lv = TRUE, 
                        # Note: Full data needed for MLM
                        data = `colnames<-`(y, paste0("y", 1:4)), 
                        estimator = "MLM")
cbind(optim = ase_mlm, 
      `lavaan::cfa ('MLM')` = sqrt(diag(lavaan::vcov(fit1_mlm))))
>#             optim lavaan::cfa ('MLM')
># f=~y1  0.12965429          0.12965371
># f=~y2  0.10522671          0.10522608
># f=~y3  0.08901649          0.08901603
># f=~y4  0.08981597          0.08981560
># y1~~y1 0.10278055          0.10278061
># y2~~y2 0.10252608          0.10252560
># y3~~y3 0.09792141          0.09792136
># y4~~y4 0.10608980          0.10608957

MLR
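The MLR standard errors follow the same sandwich logic, but here the “bread” is the inverse observed information (from the Hessian above) and the “meat” is the cross-product of the casewise score contributions; as I read it, the code below computes \[\acov(\hat{\bv \lambda}, \hat{\bv \theta}) = \mathbf{H}^{-1} \left(\sum_{i=1}^N \bv s_i \bv s_i'\right) \mathbf{H}^{-1},\] where \(\mathbf{H}\) is the observed information matrix (opt_info$hessian / 2 above) and \(\bv s_i\) is the score of observation \(i\), obtained via the chain rule from the deviation of that observation’s cross-product matrix from \(\hat{\bv \Sigma}\). Take this as a sketch of my reading rather than the paper’s exact notation.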

# Cross-product matrix (without mean structure)
# First, need the deviation/residual of covariance for each individual
# observation
Sdev <- apply(t(y) - colMeans(y), 2, tcrossprod) - as.vector(Sigmahat)
g_score <- crossprod(Sdev, invSigmahat %x% invSigmahat) / 2
xp_I <- crossprod(g_score) / N  # Gamma^-1_XP
# Sandwich estimator:
acov_mlr <- acov_obs %*% crossprod(Delta, xp_I %*% Delta) %*% acov_obs * N
ase_mlr <- sqrt(diag(acov_mlr))

Compare to lavaan with estimator = 'MLR':

# lavaan
fit1_mlr <- lavaan::cfa(model = 'f =~ y1 + y2 + y3 + y4', 
                        std.lv = TRUE, 
                        # Note: Full data needed for MLR
                        data = `colnames<-`(y, paste0("y", 1:4)), 
                        estimator = "MLR")
cbind(optim = ase_mlr, 
      `lavaan::cfa ('MLR, Hessian')` = sqrt(diag(lavaan::vcov(fit1_mlr))))
>#             optim lavaan::cfa ('MLR, Hessian')
># f=~y1  0.12884960                   0.12885037
># f=~y2  0.10504885                   0.10504891
># f=~y3  0.08977431                   0.08977434
># f=~y4  0.08883634                   0.08883676
># y1~~y1 0.10203109                   0.10203105
># y2~~y2 0.10139778                   0.10139847
># y3~~y3 0.09785923                   0.09786031
># y4~~y4 0.10566978                   0.10567060

Bibliography

Maydeu-Olivares, Alberto. 2017. “Maximum Likelihood Estimation of Structural Equation Models for Continuous Data: Standard Errors and Goodness of Fit.” Structural Equation Modeling: A Multidisciplinary Journal 24 (3): 383–94. https://doi.org/10.1080/10705511.2016.1269606.