Mark Lai
April 11, 2020
In our lab meeting I’m going to present the article by Maydeu-Olivares (2017), which discusses standard errors (SEs) of the maximum likelihood estimators (MLEs) in SEM models. As I’m also working on something related, and I haven’t found good references (I’m sure there are some; I’ve just been too lazy to do a deep search), I decided to write this demo to show how the SEs can be obtained in R and to compare the results to the lavaan package, which automates these computations.
library(lavaan)
># This is lavaan 0.6-17
># lavaan is FREE software! Please report any bugs.
To keep things simple, I’ll use a one-factor model with four indicators (without mean structure):
\[\mathbf{y} = \boldsymbol{\Lambda} \boldsymbol{\eta} + \boldsymbol{\varepsilon}\]
I’ll simulate data from a normal distribution:
LAM <- rep(.7, 4)  # loading vector
THETA <- diag(1 - LAM^2)  # diagonal matrix of uniquenesses
N <- 100  # sample size
set.seed(1)  # Set seed
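The data can be generated with something like the following (a sketch; MASS::mvrnorm() is one option, and the exact call that produced the results shown below may differ):
# Draw N observations from a multivariate normal with the implied covariance
y <- MASS::mvrnorm(N, mu = rep(0, 4), Sigma = tcrossprod(LAM) + THETA)
colnames(y) <- paste0("y", 1:4)  # name the indicators y1-y4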
The model parameters are \((\boldsymbol{\lambda}, \boldsymbol{\theta})\), which include four loadings and four uniquenesses, with the latent factor variance fixed to 1. Assuming normality, the sample covariance matrix, \(\mathbf{S}\) (Sy in the code), is a sufficient statistic:
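A sketch of computing Sy, dividing by \(N\) rather than \(N - 1\):
# Sample covariance matrix with divisor N
(Sy <- cov(y) * (N - 1) / N)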
># [,1] [,2] [,3] [,4]
># [1,] 0.9962082 0.4808498 0.3465193 0.3733755
># [2,] 0.4808498 1.0076039 0.4208120 0.5062814
># [3,] 0.3465193 0.4208120 0.9332648 0.3749744
># [4,] 0.3733755 0.5062814 0.3749744 0.9388328
Note that I divided by \(N\) instead of \(N - 1\); dividing by \(N\) is the default in lavaan (using \(N - 1\) instead corresponds to the Wishart likelihood). When normality holds, \(\mathbf{S}\) follows a Wishart distribution with degrees of freedom \(N\). Denote \(\bv \Sigma = \bv \Sigma(\boldsymbol{\lambda}, \boldsymbol{\theta})\) as the model-implied covariance matrix. The likelihood function is \[\mathcal{L}(\bv \Sigma; \bv S) \propto
\frac{\exp\left[- \frac{1}{2}\tr(\bv \Sigma^{-1} N \bv S)\right]}
{|\bv \Sigma|^\frac{N}{2}},\] and so the log-likelihood function is \[\ell(\bv \Sigma; \bv S) =
- \frac{N}{2}\left[\tr(\bv \Sigma^{-1} \bv S) +
\log |\bv \Sigma|\right] + C,\] where \(C\) is some constant that does not involve \(\bv \Sigma\).
This is pretty straightforward in R.
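Here is a minimal sketch of such a function (the name lSigma and the parameter ordering, four loadings followed by four uniquenesses, are just conventions used for this demo; the constant \(C\) is dropped):
# Log-likelihood (up to a constant) as a function of
# pars = (lambda_1, ..., lambda_4, theta_1, ..., theta_4), using Sy and N from above
lSigma <- function(pars) {
  Sigma <- tcrossprod(pars[1:4]) + diag(pars[5:8])
  as.numeric(-N / 2 * (sum(diag(solve(Sigma, Sy))) + determinant(Sigma)$modulus))
}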
In maximum likelihood estimation, we find the parameter values that maximize the above log-likelihood function (maximizing the log-likelihood also maximizes the likelihood, as the log is monotonic). As an example, consider these two sets of estimates (the first set is stored as lam1 and theta1 in the code):
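They can be evaluated with something like the code below (lam1/theta1 hold the first set, as used later in the optim() call; lam2/theta2 are placeholder names for the second set):
lSigma(c(lam1, theta1))  # set 1
lSigma(c(lam2, theta2))  # set 2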
># [1] -158.8054
># [1] -161.6866
The first set gives a higher (i.e., less negative) log-likelihood value, so it should be preferred. We could try many sets of values by hand until we find the best, but that would be very tedious, so instead we rely on an optimizer. In R, a general-purpose optimizer is available in the optim() function (from the base stats package).
One thing to note before we start: optimizers usually find the parameters that minimize an objective function, but here we want to maximize it. So one trick is to instead minimize -2 \(\times\) the log-likelihood:
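A sketch of minus2lSigma, the function minimized in the optim() call below (again relying on Sy and N from above):
# -2 times the log-likelihood (dropping the constant), to be minimized
minus2lSigma <- function(pars) {
  Sigma <- tcrossprod(pars[1:4]) + diag(pars[5:8])
  as.numeric(N * (sum(diag(solve(Sigma, Sy))) + determinant(Sigma)$modulus))
}
minus2lSigma(c(lam1, theta1))  # set 1: -2 * (-158.8054) = 317.61
minus2lSigma(c(lam2, theta2))  # set 2: -2 * (-161.6866) = 323.37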
So again the first set is preferred, with the lower -2 log-likelihood. Now we use optim() with the "L-BFGS-B" algorithm. It requires starting values, so I put in the values of set 1 (you can verify that the results are the same with set 2 as starting values):
opt_info <- optim(c(lam1, theta1), minus2lSigma,
                  lower = c(rep(-Inf, 4), rep(0, 4)),
                  method = "L-BFGS-B", hessian = TRUE)
# Maximum likelihood estimates:
opt_info$par
># [1] 0.6068613 0.7783132 0.5576544 0.6467392 0.6279234 0.4018284 0.6222836
># [8] 0.5205601
So in terms of point estimates, that’s it! Let’s compare the results to lavaan:
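A sketch of the lavaan fit (stored as fit1, which is reused later) and the comparison; the exact lavaan syntax may differ, but the latent variance is fixed to 1 (std.lv = TRUE):
# One-factor CFA with the latent variance fixed to 1
fit1 <- lavaan::cfa('f =~ y1 + y2 + y3 + y4', data = as.data.frame(y),
                    std.lv = TRUE)
# Compare the point estimates
cbind(optim = opt_info$par,
      `lavaan::cfa` = coef(fit1))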
># optim lavaan::cfa
># f=~y1 0.6068613 0.6068697
># f=~y2 0.7783132 0.7783153
># f=~y3 0.5576544 0.5576568
># f=~y4 0.6467392 0.6467406
># y1~~y1 0.6279234 0.6279174
># y2~~y2 0.4018284 0.4018292
># y3~~y3 0.6222836 0.6222836
># y4~~y4 0.5205601 0.5205593
So they are essentially the same (the small differences are mainly due to the different optimizers and starting values).
Now onto standard errors, which are the main focus of Maydeu-Olivares (2017). The paper also talks about sandwich estimators, which are robust (to some degree) to nonnormality; I will focus first on the ones based on the observed and expected information, and then show how the MLM and MLR standard errors can be obtained in (more or less) plain R.
Although in the statistics literature the observed information is generally preferred over the expected information (both of which relate to the second partial derivatives of the log-likelihood function; see here), the default in lavaan is to use the expected information, so I will demonstrate that first.
As described in the paper, the asymptotic covariance matrix of the MLEs, which describes how the estimates vary and covary across repeated samples, has this formula: \[\acov(\hat{\bv \lambda}, \hat{\bv \theta}) = N^{-1} (\bv \Delta' \bv \Gamma^{-1}_E \bv \Delta)^{-1},\] where \[\bv \Gamma^{-1}_E = \frac{1}{2} (\bv \Sigma^{-1} \otimes \bv \Sigma^{-1})\] is the expected information matrix (which will be evaluated at the MLE). Note that the paper also uses the duplication matrix (which potentially increases computational efficiency), but for simplicity I left it out, as we don’t have a huge matrix here.
In this formula, \(\bv \Delta\) contains the derivatives of the model-implied covariances with respect to the parameters (more on this below), and \(\bv \Gamma^{-1}_E\) serves as the normal-theory weight matrix evaluated at the MLE.
The asymptotic standard errors are the square roots of the diagonal elements of \(\acov(\hat{\bv \lambda}, \hat{\bv \theta})\). After we have obtained the MLEs, we can easily construct the model-implied covariance matrix \(\bv \Sigma\) and let R do the inverse and the Kronecker product (there are computational shortcuts for this example, but they are not essential for this demo). The \(\bv \Delta\) matrix contains the partial derivatives of the model-implied covariances with respect to the parameters. As an example, with \[\bv \Sigma = \begin{bmatrix}
\lambda_1^2 + \theta_1 & & & \\
\lambda_1 \lambda_2 & \lambda_2^2 + \theta_2 & & \\
\lambda_1 \lambda_3 & \lambda_2 \lambda_3 &
\lambda_3^2 + \theta_3 & \\
\lambda_1 \lambda_4 & \lambda_2 \lambda_4 &
\lambda_3 \lambda_4 & \lambda_4^2 + \theta_4 \\
\end{bmatrix},\] if we align the elements in \(\Sigma\) to be a vector, denoted as \(\vect(\Sigma)\), we have \[\vect(\Sigma) = [\lambda_1^2 + \theta_1, \lambda_1 \lambda_2, \lambda_1 \lambda_3, \cdots]', \] so \[\bv \Delta = \begin{bmatrix}
\frac{\partial (\lambda_1^2 + \theta_1)}{\partial \lambda_1} &
\frac{\partial (\lambda_1^2 + \theta_1)}{\partial \lambda_2} & \cdots \\
\frac{\partial (\lambda_1 \lambda_2)}{\partial \lambda_1} &
\frac{\partial (\lambda_1 \lambda_2)}{\partial \lambda_2} & \cdots \\
\vdots & \ddots & \vdots \\
\end{bmatrix}, \] which is a \(p^2 \times r\) matrix, where \(r\) is the number of parameters (8 in this case). This can be obtained by hand, but we can also use the numDeriv::jacobian() function:
# Using parameter estimates from optim to construct Sigma
(Sigmahat <- tcrossprod(opt_info$par[1:4]) + diag(opt_info$par[5:8]))
># [,1] [,2] [,3] [,4]
># [1,] 0.9962040 0.4723281 0.3384188 0.3924810
># [2,] 0.4723281 1.0075997 0.4340297 0.5033656
># [3,] 0.3384188 0.4340297 0.9332620 0.3606569
># [4,] 0.3924810 0.5033656 0.3606569 0.9388317
# Make it into a function, outputting vec(Sigma)
vecSigma <- function(pars) {
as.vector(tcrossprod(pars[1:4]) + diag(pars[5:8]))
}
# Numeric derivative:
(Delta <- numDeriv::jacobian(vecSigma, opt_info$par))
># [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
># [1,] 1.2137226 0.0000000 0.0000000 0.0000000 1 0 0 0
># [2,] 0.7783132 0.6068613 0.0000000 0.0000000 0 0 0 0
># [3,] 0.5576544 0.0000000 0.6068613 0.0000000 0 0 0 0
># [4,] 0.6467392 0.0000000 0.0000000 0.6068613 0 0 0 0
># [5,] 0.7783132 0.6068613 0.0000000 0.0000000 0 0 0 0
># [6,] 0.0000000 1.5566263 0.0000000 0.0000000 0 1 0 0
># [7,] 0.0000000 0.5576544 0.7783132 0.0000000 0 0 0 0
># [8,] 0.0000000 0.6467392 0.0000000 0.7783132 0 0 0 0
># [9,] 0.5576544 0.0000000 0.6068613 0.0000000 0 0 0 0
># [10,] 0.0000000 0.5576544 0.7783132 0.0000000 0 0 0 0
># [11,] 0.0000000 0.0000000 1.1153087 0.0000000 0 0 1 0
># [12,] 0.0000000 0.0000000 0.6467392 0.5576544 0 0 0 0
># [13,] 0.6467392 0.0000000 0.0000000 0.6068613 0 0 0 0
># [14,] 0.0000000 0.6467392 0.0000000 0.7783132 0 0 0 0
># [15,] 0.0000000 0.0000000 0.6467392 0.5576544 0 0 0 0
># [16,] 0.0000000 0.0000000 0.0000000 1.2934784 0 0 0 1
Now, we’re ready to compute the asymptotic standard errors:
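Here is a sketch of that computation (the names invSigmahat, exp_I, acov_exp, and ase_exp are the ones reused in the later chunks):
invSigmahat <- solve(Sigmahat)
# Expected information: Gamma^-1_E = (1/2) (Sigma^-1 %x% Sigma^-1)
exp_I <- invSigmahat %x% invSigmahat / 2
# Asymptotic covariance matrix of the estimates
acov_exp <- solve(crossprod(Delta, exp_I %*% Delta)) / N
# Asymptotic standard errors
(ase_exp <- sqrt(diag(acov_exp)))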
Compared to lavaan:
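A sketch of the comparison, mirroring the observed-information chunk below (fit1 uses lavaan’s default expected information):
cbind(optim = ase_exp,
      `lavaan::cfa ('expected')` = sqrt(diag(lavaan::vcov(fit1))))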
># optim lavaan::cfa ('expected')
># f=~y1 0.10400371 0.10400353
># f=~y2 0.10242249 0.10242227
># f=~y3 0.10139392 0.10139383
># f=~y4 0.09991077 0.09991052
># y1~~y1 0.10889211 0.10889173
># y2~~y2 0.10757581 0.10757529
># y3~~y3 0.10420294 0.10420288
># y4~~y4 0.09956021 0.09955978
So that matches pretty well!
While Maydeu-Olivares (2017) also presented the formula (in matrix form) for the asymptotic covariance estimates using the observed information, in practice such a matrix is usually obtained from the Hessian, which is typically part of the output of the optimization routine used for MLE. For example, going back to the output of the optim() call:
# Since we minimized -2 times the log-likelihood, the Hessian from optim() is
# 2 times the observed information matrix, so divide it by 2 before inverting
# (minimizing instead of maximizing already takes care of the sign flip)
acov_obs <- solve(opt_info$hessian / 2)
(ase_obs <- sqrt(diag(acov_obs)))
># [1] 0.10381614 0.10240102 0.10189012 0.09983351 0.10862886 0.10752628 0.10480306
># [8] 0.09943057
And compare to lavaan using the information = 'observed' option:
fit1_obs <- lavaan::update(fit1, information = 'observed')
cbind(optim = ase_obs,
`lavaan::cfa ('obs')` = sqrt(diag(lavaan::vcov(fit1_obs))))
># optim lavaan::cfa ('obs')
># f=~y1 0.10381614 0.10381636
># f=~y2 0.10240102 0.10240110
># f=~y3 0.10189012 0.10189033
># f=~y4 0.09983351 0.09983363
># y1~~y1 0.10862886 0.10862830
># y2~~y2 0.10752628 0.10752643
># y3~~y3 0.10480306 0.10480358
># y4~~y4 0.09943057 0.09943066
So again it’s pretty close. You can try using the formula in Maydeu-Olivares (2017) to verify the results.
# Asymptotic covariance of S (4th moment)
Wi_adf <- apply(t(y) - colMeans(y), 2, tcrossprod)
adf_I <- tcrossprod(Wi_adf - as.vector(Sy)) / N
# Sandwich estimator:
Deltat_exp_I <- crossprod(Delta, exp_I)
acov_mlm <- acov_exp %*% Deltat_exp_I %*% adf_I %*%
crossprod(Deltat_exp_I, acov_exp) * N
ase_mlm <- sqrt(diag(acov_mlm))
Compare to lavaan with estimator = 'MLM':
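A sketch of the comparison (fit1_mlm is an assumed name):
fit1_mlm <- lavaan::update(fit1, estimator = 'MLM')
cbind(optim = ase_mlm,
      `lavaan::cfa ('MLM')` = sqrt(diag(lavaan::vcov(fit1_mlm))))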
># optim lavaan::cfa ('MLM')
># f=~y1 0.12965429 0.12965371
># f=~y2 0.10522671 0.10522608
># f=~y3 0.08901649 0.08901603
># f=~y4 0.08981597 0.08981560
># y1~~y1 0.10278055 0.10278061
># y2~~y2 0.10252608 0.10252560
># y3~~y3 0.09792141 0.09792136
># y4~~y4 0.10608980 0.10608957
# MLR-type sandwich: combine the observed-information-based acov with the
# cross-product of casewise score-like contributions (without mean structure)
# First, get the deviation of each observation's cross-product from the
# model-implied covariances
Sdev <- apply(t(y) - colMeans(y), 2, tcrossprod) - as.vector(Sigmahat)
g_score <- crossprod(Sdev, invSigmahat %x% invSigmahat) / 2
xp_I <- crossprod(g_score) / N # Gamma^-1_XP
# Sandwich estimator:
acov_mlr <- acov_obs %*% crossprod(Delta, xp_I %*% Delta) %*% acov_obs * N
ase_mlr <- sqrt(diag(acov_mlr))
Compare to lavaan with estimator = 'MLR':
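A sketch of the comparison (fit1_mlr is an assumed name; the column label in the output suggests robust SEs based on the observed information from the Hessian):
fit1_mlr <- lavaan::update(fit1, estimator = 'MLR',
                           information = 'observed',
                           observed.information = 'hessian')
cbind(optim = ase_mlr,
      `lavaan::cfa ('MLR, Hessian')` = sqrt(diag(lavaan::vcov(fit1_mlr))))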
># optim lavaan::cfa ('MLR, Hessian')
># f=~y1 0.12884960 0.12885037
># f=~y2 0.10504885 0.10504891
># f=~y3 0.08977431 0.08977434
># f=~y4 0.08883634 0.08883676
># y1~~y1 0.10203109 0.10203105
># y2~~y2 0.10139778 0.10139847
># y3~~y3 0.09785923 0.09786031
># y4~~y4 0.10566978 0.10567060