Bayesian Multilevel Factor Model (Part II)

Multilevel structural equation modeling
Bayesian

Comparing different approaches for fitting multilevel factor models with random intercepts and random latent variances

Author

Mark Lai

Published

August 2, 2024

library(lavaan)
library(MplusAutomation)
library(posterior)
library(cmdstanr)
register_knitr_engine()

In Part I, I explored how to fit a traditional multilevel factor model with Stan. That model, which is the one used in probably > 90% of the literature, makes a very strong assumption that I believe is rarely discussed: it assumes a homogeneous variance of the latent variable across clusters. In Part II, I explore how that assumption can be relaxed, and what the impact is when it is ignored.

Simulate Data

Same setup as in Part I: one factor, four items; two items with random intercepts.

However, the SD of the latent variable now varies across clusters, taking evenly spaced values in the range [0.2, 1.8]. Also, we arrange the data such that clusters with a higher mean of \(\eta\) tend to have a lower SD of \(\eta\).

# Design parameters
set.seed(123)
num_clus <- 500
clus_size <- 10
icc_eta <- .10
lambda <- c(.7, .9, .7, .8)
nu <- c(0, 0.3, -0.5, 0.2)
range_dpsiw <- .80 # half-range of within-cluster latent SD
theta_nu <- c(0, 0, 0.1, 0.05) # intercept variances
theta <- c(.51, .51, .40, .40) # uniqueness
# assume one-factor model holds for now

# Simulate latent variable
clus_id <- rep(1:num_clus, each = clus_size)
etab <- rnorm(num_clus, sd = sqrt(icc_eta / (1 - icc_eta)))
psiw <- 1 + seq(-range_dpsiw, range_dpsiw, length.out = num_clus)
# Arrange etab and psiw so that they're negatively correlated
etab <- sort(etab)[c(
    sample((num_clus / 2 + 1):num_clus),
    (sample(seq_len(num_clus / 2)))
)]
etaw <- rnorm(num_clus * clus_size, sd = psiw[clus_id])
eta <- etab[clus_id] + etaw
# confirm the simulated variable behaves as expected
# lmer(eta ~ (1 | clus_id))
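
As a quick check, the rearrangement above should induce a negative association between the cluster means and the within-cluster SDs of \(\eta\):

# The rearranged etab should correlate strongly negatively with psiw
cor(etab, psiw)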

# Simulate items for each cluster
num_items <- 4
y <- lapply(1:num_clus, FUN = \(j) {
    # intercepts (plus random intercept noise; rnorm(length(nu), ...) draws
    # one value per item) + loadings times latent scores + unique factors
    yj <- nu + rnorm(length(nu), sd = sqrt(theta_nu)) +
        tcrossprod(
            lambda, # + rnorm(lambda, sd = sqrt(theta_lambda)),
            etab[j] + etaw[clus_id == j]
        ) +
        rnorm(num_items * clus_size, sd = sqrt(theta))
    t(yj)
})
dat <- do.call(rbind, y) |>
    as.data.frame() |>
    setNames(paste0("y", 1:4)) |>
    cbind(id = clus_id)

Frequentist with lavaan (assuming homogeneous variance)

mcfa_mod <- "
  level: 1
    f =~ l1 * y1 + l2 * y2 + l3 * y3 + l4 * y4
    f ~~ 1 * f
  level: 2
    f =~ l1 * y1 + l2 * y2 + l3 * y3 + l4 * y4
"
mcfa_fit <- cfa(mcfa_mod, data = dat, cluster = "id", auto.fix.first = FALSE)
Warning: lavaan->lav_object_post_check():  
   some estimated ov variances are negative
summary(mcfa_fit)
lavaan 0.6-19 ended normally after 31 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21
  Number of equality constraints                     4

  Number of observations                          5000
  Number of clusters [id]                          500

Model Test User Model:
                                                      
  Test statistic                                 0.737
  Degrees of freedom                                 7
  P-value (Chi-square)                           0.998

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian


Level 1 [within]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f =~                                                
    y1        (l1)    0.760    0.014   56.267    0.000
    y2        (l2)    0.982    0.015   64.482    0.000
    y3        (l3)    0.774    0.013   59.934    0.000
    y4        (l4)    0.867    0.014   62.671    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    f                 1.000                           
   .y1                0.512    0.013   39.022    0.000
   .y2                0.491    0.015   32.610    0.000
   .y3                0.379    0.011   35.328    0.000
   .y4                0.419    0.012   33.777    0.000


Level 2 [id]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f =~                                                
    y1        (l1)    0.760    0.014   56.267    0.000
    y2        (l2)    0.982    0.015   64.482    0.000
    y3        (l3)    0.774    0.013   59.934    0.000
    y4        (l4)    0.867    0.014   62.671    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.004    0.018    0.214    0.830
   .y2                0.294    0.022   13.531    0.000
   .y3               -0.471    0.023  -20.428    0.000
   .y4                0.186    0.021    8.660    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1               -0.007    0.004   -1.677    0.094
   .y2               -0.003    0.005   -0.649    0.516
   .y3                0.110    0.010   10.755    0.000
   .y4                0.040    0.006    6.163    0.000
    f                 0.097    0.015    6.597    0.000

From the results, the factor loadings appear inflated when the random latent variance is ignored: the estimates (0.76, 0.98, 0.77, 0.87) all exceed the data-generating values (0.7, 0.9, 0.7, 0.8). The warning about negative between-level variances is also expected here, as items 1 and 2 have zero between-level intercept variance in the data-generating model.
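
A quick side-by-side comparison (a sketch; l1-l4 are the parameter labels defined in the lavaan syntax above, which coef() uses to name the estimates):

# Compare the estimated loadings (constrained equal across levels) with the
# data-generating values
round(cbind(true = lambda, est = coef(mcfa_fit)[c("l1", "l2", "l3", "l4")]), 2)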

Bayesian using Mplus

We can fit a Bayesian model with a random latent variance using Mplus, which uses Gibbs sampling with conjugate priors. Specifically, for the latent variable \(\eta_{ij}\) of person \(i\) in cluster \(j\),

\[ \eta_{ij} \sim N(\eta^b_j, \psi^w_j), \]

and we model \((\eta^b_j, \log[\psi^w_j])\) as jointly normal

\[ \begin{pmatrix} \eta^b_j \\ \log[\psi^w_j] \end{pmatrix} \sim N\left( \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}, \begin{bmatrix} \psi^b_1 & \\ \psi^b_{12} & \psi^b_2 \end{bmatrix} \right) \tag{1}\]

Identification Constraints

Once again, we need one constraint on the mean structure and one constraint on the covariance structure. Here, we can conveniently set \(\alpha_1 = \alpha_2 = 0\) so that \(E(\eta) = 0\) and \(E[\log(\psi^w_j)] = 0\) (and when the log variance is 0, the variance is 1).

# Save data for Mplus
write.table(dat, file = "mcfa.dat", row.names = FALSE, col.names = FALSE)
writeLines("
TITLE:  Bayesian 1-factor model with random variance;
DATA:   FILE = mcfa.dat;
VARIABLE: 
        NAMES = y1-y4 id;
        USEVAR = y1-y4;
        CLUSTER = id;
ANALYSIS:
        TYPE = TWOLEVEL RANDOM;
        ESTIMATOR = BAYES;  ! This is default
        BITERATION = 500000 (10000);
        BCONVERGENCE = .005;  ! ESS of ~ 100
        CHAINS = 3;
        PROCESS = 3;
MODEL:  %WITHIN%
        fw BY y1-y4* (l1-l4);
        logpsiw | fw;
        %BETWEEN%
        fb BY y1-y4* (l1-l4);
        logpsiw with fb;  ! psi_12
        fb; logpsiw;  ! psi_1 and psi_2
        [logpsiw@0];  ! for setting the scale
        y1-y4;
OUTPUT: TECH1 TECH8;
SAVEDATA:
        BPARAMETERS = mcfa_random_var_draws.dat;
", con = "mcfa_random_var.inp")
# Run the model in Mplus (requires Mplus to be installed); record the elapsed
# time for the comparison in Table 1 below
mplus_sec <- system.time(runModels("mcfa_random_var.inp"))[["elapsed"]]
mplus_draws <- readModels("mcfa_random_var.out")$bparameters[[2]]  # post burn-in draws
# Drop the chain and iteration columns from each chain
mplus_draws <- lapply(mplus_draws, \(x) x[, -(1:2)])
mplus_draws <- posterior::as_draws_list(mplus_draws)
mplus_draws |>
    posterior::summarise_draws() |>
    knitr::kable(digits = 2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
Parameter.1_%WITHIN%:.FW.BY.Y1.(equality/label) 0.61 0.61 0.02 0.02 0.58 0.64 1.04 158.87 395.22
Parameter.2_%WITHIN%:.FW.BY.Y2.(equality/label) 0.78 0.78 0.02 0.02 0.74 0.82 1.05 148.31 243.70
Parameter.3_%WITHIN%:.FW.BY.Y3.(equality/label) 0.61 0.61 0.02 0.02 0.58 0.64 1.04 135.09 306.24
Parameter.4_%WITHIN%:.FW.BY.Y4.(equality/label) 0.69 0.69 0.02 0.02 0.66 0.73 1.04 162.75 316.70
Parameter.5_%WITHIN%:.Y1 0.50 0.50 0.01 0.01 0.48 0.52 1.00 9178.99 20794.23
Parameter.6_%WITHIN%:.Y2 0.49 0.49 0.01 0.01 0.47 0.52 1.00 5870.86 13847.52
Parameter.7_%WITHIN%:.Y3 0.38 0.38 0.01 0.01 0.36 0.40 1.00 9548.73 18329.28
Parameter.8_%WITHIN%:.Y4 0.41 0.41 0.01 0.01 0.39 0.43 1.00 7283.10 14730.57
Parameter.9_%BETWEEN%:.MEAN.Y1 0.01 0.01 0.02 0.02 -0.02 0.04 1.02 76.65 99.72
Parameter.10_%BETWEEN%:.MEAN.Y2 0.30 0.30 0.02 0.02 0.27 0.33 1.03 75.07 174.96
Parameter.11_%BETWEEN%:.MEAN.Y3 -0.46 -0.46 0.02 0.02 -0.50 -0.43 1.00 199.26 602.42
Parameter.12_%BETWEEN%:.MEAN.Y4 0.19 0.19 0.02 0.02 0.16 0.23 1.01 123.90 323.58
Parameter.13_%BETWEEN%:.Y1 0.00 0.00 0.00 0.00 0.00 0.01 1.01 90.05 152.61
Parameter.14_%BETWEEN%:.Y2 0.00 0.00 0.00 0.00 0.00 0.01 1.03 47.44 69.72
Parameter.15_%BETWEEN%:.Y3 0.11 0.11 0.01 0.01 0.09 0.13 1.00 8387.99 14736.77
Parameter.16_%BETWEEN%:.Y4 0.04 0.04 0.01 0.01 0.03 0.05 1.00 2281.47 3623.19
Parameter.17_%BETWEEN%:.FB 0.15 0.15 0.02 0.02 0.11 0.19 1.02 448.34 1637.46
Parameter.18_%BETWEEN%:.LOGPSIW.WITH.FB -0.24 -0.24 0.03 0.03 -0.30 -0.18 1.00 764.27 3103.02
Parameter.19_%BETWEEN%:.LOGPSIW 1.10 1.09 0.11 0.11 0.93 1.28 1.01 825.07 2752.46

Using Stan

Trial 1

In the first trial, I sample both \(\eta^b_j\) and \(\log(\psi^w_j)\), and use the conditional distributions:

\[ \begin{aligned} \bar{\boldsymbol{\mathbf{y}}}_j \mid \eta^b_j, \log(\psi^w_j) & \sim N(\boldsymbol{\mathbf{\nu}} + \boldsymbol{\mathbf{\Lambda}} \eta^b_j, \boldsymbol{\mathbf{\Sigma}}_{Wj} / n_j + \boldsymbol{\mathbf{\Theta}}^b) \\ \boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j \mid \eta^b_j, \log(\psi^w_j) & \sim N(\boldsymbol{\mathbf{0}}, \boldsymbol{\mathbf{\Sigma}}_{Wj}) \quad \text{for } i = 1, \ldots, n_j - 1 \end{aligned} \]
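
The Stan program below evaluates these densities from sufficient statistics via multi_normal_suff_lpdf. As a sanity check of that identity (a sketch in R; the mvtnorm package is an assumption and is not loaded above), the sufficient-statistic form should match the sum of row-wise log-densities for mean-zero data:

# Check: -0.5 * [n (log|Sigma| + p log(2 pi)) + tr(Sigma^{-1} S)] equals the
# sum of row-wise multivariate normal log-densities at mean zero
set.seed(1)
Sig <- matrix(c(1, .3, .3, 1), nrow = 2)
X <- matrix(rnorm(20), ncol = 2) %*% chol(Sig)  # 10 draws from N(0, Sig)
S <- crossprod(X)  # cross-product matrix
n <- nrow(X); p <- ncol(X)
-0.5 * (n * (c(determinant(Sig)$modulus) + p * log(2 * pi)) +
    sum(diag(solve(Sig, S))))
sum(mvtnorm::dmvnorm(X, sigma = Sig, log = TRUE))  # should match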

//
// This Stan program defines a one-factor two-level CFA model with a
// marginal likelihood approach similar to one described in
// https://ecmerkle.github.io/blavaan/articles/multilevel.html. 
//
// Define function for computing multi-normal log density using sufficient
// statistics
functions {
  real multi_normal_suff_lpdf(matrix S,
                              matrix Sigma,
                              int n) {
    return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) +
      trace(mdivide_left_spd(Sigma, S)));
  }
}
// The input data are (a) sw, the cluster-specific within-cluster
// cross-product matrices, and (b) ybar, the cluster means.
data {
  int<lower=1> p;  // number of items
  int<lower=1> J;  // number of clusters
  array[J] int<lower=1> n;  // cluster size
  array[J] matrix[p, p] sw;  // within cross-product matrices
  array[J] vector[p] ybar;  // cluster means
}
// The parameters accepted by the model:
// 'lambda_star' for unscaled loadings, 'nu' for intercepts,
// 'thetaw' and 'thetab' for uniquenesses at the within and the between
// levels, and 'sd_phi', 'L_phi', and 'z_xi' for the joint distribution
// of the latent cluster means and log SDs
parameters {
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  cholesky_factor_corr[2] L_phi;  // cholesky of correlation matrix of (logvar, mean)
  vector<lower=0>[2] sd_phi;  // standard deviations of (logvar, mean)
  matrix[2, J] z_xi;  // scaled values of (logvar, mean)
  // real<lower=0> sd_logpsiw;  // SD of log(SD) of etaw
  // real<lower=0> psib;  // between-cluster latent standard deviation
  // vector[J] z_logpsiw;  // scaled value of log(SD) of etaw
  vector[p] nu;  // item means/intercepts
}

// The model to be estimated. We compute the log-likelihood as the sum of
// two components: cluster-mean centered variables are multivariate normal
// with covariance matrix sigmawj (with a degrees-of-freedom adjustment),
// and cluster means, given the latent cluster means and log SDs, are
// multivariate normal with covariance matrix sigmawj / nj + diag(thetab^2)
model {
  matrix[p, p] sigmawj;
  matrix[2, J] xi = diag_pre_multiply(sd_phi, L_phi) * z_xi;
  matrix[p, J] mu = lambda_star * xi[2, ];
  matrix[p, p] LLt = lambda_star * lambda_star';
  // matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab));
  lambda_star ~ std_normal();
  thetaw ~ student_t(4, 0, 1);
  thetab ~ std_normal();
  to_vector(z_xi) ~ std_normal();
  sd_phi ~ student_t(4, 0, 1);
  L_phi ~ lkj_corr_cholesky(2);
  for (j in 1:J) {
    sigmawj = add_diag(exp(2 * xi[1, j]) .* LLt, square(thetaw));
    target += multi_normal_suff_lpdf(sw[j] | sigmawj, n[j] - 1);
    ybar[j] ~ multi_normal(nu + mu[, j], add_diag(sigmawj / n[j], square(thetab)));
  }
}
// Rotating the factor loadings
// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)
generated quantities {
  vector[p] lambda;  // final loading vector
  real cor_logpsiw_etab;
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
    cor_logpsiw_etab = sign_l1 * L_phi[2, 1];
  }
}
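
The sampler call below uses a compiled model object, mcfa_randvar. In the post this comes from the cmdstanr knitr engine registered at the top, which compiles the Stan chunk above automatically; compiling by hand would look something like the following (the file name is an assumption):

# Compile the Stan program above (file name assumed; the registered
# cmdstanr knitr engine does this step automatically when rendering)
mcfa_randvar <- cmdstan_model("mcfa_randvar.stan")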
# Prepare data for Stan
yc <- sweep(dat[1:4], MARGIN = 2, STATS = colMeans(dat[1:4]))  # grand-mean centered data
nj <- table(dat$id)  # cluster sizes
# Sufficient statistics: within-cluster cross-product matrices and cluster means
sswj <- tapply(yc, INDEX = dat$id,
               FUN = \(x) tcrossprod(t(x) - colMeans(x)))
ybarj <- tapply(dat[1:4], INDEX = dat$id, FUN = \(x) colMeans(x))
# Run Stan
mcfa_randvar_fit <- mcfa_randvar$sample(
    data = list(
        p = 4,
        J = num_clus,
        n = nj,
        sw = sswj,
        ybar = ybarj
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)
mcfa_randvar_fit$summary(
    c("lambda", "thetaw", "thetab", "sd_phi", "cor_logpsiw_etab")
) |>
    knitr::kable(digits = 2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lambda[1] 0.61 0.61 0.02 0.02 0.58 0.64 1.01 416.82 951.89
lambda[2] 0.78 0.78 0.02 0.02 0.75 0.82 1.01 372.45 688.70
lambda[3] 0.62 0.62 0.02 0.02 0.59 0.65 1.01 388.12 627.81
lambda[4] 0.70 0.70 0.02 0.02 0.66 0.73 1.01 411.88 716.34
thetaw[1] 0.71 0.71 0.01 0.01 0.69 0.72 1.00 3409.43 2070.68
thetaw[2] 0.70 0.70 0.01 0.01 0.69 0.72 1.00 2969.34 1962.94
thetaw[3] 0.62 0.62 0.01 0.01 0.60 0.63 1.00 2880.06 1899.37
thetaw[4] 0.64 0.64 0.01 0.01 0.63 0.66 1.00 3217.95 2394.14
thetab[1] 0.03 0.02 0.02 0.02 0.00 0.06 1.00 1792.84 1053.89
thetab[2] 0.03 0.03 0.02 0.02 0.00 0.08 1.00 1627.72 907.86
thetab[3] 0.33 0.33 0.02 0.02 0.30 0.36 1.00 3307.89 2123.94
thetab[4] 0.20 0.20 0.02 0.02 0.17 0.22 1.00 3168.89 2084.24
sd_phi[1] 0.52 0.52 0.03 0.03 0.48 0.56 1.00 633.21 1289.95
sd_phi[2] 0.38 0.38 0.03 0.03 0.33 0.42 1.00 834.20 1498.27
cor_logpsiw_etab -0.60 -0.60 0.06 0.07 -0.70 -0.49 1.01 586.29 1360.98

Trial 2

The previous approach requires conditioning on two latent variables: \(\eta^b_j\) and \(\log(\psi^w_j)\). For normal models, however, we can reduce the dimensionality by one using a trick introduced by Du Toit and Cudeck (2009). First, we decompose the covariance matrix in Equation 1 into the standard deviations and the correlation matrix (see https://mc-stan.org/docs/stan-users-guide/regression.html#multivariate-regression-example):

\[ V\begin{pmatrix} \eta^b_j \\ \log[\psi^w_j] \end{pmatrix} = \mathop{\mathrm{diag}}(\boldsymbol{\mathbf{\sigma}}_\phi) \boldsymbol{\mathbf{\Omega}}_\phi \mathop{\mathrm{diag}}(\boldsymbol{\mathbf{\sigma}}_\phi) = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}. \]

Further, for positive definite \(\boldsymbol{\mathbf{\Omega}}_\phi\), we can use the Cholesky decomposition:

\[ \boldsymbol{\mathbf{\Omega}}_\phi = \boldsymbol{\mathbf{L}}_\phi \boldsymbol{\mathbf{L}}_\phi^\top = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1 - \rho^2} \end{bmatrix} \]

Also, the conditional distribution of \(\eta^b_j \mid \psi^w_j\) is \(N(\alpha_{\mid \psi^w}, \psi^b_{\mid \psi^w})\), where

\[ \begin{aligned} \alpha_{\mid \psi^w} & = \alpha_1 + \rho \sigma_1 Z_{\log(\psi^w_j)} \\ \psi^b_{\mid \psi^w} & = \sigma^2_1 (1 - \rho^2). \end{aligned} \]

One can thus first sample \(\log(\psi^w_j)\) from a univariate normal distribution, and then use \(\alpha_{\mid \psi^w}\) and \(\psi^b_{\mid \psi^w}\) to obtain \(\boldsymbol{\mathbf{\Sigma}}^w_j\), \(\boldsymbol{\mathbf{\mu}}_j\) (the implied item means), and \(\boldsymbol{\mathbf{\Sigma}}^b\), conditional on \(\psi^w_j\), as shown in the following implementation.
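
A quick numerical check of the Cholesky factor displayed above (with an arbitrary illustrative value of \(\rho\)):

# The lower-triangular Cholesky factor of a 2 x 2 correlation matrix has
# entries L[2, 1] = rho and L[2, 2] = sqrt(1 - rho^2)
rho <- -0.6
L <- t(chol(matrix(c(1, rho, rho, 1), nrow = 2)))
c(L[2, 1], L[2, 2])
c(rho, sqrt(1 - rho^2))  # should match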

//
// This Stan program defines a one-factor two-level CFA model with a
// marginal likelihood approach similar to one described in
// https://ecmerkle.github.io/blavaan/articles/multilevel.html. 
//
// Define function for computing multi-normal log density using sufficient
// statistics
functions {
  real multi_normal_suff_lpdf(matrix S,
                              matrix Sigma,
                              int n) {
    return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) +
      trace(mdivide_left_spd(Sigma, S)));
  }
}
// The input data are (a) sw, the cluster-specific within-cluster
// cross-product matrices, and (b) ybar, the cluster means.
data {
  int<lower=1> p;  // number of items
  int<lower=1> J;  // number of clusters
  array[J] int<lower=1> n;  // cluster size
  array[J] matrix[p, p] sw;  // within cross-product matrices
  array[J] vector[p] ybar;  // cluster means
}
// The parameters accepted by the model:
// 'lambda_star' for unscaled loadings, 'nu' for intercepts,
// 'thetaw' and 'thetab' for uniquenesses at the within and the between
// levels, 'psib' for the between-cluster latent SD, and 'sd_logpsiw',
// 'z_logpsiw', and 'Lphi' for the random log SDs and their association
// with the latent means
parameters {
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  cholesky_factor_corr[2] Lphi;  // cholesky of correlation matrix of (logvar, mean)
  real<lower=0> sd_logpsiw;  // SD of log(SD) of etaw
  real<lower=0> psib;  // between-cluster latent standard deviation
  row_vector[J] z_logpsiw;  // scaled value of log(SD) of etaw
  vector[p] nu;  // item means/intercepts
}

// The model to be estimated. We compute the log-likelihood as the sum of
// two components: cluster-mean centered variables are multivariate normal
// with covariance matrix sigmaw (and degree of freedom adjustment),
// and cluster-mean variables are multivariate normal with covariance
// matrix sigmab + sigmaw / nj
model {
  matrix[p, J] mu = lambda_star * ((psib * Lphi[2, 1]) .* z_logpsiw);
  row_vector[J] psiw = exp(sd_logpsiw .* z_logpsiw);
  matrix[p, p] LLt = lambda_star * lambda_star';
  matrix[p, p] sigmab = add_diag(square(psib * Lphi[2, 2]) .* LLt, square(thetab));
  lambda_star ~ normal(0, 10);
  thetaw ~ gamma(1, .5);
  thetab ~ gamma(1, .5);
  psib ~ gamma(1, .5);
  nu ~ normal(0, 32);
  z_logpsiw ~ std_normal();
  sd_logpsiw ~ gamma(1, 0.5);
  Lphi ~ lkj_corr_cholesky(2);
  for (j in 1:J) {
    matrix[p, p] sigmawj = add_diag(square(psiw[j]) .* LLt, square(thetaw));
    target += multi_normal_suff_lpdf(sw[j] | sigmawj, n[j] - 1);
    ybar[j] ~ multi_normal(nu + mu[, j], sigmab + sigmawj / n[j]);
  }
}
// Rotating the factor loadings
// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)
generated quantities {
  vector[p] lambda;  // final loading vector
  real cor_logpsiw_etab;
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
    cor_logpsiw_etab = sign_l1 * Lphi[2, 1];
  }
}
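
As in Trial 1, the compiled model object mcfa_randvar2 is assumed to come from the Stan program above (via the cmdstanr knitr engine, or explicitly; the file name is an assumption):

# Compile the Stan program for Trial 2 (file name assumed)
mcfa_randvar2 <- cmdstan_model("mcfa_randvar2.stan")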
# Run Stan
mcfa_randvar2_fit <- mcfa_randvar2$sample(
    data = list(
        p = 4,
        J = num_clus,
        n = nj,
        sw = sswj,
        ybar = ybarj
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)
mcfa_randvar2_fit$summary(
    c("lambda", "thetaw", "thetab", "sd_logpsiw", "psib", "cor_logpsiw_etab")
) |>
    knitr::kable(digits = 2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lambda[1] 0.61 0.61 0.02 0.02 0.58 0.64 1 675.59 1318.93
lambda[2] 0.78 0.78 0.02 0.02 0.74 0.82 1 624.06 1247.15
lambda[3] 0.62 0.62 0.02 0.02 0.59 0.65 1 635.16 1158.55
lambda[4] 0.70 0.70 0.02 0.02 0.66 0.73 1 603.41 1079.80
thetaw[1] 0.71 0.71 0.01 0.01 0.70 0.72 1 5695.97 1845.04
thetaw[2] 0.70 0.70 0.01 0.01 0.69 0.72 1 4627.07 2296.33
thetaw[3] 0.62 0.62 0.01 0.01 0.60 0.63 1 4026.94 1949.65
thetaw[4] 0.64 0.64 0.01 0.01 0.63 0.66 1 4727.75 2143.58
thetab[1] 0.03 0.02 0.02 0.02 0.00 0.06 1 2165.05 1508.75
thetab[2] 0.03 0.03 0.02 0.02 0.00 0.08 1 2245.10 1644.61
thetab[3] 0.33 0.33 0.02 0.02 0.31 0.36 1 4212.69 1915.84
thetab[4] 0.20 0.20 0.02 0.02 0.17 0.22 1 4267.48 2346.73
sd_logpsiw 0.52 0.52 0.03 0.03 0.48 0.56 1 979.93 1723.84
psib 0.38 0.38 0.03 0.03 0.33 0.42 1 1672.81 2044.86
cor_logpsiw_etab -0.60 -0.60 0.06 0.06 -0.70 -0.49 1 1715.41 2175.00

The second approach is faster and appears more efficient than the Mplus approach (see Table 1). It is unclear why the estimated variability of \(\log(\psi^w_j)\) is larger with the Mplus approach; one possibility is a parameterization difference, as Mplus places the random effect on the log variance scale whereas the two Stan programs place it on the log SD scale (note exp(2 * xi[1, j]) and square(psiw[j]) in the code), and \(\log \psi^w_j = 2 \log \sqrt{\psi^w_j}\).
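
A quick arithmetic check of this conjecture, using the rounded posterior estimates reported above (0.52 for the SD of the log SD in both Stan models, and 1.10 for the variance of LOGPSIW in Mplus):

# Doubling the Stan models' SD of log(SD) should roughly reproduce the SD of
# log(variance) implied by the Mplus estimate
2 * 0.52     # 1.04
sqrt(1.10)   # approx. 1.05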

data.frame(
    model = c("Mplus (Gibbs)", "Sampling both $\\eta^b$ and $\\log(\\psi^w)$", "Sampling only $\\log(\\psi^w)$"),
    ess = c(
        posterior::summarise_draws(mplus_draws, "ess_bulk")[[2]][17],  # row 17: FB
        mcfa_randvar_fit$summary(variables = "sd_phi[2]", posterior::ess_bulk)[[2]],
        mcfa_randvar2_fit$summary(variables = "psib", posterior::ess_bulk)[[2]]
    ),
    time = c(
        mplus_sec,
        mcfa_randvar_fit$time()$total,
        mcfa_randvar2_fit$time()$total
    )
) |>
    knitr::kable(digits = 2, col.names = c("Model", "ESS (Bulk) for $\\psi^b$", "Time (seconds)"))
Table 1: Sampling efficiency of different approaches
Model ESS (Bulk) for \(\psi^b\) Time (seconds)
Mplus (Gibbs) 448.34 171.00
Sampling both \(\eta^b\) and \(\log(\psi^w)\) 834.20 229.72
Sampling only \(\log(\psi^w)\) 1672.81 170.23
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       Ubuntu 24.04.1 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2024-12-07
 pandoc   3.3 @ /usr/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package         * version  date (UTC) lib source
 abind             1.4-5    2016-07-21 [3] CRAN (R 4.2.0)
 backports         1.4.1    2021-12-13 [3] CRAN (R 4.2.0)
 boot              1.3-30   2024-02-26 [4] CRAN (R 4.3.3)
 checkmate         2.3.1    2023-12-04 [3] CRAN (R 4.3.2)
 cli               3.6.3    2024-06-21 [1] CRAN (R 4.4.1)
 cmdstanr        * 0.8.1    2024-09-01 [1] https://stan-dev.r-universe.dev (R 4.4.1)
 coda              0.19-4.1 2024-01-31 [3] CRAN (R 4.3.2)
 colorspace        2.1-0    2023-01-23 [3] CRAN (R 4.2.2)
 data.table        1.15.0   2024-01-30 [3] CRAN (R 4.3.2)
 digest            0.6.37   2024-08-19 [1] CRAN (R 4.4.1)
 distributional    0.4.0    2024-02-07 [3] CRAN (R 4.3.2)
 dplyr             1.1.4    2023-11-17 [3] CRAN (R 4.3.2)
 evaluate          0.23     2023-11-01 [3] CRAN (R 4.3.2)
 fansi             1.0.6    2023-12-08 [3] CRAN (R 4.3.2)
 fastDummies       1.7.3    2023-07-06 [3] CRAN (R 4.3.2)
 fastmap           1.2.0    2024-05-15 [1] CRAN (R 4.4.1)
 generics          0.1.3    2022-07-05 [3] CRAN (R 4.2.1)
 ggplot2           3.4.4    2023-10-12 [3] CRAN (R 4.3.1)
 glue              1.8.0    2024-09-30 [1] CRAN (R 4.4.1)
 gsubfn            0.7      2018-03-16 [3] CRAN (R 4.1.1)
 gtable            0.3.4    2023-08-21 [3] CRAN (R 4.3.1)
 htmltools         0.5.7    2023-11-03 [3] CRAN (R 4.3.2)
 htmlwidgets       1.6.4    2023-12-06 [3] CRAN (R 4.3.2)
 httr              1.4.7    2023-08-15 [3] CRAN (R 4.3.1)
 jsonlite          1.8.8    2023-12-04 [3] CRAN (R 4.3.2)
 knitr             1.45     2023-10-30 [3] CRAN (R 4.3.2)
 lattice           0.22-5   2023-10-24 [4] CRAN (R 4.3.1)
 lavaan          * 0.6-19   2024-09-26 [1] CRAN (R 4.4.1)
 lifecycle         1.0.4    2023-11-07 [3] CRAN (R 4.3.2)
 magrittr          2.0.3    2022-03-30 [3] CRAN (R 4.2.0)
 MASS              7.3-61   2024-06-13 [4] CRAN (R 4.4.1)
 matrixStats       1.2.0    2023-12-11 [3] CRAN (R 4.3.2)
 mnormt            2.1.1    2022-09-26 [3] CRAN (R 4.2.2)
 MplusAutomation * 1.1.1    2024-01-30 [3] CRAN (R 4.3.2)
 munsell           0.5.0    2018-06-12 [3] CRAN (R 4.0.1)
 pander            0.6.5    2022-03-18 [3] CRAN (R 4.3.0)
 pbivnorm          0.6.0    2015-01-23 [3] CRAN (R 4.0.2)
 pillar            1.9.0    2023-03-22 [3] CRAN (R 4.3.0)
 pkgconfig         2.0.3    2019-09-22 [3] CRAN (R 4.0.1)
 plyr              1.8.9    2023-10-02 [3] CRAN (R 4.3.1)
 posterior       * 1.6.0    2024-07-03 [1] CRAN (R 4.4.1)
 processx          3.8.3    2023-12-10 [3] CRAN (R 4.3.2)
 proto             1.0.0    2016-10-29 [3] CRAN (R 4.0.1)
 ps                1.7.6    2024-01-18 [3] CRAN (R 4.3.2)
 quadprog          1.5-8    2019-11-20 [3] CRAN (R 4.2.0)
 R6                2.5.1    2021-08-19 [1] CRAN (R 4.4.1)
 Rcpp              1.0.13-1 2024-11-02 [1] CRAN (R 4.4.1)
 rlang             1.1.4    2024-06-04 [1] CRAN (R 4.4.1)
 rmarkdown         2.25     2023-09-18 [3] CRAN (R 4.3.2)
 scales            1.3.0    2023-11-28 [3] CRAN (R 4.3.2)
 sessioninfo       1.2.2    2021-12-06 [3] CRAN (R 4.2.0)
 tensorA           0.36.2.1 2023-12-13 [3] CRAN (R 4.3.2)
 texreg            1.39.3   2023-11-10 [3] CRAN (R 4.3.2)
 tibble            3.2.1    2023-03-20 [3] CRAN (R 4.3.1)
 tidyselect        1.2.0    2022-10-10 [3] CRAN (R 4.2.2)
 utf8              1.2.4    2023-10-22 [3] CRAN (R 4.3.2)
 vctrs             0.6.5    2023-12-01 [3] CRAN (R 4.3.2)
 withr             3.0.0    2024-01-16 [3] CRAN (R 4.3.2)
 xfun              0.41     2023-11-01 [3] CRAN (R 4.3.2)
 xtable            1.8-4    2019-04-21 [3] CRAN (R 4.0.1)
 yaml              2.3.8    2023-12-11 [3] CRAN (R 4.3.2)

 [1] /home/markl/R/x86_64-pc-linux-gnu-library/4.4
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────

References

Du Toit, Stephen H. C., and Robert Cudeck. 2009. “Estimation of the Nonlinear Random Coefficient Model When Some Random Effects Are Separable.” Psychometrika 74 (1): 65–82. https://doi.org/10.1007/s11336-008-9107-7.