Bayesian Multilevel Factor Model (Part I)

Multilevel structural equation modeling
Bayesian

Comparing different approaches for fitting multilevel factor models with random intercepts

Author

Mark Lai

Published

July 31, 2024

In this note, I explore (mainly Bayesian) methods for fitting a multilevel factor model.

library(lavaan)
This is lavaan 0.6-18
lavaan is FREE software! Please report any bugs.
library(cmdstanr)
This is cmdstanr version 0.8.1
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /home/markl/.cmdstan/cmdstan-2.35.0
- CmdStan version: 2.35.0
register_knitr_engine()
library(blavaan)
Loading required package: Rcpp
This is blavaan 0.5-5
On multicore systems, we suggest use of future::plan("multicore") or
  future::plan("multisession") for faster post-MCMC computations.
future::plan("multicore", workers = 3)

Model Equation

For response vector \(\boldsymbol{\mathbf{y}}_{ij}\) of length \(p\) for person \(i\) in cluster \(j\), the multilevel factor model is

\[ \boldsymbol{\mathbf{y}}_{ij} = \boldsymbol{\mathbf{\nu }}+ \boldsymbol{\mathbf{\Lambda }}\boldsymbol{\mathbf{\eta}}_{ij} + \boldsymbol{\mathbf{\varepsilon}}^{b}_j + \boldsymbol{\mathbf{\varepsilon}}^{w}_{ij} \]

with

\[ \begin{aligned} E(\boldsymbol{\mathbf{\varepsilon}}^{b}_j) & = \boldsymbol{\mathbf{0}} \\ E(\boldsymbol{\mathbf{\varepsilon}}^{w}_{ij}) & = \boldsymbol{\mathbf{0}} \\ E(\boldsymbol{\mathbf{\eta}}_{ij}) & = \boldsymbol{\mathbf{\alpha }}\\ V(\boldsymbol{\mathbf{\varepsilon}}^{b}_j) & = \boldsymbol{\mathbf{\Theta}}^b \\ V(\boldsymbol{\mathbf{\varepsilon}}^{w}_{ij}) & = \boldsymbol{\mathbf{\Theta}}^w \\ V(\boldsymbol{\mathbf{\eta}}_{ij}) & = \boldsymbol{\mathbf{\Psi}}^w + \boldsymbol{\mathbf{\Psi}}^b \end{aligned} \]

Identification Constraints

For each component in \(\boldsymbol{\mathbf{\eta}}\), we need one constraint on the mean structure and one constraint on the covariance structure. Conventionally, for the mean structure, we either set \(E(\eta) = 0\) or \(\nu = 0\) for one item; for the covariance structure, we either set \(V(\eta) = 1\) or \(\lambda = 1\) for one item.
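For instance, the two covariance-structure conventions can be expressed in lavaan syntax roughly as follows (a sketch, not part of the analysis below; y1-y4 are placeholder item names):

# Option 1: fix the latent variance (and mean) and free all loadings
mod_std <- "
  f =~ NA * y1 + y2 + y3 + y4  # free the first loading
  f ~~ 1 * f                   # fix the latent variance to 1
  f ~ 0 * 1                    # fix the latent mean to 0
"
# Option 2: marker-variable approach
mod_marker <- "
  f =~ 1 * y1 + y2 + y3 + y4   # fix the first loading to 1
  y1 ~ 0 * 1                   # fix the first intercept to 0
  f ~ 1                        # free the latent mean
"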

Assumptions

Here we assume cross-level loading invariance (see this paper and this paper, for example), such that there is only one loading matrix \(\boldsymbol{\mathbf{\Lambda}}\) that applies to both levels, so that the latent variable has both a within-level and a between-level component:

\[ \boldsymbol{\mathbf{\eta}}_{ij} = \boldsymbol{\mathbf{\eta}}^b_j + \boldsymbol{\mathbf{\eta}}^{w}_{ij} \]

However, the loadings can still be randomly varying across clusters, which will be a discussion for Part II.

We also assume that \(\boldsymbol{\mathbf{\varepsilon}}^{b}_j\) is i.i.d. across clusters and that \(\boldsymbol{\mathbf{\varepsilon}}^{w}_{ij}\) is i.i.d. across individuals.

Conditional Independence

In Bayesian estimation we try to find conditional independence in the data so that the estimation can be simplified. Here, assuming multivariate normality of \(\boldsymbol{\mathbf{\varepsilon}}^{b}\) and \(\boldsymbol{\mathbf{\varepsilon}}^{w}\), we have independent observations when conditioned on the level-2 parameters:

\[ \boldsymbol{\mathbf{y}}_{ij} \mid \boldsymbol{\mathbf{\eta}}^b_j, \boldsymbol{\mathbf{\varepsilon}}^{b}_j \sim N(\boldsymbol{\mathbf{\nu }}+ \boldsymbol{\mathbf{\varepsilon}}^{b}_j + \boldsymbol{\mathbf{\Lambda }}[\boldsymbol{\mathbf{\alpha }}+ \boldsymbol{\mathbf{\eta}}^b_j], \boldsymbol{\mathbf{\Sigma}}^w), \]

where \(\boldsymbol{\mathbf{\Sigma}}^w = \boldsymbol{\mathbf{\Lambda }}\boldsymbol{\mathbf{\Psi}}^w \boldsymbol{\mathbf{\Lambda}}^\top + \boldsymbol{\mathbf{\Theta}}^w\). Note that the within-cluster covariance matrix is assumed constant across clusters.
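To make this concrete, the implied \(\boldsymbol{\mathbf{\Sigma}}^w\) can be computed directly in R; a small sketch using the population values from the simulation below:

# Implied within-cluster covariance: Sigma^w = Lambda Psi^w Lambda' + Theta^w
lambda <- c(.7, .9, .7, .8)      # factor loadings
psiw <- 1                        # within-level latent variance
thetaw <- c(.51, .51, .40, .40)  # within-level uniquenesses
sigmaw <- tcrossprod(lambda) * psiw + diag(thetaw)
sigmaw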

Simulate Data

Setup: one factor, four items; two items with random intercepts, and one item with a random loading

# Design parameters
set.seed(123)
num_clus <- 100
clus_size <- 10
icc_eta <- .10
lambda <- c(.7, .9, .7, .8)
nu <- c(0, 0.3, -0.5, 0.2)
theta_lambda <- c(0, 0, 0, 0.08) # loading variances
theta_nu <- c(0, 0, 0.1, 0.05) # intercept variances
theta <- c(.51, .51, .40, .40) # uniqueness
# assume one-factor model holds for now

# Simulate latent variable
clus_id <- rep(1:num_clus, each = clus_size)
etab <- rnorm(num_clus, sd = sqrt(icc_eta / (1 - icc_eta)))
etaw <- rnorm(num_clus * clus_size)
eta <- etab[clus_id] + etaw
# confirm the simulated variable behaves as expected
# lmer(eta ~ (1 | clus_id))
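To confirm that the simulated latent variable has the intended ICC, one can run the commented lmer() check above; a sketch (assuming the lme4 package is installed):

library(lme4)
# Random-intercept model for the simulated latent variable
fit_eta <- lmer(eta ~ (1 | clus_id))
vc <- as.data.frame(VarCorr(fit_eta))
# Estimated ICC; should be close to icc_eta = .10
vc$vcov[1] / sum(vc$vcov)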

# Simulate items for each cluster
num_items <- 4
y <- lapply(1:num_clus, FUN = \(j) {
    # p x n_j matrix for cluster j: intercepts + loadings x latent scores + error
    # (rnorm() with a length-p first argument draws p values)
    yj <- nu + rnorm(nu, sd = sqrt(theta_nu)) +  # cluster-specific intercepts
        tcrossprod(
            lambda + rnorm(lambda, sd = sqrt(theta_lambda)),  # cluster-specific loadings
            etab[j] + etaw[clus_id == j]
        ) +
        rnorm(num_items * clus_size, sd = sqrt(theta))  # theta recycles down the p rows
    t(yj)  # transpose to n_j x p
})
dat <- do.call(rbind, y) |>
    as.data.frame() |>
    setNames(paste0("y", 1:4)) |>
    cbind(id = clus_id)

Frequentist with lavaan

mcfa_mod <- "
  level: 1
    f =~ l1 * y1 + l2 * y2 + l3 * y3 + l4 * y4
    f ~~ 1 * f
  level: 2
    f =~ l1 * y1 + l2 * y2 + l3 * y3 + l4 * y4
"
mcfa_fit <- cfa(mcfa_mod, data = dat, cluster = "id", auto.fix.first = FALSE)
Warning: lavaan->lav_object_post_check():  
   some estimated ov variances are negative
summary(mcfa_fit)
lavaan 0.6-18 ended normally after 25 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21
  Number of equality constraints                     4

  Number of observations                          1000
  Number of clusters [id]                          100

Model Test User Model:
                                                      
  Test statistic                                13.531
  Degrees of freedom                                 7
  P-value (Chi-square)                           0.060

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian


Level 1 [within]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f =~                                                
    y1        (l1)    0.685    0.030   22.669    0.000
    y2        (l2)    0.901    0.034   26.834    0.000
    y3        (l3)    0.711    0.029   24.260    0.000
    y4        (l4)    0.797    0.032   24.981    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    f                 1.000                           
   .y1                0.507    0.029   17.291    0.000
   .y2                0.462    0.034   13.728    0.000
   .y3                0.417    0.026   16.021    0.000
   .y4                0.484    0.031   15.681    0.000


Level 2 [id]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f =~                                                
    y1        (l1)    0.685    0.030   22.669    0.000
    y2        (l2)    0.901    0.034   26.834    0.000
    y3        (l3)    0.711    0.029   24.260    0.000
    y4        (l4)    0.797    0.032   24.981    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.019    0.035    0.530    0.596
   .y2                0.351    0.041    8.639    0.000
   .y3               -0.457    0.044  -10.380    0.000
   .y4                0.266    0.046    5.747    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.003    0.010    0.325    0.745
   .y2               -0.003    0.012   -0.257    0.798
   .y3                0.076    0.019    4.022    0.000
   .y4                0.069    0.019    3.595    0.000
    f                 0.051    0.027    1.893    0.058

Bayesian with blavaan

To my knowledge, the current version of blavaan does not support cross-level constraints.

mcfa_bfit <- bcfa(mcfa_mod, data = dat, cluster = "id", auto.fix.first = FALSE)
blavaan NOTE: two-level models are new, please report bugs!
https://github.com/ecmerkle/blavaan/issues
Warning: blavaan WARNING: If you are manually fixing lv variances to 1 for
identification, please use std.lv=TRUE.
[1] "Error in matattr(fr, es, constrain, mat = \"Lambda_y\", Ng, opts$std.lv,  : \n  blavaan ERROR: cross-matrix equality constraints not supported.\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in matattr(fr, es, constrain, mat = "Lambda_y", Ng, opts$std.lv,     tmpwig): blavaan ERROR: cross-matrix equality constraints not supported.>
Error in blavaan(mcfa_mod, data = dat, cluster = "id", auto.fix.first = TRUE, : blavaan ERROR: problem with translation from lavaan to MCMC syntax.

It can fit a model without cross-level invariance:

mcfa_uncon_mod <- "
  level: 1
    fw =~ y1 + y2 + y3 + y4
  level: 2
    fb =~ y1 + y2 + y3 + y4
"
mcfa_uncon_bfit <- bcfa(mcfa_uncon_mod, data = dat, cluster = "id",
                        bcontrol = list(cores = 3))
blavaan NOTE: two-level models are new, please report bugs!
https://github.com/ecmerkle/blavaan/issues
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
Computing post-estimation metrics (including lvs if requested)...
summary(mcfa_uncon_bfit)
blavaan 0.5.5 ended normally after 1000 iterations

  Estimator                                      BAYES
  Optimization method                             MCMC
  Number of model parameters                        20

  Number of observations                          1000
  Number of clusters [id]                          100

  Statistic                                 MargLogLik         PPP
  Value                                      -5249.174       0.173

Parameter Estimates:



Level 1 [within]:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  fw =~                                                                        
    y1                1.000                                                    
    y2                1.320    0.068    1.199    1.460    1.001    normal(0,10)
    y3                1.038    0.053    0.939    1.155    1.001    normal(0,10)
    y4                1.161    0.061    1.045    1.284    1.002    normal(0,10)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.000                                                    
   .y2                0.000                                                    
   .y3                0.000                                                    
   .y4                0.000                                                    
    fw                0.000                                                    

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.508    0.029    0.453    0.571    1.000 gamma(1,.5)[sd]
   .y2                0.457    0.033    0.396    0.524    0.999 gamma(1,.5)[sd]
   .y3                0.419    0.026    0.367    0.472    1.002 gamma(1,.5)[sd]
   .y4                0.487    0.031    0.431    0.550    1.000 gamma(1,.5)[sd]
    fw                0.491    0.044    0.408    0.583    1.001 gamma(1,.5)[sd]


Level 2 [id]:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  fb =~                                                                        
    y1                1.000                                                    
    y2                0.407    6.260  -12.891   12.845    1.004    normal(0,10)
    y3                1.727    7.628  -14.137   17.502    1.004    normal(0,10)
    y4                0.567    9.717  -19.056   19.071    1.008    normal(0,10)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.019    0.032   -0.044    0.081    1.003    normal(0,32)
   .y2                0.352    0.038    0.278    0.424    1.002    normal(0,32)
   .y3               -0.457    0.041   -0.538   -0.379    1.004    normal(0,32)
   .y4                0.265    0.043    0.181    0.350    1.004    normal(0,32)
    fb                0.000                                                    

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.007    0.007    0.000    0.025    1.006 gamma(1,.5)[sd]
   .y2                0.006    0.007    0.000    0.024    0.999 gamma(1,.5)[sd]
   .y3                0.065    0.028    0.004    0.117    1.012 gamma(1,.5)[sd]
   .y4                0.057    0.029    0.002    0.113    1.006 gamma(1,.5)[sd]
    fb                0.001    0.002    0.000    0.007    1.003 gamma(1,.5)[sd]

It takes about 32.568 seconds to estimate the model.

Using Stan

Columnwise approach

Here’s the code I used to fit a multilevel factor model many years ago. It assumes that \(\boldsymbol{\mathbf{\Theta}}^w\) is diagonal (i.e., local independence), so that each item can be treated as an independent normal variable after conditioning on the parameters. It does, however, require sampling the latent variables: \(\eta^b\), \(\eta^w\), and \(\varepsilon^b\).

Listing 1: Stan code for the columnwise approach
//
// This Stan program defines a one-factor two-level CFA model with parameter 
// expansion implemented in Merkle & Wang (2018). 
//

// The input data is a matrix 'y' of 'N' rows and 'p' columns.
data {
  int<lower=0> N;  // number of observations
  int<lower=0> p;  // number of items
  int<lower=0> J;  // number of clusters
  array[N] int<lower=1, upper=J> jj;  // cluster membership indicator
  array[p] vector[N] Y;  // item responses
}

// The parameters accepted by the model. Our model
// accepts these parameters: 
// 'lambda_star' for unscaled loadings, 'nu' for intercepts, 
// 'thetaw' and 'thetab' for uniqueness at the within and the between levels
// 'etaw_star' and 'z_etab_star' for the standardized within-cluster and 
// between-cluster latent factor scores
// 'z_epsb' for standardized between-cluster unique factor scores, and 
// 'psib' for latent factor SD
parameters {
  // vector[p] lambdaw_star;
  // vector[p] lambdab_star;
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector[p] nu;  // intercept vector
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  vector[N] etaw_star;  // within-cluster latent scores
  vector[J] z_etab_star;  // unscaled between-cluster latent scores
  real<lower=0> psib;  // between-cluster latent standard deviation
  matrix[J, p] z_epsb;  // unscaled between-cluster unique factor score matrix
}

// The model to be estimated. We model each column in the output
// 'y' to be normally distributed with mean 
// 'nu + lambda_star * etaw_star + Yb' and standard deviation 'thetaw', where
// 'Yb' = 'etab_star * lambda_star + epsb', and 'epsb' = z_epsb * thetab'
model {
  // lambdaw_star ~ std_normal();
  // lambdab_star ~ std_normal();
  lambda_star ~ std_normal();
  etaw_star ~ std_normal(); 
  z_etab_star ~ std_normal(); 
  thetaw ~ student_t(4, 0, 1);
  thetab ~ std_normal();
  psib ~ std_normal();
  to_vector(z_epsb) ~ std_normal(); 
  for (i in 1:p)
    {
      // 'Yb = etab_star * lambda_star + epsb'
      vector[J] Yb = z_etab_star * psib * lambda_star[i] + 
                     col(z_epsb, i) * thetab[i];
      // 'Y ~ N(nu + lambda_star * etaw_star + Yb, thetaw)'
      Y[i] ~ normal(nu[i] + etaw_star * lambda_star[i] + Yb[jj], 
                    thetaw[i]);
    }
}

generated quantities {
  vector[p] lambda;  // final loading vector
  vector[N] etaw;  // final within-cluster factor score
  vector[J] etab;  // final between-cluster factor score
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
    etaw = sign_l1 * etaw_star;
    etab = sign_l1 * z_etab_star * psib;
  }
}
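Because register_knitr_engine() was called above, the Stan listing compiles automatically within the document and is exposed as the object mcfa_col (presumably via the chunk's output.var option). Outside a knitr document, the equivalent manual step would be something like the following (the file name is an assumption):

# Compile the Stan program to a CmdStanModel object
mcfa_col <- cmdstanr::cmdstan_model("mcfa_col.stan")

The same applies to the later listings, which are exposed as mcfa_suff_eq and mcfa_suffj.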
mcfa_col_fit <- mcfa_col$sample(
    data = list(
        N = nrow(dat),
        p = 4,
        Y = t(dat[1:4]),
        J = length(unique(dat$id)),
        jj = dat$id
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)
Running MCMC with 3 parallel chains...

Chain 1 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 1 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 3 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 3 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 2 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 1 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 1 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 1 finished in 20.6 seconds.
Chain 3 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 2 finished in 24.6 seconds.
Chain 3 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 3 finished in 29.7 seconds.

All 3 chains finished successfully.
Mean chain execution time: 24.9 seconds.
Total execution time: 29.9 seconds.
mcfa_col_fit$summary(c("lambda", "thetaw", "thetab", "psib")) |>
    knitr::kable(digits = 2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lambda[1] 0.69 0.69 0.03 0.03 0.64 0.74 1.00 999.44 1628.74
lambda[2] 0.90 0.90 0.03 0.03 0.85 0.96 1.01 531.93 1299.32
lambda[3] 0.71 0.71 0.03 0.03 0.66 0.76 1.01 908.96 1675.79
lambda[4] 0.80 0.80 0.03 0.03 0.75 0.86 1.00 821.88 1468.70
thetaw[1] 0.71 0.71 0.02 0.02 0.68 0.74 1.00 1662.40 1910.28
thetaw[2] 0.68 0.68 0.02 0.03 0.64 0.72 1.00 1077.08 2154.01
thetaw[3] 0.65 0.65 0.02 0.02 0.61 0.68 1.00 2099.57 1898.13
thetaw[4] 0.70 0.70 0.02 0.02 0.66 0.73 1.00 1601.97 2288.34
thetab[1] 0.07 0.07 0.04 0.05 0.01 0.15 1.00 437.48 926.96
thetab[2] 0.06 0.06 0.04 0.05 0.01 0.14 1.00 601.17 1007.60
thetab[3] 0.28 0.28 0.04 0.04 0.22 0.34 1.00 1184.81 1481.16
thetab[4] 0.27 0.27 0.04 0.04 0.21 0.33 1.00 1173.91 1415.87
psib 0.21 0.21 0.07 0.07 0.06 0.31 1.02 101.08 137.19

This is still relatively quick, but it won’t work when \(\boldsymbol{\mathbf{\Theta}}^w\) is not diagonal.

Marginal likelihood using sufficient statistics

From sources such as this, this, and this, \(-2\) times the log-likelihood for cluster \(j\) is, up to a constant,

\[ \begin{aligned} -2 \ell_j & = n_j p \log (2 \pi) + (n_j - 1) \log | \boldsymbol{\mathbf{\Sigma}}^w_j | + \sum_{i = 1}^{n_j}(\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j)^\top {\boldsymbol{\mathbf{\Sigma}}^w_j}^{-1} (\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j) \\ & \quad + \log | n_j \boldsymbol{\mathbf{\Sigma}}^b + \boldsymbol{\mathbf{\Sigma}}^w_j | + n_j (\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}})^\top (n_j \boldsymbol{\mathbf{\Sigma}}^b + \boldsymbol{\mathbf{\Sigma}}^w_j)^{-1} (\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}}) \\ & \quad + n_j (\bar{\boldsymbol{\mathbf{y}}} - \boldsymbol{\mathbf{\mu}})^\top (n_j \boldsymbol{\mathbf{\Sigma}}^b + \boldsymbol{\mathbf{\Sigma}}^w_j)^{-1} (\bar{\boldsymbol{\mathbf{y}}} - \boldsymbol{\mathbf{\mu}}) \end{aligned} \tag{1}\]

This shows that the likelihood function has two multivariate normal components: the within-cluster deviations \((\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j)\) are multivariate normal with mean \(\boldsymbol{\mathbf{0}}\) and covariance \(\boldsymbol{\mathbf{\Sigma}}^w_j\), with an effective sample size of \(n_j - 1\), and the scaled cluster-mean deviations \(\sqrt{n_j}(\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}})\) are multivariate normal with mean \(\boldsymbol{\mathbf{0}}\) and covariance \(n_j \boldsymbol{\mathbf{\Sigma}}^b + \boldsymbol{\mathbf{\Sigma}}^w_j\). (The cross term between \(\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}}\) and \(\bar{\boldsymbol{\mathbf{y}}} - \boldsymbol{\mathbf{\mu}}\) vanishes when summed across clusters in a balanced design.)

As \(\boldsymbol{\mathbf{a}}^\top \boldsymbol{\mathbf{B}} \boldsymbol{\mathbf{a}}\) is a scalar and can be written as \(\mathop{\mathrm{Tr}}(\boldsymbol{\mathbf{B}} \boldsymbol{\mathbf{a}} \boldsymbol{\mathbf{a}}^\top)\), the sufficient statistics in the above likelihood function are the cluster-specific sample cross-product matrices, \(\boldsymbol{\mathbf{S}}^w_j = \sum_{i = 1}^{n_j} (\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j) (\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j)^\top\), and the sample cluster means, \(\bar{\boldsymbol{\mathbf{y}}}_j\).
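A quick numerical check of the trace identity (toy numbers, not part of the analysis):

# a' B a equals tr(B a a')
a <- c(1, 2)
B <- matrix(c(2, .5, .5, 1), nrow = 2)
drop(t(a) %*% B %*% a)          # quadratic form
sum(diag(B %*% tcrossprod(a)))  # trace form; same value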

Tip

When the within-cluster covariance is assumed homogeneous (\(\boldsymbol{\mathbf{\Sigma}}^w_j = \boldsymbol{\mathbf{\Sigma}}^w\) for all \(j\)), the within part of the likelihood function will depend only on the pooled within-sample cross-product matrix,

\[ \boldsymbol{\mathbf{S}}^w_\text{pooled} = \sum_{j = 1}^J \boldsymbol{\mathbf{S}}^w_j. \]

Tip

When the design is balanced such that the cluster sizes are all equal (\(n_j = n\)), in addition to the homogeneity-of-covariance condition, the between part of the likelihood function will depend only on the between-sample cross-product matrix,

\[ \boldsymbol{\mathbf{S}}^b = \sum_{j = 1}^J n (\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}}) (\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}})^\top. \]
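Both shortcuts are easy to verify numerically; here is a sketch with a small toy dataset (all objects hypothetical):

# Toy data: 3 clusters of size 5, 2 variables
set.seed(1)
g <- rep(1:3, each = 5)
ytoy <- matrix(rnorm(30), ncol = 2)
# Pooled within cross-product matrix, computed two ways
devs <- apply(ytoy, 2, \(x) x - ave(x, g))
sw_pooled <- crossprod(devs)
sw_j <- lapply(split.data.frame(ytoy, g), \(x) crossprod(scale(x, scale = FALSE)))
all.equal(sw_pooled, Reduce(`+`, sw_j))  # TRUE
# Between cross-product matrix: sum_j n * (ybar_j - ybar)(ybar_j - ybar)'
yc_toy <- scale(ytoy, scale = FALSE)
sb <- crossprod(apply(yc_toy, 2, ave, g))
sb2 <- Reduce(`+`, lapply(split.data.frame(yc_toy, g),
                          \(x) nrow(x) * tcrossprod(colMeans(x))))
all.equal(sb, sb2)  # TRUE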

The marginal likelihood approach is also used in blavaan. Below, I explore coding the model in Stan. Note that I ignore the mean structure, since it is saturated in the factor model (as in single-level analysis), such that the last term in Equation 1 is 0.

Balanced design, pooled matrices

//
// This Stan program defines a one-factor two-level CFA model with a
// marginal likelihood approach similar to one described in
// https://ecmerkle.github.io/blavaan/articles/multilevel.html. 
//
// Define function for computing multi-normal log density using sufficient
// statistics
functions {
  real multi_normal_suff_lpdf(matrix S,
                              matrix Sigma,
                              int n) {
    return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) +
      trace(mdivide_left_spd(Sigma, S)));
  }
}
// The input data are (a) sw, pooled within-cluster cross-product matrix,
// and (b) sb, cluster-specific outer-product of cluster means from the 
// grand means.
data {
  int<lower=1> p;  // number of items
  int<lower=1> J;  // number of clusters
  int<lower=1> n;  // cluster size
  matrix[p, p] sw;  // pooled within cross-product matrix
  matrix[p, p] sb;  // between cross-product matrix of cluster mean
}
// Compute within degrees of freedom = N - J
transformed data {
  int dfw = J * (n - 1);
}
// The parameters accepted by the model. Our model
// accepts four parameters:
// 'lambda_star' for unscaled loadings,
// 'thetaw' and 'thetab' for uniqueness at the within and the between levels,
// and 'psib' for latent factor SD
parameters {
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  real<lower=0> psib;  // between-cluster latent standard deviation
}

// The model to be estimated. We compute the log-likelihood as the sum of
// two components: cluster-mean centered variables are multivariate normal
// with covariance matrix sigmaw (and degree of freedom adjustment),
// and cluster-mean variables are multivariate normal with covariance
// matrix sigmab + sigmaw / nj
model {
  matrix[p, p] LLt = lambda_star * lambda_star';
  matrix[p, p] sigmaw = add_diag(LLt, square(thetaw));
  matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab));
  lambda_star ~ normal(0, 10);
  thetaw ~ gamma(1, .5);
  thetab ~ gamma(1, .5);
  psib ~ gamma(1, .5);
  target += multi_normal_suff_lpdf(sw | sigmaw, dfw);
  target += multi_normal_suff_lpdf(sb | n * sigmab + sigmaw, J);
}
// Rotating the factor loadings
// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)
generated quantities {
  vector[p] lambda;  // final loading vector
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
  }
}
# Prepare data for Stan
yc <- sweep(dat[1:4], MARGIN = 2, STATS = colMeans(dat[1:4])) # centered data
n <- table(dat$id)[1]
# Cross-product matrices
ssw <- crossprod(apply(yc, MARGIN = 2, FUN = \(x) x - ave(x, dat$id)))
ssb <- crossprod(apply(yc, MARGIN = 2, FUN = ave, dat$id))
# Run Stan
mcfa_suff_eq_fit <- mcfa_suff_eq$sample(
    data = list(
        p = 4,
        J = num_clus,
        n = n,
        sw = ssw,
        sb = ssb
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)
Running MCMC with 3 parallel chains...

Chain 1 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 1 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 2 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 2 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 3 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 3 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 1 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 3 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 3 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 1 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 1 finished in 0.4 seconds.

All 3 chains finished successfully.
Mean chain execution time: 0.3 seconds.
Total execution time: 0.4 seconds.
mcfa_suff_eq_fit$summary(c("lambda", "thetaw", "thetab", "psib")) |>
    knitr::kable(digits = 2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lambda[1] 0.69 0.69 0.03 0.03 0.64 0.74 1 2115.03 1824.81
lambda[2] 0.91 0.91 0.03 0.03 0.85 0.96 1 2112.61 2370.28
lambda[3] 0.71 0.71 0.03 0.03 0.67 0.76 1 2532.31 2363.83
lambda[4] 0.80 0.80 0.03 0.03 0.75 0.85 1 2024.79 1859.57
thetaw[1] 0.71 0.71 0.02 0.02 0.68 0.75 1 2896.23 2004.86
thetaw[2] 0.68 0.68 0.02 0.02 0.64 0.72 1 3073.11 2277.23
thetaw[3] 0.65 0.65 0.02 0.02 0.61 0.68 1 3117.24 2193.45
thetaw[4] 0.70 0.70 0.02 0.02 0.66 0.73 1 3421.58 2427.64
thetab[1] 0.07 0.06 0.04 0.05 0.01 0.14 1 1405.63 879.04
thetab[2] 0.06 0.06 0.04 0.05 0.01 0.14 1 1523.94 998.94
thetab[3] 0.28 0.28 0.03 0.03 0.22 0.33 1 4263.74 1585.69
thetab[4] 0.27 0.27 0.04 0.04 0.21 0.33 1 3161.41 1439.60
psib 0.20 0.21 0.08 0.07 0.06 0.31 1 1017.21 492.93

Using cluster-specific matrices

//
// This Stan program defines a one-factor two-level CFA model with a
// marginal likelihood approach similar to one described in
// https://ecmerkle.github.io/blavaan/articles/multilevel.html. The approach
// allows for unequal cluster sizes, but assumes equal loadings and latent
// variances across clusters, and that every observation has complete data
// on all items.
//
// Define function for computing multi-normal log density using sufficient
// statistics
functions {
  real multi_normal_suff_lpdf(matrix S,
                              matrix Sigma,
                              int n) {
    return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) +
      trace(mdivide_left_spd(Sigma, S)));
  }
}
// The input data are (a) sw, cluster-specific within-cluster cross-product
// matrix, and (b) ybar, cluster-specific cluster means of the items.
data {
  int<lower=1> p;  // number of items
  int<lower=1> J;  // number of clusters
  array[J] int<lower=1> n;  // cluster size
  array[J] matrix[p, p] sw;  // within cross-product matrices
  array[J] vector[p] ybar;  // cluster means
}
// The parameters accepted by the model. Our model
// accepts five parameters:
// 'lambda_star' for unscaled loadings, 'nu' for intercepts,
// 'thetaw' and 'thetab' for uniqueness at the within and the between levels,
// and 'psib' for latent factor SD
parameters {
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  real<lower=0> psib;  // between-cluster latent standard deviation
  vector[p] nu;  // item means/intercepts
}

// The model to be estimated. We compute the log-likelihood as the sum of
// two components: cluster-mean centered variables are multivariate normal
// with covariance matrix sigmaw (and degree of freedom adjustment),
// and cluster-mean variables are multivariate normal with covariance
// matrix sigmab + sigmaw / nj
model {
  matrix[p, p] LLt = lambda_star * lambda_star';
  matrix[p, p] sigmaw = add_diag(LLt, square(thetaw));
  matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab));
  lambda_star ~ normal(0, 10);
  thetaw ~ gamma(1, .5);
  thetab ~ gamma(1, .5);
  psib ~ gamma(1, .5);
  nu ~ normal(0, 32);
  for (j in 1:J) {
    target += multi_normal_suff_lpdf(sw[j] | sigmaw, n[j] - 1);
    ybar[j] ~ multi_normal(nu, sigmab + sigmaw / n[j]);
  }
}
// Rotating the factor loadings
// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)
generated quantities {
  vector[p] lambda;  // final loading vector
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
  }
}
# Prepare data for Stan
nj <- table(dat$id)
# Cluster-specific cross-product matrices
sswj <- tapply(dat[1:4], INDEX = dat$id,
               FUN = \(x) tcrossprod(t(x) - colMeans(x)))
ybarj <- tapply(dat[1:4], INDEX = dat$id, FUN = \(x) colMeans(x))
# ssbj <- tapply(yc, INDEX = dat$id,
#                FUN = \(x) nrow(x) * tcrossprod(colMeans(x)))
# Run Stan
mcfa_suffj_fit <- mcfa_suffj$sample(
    data = list(
        p = 4,
        J = num_clus,
        n = nj,
        sw = sswj,
        ybar = ybarj
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)
Running MCMC with 3 parallel chains...

Chain 1 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 3 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 2 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 3 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 1 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 1 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 2 finished in 9.5 seconds.
Chain 3 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 3 finished in 9.6 seconds.
Chain 1 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 1 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 1 finished in 12.7 seconds.

All 3 chains finished successfully.
Mean chain execution time: 10.6 seconds.
Total execution time: 12.8 seconds.
mcfa_suffj_fit$summary(c("lambda", "thetaw", "thetab", "psib", "nu")) |>
    knitr::kable(digits = 2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lambda[1] 0.69 0.69 0.03 0.03 0.64 0.74 1 2389.14 2410.00
lambda[2] 0.90 0.90 0.03 0.03 0.85 0.96 1 2533.68 2465.12
lambda[3] 0.71 0.71 0.03 0.03 0.67 0.76 1 2537.37 1888.83
lambda[4] 0.80 0.80 0.03 0.03 0.75 0.85 1 2589.05 2194.45
thetaw[1] 0.71 0.71 0.02 0.02 0.68 0.74 1 3178.38 1778.06
thetaw[2] 0.68 0.68 0.02 0.02 0.64 0.72 1 3006.17 2181.10
thetaw[3] 0.65 0.65 0.02 0.02 0.62 0.68 1 3638.43 2348.81
thetaw[4] 0.70 0.70 0.02 0.02 0.66 0.73 1 3676.69 2328.54
thetab[1] 0.07 0.06 0.04 0.05 0.01 0.15 1 1581.35 995.32
thetab[2] 0.06 0.06 0.04 0.05 0.00 0.14 1 1348.69 792.04
thetab[3] 0.28 0.28 0.04 0.04 0.22 0.34 1 3844.85 1633.48
thetab[4] 0.27 0.27 0.04 0.04 0.21 0.34 1 2751.53 1579.56
psib 0.21 0.22 0.07 0.07 0.10 0.32 1 2068.06 990.22
nu[1] 0.02 0.02 0.04 0.04 -0.04 0.08 1 1769.28 2057.31
nu[2] 0.35 0.35 0.04 0.04 0.29 0.42 1 1671.45 1916.33
nu[3] -0.45 -0.45 0.05 0.04 -0.53 -0.38 1 2071.82 1845.36
nu[4] 0.27 0.27 0.05 0.05 0.19 0.35 1 1846.74 2067.26

Note that not only is the sampling faster, but the effective sample sizes are also higher, so the sufficient-statistics approach is much more efficient.

data.frame(
    model = c("blavaan", "columnwise", "sufficient statistics (balanced)", "sufficient statistics (cluster-specific)"),
    ess = c(
        blavInspect(mcfa_uncon_bfit, "draws") |>
            posterior::summarise_draws("ess_bulk") |>
            base::`[[`(2) |> base::`[`(16),
        mcfa_col_fit$summary(variables = "psib", posterior::ess_bulk)[[2]],
        mcfa_suff_eq_fit$summary(variables = "psib", posterior::ess_bulk)[[2]],
        mcfa_suffj_fit$summary(variables = "psib", posterior::ess_bulk)[[2]]
    ),
    time = c(
        mcfa_uncon_bfit@timing$Estimate,
        mcfa_col_fit$time()$total,
        mcfa_suff_eq_fit$time()$total,
        mcfa_suffj_fit$time()$total
    )
) |>
    knitr::kable(digits = 2, col.names = c("Model", "ESS (Bulk)", "Time (seconds)"))
Table 1: Sampling efficiency of different approaches
Model ESS (Bulk) Time (seconds)
blavaan 289.90 32.57
columnwise 101.08 29.88
sufficient statistics (balanced) 1017.21 0.42
sufficient statistics (cluster-specific) 2068.06 12.79