Bayesian Multilevel Factor Model (Part I)

Comparing different approaches for fitting multilevel factor models with random intercepts

Author

Mark Lai

Published

July 31, 2024

In this note, I explore (mainly Bayesian) methods for fitting a multilevel factor model.

library(lavaan)

This is lavaan 0.6-18
lavaan is FREE software! Please report any bugs.

library(cmdstanr)

This is cmdstanr version 0.8.1

- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr

- CmdStan path: /home/markl/.cmdstan/cmdstan-2.35.0

- CmdStan version: 2.35.0

register_knitr_engine()
library(blavaan)

Loading required package: Rcpp

This is blavaan 0.5-5

On multicore systems, we suggest use of future::plan("multicore") or
  future::plan("multisession") for faster post-MCMC computations.

future::plan("multicore", workers = 3)

Model Equation

For response vector $y_{i j}$ of length $p$ for person $i$ in cluster $j$ , the multilevel factor model is

$y_{i j} = ν + Λ η_{i j} + ε_{j}^{b} + ε_{i j}^{w}$

with

$\begin{aligned} E (ε_{j}^{b}) & = 0 \\ E (ε_{i j}^{w}) & = 0 \\ E (η_{i j}) & = α \\ V (ε_{j}^{b}) & = Θ^{b} \\ V (ε_{i j}^{w}) & = Θ^{w} \\ V (η_{i j}) & = Ψ^{w} + Ψ^{b} \end{aligned}$

Identification Constraints

For each component in $η$ , we need one constraint on the mean structure and one constraint on the covariance structure. Conventionally, for the mean structure, we either set $E (η) = 0$ or $ν = 0$ for one item; for the covariance structure, we either set $V (η) = 1$ or $λ = 1$ for one item.

Assumptions

Here we assume cross-level loading invariance (see this paper and this paper, for example), such that there is only one loading matrix $Λ$ that applies to both levels, so that the latent variable has both a within-level and a between-level component:

$η_{i j} = η_{j}^{b} + η_{i j}^{w}$

However, the loadings can still be randomly varying across clusters, which will be a discussion for Part II.

We also assume that $ε_{j}^{b}$ is i.i.d. across clusters and that $ε_{i j}^{w}$ is i.i.d. across individuals.

Conditional Independence

In Bayesian estimation we try to find conditional independence in the data so that the estimation can be simplified. Here, assuming multivariate normality of $ε^{b}$ and $ε^{w}$ , we have independent observations when conditioned on the level-2 parameters:

$y_{i j} ∣ η_{j}^{b}, ε_{j}^{b} \sim N (ν + ε_{j}^{b} + Λ α, Σ^{w}),$

where $Σ^{w} = Λ Ψ^{w} Λ^{⊤} + Θ^{w}$ . Note that the within-cluster covariance is constant.

Simulate Data

Set up: One factor, Four items; two items with random intercepts

# Design parameters
set.seed(123)
num_clus <- 100
clus_size <- 10
icc_eta <- .10
lambda <- c(.7, .9, .7, .8)
nu <- c(0, 0.3, -0.5, 0.2)
theta_lambda <- c(0, 0, 0, 0.08) # loading variances
theta_nu <- c(0, 0, 0.1, 0.05) # intercept variances
theta <- c(.51, .51, .40, .40) # uniqueness
# assume one-factor model holds for now

# Simulate latent variable
clus_id <- rep(1:num_clus, each = clus_size)
etab <- rnorm(num_clus, sd = sqrt(icc_eta / (1 - icc_eta)))
etaw <- rnorm(num_clus * clus_size)
eta <- etab[clus_id] + etaw
# confirm the simulated variable behaves as expected
# lmer(eta ~ (1 | clus_id))

# Simulate items for each cluster
num_items <- 4
y <- lapply(1:num_clus, FUN = \(j) {
    yj <- nu + rnorm(nu, sd = sqrt(theta_nu)) +
    tcrossprod(
        lambda + rnorm(lambda, sd = sqrt(theta_lambda)),
        etab[j] + etaw[clus_id == j]
    ) +
        rnorm(num_items * clus_size, sd = sqrt(theta))
    t(yj)
})
dat <- do.call(rbind, y) |>
    as.data.frame() |>
    setNames(paste0("y", 1:4)) |>
    cbind(id = clus_id)

Frequentist with `lavaan`

mcfa_mod <- "
  level: 1
    f =~ l1 * y1 + l2 * y2 + l3 * y3 + l4 * y4
    f ~~ 1 * f
  level: 2
    f =~ l1 * y1 + l2 * y2 + l3 * y3 + l4 * y4
"
mcfa_fit <- cfa(mcfa_mod, data = dat, cluster = "id", auto.fix.first = FALSE)

Warning: lavaan->lav_object_post_check():  
   some estimated ov variances are negative

summary(mcfa_fit)

lavaan 0.6-18 ended normally after 25 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21
  Number of equality constraints                     4

  Number of observations                          1000
  Number of clusters [id]                          100

Model Test User Model:
                                                      
  Test statistic                                13.531
  Degrees of freedom                                 7
  P-value (Chi-square)                           0.060

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian


Level 1 [within]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f =~                                                
    y1        (l1)    0.685    0.030   22.669    0.000
    y2        (l2)    0.901    0.034   26.834    0.000
    y3        (l3)    0.711    0.029   24.260    0.000
    y4        (l4)    0.797    0.032   24.981    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    f                 1.000                           
   .y1                0.507    0.029   17.291    0.000
   .y2                0.462    0.034   13.728    0.000
   .y3                0.417    0.026   16.021    0.000
   .y4                0.484    0.031   15.681    0.000


Level 2 [id]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  f =~                                                
    y1        (l1)    0.685    0.030   22.669    0.000
    y2        (l2)    0.901    0.034   26.834    0.000
    y3        (l3)    0.711    0.029   24.260    0.000
    y4        (l4)    0.797    0.032   24.981    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.019    0.035    0.530    0.596
   .y2                0.351    0.041    8.639    0.000
   .y3               -0.457    0.044  -10.380    0.000
   .y4                0.266    0.046    5.747    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y1                0.003    0.010    0.325    0.745
   .y2               -0.003    0.012   -0.257    0.798
   .y3                0.076    0.019    4.022    0.000
   .y4                0.069    0.019    3.595    0.000
    f                 0.051    0.027    1.893    0.058

Bayesian with `blavaan`

To my knowledge, the current version of blavaan does not support cross-level constraints.

mcfa_bfit <- bcfa(mcfa_mod, data = dat, cluster = "id", auto.fix.first = FALSE)

blavaan NOTE: two-level models are new, please report bugs!
https://github.com/ecmerkle/blavaan/issues

Warning: blavaan WARNING: If you are manually fixing lv variances to 1 for
identification, please use std.lv=TRUE.

[1] "Error in matattr(fr, es, constrain, mat = \"Lambda_y\", Ng, opts$std.lv,  : \n  blavaan ERROR: cross-matrix equality constraints not supported.\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in matattr(fr, es, constrain, mat = "Lambda_y", Ng, opts$std.lv,     tmpwig): blavaan ERROR: cross-matrix equality constraints not supported.>

Error in blavaan(mcfa_mod, data = dat, cluster = "id", auto.fix.first = TRUE, : blavaan ERROR: problem with translation from lavaan to MCMC syntax.

It can fit a model without cross-level invariance:

mcfa_uncon_mod <- "
  level: 1
    fw =~ y1 + y2 + y3 + y4
  level: 2
    fb =~ y1 + y2 + y3 + y4
"
mcfa_uncon_bfit <- bcfa(mcfa_uncon_mod, data = dat, cluster = "id",
                        bcontrol = list(cores = 3))

blavaan NOTE: two-level models are new, please report bugs!
https://github.com/ecmerkle/blavaan/issues

Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess

Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

Computing post-estimation metrics (including lvs if requested)...

summary(mcfa_uncon_bfit)

blavaan 0.5.5 ended normally after 1000 iterations

  Estimator                                      BAYES
  Optimization method                             MCMC
  Number of model parameters                        20

  Number of observations                          1000
  Number of clusters [id]                          100

  Statistic                                 MargLogLik         PPP
  Value                                      -5249.174       0.173

Parameter Estimates:



Level 1 [within]:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  fw =~                                                                        
    y1                1.000                                                    
    y2                1.320    0.068    1.199    1.460    1.001    normal(0,10)
    y3                1.038    0.053    0.939    1.155    1.001    normal(0,10)
    y4                1.161    0.061    1.045    1.284    1.002    normal(0,10)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.000                                                    
   .y2                0.000                                                    
   .y3                0.000                                                    
   .y4                0.000                                                    
    fw                0.000                                                    

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.508    0.029    0.453    0.571    1.000 gamma(1,.5)[sd]
   .y2                0.457    0.033    0.396    0.524    0.999 gamma(1,.5)[sd]
   .y3                0.419    0.026    0.367    0.472    1.002 gamma(1,.5)[sd]
   .y4                0.487    0.031    0.431    0.550    1.000 gamma(1,.5)[sd]
    fw                0.491    0.044    0.408    0.583    1.001 gamma(1,.5)[sd]


Level 2 [id]:

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  fb =~                                                                        
    y1                1.000                                                    
    y2                0.407    6.260  -12.891   12.845    1.004    normal(0,10)
    y3                1.727    7.628  -14.137   17.502    1.004    normal(0,10)
    y4                0.567    9.717  -19.056   19.071    1.008    normal(0,10)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.019    0.032   -0.044    0.081    1.003    normal(0,32)
   .y2                0.352    0.038    0.278    0.424    1.002    normal(0,32)
   .y3               -0.457    0.041   -0.538   -0.379    1.004    normal(0,32)
   .y4                0.265    0.043    0.181    0.350    1.004    normal(0,32)
    fb                0.000                                                    

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .y1                0.007    0.007    0.000    0.025    1.006 gamma(1,.5)[sd]
   .y2                0.006    0.007    0.000    0.024    0.999 gamma(1,.5)[sd]
   .y3                0.065    0.028    0.004    0.117    1.012 gamma(1,.5)[sd]
   .y4                0.057    0.029    0.002    0.113    1.006 gamma(1,.5)[sd]
    fb                0.001    0.002    0.000    0.007    1.003 gamma(1,.5)[sd]

It takes about 32.568 seconds to estimate the model.

Using STAN

Columnwise approach

Here’s the code I used to fit multilevel factor model many years ago. It assumes that $Θ^{w}$ is diagonal, aka local independence, so that I can treat each item as independent normal variable (after conditioning on the parameters). It does require sampling the latent variables: $η^{b}$ , $η^{w}$ , and $ε^{b}$ .

Listing 1: STAN code for the columnwise approach

//
// This Stan program defines a one-factor two-level CFA model with parameter 
// expansion implemented in Merkle & Wang (2018). 
//

// The input data is a matrix 'y' of 'N' rows and 'p' columns.
data {
  int<lower=0> N;  // number of observations
  int<lower=0> p;  // number of items
  int<lower=0> J;  // number of clusters
  array[N] int<lower=1, upper=J> jj;  // cluster membership indicator
  array[p] vector[N] Y;  // item responses
}

// The parameters accepted by the model. Our model
// accepts these parameters: 
// 'lambda_star' for unscaled loadings, 'nu' for intercepts, 
// 'thetaw' and 'thetab' for uniqueness at the within and the between levels
// 'etaw_star' and 'z_etab_star' for the standardized within-cluster and 
// between-cluster latent factor scores
// 'z_epsb' for standardized between-cluster uniqe factor scores, and 
// 'psib' for latent factor SD
parameters {
  // vector[p] lambdaw_star;
  // vector[p] lambdab_star;
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector[p] nu;  // intercept vector
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  vector[N] etaw_star;  // within-cluster latent scores
  vector[J] z_etab_star;  // unscaled between-cluster latent scores
  real<lower=0> psib;  // between-cluster latent standard deviation
  matrix[J, p] z_epsb;  // unscaled between-cluster uniqe factor score matrix
}

// The model to be estimated. We model each column in the output
// 'y' to be normally distributed with mean 
// 'nu + lambda_star * etaw_star + Yb' and standard deviation 'thetaw', where
// 'Yb' = 'etab_star * lambda_star + epsb', and 'epsb' = z_epsb * thetab'
model {
  // lambdaw_star ~ std_normal();
  // lambdab_star ~ std_normal();
  lambda_star ~ std_normal();
  etaw_star ~ std_normal(); 
  z_etab_star ~ std_normal(); 
  thetaw ~ student_t(4, 0, 1);
  thetab ~ std_normal();
  psib ~ std_normal();
  to_vector(z_epsb) ~ std_normal(); 
  for (i in 1:p)
    {
      // 'Yb = etab_star * lambda_star + epsb'
      vector[J] Yb = z_etab_star * psib * lambda_star[i] + 
                     col(z_epsb, i) * thetab[i];
      // 'Y ~ N(nu + lambda_star * etaw_star + Yb, thetaw)'
      Y[i] ~ normal(nu[i] + etaw_star * lambda_star[i] + Yb[jj], 
                    thetaw[i]);
    }
}

generated quantities {
  vector[p] lambda;  // final loading vector
  vector[N] etaw;  // final within-cluster factor score
  vector[J] etab;  // final between-cluster factor score
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
    etaw = sign_l1 * etaw_star;
    etab = sign_l1 * z_etab_star * psib;
  }
}

mcfa_col_fit <- mcfa_col$sample(
    data = list(
        N = nrow(dat),
        p = 4,
        Y = t(dat[1:4]),
        J = length(unique(dat$id)),
        jj = dat$id
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)

Running MCMC with 3 parallel chains...

Chain 1 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 1 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 3 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 3 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 2 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 1 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 1 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 1 finished in 20.6 seconds.
Chain 3 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 2 finished in 24.6 seconds.
Chain 3 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 3 finished in 29.7 seconds.

All 3 chains finished successfully.
Mean chain execution time: 24.9 seconds.
Total execution time: 29.9 seconds.

mcfa_col_fit$summary(c("lambda", "thetaw", "thetab", "psib")) |>
    knitr::kable(digits = 2)

variable	mean	median	sd	mad	q5	q95	rhat	ess_bulk	ess_tail
lambda[1]	0.69	0.69	0.03	0.03	0.64	0.74	1.00	999.44	1628.74
lambda[2]	0.90	0.90	0.03	0.03	0.85	0.96	1.01	531.93	1299.32
lambda[3]	0.71	0.71	0.03	0.03	0.66	0.76	1.01	908.96	1675.79
lambda[4]	0.80	0.80	0.03	0.03	0.75	0.86	1.00	821.88	1468.70
thetaw[1]	0.71	0.71	0.02	0.02	0.68	0.74	1.00	1662.40	1910.28
thetaw[2]	0.68	0.68	0.02	0.03	0.64	0.72	1.00	1077.08	2154.01
thetaw[3]	0.65	0.65	0.02	0.02	0.61	0.68	1.00	2099.57	1898.13
thetaw[4]	0.70	0.70	0.02	0.02	0.66	0.73	1.00	1601.97	2288.34
thetab[1]	0.07	0.07	0.04	0.05	0.01	0.15	1.00	437.48	926.96
thetab[2]	0.06	0.06	0.04	0.05	0.01	0.14	1.00	601.17	1007.60
thetab[3]	0.28	0.28	0.04	0.04	0.22	0.34	1.00	1184.81	1481.16
thetab[4]	0.27	0.27	0.04	0.04	0.21	0.33	1.00	1173.91	1415.87
psib	0.21	0.21	0.07	0.07	0.06	0.31	1.02	101.08	137.19

This is still relatively quick, but won’t work when $Θ^{w}$ is not diagonal.

Marginal likelihood using sufficient statistics

From citations such as this and this and this, the likelihood function for cluster $j$ is

$\begin{matrix} (1) & \begin{aligned} ℓ_{j} & = n_{j} \log (2 π) + (n_{j} - 1) \log | Σ_{j}^{w} | + \sum_{i = 1}^{n_{j}} (y_{i j} - {\bar{y}}_{j})^{⊤} {Σ_{j}^{w}}^{- 1} (y_{i j} - {\bar{y}}_{j}) \\ + \log | n_{j} Σ^{b} + Σ_{j}^{w} | + ({\bar{y}}_{j} - \bar{y})^{⊤} (n_{j} Σ^{b} + Σ_{j}^{w})^{- 1} (\bar{y} - μ) + \\ (\bar{y} - μ)^{⊤} {Σ_{j}^{w}}^{- 1} (y_{j} - {\bar{y}}_{j}) \end{aligned} \end{matrix}$

This shows that the likelihood function has two multivariate normal components: that $(y_{j} - {\bar{y}}_{j})$ is multivariate normal with mean 0 and covariance $Σ_{j}^{w}$ with a sample size of $n_{j} - 1$ , and that $({\bar{y}}_{j} - \bar{y})$ is multivariate normal with mean 0 and covariance $n_{j} Σ^{b} + Σ_{j}^{w}$ .

As $a^{⊤} B a$ is a scalar and can be written as $Tr (B a a^{⊤})$ , the sufficient statistics in the above likelihood function is the cluster-specific sample cross-product matrix, $S_{j}^{w}$ = $\sum_{i = 1}^{n_{j}} (y_{i j} - {\bar{y}}_{j}) (y_{i j} - {\bar{y}}_{j})^{⊤}$ , and the sample cluster means, ${\bar{y}}_{j}$ .

Tip

When the within-cluster covariance is assumed homogeneous, the likelihood function will depend only on the total within-sample cross-product matrix,

$S_{pooled}^{w} = \sum_{j = 1}^{J} S_{j}^{w} .$

Tip

When the design is balanced such that the cluster sizes are equal, on top of homogeneous of covariance condition, the likelihood function will depend only on the between-sample cross-product matrix,

$S^{b} = \sum_{j = 1}^{J} (y_{j} - {\bar{y}}_{j}) (y_{j} - {\bar{y}}_{j})^{⊤} .$

The marginal likelihood approach is also used in blavaan. Below, I explore coding the model in STAN. Note that I ignore the mean structure since it is saturated in the factor model (as in single-level analysis), such that the last term in Equation 1 is 0.

Balanced design, pooled matrices

//
// This Stan program defines a one-factor two-level CFA model with a
// marginal likelihood approach similar to one described in
// https://ecmerkle.github.io/blavaan/articles/multilevel.html. 
//
// Define function for computing multi-normal log density using sufficient
// statistics
functions {
  real multi_normal_suff_lpdf(matrix S,
                              matrix Sigma,
                              int n) {
    return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) +
      trace(mdivide_left_spd(Sigma, S)));
  }
}
// The input data are (a) sw, pooled within-cluster cross-product matrix,
// and (b) sb, cluster-specific outer-product of cluster means from the 
// grand means.
data {
  int<lower=1> p;  // number of items
  int<lower=1> J;  // number of clusters
  int<lower=1> n;  // cluster size
  matrix[p, p] sw;  // pooled within cross-product matrix
  matrix[p, p] sb;  // between cross-product matrix of cluster mean
}
// Compute within degrees of freedom = N - J
transformed data {
  int dfw = J * (n - 1);
}
// The parameters accepted by the model. Our model
// accepts four parameters: 
// 'lambda_star' for unscaled loadings, 'nu' for intercepts, 
// 'thetaw' and 'thetab' for uniqueness at the within and the between levels,
// and 'psib' for latent factor SD
parameters {
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  real<lower=0> psib;  // between-cluster latent standard deviation
}

// The model to be estimated. We compute the log-likelihood as the sum of
// two components: cluster-mean centered variables are multivariate normal
// with covariance matrix sigmaw (and degree of freedom adjustment),
// and cluster-mean variables are multivariate normal with covariance
// matrix sigmab + sigmaw / nj
model {
  matrix[p, p] LLt = lambda_star * lambda_star';
  matrix[p, p] sigmaw = add_diag(LLt, square(thetaw));
  matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab));
  lambda_star ~ normal(0, 10);
  thetaw ~ gamma(1, .5);
  thetab ~ gamma(1, .5);
  psib ~ gamma(1, .5);
  target += multi_normal_suff_lpdf(sw | sigmaw, dfw);
  target += multi_normal_suff_lpdf(sb | n * sigmab + sigmaw, J);
}
// Rotating the factor loadings
// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)
generated quantities {
  vector[p] lambda;  // final loading vector
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
  }
}

# Prepare data for STAN
yc <- sweep(dat[1:4], MARGIN = 2, STATS = colMeans(dat[1:4])) # centered data
n <- table(dat$id)[1]
# Cross-product matrices
ssw <- crossprod(apply(yc, MARGIN = 2, FUN = \(x) x - ave(x, dat$id)))
ssb <- crossprod(apply(yc, MARGIN = 2, FUN = ave, dat$id))
# Run STAN
mcfa_suff_eq_fit <- mcfa_suff_eq$sample(
    data = list(
        p = 4,
        J = num_clus,
        n = n,
        sw = ssw,
        sb = ssb
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)

Running MCMC with 3 parallel chains...

Chain 1 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 1 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 2 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 2 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 3 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 3 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 1 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 2 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 3 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 3 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 1 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 1 finished in 0.4 seconds.

All 3 chains finished successfully.
Mean chain execution time: 0.3 seconds.
Total execution time: 0.4 seconds.

mcfa_suff_eq_fit$summary(c("lambda", "thetaw", "thetab", "psib")) |>
    knitr::kable(digits = 2)

variable	mean	median	sd	mad	q5	q95	rhat	ess_bulk	ess_tail
lambda[1]	0.69	0.69	0.03	0.03	0.64	0.74	1	2115.03	1824.81
lambda[2]	0.91	0.91	0.03	0.03	0.85	0.96	1	2112.61	2370.28
lambda[3]	0.71	0.71	0.03	0.03	0.67	0.76	1	2532.31	2363.83
lambda[4]	0.80	0.80	0.03	0.03	0.75	0.85	1	2024.79	1859.57
thetaw[1]	0.71	0.71	0.02	0.02	0.68	0.75	1	2896.23	2004.86
thetaw[2]	0.68	0.68	0.02	0.02	0.64	0.72	1	3073.11	2277.23
thetaw[3]	0.65	0.65	0.02	0.02	0.61	0.68	1	3117.24	2193.45
thetaw[4]	0.70	0.70	0.02	0.02	0.66	0.73	1	3421.58	2427.64
thetab[1]	0.07	0.06	0.04	0.05	0.01	0.14	1	1405.63	879.04
thetab[2]	0.06	0.06	0.04	0.05	0.01	0.14	1	1523.94	998.94
thetab[3]	0.28	0.28	0.03	0.03	0.22	0.33	1	4263.74	1585.69
thetab[4]	0.27	0.27	0.04	0.04	0.21	0.33	1	3161.41	1439.60
psib	0.20	0.21	0.08	0.07	0.06	0.31	1	1017.21	492.93

Using cluster-specific matrices

//
// This Stan program defines a one-factor two-level CFA model with a
// marginal likelihood approach similar to one described in
// https://ecmerkle.github.io/blavaan/articles/multilevel.html. The approach
// allows for unequal cluster sizes, but assumes equal loadings and latent
// variances across clusters, and every observations have complete data
// on all items.
//
// Define function for computing multi-normal log density using sufficient
// statistics
functions {
  real multi_normal_suff_lpdf(matrix S,
                              matrix Sigma,
                              int n) {
    return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) +
      trace(mdivide_left_spd(Sigma, S)));
  }
}
// The input data are (a) sw, cluster-specific within-cluster cross-product
// matrix, and (b) ybar, cluster-specific cluster means of the items.
data {
  int<lower=1> p;  // number of items
  int<lower=1> J;  // number of clusters
  array[J] int<lower=1> n;  // cluster size
  array[J] matrix[p, p] sw;  // within cross-product matrices
  array[J] vector[p] ybar;  // cluster means
}
// The parameters accepted by the model. Our model
// accepts four parameters: 
// 'lambda_star' for unscaled loadings, 'nu' for intercepts, 
// 'thetaw' and 'thetab' for uniqueness at the within and the between levels,
// and 'psib' for latent factor SD
parameters {
  vector[p] lambda_star;  // unscaled loading vector assuming cross-level 
                          // invariance
  vector<lower=0>[p] thetaw;  // within-cluster uniqueness vector
  vector<lower=0>[p] thetab;  // between-cluster uniqueness vector
  real<lower=0> psib;  // between-cluster latent standard deviation
  vector[p] nu;  // item means/intercepts
}

// The model to be estimated. We compute the log-likelihood as the sum of
// two components: cluster-mean centered variables are multivariate normal
// with covariance matrix sigmaw (and degree of freedom adjustment),
// and cluster-mean variables are multivariate normal with covariance
// matrix sigmab + sigmaw / nj
model {
  matrix[p, p] LLt = lambda_star * lambda_star';
  matrix[p, p] sigmaw = add_diag(LLt, square(thetaw));
  matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab));
  lambda_star ~ normal(0, 10);
  thetaw ~ gamma(1, .5);
  thetab ~ gamma(1, .5);
  psib ~ gamma(1, .5);
  nu ~ normal(0, 32);
  for (j in 1:J) {
    target += multi_normal_suff_lpdf(sw[j] | sigmaw, n[j] - 1);
    ybar[j] ~ multi_normal(nu, sigmab + sigmaw / n[j]);
  }
}
// Rotating the factor loadings
// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)
generated quantities {
  vector[p] lambda;  // final loading vector
  {
    int sign_l1 = lambda_star[1] > 0 ? 1 : -1;
    lambda = sign_l1 * lambda_star;
  }
}

# Prepare data for STAN
nj <- table(dat$id)
# Cluster-specific cross-product matrices
sswj <- tapply(dat[1:4], INDEX = dat$id,
               FUN = \(x) tcrossprod(t(x) - colMeans(x)))
ybarj <- tapply(dat[1:4], INDEX = dat$id, FUN = \(x) colMeans(x))
# ssbj <- tapply(yc, INDEX = dat$id,
#                FUN = \(x) nrow(x) * tcrossprod(colMeans(x)))
# Run STAN
mcfa_suffj_fit <- mcfa_suffj$sample(
    data = list(
        p = 4,
        J = num_clus,
        n = nj,
        sw = sswj,
        ybar = ybarj
    ),
    chains = 3,
    parallel_chains = 3,
    iter_warmup = 500,
    iter_sampling = 1000,
    refresh = 500
)

Running MCMC with 3 parallel chains...

Chain 1 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 1500 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 3 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 2 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 3 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 1 Iteration:  500 / 1500 [ 33%]  (Warmup) 
Chain 1 Iteration:  501 / 1500 [ 33%]  (Sampling) 
Chain 2 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 2 finished in 9.5 seconds.
Chain 3 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 3 finished in 9.6 seconds.
Chain 1 Iteration: 1000 / 1500 [ 66%]  (Sampling) 
Chain 1 Iteration: 1500 / 1500 [100%]  (Sampling) 
Chain 1 finished in 12.7 seconds.

All 3 chains finished successfully.
Mean chain execution time: 10.6 seconds.
Total execution time: 12.8 seconds.

mcfa_suffj_fit$summary(c("lambda", "thetaw", "thetab", "psib", "nu")) |>
    knitr::kable(digits = 2)

variable	mean	median	sd	mad	q5	q95	rhat	ess_bulk	ess_tail
lambda[1]	0.69	0.69	0.03	0.03	0.64	0.74	1	2389.14	2410.00
lambda[2]	0.90	0.90	0.03	0.03	0.85	0.96	1	2533.68	2465.12
lambda[3]	0.71	0.71	0.03	0.03	0.67	0.76	1	2537.37	1888.83
lambda[4]	0.80	0.80	0.03	0.03	0.75	0.85	1	2589.05	2194.45
thetaw[1]	0.71	0.71	0.02	0.02	0.68	0.74	1	3178.38	1778.06
thetaw[2]	0.68	0.68	0.02	0.02	0.64	0.72	1	3006.17	2181.10
thetaw[3]	0.65	0.65	0.02	0.02	0.62	0.68	1	3638.43	2348.81
thetaw[4]	0.70	0.70	0.02	0.02	0.66	0.73	1	3676.69	2328.54
thetab[1]	0.07	0.06	0.04	0.05	0.01	0.15	1	1581.35	995.32
thetab[2]	0.06	0.06	0.04	0.05	0.00	0.14	1	1348.69	792.04
thetab[3]	0.28	0.28	0.04	0.04	0.22	0.34	1	3844.85	1633.48
thetab[4]	0.27	0.27	0.04	0.04	0.21	0.34	1	2751.53	1579.56
psib	0.21	0.22	0.07	0.07	0.10	0.32	1	2068.06	990.22
nu[1]	0.02	0.02	0.04	0.04	-0.04	0.08	1	1769.28	2057.31
nu[2]	0.35	0.35	0.04	0.04	0.29	0.42	1	1671.45	1916.33
nu[3]	-0.45	-0.45	0.05	0.04	-0.53	-0.38	1	2071.82	1845.36
nu[4]	0.27	0.27	0.05	0.05	0.19	0.35	1	1846.74	2067.26

Note that not even is the sampling faster, but the effective sample sizes are also higher, so the second approach is much more efficient.

data.frame(
    model = c("blavaan", "columnwise", "sufficient statistics (balanced)", "sufficient statistics (cluster-specific)"),
    ess = c(
        blavInspect(mcfa_uncon_bfit, "draws") |>
            posterior::summarise_draws("ess_bulk") |>
            base::`[[`(2) |> base::`[`(16),
        mcfa_col_fit$summary(variables = "psib", posterior::ess_bulk)[[2]],
        mcfa_suff_eq_fit$summary(variables = "psib", posterior::ess_bulk)[[2]],
        mcfa_suffj_fit$summary(variables = "psib", posterior::ess_bulk)[[2]]
    ),
    time = c(
        mcfa_uncon_bfit@timing$Estimate,
        mcfa_col_fit$time()$total,
        mcfa_suff_eq_fit$time()$total,
        mcfa_suffj_fit$time()$total
    )
) |>
    knitr::kable(digits = 2, col.names = c("Model", "ESS (Bulk)", "Time (seconds)"))

Table 1: Sampling efficiency of different approaches

Model	ESS (Bulk)	Time (seconds)
blavaan	289.90	32.57
columnwise	101.08	29.88
sufficient statistics (balanced)	1017.21	0.42
sufficient statistics (cluster-specific)	2068.06	12.79