For each component in \(\boldsymbol{\mathbf{\eta}}\), we need one constraint on the mean structure and one constraint on the covariance structure. Conventionally, for the mean structure, we either set \(E(\eta) = 0\) or \(\nu = 0\) for one item; for the covariance structure, we either set \(V(\eta) = 1\) or \(\lambda = 1\) for one item.
Assumptions
Here we assume cross-level loading invariance (see this paper and this paper, for example), such that there is only one loading matrix \(\boldsymbol{\mathbf{\Lambda}}\) that applies to both levels, so that the latent variable has both a within-level and a between-level component:
However, the loadings can still be randomly varying across clusters, which will be a discussion for Part II.
We also assume that \(\boldsymbol{\mathbf{\varepsilon}}^{b}_j\) is i.i.d. across clusters and that \(\boldsymbol{\mathbf{\varepsilon}}^{w}_{ij}\) is i.i.d. across individuals.
Conditional Independence
In Bayesian estimation we try to find conditional independence in the data so that the estimation can be simplified. Here, assuming multivariate normality of \(\boldsymbol{\mathbf{\varepsilon}}^{b}\) and \(\boldsymbol{\mathbf{\varepsilon}}^{w}\), we have independent observations when conditioned on the level-2 parameters:
where \(\boldsymbol{\mathbf{\Sigma}}^w = \boldsymbol{\mathbf{\Lambda }}\boldsymbol{\mathbf{\Psi}}^w \boldsymbol{\mathbf{\Lambda}}^\top + \boldsymbol{\mathbf{\Theta}}^w\). Note that the within-cluster covariance is constant.
Simulate Data
Set up: One factor, Four items; two items with random intercepts
blavaan NOTE: two-level models are new, please report bugs!
https://github.com/ecmerkle/blavaan/issues
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
Computing post-estimation metrics (including lvs if requested)...
It takes about 32.568 seconds to estimate the model.
Using STAN
Columnwise approach
Here’s the code I used to fit multilevel factor model many years ago. It assumes that \(\boldsymbol{\mathbf{\Theta}}^w\) is diagonal, aka local independence, so that I can treat each item as independent normal variable (after conditioning on the parameters). It does require sampling the latent variables: \(\eta^b\), \(\eta^w\), and \(\varepsilon^b\).
This shows that the likelihood function has two multivariate normal components: that \((\boldsymbol{\mathbf{y}}_j - \bar{\boldsymbol{\mathbf{y}}}_j)\) is multivariate normal with mean 0 and covariance \(\boldsymbol{\mathbf{\Sigma}}^w_j\) with a sample size of \(n_j - 1\), and that \((\bar{\boldsymbol{\mathbf{y}}}_j - \bar{\boldsymbol{\mathbf{y}}})\) is multivariate normal with mean 0 and covariance \(n_j \boldsymbol{\mathbf{\Sigma}}^b + \boldsymbol{\mathbf{\Sigma}}^w_j\).
As \(\boldsymbol{\mathbf{a}}^\top \boldsymbol{\mathbf{B}} \boldsymbol{\mathbf{a}}\) is a scalar and can be written as \(\mathop{\mathrm{Tr}}(\boldsymbol{\mathbf{B}} \boldsymbol{\mathbf{a}} \boldsymbol{\mathbf{a}}^\top)\), the sufficient statistics in the above likelihood function is the cluster-specific sample cross-product matrix, \(\boldsymbol{\mathbf{S}}^w_j\) = \(\sum_{i = 1}^{n_j} (\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j) (\boldsymbol{\mathbf{y}}_{ij} - \bar{\boldsymbol{\mathbf{y}}}_j)^\top\), and the sample cluster means, \(\bar{\boldsymbol{\mathbf{y}}}_j\).
Tip
When the within-cluster covariance is assumed homogeneous, the likelihood function will depend only on the total within-sample cross-product matrix,
When the design is balanced such that the cluster sizes are equal, on top of homogeneous of covariance condition, the likelihood function will depend only on the between-sample cross-product matrix,
The marginal likelihood approach is also used in blavaan. Below, I explore coding the model in STAN. Note that I ignore the mean structure since it is saturated in the factor model (as in single-level analysis), such that the last term in Equation 1 is 0.
Balanced design, pooled matrices
//// This Stan program defines a one-factor two-level CFA model with a// marginal likelihood approach similar to one described in// https://ecmerkle.github.io/blavaan/articles/multilevel.html. //// Define function for computing multi-normal log density using sufficient// statisticsfunctions {real multi_normal_suff_lpdf(matrix S,matrix Sigma,int n) {return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) + trace(mdivide_left_spd(Sigma, S))); }}// The input data are (a) sw, pooled within-cluster cross-product matrix,// and (b) sb, cluster-specific outer-product of cluster means from the // grand means.data {int<lower=1> p; // number of itemsint<lower=1> J; // number of clustersint<lower=1> n; // cluster sizematrix[p, p] sw; // pooled within cross-product matrixmatrix[p, p] sb; // between cross-product matrix of cluster mean}// Compute within degrees of freedom = N - Jtransformed data {int dfw = J * (n - 1);}// The parameters accepted by the model. Our model// accepts four parameters: // 'lambda_star' for unscaled loadings, 'nu' for intercepts, // 'thetaw' and 'thetab' for uniqueness at the within and the between levels,// and 'psib' for latent factor SDparameters {vector[p] lambda_star; // unscaled loading vector assuming cross-level // invariancevector<lower=0>[p] thetaw; // within-cluster uniqueness vectorvector<lower=0>[p] thetab; // between-cluster uniqueness vectorreal<lower=0> psib; // between-cluster latent standard deviation}// The model to be estimated. We compute the log-likelihood as the sum of// two components: cluster-mean centered variables are multivariate normal// with covariance matrix sigmaw (and degree of freedom adjustment),// and cluster-mean variables are multivariate normal with covariance// matrix sigmab + sigmaw / njmodel {matrix[p, p] LLt = lambda_star * lambda_star';matrix[p, p] sigmaw = add_diag(LLt, square(thetaw));matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab)); lambda_star ~ normal(0, 10); thetaw ~ gamma(1, .5); thetab ~ gamma(1, .5); psib ~ gamma(1, .5);target += multi_normal_suff_lpdf(sw | sigmaw, dfw);target += multi_normal_suff_lpdf(sb | n * sigmab + sigmaw, J);}// Rotating the factor loadings// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)generated quantities {vector[p] lambda; // final loading vector {int sign_l1 = lambda_star[1] > 0 ? 1 : -1; lambda = sign_l1 * lambda_star; }}
# Prepare data for STANyc <-sweep(dat[1:4], MARGIN =2, STATS =colMeans(dat[1:4])) # centered datan <-table(dat$id)[1]# Cross-product matricesssw <-crossprod(apply(yc, MARGIN =2, FUN = \(x) x -ave(x, dat$id)))ssb <-crossprod(apply(yc, MARGIN =2, FUN = ave, dat$id))# Run STANmcfa_suff_eq_fit <- mcfa_suff_eq$sample(data =list(p =4,J = num_clus,n = n,sw = ssw,sb = ssb ),chains =3,parallel_chains =3,iter_warmup =500,iter_sampling =1000,refresh =500)
//// This Stan program defines a one-factor two-level CFA model with a// marginal likelihood approach similar to one described in// https://ecmerkle.github.io/blavaan/articles/multilevel.html. The approach// allows for unequal cluster sizes, but assumes equal loadings and latent// variances across clusters, and every observations have complete data// on all items.//// Define function for computing multi-normal log density using sufficient// statisticsfunctions {real multi_normal_suff_lpdf(matrix S,matrix Sigma,int n) {return -0.5 * (n * (log_determinant_spd(Sigma) + cols(S) * log(2 * pi())) + trace(mdivide_left_spd(Sigma, S))); }}// The input data are (a) sw, cluster-specific within-cluster cross-product// matrix, and (b) ybar, cluster-specific cluster means of the items.data {int<lower=1> p; // number of itemsint<lower=1> J; // number of clustersarray[J] int<lower=1> n; // cluster sizearray[J] matrix[p, p] sw; // within cross-product matricesarray[J] vector[p] ybar; // cluster means}// The parameters accepted by the model. Our model// accepts four parameters: // 'lambda_star' for unscaled loadings, 'nu' for intercepts, // 'thetaw' and 'thetab' for uniqueness at the within and the between levels,// and 'psib' for latent factor SDparameters {vector[p] lambda_star; // unscaled loading vector assuming cross-level // invariancevector<lower=0>[p] thetaw; // within-cluster uniqueness vectorvector<lower=0>[p] thetab; // between-cluster uniqueness vectorreal<lower=0> psib; // between-cluster latent standard deviationvector[p] nu; // item means/intercepts}// The model to be estimated. We compute the log-likelihood as the sum of// two components: cluster-mean centered variables are multivariate normal// with covariance matrix sigmaw (and degree of freedom adjustment),// and cluster-mean variables are multivariate normal with covariance// matrix sigmab + sigmaw / njmodel {matrix[p, p] LLt = lambda_star * lambda_star';matrix[p, p] sigmaw = add_diag(LLt, square(thetaw));matrix[p, p] sigmab = add_diag(square(psib) .* LLt, square(thetab)); lambda_star ~ normal(0, 10); thetaw ~ gamma(1, .5); thetab ~ gamma(1, .5); psib ~ gamma(1, .5); nu ~ normal(0, 32);for (j in1:J) {target += multi_normal_suff_lpdf(sw[j] | sigmaw, n[j] - 1); ybar[j] ~ multi_normal(nu, sigmab + sigmaw / n[j]); }}// Rotating the factor loadings// (see https://discourse.mc-stan.org/t/latent-factor-loadings/1483)generated quantities {vector[p] lambda; // final loading vector {int sign_l1 = lambda_star[1] > 0 ? 1 : -1; lambda = sign_l1 * lambda_star; }}
# Prepare data for STANnj <-table(dat$id)# Cluster-specific cross-product matricessswj <-tapply(dat[1:4], INDEX = dat$id,FUN = \(x) tcrossprod(t(x) -colMeans(x)))ybarj <-tapply(dat[1:4], INDEX = dat$id, FUN = \(x) colMeans(x))# ssbj <- tapply(yc, INDEX = dat$id,# FUN = \(x) nrow(x) * tcrossprod(colMeans(x)))# Run STANmcfa_suffj_fit <- mcfa_suffj$sample(data =list(p =4,J = num_clus,n = nj,sw = sswj,ybar = ybarj ),chains =3,parallel_chains =3,iter_warmup =500,iter_sampling =1000,refresh =500)