Composite reliability of multilevel data: It's about observed scores and construct meanings


This paper shows how the concept of reliability of composite scores, as defined in classical test theory, can be extended to the context of multilevel modeling. In particular, it discusses the contributions and limitations of the various level-specific reliability indices proposed by Geldhof, Preacher, and Zyphur (2014), denoted as $\tilde \omega^b$ and $\tilde \omega^w$ (and also $\tilde \alpha^b$ and $\tilde \alpha^w$). One major limitation of those indices is that they are quantities for latent, unobserved level-specific composite scores, and are not suitable for observed composites at different levels. As illustrated using simulated data in this paper, $\tilde \omega^b$ can drastically overestimate the true reliability of between-level composite scores (i.e., observed cluster means). Another limitation is that the development of those indices did not consider the recent conceptual development on construct meanings in multilevel modeling (Stapleton & Johnson, 2019; Stapleton, Yang, & Hancock, 2016). To address the second limitation, this paper defines reliability indices ($\omega^{2l}$, $\omega^b$, $\omega^w$, $\alpha^{2l}$, $\alpha^b$, $\alpha^w$) for three types of multilevel observed composite scores measuring various multilevel constructs: individual, configural, shared, and within-cluster. The paper also shows how researchers can obtain sample point and interval estimates using the derived formulas and the provided R and Mplus code. In addition, a large-scale national data set was used to illustrate the proposed methods for estimating reliability for the three types of multilevel composite scores, and practical recommendations on when different indices should be reported are provided.
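The gap between a latent between-level index and the reliability of observed cluster means can be illustrated with a small population-level calculation. The following is a minimal sketch, not the paper's R or Mplus code. It assumes a shared construct, a unit-weight sum composite, a two-level one-factor model, and equal cluster sizes `n`. The function names and all parameter values are hypothetical. The formula is the standard variance decomposition of a cluster mean (between-level variance plus within-level variance divided by `n`), and it may differ in detail from the paper's definition of $\omega^b$.

```python
# Illustrative only: hypothetical names and parameter values, not from the paper.

def latent_between_omega(loadings_b, factor_var_b, uniq_b):
    """Omega for the latent between-level composite (ignores within-level noise)."""
    true_b = sum(loadings_b) ** 2 * factor_var_b
    return true_b / (true_b + sum(uniq_b))


def observed_mean_omega(loadings_b, factor_var_b, uniq_b,
                        loadings_w, factor_var_w, uniq_w, n):
    """Reliability of observed cluster means, assuming a shared construct.

    The observed cluster mean carries the between-level composite plus the
    average of n within-level composites, so within-level variance / n is
    added to the denominator as error.
    """
    true_b = sum(loadings_b) ** 2 * factor_var_b
    total_b = true_b + sum(uniq_b)
    total_w = sum(loadings_w) ** 2 * factor_var_w + sum(uniq_w)
    return true_b / (total_b + total_w / n)


# Hypothetical four-item scale with small clusters (n = 5).
loadings_b, uniq_b = [0.5] * 4, [0.05] * 4
loadings_w, uniq_w = [1.0] * 4, [1.0] * 4

latent = latent_between_omega(loadings_b, 1.0, uniq_b)
observed = observed_mean_omega(loadings_b, 1.0, uniq_b,
                               loadings_w, 1.0, uniq_w, n=5)
print(f"latent between-level omega: {latent:.3f}")
print(f"observed cluster-mean reliability: {observed:.3f}")
```

With these made-up values, the latent index is about 0.95 while the reliability of the observed cluster means is about 0.49. The two converge only as the cluster size grows, which is consistent with the overestimation the abstract describes.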

Psychological Methods, 26(1), 90–102. Advance Online Publication