class: center, middle, inverse, title-slide # Internal Consistency of Multilevel Data ## Cluster Means, Centering, and Construct Meanings ### Mark Lai ### University of Southern California ### 2020/11/09 --- # Outline ### Reliability in factor analysis ??? Such an honor to be here. Co-chairs of my dissertation are both graduates of ASU, and they would share with me how good the program at ASU is Lineage of ASU Quant . . . -- ### Multilevel reliability -- ### Issues of level-specific reliability coefficients 1. Reliability of latent scores 2. Cross-level invariance 3. Construct meanings -- ### Reliability indices for observed composite scores - `\(\omega^{2l}\)`, `\(\omega^b\)`, `\(\omega^w\)` ??? Some alternative indices I proposed to solve these limitations Looking forward to comments and suggestions; whether I'm doing something wrong or right -- ### Longitudinal Data? `$$\newcommand{\bv}[1]{\boldsymbol{\mathbf{#1}}}$$` --- background-image: url(https://www.picpedia.org/chalkboard/images/reliability.jpg) background-position: 90% 10% background-size: 25% # Importance of Reliability - Psychological scales are not perfect -- - Certain level of reliability needed * Statistical analyses are not trustworthy when the numbers are not consistent .footnote[ Image credit: Reliability by [Nick Youngson](http://www.nyphotographic.com/) [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) [Alpha Stock Images](http://alphastockimages.com/) ] --- # APA Journal Article Reporting Standards (JARS) - In the [Psychometrics section](https://apastyle.apa.org/jars/quant-table-1.pdf) (Appelbaum, Cooper, Kline, Mayo-Wilson, Nezu, and Rao, 2018), researchers were asked to > Estimate and report values of reliability coefficients for the scores analyzed (i.e., the research's sample) (p. 7) ??? Similar recommendations can be found in numerous journal and methodological guidelines --- class: inverse, center, middle # Reliability ??? 
Just a quick introduction to the foundational work on reliability that this research relies on.

---

# Classical Test Theory

Lord & Novick (1968)

Observed score = True score + Error

`$$Y = T + E$$`

???

For example, we ask students to report their attitudes toward math

--

`\(T\)` and `\(E\)` independent, so

`$$\sigma^2_Y = \sigma^2_T + \sigma^2_E$$`

--

Reliability `\(\rho = \dfrac{\sigma^2_T}{\sigma^2_Y} = \dfrac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} = [Corr(Y, T)]^2\)`

---
background-image: url(images/tau_equivalence.png)
background-position: 90% 20%
background-size: 25%

# Latent Variable/Factor Analysis

### (Essential) Tau-equivalence

`\(p\)` items: `\(k = 1, \ldots, p\)`

`$$Y_k = \nu_k + \eta + \epsilon_k$$`

???

When we have multiple items, we can estimate the error variance

For the true score proportion of `\(Y\)`, it's on the same metric/unit as the latent variable

--

`\(Var(\eta) = \psi\)`, `\(Var(\epsilon_k) = \theta_k\)`, `\(\epsilon_k\)` and `\(\epsilon_{k'}\)` independent

`\(Cov(Y_k, Y_{k'}) = \psi\)`

--

Unweighted (unit-weight) composite: `\(Z = \sum_k Y_k\)`

Variance of unweighted composite:

`$$Var(Z) = p^2 \psi + \sum_k \theta_k$$`

--

Reliability = `\(\dfrac{p^2 \psi}{Var(Z)}\)`, or **Cronbach's `\(\alpha\)`**

???
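A minimal numeric check of the result on this slide, assuming hypothetical values for `p`, `psi`, and `theta` (none of them come from the talk, and the function names are made up for illustration). Under essential tau-equivalence, the model-based reliability `\(p^2 \psi / Var(Z)\)` and coefficient `\(\alpha\)` computed from the model-implied item variances should coincide.

```python
def tau_equivalent_reliability(p, psi, theta):
    """Model-based reliability p^2 * psi / Var(Z) for a unit-weight composite."""
    var_z = p**2 * psi + sum(theta)  # Var(Z) = p^2 psi + sum of uniquenesses
    return p**2 * psi / var_z


def alpha_from_model(p, psi, theta):
    """Coefficient alpha from model-implied item variances (psi + theta_k)."""
    var_z = p**2 * psi + sum(theta)
    item_vars = [psi + th for th in theta]
    return p / (p - 1) * (1 - sum(item_vars) / var_z)


# Hypothetical values, chosen only for illustration
p, psi, theta = 4, 1.0, [0.5, 0.6, 0.7, 0.8]
print(round(tau_equivalent_reliability(p, psi, theta), 4))
print(round(alpha_from_model(p, psi, theta), 4))
```

Both lines should print the same value, because `\(Var(Z) - \sum_k Var(Y_k) = p(p - 1)\psi\)` when all loadings equal 1.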
There are different ways to justify the derivation of `\(\alpha\)`

---
background-image: url(images/congeneric.png)
background-position: 90% 20%
background-size: 25%

# Latent Variable

### Congeneric

`$$Y_k = \nu_k + \lambda_k \eta + \epsilon_k$$`

--

- True Score Variance `\(V^\text{True} = (\sum_k \lambda_k)^2 \psi\)`
- Error Variance `\(V^\text{Error} = \sum_k \theta_k\)`

Composite reliability `\(\omega = \dfrac{V^\text{True}}{V^\text{True} + V^\text{Error}}\)`

--

More generally, with `\(Cov([\epsilon_1, \epsilon_2, \ldots]) = \bv \Theta\)`, `\(V^\text{Error} = \bv 1' \bv \Theta \bv 1\)`

<!-- `\(\omega = \dfrac{(\sum_k \lambda_k)^2 \psi}{(\sum_k \lambda_k)^2 \psi + \bv 1' \bv \Theta \bv 1}\)` -->

---
class: center, middle

# Reliability is a property of observed test scores `\((Z)\)`, not the latent scores `\((\eta)\)`

---
class: inverse, center, middle

# Multilevel Data

Lai, M. H. C. (2020). Composite reliability of multilevel data: It's about observed scores and construct meanings. *Psychological Methods.* Advance online publication. https://doi.org/10.1037/met0000287

---

# Example

2007 Trends in International Mathematics and Science Study (TIMSS; Williams et al., 2009)

* 7,896 students (4th grade) from 515 schools

Positive attitudes toward math (PATM)

| Item     | Wording                          |
| -------- | ---------------------------------|
| AS4MAMOR | Would like to do more math       |
| AS4MAENJ | I enjoy learning mathematics     |
| AS4MALIK | I like math                      |
| AS4MABOR | Math is boring (reverse-coded)   |

---

# Multilevel Reliability Not Consistently Reported

Kim et al. (2016): Only 54% reported reliability, among 39 articles using multilevel confirmatory factor analysis (MCFA)

- Usually only one reliability reported for one scale

???
However, discussion on multilevel reliability is not new --- # Multilevel Reliability - Raykov and du Toit (2005); Raykov and Marcoulides (2006) * Two-level composite reliability - Cranford, Shrout, Iida, Rafaeli, Yip, and Bolger (2006) * Generalizability Theory framework * Reliability of change - Geldhof, Preacher, and Zyphur (2014) * Level-specific reliability (within and between) * Most popular with cross-sectional data * Only approach discussed in Kim et al. (2016) --- # Geldhof et al. (2014) .pull-left[ <img src="images/mcfa11.png" width="60%" style="display: block; margin: auto;" /> ] -- .pull-right[ ### "Unconstrained" Multilevel Factor Model `\(j\)` indexes cluster `$$\bv Y_{ij} = \bv \nu + \bv \lambda^b \eta^b_j + \bv \lambda^w_j \eta^w_{ij} + \bv \epsilon_{ij}$$` `$$\bv \epsilon_{ij} = \bv \epsilon^{b}_j + \bv \epsilon^{w}_{ij}$$` `\(Var(\eta^b) = \psi^b\)`, `\(Var(\eta^w_j) = \psi^w\)` `\(Var(\epsilon^b) = \theta^b\)`, `\(Var(\epsilon^w_j) = \theta^w\)` Loading invariance across clusters: `\(\bv \lambda^{w}_j = \bv \lambda^{w}\)` for all `\(j\)` ] ??? No cross-level invariance `\(\epsilon\)` is the uniqueness, separated into the within and the between level --- # Geldhof et al. (2014) .pull-left[ <img src="images/mcfa11_fixed1.png" width="60%" style="display: block; margin: auto;" /> ] .pull-right[ Fixed `\(\psi^b = \psi^w = 1\)` for identification `\begin{align} \label{eq:tilomgb} \tilde \omega^{b} & = \frac{(\sum_{k = 1}^p \lambda^{b}_k)^2}{(\sum_{k = 1}^p \lambda^{b}_k)^2 + \sum_{k = 1}^p \theta^{b}_{kk}} \\ \label{eq:tilomgw} \tilde \omega^{w} & = \frac{(\sum_{k = 1}^p \lambda^{w}_k)^2}{(\sum_{k = 1}^p \lambda^{w}_k)^2 + \sum_{k = 1}^p \theta^{w}_{kk}}. \end{align}` For the TIMSS data - Est `\(\tilde \omega^w\)` = .857, 95% CI [.849, .863] - Est `\(\tilde \omega^b\)` = <span style="color:red">.977, 95% CI [.964, .987]</span> !! ] ??? 
Use the tilde to distinguish them from the indices I will discuss later

Why I got interested in this is that the reliability indices seem extremely large

---

# `\(\tilde \omega^b\)` is Usually High

Not uncommon in the literature . . .

--

Positive and negative affect: `\(\tilde \omega^b\)` = .94 to .97 (Rush and Hofer, 2014)

Instructional Skills Questionnaire: `\(\tilde \alpha^b\)` from .90 to .99 (Knol, Dolan, Mellenbergh, and van der Maas, 2016)

???

1. Repeated measures within persons
2. Multiple factors in ISQ, team from the Netherlands

---
class: inverse, middle, center

<!-- # What Does This Mean? -->

<!-- > If we use these four items to measure school-level PATM, we always get the same rank-ordering of schools -->

<!-- -- -->

<!-- ### Or should we? -->

# Are we that good at measuring between-level variables?

---

# Three Issues

1. Which "scores" are reliable?
    * Cluster means and centering
    * Latent vs. observed composites
2. Cross-level invariance
3. Construct meanings

???

Although it is a critique of level-specific reliability, to be fair

--

### To be fair, most of these issues have only started getting attention recently

---

# Issue 1: "Scores" in Multilevel Studies

First compute a composite of the 4 PATM items

If we use composite PATM to predict students' math achievement, we can compute

--

| IDSCHOOL| AS4MAMOR| AS4MAENJ| AS4MALIK| AS4MABORr|  Z|     Zb|      Zw|
|--------:|--------:|--------:|--------:|---------:|--:|------:|-------:|
|        1|        2|        2|        1|         2|  7| 6.5000|  0.5000|
|        1|        2|        1|        1|         1|  5| 6.5000| -1.5000|
|        1|        2|        1|        1|         1|  5| 6.5000| -1.5000|
|        1|        2|        1|        2|         1|  6| 6.5000| -0.5000|
|        1|        1|        1|        1|         1|  4| 6.5000| -2.5000|
|        2|        3|        2|        2|         2|  9| 6.5625|  2.4375|
|        2|        1|        2|        2|         1|  6| 6.5625| -0.5625|
|        2|        1|        1|        1|         1|  4| 6.5625| -2.5625|
|        2|        3|        2|        1|         1|  7| 6.5625|  0.4375|
|        2|        2|        2|        3|         1|  8| 6.5625|  1.4375|

---

# Three Sets of Scores

- Raw/Overall composite PATM `\((Z_{ij})\)`

- School means of composite PATM (cluster mean; `\(Z^b_j\)`)

- Student deviations from school means
(cluster-mean centered; `\(Z^w_{ij} = Z_{ij} - Z^b_j\)`)

--

### We should compute reliability for each of them

---

# Which Score Is `\(\tilde \omega^b\)` for?

Is `\(\tilde \omega^b\)` the reliability of the school means?

???

Not clear in the original paper

--

.pull-left[

`\(Var(Y^b_1) = (\lambda^b_1)^2 + \theta^b_{11}\)`

`\(Var(\sum_k Y^b_k) = {\color{red}{(\sum_k \lambda^{b}_k)^2}} + \sum_k \theta^{b}_{kk}\)`

`$$\tilde \omega^{b} = \frac{{\color{red}{(\sum_k \lambda^{b}_k)^2}}}{{\color{red}{(\sum_k \lambda^{b}_k)^2}} + \sum_k \theta^{b}_{kk}}$$`

]

.pull-right[

<img src="images/mcfa11_btw.png" width="80%" style="display: block; margin: auto;" />

]

---

# But What is `\(Y^b_k\)`?

`\(Y^b_{jk}\)` (in circle) is the **latent** school mean of item `\(k\)`

- True/Population mean of all students of school `\(j\)`

???

Let's say the school has 500 students. The one in the circle is the mean of everyone from that school. But the sample may only contain 50 students

May be easier to think in terms of a population mean vs a sample mean

--

Different from the **observed** school mean, `\(\bar Y_{.jk} = \sum_{i = 1}^{n_j} Y_{ijk} / n_j\)`

- Mean of students in the sample from school `\(j\)`

--

Raudenbush and Bryk (2002): Reliability of cluster means

`\(Var(\bar Y_{.jk} - Y^b_{jk}) = \sigma^w_{kk} / n_j\)`

???

1. Raudenbush & Bryk also talk about the reliability of the cluster mean, which is not perfect with a finite sample
2. The observed mean converges to the latent mean when `\(n_j \to \infty\)`

---
class: middle

Therefore, `\(\tilde \omega^b\)` is the internal consistency of a **latent** composite.

### Is that a problem?

--

Let's go back to `\(Y = T + E\)`, where `\(T\)` is a latent variable. What is the reliability of `\(T\)`?
--

It should be 1 as `\(T\)` is the true score

--

But if we know the true score, we don't need to worry about reliability

---

# Illustration Using Simulated Data

.pull-left[

`\(\psi^b = \psi^w = 1\)`, `\(n_j = 10\)` for all `\(j\)`

Five items

- `\(\lambda^b = 0.25\)`, `\(\theta^b = 0.1\)`
- `\(\lambda^w = 0.5\)`, `\(\theta^w = 1\)`

]

???

To make things clearer, I simulated a data set

Ten observations in each cluster

--

.pull-right[

<img src="asu_brownbag_2020_Lai_mcfa_reliability_files/figure-html/unnamed-chunk-8-1.png" width="100%" />

]

---

# Illustration Using Simulated Data

.pull-left[

`\(\psi^b = \psi^w = 1\)`, `\(n_j = 10\)` for all `\(j\)`

Five items

- `\(\lambda^b = 0.25\)`, `\(\theta^b = 0.1\)`
- `\(\lambda^w = 0.5\)`, `\(\theta^w = 1\)`

Sources of measurement error:

|         |         |
|---------|---------|
|Latent Mean | item uniqueness |
|Observed Mean | item uniqueness + <span style="color:red">sampling error</span> |

]

.pull-right[

<img src="asu_brownbag_2020_Lai_mcfa_reliability_files/figure-html/unnamed-chunk-9-1.png" width="100%" />

]

---

<img src="asu_brownbag_2020_Lai_mcfa_reliability_files/figure-html/unnamed-chunk-10-1.png" width="80%" />

???

`\(\eta^b\)` is the true score at the school level

Left: Correlation between latent score and latent composite

Right: Correlation between latent score and observed composite, which is smaller

--

`\(\tilde \omega^b = .76 = [Corr(\eta^b, \sum_k Y^b_k)]^2\)`

--

However, `\([Corr(\eta^b, Z^b)]^2 = .49\)`, as

`\(V^\text{Error} = \sum_{k = 1}^p \theta^{b}_{kk} + {\color{red}{[(\sum_{k = 1}^p \lambda^{w}_k)^2 + \sum_{k = 1}^p \theta^{w}_{kk}] / n}}\)`

<!-- `$$\omega^{b} = \frac{(\sum_{k = 1}^p \lambda^b_k)^2}{(\sum_{k = 1}^p \lambda^{b}_k)^2 + \sum_{k = 1}^p \theta^{b}_{kk} + {\color{red}{[(\sum_{k = 1}^p \lambda^{w}_k)^2 + \sum_{k = 1}^p \theta^{w}_{kk}] / n}}} = .49.$$` -->

`\(\omega^b = .49 \neq \tilde \omega^b\)`

???
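A short sketch, in case someone asks where .76 and .49 come from. It plugs the simulation settings from the previous slides (five items, `\(\lambda^b = 0.25\)`, `\(\theta^b = 0.1\)`, `\(\lambda^w = 0.5\)`, `\(\theta^w = 1\)`, `\(\psi^b = \psi^w = 1\)`, `\(n = 10\)`) into the two formulas. The function names are hypothetical and this is only an arithmetic check, not the code used to produce the figures.

```python
def omega_tilde_b(lam_b, theta_b, psi_b=1.0):
    """Reliability of the latent between-level composite (no sampling error)."""
    true_var = sum(lam_b) ** 2 * psi_b
    return true_var / (true_var + sum(theta_b))


def omega_b(lam_b, theta_b, lam_w, theta_w, n, psi_b=1.0, psi_w=1.0):
    """Reliability of observed cluster means: adds within-level variance / n."""
    true_var = sum(lam_b) ** 2 * psi_b
    sampling_error = (sum(lam_w) ** 2 * psi_w + sum(theta_w)) / n
    return true_var / (true_var + sum(theta_b) + sampling_error)


p = 5
lam_b, theta_b = [0.25] * p, [0.1] * p
lam_w, theta_w = [0.5] * p, [1.0] * p

print(round(omega_tilde_b(lam_b, theta_b), 2))                      # 0.76
print(round(omega_b(lam_b, theta_b, lam_w, theta_w, n=10), 2))      # 0.49
```

Increasing `n` in `omega_b` shrinks the sampling-error term, so the observed-mean reliability approaches the latent-composite value as clusters get large.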
Overly optimistic information. Imagine, in a single-level context, reporting that the reliability of the instrument was .76 when it was actually less than .5

--

For the TIMSS items, `\(\omega^b = .719\)`, 95% CI [.668, .771]

* as opposed to `\(\tilde \omega^b = .977\)`

---

# How About `\(\tilde \omega^w\)`?

--

.pull-left[

- `\(\tilde \omega^w\)` is composite reliability of latent-mean-centered scores
    * Also latent variables
    * But algebraically, `\(\tilde \omega^w = \omega^w\)`

]

.pull-right[

<img src="images/mcfa11.png" width="60%" style="display: block; margin: auto;" />

]

---
class: inverse, middle, center

# Issue 2: Cross-Level Loading Invariance

---

.pull-left[

## Without Constraints on Loadings

- `\(\eta^b\)`: school-level construct, no connection to `\(\eta^w\)`

- `\(\eta^w\)`: **purely** student-level construct (i.e., ICC = 0)
    * E.g., PATM **relative to the school mean**

(See e.g., Mehta & Neale, 2005)

]

???

Can only compare relative standing, not absolute value

--

.pull-right[

<img src="images/mcfa11.png" width="60%" style="display: block; margin: auto;" />

]

---

.pull-left[

## With Cross-Level Invariance

One construct `\(\eta\)`: `\(\eta_{ij} = \eta^b_j + \eta^w_{ij}\)`

ICC = `\(\frac{\psi^b}{\psi^b + \psi^w}\)`

<!-- Under cross-level invariance, `\(\omega^b\)` only measures the degree that scalar/strong invariance holds (Jak, Oort, and Dolan, 2014) -->

]

.pull-right[

<img src="images/mcfa11_inv.png" width="60%" style="display: block; margin: auto;" />

]

???
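A small sketch of the observed-score counterpart of this decomposition: splitting a raw composite `\(Z_{ij}\)` into cluster means `\(Z^b_j\)` and cluster-mean-centered deviations `\(Z^w_{ij}\)`. The data and the function name are hypothetical (they are not the TIMSS rows shown earlier), and plain Python is used only to keep the example self-contained.

```python
from collections import defaultdict


def decompose_composite(cluster_ids, z):
    """Return (cluster means, cluster-mean-centered deviations) for a composite."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for j, z_ij in zip(cluster_ids, z):
        totals[j] += z_ij
        counts[j] += 1
    means = {j: totals[j] / counts[j] for j in totals}
    z_b = [means[j] for j in cluster_ids]                       # Z^b_j, repeated per person
    z_w = [z_ij - means[j] for j, z_ij in zip(cluster_ids, z)]  # Z^w_ij = Z_ij - Z^b_j
    return z_b, z_w


# Hypothetical composite scores for two schools with four students each
school = [1, 1, 1, 1, 2, 2, 2, 2]
z = [7, 5, 5, 7, 9, 6, 4, 9]

z_b, z_w = decompose_composite(school, z)
print(z_b)  # [6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0]
print(z_w)  # [1.0, -1.0, -1.0, 1.0, 2.0, -1.0, -3.0, 2.0]
```

By construction the deviations sum to zero within each cluster, which is why the within-cluster scores carry no between-cluster information.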
This is more consistent with the way we use cluster means and do centering in MLM --- .pull-left[ ## Strong/Scalar Invariance Across Clusters Implies that `\(\theta^b_{kk} = 0\)` for all `\(k\)`s * `\(\Rightarrow \tilde \omega^b = 1.0\)` (Jak, Oort, and Dolan, 2014) For an individual construct, `\(\tilde \omega^b\)` is roughly a measure of strong invariance ] .pull-right[ <img src="images/mcfa11_strong.png" width="70%" style="display: block; margin: auto;" /> ] --- class: inverse, middle, center # Construct Meanings Based on Stapleton, Yang, and Hancock (2016); Stapleton and Johnson (2019) --- # What is the Target Construct? - What is your attitude toward math? - What is your attitude toward math, relative to the school norm? - What is your school's overall attitude toward math? --- # Individual/Configural Construct -- .pull-left[ ### What is your attitude toward math? **Individual** construct `\(\eta\)` Partitioning: `\(\eta = \eta^b + \eta^w\)` **Configural** construct `\(\eta^b\)` (i.e., true cluster mean) - `\(Var(\eta^b) / Var(\eta)\)` = ICC **Within-cluster** component `\(\eta^w\)` ] .pull-right[ <img src="images/mcfa11_inv_est.png" width="60%" style="display: block; margin: auto;" /> ] --- # Matching Composites and Constructs - Individual construct--Raw composite `\(Z_{ij} = \sum_k Y_{ijk}\)` * `\(V^\text{True} = (\sum_{k = 1}^p \lambda_k)^2 (\psi^{w} + \psi^{b})\)` * `\(V^\text{Error} = \bv 1'\bv \Theta^b \bv 1 + \bv 1'\bv \Theta^w \bv 1\)` * Discussed in Raykov and du Toit (2005) <!-- `$$\omega^{2l} = \frac{(\sum_{k = 1}^p \lambda_k)^2 (\psi^{w} + \psi^{b})}{(\sum_{k = 1}^p \lambda_k)^2 (\psi^{w} + \psi^{b}) + \bv 1'\bv \Theta^b \bv 1 + \bv 1'\bv \Theta^w \bv 1}$$` --> -- - Configural construct--Composite cluster mean `\(Z^b_{j} = \sum_k \bar Y_{jk}\)` <!-- `$$\omega^{b} = \frac{(\sum_{k = 1}^p \lambda_k)^2 \psi^{b}}{(\sum_{k = 1}^p \lambda_k)^2 (\psi^{b} + \psi^{w} / {\color{red}{n}}) + \bv 1' \bv \Theta^b \bv 1 + \bv 1' \bv \Theta^w \bv 1 / 
{\color{red}{n}}}$$` --> * `\(V^\text{True} = (\sum_{k = 1}^p \lambda_k)^2 \psi^{b}\)` * `\(V^\text{Error} = \bv 1' \bv \Theta^b \bv 1 + {\color{red}{[(\sum_{k = 1}^p \lambda_k)^2 \psi^{w} + \bv 1' \bv \Theta^w \bv 1] / n}}\)` * For unbalanced cluster sizes, use the harmonic mean `\(\tilde n\)` -- - Within-cluster construct--Composite of deviation scores `\(Z^w_{ij} = \sum_k (Y_{ijk} - \bar Y_{jk})\)` <!-- `$$\omega^{w} = \frac{(\sum_{k = 1}^p \lambda_k)^2 \psi^{w}}{(\sum_{k = 1}^p \lambda_k)^2 \psi^{w} + \bv 1'\bv \Theta^w \bv 1}$$` --> * `\(V^\text{True} = (\sum_{k = 1}^p \lambda_k)^2 \psi^{w}\)` * `\(V^\text{Error} = \bv 1'\bv \Theta^w \bv 1\)` ??? Replace `\(n\)` with the harmonic mean for unequal cluster sizes --- # Within-Cluster Construct -- .pull-left[ ### What is your attitude toward math, relative to the school norm? **Within-cluster** construct `\(\eta^w\)` Expected ICC = 0 `\(\omega^{w}\)` reliability of `\(Z^w_{ij}\)` `$$\frac{(\sum_{k = 1}^p \lambda_k)^2 \psi^{w}}{(\sum_{k = 1}^p \lambda_k)^2 \psi^{w} + \bv 1'\bv \Theta^w \bv 1}$$` ] .pull-right[ <img src="images/mcfas1.png" width="70%" style="display: block; margin: auto;" /> ] --- # Shared Construct -- .pull-left[ ### What is your school's attitude toward math? **Shared** construct `\(\eta^b\)`: Cluster-level attribute (aka climate) `\(\omega^{b}\)` reliability of `\(Z^b_j\)` * `\(V^\text{True} = (\sum_{k = 1}^p \lambda_k)^2 \psi^{b}\)` * `\(V^\text{Error} = \bv 1' \bv \Theta^b \bv 1 + \bv 1'\bv \Sigma^w \bv 1 / \tilde n\)` <!-- `$$\omega^{b} = \frac{(\sum_{k = 1}^p \lambda_k^b)^2 \psi^{b}}{(\sum_{k = 1}^p \lambda^b_k)^2 \psi^{b} + \bv 1' \bv \Theta^b \bv 1 + \bv 1'\bv \Sigma^w \bv 1 / \tilde n}$$` --> ] .pull-right[ <img src="images/mcfa1s.png" width="70%" style="display: block; margin: auto;" /> ] --- # Shared + Configural/Individual Constructs -- .pull-left[ ### What is your school's attitude toward math? 
There may be rater acquiescence **Shared** construct `\(\eta^s\)`: School climate **Individual** construct `\(\eta^w\)`: Acquiescence **Configural** construct `\(\eta^b\)`: School means of Acquiescence ] .pull-right[ <img src="images/mcfashared1.png" width="80%" style="display: block; margin: auto 0 auto auto;" /> ] --- # Shared + Configural/Individual Constructs .pull-left[ The school-level composite, `\(Z^b_j\)`, measures both `\(\eta^s\)` and `\(\eta^b\)` <!-- `\(\omega^{b}\)` is the consistency of `\(Z^b_j\)` (due to both `\(\eta^s\)` and `\(\eta^b\)`) --> `\(\omega^{b(s)}\)`: construct reliability of `\(Z^b_j\)` measuring `\(\eta^s\)` * `\(V^\text{True} = (\sum_{k = 1}^p \lambda^s_k)^2 \psi^s\)` * `\(V^\text{Error} = (\sum_{k = 1}^p \lambda_k)^2 (\psi^{b} + \psi^{w} / \tilde n)\)` `\(\quad + \bv 1' \bv \Theta^b \bv 1 + \bv 1' \bv \Theta^w \bv 1 / \tilde n\)` <!-- `$$\omega^{b(s)} = \frac{(\sum_{k = 1}^p \lambda^{s}_k)^2 \psi^{s}}{Var(Z^b_j)}$$` --> <!-- \begin{align*} --> <!-- Var(Z^b_j) & = (\sum_{k = 1}^p \lambda^{s}_k)^2 \psi^{s} + (\sum_{k = 1}^p \lambda_k)^2 (\psi^{b} + \psi^{w} / \tilde n) \\ --> <!-- & \quad + \bv 1' \bv \Theta^b \bv 1 + \bv 1' \bv \Theta^w \bv 1 / \tilde n --> <!-- \end{align*} --> ] .pull-right[ <img src="images/mcfashared1.png" width="80%" style="display: block; margin: auto 0 auto auto;" /> ] --- # Extensions of `\(\alpha\)` `\begin{align*} \alpha^{2l} & = \frac{p}{p - 1}\left(\frac{\sum_{k \neq k'} (\sigma^{b}_{k k'} + \sigma^{w}_{k k'})}{\bv 1'\bv \Sigma^b \bv 1 + \bv 1' \bv \Sigma^w \bv 1}\right) \\ \alpha^{b} & = \frac{p}{p - 1}\left(\frac{\sum_{k \neq k'} \sigma^{b}_{k k'}}{\bv 1'\bv \Sigma^b \bv 1 + \bv 1' \bv \Sigma^w \bv 1 / \tilde n}\right) \\ \alpha^{w} & = \frac{p}{p - 1}\left(\frac{\sum_{k \neq k'} \sigma^{w}_{k k'}}{\bv 1' \bv \Sigma^w \bv 1}\right) \end{align*}` --- class: middle, center ![](images/table3.png) --- # Which One to Report? 
- If a variable is partitioned in a multilevel model (most likely an individual construct), all three `\((\omega^{2l}, \omega^{b}, \omega^{w})\)` should be reported * Cluster means and cluster-mean centered predictors * Outcome variable -- - Otherwise, reliability at the corresponding level `\((\omega^b\)` or `\(\omega^w)\)` --- # Summary (Lai, 2020) ### Computing and reporting reliability information is important for multilevel data -- ### Reliability information is needed for raw, cluster means, and cluster-mean centered scores -- ### Previous approach to between-level reliability is an overestimate when cluster size is small -- ### Nature of target construct should be considered, and it has implications on reliability computation --- class: middle, center | Construct | `\(\omega^{2l}\)`, `\(\alpha^{2l}\)` | `\(\omega^b\)`, `\(\alpha^b\)` | `\(\omega^w\)`, `\(\alpha^w\)` | `\(\omega^{b(s)}\)` | | --------- | ------------- | ---------- | ---------- | --------------- | | Individual | X | X | X | | | Configural | | X | | | | Within-Cluster | | | X | | | Shared | | X | | X | --- class: inverse, middle, center # Longitudinal Data Preliminary ideas. Suggestions are greatly appreciated. --- # Midlife in the United States Data from MIDUS 2: Daily Stress Project, 2004-2009 (Ryff and Almeida, 2009) - 2,022 participants, 8 days each - Target construct: Positive affect | Item | Wording | | -------- | ---------------------------------| | b2dc24 | Did you feel attentive? | | b2dc25 | Did you feel proud? | | b2dc26 | Did you feel active? | | b2dc27 | Did you feel confident? 
| - Type of scores: raw composite, person means, person-mean centered --- .pull-left[ ### From MCFA Est `\(\text{ICC}(\eta) = .778\)` | Composite |Est `\(\omega\)` | 95% CI | | --------- | ------- | -----------------------------| | Raw | .812 | [.801, .822] | | Within | .609 | [.595, .623] | | Between | .852 | [.839, .864] | ] .pull-right[ <img src="images/mcfa11_pa.png" width="60%" style="display: block; margin: auto;" /> ] --- # Can We Incorporate Time? .pull-left[ Cross-Classified CFA (Jeon and Rabe-Hesketh, 2012; Asparouhov and Muthén, 2012) Assuming cross-level invariance for an individual construct, with decomposition `$$\eta_{ti} = \eta^P_i + \eta^T_t + \eta^W_{ti}$$` ] .pull-right[ <img src="images/mcfa111_crossed_est.png" width="100%" style="display: block; margin: auto;" /> ] --- # Relation to the Generalizability Theory Most meaningful when participants are measured on the same days/times Cranford, Shrout, Iida, et al. (2006): generalizability coefficients for diary studies ??? Not the case for the MIDUS data, as everyone starts on a different day -- Differences: - Fixed vs. 
Random item facet (in estimation) - Relax the essential parallel test assumption * Item-specific loadings and uniqueness - Flexible SEM modeling --- # Some Possible Reliability Coefficients ### Reliability of raw scores * `\(V^\text{True} = (\sum_k \lambda_k)^2 (\psi^P + \psi^T + \psi^W)\)` * `\(V^\text{Error} = \bv 1' (\bv \Theta^P + \bv \Theta^T + \bv \Theta^W) \bv 1\)` <!-- `$$\omega^\text{raw} = \frac{(\sum_k \lambda_k)^2 (\psi^P + \psi^T + \psi^W)}{(\sum_k \lambda_k)^2 (\psi^P + \psi^T + \psi^W) + \bv 1' \bv \Theta^P \bv 1 + \bv 1' \bv \Theta^T \bv 1 + \bv 1' \bv \Theta^W \bv 1}$$` --> -- ### Reliability of person means (across T time points) <!-- `$$\omega^b = \frac{(\sum_k \lambda_k)^2 \psi^P}{(\sum_k \lambda_k)^2 [\psi^P + (\psi^T + \psi^W) / T] + \bv 1' \bv \Theta^P \bv 1 + (\bv 1' \bv \Theta^T \bv 1 + \bv 1' \bv \Theta^W \bv 1) / T]}$$` --> * `\(V^\text{True} = (\sum_k \lambda_k)^2 \psi^P\)` * `\(V^\text{Error} = \bv 1' \bv \Theta^P \bv 1 + {\color{red}{[(\sum_k \lambda_k)^2 \psi^W + \bv 1' \bv \Theta^W \bv 1] / T}}\)` -- ### Reliability of deviation from person mean * `\(V^\text{True} = (\sum_k \lambda_k)^2 (\psi^T + \psi^W)\)` * `\(V^\text{Error} = \bv 1' (\bv \Theta^T + \bv \Theta^W) \bv 1\)` <!-- Reliability at a given time `\((T = t_0)\)` --> <!-- Expected `\(\omega\)` for raw score on any day (i.e., cross-sectional `\(\omega\)`) --> <!-- `$$\omega^{t_0} = \frac{(\sum_k \lambda_k)^2 (\psi^P + \psi^W)}{(\sum_k \lambda_k)^2 (\psi^P + \psi^W) + \bv 1' \bv \Theta^P \bv 1 + \bv 1' \bv \Theta^W \bv 1}$$` --> ??? Person-level (trait-level) variance is not part of true score for the deviation score --- class: middle, center In this example, there is essentially no time-level variance * E.g., no day of participation effect | Composite | Est `\(\omega\)` | 95% CI | | --------- | ------- | -----------------------------| | Raw | .829 | [.820, .837] | | Within | .646 | [.635, .660] | | Between | .859 | [.849, .868] | --- # Many Questions Remain 1. 
Linkage to generalizability coefficients by Cranford et al. (2006)
2. Discrete indicators?
3. Should constructs at the within-person level and the between-person level be on the same metric?
4. Are there "shared" constructs at the person level?
    * E.g., Intensively measuring a stable trait?
5. Reliability of change? (Rogosa, Brandt, and Zimowski, 1982)
    * Related to reliability of within-person deviation?

---
class: center, middle

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

---

# References

Appelbaum, M., H. Cooper, R. B. Kline, et al. (2018). "Journal article reporting standards for quantitative research in psychology". In: _American Psychologist_ 73.1, pp. 3-25. ISSN: 0003066X. DOI: [10.1037/amp0000191](https://doi.org/10.1037%2Famp0000191).

Asparouhov, T. and B. Muthén (2012). "General random effect latent variable modeling: Random subjects, items, contexts, and parameter". In: _Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications_. Charlotte, NC: Information Age, pp. 163-192.

Cranford, J. A., P. E. Shrout, M. Iida, et al. (2006). "A procedure for evaluating sensitivity to within-person change: Can mood measures in diary studies detect change reliably?" En. In: _Personality and Social Psychology Bulletin_ 32.7, pp. 917-929. ISSN: 0146-1672, 1552-7433. DOI: [10.1177/0146167206287721](https://doi.org/10.1177%2F0146167206287721). URL: [http://journals.sagepub.com/doi/10.1177/0146167206287721](http://journals.sagepub.com/doi/10.1177/0146167206287721) (visited on Nov. 08, 2020).

Geldhof, G. J., K. J. Preacher, and M. J. Zyphur (2014). "Reliability estimation in a multilevel confirmatory factor analysis framework". In: _Psychological Methods_ 19.1, pp. 72-91. ISSN: 1082989X. DOI: [10.1037/a0032138](https://doi.org/10.1037%2Fa0032138).

---

# References (cont'd)

Jak, S., F. J. Oort, and C. V. Dolan (2014). "Measurement bias in multilevel data".
In: _Structural Equation Modeling: A Multidisciplinary Journal_ 21.1, pp. 31-39. ISSN: 1070-5511. DOI: [10.1080/10705511.2014.856694](https://doi.org/10.1080%2F10705511.2014.856694). Jeon, M. and S. Rabe-Hesketh (2012). "Profile-likelihood approach for estimating generalized linear mixed models with factor structures". En. In: _Journal of Educational and Behavioral Statistics_ 37.4, pp. 518-542. ISSN: 1076-9986, 1935-1054. DOI: [10.3102/1076998611417628](https://doi.org/10.3102%2F1076998611417628). URL: [http://journals.sagepub.com/doi/10.3102/1076998611417628](http://journals.sagepub.com/doi/10.3102/1076998611417628) (visited on Nov. 08, 2020). Knol, M. H., C. V. Dolan, G. J. Mellenbergh, et al. (2016). "Measuring the quality of university lectures: Development and validation of the Instructional Skills Questionnaire (ISQ)". In: _PLOS ONE_ 11.2. Ed. by D. S. Courvoisier, p. e0149163. ISSN: 1932-6203. DOI: [10.1371/journal.pone.0149163](https://doi.org/10.1371%2Fjournal.pone.0149163). Lai, M. H. C. (2020). "Composite reliability of multilevel data: It’s about observed scores and construct meanings." En. In: _Psychological Methods_. ISSN: 1939-1463, 1082-989X. DOI: [10.1037/met0000287](https://doi.org/10.1037%2Fmet0000287). URL: [http://doi.apa.org/getdoi.cfm?doi=10.1037/met0000287](http://doi.apa.org/getdoi.cfm?doi=10.1037/met0000287) (visited on Nov. 08, 2020). --- # References (cont'd) Raudenbush, S. W. and A. S. Bryk (2002). _Hierarchical linear models: Applications and data analysis methods_. 2nd ed. Thousand Oaks, CA: Sage. ISBN: 076191904X. Raykov, T. and G. A. Marcoulides (2006). "On multilevel model reliability estimation from the perspective of structural equation modeling". In: _Structural Equation Modeling: A Multidisciplinary Journal_ 13.1, pp. 130-141. ISSN: 1070-5511. DOI: [10.1207/s15328007sem1301_7](https://doi.org/10.1207%2Fs15328007sem1301_7). Raykov, T. and S. H. C. du Toit (2005). 
"Estimation of reliability for multiple-component measuring instruments in hierarchical designs". In: _Structural Equation Modeling: A Multidisciplinary Journal_ 12.4, pp. 536-550. ISSN: 1070-5511. DOI: [10.1207/s15328007sem1204_2](https://doi.org/10.1207%2Fs15328007sem1204_2). Rogosa, D., D. Brandt, and M. Zimowski (1982). "A growth curve approach to the measurement of change." En. In: _Psychological Bulletin_ 92.3, pp. 726-748. ISSN: 0033-2909. DOI: [10.1037/0033-2909.92.3.726](https://doi.org/10.1037%2F0033-2909.92.3.726). URL: [http://content.apa.org/journals/bul/92/3/726](http://content.apa.org/journals/bul/92/3/726) (visited on Nov. 08, 2020). --- # References (cont'd) Rush, J. and S. M. Hofer (2014). "Differences in within- and between-person factor structure of positive and negative affect: Analysis of two intensive measurement studies using multilevel structural equation modeling.". In: _Psychological Assessment_ 26.2, pp. 462-473. ISSN: 1939-134X. DOI: [10.1037/a0035666](https://doi.org/10.1037%2Fa0035666). Ryff, C. D. and D. M. Almeida (2009). _Midlife in the United States (MIDUS 2): Daily Stress Project, 2004-2009: Version 2_. En. type: dataset. DOI: [10.3886/ICPSR26841.V2](https://doi.org/10.3886%2FICPSR26841.V2). URL: [http://www.icpsr.umich.edu/icpsrweb/NACDA/studies/26841/version/2](http://www.icpsr.umich.edu/icpsrweb/NACDA/studies/26841/version/2) (visited on Nov. 08, 2020). Stapleton, L. M. and T. L. Johnson (2019). "Models to examine the validity of cluster-level factor structure using individual-level data". In: _Advances in Methods and Practices in Psychological Science_, p. 251524591985503. ISSN: 2515-2459. DOI: [10.1177/2515245919855039](https://doi.org/10.1177%2F2515245919855039). Stapleton, L. M., J. S. Yang, and G. R. Hancock (2016). "Construct meaning in multilevel settings". In: _Journal of Educational and Behavioral Statistics_ 41.5, pp. 481-520. ISSN: 1076-9986. 
DOI: [10.3102/1076998616646200](https://doi.org/10.3102%2F1076998616646200).