Internal Consistency of Multilevel Data

Cluster Means, Centering, and Construct Meanings

Mark Lai

University of Southern California

2020/11/09

1 / 57

Outline

Reliability in factor analysis

Multilevel reliability

Issues of level-specific reliability coefficients

Reliability of latent scores
Cross-level invariance
Construct meanings

Reliability indices for observed composite scores

$ω^{2l}$, $ω^{b}$, $ω^{w}$

Longitudinal Data?

2 / 57

Such an honor to be here. Co-chairs of my dissertation are both graduates of ASU, and they would share with me how good the program at ASU is

Lineage of ASU Quant . . .

Some alternative indices I proposed to solve these limitations

Looking forward to comments and suggestions; whether I'm doing something wrong or right

Importance of Reliability

Psychological scales are not perfect
Certain level of reliability needed
- Statistical analyses are not trustworthy when the numbers are not consistent

Image credit: Reliability by Nick Youngson CC BY-SA 3.0 Alpha Stock Images

3 / 57

APA Journal Article Reporting Standards (JARS)

In the Psychometrics section (Appelbaum, Cooper, Kline, Mayo-Wilson, Nezu, and Rao, 2018), researchers were asked to

Estimate and report values of reliability coefficients for the scores analyzed (i.e., the research's sample) (p. 7)

4 / 57

Similar recommendations can be found in numerous journal and methodological guidelines

Reliability

5 / 57

Just a quick introduction to the foundational work on reliability that this research relies on.

Classical Test Theory

Lord & Novick (1968)

Observed score = True score + Error

$Y = T + E$

$T$ and $E$ independent, so

$σ_{Y}^{2} = σ_{T}^{2} + σ_{E}^{2}$

Reliability $ρ = \frac{σ_{T}^{2}}{σ_{Y}^{2}} = \frac{σ_{T}^{2}}{σ_{T}^{2} + σ_{E}^{2}} = [\mathrm{Corr}(Y, T)]^{2}$

6 / 57

For example, we ask students to report their attitudes toward math
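The identity above can be checked numerically. The following is a minimal sketch, assuming normally distributed true scores and errors with standard deviations 1.0 and 0.5 (illustrative values, not from the talk), so the population reliability is 1 / (1 + 0.25) = .80. The function names are hypothetical.

```python
import random


def pvar(x):
    """Population variance of a list of numbers."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def pcov(x, y):
    """Population covariance of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def simulate_ctt(n=20000, sd_true=1.0, sd_error=0.5, seed=2020):
    """Simulate Y = T + E with independent T and E.

    Returns (Var(T) / Var(Y), Corr(Y, T)^2); both estimate reliability.
    """
    rng = random.Random(seed)
    t = [rng.gauss(0.0, sd_true) for _ in range(n)]
    y = [ti + rng.gauss(0.0, sd_error) for ti in t]
    var_ratio = pvar(t) / pvar(y)
    corr_sq = pcov(y, t) ** 2 / (pvar(y) * pvar(t))
    return var_ratio, corr_sq


var_ratio, corr_sq = simulate_ctt()
print(f"Var(T)/Var(Y) = {var_ratio:.3f}, Corr(Y,T)^2 = {corr_sq:.3f}")
```

Both quantities should land close to .80, differing only by sampling noise.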

Latent Variable/Factor Analysis

(Essential) Tau-equivalence

$p$ items: $k = 1, \dots, p$

$Y_{k} = ν_{k} + η + ϵ_{k}$

$\mathrm{Var}(η) = ψ$, $\mathrm{Var}(ϵ_{k}) = θ_{k}$, $ϵ_{k}$ and $ϵ_{k'}$ independent

$\mathrm{Cov}(Y_{k}, Y_{k'}) = ψ$ for $k \neq k'$

Unweighted (unit-weight) composite: $Z = \sum_{k} Y_{k}$

Variance of unweighted composite: $\mathrm{Var}(Z) = p^{2} ψ + \sum_{k} θ_{k}$

Reliability = $\frac{p^{2} ψ}{\mathrm{Var}(Z)}$, or Cronbach's $α$

7 / 57

When we have multiple items, we can estimate the error variance

For the true score proportion of $Y$ , it's on the same metric/unit as the latent variable

There were different ways to justify the derivation of $α$
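As a quick check that the tau-equivalent reliability coincides with Cronbach's $α$, here is a small sketch. It builds the model-implied covariance matrix ($ψ$ off the diagonal, $ψ + θ_k$ on it) and compares the two formulas. The parameter values and function names are hypothetical, chosen only for illustration.

```python
def tau_equivalent_reliability(psi, thetas):
    """p^2 * psi / Var(Z) for the unit-weight composite Z."""
    p = len(thetas)
    true_var = p ** 2 * psi
    return true_var / (true_var + sum(thetas))


def implied_cov(psi, thetas):
    """Model-implied item covariance matrix under tau-equivalence."""
    p = len(thetas)
    return [[psi + (thetas[k] if k == m else 0.0) for m in range(p)]
            for k in range(p)]


def cronbach_alpha(cov):
    """Cronbach's alpha computed from an item covariance matrix."""
    p = len(cov)
    total = sum(sum(row) for row in cov)                 # Var(Z)
    off_diag = total - sum(cov[k][k] for k in range(p))  # sum of covariances
    return p / (p - 1) * off_diag / total


psi, thetas = 0.5, [0.4, 0.6, 0.5, 0.7]  # illustrative values
rel = tau_equivalent_reliability(psi, thetas)
alpha = cronbach_alpha(implied_cov(psi, thetas))
print(rel, alpha)
```

With population values the two agree exactly; with sample covariances, $α$ would only estimate the same quantity.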

Latent Variable

Congeneric

$Y_{k} = ν_{k} + λ_{k} η + ϵ_{k}$

True Score Variance $V^{True} = (\sum_{k} λ_{k})^{2} ψ$
Error Variance $V^{Error} = \sum_{k} θ_{k}$

Composite reliability $ω = \frac{V^{True}}{V^{True} + V^{Error}}$

More generally, with $\mathrm{Cov}([ϵ_{1}, ϵ_{2}, \dots]) = Θ$, $V^{Error} = 1' Θ 1$

8 / 57
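A minimal sketch of the composite reliability $ω$ for a congeneric model, using the general form $V^{Error} = 1' Θ 1$ so that correlated uniquenesses are allowed. The loadings and uniqueness values are made up for illustration, and `omega` is a hypothetical helper name.

```python
def omega(loadings, psi, theta):
    """Composite reliability of the unit-weight sum score.

    loadings: list of factor loadings (lambda_k)
    psi:      factor variance
    theta:    full p x p covariance matrix of the uniquenesses
    """
    true_var = sum(loadings) ** 2 * psi              # (sum lambda_k)^2 * psi
    error_var = sum(sum(row) for row in theta)       # 1' Theta 1
    return true_var / (true_var + error_var)


loadings = [0.7, 0.8, 0.6]                           # illustrative values
theta_diag = [[0.51, 0.0, 0.0],
              [0.0, 0.36, 0.0],
              [0.0, 0.0, 0.64]]
theta_corr = [[0.51, 0.10, 0.0],                     # items 1 and 2 share
              [0.10, 0.36, 0.0],                     # a unique covariance
              [0.0, 0.0, 0.64]]

omega_diag = omega(loadings, 1.0, theta_diag)
omega_corr = omega(loadings, 1.0, theta_corr)
print(omega_diag, omega_corr)
```

A positive unique covariance adds to the error variance of the composite, so the second value is lower than the first.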

Reliability is a property of observed test scores $(Z)$, not the latent scores $(η)$

9 / 57

Multilevel Data

Lai, M. H. C. (2020). Composite reliability of multilevel data: It's about observed scores and construct meanings. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000287

10 / 57

Example

2007 Trends in International Mathematics and Science Study (TIMSS; Williams et al., 2009)

7,896 students (4th grade) from 515 schools

Positive attitudes toward math (PATM)

Item	Wording
AS4MAMOR	Would like to do more math
AS4MAENJ	I enjoy learning mathematics
AS4MALIK	I like math
AS4MABOR	Math is boring (reverse-coded)

11 / 57

Multilevel Reliability Not Consistently Reported

Kim et al. (2016): Only 54% reported reliability, among 39 articles using multilevel confirmatory factor analysis (MCFA)

Usually only one reliability reported for one scale

12 / 57

However, discussion on multilevel reliability is not new

Multilevel Reliability

Raykov and du Toit (2005); Raykov and Marcoulides (2006)
- Two-level composite reliability
Cranford, Shrout, Iida, Rafaeli, Yip, and Bolger (2006)
- Generalizability Theory framework
- Reliability of change
Geldhof, Preacher, and Zyphur (2014)
- Level-specific reliability (within and between)
- Most popular with cross-sectional data
- Only approach discussed in Kim et al. (2016)

13 / 57

Geldhof et al. (2014)

"Unconstrained" Multilevel Factor Model

$j$ indexes cluster

$Y_{ij} = ν + λ^{b} η_{j}^{b} + λ_{j}^{w} η_{ij}^{w} + ϵ_{ij}$

$ϵ_{ij} = ϵ_{j}^{b} + ϵ_{ij}^{w}$

$\mathrm{Var}(η^{b}) = ψ^{b}$, $\mathrm{Var}(η_{j}^{w}) = ψ^{w}$

$\mathrm{Var}(ϵ^{b}) = θ^{b}$, $\mathrm{Var}(ϵ_{j}^{w}) = θ^{w}$

Loading invariance across clusters: $λ_{j}^{w} = λ^{w}$ for all $j$

14 / 57

No cross-level invariance

$ϵ$ is the uniqueness, separated into the within and the between level

Geldhof et al. (2014)

Fixed $ψ^{b} = ψ^{w} = 1$ for identification

$\begin{aligned} \tilde{ω}^{b} &= \frac{(\sum_{k = 1}^{p} λ_{k}^{b})^{2}}{(\sum_{k = 1}^{p} λ_{k}^{b})^{2} + \sum_{k = 1}^{p} θ_{kk}^{b}} \\ \tilde{ω}^{w} &= \frac{(\sum_{k = 1}^{p} λ_{k}^{w})^{2}}{(\sum_{k = 1}^{p} λ_{k}^{w})^{2} + \sum_{k = 1}^{p} θ_{kk}^{w}} \end{aligned}$

For the TIMSS data

Est $\tilde{ω}^{w}$ = .857, 95% CI [.849, .863]
Est $\tilde{ω}^{b}$ = .977, 95% CI [.964, .987] !!

15 / 57

Use tilde to distinguish them from the indices I will discuss later

Why I got interested in this is that the reliability indices seem extremely large

$\tilde{ω}^{b}$ is Usually High

Not uncommon in the literature . . .

Positive and negative affects: $\tilde{ω}^{b}$ = .94 to .97 (Rush and Hofer, 2014)

Instructional Skills Questionnaire: $\tilde{α}^{b}$ between .90 and .99 (Knol, Dolan, Mellenbergh, and van der Maas, 2016)

16 / 57

Repeated measures within persons
Multiple factors in the ISQ; team from the Netherlands

Are we that good at measuring between-level variables?

17 / 57

Three Issues

Which "scores" are reliable?
- Cluster means and centering
- Latent vs. observed composites
Cross-level invariance
Construct meanings

To be fair, most of these issues have only started getting attention recently

18 / 57

Although this is a critique of level-specific reliability, to be fair, most of these issues have only recently started getting attention

Issue 1: "Scores" in Multilevel Studies

First compute a composite of the 4 PATM items

If we use composite PATM to predict student's math achievement, we can compute

IDSCHOOL	AS4MAMOR	AS4MAENJ	AS4MALIK	AS4MABORr	Z	Zb	Zw
1	2	2	1	2	7	6.5000	0.5000
1	2	1	1	1	5	6.5000	-1.5000
1	2	1	1	1	5	6.5000	-1.5000
1	2	1	2	1	6	6.5000	-0.5000
1	1	1	1	1	4	6.5000	-2.5000
2	3	2	2	2	9	6.5625	2.4375
2	1	2	2	1	6	6.5625	-0.5625
2	1	1	1	1	4	6.5625	-2.5625
2	3	2	1	1	7	6.5625	0.4375
2	2	2	3	1	8	6.5625	1.4375

19 / 57

Three Sets of Scores

Raw/Overall composite PATM $(Z_{i j})$
School means of composite PATM (cluster mean; $Z_{j}^{b}$ )
Student deviations from school means (cluster-mean centered; $Z_{i j}^{w} = Z_{i j} - Z_{j}^{b}$ )

We should compute reliability for each of them

20 / 57
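The three sets of scores can be sketched in a few lines. This assumes item responses stored as one row per student plus a parallel list of school IDs; the toy data and the function name `decompose_scores` are hypothetical. (The TIMSS table shows only the first five students per school, so its school means are based on more rows than are displayed.)

```python
from collections import defaultdict


def decompose_scores(cluster_ids, item_rows):
    """Return raw composites Z, cluster means Z^b, and deviations Z^w."""
    z = [sum(row) for row in item_rows]            # raw composite per student
    by_cluster = defaultdict(list)
    for cid, zi in zip(cluster_ids, z):
        by_cluster[cid].append(zi)
    means = {cid: sum(v) / len(v) for cid, v in by_cluster.items()}
    z_b = [means[cid] for cid in cluster_ids]      # observed cluster mean
    z_w = [zi - zb for zi, zb in zip(z, z_b)]      # cluster-mean centered
    return z, z_b, z_w


# Toy data: two schools, three students each, four items per student
schools = [1, 1, 1, 2, 2, 2]
items = [(2, 2, 1, 2), (2, 1, 1, 1), (2, 1, 2, 1),
         (3, 2, 2, 2), (1, 2, 2, 1), (2, 2, 1, 1)]
z, z_b, z_w = decompose_scores(schools, items)
print(z, z_b, z_w)
```

Note that `z_b` here is the observed (sample) cluster mean, which is the quantity actually used as a predictor in practice.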

Which Score Is $\tilde{ω}^{b}$ for?

Is $\tilde{ω}^{b}$ the reliability of the school means?

$\mathrm{Var}(Y_{1}^{b}) = (λ_{1}^{b})^{2} + θ_{11}^{b}$

$\mathrm{Var}(\sum_{k} Y_{k}^{b}) = (\sum_{k} λ_{k}^{b})^{2} + \sum_{k} θ_{kk}^{b}$

$\tilde{ω}^{b} = \frac{(\sum_{k} λ_{k}^{b})^{2}}{(\sum_{k} λ_{k}^{b})^{2} + \sum_{k} θ_{kk}^{b}}$

21 / 57

Not clear in the original paper

But What is $Y_{k}^{b}$ ?

$Y_{j k}^{b}$ (in circle) is the latent school mean of item $k$

True/Population mean of all students of school $j$

Different from the observed school mean, $\bar{Y}_{.jk} = \sum_{i = 1}^{n_{j}} Y_{ijk} / n_{j}$

Mean of students in the sample from school $j$

Raudenbush and Bryk (2002): Reliability of cluster means

$\mathrm{Var}(\bar{Y}_{.jk} - Y_{jk}^{b}) = σ_{kk}^{w} / n_{j}$

22 / 57

Let's say the school has 500 students. The one in circle is the mean of everyone from that school. But the sample may only contain 50 students

May be easier to think in terms of a population mean vs a sample mean

Raudenbush & Bryk also talk about the reliability of the cluster mean, which is not perfect with a finite sample
The observed mean converges to the latent mean when $n_{j} \to \infty$

Therefore, $\tilde{ω}^{b}$ is the internal consistency of a latent composite.

Is that a problem?

Let's go back to $Y = T + E$ , where $T$ is a latent variable. What is the reliability of $T$ ?

It should be 1 as $T$ is the true score

But if we know the true score, we don't need to worry about reliability

23 / 57

Illustration Using Simulated Data

$ψ^{b} = ψ^{w} = 1$ , $n_{j} = 10$ for all $j$

Five items

$λ^{b} = 0.25$ , $θ^{b} = 0.1$
$λ^{w} = 0.5$ , $θ^{w} = 1$

24 / 57

Just to make things more clear, I simulated a data set

Ten observations in each cluster

Illustration Using Simulated Data

$ψ^{b} = ψ^{w} = 1$ , $n_{j} = 10$ for all $j$

Five items

$λ^{b} = 0.25$ , $θ^{b} = 0.1$
$λ^{w} = 0.5$ , $θ^{w} = 1$

Sources of measurement error:


Latent Mean	item uniqueness
Observed Mean	item uniqueness + sampling error

25 / 57

$\tilde{ω}^{b} = .76 = [\mathrm{Corr}(η^{b}, \sum_{k} Y_{k}^{b})]^{2}$

However, $[\mathrm{Corr}(η^{b}, Z^{b})]^{2} = .49$, as

$V^{Error} = \sum_{k = 1}^{p} θ_{kk}^{b} + [(\sum_{k = 1}^{p} λ_{k}^{w})^{2} + \sum_{k = 1}^{p} θ_{kk}^{w}] / n$

$ω^{b} = .49 \neq \tilde{ω}^{b}$

For the TIMSS items, $ω^{b} = .719$, 95% CI [.668, .771]

as opposed to $\tilde{ω}^{b} = .977$

26 / 57

$η^{b}$ is the true score at the school level

Left: Correlation between latent score and latent composite

Right: Correlation between latent score and observed composite, which is smaller

Overly optimistic information

Imagine, in a single-level context, reporting that the reliability of an instrument was .76 when it was actually less than .5
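The two numbers on this slide can be reproduced directly from the simulation parameters (five items, $λ^{b} = 0.25$, $θ^{b} = 0.1$, $λ^{w} = 0.5$, $θ^{w} = 1$, $ψ^{b} = ψ^{w} = 1$, $n = 10$). The sketch below only plugs those population values into the two formulas; the function names are hypothetical.

```python
def omega_tilde_b(lam_b, theta_b):
    """Between-level omega of the latent composite (psi^b fixed to 1)."""
    true_var = sum(lam_b) ** 2
    return true_var / (true_var + sum(theta_b))


def omega_b_observed(lam_b, theta_b, lam_w, theta_w, n):
    """Reliability of the observed cluster-mean composite (psi fixed to 1).

    Error = between-level uniqueness + within-level variance / n,
    where the second term is sampling error in the observed means.
    """
    true_var = sum(lam_b) ** 2
    sampling_error = (sum(lam_w) ** 2 + sum(theta_w)) / n
    return true_var / (true_var + sum(theta_b) + sampling_error)


p = 5
lam_b, theta_b = [0.25] * p, [0.1] * p
lam_w, theta_w = [0.5] * p, [1.0] * p

latent_rel = omega_tilde_b(lam_b, theta_b)
observed_rel = omega_b_observed(lam_b, theta_b, lam_w, theta_w, n=10)
print(round(latent_rel, 2), round(observed_rel, 2))  # 0.76 0.49
```

Increasing `n` shrinks the sampling-error term, so the observed-mean reliability approaches the latent-composite value as cluster size grows.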

How About $\tilde{ω}^{w}$?

$\tilde{ω}^{w}$ is composite reliability of latent-mean-centered scores
- Also latent variables
- But algebraically, $\tilde{ω}^{w} = ω^{w}$

27 / 57

Issue 2: Cross-Level Loading Invariance

28 / 57

Without Constraints on Loadings

$η^{b}$ : school-level construct, no connection to $η^{w}$
$η^{w}$ : purely student-level construct (i.e., ICC = 0)
- E.g., PATM relative to the school mean

(See e.g., Mehta & Neale, 2005)

29 / 57

Can only compare relative standing, not absolute value

With Cross-Level Invariance

One construct $η$ : $η_{i j} = η_{j}^{b} + η_{i j}^{w}$

ICC = $\frac{ψ^{b}}{ψ^{b} + ψ^{w}}$

30 / 57

This is more consistent with the way we use cluster means and do centering in MLM

Strong/Scalar Invariance Across Clusters

Implies that $θ_{kk}^{b} = 0$ for all $k$

$\Rightarrow \tilde{ω}^{b} = 1.0$ (Jak, Oort, and Dolan, 2014)

For an individual construct, $\tilde{ω}^{b}$ is roughly a measure of strong invariance

31 / 57

Construct Meanings

Based on Stapleton, Yang, and Hancock (2016); Stapleton and Johnson (2019)

32 / 57

What is the Target Construct?

What is your attitude toward math?
What is your attitude toward math, relative to the school norm?
What is your school's overall attitude toward math?

33 / 57

Individual/Configural Construct

What is your attitude toward math?

Individual construct $η$

Partitioning: $η = η^{b} + η^{w}$

Configural construct $η^{b}$ (i.e., true cluster mean)

$\mathrm{Var}(η^{b}) / \mathrm{Var}(η)$ = ICC

Within-cluster component $η^{w}$

34 / 57

Matching Composites and Constructs

Individual construct--Raw composite $Z_{ij} = \sum_{k} Y_{ijk}$
$V^{True} = (\sum_{k = 1}^{p} λ_{k})^{2} (ψ^{w} + ψ^{b})$
$V^{Error} = 1' Θ^{b} 1 + 1' Θ^{w} 1$
Discussed in Raykov and du Toit (2005)

Configural construct--Composite cluster mean $Z_{j}^{b} = \sum_{k} \bar{Y}_{jk}$
$V^{True} = (\sum_{k = 1}^{p} λ_{k})^{2} ψ^{b}$
$V^{Error} = 1' Θ^{b} 1 + [(\sum_{k = 1}^{p} λ_{k})^{2} ψ^{w} + 1' Θ^{w} 1] / n$
For unbalanced cluster sizes, use the harmonic mean $\tilde{n}$

Within-cluster construct--Composite of deviation scores $Z_{ij}^{w} = \sum_{k} (Y_{ijk} - \bar{Y}_{jk})$
$V^{True} = (\sum_{k = 1}^{p} λ_{k})^{2} ψ^{w}$
$V^{Error} = 1' Θ^{w} 1$

35 / 57

Replace $n$ with the harmonic mean for unequal cluster sizes
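A compact sketch of the three coefficients for an individual construct, assuming cross-level invariant loadings and using the harmonic mean of cluster sizes for $\tilde{n}$. The parameter values are invented for illustration, and `multilevel_omegas` and `diag` are hypothetical helper names, not functions from any package.

```python
from statistics import harmonic_mean


def diag(values):
    """Build a diagonal matrix (list of lists) from a list of values."""
    p = len(values)
    return [[values[k] if k == m else 0.0 for m in range(p)] for k in range(p)]


def multilevel_omegas(loadings, psi_b, psi_w, theta_b, theta_w, cluster_sizes):
    """omega^{2l}, omega^b, omega^w for raw, cluster-mean, and centered scores."""
    lam2 = sum(loadings) ** 2
    err_b = sum(sum(row) for row in theta_b)     # 1' Theta^b 1
    err_w = sum(sum(row) for row in theta_w)     # 1' Theta^w 1
    n_tilde = harmonic_mean(cluster_sizes)

    true_2l = lam2 * (psi_b + psi_w)
    true_b = lam2 * psi_b
    true_w = lam2 * psi_w
    return {
        "omega_2l": true_2l / (true_2l + err_b + err_w),
        "omega_b": true_b / (true_b + err_b + (true_w + err_w) / n_tilde),
        "omega_w": true_w / (true_w + err_w),
    }


res = multilevel_omegas(
    loadings=[1.0] * 4, psi_b=0.2, psi_w=0.8,
    theta_b=diag([0.05] * 4), theta_w=diag([0.5] * 4),
    cluster_sizes=[10, 10, 10],
)
print(res)
```

The harmonic mean weights small clusters more heavily than the arithmetic mean, which matches the fact that small clusters contribute the noisiest observed means.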

Within-Cluster Construct

What is your attitude toward math, relative to the school norm?

Within-cluster construct $η^{w}$

Expected ICC = 0

$ω^{w}$: reliability of $Z_{ij}^{w}$

$ω^{w} = \frac{(\sum_{k = 1}^{p} λ_{k})^{2} ψ^{w}}{(\sum_{k = 1}^{p} λ_{k})^{2} ψ^{w} + 1' Θ^{w} 1}$

36 / 57

Shared Construct

What is your school's attitude toward math?

Shared construct $η^{b}$ : Cluster-level attribute (aka climate)

$ω^{b}$: reliability of $Z_{j}^{b}$

$V^{True} = (\sum_{k = 1}^{p} λ_{k})^{2} ψ^{b}$
$V^{Error} = 1' Θ^{b} 1 + 1' Σ^{w} 1 / \tilde{n}$

37 / 57

Shared + Configural/Individual Constructs

What is your school's attitude toward math?

There may be rater acquiescence

Shared construct $η^{s}$ : School climate

Individual construct $η^{w}$ : Acquiescence

Configural construct $η^{b}$ : School means of Acquiescence

38 / 57

Shared + Configural/Individual Constructs

The school-level composite, $Z_{j}^{b}$ , measures both $η^{s}$ and $η^{b}$

$ω^{b(s)}$: construct reliability of $Z_{j}^{b}$ measuring $η^{s}$

$V^{True} = (\sum_{k = 1}^{p} λ_{k}^{s})^{2} ψ^{s}$
$V^{Error} = (\sum_{k = 1}^{p} λ_{k})^{2} (ψ^{b} + ψ^{w} / \tilde{n}) + 1' Θ^{b} 1 + 1' Θ^{w} 1 / \tilde{n}$

39 / 57

Extensions of $α$

$\begin{aligned} α^{2l} &= \frac{p}{p - 1} \left(\frac{\sum_{k \neq k'} (σ_{kk'}^{b} + σ_{kk'}^{w})}{1' Σ^{b} 1 + 1' Σ^{w} 1}\right) \\ α^{b} &= \frac{p}{p - 1} \left(\frac{\sum_{k \neq k'} σ_{kk'}^{b}}{1' Σ^{b} 1 + 1' Σ^{w} 1 / \tilde{n}}\right) \\ α^{w} &= \frac{p}{p - 1} \left(\frac{\sum_{k \neq k'} σ_{kk'}^{w}}{1' Σ^{w} 1}\right) \end{aligned}$

40 / 57
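The three $α$ extensions need only the between- and within-level item covariance matrices and $\tilde{n}$. A minimal sketch follows, with hypothetical function names and made-up covariance matrices that follow a tau-equivalent structure (so the results should equal the corresponding $ω$ values for the same parameters).

```python
def _total(m):
    """Sum of all elements, i.e. 1' M 1."""
    return sum(sum(row) for row in m)


def _off_diag(m):
    """Sum of the off-diagonal elements (all item covariances)."""
    return _total(m) - sum(m[k][k] for k in range(len(m)))


def multilevel_alphas(sigma_b, sigma_w, n_tilde):
    """alpha^{2l}, alpha^b, alpha^w from level-specific covariance matrices."""
    p = len(sigma_b)
    c = p / (p - 1)
    tot_b, tot_w = _total(sigma_b), _total(sigma_w)
    off_b, off_w = _off_diag(sigma_b), _off_diag(sigma_w)
    return {
        "alpha_2l": c * (off_b + off_w) / (tot_b + tot_w),
        "alpha_b": c * off_b / (tot_b + tot_w / n_tilde),
        "alpha_w": c * off_w / tot_w,
    }


def compound(off, on, p=4):
    """p x p matrix with `on` on the diagonal and `off` elsewhere."""
    return [[on if k == m else off for m in range(p)] for k in range(p)]


# Illustrative values: unit loadings, psi^b = 0.2, theta^b = 0.05,
# psi^w = 0.8, theta^w = 0.5, cluster size 10
alphas = multilevel_alphas(compound(0.2, 0.25), compound(0.8, 1.3), n_tilde=10)
print(alphas)
```

In applied work the two covariance matrices would come from a saturated two-level model rather than being typed in by hand.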

41 / 57

Which One to Report?

If a variable is partitioned in a multilevel model (most likely an individual construct), all three $(ω^{2l}, ω^{b}, ω^{w})$ should be reported
- Cluster means and cluster-mean centered predictors
- Outcome variable

Otherwise, reliability at the corresponding level ($ω^{b}$ or $ω^{w}$)

42 / 57

Summary (Lai, 2020)

Computing and reporting reliability information is important for multilevel data

Reliability information is needed for raw, cluster means, and cluster-mean centered scores

Previous approach to between-level reliability is an overestimate when cluster size is small

Nature of target construct should be considered, and it has implications on reliability computation

43 / 57

Construct	$ω^{2l}$, $α^{2l}$	$ω^{b}$, $α^{b}$	$ω^{w}$, $α^{w}$	$ω^{b(s)}$
Individual	X	X	X
Configural		X
Within-Cluster			X
Shared		X		X

44 / 57

Longitudinal Data

Preliminary ideas. Suggestions are greatly appreciated.

45 / 57

Midlife in the United States

Data from MIDUS 2: Daily Stress Project, 2004-2009 (Ryff and Almeida, 2009)

2,022 participants, 8 days each
Target construct: Positive affect

Item	Wording
b2dc24	Did you feel attentive?
b2dc25	Did you feel proud?
b2dc26	Did you feel active?
b2dc27	Did you feel confident?

Type of scores: raw composite, person means, person-mean centered

46 / 57

From MCFA

Est $ICC (η) = .778$

Composite	Est $ω$	95% CI
Raw	.812	[.801, .822]
Within	.609	[.595, .623]
Between	.852	[.839, .864]

47 / 57

Can We Incorporate Time?

Cross-Classified CFA (Jeon and Rabe-Hesketh, 2012; Asparouhov and Muthén, 2012)

Assuming cross-level invariance for an individual construct, with decomposition $η_{t i} = η_{i}^{P} + η_{t}^{T} + η_{t i}^{W}$

48 / 57

Relation to the Generalizability Theory

Most meaningful when participants are measured on the same days/times

Cranford, Shrout, Iida, et al. (2006): generalizability coefficients for diary studies

Fixed vs. Random item facet (in estimation)
Relax the essential parallel test assumption
- Item-specific loadings and uniqueness
Flexible SEM modeling

49 / 57

Not the case for the MIDUS data, as everyone starts on a different day

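Some Possible Reliability Coefficients

Reliability of raw scores
$V^{True} = (\sum_{k} λ_{k})^{2} (ψ^{P} + ψ^{T} + ψ^{W})$
$V^{Error} = 1' (Θ^{P} + Θ^{T} + Θ^{W}) 1$

Reliability of person means (across $T$ time points)
$V^{True} = (\sum_{k} λ_{k})^{2} ψ^{P}$
$V^{Error} = 1' Θ^{P} 1 + [(\sum_{k} λ_{k})^{2} ψ^{W} + 1' Θ^{W} 1] / T$

Reliability of deviation from person mean
$V^{True} = (\sum_{k} λ_{k})^{2} (ψ^{T} + ψ^{W})$
$V^{Error} = 1' (Θ^{T} + Θ^{W}) 1$

50 / 57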

Person-level (trait-level) variance is not part of true score for the deviation score

In this example, there is essentially no time-level variance

E.g., no day of participation effect
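These preliminary coefficients can be sketched the same way as the two-level ones. The function below simply transcribes the three $V^{True}$ / $V^{Error}$ pairs above; the name `longitudinal_omegas`, the argument names, and the numbers are hypothetical, and each `err_*` argument stands for the already-summed quantity $1' Θ 1$ of that component.

```python
def longitudinal_omegas(loadings, psi_p, psi_t, psi_w,
                        err_p, err_t, err_w, n_times):
    """Reliability of raw scores, person means, and person-mean deviations.

    psi_p, psi_t, psi_w: person-, time-, and within-level factor variances
    err_p, err_t, err_w: 1' Theta 1 for each component
    n_times:             number of time points per person (T)
    """
    lam2 = sum(loadings) ** 2

    true_raw = lam2 * (psi_p + psi_t + psi_w)
    raw = true_raw / (true_raw + err_p + err_t + err_w)

    true_mean = lam2 * psi_p
    err_mean = err_p + (lam2 * psi_w + err_w) / n_times
    person_mean = true_mean / (true_mean + err_mean)

    true_dev = lam2 * (psi_t + psi_w)
    deviation = true_dev / (true_dev + err_t + err_w)
    return {"raw": raw, "person_mean": person_mean, "deviation": deviation}


# Illustrative values with no time-level variance (psi_t = err_t = 0)
rel = longitudinal_omegas([1.0] * 4, psi_p=0.7, psi_t=0.0, psi_w=0.2,
                          err_p=0.4, err_t=0.0, err_w=2.0, n_times=8)
print(rel)
```

When the time-level variance is essentially zero, as in this example, the three coefficients reduce to the two-level versions with persons as clusters.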

Composite	Est	95% CI
Raw	.829	[.820, .837]
Within	.646	[.635, .660]
Between	.859	[.849, .868]

51 / 57

Many Questions Remain

Linkage to generalizability coefficients by Cranford et al. (2006)
Discrete indicators?
Should constructs at the within-person level and the between-person level be on the same metric?
Are there "shared" constructs at the person level?
- E.g., Intensively measuring a stable trait?
Reliability of change? (Rogosa, Brandt, and Zimowski, 1982)
- Related to reliability of within-person deviation?

52 / 57

Thanks!

Slides created via the R package xaringan.

53 / 57

References

Appelbaum, M., H. Cooper, R. B. Kline, et al. (2018). "Journal article reporting standards for quantitative research in psychology". In: American Psychologist 73.1, pp. 3-25. ISSN: 0003066X. DOI: 10.1037/amp0000191.

Asparouhov, T. and B. Muthén (2012). "General random effect latent variable modeling: Random subjects, items, contexts, and parameter". In: Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications. Charlotte, NC: Information Age, pp. 163-192.

Cranford, J. A., P. E. Shrout, M. Iida, et al. (2006). "A procedure for evaluating sensitivity to within-person change: Can mood measures in diary studies detect change reliably?" En. In: Personality and Social Psychology Bulletin 32.7, pp. 917-929. ISSN: 0146-1672, 1552-7433. DOI: 10.1177/0146167206287721. URL: http://journals.sagepub.com/doi/10.1177/0146167206287721 (visited on Nov. 08, 2020).

Geldhof, G. J., K. J. Preacher, and M. J. Zyphur (2014). "Reliability estimation in a multilevel confirmatory factor analysis framework". In: Psychological Methods 19.1, pp. 72-91. ISSN: 1082989X. DOI: 10.1037/a0032138.

54 / 57

References (cont'd)

Jak, S., F. J. Oort, and C. V. Dolan (2014). "Measurement bias in multilevel data". In: Structural Equation Modeling: A Multidisciplinary Journal 21.1, pp. 31-39. ISSN: 1070-5511. DOI: 10.1080/10705511.2014.856694.

Jeon, M. and S. Rabe-Hesketh (2012). "Profile-likelihood approach for estimating generalized linear mixed models with factor structures". En. In: Journal of Educational and Behavioral Statistics 37.4, pp. 518-542. ISSN: 1076-9986, 1935-1054. DOI: 10.3102/1076998611417628. URL: http://journals.sagepub.com/doi/10.3102/1076998611417628 (visited on Nov. 08, 2020).

Knol, M. H., C. V. Dolan, G. J. Mellenbergh, et al. (2016). "Measuring the quality of university lectures: Development and validation of the Instructional Skills Questionnaire (ISQ)". In: PLOS ONE 11.2. Ed. by D. S. Courvoisier, p. e0149163. ISSN: 1932-6203. DOI: 10.1371/journal.pone.0149163.

Lai, M. H. C. (2020). "Composite reliability of multilevel data: It’s about observed scores and construct meanings." En. In: Psychological Methods. ISSN: 1939-1463, 1082-989X. DOI: 10.1037/met0000287. URL: http://doi.apa.org/getdoi.cfm?doi=10.1037/met0000287 (visited on Nov. 08, 2020).

55 / 57

References (cont'd)

Raudenbush, S. W. and A. S. Bryk (2002). Hierarchical linear models: Applications and data analysis methods. 2nd ed. Thousand Oaks, CA: Sage. ISBN: 076191904X.

Raykov, T. and G. A. Marcoulides (2006). "On multilevel model reliability estimation from the perspective of structural equation modeling". In: Structural Equation Modeling: A Multidisciplinary Journal 13.1, pp. 130-141. ISSN: 1070-5511. DOI: 10.1207/s15328007sem1301_7.

Raykov, T. and S. H. C. du Toit (2005). "Estimation of reliability for multiple-component measuring instruments in hierarchical designs". In: Structural Equation Modeling: A Multidisciplinary Journal 12.4, pp. 536-550. ISSN: 1070-5511. DOI: 10.1207/s15328007sem1204_2.

Rogosa, D., D. Brandt, and M. Zimowski (1982). "A growth curve approach to the measurement of change." En. In: Psychological Bulletin 92.3, pp. 726-748. ISSN: 0033-2909. DOI: 10.1037/0033-2909.92.3.726. URL: http://content.apa.org/journals/bul/92/3/726 (visited on Nov. 08, 2020).

56 / 57

References (cont'd)

Rush, J. and S. M. Hofer (2014). "Differences in within- and between-person factor structure of positive and negative affect: Analysis of two intensive measurement studies using multilevel structural equation modeling.". In: Psychological Assessment 26.2, pp. 462-473. ISSN: 1939-134X. DOI: 10.1037/a0035666.

Ryff, C. D. and D. M. Almeida (2009). Midlife in the United States (MIDUS 2): Daily Stress Project, 2004-2009: Version 2. En. type: dataset. DOI: 10.3886/ICPSR26841.V2. URL: http://www.icpsr.umich.edu/icpsrweb/NACDA/studies/26841/version/2 (visited on Nov. 08, 2020).

Stapleton, L. M. and T. L. Johnson (2019). "Models to examine the validity of cluster-level factor structure using individual-level data". In: Advances in Methods and Practices in Psychological Science, p. 251524591985503. ISSN: 2515-2459. DOI: 10.1177/2515245919855039.

Stapleton, L. M., J. S. Yang, and G. R. Hancock (2016). "Construct meaning in multilevel settings". In: Journal of Educational and Behavioral Statistics 41.5, pp. 481-520. ISSN: 1076-9986. DOI: 10.3102/1076998616646200.

57 / 57
