It is such an honor to be here. Both co-chairs of my dissertation are ASU graduates, and they used to tell me how good the program at ASU is.
Lineage of ASU Quant . . .
Some alternative indices I proposed to solve these limitations
Looking forward to comments and suggestions; whether I'm doing something wrong or right
Psychological scales are not perfect
Certain level of reliability needed
Image credit: Reliability by Nick Youngson CC BY-SA 3.0 Alpha Stock Images
Estimate and report values of reliability coefficients for the scores analyzed (i.e., the research's sample) (p. 7)
Similar recommendations can be found in numerous journal and methodological guidelines
Just a quick introduction on the foundational work on reliability that this research relies on.
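Lord & Novick (1968)
Observed score = True score + Error
$$Y = T + E$$
$T$ and $E$ are independent, so
$$\sigma^2_Y = \sigma^2_T + \sigma^2_E$$
Reliability: $$\rho = \frac{\sigma^2_T}{\sigma^2_Y} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} = [\mathrm{Corr}(Y, T)]^2$$
For example, we ask students to report their attitudes toward math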
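The identity between the variance ratio and the squared correlation can be checked numerically. The following is a minimal sketch, assuming normally distributed true scores and errors with hypothetical standard deviations of 1.0 and 0.5 (these values are illustrative and not from the talk); the implied reliability is then 1 / (1 + 0.25) = .80.

```python
import random

random.seed(2020)

n = 20000
sd_true, sd_error = 1.0, 0.5  # hypothetical values, chosen only for illustration

T = [random.gauss(0, sd_true) for _ in range(n)]
E = [random.gauss(0, sd_error) for _ in range(n)]  # drawn independently of T
Y = [t + e for t, e in zip(T, E)]  # observed score = true score + error


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Sample covariance; cov(x, x) is the sample variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


population_value = sd_true**2 / (sd_true**2 + sd_error**2)
var_ratio = cov(T, T) / cov(Y, Y)                    # Var(T) / Var(Y)
corr_sq = cov(Y, T) ** 2 / (cov(Y, Y) * cov(T, T))   # [Corr(Y, T)]^2

print(round(population_value, 3), round(var_ratio, 3), round(corr_sq, 3))
```

With a sample this large, both sample quantities should land close to the population value of .80, and they differ from each other only through sampling noise.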
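$p$ items: $k = 1, \ldots, p$
$$Y_k = \nu_k + \eta + \epsilon_k$$
$\mathrm{Var}(\eta) = \psi$, $\mathrm{Var}(\epsilon_k) = \theta_k$, $\epsilon_k$ and $\epsilon_{k'}$ independent
$$\mathrm{Cov}(Y_k, Y_{k'}) = \psi$$
Unweighted (unit-weight) composite: $Z = \sum_k Y_k$
Variance of unweighted composite: $\mathrm{Var}(Z) = p^2 \psi + \sum_k \theta_k$
Reliability $= p^2 \psi / \mathrm{Var}(Z)$, or Cronbach's $\alpha$
When we have multiple items, we can estimate the error variance
The true-score part of each $Y_k$ is on the same metric/unit as the latent variable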
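To make the link to Cronbach's α concrete, here is a small sketch that computes the reliability of the unit-weight composite from the model parameters and compares it with α computed from the model-implied covariance matrix. The values of ψ and the θ's are hypothetical, and the function names are made up for this illustration.

```python
def composite_reliability(psi, thetas):
    """p^2 * psi / Var(Z) for the equal-loading (essentially tau-equivalent) model."""
    p = len(thetas)
    var_z = p**2 * psi + sum(thetas)
    return p**2 * psi / var_z


def cronbach_alpha(cov_matrix):
    """Usual alpha formula from an item covariance matrix."""
    p = len(cov_matrix)
    total = sum(sum(row) for row in cov_matrix)       # variance of the sum score
    diag = sum(cov_matrix[k][k] for k in range(p))    # sum of item variances
    return p / (p - 1) * (1 - diag / total)


psi = 0.6                      # hypothetical latent variance
thetas = [0.4, 0.5, 0.3, 0.6]  # hypothetical error variances
p = len(thetas)

# Model-implied covariance matrix: psi everywhere, plus theta_k on the diagonal
sigma = [[psi + (thetas[k] if k == m else 0.0) for m in range(p)] for k in range(p)]

rel = composite_reliability(psi, thetas)
alpha = cronbach_alpha(sigma)
print(round(rel, 4), round(alpha, 4))
```

Under these assumptions the two numbers coincide, which is one way to see why α is the reliability of the sum score when all loadings are equal.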
There are different ways to justify the derivation of α
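$$Y_k = \nu_k + \lambda_k \eta + \epsilon_k$$
Composite reliability $$\omega = \frac{V_\text{True}}{V_\text{True} + V_\text{Error}}$$
More generally, with $\mathrm{Cov}([\epsilon_1, \epsilon_2, \ldots]) = \boldsymbol{\Theta}$, $V_\text{Error} = \mathbf{1}'\boldsymbol{\Theta}\mathbf{1}$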
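A short sketch of the ω computation follows. It assumes that the true-score variance of the unit-weight composite is $(\sum_k \lambda_k)^2 \psi$, which follows from summing the $\lambda_k \eta$ terms but is not written out on the slide. The loadings and error covariances are hypothetical, and `omega` is just an illustrative function name.

```python
def omega(loadings, psi, theta):
    """Composite reliability; theta is the full p x p error covariance matrix."""
    v_true = sum(loadings) ** 2 * psi            # (sum of loadings)^2 * psi
    v_error = sum(sum(row) for row in theta)     # 1' Theta 1
    return v_true / (v_true + v_error)


loadings = [0.8, 0.7, 0.6, 0.5]  # hypothetical
psi = 1.0

# Uncorrelated errors: Theta is diagonal
theta_diag = [
    [0.36, 0.0, 0.0, 0.0],
    [0.0, 0.51, 0.0, 0.0],
    [0.0, 0.0, 0.64, 0.0],
    [0.0, 0.0, 0.0, 0.75],
]

# Same error variances, plus one error covariance between items 1 and 2
theta_corr = [row[:] for row in theta_diag]
theta_corr[0][1] = theta_corr[1][0] = 0.1

omega_diag = omega(loadings, psi, theta_diag)
omega_corr = omega(loadings, psi, theta_corr)
print(round(omega_diag, 3), round(omega_corr, 3))
```

In this example the positive error covariance enters $\mathbf{1}'\boldsymbol{\Theta}\mathbf{1}$ twice, so the error variance grows and ω drops relative to the diagonal case.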
Lai, M. H. C. (2020). Composite reliability of multilevel data: It's about observed scores and construct meanings. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000287
2007 Trends in International Mathematics and Science Study (TIMSS; Williams et al., 2009)
Positive attitudes toward math (PATM)
Item | Wording |
---|---|
AS4MAMOR | Would like to do more math |
AS4MAENJ | I enjoy learning mathematics |
AS4MALIK | I like math |
AS4MABOR | Math is boring (reverse-coded) |
Kim et al. (2016): Only 54% reported reliability, among 39 articles using multilevel confirmatory factor analysis (MCFA)
However, discussion on multilevel reliability is not new
Raykov and du Toit (2005); Raykov and Marcoulides (2006)
Cranford, Shrout, Iida, Rafaeli, Yip, and Bolger (2006)
Geldhof, Preacher, and Zyphur (2014)
j indexes cluster
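$$Y_{ij} = \nu + \lambda^b \eta^b_j + \lambda^w_j \eta^w_{ij} + \epsilon_{ij}$$
$\epsilon_{ij} = \epsilon^b_j + \epsilon^w_{ij}$, $\mathrm{Var}(\eta^b) = \psi^b$, $\mathrm{Var}(\eta^w_j) = \psi^w$
$\mathrm{Var}(\epsilon^b) = \theta^b$, $\mathrm{Var}(\epsilon^w_j) = \theta^w$
Loading invariance across clusters: $\lambda^w_j = \lambda^w$ for all $j$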
No cross-level invariance
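$\epsilon$ is the uniqueness, separated into a within-level part and a between-level part
Fixed $\psi^b = \psi^w = 1$ for identification
$$\tilde\omega^b = \frac{\left(\sum_{k=1}^p \lambda^b_k\right)^2}{\left(\sum_{k=1}^p \lambda^b_k\right)^2 + \sum_{k=1}^p \theta^b_{kk}}, \qquad \tilde\omega^w = \frac{\left(\sum_{k=1}^p \lambda^w_k\right)^2}{\left(\sum_{k=1}^p \lambda^w_k\right)^2 + \sum_{k=1}^p \theta^w_{kk}}$$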
For the TIMSS data
Est $\tilde\omega^w$ = .857, 95% CI [.849, .863]
Est $\tilde\omega^b$ = .977, 95% CI [.964, .987] !!
The tilde distinguishes these from the indices I will discuss later
I got interested in this because these reliability estimates seem extremely large
Not uncommon in the literature . . .
Positive and negative affect: $\tilde\omega^b$ = .94 to .97 (Rush and Hofer, 2014)
Instructional Skills Questionnaire: $\tilde\alpha^b$ between .90 and .99 (Knol, Dolan, Mellenbergh, and van der Maas, 2016)
Repeated measures within persons
Multiple factors in the ISQ; team from the Netherlands
Which "scores" are reliable?
Cross-level invariance
Construct meanings
Although it is a critique on the level-specific reliability, to be fair
First compute a composite of the 4 PATM items
If we use composite PATM to predict students' math achievement, we can compute
IDSCHOOL | AS4MAMOR | AS4MAENJ | AS4MALIK | AS4MABORr | Z | Zb | Zw |
---|---|---|---|---|---|---|---|
1 | 2 | 2 | 1 | 2 | 7 | 6.5000 | 0.5000 |
1 | 2 | 1 | 1 | 1 | 5 | 6.5000 | -1.5000 |
1 | 2 | 1 | 1 | 1 | 5 | 6.5000 | -1.5000 |
1 | 2 | 1 | 2 | 1 | 6 | 6.5000 | -0.5000 |
1 | 1 | 1 | 1 | 1 | 4 | 6.5000 | -2.5000 |
2 | 3 | 2 | 2 | 2 | 9 | 6.5625 | 2.4375 |
2 | 1 | 2 | 2 | 1 | 6 | 6.5625 | -0.5625 |
2 | 1 | 1 | 1 | 1 | 4 | 6.5625 | -2.5625 |
2 | 3 | 2 | 1 | 1 | 7 | 6.5625 | 0.4375 |
2 | 2 | 2 | 3 | 1 | 8 | 6.5625 | 1.4375 |
Raw/Overall composite PATM ($Z_{ij}$)
School means of composite PATM (cluster mean; $Z^b_j$)
Student deviations from school means (cluster-mean centered; $Z^w_{ij} = Z_{ij} - Z^b_j$)
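The three composites can be computed in a few lines. This sketch uses only the ten rows printed in the table, so the school means it produces (5.4 and 6.8) differ from the table's 6.5 and 6.5625, which were presumably computed over all sampled students in each school.

```python
from collections import defaultdict

# (IDSCHOOL, Z) for the ten rows shown in the table
rows = [(1, 7), (1, 5), (1, 5), (1, 6), (1, 4),
        (2, 9), (2, 6), (2, 4), (2, 7), (2, 8)]

# Collect the raw composites by school
by_school = defaultdict(list)
for school, z in rows:
    by_school[school].append(z)

# Cluster means (Z^b) and cluster-mean-centered scores (Z^w = Z - Z^b)
z_b = {school: sum(zs) / len(zs) for school, zs in by_school.items()}
decomposed = [(school, z, z_b[school], z - z_b[school]) for school, z in rows]

for school, z, zb, zw in decomposed:
    print(school, z, round(zb, 4), round(zw, 4))
```

By construction, the centered scores sum to zero within each school, so they carry only information about relative standing within the school.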
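Is $\tilde\omega^b$ the reliability of the school means?
$$\mathrm{Var}(Y^b_1) = (\lambda^b_1)^2 + \theta^b_{11}$$
$$\mathrm{Var}\left(\sum_k Y^b_k\right) = \left(\sum_k \lambda^b_k\right)^2 + \sum_k \theta^b_{kk}$$
$$\tilde\omega^b = \frac{\left(\sum_k \lambda^b_k\right)^2}{\left(\sum_k \lambda^b_k\right)^2 + \sum_k \theta^b_{kk}}$$
Not clear in the original paper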
$Y^b_{jk}$ (in circle) is the latent school mean of item $k$
Different from the observed school mean, $\bar{Y}_{.jk} = \sum_{i=1}^{n_j} Y_{ijk} / n_j$
Raudenbush and Bryk (2002): Reliability of cluster means
$$\mathrm{Var}(\bar{Y}_{.jk} - Y^b_{jk}) = \sigma^w_{kk} / n_j$$
Let's say the school has 500 students. The circled variable is the mean of everyone in that school, but the sample may contain only 50 of them
It may be easier to think in terms of a population mean vs. a sample mean
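As a rough illustration of why the sampled cluster size matters, the sketch below treats the between-school variance of an item as true variance and $\sigma^w_{kk}/n_j$ as the sampling error of the observed school mean, which gives the familiar cluster-mean reliability $\sigma^b_{kk} / (\sigma^b_{kk} + \sigma^w_{kk}/n_j)$. The variance components are hypothetical (ICC = .2), and the function name is made up for this example.

```python
def cluster_mean_reliability(sigma_b, sigma_w, n_j):
    """Reliability of an observed cluster mean based on n_j sampled members."""
    return sigma_b / (sigma_b + sigma_w / n_j)


sigma_b, sigma_w = 0.2, 0.8  # hypothetical between and within variances (ICC = .2)

reliabilities = {n_j: cluster_mean_reliability(sigma_b, sigma_w, n_j)
                 for n_j in (5, 10, 50, 500)}

for n_j, rel in reliabilities.items():
    print(n_j, round(rel, 3))
```

Under these assumed values the reliability rises from about .56 with 5 students per school to above .99 with 500, which is the population-mean versus sample-mean distinction in numeric form.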
Therefore, $\tilde\omega^b$ is the internal consistency of a latent composite.
Let's go back to $Y = T + E$, where $T$ is a latent variable. What is the reliability of $T$?
It should be 1, as $T$ is the true score
But if we knew the true score, we would not need to worry about reliability
$\psi^b = \psi^w = 1$, $n_j = 10$ for all $j$
Five items
To make things clearer, I simulated a data set
Ten observations in each cluster
Sources of measurement error:
Latent Mean | item uniqueness |
Observed Mean | item uniqueness + sampling error |
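$$\tilde\omega^b = .76 = \left[\mathrm{Corr}\left(\eta^b, \sum_k Y^b_k\right)\right]^2$$
However, $[\mathrm{Corr}(\eta^b, Z^b)]^2 = .49$, as
$$V_\text{Error} = \sum_{k=1}^p \theta^b_{kk} + \left[\left(\sum_{k=1}^p \lambda^w_k\right)^2 + \sum_{k=1}^p \theta^w_{kk}\right] / n$$
$$\omega^b = .49 \neq \tilde\omega^b$$
For the TIMSS items, $\omega^b = .719$, 95% CI [.668, .771]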
$\eta^b$ is the true score at the school level
Left: Correlation between latent score and latent composite
Right: Correlation between latent score and observed composite, which is smaller
Overly optimistic information
Imagine, in a single-level context, reporting that the reliability of an instrument was .76 when it was actually less than .5
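The gap between the two school-level indices can be sketched directly from the formulas, with $\psi^b = \psi^w = 1$ and diagonal uniqueness matrices. The loadings and uniquenesses below are hypothetical values picked so that the gap is of a similar size; they are not the parameters behind the simulated data set in the talk, and the function names are illustrative only.

```python
def omega_b_tilde(lam_b, theta_b):
    """Level-specific index: internal consistency of the latent school-level composite."""
    v_true = sum(lam_b) ** 2
    return v_true / (v_true + sum(theta_b))


def omega_b(lam_b, theta_b, lam_w, theta_w, n):
    """Reliability of observed school means; error includes sampling error of the mean."""
    v_true = sum(lam_b) ** 2
    v_error = sum(theta_b) + (sum(lam_w) ** 2 + sum(theta_w)) / n
    return v_true / (v_true + v_error)


p, n = 5, 10          # five items, ten observations per cluster
lam_b = [0.4] * p     # hypothetical between-level loadings
theta_b = [0.25] * p  # hypothetical between-level uniquenesses
lam_w = [1.0] * p     # hypothetical within-level loadings
theta_w = [1.0] * p   # hypothetical within-level uniquenesses

tilde = omega_b_tilde(lam_b, theta_b)
observed = omega_b(lam_b, theta_b, lam_w, theta_w, n)
print(round(tilde, 3), round(observed, 3))
```

With these assumed values the latent-composite index is about .76 while the observed-mean reliability is about .48, because the within-level variance divided by $n$ is added to the error term.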
$\tilde\omega^w$ is the composite reliability of latent-mean-centered scores
$\eta^b$: school-level construct, no connection to $\eta^w$
$\eta^w$: purely student-level construct (i.e., ICC = 0)
(See, e.g., Mehta & Neale, 2005)
Can only compare relative standing, not absolute value
One construct $\eta$: $\eta_{ij} = \eta^b_j + \eta^w_{ij}$
$$\mathrm{ICC} = \frac{\psi^b}{\psi^b + \psi^w}$$
This is more consistent with the way we use cluster means and do centering in MLM
Implies that $\theta^b_{kk} = 0$ for all $k$
For an individual construct, $\tilde\omega^b$ is roughly a measure of strong invariance
Based on Stapleton, Yang, and Hancock (2016); Stapleton and Johnson (2019)
What is your attitude toward math?
What is your attitude toward math, relative to the school norm?
What is your school's overall attitude toward math?
Individual construct $\eta$
Partitioning: $\eta = \eta^b + \eta^w$
Configural construct $\eta^b$ (i.e., true cluster mean)
Within-cluster component $\eta^w$
Replace $n$ with the harmonic mean for unequal cluster sizes
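For unequal cluster sizes, the harmonic mean can be obtained from Python's standard library. The cluster sizes here are hypothetical.

```python
from statistics import harmonic_mean, mean

cluster_sizes = [5, 10, 20, 40]  # hypothetical numbers of sampled members per cluster

n_harmonic = harmonic_mean(cluster_sizes)
n_arithmetic = mean(cluster_sizes)
print(round(n_harmonic, 2), n_arithmetic)
```

The harmonic mean (about 10.67 here) is pulled toward the smaller clusters and sits well below the arithmetic mean (18.75), so using it gives a more conservative value of $n$ in the error-variance term.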
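Within-cluster construct $\eta^w$
Expected ICC = 0
$\omega^w$: reliability of $Z^w_{ij}$
$$\frac{\left(\sum_{k=1}^p \lambda_k\right)^2 \psi^w}{\left(\sum_{k=1}^p \lambda_k\right)^2 \psi^w + \mathbf{1}'\boldsymbol{\Theta}^w\mathbf{1}}$$
Shared construct $\eta^b$: Cluster-level attribute (aka climate)
$\omega^b$: reliability of $Z^b_j$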
There may be rater acquiescence
Shared construct $\eta^s$: School climate
Individual construct $\eta^w$: Acquiescence
Configural construct $\eta^b$: School means of acquiescence
The school-level composite, $Z^b_j$, measures both $\eta^s$ and $\eta^b$
$\omega^{b(s)}$: construct reliability of $Z^b_j$ measuring $\eta^s$
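$$\alpha^{2l} = \frac{p}{p - 1} \left( \frac{\sum_{k \neq k'} (\sigma^b_{kk'} + \sigma^w_{kk'})}{\mathbf{1}'\boldsymbol{\Sigma}^b\mathbf{1} + \mathbf{1}'\boldsymbol{\Sigma}^w\mathbf{1}} \right)$$
$$\alpha^b = \frac{p}{p - 1} \left( \frac{\sum_{k \neq k'} \sigma^b_{kk'}}{\mathbf{1}'\boldsymbol{\Sigma}^b\mathbf{1} + \mathbf{1}'\boldsymbol{\Sigma}^w\mathbf{1} / \tilde{n}} \right)$$
$$\alpha^w = \frac{p}{p - 1} \left( \frac{\sum_{k \neq k'} \sigma^w_{kk'}}{\mathbf{1}'\boldsymbol{\Sigma}^w\mathbf{1}} \right)$$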
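The three α indices can be computed from between-level and within-level covariance matrices as in the sketch below. The matrices are hypothetical, the function names are made up for illustration, and $\tilde{n}$ is assumed here to be the (harmonic) mean cluster size.

```python
def sum_all(m):
    """1' M 1: sum of every element of a square matrix."""
    return sum(sum(row) for row in m)


def sum_offdiag(m):
    """Sum of the off-diagonal elements (all k != k' pairs)."""
    return sum_all(m) - sum(m[k][k] for k in range(len(m)))


def multilevel_alphas(sigma_b, sigma_w, n_tilde):
    p = len(sigma_b)
    c = p / (p - 1)
    off_b, off_w = sum_offdiag(sigma_b), sum_offdiag(sigma_w)
    tot_b, tot_w = sum_all(sigma_b), sum_all(sigma_w)
    alpha_2l = c * (off_b + off_w) / (tot_b + tot_w)   # raw composite
    alpha_b = c * off_b / (tot_b + tot_w / n_tilde)    # observed cluster means
    alpha_w = c * off_w / tot_w                        # cluster-mean-centered scores
    return alpha_2l, alpha_b, alpha_w


# Hypothetical between- and within-level covariance matrices for three items
sigma_b = [[0.3, 0.2, 0.2],
           [0.2, 0.3, 0.2],
           [0.2, 0.2, 0.3]]
sigma_w = [[1.0, 0.5, 0.5],
           [0.5, 1.0, 0.5],
           [0.5, 0.5, 1.0]]

alpha_2l, alpha_b, alpha_w = multilevel_alphas(sigma_b, sigma_w, n_tilde=10)
print(round(alpha_2l, 3), round(alpha_b, 3), round(alpha_w, 3))
```

Note that the denominator of the between-level index includes the within-level total variance divided by $\tilde{n}$, which is what separates it from a purely level-specific index.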
Construct | $\omega^{2l}$, $\alpha^{2l}$ | $\omega^b$, $\alpha^b$ | $\omega^w$, $\alpha^w$ | $\omega^{b(s)}$ |
---|---|---|---|---|
Individual | X | X | X | |
Configural | | X | | |
Within-Cluster | | | X | |
Shared | | X | | X |
Preliminary ideas. Suggestions are greatly appreciated.
Data from MIDUS 2: Daily Stress Project, 2004-2009 (Ryff and Almeida, 2009)
2,022 participants, 8 days each
Target construct: Positive affect
Item | Wording |
---|---|
b2dc24 | Did you feel attentive? |
b2dc25 | Did you feel proud? |
b2dc26 | Did you feel active? |
b2dc27 | Did you feel confident? |
Est ICC(η)=.778
Composite | Est ω | 95% CI |
---|---|---|
Raw | .812 | [.801, .822] |
Within | .609 | [.595, .623] |
Between | .852 | [.839, .864] |
Cross-Classified CFA (Jeon and Rabe-Hesketh, 2012; Asparouhov and Muthén, 2012)
Assuming cross-level invariance for an individual construct, with decomposition $\eta_{ti} = \eta^P_i + \eta^T_t + \eta^W_{ti}$
Most meaningful when participants are measured on the same days/times
Cranford, Shrout, Iida, et al. (2006): generalizability coefficients for diary studies
Not the case for the MIDUS data, as everyone starts on a different day
Person-level (trait-level) variance is not part of true score for the deviation score
In this example, there is essentially no time-level variance
Composite | Est ω | 95% CI |
---|---|---|
Raw | .829 | [.820, .837] |
Within | .646 | [.635, .660] |
Between | .859 | [.849, .868] |
Linkage to generalizability coefficients by Cranford et al. (2006)
Discrete indicators?
Should constructs at the within-person level and the between-person level be on the same metric?
Are there "shared" constructs at the person level?
Reliability of change? (Rogosa, Brandt, and Zimowski, 1982)
Appelbaum, M., H. Cooper, R. B. Kline, et al. (2018). "Journal article reporting standards for quantitative research in psychology". In: American Psychologist 73.1, pp. 3-25. ISSN: 0003066X. DOI: 10.1037/amp0000191.
Asparouhov, T. and B. Muthén "General random effect latent variable modeling: Random subjects, items, contexts, and parameter". In: Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications. Charlotte, NC: Information Age, pp. 163-192.
Cranford, J. A., P. E. Shrout, M. Iida, et al. (2006). "A procedure for evaluating sensitivity to within-person change: Can mood measures in diary studies detect change reliably?" In: Personality and Social Psychology Bulletin 32.7, pp. 917-929. ISSN: 0146-1672, 1552-7433. DOI: 10.1177/0146167206287721. URL: http://journals.sagepub.com/doi/10.1177/0146167206287721.
Geldhof, G. J., K. J. Preacher, and M. J. Zyphur (2014). "Reliability estimation in a multilevel confirmatory factor analysis framework". In: Psychological Methods 19.1, pp. 72-91. ISSN: 1082989X. DOI: 10.1037/a0032138.
Jak, S., F. J. Oort, and C. V. Dolan (2014). "Measurement bias in multilevel data". In: Structural Equation Modeling: A Multidisciplinary Journal 21.1, pp. 31-39. ISSN: 1070-5511. DOI: 10.1080/10705511.2014.856694.
Jeon, M. and S. Rabe-Hesketh (2012). "Profile-likelihood approach for estimating generalized linear mixed models with factor structures". In: Journal of Educational and Behavioral Statistics 37.4, pp. 518-542. ISSN: 1076-9986, 1935-1054. DOI: 10.3102/1076998611417628. URL: http://journals.sagepub.com/doi/10.3102/1076998611417628.
Knol, M. H., C. V. Dolan, G. J. Mellenbergh, et al. (2016). "Measuring the quality of university lectures: Development and validation of the Instructional Skills Questionnaire (ISQ)". In: PLOS ONE 11.2. Ed. by D. S. Courvoisier, p. e0149163. ISSN: 1932-6203. DOI: 10.1371/journal.pone.0149163.
Lai, M. H. C. (2020). "Composite reliability of multilevel data: It’s about observed scores and construct meanings". In: Psychological Methods. ISSN: 1939-1463, 1082-989X. DOI: 10.1037/met0000287. URL: http://doi.apa.org/getdoi.cfm?doi=10.1037/met0000287.
Raudenbush, S. W. and A. S. Bryk (2002). Hierarchical linear models: Applications and data analysis methods. 2nd ed. Thousand Oaks, CA: Sage. ISBN: 076191904X.
Raykov, T. and G. A. Marcoulides (2006). "On multilevel model reliability estimation from the perspective of structural equation modeling". In: Structural Equation Modeling: A Multidisciplinary Journal 13.1, pp. 130-141. ISSN: 1070-5511. DOI: 10.1207/s15328007sem1301_7.
Raykov, T. and S. H. C. du Toit (2005). "Estimation of reliability for multiple-component measuring instruments in hierarchical designs". In: Structural Equation Modeling: A Multidisciplinary Journal 12.4, pp. 536-550. ISSN: 1070-5511. DOI: 10.1207/s15328007sem1204_2.
Rogosa, D., D. Brandt, and M. Zimowski (1982). "A growth curve approach to the measurement of change". In: Psychological Bulletin 92.3, pp. 726-748. ISSN: 0033-2909. DOI: 10.1037/0033-2909.92.3.726. URL: http://content.apa.org/journals/bul/92/3/726.
Rush, J. and S. M. Hofer (2014). "Differences in within- and between-person factor structure of positive and negative affect: Analysis of two intensive measurement studies using multilevel structural equation modeling". In: Psychological Assessment 26.2, pp. 462-473. ISSN: 1939-134X. DOI: 10.1037/a0035666.
Ryff, C. D. and D. M. Almeida (2009). Midlife in the United States (MIDUS 2): Daily Stress Project, 2004-2009: Version 2. Dataset. DOI: 10.3886/ICPSR26841.V2. URL: http://www.icpsr.umich.edu/icpsrweb/NACDA/studies/26841/version/2.
Stapleton, L. M. and T. L. Johnson (2019). "Models to examine the validity of cluster-level factor structure using individual-level data". In: Advances in Methods and Practices in Psychological Science, p. 251524591985503. ISSN: 2515-2459. DOI: 10.1177/2515245919855039.
Stapleton, L. M., J. S. Yang, and G. R. Hancock (2016). "Construct meaning in multilevel settings". In: Journal of Educational and Behavioral Statistics 41.5, pp. 481-520. ISSN: 1076-9986. DOI: 10.3102/1076998616646200.