A New Effect Size Statistic for Measurement Non-Invariance With Multiple Groups and Multiple Grouping Variables

University of Southern California

Mark Hok Chio Lai
Yichi Zhang
Meltem Ozcan
Winnie Wing-Yee Tse
Alexander Miles

Acknowledgements

This research is based on work supported by the National Science Foundation (Grant 2141790).

The paper has been accepted for publication in Structural Equation Modeling.

Measurement Invariance

\[ P(y_j \mid \eta, g) = P(y_j \mid \eta) \quad \text{for all } g, \eta \]

  • The same construct is measured in the same way across groups, time, etc.

Measurement invariance is the idea that a construct is measured in the same way across groups, occasions, time, and so forth.

For example, if we think of a test of depressive symptoms as a ruler, it is invariant across gender if it yields the same reading for people of different genders who have the same true level of depression.

Image credits:

  • Ksiom, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons
  • Lyon Cyborg, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Violation of MI → spurious/biased group differences

  • Most studies focused on binary conclusions (e.g., invariant or not)
  • Effect sizes rarely discussed or reported

Measurement invariance is fundamental because, without it, any findings on group differences may be spurious or biased.

As a result, the number of invariance articles has grown exponentially over the past two decades, and about 700 articles per year now report invariance tests of psychological measures.
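
To illustrate how a violation can produce a spurious group difference, here is a minimal simulation sketch in base R. All numbers (loading, intercepts, residual SD, sample size) are hypothetical and chosen only for illustration. The two groups have the same latent distribution, but the item intercept differs by 0.5 in group 2.

```r
# Minimal sketch: a noninvariant intercept creates an observed group difference
# even though the latent means are identical. All parameter values are hypothetical.
set.seed(1)
n <- 5000

# Same latent distribution in both groups (no true difference)
eta1 <- rnorm(n)
eta2 <- rnorm(n)

lambda <- 0.8          # common loading (assumed)
nu1 <- 0; nu2 <- 0.5   # intercepts differ across groups (noninvariance, assumed)

y1 <- nu1 + lambda * eta1 + rnorm(n, sd = 0.6)
y2 <- nu2 + lambda * eta2 + rnorm(n, sd = 0.6)

latent_diff   <- mean(eta2) - mean(eta1)  # close to 0
observed_diff <- mean(y2) - mean(y1)      # close to the intercept difference
c(latent = latent_diff, observed = observed_diff)
```

The observed mean difference reflects the intercept shift rather than any difference in the construct.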

“Replication Crisis” in MI Studies

  • Zhang (2022): Synthesis of 32 studies evaluating gender invariance of the Center for Epidemiologic Studies Depression Scale (CES-D)
    • Drastic differences in MI findings across studies
    • All 20 items were found noninvariant in at least one study
    • Only a few provided sufficient information to compute effect sizes

Cohen’s d analogue

  • dMACS (Nye et al., 2019; Nye & Drasgow, 2011)

\[ d_{\text{MACS},j,(g_1,g_2)} = \sqrt{\frac{\int (\hat{Y}_{jg_1} - \hat{Y}_{jg_2} \mid \eta)^2 f(\eta)\, d\eta}{\mathrm{Var}(Y_j)}} \]

  • \(\hat{Y}\) = expected item score
  • Standardized mean difference in expected item scores due to noninvariance
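
The formula above can be sketched numerically in base R. This is only an illustration under stated assumptions: a linear factor model with expected item score ν + λη, a single normal density for f(η), and a user-supplied Var(Y_j). The function name `dmacs_linear` and all parameter values are hypothetical and are not part of any package.

```r
# Sketch of d_MACS for one item and two groups under a linear factor model.
# nu, lambda: length-2 vectors of group-specific intercepts and loadings (hypothetical)
# var_y:      variance of the item score used for standardization (assumed given)
dmacs_linear <- function(nu, lambda, var_y, eta_mean = 0, eta_sd = 1) {
  stopifnot(length(nu) == 2, length(lambda) == 2)
  # Squared difference in expected item scores, weighted by the latent density
  sq_diff <- function(eta) {
    d <- (nu[1] + lambda[1] * eta) - (nu[2] + lambda[2] * eta)
    d^2 * dnorm(eta, eta_mean, eta_sd)
  }
  num <- integrate(sq_diff, -Inf, Inf)$value
  sqrt(num / var_y)
}

# Intercept-only noninvariance of 0.3 with unit item variance: about 0.3
d_int <- dmacs_linear(nu = c(0, 0.3), lambda = c(0.8, 0.8), var_y = 1)
# Loading-only noninvariance of 0.5 with standard normal eta: about 0.5
d_load <- dmacs_linear(nu = c(0, 0), lambda = c(1, 0.5), var_y = 1)
c(d_int, d_load)
```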

Multiple Groups and Grouping Variables

  • Cross-cultural, Ethnicity × Gender, etc.

Variability of Expected Item Score (\(\hat{Y}\)) at a Given \(\eta\).

Cohen’s f Analogue

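For \(G\) groups, with group sizes \(n_g\) and total sample size \(N\),

\[ f^2_{\text{MACS},j} = \frac{1}{N\,\mathrm{Var}(Y_j)} \sum_{g=1}^{G} n_g \int_{-\infty}^{\infty} (\hat{Y}_{jg} - \bar{\hat{Y}}_j \mid \eta)^2 f(\eta)\, d\eta = \frac{SD^2_{\text{noninvariance}}}{SD^2_{\text{item score}}}. \]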

  • Expected (squared) deviation from grand mean due to noninvariance, in standardized units
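
As a rough numerical sketch of this definition in base R, assume a linear factor model (expected item score ν_g + λ_g η), a normal density for f(η), and a grand mean weighted by group size. The function name `fmacs_linear` and the example values are hypothetical and are not the pinsearch implementation.

```r
# Sketch of f_MACS for one item across G groups under a linear factor model.
# nu, lambda: group-specific intercepts and loadings (hypothetical values)
# n:          group sizes; var_y: item score variance (assumed given)
fmacs_linear <- function(nu, lambda, n, var_y, eta_mean = 0, eta_sd = 1) {
  stopifnot(length(nu) == length(lambda), length(nu) == length(n))
  w <- n / sum(n)  # n_g / N
  integrand <- function(eta) {
    # Rows index eta values, columns index groups
    yhat <- outer(eta, lambda) +
      matrix(nu, nrow = length(eta), ncol = length(nu), byrow = TRUE)
    ybar <- as.vector(yhat %*% w)  # size-weighted grand mean at each eta (assumption)
    as.vector((yhat - ybar)^2 %*% w) * dnorm(eta, eta_mean, eta_sd)
  }
  sqrt(integrate(integrand, -Inf, Inf)$value / var_y)
}

# Two equal-sized groups with an intercept difference of 0.3: about 0.15
f_two <- fmacs_linear(nu = c(0, 0.3), lambda = c(0.8, 0.8),
                      n = c(200, 200), var_y = 1)
# Three equal-sized groups with intercepts 0, 0.3, 0.6: about sqrt(0.06)
f_three <- fmacs_linear(nu = c(0, 0.3, 0.6), lambda = rep(0.8, 3),
                        n = rep(200, 3), var_y = 1)
c(f_two, f_three)
```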

fMACS

  • For both continuous and categorical items
  • Like Cohen’s f, 2 × fMACS = dMACS for two groups
  • f < .10 indicates a negligible effect size
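
The two-group relation can be checked with a short derivation. It is a sketch that assumes equal group sizes (\(n_1 = n_2 = N/2\)) and that both statistics are standardized by the same \(\mathrm{Var}(Y_j)\); it is not taken from the paper.

```latex
\[
\begin{aligned}
\bar{\hat{Y}}_j &= \tfrac{1}{2}\,(\hat{Y}_{j1} + \hat{Y}_{j2}),
\qquad
\hat{Y}_{jg} - \bar{\hat{Y}}_j = \pm\tfrac{1}{2}\,(\hat{Y}_{j1} - \hat{Y}_{j2}), \\
f^2_{\text{MACS},j}
  &= \frac{1}{N\,\mathrm{Var}(Y_j)} \cdot 2 \cdot \frac{N}{2}
     \int \tfrac{1}{4}\,(\hat{Y}_{j1} - \hat{Y}_{j2} \mid \eta)^2 f(\eta)\, d\eta \\
  &= \frac{1}{4}\,
     \frac{\int (\hat{Y}_{j1} - \hat{Y}_{j2} \mid \eta)^2 f(\eta)\, d\eta}{\mathrm{Var}(Y_j)}
   = \tfrac{1}{4}\, d^2_{\text{MACS},j}.
\end{aligned}
\]
```

Taking square roots gives \(f_{\text{MACS}} = d_{\text{MACS}}/2\) under these assumptions.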

Empirical Example 2: Alcohol Beliefs Scale

  • Data: 1,148 U.S. undergraduates (2 gender × 3 ethnicity groups), from Lui (2019)
  • Measure: College Life Alcohol Salience Scale, 15 items

Osberg et al. (2010), p. 6, Table 2

library(pinsearch)
# Specification search for partial invariance
# (`mod` is the model syntax and `dat` the data set, both defined beforehand)
ps <- pinSearch(
  mod, data = dat,
  group = "group",
  estimator = "MLR",
  missing = "fiml",
  type = "residual.covariances")
# Obtain omnibus fmacs effect size (for lavaan objects)
(f_omni <- pin_effsize(ps[[1]]))

  • 8 items with noninvariant intercepts
  • fMACS ranged from 0.06 to 0.15

Contrast: Main and Interaction

  • Like ANOVA, we can decompose the effect sizes into main and interaction effects
    • Not orthogonal with unbalanced sample sizes

fMACS effect sizes for the CLASS items

Item      Overall   Gender   Ethnicity   Gender × Ethnicity
class1    0.10      0.03     0.05        0.05
class2    0.10      0.08     0.06        0.06
class3    0.07      0.03     0.04        0.04
class4    0.11      0.04     0.09        0.05
class5    0.06      0.03     0.04        0.04
class7    0.08      0.00     0.07        0.00
class8    0.09      0.04     0.05        0.05
class14   0.15      0.04     0.07        0.07
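
As a small illustration of reading the table, the values can be entered into a data frame and screened against the .10 guideline mentioned earlier. This is only a sketch. The object names are hypothetical, and because the reported values are rounded to two decimals, items sitting exactly at 0.10 are borderline.

```r
# Values copied from the table above (rounded to two decimals)
fmacs_tab <- data.frame(
  item        = c("class1", "class2", "class3", "class4",
                  "class5", "class7", "class8", "class14"),
  overall     = c(0.10, 0.10, 0.07, 0.11, 0.06, 0.08, 0.09, 0.15),
  gender      = c(0.03, 0.08, 0.03, 0.04, 0.03, 0.00, 0.04, 0.04),
  ethnicity   = c(0.05, 0.06, 0.04, 0.09, 0.04, 0.07, 0.05, 0.07),
  interaction = c(0.05, 0.06, 0.04, 0.05, 0.04, 0.00, 0.05, 0.07),
  stringsAsFactors = FALSE
)

# Items whose overall fMACS is not below the .10 guideline
non_negligible <- fmacs_tab$item[fmacs_tab$overall >= 0.10]
non_negligible

# Rows ordered from largest to smallest overall effect size
fmacs_tab[order(-fmacs_tab$overall), ]
```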

Other Supported Features

  • Test-level fMACS (unweighted or weighted sums)

    pin_effsize(..., item_weights = rep(1, 15))
  • Bootstrap bias correction and confidence intervals

Conclusion

  • fMACS: A versatile effect size for quantifying noninvariance across multiple groups and variables.
  • Impact: Enhances transparency, replicability, and practical significance in MI research.

Questions?

Thank you for your attention!

References

Lui, P. P. (2019). College alcohol beliefs: Measurement invariance, mean differences, and correlations with alcohol use outcomes across sociodemographic groups. Journal of Counseling Psychology, 66(4), 487–495. https://doi.org/10.1037/cou0000338
Nye, C. D., Bradburn, J., Olenick, J., Bialko, C., & Drasgow, F. (2019). How big are my effects? Examining the magnitude of effect sizes in studies of measurement equivalence. Organizational Research Methods, 22(3), 678–709. https://doi.org/10.1177/1094428118761122
Nye, C. D., & Drasgow, F. (2011). Effect size indices for analyses of measurement equivalence: Understanding the practical importance of differences between groups. Journal of Applied Psychology, 96, 966–980. https://doi.org/10.1037/a0022955
Osberg, T. M., Atkins, L., Buchholz, L., Shirshova, V., Swiantek, A., Whitley, J., Hartman, S., & Oquendo, N. (2010). Development and validation of the College Life Alcohol Salience Scale: A measure of beliefs about the role of alcohol in college life. Psychology of Addictive Behaviors, 24(1), 1–12. https://doi.org/10.1037/a0018197
Zhang, G. (2022). A systematic review of measurement invariance research of the CES-D scale across gender [Unpublished master’s thesis, University of Southern California]. University of Southern California. https://doi.org/10.25549/usctheses-oUC111375873