A New Effect Size Statistic for Measurement Non-Invariance With Multiple Groups and Multiple Grouping Variables

University of Southern California

Mark Hok Chio Lai
Yichi Zhang
Meltem Ozcan
Winnie Wing-Yee Tse
Alexander Miles

Acknowledgements

This research is based on work supported by the National Science Foundation (Grant 2141790).

The paper has been accepted for publication in Structural Equation Modeling.

Measurement Invariance

\[ P(y_j \mid \eta, g) = P(y_j \mid \eta)\quad \text{for all }g, \eta \]

  • The same construct is measured in the same way across groups, time points, etc.

Violation of MI \(\to\) spurious/biased group differences

  • Most studies focus on binary conclusions (e.g., invariant vs. noninvariant)
  • Effect sizes are rarely discussed or reported

“Replication Crisis” in MI Studies

  • Zhang (2022): Synthesis of 32 studies evaluating gender invariance of the Center for Epidemiologic Studies Depression Scale (CES-D)
    • Drastic differences in MI findings across studies
    • All 20 items were found noninvariant in at least one study
    • Only a few provided sufficient information to compute effect sizes

Cohen’s \(d\) Analogue: \(d_\mathrm{MACS}\) (Nye & Drasgow, 2011)

\[ {d_\mathrm{MACS}}_{j, (g_1, g_2)} = \sqrt{\frac{\int (\hat Y_{j g_1} - \hat Y_{j g_2} | \eta)^2 f(\eta) d \eta}{\mathop{\mathrm{\mathrm{Var}}}(Y_j)}} \]

  • \(\hat Y\) = expected item score
  • Standardized mean difference in expected item scores due to noninvariance
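A minimal base-R sketch of the \(d_\mathrm{MACS}\) formula, assuming a continuous item with a linear expected score \(\hat Y_{jg} = \nu_{jg} + \lambda_{jg}\eta\), a normal \(f(\eta)\) (standard normal by default), and a known item SD. The helper `d_macs()` is hypothetical, not a pinsearch function; the integral is evaluated numerically with `integrate()`.

```r
# Hypothetical helper (not a pinsearch function): d_MACS for one item, two groups.
# Assumptions: expected item score nu_g + lambda_g * eta (linear, continuous item),
# f(eta) = normal with mean_eta and sd_eta (standard normal by default),
# and a known item SD (sd_y) used for standardizing.
d_macs <- function(nu, lambda, sd_y, mean_eta = 0, sd_eta = 1) {
  integrand <- function(eta) {
    yhat1 <- nu[1] + lambda[1] * eta   # expected item score, group g1
    yhat2 <- nu[2] + lambda[2] * eta   # expected item score, group g2
    (yhat1 - yhat2)^2 * dnorm(eta, mean = mean_eta, sd = sd_eta)
  }
  num <- integrate(integrand, lower = -Inf, upper = Inf)$value
  sqrt(num) / sd_y
}

d_macs(nu = c(0.3, 0), lambda = c(1, 1), sd_y = 1)    # intercept difference only
d_macs(nu = c(0.3, 0), lambda = c(1, 0.6), sd_y = 1)  # plus a loading difference
```

With equal loadings the expected-score gap is 0.3 at every \(\eta\), so \(d_\mathrm{MACS} \approx 0.3\); adding a loading difference of 0.4 raises it to about 0.5, because under a standard-normal \(\eta\) the integral equals \(0.3^2 + 0.4^2\).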

Multiple Groups and Grouping Variables

  • e.g., cross-cultural comparisons, ethnicity \(\times\) gender

Variability of Expected Item Score (\(\hat{Y}\)) at a Given \(\eta\).

Cohen’s \(f\) Analogue

For \(G\) groups, each of size \(n_g\), with total sample size \(N = \sum_g n_g\),

\[ \begin{aligned} {f^2_\text{MACS}}_j & = \frac{1}{N \operatorname{Var}(Y_j)} \sum_{g = 1}^{G} n_g \int_{-\infty}^\infty \left(\hat Y_{j g} - \bar{\hat Y}_j | \eta\right)^2 f(\eta) d \eta \\ & = \frac{\mathit{SD}^2_\text{noninvariance}}{\mathit{SD}^2_\text{item score}}. \end{aligned} \]

  • Expected (squared) deviation from grand mean due to noninvariance, in standardized units
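A minimal base-R sketch of \(f_\text{MACS}\) for one item, computing \(\mathit{SD}^2_\text{noninvariance}\) as the \(n_g\)-weighted mean squared deviation \(\frac{1}{N}\sum_g n_g \int (\hat Y_{jg} - \bar{\hat Y}_j | \eta)^2 f(\eta)\, d\eta\) and dividing by the item SD. Assumptions, all illustrative: linear expected item scores, a standard-normal \(f(\eta)\), a grand mean weighted by \(n_g\), and a known item SD. `f_macs()` is a hypothetical helper, not pinsearch's implementation.

```r
# Hypothetical helper (not part of pinsearch): f_MACS for one item across G groups.
# Assumptions: expected item score nu_g + lambda_g * eta (linear model),
# f(eta) = standard normal, grand mean weighted by n_g, known item SD.
f_macs <- function(nu, lambda, n, sd_y) {
  N <- sum(n)
  integrand <- function(eta) {
    # length(eta) x G matrix of expected item scores, one column per group
    yhat <- outer(eta, lambda) + rep(nu, each = length(eta))
    grand <- drop(yhat %*% n) / N         # n_g-weighted grand mean at each eta
    dev2 <- (yhat - grand)^2              # squared deviations from the grand mean
    drop(dev2 %*% n) / N * dnorm(eta)     # n_g-weighted mean deviation times f(eta)
  }
  sd2_noninv <- integrate(integrand, lower = -Inf, upper = Inf)$value
  sqrt(sd2_noninv) / sd_y                 # SD_noninvariance / SD_item score
}

f_macs(nu = c(0, 0.2, 0.4), lambda = c(1, 1, 1), n = c(100, 100, 100), sd_y = 1)
```

With three equal-sized groups whose intercepts are 0, 0.2, and 0.4 (equal loadings, unit item SD), the squared deviations from the grand mean of 0.2 average to 0.08/3, so \(f_\text{MACS} \approx 0.163\).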

\(f_\text{MACS}\)

  • For both continuous and categorical items
  • Like Cohen’s \(f\): \(2 f_\text{MACS} = d_\text{MACS}\) for two groups of equal size
  • \(f_\text{MACS} < .10\) indicates a negligible effect size
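Taking \(\mathit{SD}^2_\text{noninvariance}\) as the \(n_g\)-weighted mean squared deviation \(\frac{1}{N}\sum_g n_g \int (\hat Y_{jg} - \bar{\hat Y}_j | \eta)^2 f(\eta)\, d\eta\), the two-group identity follows in a few lines for \(G = 2\) groups of equal size (\(n_1 = n_2 = N/2\)), where the grand mean is the simple average of the two expected item scores:

\[ \begin{aligned} \bar{\hat Y}_j &= \tfrac{1}{2}\left(\hat Y_{j1} + \hat Y_{j2}\right) \quad\Rightarrow\quad \left(\hat Y_{jg} - \bar{\hat Y}_j\right)^2 = \tfrac{1}{4}\left(\hat Y_{j1} - \hat Y_{j2}\right)^2 \text{ for } g = 1, 2, \\ {f^2_\text{MACS}}_j &= \frac{1}{N \operatorname{Var}(Y_j)} \cdot 2 \cdot \frac{N}{2} \int_{-\infty}^\infty \tfrac{1}{4}\left(\hat Y_{j1} - \hat Y_{j2} | \eta\right)^2 f(\eta) d\eta \\ &= \frac{1}{4} \cdot \frac{\int_{-\infty}^\infty \left(\hat Y_{j1} - \hat Y_{j2} | \eta\right)^2 f(\eta) d\eta}{\operatorname{Var}(Y_j)} = \frac{1}{4}\,{d^2_\mathrm{MACS}}_{j,(1,2)}, \end{aligned} \]

which gives \(2 {f_\text{MACS}}_j = {d_\mathrm{MACS}}_{j,(1,2)}\). If the grand mean is instead weighted by unequal \(n_g\) (an assumption about the weighting, not stated on the slide), the same steps give \({f_\text{MACS}}_j = \frac{\sqrt{n_1 n_2}}{N}\,{d_\mathrm{MACS}}_{j,(1,2)}\).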

Empirical Example 2: Alcohol Beliefs Scale

  • Data: 1,148 U.S. undergraduates (2 gender × 3 ethnicity groups), from Lui (2019)
  • Measure: College Life Alcohol Salience Scale (CLASS; Osberg et al., 2010), 15 items

Osberg et al. (2010), p. 6, Table 2

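    library(pinsearch)

    # `mod` (lavaan model syntax for the CLASS items) and `dat` (a data
    # frame with a `group` column coding the gender x ethnicity groups)
    # are assumed to be defined already.

    # Specification search for partial invariance
    ps <- pinSearch(
      mod, data = dat,
      group = "group",
      estimator = "MLR",
      missing = "fiml",
      type = "residual.covariances"
    )

    # Obtain omnibus fMACS effect sizes (for lavaan objects);
    # ps[[1]] is the final partial invariance model
    (f_omni <- pin_effsize(ps[[1]]))
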
  • 8 items with noninvariant intercepts
  • \({f_\mathrm{MACS}}\) ranged from 0.06 to 0.15

Contrasts: Main and Interaction Effects

  • Like ANOVA, we can decompose the effect sizes into main and interaction effects
    • The components are not orthogonal when sample sizes are unbalanced
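A hypothetical sketch of what an ANOVA-style split could look like, under strong simplifying assumptions: intercept-only noninvariance (equal loadings, so each group's deviation from the grand mean is the same at every \(\eta\) and the integral over \(f(\eta)\) drops out), a 2 (gender) × 3 (ethnicity) design with equal cell sizes, and a known item SD. `decompose_f()` and the example intercepts are illustrative only; this is not necessarily how pinsearch defines the contrasts.

```r
# Hypothetical helper: ANOVA-style decomposition of f_MACS^2 for one item.
# Assumptions: intercept-only noninvariance, equal cell sizes, known item SD.
decompose_f <- function(nu, sd_y) {
  # nu: 2 x 3 matrix of intercepts (rows = gender, columns = ethnicity)
  grand <- mean(nu)                        # grand mean (equal cell sizes)
  a <- rowMeans(nu) - grand                # gender main effects
  b <- colMeans(nu) - grand                # ethnicity main effects
  ab <- nu - grand - outer(a, b, "+")      # interaction residuals
  k <- length(nu)                          # number of cells (6)
  f2 <- c(
    overall     = sum((nu - grand)^2) / k,
    gender      = sum(outer(a, rep(1, ncol(nu)))^2) / k,
    ethnicity   = sum(outer(rep(1, nrow(nu)), b)^2) / k,
    interaction = sum(ab^2) / k
  ) / sd_y^2
  sqrt(f2)                                 # report on the f (not f^2) scale
}

# Illustrative (made-up) intercepts for the six cells
nu_cells <- matrix(c(0,   0.1, 0.2,
                     0.1, 0.3, 0.2), nrow = 2, byrow = TRUE)
res <- decompose_f(nu_cells, sd_y = 1)
res
```

With equal cell sizes the squared components add up exactly to the squared overall value; with unbalanced cells that additivity is lost, which is the non-orthogonality noted above.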
\({f_\mathrm{MACS}}\) effect sizes for the CLASS items

  Item      Overall   Gender   Ethnicity   Gender × Ethnicity
  --------- --------- -------- ----------- --------------------
  class1    0.10      0.03     0.05        0.05
  class2    0.10      0.08     0.06        0.06
  class3    0.07      0.03     0.04        0.04
  class4    0.11      0.04     0.09        0.05
  class5    0.06      0.03     0.04        0.04
  class7    0.08      0.00     0.07        0.00
  class8    0.09      0.04     0.05        0.05
  class14   0.15      0.04     0.07        0.07
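As a rough check on the table, the sketch below squares the rounded values. In an ANOVA-style decomposition with orthogonal components, the squared gender, ethnicity, and interaction components would sum to the squared overall value. Here they do not (e.g., class14: 0.15² = 0.0225 versus 0.04² + 0.07² + 0.07² = 0.0114), consistent with the components not being orthogonal under unbalanced sample sizes. The values are rounded to two decimals, so this is only approximate.

```r
# Values copied from the table above (rounded to two decimals)
fmacs_tab <- data.frame(
  item        = c("class1", "class2", "class3", "class4",
                  "class5", "class7", "class8", "class14"),
  overall     = c(0.10, 0.10, 0.07, 0.11, 0.06, 0.08, 0.09, 0.15),
  gender      = c(0.03, 0.08, 0.03, 0.04, 0.03, 0.00, 0.04, 0.04),
  ethnicity   = c(0.05, 0.06, 0.04, 0.09, 0.04, 0.07, 0.05, 0.07),
  interaction = c(0.05, 0.06, 0.04, 0.05, 0.04, 0.00, 0.05, 0.07)
)
# Under orthogonal components, sum_sq_components would equal overall_sq
fmacs_tab$overall_sq <- fmacs_tab$overall^2
fmacs_tab$sum_sq_components <- with(fmacs_tab, gender^2 + ethnicity^2 + interaction^2)
fmacs_tab[, c("item", "overall_sq", "sum_sq_components")]
```

The gap runs in both directions (class2's components sum to more than its overall value, class14's to less), so it is not a simple matter of one part being left out.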

Other Supported Features

  • Test-level \({f_\mathrm{MACS}}\) (unweighted or weighted sums)

    pin_effsize(..., item_weights = rep(1, 15))
  • Bootstrap bias correction and confidence intervals
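For the test-level option, a sketch under the same linear, standard-normal assumptions: the expected weighted sum score in group \(g\) is itself linear in \(\eta\), with intercept \(\sum_j w_j \nu_{jg}\) and loading \(\sum_j w_j \lambda_{jg}\), and \(E[(a + b\eta)^2] = a^2 + b^2\) when \(\eta \sim N(0, 1)\), so no numerical integration is needed. `test_f_macs()` and its arguments are hypothetical (not pinsearch's `item_weights` implementation), the grand mean is assumed to be \(n_g\)-weighted, and the SD of the sum score is assumed known.

```r
# Hypothetical helper: test-level f_MACS for a weighted sum of J items.
# Assumptions: linear expected item scores nu[j, g] + lambda[j, g] * eta,
# eta ~ N(0, 1), n_g-weighted grand mean, and a known SD of the sum score.
test_f_macs <- function(nu, lambda, n, w, sd_total) {
  # nu, lambda: J x G matrices (rows = items, columns = groups)
  p <- n / sum(n)                          # group proportions n_g / N
  nu_t <- drop(crossprod(w, nu))           # intercept of expected sum score, per group
  lambda_t <- drop(crossprod(w, lambda))   # loading of expected sum score, per group
  dnu <- nu_t - sum(p * nu_t)              # deviations from the weighted grand mean
  dlam <- lambda_t - sum(p * lambda_t)
  # For eta ~ N(0, 1): E[(dnu + dlam * eta)^2] = dnu^2 + dlam^2
  sqrt(sum(p * (dnu^2 + dlam^2))) / sd_total
}

nu <- matrix(c(0.2, 0,
               0,   0.2), nrow = 2, byrow = TRUE)  # item 1 favors group 1, item 2 group 2
lambda <- matrix(1, nrow = 2, ncol = 2)
test_f_macs(nu, lambda, n = c(100, 100), w = c(1, 1), sd_total = 1)  # unweighted sum
test_f_macs(nu, lambda, n = c(100, 100), w = c(1, 0), sd_total = 1)  # item 1 only
```

Item-level noninvariance in opposite directions can cancel in the sum: here the unweighted total gives \(f_\text{MACS} = 0\), while scoring only the first item gives 0.1 (with a unit sum-score SD).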

Conclusion

  • \({f_\mathrm{MACS}}\): A versatile effect size for quantifying noninvariance across multiple groups and grouping variables.
  • Impact: Supports transparent reporting, replicability, and judgments of practical significance in MI research.

Questions?

Thank you for your attention!

References

Lui, P. P. (2019). College alcohol beliefs: Measurement invariance, mean differences, and correlations with alcohol use outcomes across sociodemographic groups. Journal of Counseling Psychology, 66(4), 487–495. https://doi.org/10.1037/cou0000338
Nye, C. D., Bradburn, J., Olenick, J., Bialko, C., & Drasgow, F. (2019). How big are my effects? Examining the magnitude of effect sizes in studies of measurement equivalence. Organizational Research Methods, 22(3), 678–709. https://doi.org/10.1177/1094428118761122
Nye, C. D., & Drasgow, F. (2011). Effect size indices for analyses of measurement equivalence: Understanding the practical importance of differences between groups. Journal of Applied Psychology, 96, 966–980. https://doi.org/10.1037/a0022955
Osberg, T. M., Atkins, L., Buchholz, L., Shirshova, V., Swiantek, A., Whitley, J., Hartman, S., & Oquendo, N. (2010). Development and validation of the College Life Alcohol Salience Scale: A measure of beliefs about the role of alcohol in college life. Psychology of Addictive Behaviors, 24(1), 1–12. https://doi.org/10.1037/a0018197
Zhang, G. (2022). A systematic review of measurement invariance research of the CES-D scale across gender [Unpublished master’s thesis, University of Southern California]. University of Southern California. https://doi.org/10.25549/usctheses-oUC111375873