Unit-Specific vs. Population-Average Models

Author

Mark Lai

Published

December 28, 2020

One thing I have always felt uncomfortable with in multilevel modeling (MLM) is the distinction between a unit-specific (US), or subject-specific, model and a population-average (PA) model. I've come across it several times, but for some reason I never made an effort to fully understand it. I recently came across this paper by Harring and Blozis again, which I had read before, and thought: why not try to really understand the relationship between the coefficient estimates in a US model and those in a PA model in the context of generalized linear mixed-effects models (GLMM)? So here is my learning note.

library(tidyverse)
library(modelsummary)
library(glmmTMB)
library(geepack)

While MLM/GLMM is a US model, which models the associations between predictors and the outcome within each cluster, PA models are popular in some areas of research, most commonly estimated with generalized estimating equations (GEE). In linear models, the fixed-effect coefficients of a US model are the same as the coefficients of a PA model, but in generalized linear models with nonlinear link functions, they are not. This is because many generalized linear models fix the variance of the latent continuous response at a constant. For example, in a single-level logistic model and in a GEE model with a logit link, the latent response $y^*$ has a variance of $\pi^2/3$, whereas in a two-level random-intercept logistic model, its total variance is $\pi^2/3 + \tau_0^2$, where $\tau_0^2$ is the random-intercept variance. Because the coefficients are expressed in units of the latent response, the US and PA coefficients are on different scales. But how exactly do they differ? I will explore four link functions: identity, log, probit, and logit. But first, some notation.
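To see the scale difference concretely, here is a minimal simulation sketch in base R. It assumes a hypothetical two-level random-intercept logistic model with made-up values $\gamma_0 = 0$, $\gamma_1 = 1$, and $\tau_0 = 2$, and generates the data through the latent response. A single-level `glm()` that ignores clustering serves as a stand-in for the PA estimate (its point estimates coincide with those of GEE under an independence working correlation). The rescaling factor $\sqrt{(\pi^2/3) / (\pi^2/3 + \tau_0^2)}$ is only a rough heuristic based on the latent-variance argument above, not an exact result.

```r
set.seed(2020)

# Hypothetical simulation settings (made-up values for illustration only)
n_clusters <- 500
n_per      <- 20
gamma0     <- 0   # US intercept
gamma1     <- 1   # US slope, on the logit scale
tau0       <- 2   # SD of the random intercepts

cluster <- rep(seq_len(n_clusters), each = n_per)
x  <- rnorm(n_clusters * n_per)
u0 <- rnorm(n_clusters, sd = tau0)[cluster]   # one intercept per cluster

# Latent-response form of the two-level logistic model:
# y* = gamma0 + gamma1 * x + u0 + e, with e ~ standard logistic
e      <- rlogis(n_clusters * n_per)
y_star <- gamma0 + gamma1 * x + u0 + e
y      <- as.integer(y_star > 0)

var(e)               # should be close to pi^2 / 3 (about 3.29)
pi^2 / 3 + tau0^2    # total latent variance given x in the US model

# Single-level logistic regression ignoring clustering estimates the PA slope
fit_pa   <- glm(y ~ x, family = binomial)
slope_pa <- unname(coef(fit_pa)["x"])

# Rough heuristic: rescale the US slope by the ratio of latent SDs
slope_heuristic <- gamma1 * sqrt((pi^2 / 3) / (pi^2 / 3 + tau0^2))

c(US = gamma1, PA_glm = slope_pa, PA_heuristic = slope_heuristic)
```

The `glm()` slope should come out noticeably below the US value of 1 and in the neighborhood of the heuristic value; the exact numbers depend on the seed.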

Model Notations

While in actual modeling the distributional assumptions for the response variable matter (e.g., normal, Poisson), the comparison of US vs. PA mainly concerns the mean of the outcome and the link function. For all models, the random effects are assumed to be normally distributed.

Conditional (US) Model

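$$
\begin{aligned}
E(y_{ij} \mid u_j) &= \mu_{ij} \\
h(\mu_{ij}) &= x_{ij}\gamma + z_{ij}u_j
\end{aligned}
$$

where $h(\cdot)$ is the link function, and $x_{ij}$ and $z_{ij}$ are the fixed and random covariates for the $i$th person in the $j$th cluster. The distributional assumption is $u_j \sim N_q(0, G)$.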
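A minimal base-R sketch of simulating from this conditional model may help fix the notation. Purely for illustration, it assumes a random intercept and a random slope ($q = 2$, so $z_{ij} = (1, x_{ij})$, the same as $x_{ij}$), a made-up $G$, a log link, and a Poisson outcome; the Cholesky factor of $G$ is used to draw $u_j \sim N_q(0, G)$.

```r
set.seed(1228)

# Hypothetical settings: random intercept and random slope (q = 2)
n_clusters <- 200
n_per      <- 10
gamma      <- c(-0.5, 1)                      # fixed effects
G          <- matrix(c(1.0, 0.3,
                       0.3, 0.5), nrow = 2)   # made-up covariance of u_j

cluster <- rep(seq_len(n_clusters), each = n_per)
x <- rnorm(n_clusters * n_per)
X <- cbind(1, x)   # fixed-effect covariates x_ij
Z <- cbind(1, x)   # random-effect covariates z_ij

# u_j ~ N_q(0, G): rows of iid standard normals times the upper Cholesky factor
U <- matrix(rnorm(n_clusters * 2), ncol = 2) %*% chol(G)

# Linear predictor h(mu_ij) = x_ij gamma + z_ij u_j, computed row by row
eta <- drop(X %*% gamma) + rowSums(Z * U[cluster, ])

# Conditional mean under the assumed log link, then a Poisson outcome
mu <- exp(eta)
y  <- rpois(length(mu), mu)
```

Swapping `exp()` for another inverse link (e.g., `pnorm()` or `plogis()`) and `rpois()` for `rbinom()` gives the corresponding probit or logit model.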

Marginal (PA) Model

Here, one instead models the marginal mean:

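$$
\begin{aligned}
E(y_{ij}) &= E[E(y_{ij} \mid u_j)] = \mu_{ij}^{PA} \\
h(\mu_{ij}^{PA}) &= x_{ij}\gamma^{PA}
\end{aligned}
$$

where the outer expectation is taken over the distribution of $u_j$. Matching the marginal mean implied by the US model to the PA model gives $h^{-1}(x_{ij}\gamma^{PA}) = E[h^{-1}(x_{ij}\gamma + z_{ij}u_j)]$, so the two sets of formulas above can be used to find the transformation from the unit-specific coefficients, $\gamma$, to the population-average coefficients, $\gamma^{PA}$.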
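As a numerical sketch of these formulas, assume a random-intercept-only model ($z_{ij} = 1$, $u_j \sim N(0, \tau^2)$) with one covariate and made-up values $\gamma_0 = -0.3$, $\gamma_1 = 1$, $\tau = 1.5$. For each link, the code integrates the conditional mean over $u_j$ with base R's `integrate()`, maps the result back with $h$, and fits a straight line in $x$ to read off $\gamma^{PA}$. For the identity, log, and probit links the result is exactly linear in $x$ (for example, $E[\Phi(a + u_j)] = \Phi(a / \sqrt{1 + \tau^2})$), whereas for the logit link the straight line is only an approximation.

```r
# Marginal (PA) mean implied by a random-intercept US model:
#   E(y_ij) = E_u[ h^{-1}(gamma0 + gamma1 * x + u) ],  u ~ N(0, tau^2)
marginal_mean <- function(x, gamma0, gamma1, tau, inv_link) {
  integrand <- function(u) inv_link(gamma0 + gamma1 * x + u) * dnorm(u, sd = tau)
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Hypothetical values chosen for illustration only
gamma0 <- -0.3
gamma1 <- 1
tau    <- 1.5
x_grid <- seq(-2, 2, by = 0.5)

# Inverse link h^{-1} and link h for the four cases
links <- list(
  identity = list(inv = identity, link = identity),
  log      = list(inv = exp,      link = log),
  probit   = list(inv = pnorm,    link = qnorm),
  logit    = list(inv = plogis,   link = qlogis)
)

# For each link: marginal mean on the grid -> back to the link scale -> line in x
pa_coefs <- t(sapply(links, function(l) {
  eta_pa <- sapply(x_grid, function(x) {
    l$link(marginal_mean(x, gamma0, gamma1, tau, l$inv))
  })
  coef(lm(eta_pa ~ x_grid))
}))
colnames(pa_coefs) <- c("gamma0_PA", "gamma1_PA")
pa_coefs
```

With these values, the identity link should return the US coefficients unchanged, the log link should shift only the intercept by $\tau^2/2$, and the probit link should shrink both coefficients by $\sqrt{1 + \tau^2}$; the logit slope should also be attenuated, though not by an exact closed-form factor.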

Footnotes

  1. Snijders & Bosker (2012), chapter 17.

  2. https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/