Structural Equation Modeling | Exercises 2
1 What we are going to cover
- Ex.1 – MIMIC model
- Ex.2 – Mediation Analysis
2 Data
The data set used throughout is the European Social Survey ESS4-2008 Edition 4.5 was released on 1 December 2018. We will restrict the analysis to the Belgian case. Each line in the data set represents a Belgian respondent. The full dataset an documentation can be found on the ESS website
Codebook:
gvslvol Standard of living for the old, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvslvue Standard of living for the unemployed, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvhlthc Health care for the sick, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvcldcr Child care services for working parents, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvjbevn Job for everyone, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvpdlwk Paid leave from work to care for sick family, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
sbstrec Social benefits/services place too great strain on economy (1 Agree strongly - 5 Disagree strongly)
sbbsntx Social benefits/services cost businesses too much in taxes/charges (1 Agree strongly - 5 Disagree strongly)
sbprvpv Social benefits/services prevent widespread poverty (1 Agree strongly - 5 Disagree strongly)
sbeqsoc Social benefits/services lead to a more equal society (1 Agree strongly - 5 Disagree strongly)
sbcwkfm Social benefits/services make it easier to combine work and family (1 Agree strongly - 5 Disagree strongly)
sblazy Social benefits/services make people lazy (1 Agree strongly - 5 Disagree strongly)
sblwcoa Social benefits/services make people less willing care for one another (1 Agree strongly - 5 Disagree strongly)
sblwlka Social benefits/services make people less willing look after themselves/family (1 Agree strongly - 5 Disagree strongly)
In addition, we will use some other variables
agea Respondent’s age
eduyrs Years of full-time education completed
gndr Gender (1 Male, 2 Female)
hinctnta Household’s total net income, all sources (Deciles of the actual household income range in Belgium)
gincdif Government should reduce differences in income levels (1 Agree strongly - 5 Disagree strongly)
dfincac Large differences in income acceptable to reward talents and efforts (1 Agree strongly - 5 Disagree strongly)
smdfslv For fair society, differences in standard of living should be small (1 Agree strongly - 5 Disagree strongly)
3 Environment preparation
First, let’s load the necessary packages to load, manipulate, visualize and analyse the data.
# Uncomment this once if you need to install the packages on your system
### DATA MANIPULATION ###
# install.packages("haven") # data import from spss
# install.packages("dplyr") # data manipulation
# install.packages("psych") # descriptives
# install.packages("stringr") # string manipulation
### MODELING ###
# install.packages("lavaan") # SEM modelling
### VISUALIZATION ###
# install.packages("tidySEM") # plotting SEM models
# Load the packages
### DATA MANIPULATION ###
library("haven")
library("dplyr")
library("psych")
library('stringr')
### MODELING ###
library("lavaan")
### VISUALIZATION ###
library("tidySEM")
4 Ex.1 - MIMIC model
- Fit the a measurement model with both welfare support and egalitarianism and review the fit statistics. Also review the parameter estimates. How close are they to the population values?
- Predict welfare support by adding the gender covariate and assess the changes in fit statistics and parameter estimates.
- Add in succession age, income, and education and again see if fit statistics or parameter estimates change.
- Is there a trend for the fit statistics to change (improve or deteriorate) when including covariates?
- Calculate the degrees of freedom for each of these models manually. Explain the changes in degrees of freedom.
- Create an R function for the degrees of freedom calculations (OPTIONAL)
<- haven::read_sav("https://github.com/albertostefanelli/SEM_labs/raw/master/data/ESS4_belgium.sav")
ess_df
<- '
model_ws_eg ## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
'
<- cfa(model_ws_eg, # model formula
fit_ws_eg data=ess_df # data frame
)
summary(fit_ws_eg, standardized=TRUE)
lavaan 0.6-10 ended normally after 36 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Used Total
Number of observations 1735 1760
Model Test User Model:
Test statistic 40.125
Degrees of freedom 8
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.000 1.250 0.845
gvslvue 0.580 0.046 12.504 0.000 0.725 0.380
gvhlthc 0.858 0.056 15.432 0.000 1.072 0.729
egual =~
gincdif 1.000 0.684 0.645
dfincac -0.622 0.059 -10.481 0.000 -0.426 -0.404
smdfslv 0.877 0.082 10.731 0.000 0.600 0.615
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp ~~
egual -0.225 0.032 -6.986 0.000 -0.263 -0.263
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.gvslvol 0.624 0.097 6.438 0.000 0.624 0.285
.gvslvue 3.126 0.112 27.986 0.000 3.126 0.856
.gvhlthc 1.013 0.078 13.040 0.000 1.013 0.468
.gincdif 0.656 0.048 13.696 0.000 0.656 0.584
.dfincac 0.928 0.036 25.612 0.000 0.928 0.837
.smdfslv 0.592 0.038 15.421 0.000 0.592 0.622
welf_supp 1.563 0.118 13.211 0.000 1.000 1.000
egual 0.468 0.053 8.913 0.000 1.000 1.000
<- '
model_ws_eg_g ## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
welf_supp ~ gndr
'
<- cfa(model_ws_eg_g, # model formula
fit_ws_eg_g data=ess_df # data frame
)graph_sem(fit_ws_eg_g, standardized=TRUE)
<- '
model_ws_eg_ga ## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
welf_supp ~ gndr + agea
'
<- cfa(model_ws_eg_ga, # model formula
fit_ws_eg_ga data=ess_df # data frame
)
<- '
model_ws_eg_gai ## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
welf_supp ~ gndr + agea + hinctnta
'
<- cfa(model_ws_eg_gai, # model formula
fit_ws_eg_gai data=ess_df # data frame
)
<- '
model_ws_eg_gaie ## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
welf_supp ~ gndr + agea + hinctnta + eduyrs
'
<- cfa(model_ws_eg_gaie, # model formula
fit_ws_eg_gaie data=ess_df # data frame
)
<- fitMeasures(fit_ws_eg, c("logl",
fitm_model_ws_eg "AIC",
"BIC",
"chisq",
"df",
"pvalue",
"cfi",
"tli",
"rmsea"),
output = "matrix")
<- fitMeasures(fit_ws_eg_g, c("logl",
fitm_model_ws_eg_g "AIC",
"BIC",
"chisq",
"df",
"pvalue",
"cfi",
"tli",
"rmsea"),
output = "matrix")
<- fitMeasures(fit_ws_eg_ga, c("logl",
fitm_model_ws_eg_ga "AIC",
"BIC",
"chisq",
"df",
"pvalue",
"cfi",
"tli",
"rmsea"),
output = "matrix")
<- fitMeasures(fit_ws_eg_gai, c("logl",
fitm_model_ws_eg_gai "AIC",
"BIC",
"chisq",
"df",
"pvalue",
"cfi",
"tli",
"rmsea"),
output = "matrix")
<- fitMeasures(fit_ws_eg_gaie, c("logl",
fitm_model_ws_eg_gaie "AIC",
"BIC",
"chisq",
"df",
"pvalue",
"cfi",
"tli",
"rmsea"),
output = "matrix")
data.frame(Fit=rownames(fitm_model_ws_eg),
"Measurement" = round(fitm_model_ws_eg[,1],2),
"plus_gender" = round(fitm_model_ws_eg_g[,1],2),
"plus_age" = round(fitm_model_ws_eg_ga[,1],2),
"plus_income" = round(fitm_model_ws_eg_gai[,1],2),
"plus_education" = round(fitm_model_ws_eg_gaie[,1],2)
)
Fit Measurement plus_gender plus_age plus_income plus_education
logl logl -16610.17 -16638.80 -16628.25 -14843.65 -14831.68
aic aic 33246.35 33303.60 33284.50 29717.30 29695.35
bic bic 33317.31 33374.57 33360.93 29797.51 29780.90
chisq chisq 40.13 114.35 165.95 179.85 218.88
df df 8.00 14.00 19.00 24.00 29.00
pvalue pvalue 0.00 0.00 0.00 0.00 0.00
cfi cfi 0.98 0.94 0.91 0.90 0.88
tli tli 0.96 0.91 0.87 0.86 0.84
rmsea rmsea 0.05 0.06 0.07 0.06 0.06
Degrees of freedom calculation:
- pieces of information (observed variables) = \(p(p+1)/2=21\)
- measurement part = n factor loadings + n residual variances.
- structural part = n regressions + n residual factor variance.
- exogenous variables = n variances + n covariances.
- df = pieces of information - (measurement part + structural part + exogenous variables).
# let's build a function to calculate the df for our models
<- function(obs_var=obs_var, # n observed variables
degres_of_freedom loads=loads, # n estimated factors loadings
regression=regression, # n regression
latent_var=latent_var, # n latent variables
covariance_latent=TRUE, # covariance latent vars
exogenous=exogenous){ # n exogenous
# next function returns n combination of 2 or more variables
# we use it to calculate the n covariance of the exogenous variables
= function(n, x) {
comb if(n==0){return(0)} # if we have no exogenous variables return 0
if(n==1){return(0)} # if we have 1 exogenous variable return 0
else{
factorial(n) / factorial(n-x) / factorial(x)}
}
# 1. pieces of information (observed variables)
<- obs_var*(obs_var + 1)/2
info #2. measurement part = n factor loadings + n residual variances.
<- (loads) + obs_var
measurement #3. structural part = n regressions + n residual factor variance.
<- ifelse(regression>0,0,regression) + latent_var
structural # 4. exogenous variables = n variances + n covariances.
# first, let's calculate the covariances
<- NA
cov # lavaan removes by default the LV covariance when you include a regression
# this code takes into account lavaan behaviour
if(covariance_latent==TRUE){
<-comb(latent_var, 2)}else{cov<-0}
cov# second, let's get the n variances and n covariances for our exogenous vars.
<- exogenous + comb(exogenous, 2)
exogenous_var # finally, let's calculate the df
= info - (measurement + structural + exogenous_var+ cov)
df return(df)
}
<- degres_of_freedom(obs_var=6,
df_model_ws_eg loads=4,
regression=0,
latent_var=2,
covariance_latent=TRUE,
exogenous=0)
<- degres_of_freedom(obs_var=7,
df_model_ws_eg_g loads=4,
regression=1,
latent_var=2,
covariance_latent=FALSE,
exogenous=1)
<- degres_of_freedom(obs_var=8,
df_model_ws_eg_ga loads=4,
regression=2,
latent_var=2,
covariance_latent=FALSE,
exogenous=2)
<- degres_of_freedom(obs_var=9,
df_model_ws_eg_gai loads=4,
regression=3,
latent_var=2,
covariance_latent=FALSE,
exogenous=3)
<- degres_of_freedom(obs_var=10,
df_model_ws_eg_gaie loads=4,
regression=4,
latent_var=2,
covariance_latent=FALSE,
exogenous=4)
data.frame(Models = c("ws_eg","ws_eg_g","ws_eg_ga","ws_eg_gai","ws_eg_gaie"),
df=c(df_model_ws_eg,
df_model_ws_eg_g,
df_model_ws_eg_ga,
df_model_ws_eg_gai, df_model_ws_eg_gaie))
Models df
1 ws_eg 8
2 ws_eg_g 14
3 ws_eg_ga 19
4 ws_eg_gai 24
5 ws_eg_gaie 29
5 Ex.2 – Mediation Analysis
- Fit a model where egalitarianism (gincdif, dfincac, smdfslv) mediates the relationship between age (agea), education (eduyrs), gender (gndr), income (hinctnta) and welfare support (gvslvol, gvslvue, gvhlthc)
- Assess if the model fits the data well
- Inspect the regression parameters and the R-squared values of the latent variables.
- Manually calculate the indirect effect of age via egalitarianism \(a \times b\) and its standard error (\(\sigma_{a \times b}\)) using Sobel’s formula:
- Indirect effect = \(a \times b\).
- \(\sigma_{a \times b} = \sqrt{b^2 \times \sigma_{a}^2 + a^2 \times \sigma_{b}^2}\).
- Manually calculate the total effect of age \(c' + (a \times b)\) on welfare support
- Ask lavaan to calculate these indirect and total effects. Compare to your own solution
- Interpret the results. Does egalitarianism mediates age? Can you interpret this result causally ?
<- '
model_mediation ## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
## Direct effect(s) ##
welf_supp ~ c*agea + c1*eduyrs + c2*hinctnta + c3*gndr
## Mediator ##
# Path A
egual ~ a*agea
egual ~ a1*eduyrs
egual ~ a2*hinctnta
egual ~ a3*gndr
# Path B
welf_supp ~ b*egual
## Indirect effect (a*b) ##
ab_age := a*b
a1b_edu := a1*b
a2b_inco := a2*b
a3b_gndr := a3*b
## Total effect ##
total_age := c + (a*b)
total1_edu := c1 + (a1*b)
total2_inco := c2 + (a2*b)
total3_gndr := c3 + (a3*b)
'
<- cfa(model_mediation, # model formula
fit_mediation data=ess_df # data frame
)
# Fit indices
<- fitMeasures(fit_mediation, c("logl",
fitm_model_mediation"AIC",
"BIC",
"chisq",
"df",
"pvalue",
"cfi",
"tli",
"rmsea"),
output = "matrix")
fitm_model_mediation
logl -14776.024
aic 29594.048
bic 29706.328
chisq 107.575
df 24.000
pvalue 0.000
cfi 0.948
tli 0.915
rmsea 0.047
The model fits the data fairly well. Let’s now move to regression estimates and \(R^2\).
<- table_results(fit_mediation,
tidy_results columns = c("label",
"est_sig"),
digits = 2,
)
%>% filter(str_detect(label,
tidy_results "welf_supp.ON|egual.ON") # | means OR
)
label est_sig
1 welf_supp.ON.agea 0.01***
2 welf_supp.ON.eduyrs 0.02*
3 welf_supp.ON.hinctnta 0.01
4 welf_supp.ON.gndr -0.03
5 egual.ON.agea -0.00*
6 egual.ON.eduyrs 0.02***
7 egual.ON.hinctnta 0.03***
8 egual.ON.gndr -0.09*
9 welf_supp.ON.egual -0.49***
<- round(inspect(fit_mediation,"r2"),3)
r2 r2
gvslvol gvslvue gvhlthc gincdif dfincac smdfslv welf_supp egual
0.775 0.142 0.511 0.416 0.151 0.395 0.084 0.066
Remember that the \(R^2\) is calculated as 1 minus the ratio of residual variance (appearing in the parameter estimates, although it might be a fixed value) to the model-implied total variance of that variable. In other words, the \(R^2\) is the proportion of variance of the latent response that was explained by the predictors.
Next, let’s calculate the direct and indirect effects manually
## Coefficients ##
<- inspect(fit_mediation,"est")$beta["egual","agea"]
a <- inspect(fit_mediation,"est")$beta["welf_supp","egual"]
b ## Standard Errors ##
<- inspect(fit_mediation,"se")$beta["egual","agea"]
sigma_a <- inspect(fit_mediation,"se")$beta["welf_supp","egual"]
sigma_b ## Manual calculation of indirect effect a x b ##
* b a
[1] 0.001457582
## Manual calculation of the sigma a x b ##
sqrt(a^2*sigma_b^2 + b^2*sigma_a^2)
[1] 0.0006588107
## Manual calculation of the total effect (a x b) + c' ##
* b + inspect(fit_mediation,"est")$beta["welf_supp","agea"] a
[1] 0.01047894
Let’s use tidy SEM to extract the direct and indirect effects calculated by lavaan
<- table_results(fit_mediation,
tidy_results columns = c("label",
"est_sig", # beta
"se" # sigma
),digits = 2,
)
%>% filter(str_detect(label,
tidy_results "ab_age|total_age") # | means OR
)
label est_sig se
1 ab_age.:=.a*b 0.00* 0.00
2 total_age.:=.c+(a*b) 0.01*** 0.00
Unsurprisingly, they are the same as the one that we calculated manually.