Structural Equation Modeling | Lab Session 2
1 What we are going to cover
- MIMIC models
- Mediation Analysis
2 Data
The data set used throughout is the European Social Survey ESS4-2008, Edition 4.5, released on 1 December 2018. We will restrict the analysis to the Belgian case: each row in the data set represents a Belgian respondent. The full data set and documentation can be found on the ESS website.
Codebook:
gvslvol Standard of living for the old, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvslvue Standard of living for the unemployed, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvhlthc Health care for the sick, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvcldcr Child care services for working parents, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvjbevn Job for everyone, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvpdlwk Paid leave from work to care for sick family, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
sbstrec Social benefits/services place too great strain on economy (1 Agree strongly - 5 Disagree strongly)
sbbsntx Social benefits/services cost businesses too much in taxes/charges (1 Agree strongly - 5 Disagree strongly)
sbprvpv Social benefits/services prevent widespread poverty (1 Agree strongly - 5 Disagree strongly)
sbeqsoc Social benefits/services lead to a more equal society (1 Agree strongly - 5 Disagree strongly)
sbcwkfm Social benefits/services make it easier to combine work and family (1 Agree strongly - 5 Disagree strongly)
sblazy Social benefits/services make people lazy (1 Agree strongly - 5 Disagree strongly)
sblwcoa Social benefits/services make people less willing to care for one another (1 Agree strongly - 5 Disagree strongly)
sblwlka Social benefits/services make people less willing to look after themselves/family (1 Agree strongly - 5 Disagree strongly)
In addition, we will use some other variables:
agea Respondent’s age
eduyrs Years of full-time education completed
gndr Gender (1 Male, 2 Female)
hinctnta Household’s total net income, all sources (Deciles of the actual household income range in Belgium)
gincdif Government should reduce differences in income levels (1 Agree strongly - 5 Disagree strongly)
dfincac Large differences in income acceptable to reward talents and efforts (1 Agree strongly - 5 Disagree strongly)
smdfslv For fair society, differences in standard of living should be small (1 Agree strongly - 5 Disagree strongly)
3 Environment preparation
First, let’s load the necessary packages to load, manipulate, visualize and analyse the data.
# Uncomment this once if you need to install the packages on your system
### DATA MANIPULATION ###
# install.packages("haven") # data import from spss
# install.packages("dplyr") # data manipulation
# install.packages("psych") # descriptives
# install.packages("stringr") # string manipulation
# ### MODELING ###
# install.packages("lavaan") # SEM modelling
# ### VISUALIZATION ###
# install.packages("tidySEM") # plotting SEM models
# Load the packages
### DATA MANIPULATION ###
library("haven")
library("dplyr")
library("psych")
library('stringr')
### MODELING ###
library("lavaan")
### VISUALIZATION ###
library("tidySEM")
4 Data exploration
It is good practice to check that everything is in order and to make sense of the data that we are going to analyse. Since we added a few variables to the data set, we will check that everything is in order.
ess_df <- haven::read_sav("https://github.com/albertostefanelli/SEM_labs/raw/master/data/ESS4_belgium.sav")
ess_df_selected <- ess_df %>% select(
  ## Egalitarianism ##
  gincdif,
  dfincac,
  smdfslv,
  ## Demographics ##
  agea,
  eduyrs,
  gndr,
  hinctnta
)
descriptive_ess <- as.data.frame(psych::describe(ess_df_selected))
descriptive_ess <- dplyr::select(descriptive_ess,
  n,
  mean,
  sd,
  median,
  min,
  max,
  skew,
  kurtosis)
descriptive_ess
n mean sd median min max skew kurtosis
gincdif 1751 2.233010 1.0590918 2 1 5 0.73154748 -0.2460184
dfincac 1756 2.625854 1.0544131 2 1 5 0.50378434 -0.5730429
smdfslv 1752 2.472603 0.9744553 2 1 5 0.65984232 -0.1917674
agea 1760 46.456818 18.7300429 46 15 105 0.20358225 -0.8100249
eduyrs 1759 12.666856 3.6579256 12 0 30 0.01431515 0.7436964
gndr 1760 1.509091 0.5000594 2 1 2 -0.03633866 -1.9998148
hinctnta 1567 7.456924 2.3668693 8 1 10 -0.70485395 -0.5743999
Q: Is everything OK?
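One way to approach this question is to compare the number of valid responses per variable with the sample size. The sketch below uses only the `n` column of the table above; the object names are illustrative, and it assumes that the sample size equals the largest `n` in the table.

```r
# Valid responses per variable, copied from the `n` column above
n_valid <- c(gincdif = 1751, dfincac = 1756, smdfslv = 1752,
             agea = 1760, eduyrs = 1759, gndr = 1760, hinctnta = 1567)

# Assumption: the sample size equals the largest n in the table
n_total <- max(n_valid)

# Missing responses per variable, in counts and as a percentage
n_missing <- n_total - n_valid
pct_missing <- round(100 * n_missing / n_total, 1)

data.frame(n_missing, pct_missing)
```

Household income (hinctnta) stands out with 193 missing values, roughly 11% of the sample, which matters later because lavaan drops incomplete cases by default.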
5 MIMIC model
In the previous lab, we tested the validity of our measurement model. Now that we are more confident that the measurement model is valid, we can apply our theoretical knowledge and test some simple hypotheses. We hypothesise that respondents’ structural characteristics influence their support for the welfare state. Models of this type are called MIMIC models, which stands for “Multiple Indicators, Multiple Causes.” Typically, the measurement model is developed first (as we did in the first lab), after which covariates are added.
Simple example:
- one latent factor measured by 3 indicators (“Welfare Support”)
- influenced by 2 causes (gender and education)
model_ws_mimic <- 'welf_supp =~ gvslvol + gvslvue + gvhlthc
welf_supp ~ gndr + eduyrs
'
fit_ws_mimic <- cfa(model_ws_mimic, # model formula
  data = ess_df # data frame
)
summary(fit_ws_mimic, standardized=TRUE)
lavaan 0.6-10 ended normally after 30 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 8
Used Total
Number of observations 1751 1760
Model Test User Model:
Test statistic 31.016
Degrees of freedom 4
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.000 1.227 0.829
gvslvue 0.599 0.048 12.514 0.000 0.735 0.383
gvhlthc 0.896 0.062 14.391 0.000 1.100 0.748
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp ~
gndr 0.013 0.066 0.197 0.844 0.011 0.005
eduyrs 0.003 0.009 0.373 0.709 0.003 0.010
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.gvslvol 0.688 0.102 6.760 0.000 0.688 0.313
.gvslvue 3.133 0.112 28.052 0.000 3.133 0.853
.gvhlthc 0.954 0.086 11.103 0.000 0.954 0.441
.welf_supp 1.507 0.122 12.395 0.000 1.000 1.000
Neither regression coefficient is significant: gender and education do not seem to affect individual support for the welfare state.
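The 4 degrees of freedom reported in the output above can be reconstructed by counting. A sketch, assuming lavaan's defaults (no mean structure, and the variances and covariance of the observed covariates fixed to their sample values rather than estimated); the helper `sem_df` is hypothetical and not part of lavaan.

```r
# Hypothetical helper: degrees of freedom of a covariance-structure model
sem_df <- function(n_observed, n_exogenous, n_free) {
  moments <- n_observed * (n_observed + 1) / 2    # unique variances and covariances
  fixed   <- n_exogenous * (n_exogenous + 1) / 2  # covariate moments that are not modelled
  moments - fixed - n_free
}

# MIMIC model: 3 indicators + 2 covariates, 8 free parameters
# (2 loadings, 2 regressions, 3 residual variances, 1 factor variance)
mimic_df <- sem_df(n_observed = 5, n_exogenous = 2, n_free = 8)
mimic_df
```

The same bookkeeping applies to any model in this lab: count the observed variables, the observed covariates, and the "Number of model parameters" line in the summary.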
6 Mediation analysis
Mediation analysis (or path analysis) tests whether the relationship between two variables is explained by a third, intermediate variable. It can also be given a causal interpretation: the extent to which a variable (the mediator) participates in transmitting change from a cause to its effect. In empirical applications, you will see both interpretations used. However, many equivalent models can fit the data equally well, so be careful when using mediation analysis. If you want to know more, check Sacha Epskamp’s presentation on causality and equivalent models.
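To see what equivalent models mean in practice, the sketch below simulates three observed variables and fits a causal chain in both directions. The variable names and coefficients are made up for illustration. The two models make opposite causal claims, yet they should yield the same chi-square, so model fit alone cannot tell them apart.

```r
library(lavaan)

# Simulated data (illustrative values, not ESS data)
set.seed(123)
n <- 500
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + rnorm(n)
sim_df <- data.frame(x, m, y)

# Model 1: x -> m -> y
fit_forward <- sem('m ~ x
                    y ~ m', data = sim_df)

# Model 2: y -> m -> x (reversed causal order)
fit_reverse <- sem('m ~ y
                    x ~ m', data = sim_df)

# Compare the chi-square statistics of the two models
chisq_forward <- fitMeasures(fit_forward, "chisq")
chisq_reverse <- fitMeasures(fit_reverse, "chisq")
c(forward = unname(chisq_forward), reverse = unname(chisq_reverse))
```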
Consider a classical mediation setup with three variables:
- Y is the dependent variable (Welfare support)
- X is the predictor (Income)
- M is a mediator (Egalitarianism)
This results in different paths
- a path: Test whether X and M are significantly associated
- b path: Test whether M and Y are significantly associated
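- c path: Test whether X and Y are significantly associated when M is not in the model (Total Effect)
- c’ path: Test whether X and Y are significantly associated after controlling for M (Direct Effect)
The product of the a and b paths is the Indirect Effect. This is usually called “the amount of mediation.”
Note that the Total Effect is equal to Direct Effect + Indirect Effect or \(c = c' + ab\). In the lavaan syntax that follows, the direct effect carries the label c (it corresponds to c’ here) and the total effect is defined separately as total.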
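The decomposition of the total effect into a direct and an indirect part can be checked on simulated data. A minimal sketch using ordinary regressions on observed variables (not the latent-variable model used in this lab); the coefficients 0.5, 0.3 and 0.6 are arbitrary.

```r
# Simulated data (illustrative values, not ESS data)
set.seed(42)
n <- 1000
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.6 * m + rnorm(n)

a        <- coef(lm(m ~ x))[["x"]]   # a path: X -> M
fit_y    <- lm(y ~ x + m)
b        <- coef(fit_y)[["m"]]       # b path: M -> Y, controlling for X
c_direct <- coef(fit_y)[["x"]]       # effect of X on Y, controlling for M
c_total  <- coef(lm(y ~ x))[["x"]]   # effect of X on Y, without M

# The two quantities agree up to floating-point error
c(total = c_total, direct_plus_indirect = c_direct + a * b)
```

With least-squares regression the identity holds exactly in the sample, not only in expectation.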
model_mediation <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc
## Egalitarianism ##
egual =~ gincdif + dfincac + smdfslv
## Direct effect ##
welf_supp ~ c*hinctnta
## Mediator ##
egual ~ a*hinctnta
welf_supp ~ b*egual
## Indirect effect (a*b) ##
ab := a*b
## Total effect ##
total := c + (a*b)
'
fit_mediation <- cfa(model_mediation, # model formula
  data = ess_df # data frame
)
summary(fit_mediation, standardized=TRUE)
lavaan 0.6-10 ended normally after 38 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 15
Used Total
Number of observations 1552 1760
Model Test User Model:
Test statistic 45.534
Degrees of freedom 12
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.000 1.275 0.868
gvslvue 0.575 0.049 11.838 0.000 0.733 0.379
gvhlthc 0.827 0.056 14.868 0.000 1.055 0.726
egual =~
gincdif 1.000 0.687 0.649
dfincac -0.610 0.061 -10.068 0.000 -0.419 -0.396
smdfslv 0.880 0.082 10.757 0.000 0.605 0.620
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp ~
hinctnta (c) 0.009 0.016 0.584 0.559 0.007 0.017
egual ~
hinctnta (a) 0.057 0.010 5.853 0.000 0.083 0.196
welf_supp ~
egual (b) -0.488 0.074 -6.556 0.000 -0.263 -0.263
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.gvslvol 0.534 0.103 5.155 0.000 0.534 0.247
.gvslvue 3.208 0.121 26.554 0.000 3.208 0.857
.gvhlthc 1.000 0.078 12.748 0.000 1.000 0.473
.gincdif 0.650 0.048 13.436 0.000 0.650 0.579
.dfincac 0.945 0.038 24.543 0.000 0.945 0.843
.smdfslv 0.585 0.039 14.976 0.000 0.585 0.616
.welf_supp 1.515 0.122 12.388 0.000 0.932 0.932
.egual 0.454 0.052 8.742 0.000 0.962 0.962
Defined Parameters:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ab -0.028 0.006 -4.566 0.000 -0.022 -0.051
total -0.019 0.015 -1.227 0.220 -0.015 -0.035
The indirect effect (ab) is negative and significant, so we can say that egalitarianism mediates the relationship between income and welfare support. However, the total effect is still not significant.
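The two defined parameters can be reproduced by hand from the regression estimates printed above. A small sketch; the object names are illustrative, and the inputs are the rounded unstandardized estimates from the output, so the results agree with lavaan only up to rounding.

```r
# Unstandardized estimates copied from the Regressions table above
a        <- 0.057   # hinctnta -> egual
b        <- -0.488  # egual -> welf_supp
c_direct <- 0.009   # hinctnta -> welf_supp (label c in the model)

ab    <- a * b          # indirect effect
total <- c_direct + ab  # total effect

round(c(ab = ab, total = total), 3)
```

The small positive direct effect and the negative indirect effect pull in opposite directions, which is why the total effect ends up close to zero.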
Let’s plot our model to get a better grasp of what is happening.
# let's organize our plot on 4 rows
# this helps our readers by making the plot easier to read
lay <- get_layout(
  "gincdif", "dfincac", "smdfslv", "",
  "", "egual", "", "",
  "hinctnta", "", "welf_supp", "",
  "", "gvslvol", "gvslvue", "gvhlthc",
  rows = 4)
plot_mediation <- graph_sem(model = fit_mediation, # model fit
  layout = lay, # layout
angle = 170 # adjust the arrows
#label = "est_std" # get standardized results (not rounded)
)
plot_mediation
Q: Neither the path between income and welfare support (direct effect) nor the total effect is significant. What does that mean?
Zhao, Lynch and Chen (2010) classify mediation effects as follows:
- Complementary mediation: Mediated effect (a x b) and direct effect (c) both exist and point in the same direction.
- Competitive mediation: Mediated effect (a x b) and direct effect (c) both exist and point in opposite directions.
- Indirect-only mediation: Mediated effect (a x b) exists, but no direct effect (c).
- Direct-only non-mediation: Direct effect (c) exists, but no indirect effect.
- No-effect non-mediation: Neither direct effect (c), nor indirect effect exists.
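The typology above can be written as a small decision rule. The helper below is hypothetical (it is not part of lavaan), and it assumes that an effect "exists" when its p-value falls below a chosen threshold, here 0.05; that operationalisation is an assumption of this sketch.

```r
# Hypothetical helper implementing the typology listed above
classify_mediation <- function(indirect, p_indirect, direct, p_direct, alpha = 0.05) {
  has_indirect <- p_indirect < alpha  # assumption: "exists" means p < alpha
  has_direct   <- p_direct < alpha
  if (has_indirect && has_direct) {
    if (sign(indirect) == sign(direct)) {
      "Complementary mediation"
    } else {
      "Competitive mediation"
    }
  } else if (has_indirect) {
    "Indirect-only mediation"
  } else if (has_direct) {
    "Direct-only non-mediation"
  } else {
    "No-effect non-mediation"
  }
}

# Estimates and p-values taken from the mediation output in this lab
mediation_type <- classify_mediation(indirect = -0.028, p_indirect = 0.000,
                                     direct = 0.009, p_direct = 0.559)
mediation_type
```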