Structural Equation Modeling | Lab Session 2

1 What we are going to cover

  1. MIMIC models
  2. Mediation Analysis

2 Data

The data set used throughout is the European Social Survey ESS4-2008, Edition 4.5, released on 1 December 2018. We will restrict the analysis to the Belgian case. Each row in the data set represents a Belgian respondent. The full data set and documentation can be found on the ESS website.

Codebook:

  • gvslvol Standard of living for the old, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvslvue Standard of living for the unemployed, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvhlthc Health care for the sick, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvcldcr Child care services for working parents, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvjbevn Job for everyone, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvpdlwk Paid leave from work to care for sick family, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • sbstrec Social benefits/services place too great strain on economy (1 Agree strongly - 5 Disagree strongly)

  • sbbsntx Social benefits/services cost businesses too much in taxes/charges (1 Agree strongly - 5 Disagree strongly)

  • sbprvpv Social benefits/services prevent widespread poverty (1 Agree strongly - 5 Disagree strongly)

  • sbeqsoc Social benefits/services lead to a more equal society (1 Agree strongly - 5 Disagree strongly)

  • sbcwkfm Social benefits/services make it easier to combine work and family (1 Agree strongly - 5 Disagree strongly)

  • sblazy Social benefits/services make people lazy (1 Agree strongly - 5 Disagree strongly)

  • sblwcoa Social benefits/services make people less willing to care for one another (1 Agree strongly - 5 Disagree strongly)

  • sblwlka Social benefits/services make people less willing to look after themselves/family (1 Agree strongly - 5 Disagree strongly)

In addition, we will use some other variables:

  • agea Respondent’s age

  • eduyrs Years of full-time education completed

  • gndr Gender (1 Male, 2 Female)

  • hinctnta Household’s total net income, all sources (Deciles of the actual household income range in Belgium)

  • gincdif Government should reduce differences in income levels (1 Agree strongly - 5 Disagree strongly)

  • dfincac Large differences in income acceptable to reward talents and efforts (1 Agree strongly - 5 Disagree strongly)

  • smdfslv For fair society, differences in standard of living should be small (1 Agree strongly - 5 Disagree strongly)

3 Environment preparation

First, let’s load the packages needed to import, manipulate, visualize and analyse the data.

# Uncomment these lines once if you need to install the packages on your system

### DATA MANIPULATION ###
# install.packages("haven")                 # data import from SPSS
# install.packages("dplyr")                 # data manipulation
# install.packages("psych")                 # descriptives
# install.packages("stringr")               # string manipulation

### MODELING ###
# install.packages("lavaan")                # SEM modelling

### VISUALIZATION ###
# install.packages("tidySEM")               # plotting SEM models

# Load the packages 

### DATA MANIPULATION ###
library("haven")        
library("dplyr")      
library("psych")
library('stringr')

### MODELING ###
library("lavaan")       

### VISUALIZATION ###
library("tidySEM")
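If you are unsure which of these packages are already installed, a small check such as the following can help. This is only a sketch: `required_pkgs`, `is_installed` and `missing_pkgs` are illustrative names, and the installation line is left commented out so that nothing is installed unless you decide to.

```r
# Packages used in this lab (same list as above)
required_pkgs <- c("haven", "dplyr", "psych", "stringr", "lavaan", "tidySEM")

# requireNamespace() returns FALSE when a package cannot be loaded
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed (possibly none)
missing_pkgs <- required_pkgs[!is_installed]
missing_pkgs

# Uncomment to install only what is missing
# install.packages(missing_pkgs)
```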

4 Data exploration

It is good practice to check that everything is in order and to make sense of the data we are going to analyse. Since we added a few variables to the data set, we will check those first.

ess_df <- haven::read_sav("https://github.com/albertostefanelli/SEM_labs/raw/master/data/ESS4_belgium.sav")


ess_df_selected <- ess_df %>% select(
                  ## Egalitarianism ##
                  gincdif,
                  dfincac,
                  smdfslv,
                  ## Demographics ##
                  agea,
                  eduyrs,
                  gndr,
                  hinctnta
)

descriptive_ess <- as.data.frame(psych::describe(ess_df_selected))

descriptive_ess <- dplyr::select(descriptive_ess, 
  n,
  mean,
  sd,
  median,
  min,
  max,
  skew,
  kurtosis)

descriptive_ess
            n      mean         sd median min max        skew   kurtosis
gincdif  1751  2.233010  1.0590918      2   1   5  0.73154748 -0.2460184
dfincac  1756  2.625854  1.0544131      2   1   5  0.50378434 -0.5730429
smdfslv  1752  2.472603  0.9744553      2   1   5  0.65984232 -0.1917674
agea     1760 46.456818 18.7300429     46  15 105  0.20358225 -0.8100249
eduyrs   1759 12.666856  3.6579256     12   0  30  0.01431515  0.7436964
gndr     1760  1.509091  0.5000594      2   1   2 -0.03633866 -1.9998148
hinctnta 1567  7.456924  2.3668693      8   1  10 -0.70485395 -0.5743999

Q: Is everything OK?
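One thing the table lets us check is item non-response. The sketch below re-enters the `n` column printed above by hand and compares it with the largest `n` in the table (1760, taken here to be the full sample size). The vector names are illustrative and are not objects created earlier in the lab.

```r
# Valid (non-missing) observations per variable, copied from the table above
n_valid <- c(gincdif = 1751, dfincac = 1756, smdfslv = 1752,
             agea = 1760, eduyrs = 1759, gndr = 1760, hinctnta = 1567)

# Assumption: the largest n corresponds to the full sample
n_total <- max(n_valid)

n_missing   <- n_total - n_valid
pct_missing <- round(100 * n_missing / n_total, 1)

data.frame(n_valid, n_missing, pct_missing)

# With the data loaded, colSums(is.na(ess_df_selected)) should give
# the same missing counts directly.
```

Household income stands out with 193 missing values, about 11% of the sample. This matters for the models that include it, because lavaan's default treatment of missing data is listwise deletion.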

5 MIMIC model

In the previous lab, we tested the validity of our measurement model. Now that we are more confident that our measurement model is valid, we can apply our theoretical knowledge and test some simple hypotheses. We hypothesise that respondents’ structural characteristics influence their support for the welfare state. Models of this type are called MIMIC models, which stands for “Multiple Indicators, Multiple Causes.” Typically, the measurement model is developed first (as we did in the first lab), after which covariates are added.
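In equation form, a MIMIC model with one latent factor can be sketched as follows. The notation is an assumption made here for illustration, not taken from the lab: \(\eta\) is the latent factor, \(y_1, \dots, y_p\) are its indicators with loadings \(\lambda_j\), \(x_1, \dots, x_q\) are the observed covariates ("causes") with regression coefficients \(\gamma_k\), and \(\varepsilon_j\) and \(\zeta\) are residuals. Intercepts are omitted for simplicity.

```latex
\begin{aligned}
y_j  &= \lambda_j \,\eta + \varepsilon_j, \qquad j = 1, \dots, p \\
\eta &= \gamma_1 x_1 + \dots + \gamma_q x_q + \zeta
\end{aligned}
```

The first line is the measurement part (multiple indicators) and the second line is the structural part (multiple causes).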

Simple example:

  • one latent factor measured by 3 indicators (“Welfare Support”)
  • influenced by 2 causes (gender and education)

model_ws_mimic <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Covariates predicting the factor ##
welf_supp ~ gndr + eduyrs
'

fit_ws_mimic <- cfa(model_ws_mimic, # model formula
                    data = ess_df   # data frame
)

summary(fit_ws_mimic, standardized=TRUE)
lavaan 0.6-10 ended normally after 30 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         8
                                                      
                                                  Used       Total
  Number of observations                          1751        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                                31.016
  Degrees of freedom                                 4
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.000                               1.227    0.829
    gvslvue           0.599    0.048   12.514    0.000    0.735    0.383
    gvhlthc           0.896    0.062   14.391    0.000    1.100    0.748

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp ~                                                           
    gndr              0.013    0.066    0.197    0.844    0.011    0.005
    eduyrs            0.003    0.009    0.373    0.709    0.003    0.010

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gvslvol           0.688    0.102    6.760    0.000    0.688    0.313
   .gvslvue           3.133    0.112   28.052    0.000    3.133    0.853
   .gvhlthc           0.954    0.086   11.103    0.000    0.954    0.441
   .welf_supp         1.507    0.122   12.395    0.000    1.000    1.000

Neither covariate is significant: gender (p = 0.844) and education (p = 0.709) do not seem to affect individual support for the welfare state.
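The summary also reports 8 free parameters and 4 degrees of freedom. As a sanity check, these numbers can be reproduced by counting moments. The helper below is a hypothetical function written for this note (it is not part of lavaan), and it assumes a model without a mean structure in which the variances and covariances of the exogenous covariates are fixed to their sample values, as lavaan does by default (`fixed.x = TRUE`).

```r
# Degrees of freedom =
#   unique variances/covariances of all observed variables
#   - variances/covariances of the fixed exogenous covariates
#   - number of free parameters
sem_df <- function(n_observed, n_fixed_x, n_free) {
  moments_all   <- n_observed * (n_observed + 1) / 2
  moments_fixed <- n_fixed_x * (n_fixed_x + 1) / 2
  moments_all - moments_fixed - n_free
}

# MIMIC model: 3 indicators + 2 covariates = 5 observed variables.
# Free parameters: 2 loadings, 2 regression slopes,
# 3 residual variances and 1 factor residual variance = 8.
sem_df(n_observed = 5, n_fixed_x = 2, n_free = 8)
```

The result matches the 4 degrees of freedom printed in the output above.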

6 Mediation analysis

Mediation analysis (or path analysis) tests whether the relationship between two variables is explained by a third, intermediate variable. It can also be given a causal interpretation: the extent to which a variable (the mediator) takes part in transmitting change from a cause to its effect. In empirical applications, you will see both interpretations used. However, many equivalent models can fit the data equally well, so be careful when using mediation analysis. If you want to know more, check Sacha Epskamp’s presentation on causality and equivalent models.

Consider a classical mediation setup with three variables:

  • Y is the dependent variable (Welfare support)
  • X is the predictor (Income)
  • M is a mediator (Egalitarianism)

This results in different paths:

  1. a path: Test whether X and M are significantly associated
  2. b path: Test whether M and Y are significantly associated after controlling for X
  3. c path: Test whether X and Y are significantly associated after controlling for M (Direct Effect)
  4. a × b: The product of the a and b paths (Indirect Effect). This is usually called “the amount of mediation.”

Note that the Total Effect is equal to Direct Effect + Indirect Effect, or \(\text{total} = c + ab\). We follow the labels used in the lavaan syntax below, where c is the direct effect; some texts instead write c for the total effect and c’ for the direct effect.
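The decomposition of the total effect into a direct and an indirect part can be checked on simulated data. The sketch below is a simplification: it uses observed variables and ordinary least squares rather than latent variables, the coefficients used to generate the data are arbitrary, and all object names are illustrative.

```r
set.seed(123)
n <- 500

x <- rnorm(n)                       # predictor
m <- 0.5 * x + rnorm(n)             # mediator depends on x
y <- 0.3 * x + 0.4 * m + rnorm(n)   # outcome depends on x and m

# a path: X -> M
a <- coef(lm(m ~ x))[["x"]]

# b path and direct effect come from the same regression
fit_y  <- lm(y ~ x + m)
b      <- coef(fit_y)[["m"]]        # M -> Y, controlling for X
direct <- coef(fit_y)[["x"]]        # X -> Y, controlling for M

# total effect: X -> Y, ignoring M
total <- coef(lm(y ~ x))[["x"]]

indirect <- a * b
c(direct = direct, indirect = indirect, total = total)

# The two sides should agree up to floating-point error
all.equal(total, direct + indirect)
```

With linear regressions fitted on the same sample, the total effect equals the direct effect plus the product of the a and b paths, so the last line should return `TRUE`.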

model_mediation <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv

## Direct effect ##
welf_supp ~ c*hinctnta

## Mediator ##
egual ~ a*hinctnta
welf_supp ~ b*egual

## Indirect effect (a*b) ##
ab := a*b
## Total effect ##
total := c + (a*b)
'

fit_mediation <- cfa(model_mediation, # model formula
           data=ess_df                # data frame
  )

summary(fit_mediation, standardized=TRUE)
lavaan 0.6-10 ended normally after 38 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        15
                                                      
                                                  Used       Total
  Number of observations                          1552        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                                45.534
  Degrees of freedom                                12
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.000                               1.275    0.868
    gvslvue           0.575    0.049   11.838    0.000    0.733    0.379
    gvhlthc           0.827    0.056   14.868    0.000    1.055    0.726
  egual =~                                                              
    gincdif           1.000                               0.687    0.649
    dfincac          -0.610    0.061  -10.068    0.000   -0.419   -0.396
    smdfslv           0.880    0.082   10.757    0.000    0.605    0.620

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp ~                                                           
    hinctnta   (c)    0.009    0.016    0.584    0.559    0.007    0.017
  egual ~                                                               
    hinctnta   (a)    0.057    0.010    5.853    0.000    0.083    0.196
  welf_supp ~                                                           
    egual      (b)   -0.488    0.074   -6.556    0.000   -0.263   -0.263

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gvslvol           0.534    0.103    5.155    0.000    0.534    0.247
   .gvslvue           3.208    0.121   26.554    0.000    3.208    0.857
   .gvhlthc           1.000    0.078   12.748    0.000    1.000    0.473
   .gincdif           0.650    0.048   13.436    0.000    0.650    0.579
   .dfincac           0.945    0.038   24.543    0.000    0.945    0.843
   .smdfslv           0.585    0.039   14.976    0.000    0.585    0.616
   .welf_supp         1.515    0.122   12.388    0.000    0.932    0.932
   .egual             0.454    0.052    8.742    0.000    0.962    0.962

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    ab               -0.028    0.006   -4.566    0.000   -0.022   -0.051
    total            -0.019    0.015   -1.227    0.220   -0.015   -0.035

The indirect effect is negative and significant (ab = -0.028, p < 0.001), so we can say that egalitarianism mediates the relationship between income and welfare support. However, neither the direct effect (c = 0.009, p = 0.559) nor the total effect (-0.019, p = 0.220) is significant.
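The z-test reported for the indirect effect treats the product a × b as normally distributed, which is only an approximation because a product of two estimates is generally not normal. A common alternative is a percentile bootstrap confidence interval. The sketch below illustrates the idea on simulated observed variables with base R; it is not a re-analysis of the ESS data, and the data-generating values and object names are assumptions chosen for illustration. In lavaan itself, bootstrap standard errors can be requested with the `se = "bootstrap"` argument when fitting the model.

```r
set.seed(2024)
n <- 300

# Toy data: x affects y only through m
x <- rnorm(n)
m <- 0.4 * x + rnorm(n)
y <- -0.5 * m + rnorm(n)
toy <- data.frame(x, m, y)

# Indirect effect a*b estimated from two regressions
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ x + m, data = d))[["m"]]
  a * b
}

# Resample rows with replacement and re-estimate a*b each time
n_boot  <- 1000
boot_ab <- replicate(n_boot, {
  rows <- sample(nrow(toy), replace = TRUE)
  indirect_effect(toy[rows, ])
})

estimate <- indirect_effect(toy)
ci <- quantile(boot_ab, probs = c(0.025, 0.975))  # 95% percentile interval

estimate
ci
```

If the interval excludes zero, the indirect effect is judged to differ from zero at the 5% level without relying on the normality assumption.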

Let’s plot our model to get a better grasp of what is happening.

# let's organize our plot on 4 rows
# this helps our readers by making the plot easier to read

lay <- get_layout(
"gincdif", "dfincac", "smdfslv", "", 
"", "egual", "", "", 
"hinctnta", "", "welf_supp", "", 
"",  "gvslvol", "gvslvue", "gvhlthc",
rows = 4)

plot_mediation <- graph_sem(model = fit_mediation,   # model fit
          layout = lay,        # layout
          angle = 170          # adjust the arrows 
          #label = "est_std"   # get standardized results (not rounded)
          )

plot_mediation

Q: The path between income and welfare support (direct effect) is not significant, and neither is the total effect. What does that mean?

Zhao, Lynch and Chen (2010) classify mediation effects as follows:

  • Complementary mediation: Mediated effect (a x b) and direct effect (c) both exist and point in the same direction.
  • Competitive mediation: Mediated effect (a x b) and direct effect (c) both exist and point in opposite directions.
  • Indirect-only mediation: Mediated effect (a x b) exists, but no direct effect (c).
  • Direct-only non-mediation: Direct effect (c) exists, but no indirect effect.
  • No-effect non-mediation: Neither direct effect (c), nor indirect effect exists.
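The classification above can be written as a small decision rule. The function below is a hypothetical helper written for this note, not something provided by lavaan or by Zhao, Lynch and Chen. It assumes that an effect "exists" when its p-value is below a chosen threshold `alpha`, which is a simplification.

```r
classify_mediation <- function(ab, p_ab, direct, p_direct, alpha = 0.05) {
  indirect_exists <- p_ab < alpha
  direct_exists   <- p_direct < alpha

  if (indirect_exists && direct_exists) {
    # Both effects exist: compare their signs
    if (sign(ab) == sign(direct)) {
      "Complementary mediation"
    } else {
      "Competitive mediation"
    }
  } else if (indirect_exists) {
    "Indirect-only mediation"
  } else if (direct_exists) {
    "Direct-only non-mediation"
  } else {
    "No-effect non-mediation"
  }
}

# Estimates from the mediation model output above
classify_mediation(ab = -0.028,
                   p_ab = 0.001,      # printed as 0.000, i.e. below 0.001
                   direct = 0.009,
                   p_direct = 0.559)
```

Applied to the estimates from the model above, the rule returns "Indirect-only mediation".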


References

Zhao, Xinshu, John G Lynch Jr, and Qimei Chen. 2010. “Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis.” Journal of Consumer Research 37 (2): 197–206.