Structural Equation Modeling | Exercises 2

1 What we are going to cover

Ex.1 – MIMIC model
Ex.2 – Mediation Analysis

2 Data

The data set used throughout is the European Social Survey ESS4-2008 Edition 4.5 was released on 1 December 2018. We will restrict the analysis to the Belgian case. Each line in the data set represents a Belgian respondent. The full dataset an documentation can be found on the ESS website

Codebook:

gvslvol Standard of living for the old, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvslvue Standard of living for the unemployed, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvhlthc Health care for the sick, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvcldcr Child care services for working parents, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvjbevn Job for everyone, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvpdlwk Paid leave from work to care for sick family, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
sbstrec Social benefits/services place too great strain on economy (1 Agree strongly - 5 Disagree strongly)
sbbsntx Social benefits/services cost businesses too much in taxes/charges (1 Agree strongly - 5 Disagree strongly)
sbprvpv Social benefits/services prevent widespread poverty (1 Agree strongly - 5 Disagree strongly)
sbeqsoc Social benefits/services lead to a more equal society (1 Agree strongly - 5 Disagree strongly)
sbcwkfm Social benefits/services make it easier to combine work and family (1 Agree strongly - 5 Disagree strongly)
sblazy Social benefits/services make people lazy (1 Agree strongly - 5 Disagree strongly)
sblwcoa Social benefits/services make people less willing care for one another (1 Agree strongly - 5 Disagree strongly)
sblwlka Social benefits/services make people less willing look after themselves/family (1 Agree strongly - 5 Disagree strongly)

In addition, we will use some other variables

agea Respondent’s age
eduyrs Years of full-time education completed
gndr Gender (1 Male, 2 Female)
hinctnta Household’s total net income, all sources (Deciles of the actual household income range in Belgium)
gincdif Government should reduce differences in income levels (1 Agree strongly - 5 Disagree strongly)
dfincac Large differences in income acceptable to reward talents and efforts (1 Agree strongly - 5 Disagree strongly)
smdfslv For fair society, differences in standard of living should be small (1 Agree strongly - 5 Disagree strongly)

3 Environment preparation

First, let’s load the necessary packages to load, manipulate, visualize and analyse the data.

# Uncomment this once if you need to install the packages on your system 

### DATA MANIPULATION ###
# install.packages("haven")                 # data import from spss
# install.packages("dplyr")                 # data manipulation
# install.packages("psych")                 # descriptives
# install.packages("stringr")               # string manipulation

### MODELING ###
# install.packages("lavaan")                # SEM modelling

### VISUALIZATION ###
# install.packages("tidySEM")               # plotting SEM models

# Load the packages 

### DATA MANIPULATION ###
library("haven")        
library("dplyr")      
library("psych")
library('stringr')

### MODELING ###
library("lavaan")       

### VISUALIZATION ###
library("tidySEM")

4 Ex.1 - MIMIC model

Fit the a measurement model with both welfare support and egalitarianism and review the fit statistics. Also review the parameter estimates. How close are they to the population values?
Predict welfare support by adding the gender covariate and assess the changes in fit statistics and parameter estimates.
Add in succession age, income, and education and again see if fit statistics or parameter estimates change.
Is there a trend for the fit statistics to change (improve or deteriorate) when including covariates?
Calculate the degrees of freedom for each of these models manually. Explain the changes in degrees of freedom.
Create an R function for the degrees of freedom calculations (OPTIONAL)

ess_df <- haven::read_sav("https://github.com/albertostefanelli/SEM_labs/raw/master/data/ESS4_belgium.sav")

model_ws_eg <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv
'

fit_ws_eg <- cfa(model_ws_eg, # model formula
                 data=ess_df  # data frame
  )

summary(fit_ws_eg, standardized=TRUE)

lavaan 0.6-10 ended normally after 36 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13
                                                      
                                                  Used       Total
  Number of observations                          1735        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                                40.125
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.000                               1.250    0.845
    gvslvue           0.580    0.046   12.504    0.000    0.725    0.380
    gvhlthc           0.858    0.056   15.432    0.000    1.072    0.729
  egual =~                                                              
    gincdif           1.000                               0.684    0.645
    dfincac          -0.622    0.059  -10.481    0.000   -0.426   -0.404
    smdfslv           0.877    0.082   10.731    0.000    0.600    0.615

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp ~~                                                          
    egual            -0.225    0.032   -6.986    0.000   -0.263   -0.263

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gvslvol           0.624    0.097    6.438    0.000    0.624    0.285
   .gvslvue           3.126    0.112   27.986    0.000    3.126    0.856
   .gvhlthc           1.013    0.078   13.040    0.000    1.013    0.468
   .gincdif           0.656    0.048   13.696    0.000    0.656    0.584
   .dfincac           0.928    0.036   25.612    0.000    0.928    0.837
   .smdfslv           0.592    0.038   15.421    0.000    0.592    0.622
    welf_supp         1.563    0.118   13.211    0.000    1.000    1.000
    egual             0.468    0.053    8.913    0.000    1.000    1.000

model_ws_eg_g <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv

welf_supp ~ gndr
'

fit_ws_eg_g <- cfa(model_ws_eg_g, # model formula
           data=ess_df                # data frame
  )
graph_sem(fit_ws_eg_g, standardized=TRUE)

model_ws_eg_ga <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv

welf_supp ~ gndr + agea
'

fit_ws_eg_ga <- cfa(model_ws_eg_ga, # model formula
           data=ess_df                # data frame
  )

model_ws_eg_gai <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv

welf_supp ~ gndr + agea + hinctnta
'

fit_ws_eg_gai <- cfa(model_ws_eg_gai, # model formula
           data=ess_df                # data frame
  )

model_ws_eg_gaie <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv

welf_supp ~ gndr + agea + hinctnta + eduyrs
'

fit_ws_eg_gaie <- cfa(model_ws_eg_gaie, # model formula
           data=ess_df                # data frame
  )



fitm_model_ws_eg <-  fitMeasures(fit_ws_eg, c("logl",
                                                "AIC", 
                                                "BIC", 
                                                "chisq",
                                                "df",
                                                "pvalue",
                                                "cfi",
                                                "tli",
                                                "rmsea"), 
                                                output = "matrix")

fitm_model_ws_eg_g <- fitMeasures(fit_ws_eg_g, c("logl",
                                                   "AIC", 
                                                   "BIC", 
                                                   "chisq",
                                                   "df",
                                                   "pvalue",
                                                   "cfi",
                                                   "tli",
                                                   "rmsea"), 
                                                   output = "matrix")

fitm_model_ws_eg_ga <- fitMeasures(fit_ws_eg_ga, c("logl",
                                                  "AIC", 
                                                  "BIC", 
                                                  "chisq",
                                                  "df",
                                                  "pvalue",
                                                  "cfi",
                                                  "tli",
                                                  "rmsea"), 
                                                  output = "matrix")

fitm_model_ws_eg_gai <- fitMeasures(fit_ws_eg_gai, c("logl",
                                                       "AIC", 
                                                       "BIC", 
                                                       "chisq",
                                                       "df",
                                                       "pvalue",
                                                       "cfi",
                                                       "tli",
                                                       "rmsea"), 
                                                       output = "matrix")

fitm_model_ws_eg_gaie <- fitMeasures(fit_ws_eg_gaie, c("logl",
                                                         "AIC", 
                                                         "BIC", 
                                                         "chisq",
                                                         "df",
                                                         "pvalue",
                                                         "cfi",
                                                         "tli",
                                                         "rmsea"), 
                                                         output = "matrix")

data.frame(Fit=rownames(fitm_model_ws_eg),
           "Measurement" = round(fitm_model_ws_eg[,1],2),
           "plus_gender" = round(fitm_model_ws_eg_g[,1],2),
           "plus_age" = round(fitm_model_ws_eg_ga[,1],2),
           "plus_income" = round(fitm_model_ws_eg_gai[,1],2),
           "plus_education" = round(fitm_model_ws_eg_gaie[,1],2)

  
)

          Fit Measurement plus_gender  plus_age plus_income plus_education
logl     logl   -16610.17   -16638.80 -16628.25   -14843.65      -14831.68
aic       aic    33246.35    33303.60  33284.50    29717.30       29695.35
bic       bic    33317.31    33374.57  33360.93    29797.51       29780.90
chisq   chisq       40.13      114.35    165.95      179.85         218.88
df         df        8.00       14.00     19.00       24.00          29.00
pvalue pvalue        0.00        0.00      0.00        0.00           0.00
cfi       cfi        0.98        0.94      0.91        0.90           0.88
tli       tli        0.96        0.91      0.87        0.86           0.84
rmsea   rmsea        0.05        0.06      0.07        0.06           0.06

Degrees of freedom calculation:

pieces of information (observed variables) = \(p(p+1)/2=21\)
measurement part = n factor loadings + n residual variances.
structural part = n regressions + n residual factor variance.
exogenous variables = n variances + n covariances.
df = pieces of information - (measurement part + structural part + exogenous variables).

# let's build a function to calculate the df for our models

degres_of_freedom <- function(obs_var=obs_var, # n observed variables 
                              loads=loads,     # n estimated factors loadings
                              regression=regression, # n regression
                              latent_var=latent_var, # n latent variables
                              covariance_latent=TRUE, # covariance latent vars
                              exogenous=exogenous){ # n exogenous 

# next function returns n combination of 2 or more variables 
# we use it to calculate the n covariance of the exogenous variables
comb = function(n, x) {
  if(n==0){return(0)} # if we have no exogenous variables return 0
  if(n==1){return(0)} # if we have 1 exogenous variable return 0
  else{
  factorial(n) / factorial(n-x) / factorial(x)}
}

# 1. pieces of information (observed variables)
info <- obs_var*(obs_var + 1)/2 
#2. measurement part = n factor loadings + n residual variances. 
measurement <- (loads) + obs_var
#3. structural part = n regressions + n residual factor variance.
structural <- ifelse(regression>0,0,regression) + latent_var
# 4. exogenous variables = n variances + n covariances.
# first, let's calculate the covariances 
cov <- NA
# lavaan removes by default the LV covariance when you include a regression
# this code takes into account lavaan behaviour
if(covariance_latent==TRUE){ 
cov<-comb(latent_var, 2)}else{cov<-0}
# second, let's get the n variances and n covariances for our exogenous vars.
exogenous_var <- exogenous + comb(exogenous, 2)
# finally, let's calculate the df
df = info - (measurement + structural + exogenous_var+ cov)
return(df)
} 

df_model_ws_eg <- degres_of_freedom(obs_var=6,
                  loads=4,
                  regression=0,
                  latent_var=2,
                  covariance_latent=TRUE,
                  exogenous=0)


df_model_ws_eg_g <- degres_of_freedom(obs_var=7,
                  loads=4,
                  regression=1,
                  latent_var=2,
                  covariance_latent=FALSE,
                  exogenous=1)

df_model_ws_eg_ga <- degres_of_freedom(obs_var=8,
                  loads=4,
                  regression=2,
                  latent_var=2,
                  covariance_latent=FALSE,
                  exogenous=2)

df_model_ws_eg_gai <- degres_of_freedom(obs_var=9,
                  loads=4,
                  regression=3,
                  latent_var=2,
                  covariance_latent=FALSE,
                  exogenous=3)


df_model_ws_eg_gaie <- degres_of_freedom(obs_var=10,
                  loads=4,
                  regression=4,
                  latent_var=2,
                  covariance_latent=FALSE,
                  exogenous=4)



data.frame(Models = c("ws_eg","ws_eg_g","ws_eg_ga","ws_eg_gai","ws_eg_gaie"),
           df=c(df_model_ws_eg,
                df_model_ws_eg_g,
                df_model_ws_eg_ga,
                df_model_ws_eg_gai,
                df_model_ws_eg_gaie))

      Models df
1      ws_eg  8
2    ws_eg_g 14
3   ws_eg_ga 19
4  ws_eg_gai 24
5 ws_eg_gaie 29

5 Ex.2 – Mediation Analysis

Fit a model where egalitarianism (gincdif, dfincac, smdfslv) mediates the relationship between age (agea), education (eduyrs), gender (gndr), income (hinctnta) and welfare support (gvslvol, gvslvue, gvhlthc)
Assess if the model fits the data well
Inspect the regression parameters and the R-squared values of the latent variables.
Manually calculate the indirect effect of age via egalitarianism \(a \times b\) and its standard error (\(\sigma_{a \times b}\)) using Sobel’s formula:
- Indirect effect = \(a \times b\).
- \(\sigma_{a \times b} = \sqrt{b^2 \times \sigma_{a}^2 + a^2 \times \sigma_{b}^2}\).
Manually calculate the total effect of age \(c' + (a \times b)\) on welfare support
Ask lavaan to calculate these indirect and total effects. Compare to your own solution
Interpret the results. Does egalitarianism mediates age? Can you interpret this result causally ?

model_mediation <- '
## Welfare Support Factor ##
welf_supp =~ gvslvol + gvslvue + gvhlthc

## Egalitarianism ##
egual =~  gincdif + dfincac + smdfslv

## Direct effect(s) ##
welf_supp ~ c*agea + c1*eduyrs + c2*hinctnta + c3*gndr

## Mediator ##
# Path A
egual ~ a*agea
egual ~ a1*eduyrs
egual ~ a2*hinctnta
egual ~ a3*gndr
# Path B
welf_supp ~ b*egual

## Indirect effect (a*b) ##
ab_age := a*b
a1b_edu := a1*b
a2b_inco := a2*b
a3b_gndr := a3*b

## Total effect ##
total_age := c + (a*b)
total1_edu := c1 + (a1*b)
total2_inco := c2 + (a2*b)
total3_gndr := c3 + (a3*b)

'

fit_mediation <- cfa(model_mediation, # model formula
           data=ess_df                # data frame
  )

# Fit indices
fitm_model_mediation<- fitMeasures(fit_mediation, c("logl",
                                                  "AIC", 
                                                  "BIC", 
                                                  "chisq",
                                                  "df",
                                                  "pvalue",
                                                  "cfi",
                                                  "tli",
                                                  "rmsea"), 
                                                  output = "matrix")

fitm_model_mediation

                 
logl   -14776.024
aic     29594.048
bic     29706.328
chisq     107.575
df         24.000
pvalue      0.000
cfi         0.948
tli         0.915
rmsea       0.047

The model fits the data fairly well. Let’s now move to regression estimates and \(R^2\).

tidy_results <- table_results(fit_mediation,             
  columns = c("label", 
              "est_sig"),
  digits = 2,
)

tidy_results %>% filter(str_detect(label, 
                                   "welf_supp.ON|egual.ON") # | means OR
                        )

                  label  est_sig
1     welf_supp.ON.agea  0.01***
2   welf_supp.ON.eduyrs    0.02*
3 welf_supp.ON.hinctnta     0.01
4     welf_supp.ON.gndr    -0.03
5         egual.ON.agea   -0.00*
6       egual.ON.eduyrs  0.02***
7     egual.ON.hinctnta  0.03***
8         egual.ON.gndr   -0.09*
9    welf_supp.ON.egual -0.49***

r2 <- round(inspect(fit_mediation,"r2"),3)
r2

  gvslvol   gvslvue   gvhlthc   gincdif   dfincac   smdfslv welf_supp     egual 
    0.775     0.142     0.511     0.416     0.151     0.395     0.084     0.066

Remember that the \(R^2\) is calculated as 1 minus the ratio of residual variance (appearing in the parameter estimates, although it might be a fixed value) to the model-implied total variance of that variable. In other words, the \(R^2\) is the proportion of variance of the latent response that was explained by the predictors.

Next, let’s calculate the direct and indirect effects manually

## Coefficients ## 
a <- inspect(fit_mediation,"est")$beta["egual","agea"]
b <- inspect(fit_mediation,"est")$beta["welf_supp","egual"]
## Standard Errors ## 
sigma_a <- inspect(fit_mediation,"se")$beta["egual","agea"]
sigma_b <- inspect(fit_mediation,"se")$beta["welf_supp","egual"]
## Manual calculation of indirect effect a x b ## 
a * b

[1] 0.001457582

## Manual calculation of the sigma a x b ## 
sqrt(a^2*sigma_b^2 + b^2*sigma_a^2)

[1] 0.0006588107

## Manual calculation of the total effect (a x b) + c' ## 
a * b + inspect(fit_mediation,"est")$beta["welf_supp","agea"]

[1] 0.01047894

Let’s use tidy SEM to extract the direct and indirect effects calculated by lavaan

tidy_results <- table_results(fit_mediation,             
  columns = c("label", 
              "est_sig", # beta 
              "se"       # sigma
),
  digits = 2,
)


tidy_results %>% filter(str_detect(label, 
                                   "ab_age|total_age") # | means OR
                        )

                 label est_sig   se
1        ab_age.:=.a*b   0.00* 0.00
2 total_age.:=.c+(a*b) 0.01*** 0.00

Unsurprisingly, they are the same as the one that we calculated manually.

Structural Equation Modeling | Exercises 2