Structural Equation Modeling | Lab Session 1

1 Let’s pause for a sec

An artwork by Kamila Mankiewicz on the War in Ukraine

2 Lab sessions: why and how?

  1. Apply theoretical knowledge
  2. Increase understanding by interacting with data
  3. Learn to use some packages in R
  4. How:
    • Relatively unstructured
    • Go at your own pace, try to do the exercises yourself (do yourself a favour and do not just copy paste and run the solutions)
    • “There is never time to do it right, but there is always time to do it over”

2.1 Software used: R

- This is not an R course!
- We will learn some R as we go along
- I will use RStudio
- Many packages or libraries exist to do specific analyses

3 What we are going to cover

  1. Data preparation
  2. Data exploration
  3. One-factor CFA model
    • Degrees of Freedom
    • Model diagnostics
    • Model fit statistics
  4. A three-factor CFA model
  5. Plotting SEM

4 Data

The data set used throughout is the European Social Survey ESS4-2008 Edition 4.5 was released on 1 December 2018. We will restrict the analysis to the Belgian case. Each line in the data set represents a Belgian respondent. The full dataset an documentation can be found on the ESS website

Codebook:

  • gvslvol Standard of living for the old, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvslvue Standard of living for the unemployed, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvhlthc Health care for the sick, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvcldcr Child care services for working parents, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvjbevn Job for everyone, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • gvpdlwk Paid leave from work to care for sick family, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)

  • sbstrec Social benefits/services place too great strain on economy (1 Agree strongly - 5 Disagree strongly)

  • sbbsntx Social benefits/services cost businesses too much in taxes/charges (1 Agree strongly - 5 Disagree strongly)

  • sbprvpv Social benefits/services prevent widespread poverty (1 Agree strongly - 5 Disagree strongly)

  • sbeqsoc Social benefits/services lead to a more equal society (1 Agree strongly - 5 Disagree strongly)

  • sbcwkfm Social benefits/services make it easier to combine work and family (1 Agree strongly - 5 Disagree strongly)

  • sblazy Social benefits/services make people lazy (1 Agree strongly - 5 Disagree strongly)

  • sblwcoa Social benefits/services make people less willing care for one another (1 Agree strongly - 5 Disagree strongly)

  • sblwlka Social benefits/services make people less willing look after themselves/family (1 Agree strongly - 5 Disagree strongly)

  • agea Respondent’s age

  • eduyrs Years of full-time education completed

  • gndr Gender (1 Male, 2 Female)

  • hinctnta Household’s total net income, all sources (Deciles of the actual household income range in the given country.)

  • gincdif Government should reduce differences in income levels (1 Agree strongly - 5 Disagree strongly)

  • dfincac Large differences in income acceptable to reward talents and efforts (1 Agree strongly - 5 Disagree strongly)

  • smdfslv For fair society, differences in standard of living should be small (1 Agree strongly - 5 Disagree strongly)

5 Environment preparation

First, let’s load the necessary packages to load, manipulate, visualize and analyse the data.

# Uncomment this once if you need to install the packages on your system 

### DATA MANIPULATION ###
# install.packages("haven")                 # data import from spss
# install.packages("dplyr")                 # data manipulation
# install.packages("psych")                 # descriptives
# install.packages("stringr")               # string manipulation

### MODELING ###
# install.packages("lavaan")                # SEM modelling

### VISUALIZATION ###
# install.packages("tidySEM")               # plotting SEM models
# install.packages("corrplot")              # correlation/covariance plots


# Load the packages 

### DATA MANIPULATION ###
library("haven")        
library("dplyr")      
library("psych")
library('stringr')

### MODELING ###
library("lavaan")       

### VISUALIZATION ###
library("corrplot")     
library("tidySEM")

6 Data exploration

It is a good practice to check that everything is in order and make sense of the data that we are going to analyze.

ess_df <- haven::read_sav("https://github.com/albertostefanelli/SEM_labs/raw/master/data/ESS4_belgium.sav")

head(ess_df, 20)  # first 20 rows of the dataset
# A tibble: 20 × 21
                                     gvslvol                                   gvslvue                                   gvhlthc   gvcldcr                             gvjbevn gvpdlwk sbstrec sbbsntx sbprvpv sbeqsoc sbcwkfm  sblazy sblwcoa sblwlka  agea eduyrs    gndr hinctnta gincdif dfincac smdfslv
                                   <dbl+lbl>                                 <dbl+lbl>                                 <dbl+lbl> <dbl+lbl>                           <dbl+lbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl+> <dbl+l> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
 1  8 [8]                                     7 [7]                                     8 [8]                                        5 [5]  8 [8]                                5 [5] 2 [Agr… 1 [Agr… 1 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr…    36     18 1 [Mal…  4 [M -… 2 [Agr… 3 [Nei… 2 [Agr…
 2  7 [7]                                     7 [7]                                     8 [8]                                        7 [7]  7 [7]                                6 [6] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 4 [Dis… 4 [Dis…    26     15 2 [Fem…  7 [K -… 2 [Agr… 3 [Nei… 2 [Agr…
 3  6 [6]                                     3 [3]                                     6 [6]                                        5 [5]  3 [3]                                4 [4] 2 [Agr… 2 [Agr… 3 [Nei… 4 [Dis… 3 [Nei… 1 [Agr… 1 [Agr… 1 [Agr…    69     18 1 [Mal… 10 [H -… 4 [Dis… 1 [Agr… 4 [Dis…
 4  6 [6]                                     6 [6]                                     6 [6]                                        6 [6]  3 [3]                                3 [3] 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr…    77     15 2 [Fem…  7 [K -… 2 [Agr… 2 [Agr… 2 [Agr…
 5  6 [6]                                     4 [4]                                     7 [7]                                        5 [5]  6 [6]                                5 [5] 4 [Dis… 1 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 3 [Nei… 3 [Nei…    27     13 1 [Mal…  7 [K -… 3 [Nei… 4 [Dis… 4 [Dis…
 6 10 [Entirely governments' responsibility]  5 [5]                                    10 [Entirely governments' responsibility]     5 [5]  5 [5]                                7 [7] 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 3 [Nei… 2 [Agr… 4 [Dis… 4 [Dis…    32     12 2 [Fem…  6 [S -… 4 [Dis… 2 [Agr… 4 [Dis…
 7 10 [Entirely governments' responsibility]  6 [6]                                     8 [8]                                        6 [6]  8 [8]                                6 [6] 4 [Dis… 2 [Agr… 2 [Agr… 4 [Dis… 3 [Nei… 4 [Dis… 3 [Nei… 4 [Dis…    19     13 2 [Fem…  8 [P -… 4 [Dis… 2 [Agr… 2 [Agr…
 8  7 [7]                                     6 [6]                                     9 [9]                                        7 [7]  8 [8]                                8 [8] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 3 [Nei… 4 [Dis… 4 [Dis…    28     17 2 [Fem… 10 [H -… 2 [Agr… 3 [Nei… 2 [Agr…
 9  9 [9]                                     6 [6]                                     9 [9]                                        5 [5]  5 [5]                                8 [8] 2 [Agr… 3 [Nei… 3 [Nei… 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 3 [Nei…    49     16 1 [Mal…  4 [M -… 1 [Agr… 4 [Dis… 2 [Agr…
10  8 [8]                                     8 [8]                                     8 [8]                                        8 [8]  8 [8]                                8 [8] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 3 [Nei… 3 [Nei… 4 [Dis… 3 [Nei…    57     16 1 [Mal…  9 [D -… 2 [Agr… 2 [Agr… 3 [Nei…
11  8 [8]                                     6 [6]                                     8 [8]                                        7 [7]  6 [6]                                6 [6] 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr…    71     17 2 [Fem…  7 [K -… 4 [Dis… 4 [Dis… 3 [Nei…
12  8 [8]                                     6 [6]                                     8 [8]                                        8 [8]  7 [7]                                7 [7] 1 [Agr… 3 [Nei… 4 [Dis… 4 [Dis… 3 [Nei… 3 [Nei… 2 [Agr… 2 [Agr…    83     16 2 [Fem…  8 [P -… 3 [Nei… 2 [Agr… 2 [Agr…
13  6 [6]                                     3 [3]                                     7 [7]                                        4 [4]  5 [5]                                4 [4] 2 [Agr… 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 2 [Agr…    46      5 2 [Fem… 10 [H -… 3 [Nei… 4 [Dis… 2 [Agr…
14  9 [9]                                     9 [9]                                     9 [9]                                        9 [9] 10 [Entirely governments' responsi…   8 [8] 4 [Dis… 4 [Dis… 2 [Agr… 2 [Agr… 3 [Nei… 5 [Dis… 4 [Dis… 4 [Dis…    61     14 2 [Fem…  1 [J -… 4 [Dis… 2 [Agr… 3 [Nei…
15  9 [9]                                     9 [9]                                     9 [9]                                        9 [9]  7 [7]                                9 [9] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 1 [Agr…    68     11 2 [Fem…  3 [C -… 1 [Agr… 2 [Agr… 2 [Agr…
16  8 [8]                                     7 [7]                                     8 [8]                                        6 [6]  7 [7]                                7 [7] 4 [Dis… 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 3 [Nei… 2 [Agr…    62     11 1 [Mal…  8 [P -… 2 [Agr… 4 [Dis… 2 [Agr…
17 10 [Entirely governments' responsibility] 10 [Entirely governments' responsibility] 10 [Entirely governments' responsibility]     7 [7]  7 [7]                                5 [5] 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 4 [Dis… 4 [Dis…    23     12 1 [Mal…  2 [R -… 1 [Agr… 2 [Agr… 3 [Nei…
18  7 [7]                                     6 [6]                                     7 [7]                                        7 [7]  4 [4]                                6 [6] 4 [Dis… 4 [Dis… 2 [Agr… 3 [Nei… 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr…    64     11 1 [Mal…  5 [F -… 4 [Dis… 1 [Agr… 4 [Dis…
19  8 [8]                                     3 [3]                                     7 [7]                                        8 [8]  7 [7]                                8 [8] 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr…    64     13 1 [Mal…  9 [D -… 2 [Agr… 4 [Dis… 1 [Agr…
20  5 [5]                                     6 [6]                                     5 [5]                                        8 [8]  5 [5]                                8 [8] 4 [Dis… 1 [Agr… 2 [Agr… 1 [Agr… 1 [Agr… 2 [Agr… 2 [Agr… 2 [Agr…    45      8 1 [Mal…  7 [K -… 2 [Agr… 3 [Nei… 2 [Agr…
nrow(ess_df)      # number of subjects 
[1] 1760
ncol(ess_df)      # number of variables 
[1] 21
names(ess_df)     # names of variables
 [1] "gvslvol"  "gvslvue"  "gvhlthc"  "gvcldcr"  "gvjbevn"  "gvpdlwk"  "sbstrec"  "sbbsntx"  "sbprvpv"  "sbeqsoc"  "sbcwkfm"  "sblazy"   "sblwcoa"  "sblwlka"  "agea"     "eduyrs"   "gndr"     "hinctnta" "gincdif"  "dfincac"  "smdfslv" 

In this first lab, we are exploring two concepts: welfare support and welfare criticism. Let’s take a closer look at our items

ess_df_selected <- ess_df %>% select(
                  ## Welfare support items ##
                  gvslvol, # the old
                  gvslvue, # the unemployed
                  gvhlthc, # the sick
                  gvcldcr, # working parents
                  gvjbevn, # job for everyone
                  gvpdlwk, # paid sick leave  
                  ##    Economic criticism items ##
                  sbstrec, # strain on economy
                  sbbsntx, # too much taxes
                  ##    Social criticism items ## 
                  sbprvpv, # poverty
                  sbeqsoc, # more equal society
                  sbcwkfm, # work and family
                  ##    Moral criticism items ##
                  sblazy,  # people lazy 
                  sblwcoa, # care for others
                  sblwlka  # look after others
)

descriptive_ess <- as.data.frame(psych::describe(ess_df_selected))

descriptive_ess <- dplyr::select(descriptive_ess, 
  n,
  mean,
  sd,
  median,
  min,
  max,
  skew,
  kurtosis)

descriptive_ess
           n     mean        sd median min max        skew    kurtosis
gvslvol 1759 7.871518 1.4811986      8   0  10 -0.78122430  1.37621274
gvslvue 1753 6.059897 1.9176408      6   0  10 -0.40989634  0.40781515
gvhlthc 1757 8.030734 1.4715101      8   0  10 -0.94788987  1.81361522
gvcldcr 1748 7.276316 1.7622804      7   0  10 -0.80981813  1.34667552
gvjbevn 1756 6.234624 2.2667339      7   0  10 -0.48699173 -0.08302998
gvpdlwk 1753 7.310325 1.7677965      8   0  10 -0.82684920  1.31394820
sbstrec 1735 2.963112 1.0193172      3   1   5  0.10310828 -0.92138250
sbbsntx 1714 2.531505 0.9970179      2   1   5  0.42532526 -0.51782206
sbprvpv 1740 2.333908 0.8722280      2   1   5  0.85304303  0.41061715
sbeqsoc 1748 2.374714 0.8341462      2   1   5  0.94680277  0.68051829
sbcwkfm 1735 2.334870 0.7775840      2   1   5  0.98878599  0.95302935
sblazy  1751 2.894917 1.0557772      3   1   5  0.10529513 -0.92067158
sblwcoa 1751 2.919475 1.0352766      3   1   5  0.06846687 -1.05077060
sblwlka 1746 2.981672 1.0339269      3   1   5 -0.09401127 -1.07498593

Before getting into more complex modelling, it is worth checking the structure of the data by calculating the variance-covariance matrix for the items that we are going to use to fit a CFA. Let’s first start with the items measuring welfare support.

# let's select the welfare support items 

ess_df_welfare_supp <- ess_df %>% select(
                  ## Welfare support items ##
                  gvslvol, # the old
                  gvslvue, # the unemployed
                  gvhlthc, # the sick
                  gvcldcr, # working parents
                  gvjbevn, # job for everyone
                  gvpdlwk  # paid sick leave  
)

# let's calculate the sample implied covariance matrix 
welfare_supp_cov <- cov(ess_df_welfare_supp,          # data frame 
                        use = "pairwise.complete.obs" # remove NAs 
                        )

welfare_supp_cov
          gvslvol   gvslvue   gvhlthc   gvcldcr  gvjbevn   gvpdlwk
gvslvol 2.1939494 0.8998156 1.3524399 1.0333956 1.203299 1.0712067
gvslvue 0.8998156 3.6773463 0.8016763 0.7827062 1.488233 0.8159548
gvhlthc 1.3524399 0.8016763 2.1653419 0.8897580 1.287510 0.9485565
gvcldcr 1.0333956 0.7827062 0.8897580 3.1056322 1.231219 1.6873857
gvjbevn 1.2032994 1.4882330 1.2875099 1.2312188 5.138083 1.1656366
gvpdlwk 1.0712067 0.8159548 0.9485565 1.6873857 1.165637 3.1251045
# visual inspection is sometimes more useful
# we can use the corrplot package.
# it it designed for correlation matrices 
# but they can also plot covariance matrices 

welfare_supp_cor <- cov2cor(welfare_supp_cov)
 welfare_supp_cor
          gvslvol   gvslvue   gvhlthc   gvcldcr   gvjbevn   gvpdlwk
gvslvol 1.0000000 0.3167911 0.6204995 0.3958934 0.3583933 0.4090983
gvslvue 0.3167911 1.0000000 0.2840982 0.2316096 0.3423759 0.2406947
gvhlthc 0.6204995 0.2840982 1.0000000 0.3431102 0.3859995 0.3646428
gvcldcr 0.3958934 0.2316096 0.3431102 1.0000000 0.3082192 0.5416354
gvjbevn 0.3583933 0.3423759 0.3859995 0.3082192 1.0000000 0.2908911
gvpdlwk 0.4090983 0.2406947 0.3646428 0.5416354 0.2908911 1.0000000
 corrplot::corrplot(welfare_supp_cor, 
                    is.corr = FALSE,       # whether is a correlation matrix 
                    method = "circle",     # magnitude of covariances as circles 
                    type = "upper",        # remove the bottom of the covariance matrix
                    addCoef.col = "black"  # add to the plot the coefficients
 )

7 1-factor CFA model

7.1 1-factor model: 3 indicators

Now that we have an idea of what is in our data we apply our theoretical knowledge to the our empirical data. Our measurement model will be formed by 3 items that loads on the latent factor “Welfare Support.”

7.2 Degrees of freedom

Sorts of identification:

  1. non-identified: \(df < 0\), impossible to estimate parameters
  2. over-identified: \(df > 0\), we should strive for this
  3. just-identified: \(df = 0\), it is ok, but the fit cannot be assessed
  4. empirical under-identification: it can happen when two or more indicators are highly correlated (multicollinearity)

We can quickly check if our model has positive degrees of freedom using the following formula.

Degrees of freedom = Number of parameters to estimate - Pieces of information

Pieces of information = \(\dfrac{p(p+1)}{2}=\dfrac{3(3+1)}{2}=6\).

where \(p\) is the number of “manifest” indicators \(i\) (also called “measured variables” or “observed indicators”), in this case gvslvol gvslvue, gvhlthc.

Number of parameters to estimate = \(\psi{f} + \lambda_{ji} + \theta_i= 1 + 2 + 3=6\)

where \(\psi{f}\) is the latent factor \(f\) variance, \(\lambda_{fi}\) is the loading on the latent factor \(f\) for each observed indicator \(i\), and \(\theta_{i}\) is the residual variance for each observed indicator \(i\).

Q: Why \(\lambda_{fi}=2\) ? (Pro Tip: (do not count parameters we fixed for scaling).

Degrees of freedom = Number of parameters to estimate - Pieces of informations =\(6-6=0\)

Q: What are the implication of having 0 degrees of freedom?

Remember:

  • Factor variance (\(\psi{j}\)): how much individuals differ on the factor
  • Factor loadings (\(\lambda_{ji}\)): the effect of the factor or latent variable on the measure
  • Residual (error) variance (\(\theta_{i}\)) : variance in the measure not attributable to the factor

Let’s now fit our CFA model using gvslvol, gvslvue, gvhlthc.

model_ws_3 <-'welf_supp =~ gvslvol + gvslvue + gvhlthc'

fit_ws_3 <- cfa(model_ws_3,             # model formula
                data = ess_df_selected  # data frame
  )

summary(fit_ws_3, 
        standardized = TRUE)
lavaan 0.6-10 ended normally after 26 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6
                                                      
                                                  Used       Total
  Number of observations                          1752        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.000                               1.233    0.832
    gvslvue           0.593    0.048   12.395    0.000    0.730    0.381
    gvhlthc           0.890    0.062   14.292    0.000    1.097    0.746

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gvslvol           0.677    0.103    6.555    0.000    0.677    0.308
   .gvslvue           3.143    0.112   28.083    0.000    3.143    0.855
   .gvhlthc           0.962    0.086   11.171    0.000    0.962    0.444
    welf_supp         1.519    0.123   12.359    0.000    1.000    1.000

7.3 Lavaan syntax

Let’s take a closer look at the lavaan syntax

  • latent variable definition =~ (is measured by)
  • regression ~ (is regressed on)
  • (residual) (co)variance ~~ (is correlated with)
  • intercept ~1 (intercept)
  • new model parameter := (is equal to)

So, if we want, we can freely estimate the first factor loading and fix the factor variance to 1:

7.4 Alternative model specification

Remember: Every factor in CFA should be given a metric to be identified.

  1. Fix one factor loading to 1 (“marker variable” method)
  2. Fix factor variance to 1
  3. Fix the average loading to 1 (“effect coding” method).
model_ws_3_alt <-'
welf_supp =~ NA*gvslvol + gvslvue + gvhlthc
welf_supp~~1*welf_supp
'

fit_ws_3_alt <- cfa(model_ws_3_alt,            # model formula
                    data = ess_df_selected     # data frame
  )

summary(fit_ws_3_alt, 
        standardized = TRUE)
lavaan 0.6-10 ended normally after 21 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6
                                                      
                                                  Used       Total
  Number of observations                          1752        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.233    0.050   24.718    0.000    1.233    0.832
    gvslvue           0.730    0.050   14.516    0.000    0.730    0.381
    gvhlthc           1.097    0.047   23.321    0.000    1.097    0.746

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    welf_supp         1.000                               1.000    1.000
   .gvslvol           0.677    0.103    6.555    0.000    0.677    0.308
   .gvslvue           3.143    0.112   28.083    0.000    3.143    0.855
   .gvhlthc           0.962    0.086   11.171    0.000    0.962    0.444

Q: What is the difference between these two models?

7.5 1-factor model: 6 indicators

Since we have 3 more indicators that measure our latent construct welfare support, let’s fit a model with 6 items instead of 3. This will allows us to estimate fit indices, one of the most valuable metric in a SEM model.

model_ws_6 <-'welf_supp =~ gvslvol + gvslvue + gvhlthc + gvcldcr + gvjbevn + gvpdlwk'


fit_ws_6 <- cfa(model_ws_6,             # model formula
                data = ess_df_selected  # data frame
  )

summary(fit_ws_6, 
        standardized = TRUE)
lavaan 0.6-10 ended normally after 28 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
                                                      
                                                  Used       Total
  Number of observations                          1740        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                               326.956
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.000                               1.129    0.761
    gvslvue           0.727    0.046   15.908    0.000    0.821    0.428
    gvhlthc           0.945    0.037   25.657    0.000    1.067    0.724
    gvcldcr           0.895    0.042   21.086    0.000    1.010    0.573
    gvjbevn           1.042    0.054   19.184    0.000    1.176    0.519
    gvpdlwk           0.913    0.042   21.526    0.000    1.031    0.586

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gvslvol           0.923    0.048   19.088    0.000    0.923    0.420
   .gvslvue           3.001    0.108   27.859    0.000    3.001    0.817
   .gvhlthc           1.031    0.049   21.109    0.000    1.031    0.476
   .gvcldcr           2.087    0.081   25.883    0.000    2.087    0.672
   .gvjbevn           3.755    0.140   26.790    0.000    3.755    0.731
   .gvpdlwk           2.032    0.079   25.627    0.000    2.032    0.657
    welf_supp         1.274    0.077   16.546    0.000    1.000    1.000

Q: Why do we have 9 degrees of freedom in this model?

7.6 Model diagnostics

Important things to check after running the model:

  1. Estimates
    • Do the indicators load well on the factor(s)?
  2. Model fit [see next]
    • Are the fit indices good?
  3. Heywood cases/ reasonable solution
    • Are variances positive?
    • Are \(r^2\) is below 1?
    • Are there any extreme standard errors?
# Returns the observed covariance matrix.
lavInspect(fit_ws_6, "sampstat")
$cov
        gvslvl gvslvu gvhlth gvcldc gvjbvn gvpdlw
gvslvol 2.197                                    
gvslvue 0.894  3.675                             
gvhlthc 1.353  0.800  2.169                      
gvcldcr 1.034  0.785  0.886  3.107               
gvjbevn 1.197  1.476  1.285  1.226  5.139        
gvpdlwk 1.060  0.806  0.946  1.689  1.156  3.095 
# Returns the model estimated covariance matrix.
fitted(fit_ws_6)
$cov
        gvslvl gvslvu gvhlth gvcldc gvjbvn gvpdlw
gvslvol 2.197                                    
gvslvue 0.926  3.675                             
gvhlthc 1.204  0.875  2.169                      
gvcldcr 1.140  0.829  1.077  3.107               
gvjbevn 1.328  0.966  1.255  1.188  5.139        
gvpdlwk 1.163  0.846  1.099  1.041  1.212  3.095 
# Returns the difference between the observed and estimated covariance matrix
residuals(fit_ws_6)
$type
[1] "raw"

$cov
        gvslvl gvslvu gvhlth gvcldc gvjbvn gvpdlw
gvslvol  0.000                                   
gvslvue -0.033  0.000                            
gvhlthc  0.149 -0.075  0.000                     
gvcldcr -0.106 -0.044 -0.192  0.000              
gvjbevn -0.131  0.510  0.030  0.037  0.000       
gvpdlwk -0.103 -0.040 -0.154  0.648 -0.056  0.000
# Returns the standard errors for the free parameters in the model.
lavTech(fit_ws_6, "se")
$lambda
           [,1]
[1,] 0.00000000
[2,] 0.04571461
[3,] 0.03682999
[4,] 0.04244399
[5,] 0.05432472
[6,] 0.04241931

$theta
          [,1]      [,2]       [,3]       [,4]      [,5]       [,6]
[1,] 0.0483618 0.0000000 0.00000000 0.00000000 0.0000000 0.00000000
[2,] 0.0000000 0.1077257 0.00000000 0.00000000 0.0000000 0.00000000
[3,] 0.0000000 0.0000000 0.04886228 0.00000000 0.0000000 0.00000000
[4,] 0.0000000 0.0000000 0.00000000 0.08062486 0.0000000 0.00000000
[5,] 0.0000000 0.0000000 0.00000000 0.00000000 0.1401613 0.00000000
[6,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.07930721

$psi
           [,1]
[1,] 0.07699622
# Returns the parameter estimates with confidence intervals.
parameterEstimates(fit_ws_6, standardized=TRUE)
         lhs op       rhs   est    se      z pvalue ci.lower ci.upper std.lv std.all std.nox
1  welf_supp =~   gvslvol 1.000 0.000     NA     NA    1.000    1.000  1.129   0.761   0.761
2  welf_supp =~   gvslvue 0.727 0.046 15.908      0    0.638    0.817  0.821   0.428   0.428
3  welf_supp =~   gvhlthc 0.945 0.037 25.657      0    0.873    1.017  1.067   0.724   0.724
4  welf_supp =~   gvcldcr 0.895 0.042 21.086      0    0.812    0.978  1.010   0.573   0.573
5  welf_supp =~   gvjbevn 1.042 0.054 19.184      0    0.936    1.149  1.176   0.519   0.519
6  welf_supp =~   gvpdlwk 0.913 0.042 21.526      0    0.830    0.996  1.031   0.586   0.586
7    gvslvol ~~   gvslvol 0.923 0.048 19.088      0    0.828    1.018  0.923   0.420   0.420
8    gvslvue ~~   gvslvue 3.001 0.108 27.859      0    2.790    3.212  3.001   0.817   0.817
9    gvhlthc ~~   gvhlthc 1.031 0.049 21.109      0    0.936    1.127  1.031   0.476   0.476
10   gvcldcr ~~   gvcldcr 2.087 0.081 25.883      0    1.929    2.245  2.087   0.672   0.672
11   gvjbevn ~~   gvjbevn 3.755 0.140 26.790      0    3.480    4.030  3.755   0.731   0.731
12   gvpdlwk ~~   gvpdlwk 2.032 0.079 25.627      0    1.877    2.188  2.032   0.657   0.657
13 welf_supp ~~ welf_supp 1.274 0.077 16.546      0    1.123    1.425  1.000   1.000   1.000

Lavaan offers a quick way to check model fit and statistics

summary(fit_ws_6,            # fitted model 
        fit.measures = TRUE, # returns commonly used fit measures 
        standardized = TRUE  # indicates that we want standardized results
        )
lavaan 0.6-10 ended normally after 28 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
                                                      
                                                  Used       Total
  Number of observations                          1740        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                               326.956
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                              2621.015
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.878
  Tucker-Lewis Index (TLI)                       0.797

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -19550.568
  Loglikelihood unrestricted model (H1)             NA
                                                      
  Akaike (AIC)                               39125.136
  Bayesian (BIC)                             39190.676
  Sample-size adjusted Bayesian (BIC)        39152.553

Root Mean Square Error of Approximation:

  RMSEA                                          0.142
  90 Percent confidence interval - lower         0.129
  90 Percent confidence interval - upper         0.156
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.061

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  welf_supp =~                                                          
    gvslvol           1.000                               1.129    0.761
    gvslvue           0.727    0.046   15.908    0.000    0.821    0.428
    gvhlthc           0.945    0.037   25.657    0.000    1.067    0.724
    gvcldcr           0.895    0.042   21.086    0.000    1.010    0.573
    gvjbevn           1.042    0.054   19.184    0.000    1.176    0.519
    gvpdlwk           0.913    0.042   21.526    0.000    1.031    0.586

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gvslvol           0.923    0.048   19.088    0.000    0.923    0.420
   .gvslvue           3.001    0.108   27.859    0.000    3.001    0.817
   .gvhlthc           1.031    0.049   21.109    0.000    1.031    0.476
   .gvcldcr           2.087    0.081   25.883    0.000    2.087    0.672
   .gvjbevn           3.755    0.140   26.790    0.000    3.755    0.731
   .gvpdlwk           2.032    0.079   25.627    0.000    2.032    0.657
    welf_supp         1.274    0.077   16.546    0.000    1.000    1.000
# we can also just extract the factor loadings.
# using lavaan terminology, the factors loadings are the lambdas

inspect(fit_ws_6,           # fitted model 
        what="std")$lambda  # standardized factors loadings
        wlf_sp
gvslvol  0.761
gvslvue  0.428
gvhlthc  0.724
gvcldcr  0.573
gvjbevn  0.519
gvpdlwk  0.586
# more info on the factors loading can be obtained using the tidy SEM package
# first we extract all the estimated parameters 

tidy_results <- table_results(fit_ws_6,             
  columns = c("label", 
              "est_sig", 
              "se", 
              "confint"),
  digits = 2
)

tidy_results %>% filter(str_detect(label, "welf_supp."))
                 label est_sig   se      confint
1 welf_supp.BY.gvslvol    1.00 0.00 [1.00, 1.00]
2 welf_supp.BY.gvslvue 0.73*** 0.05 [0.64, 0.82]
3 welf_supp.BY.gvhlthc 0.94*** 0.04 [0.87, 1.02]
4 welf_supp.BY.gvcldcr 0.89*** 0.04 [0.81, 0.98]
5 welf_supp.BY.gvjbevn 1.04*** 0.05 [0.94, 1.15]
6 welf_supp.BY.gvpdlwk 0.91*** 0.04 [0.83, 1.00]
# we can also take a look at residual variances 
# using lavaan terminology, the residual variances are the thetas

theta <- round(inspect(fit_ws_6, "est")$theta,3)
theta.std <- round(inspect(fit_ws_6, "std")$theta,3) 
r2 <- round(inspect(fit_ws_6, "r2"),3)

data.frame(row.names = c(),                       # empty the columns names 
           Variables = colnames(theta),           # variable names 
           "Residuals" = diag(theta),             # diagonal theta
           "Std. Residuals" = diag(theta.std),    # diagonal std. theta
           "R Squared" = r2                       # R-squared
        )
  Variables Residuals Std..Residuals R.Squared
1   gvslvol     0.923          0.420     0.580
2   gvslvue     3.001          0.817     0.183
3   gvhlthc     1.031          0.476     0.524
4   gvcldcr     2.087          0.672     0.328
5   gvjbevn     3.755          0.731     0.269
6   gvpdlwk     2.032          0.657     0.343

7.7 Model Fit statistics

We have two different type of fit statistics.

  1. Global fit measures:
    • They take into account how the entire entire model fit the data
    • (some) rules of thumb: \(CFI/TLI > 0.95, RMSEA < 0.05, SRMR < 0.06\)
    • current practice is: chi-square value + df + pvalue, RMSEA, CFI and SRMR
    • DO NOT cherry pick your fit indices
    • Check Hu and Bentler (1999)
  2. “Local” fit measures:
## Global fit measures ##
# we can select which global fit measures to extract
fitMeasures(fit_ws_6, c("logl","AIC", "BIC", "chisq", "df", "pvalue", "cfi", "tli","rmsea"), output = "matrix")
                 
logl   -19550.568
aic     39125.136
bic     39190.676
chisq     326.956
df          9.000
pvalue      0.000
cfi         0.878
tli         0.797
rmsea       0.142

Q: How our model perform in terms of global fit measures?

We can have a better grasp of what is happening taking a look at the local fit of our model.

## Local fit measures: modification indices ##
mi <- inspect(fit_ws_6,"mi")
mi.sorted <- mi[order(-mi$mi),] # sort from high to low mi.sorted[1:5,] # only display some large MI values
mi.sorted[1:5,] # only display some large MI values
       lhs op     rhs      mi    epc sepc.lv sepc.all sepc.nox
27 gvcldcr ~~ gvpdlwk 243.943  0.918   0.918    0.446    0.446
15 gvslvol ~~ gvhlthc 168.789  0.621   0.621    0.637    0.637
23 gvhlthc ~~ gvcldcr  58.187 -0.375  -0.375   -0.256   -0.256
21 gvslvue ~~ gvjbevn  47.839  0.607   0.607    0.181    0.181
25 gvhlthc ~~ gvpdlwk  39.471 -0.310  -0.310   -0.214   -0.214
# let's plot the modification indices 
plot(mi.sorted$mi) # plot the MI values
abline(h=3.84) # add a horizontal reference line (chisq value for 1 df where p=0.05)

Q: What the plot is suggesting ?

  1. In certain contexts, we can modify the model based on a review of:
    • MI’s in combination with EPC’s (Expected Value Change). Both need to be “substantial”
    • Theory or the source of the data (e.g. review the content of the test items)
  2. However, modifying a CFA moves it away from a strictly confirmatory model
    • The more modifications, the more exploratory the model becomes
    • Maybe this model was not ready for a confirmatory modelling strategy?

8 3-factor CFA model

We can also specify a model with more than 1 latent factor. Let’s use the welfare criticism items. First, let’s check the sample implied covariance matrix.

ess_df_selected <- ess_df %>% select(
                  ## Economic criticism items ##
                  sbstrec, # strain on economy
                  sbbsntx, # too much taxes
                  ##    Social criticism items ## 
                  sbprvpv, # poverty
                  sbeqsoc, # more equal society
                  sbcwkfm, # work and family
                  ##    Moral criticism items ##
                  sblazy,  # people lazy 
                  sblwcoa, # care for others
                  sblwlka  # look after others
)

welfare_crit_cov <- cov(ess_df_selected, use = "pairwise.complete.obs")

welfare_crit_cor <- cov2cor(welfare_crit_cov)

corrplot::corrplot(welfare_crit_cor, 
                   is.corr = FALSE,       # whether is a correlation matrix 
                   method = "circle",     # magnitude of covariance as circles 
                   type = "upper",        # remove the bottom of the covariance matrix
                   addCoef.col = "black"  # add to the plot the coefficients
          )

Next, let’s fit our 3-factor model using lavaan. The syntax is similar to the 1-factor model

model_wc <-'
## Economic criticism ##
wc_econo =~ sbstrec + sbbsntx
## Social criticism ## 
wc_socia =~ sbprvpv + sbeqsoc + sbcwkfm
##  Moral criticism ##
wc_moral =~ sblazy + sblwcoa + sblwlka
'

fit_wc <- cfa(model_wc,              # model formula
             data = ess_df_selected  # data frame
  )

summary(fit_wc, standardized=TRUE)
lavaan 0.6-10 ended normally after 36 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        19
                                                      
                                                  Used       Total
  Number of observations                          1679        1760
                                                                  
Model Test User Model:
                                                      
  Test statistic                                71.348
  Degrees of freedom                                17
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  wc_econo =~                                                           
    sbstrec           1.000                               0.628    0.616
    sbbsntx           1.001    0.093   10.817    0.000    0.629    0.629
  wc_socia =~                                                           
    sbprvpv           1.000                               0.462    0.527
    sbeqsoc           1.493    0.166    9.011    0.000    0.689    0.824
    sbcwkfm           0.631    0.055   11.461    0.000    0.291    0.375
  wc_moral =~                                                           
    sblazy            1.000                               0.749    0.710
    sblwcoa           1.058    0.045   23.285    0.000    0.792    0.766
    sblwlka           0.977    0.043   22.843    0.000    0.732    0.706

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  wc_econo ~~                                                           
    wc_socia         -0.024    0.011   -2.093    0.036   -0.082   -0.082
    wc_moral          0.250    0.023   10.872    0.000    0.532    0.532
  wc_socia ~~                                                           
    wc_moral         -0.034    0.012   -2.915    0.004   -0.099   -0.099

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .sbstrec           0.647    0.042   15.381    0.000    0.647    0.621
   .sbbsntx           0.603    0.041   14.584    0.000    0.603    0.604
   .sbprvpv           0.554    0.030   18.722    0.000    0.554    0.722
   .sbeqsoc           0.224    0.051    4.404    0.000    0.224    0.320
   .sbcwkfm           0.518    0.020   25.832    0.000    0.518    0.859
   .sblazy            0.551    0.027   20.083    0.000    0.551    0.495
   .sblwcoa           0.441    0.027   16.639    0.000    0.441    0.413
   .sblwlka           0.539    0.027   20.320    0.000    0.539    0.502
    wc_econo          0.395    0.045    8.689    0.000    1.000    1.000
    wc_socia          0.213    0.029    7.328    0.000    1.000    1.000
    wc_moral          0.561    0.039   14.477    0.000    1.000    1.000

9 Plotting

Plotting SEM is quite common, especially when the model are rather complex. There are different packages that can be use to plot SEM in lavaan. We are going to use the tidySEM package. Compared to other packages, it can plot both lavaan and Mplus models and it is fairly intuitive. However, you can also check semPlot or lavaanPlot.

# first, let's define our plot layout 
lay <- get_layout("wc_econo", "", "", "wc_socia","", "","wc_moral", "",
                  "sbstrec", "sbbsntx", "sbprvpv", "sbeqsoc", "sbcwkfm", "sblazy", "sblwcoa", "sblwlka", rows = 2)

# let's take a look at our plot layout.
lay
     [,1]       [,2]      [,3]      [,4]       [,5]      [,6]     [,7]       [,8]     
[1,] "wc_econo" ""        ""        "wc_socia" ""        ""       "wc_moral" ""       
[2,] "sbstrec"  "sbbsntx" "sbprvpv" "sbeqsoc"  "sbcwkfm" "sblazy" "sblwcoa"  "sblwlka"
attr(,"class")
[1] "layout_matrix" "matrix"        "array"        
# let's plot our results
plot_wc <- graph_sem(model = fit_wc,      # model fit
                    layout = lay,         # layout 
                    angle = 170           # adjust the arrows 
                    #label = "est_std",  # get standardized results (not rounded)
          )   

plot_wc

With tidySEM, you can fully customize your plot. For instance, it is possible to highlighting a specific model element, such as the low factor loading for sbcwkfm on wc_socia

graph_data <- prepare_graph(fit_wc)
 
edges(graph_data) <- graph_data %>% 
  edges() %>%
  mutate(colour = "black") %>%
  mutate(colour = replace(colour, from == "wc_socia" & to == "sbcwkfm", "red"))

plot(graph_data,
     layout = lay,        # layout 
     #label = "est_std",   # get standardized results (not rounded)
     angle = 170          # adjust the arrows 
  )

10 !!Support Ukraine!!

References

Hu, Li-tze, and Peter M. Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling: A Multidisciplinary Journal 6 (1): 1–55. https://doi.org/10.1080/10705519909540118.
Thoemmes, Felix, Yves Rosseel, and Johannes Textor. 2018. “Local Fit Evaluation of Structural Equation Models Using Graphical Criteria.” PSYCHOLOGICAL METHODS 23 (1): 27–41. https://doi.org/10.1037/met0000147.