Structural Equation Modeling | Lab Session 1
1 Let’s pause for a sec
2 Lab sessions: why and how?
- Apply theoretical knowledge
- Increase understanding by interacting with data
- Learn to use some packages in R
- How:
- Relatively unstructured
- Go at your own pace, try to do the exercises yourself (do yourself a favour and do not just copy paste and run the solutions)
- “There is never time to do it right, but there is always time to do it over”
2.1 Software used: R
- This is not an R course!
- We will learn some R as we go along
- I will use RStudio
- Many packages or libraries exist to do specific analyses
3 What we are going to cover
- Data preparation
- Data exploration
- One-factor CFA model
- Degrees of Freedom
- Model diagnostics
- Model fit statistics
- A three-factor CFA model
- Plotting SEM
4 Data
The data set used throughout is the European Social Survey, ESS4-2008 Edition 4.5, released on 1 December 2018. We will restrict the analysis to the Belgian case. Each line in the data set represents a Belgian respondent. The full dataset and documentation can be found on the ESS website.
Codebook:
gvslvol Standard of living for the old, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvslvue Standard of living for the unemployed, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvhlthc Health care for the sick, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvcldcr Child care services for working parents, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvjbevn Job for everyone, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
gvpdlwk Paid leave from work to care for sick family, governments’ responsibility (0 Not governments’ responsibility at all - 10 Entirely governments’ responsibility)
sbstrec Social benefits/services place too great strain on economy (1 Agree strongly - 5 Disagree strongly)
sbbsntx Social benefits/services cost businesses too much in taxes/charges (1 Agree strongly - 5 Disagree strongly)
sbprvpv Social benefits/services prevent widespread poverty (1 Agree strongly - 5 Disagree strongly)
sbeqsoc Social benefits/services lead to a more equal society (1 Agree strongly - 5 Disagree strongly)
sbcwkfm Social benefits/services make it easier to combine work and family (1 Agree strongly - 5 Disagree strongly)
sblazy Social benefits/services make people lazy (1 Agree strongly - 5 Disagree strongly)
sblwcoa Social benefits/services make people less willing care for one another (1 Agree strongly - 5 Disagree strongly)
sblwlka Social benefits/services make people less willing look after themselves/family (1 Agree strongly - 5 Disagree strongly)
agea Respondent’s age
eduyrs Years of full-time education completed
gndr Gender (1 Male, 2 Female)
hinctnta Household’s total net income, all sources (Deciles of the actual household income range in the given country.)
gincdif Government should reduce differences in income levels (1 Agree strongly - 5 Disagree strongly)
dfincac Large differences in income acceptable to reward talents and efforts (1 Agree strongly - 5 Disagree strongly)
smdfslv For fair society, differences in standard of living should be small (1 Agree strongly - 5 Disagree strongly)
5 Environment preparation
First, let’s load the necessary packages to load, manipulate, visualize and analyse the data.
# Uncomment this once if you need to install the packages on your system
### DATA MANIPULATION ###
# install.packages("haven") # data import from spss
# install.packages("dplyr") # data manipulation
# install.packages("psych") # descriptives
# install.packages("stringr") # string manipulation
### MODELING ###
# install.packages("lavaan") # SEM modelling
### VISUALIZATION ###
# install.packages("tidySEM") # plotting SEM models
# install.packages("corrplot") # correlation/covariance plots
# Load the packages
### DATA MANIPULATION ###
library("haven")
library("dplyr")
library("psych")
library("stringr")
### MODELING ###
library("lavaan")
### VISUALIZATION ###
library("corrplot")
library("tidySEM")
6 Data exploration
It is good practice to check that everything is in order and to make sense of the data that we are going to analyze.
ess_df <- haven::read_sav("https://github.com/albertostefanelli/SEM_labs/raw/master/data/ESS4_belgium.sav")
head(ess_df, 20) # first 20 rows of the dataset
# A tibble: 20 × 21
gvslvol gvslvue gvhlthc gvcldcr gvjbevn gvpdlwk sbstrec sbbsntx sbprvpv sbeqsoc sbcwkfm sblazy sblwcoa sblwlka agea eduyrs gndr hinctnta gincdif dfincac smdfslv
<dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl+> <dbl+l> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
1 8 [8] 7 [7] 8 [8] 5 [5] 8 [8] 5 [5] 2 [Agr… 1 [Agr… 1 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 36 18 1 [Mal… 4 [M -… 2 [Agr… 3 [Nei… 2 [Agr…
2 7 [7] 7 [7] 8 [8] 7 [7] 7 [7] 6 [6] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 4 [Dis… 4 [Dis… 26 15 2 [Fem… 7 [K -… 2 [Agr… 3 [Nei… 2 [Agr…
3 6 [6] 3 [3] 6 [6] 5 [5] 3 [3] 4 [4] 2 [Agr… 2 [Agr… 3 [Nei… 4 [Dis… 3 [Nei… 1 [Agr… 1 [Agr… 1 [Agr… 69 18 1 [Mal… 10 [H -… 4 [Dis… 1 [Agr… 4 [Dis…
4 6 [6] 6 [6] 6 [6] 6 [6] 3 [3] 3 [3] 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 77 15 2 [Fem… 7 [K -… 2 [Agr… 2 [Agr… 2 [Agr…
5 6 [6] 4 [4] 7 [7] 5 [5] 6 [6] 5 [5] 4 [Dis… 1 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 3 [Nei… 3 [Nei… 27 13 1 [Mal… 7 [K -… 3 [Nei… 4 [Dis… 4 [Dis…
6 10 [Entirely governments' responsibility] 5 [5] 10 [Entirely governments' responsibility] 5 [5] 5 [5] 7 [7] 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 3 [Nei… 2 [Agr… 4 [Dis… 4 [Dis… 32 12 2 [Fem… 6 [S -… 4 [Dis… 2 [Agr… 4 [Dis…
7 10 [Entirely governments' responsibility] 6 [6] 8 [8] 6 [6] 8 [8] 6 [6] 4 [Dis… 2 [Agr… 2 [Agr… 4 [Dis… 3 [Nei… 4 [Dis… 3 [Nei… 4 [Dis… 19 13 2 [Fem… 8 [P -… 4 [Dis… 2 [Agr… 2 [Agr…
8 7 [7] 6 [6] 9 [9] 7 [7] 8 [8] 8 [8] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 3 [Nei… 4 [Dis… 4 [Dis… 28 17 2 [Fem… 10 [H -… 2 [Agr… 3 [Nei… 2 [Agr…
9 9 [9] 6 [6] 9 [9] 5 [5] 5 [5] 8 [8] 2 [Agr… 3 [Nei… 3 [Nei… 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 3 [Nei… 49 16 1 [Mal… 4 [M -… 1 [Agr… 4 [Dis… 2 [Agr…
10 8 [8] 8 [8] 8 [8] 8 [8] 8 [8] 8 [8] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 3 [Nei… 3 [Nei… 4 [Dis… 3 [Nei… 57 16 1 [Mal… 9 [D -… 2 [Agr… 2 [Agr… 3 [Nei…
11 8 [8] 6 [6] 8 [8] 7 [7] 6 [6] 6 [6] 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr… 71 17 2 [Fem… 7 [K -… 4 [Dis… 4 [Dis… 3 [Nei…
12 8 [8] 6 [6] 8 [8] 8 [8] 7 [7] 7 [7] 1 [Agr… 3 [Nei… 4 [Dis… 4 [Dis… 3 [Nei… 3 [Nei… 2 [Agr… 2 [Agr… 83 16 2 [Fem… 8 [P -… 3 [Nei… 2 [Agr… 2 [Agr…
13 6 [6] 3 [3] 7 [7] 4 [4] 5 [5] 4 [4] 2 [Agr… 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 2 [Agr… 46 5 2 [Fem… 10 [H -… 3 [Nei… 4 [Dis… 2 [Agr…
14 9 [9] 9 [9] 9 [9] 9 [9] 10 [Entirely governments' responsi… 8 [8] 4 [Dis… 4 [Dis… 2 [Agr… 2 [Agr… 3 [Nei… 5 [Dis… 4 [Dis… 4 [Dis… 61 14 2 [Fem… 1 [J -… 4 [Dis… 2 [Agr… 3 [Nei…
15 9 [9] 9 [9] 9 [9] 9 [9] 7 [7] 9 [9] 4 [Dis… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 1 [Agr… 68 11 2 [Fem… 3 [C -… 1 [Agr… 2 [Agr… 2 [Agr…
16 8 [8] 7 [7] 8 [8] 6 [6] 7 [7] 7 [7] 4 [Dis… 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 3 [Nei… 2 [Agr… 62 11 1 [Mal… 8 [P -… 2 [Agr… 4 [Dis… 2 [Agr…
17 10 [Entirely governments' responsibility] 10 [Entirely governments' responsibility] 10 [Entirely governments' responsibility] 7 [7] 7 [7] 5 [5] 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 4 [Dis… 4 [Dis… 4 [Dis… 23 12 1 [Mal… 2 [R -… 1 [Agr… 2 [Agr… 3 [Nei…
18 7 [7] 6 [6] 7 [7] 7 [7] 4 [4] 6 [6] 4 [Dis… 4 [Dis… 2 [Agr… 3 [Nei… 4 [Dis… 2 [Agr… 2 [Agr… 2 [Agr… 64 11 1 [Mal… 5 [F -… 4 [Dis… 1 [Agr… 4 [Dis…
19 8 [8] 3 [3] 7 [7] 8 [8] 7 [7] 8 [8] 2 [Agr… 3 [Nei… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 64 13 1 [Mal… 9 [D -… 2 [Agr… 4 [Dis… 1 [Agr…
20 5 [5] 6 [6] 5 [5] 8 [8] 5 [5] 8 [8] 4 [Dis… 1 [Agr… 2 [Agr… 1 [Agr… 1 [Agr… 2 [Agr… 2 [Agr… 2 [Agr… 45 8 1 [Mal… 7 [K -… 2 [Agr… 3 [Nei… 2 [Agr…
nrow(ess_df) # number of subjects
[1] 1760
ncol(ess_df) # number of variables
[1] 21
names(ess_df) # names of variables
[1] "gvslvol" "gvslvue" "gvhlthc" "gvcldcr" "gvjbevn" "gvpdlwk" "sbstrec" "sbbsntx" "sbprvpv" "sbeqsoc" "sbcwkfm" "sblazy" "sblwcoa" "sblwlka" "agea" "eduyrs" "gndr" "hinctnta" "gincdif" "dfincac" "smdfslv"
In this first lab, we are exploring two concepts: welfare support and welfare criticism. Let's take a closer look at our items.
ess_df_selected <- ess_df %>% select(
  ## Welfare support items ##
  gvslvol, # the old
  gvslvue, # the unemployed
  gvhlthc, # the sick
  gvcldcr, # working parents
  gvjbevn, # job for everyone
  gvpdlwk, # paid sick leave
  ## Economic criticism items ##
  sbstrec, # strain on economy
  sbbsntx, # too much taxes
  ## Social criticism items ##
  sbprvpv, # poverty
  sbeqsoc, # more equal society
  sbcwkfm, # work and family
  ## Moral criticism items ##
  sblazy,  # people lazy
  sblwcoa, # care for others
  sblwlka  # look after others
)
descriptive_ess <- as.data.frame(psych::describe(ess_df_selected))

descriptive_ess <- dplyr::select(descriptive_ess,
                                 n,
                                 mean,
                                 sd,
                                 median,
                                 min,
                                 max,
                                 skew,
                                 kurtosis)

descriptive_ess
n mean sd median min max skew kurtosis
gvslvol 1759 7.871518 1.4811986 8 0 10 -0.78122430 1.37621274
gvslvue 1753 6.059897 1.9176408 6 0 10 -0.40989634 0.40781515
gvhlthc 1757 8.030734 1.4715101 8 0 10 -0.94788987 1.81361522
gvcldcr 1748 7.276316 1.7622804 7 0 10 -0.80981813 1.34667552
gvjbevn 1756 6.234624 2.2667339 7 0 10 -0.48699173 -0.08302998
gvpdlwk 1753 7.310325 1.7677965 8 0 10 -0.82684920 1.31394820
sbstrec 1735 2.963112 1.0193172 3 1 5 0.10310828 -0.92138250
sbbsntx 1714 2.531505 0.9970179 2 1 5 0.42532526 -0.51782206
sbprvpv 1740 2.333908 0.8722280 2 1 5 0.85304303 0.41061715
sbeqsoc 1748 2.374714 0.8341462 2 1 5 0.94680277 0.68051829
sbcwkfm 1735 2.334870 0.7775840 2 1 5 0.98878599 0.95302935
sblazy 1751 2.894917 1.0557772 3 1 5 0.10529513 -0.92067158
sblwcoa 1751 2.919475 1.0352766 3 1 5 0.06846687 -1.05077060
sblwlka 1746 2.981672 1.0339269 3 1 5 -0.09401127 -1.07498593
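The n column above shows that every item has a handful of missing responses. A quick way to count them directly (a sketch using base R on the selection we just made):

```r
# number of missing values per selected item
colSums(is.na(ess_df_selected))

# number of respondents with complete data on all 14 items
sum(complete.cases(ess_df_selected))
```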
Before getting into more complex modelling, it is worth checking the structure of the data by calculating the variance-covariance matrix for the items that we are going to use to fit a CFA. Let’s first start with the items measuring welfare support.
# let's select the welfare support items
ess_df_welfare_supp <- ess_df %>% select(
  ## Welfare support items ##
  gvslvol, # the old
  gvslvue, # the unemployed
  gvhlthc, # the sick
  gvcldcr, # working parents
  gvjbevn, # job for everyone
  gvpdlwk  # paid sick leave
)

# let's calculate the sample implied covariance matrix
welfare_supp_cov <- cov(ess_df_welfare_supp,          # data frame
                        use = "pairwise.complete.obs" # remove NAs
)

welfare_supp_cov
gvslvol gvslvue gvhlthc gvcldcr gvjbevn gvpdlwk
gvslvol 2.1939494 0.8998156 1.3524399 1.0333956 1.203299 1.0712067
gvslvue 0.8998156 3.6773463 0.8016763 0.7827062 1.488233 0.8159548
gvhlthc 1.3524399 0.8016763 2.1653419 0.8897580 1.287510 0.9485565
gvcldcr 1.0333956 0.7827062 0.8897580 3.1056322 1.231219 1.6873857
gvjbevn 1.2032994 1.4882330 1.2875099 1.2312188 5.138083 1.1656366
gvpdlwk 1.0712067 0.8159548 0.9485565 1.6873857 1.165637 3.1251045
# visual inspection is sometimes more useful
# we can use the corrplot package.
# it is designed for correlation matrices
# but it can also plot covariance matrices
welfare_supp_cor <- cov2cor(welfare_supp_cov)
welfare_supp_cor
gvslvol gvslvue gvhlthc gvcldcr gvjbevn gvpdlwk
gvslvol 1.0000000 0.3167911 0.6204995 0.3958934 0.3583933 0.4090983
gvslvue 0.3167911 1.0000000 0.2840982 0.2316096 0.3423759 0.2406947
gvhlthc 0.6204995 0.2840982 1.0000000 0.3431102 0.3859995 0.3646428
gvcldcr 0.3958934 0.2316096 0.3431102 1.0000000 0.3082192 0.5416354
gvjbevn 0.3583933 0.3423759 0.3859995 0.3082192 1.0000000 0.2908911
gvpdlwk 0.4090983 0.2406947 0.3646428 0.5416354 0.2908911 1.0000000
corrplot::corrplot(welfare_supp_cor,
                   is.corr = FALSE,       # whether it is a correlation matrix
                   method = "circle",     # magnitude of covariances as circles
                   type = "upper",        # remove the bottom of the covariance matrix
                   addCoef.col = "black"  # add the coefficients to the plot
)
7 1-factor CFA model
7.1 1-factor model: 3 indicators
Now that we have an idea of what is in our data, we can apply our theoretical knowledge to our empirical data. Our measurement model is formed by 3 items that load on the latent factor "Welfare Support."
7.2 Degrees of freedom
Sorts of identification:
- non-identified: \(df < 0\), impossible to estimate parameters
- over-identified: \(df > 0\), we should strive for this
- just-identified: \(df = 0\), it is ok, but the fit cannot be assessed
- empirical under-identification: it can happen when two or more indicators are highly correlated (multicollinearity)
We can quickly check if our model has positive degrees of freedom using the following formula.
Degrees of freedom = Pieces of information - Number of parameters to estimate
Pieces of information = \(\dfrac{p(p+1)}{2}=\dfrac{3(3+1)}{2}=6\).
where \(p\) is the number of "manifest" indicators \(i\) (also called "measured variables" or "observed indicators"), in this case gvslvol, gvslvue, gvhlthc.
Number of parameters to estimate = \(\psi_{f} + \lambda_{fi} + \theta_{i} = 1 + 2 + 3 = 6\)
where \(\psi_{f}\) is the variance of the latent factor \(f\), \(\lambda_{fi}\) is the loading on the latent factor \(f\) for each observed indicator \(i\), and \(\theta_{i}\) is the residual variance for each observed indicator \(i\).
Q: Why is \(\lambda_{fi} = 2\)? (Pro tip: do not count parameters we fixed for scaling.)
Degrees of freedom = Pieces of information - Number of parameters to estimate \(= 6 - 6 = 0\)
Q: What are the implications of having 0 degrees of freedom?
Remember:
- Factor variance (\(\psi_{f}\)): how much individuals differ on the factor
- Factor loadings (\(\lambda_{fi}\)): the effect of the factor or latent variable on the measure
- Residual (error) variance (\(\theta_{i}\)): variance in the measure not attributable to the factor
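The counting rule above takes two lines of arithmetic; the same rule also gives the degrees of freedom for the 6-indicator model fitted later (the helper function name is ours):

```r
# degrees of freedom for a one-factor CFA with p indicators,
# using the marker-variable scaling (first loading fixed to 1)
cfa_df <- function(p) {
  pieces <- p * (p + 1) / 2 # unique variances and covariances
  npar <- 1 + (p - 1) + p   # factor variance + free loadings + residual variances
  pieces - npar
}

cfa_df(3) # 0: just-identified
cfa_df(6) # 9: over-identified
```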
Let’s now fit our CFA model using gvslvol, gvslvue, gvhlthc.
model_ws_3 <- 'welf_supp =~ gvslvol + gvslvue + gvhlthc'

fit_ws_3 <- cfa(model_ws_3,            # model formula
                data = ess_df_selected # data frame
)
summary(fit_ws_3,
standardized = TRUE)
lavaan 0.6-10 ended normally after 26 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 6
Used Total
Number of observations 1752 1760
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.000 1.233 0.832
gvslvue 0.593 0.048 12.395 0.000 0.730 0.381
gvhlthc 0.890 0.062 14.292 0.000 1.097 0.746
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.gvslvol 0.677 0.103 6.555 0.000 0.677 0.308
.gvslvue 3.143 0.112 28.083 0.000 3.143 0.855
.gvhlthc 0.962 0.086 11.171 0.000 0.962 0.444
welf_supp 1.519 0.123 12.359 0.000 1.000 1.000
7.3 Lavaan syntax
Let's take a closer look at the lavaan syntax:
- =~ latent variable definition (is measured by)
- ~ regression (is regressed on)
- ~~ (residual) (co)variance (is correlated with)
- ~1 intercept
- := new model parameter (is equal to)
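As a hypothetical illustration (not a model we fit in this lab), several operators can be combined in a single model string:

```r
# example model string combining the operators above
model_example <- '
  welf_supp =~ gvslvol + gvslvue + gvhlthc  # measurement part
  gvslvue ~~ gvhlthc                        # residual covariance
  welf_supp ~ age_c                         # structural regression (age_c is a made-up predictor)
'
```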
So, if we want, we can freely estimate the first factor loading and fix the factor variance to 1:
7.4 Alternative model specification
Remember: Every factor in CFA should be given a metric to be identified.
- Fix one factor loading to 1 (“marker variable” method)
- Fix factor variance to 1
- Fix the average loading to 1 (“effect coding” method).
model_ws_3_alt <- '
welf_supp =~ NA*gvslvol + gvslvue + gvhlthc
welf_supp ~~ 1*welf_supp
'

fit_ws_3_alt <- cfa(model_ws_3_alt,        # model formula
                    data = ess_df_selected # data frame
)
summary(fit_ws_3_alt,
standardized = TRUE)
lavaan 0.6-10 ended normally after 21 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 6
Used Total
Number of observations 1752 1760
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.233 0.050 24.718 0.000 1.233 0.832
gvslvue 0.730 0.050 14.516 0.000 0.730 0.381
gvhlthc 1.097 0.047 23.321 0.000 1.097 0.746
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp 1.000 1.000 1.000
.gvslvol 0.677 0.103 6.555 0.000 0.677 0.308
.gvslvue 3.143 0.112 28.083 0.000 3.143 0.855
.gvhlthc 0.962 0.086 11.171 0.000 0.962 0.444
Q: What is the difference between these two models?
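Instead of writing NA* and the variance constraint by hand, lavaan can apply the same variance-standardization scaling through its std.lv argument; a sketch (the object name is ours):

```r
# equivalent shortcut: fix all factor variances to 1, free all loadings
fit_ws_3_std <- cfa(model_ws_3,
                    data = ess_df_selected,
                    std.lv = TRUE)
```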
7.5 1-factor model: 6 indicators
Since we have 3 more indicators that measure our latent construct welfare support, let's fit a model with 6 items instead of 3. This will allow us to estimate fit indices, some of the most valuable metrics in a SEM model.
model_ws_6 <- 'welf_supp =~ gvslvol + gvslvue + gvhlthc + gvcldcr + gvjbevn + gvpdlwk'

fit_ws_6 <- cfa(model_ws_6,            # model formula
                data = ess_df_selected # data frame
)
summary(fit_ws_6,
standardized = TRUE)
lavaan 0.6-10 ended normally after 28 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 12
Used Total
Number of observations 1740 1760
Model Test User Model:
Test statistic 326.956
Degrees of freedom 9
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.000 1.129 0.761
gvslvue 0.727 0.046 15.908 0.000 0.821 0.428
gvhlthc 0.945 0.037 25.657 0.000 1.067 0.724
gvcldcr 0.895 0.042 21.086 0.000 1.010 0.573
gvjbevn 1.042 0.054 19.184 0.000 1.176 0.519
gvpdlwk 0.913 0.042 21.526 0.000 1.031 0.586
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.gvslvol 0.923 0.048 19.088 0.000 0.923 0.420
.gvslvue 3.001 0.108 27.859 0.000 3.001 0.817
.gvhlthc 1.031 0.049 21.109 0.000 1.031 0.476
.gvcldcr 2.087 0.081 25.883 0.000 2.087 0.672
.gvjbevn 3.755 0.140 26.790 0.000 3.755 0.731
.gvpdlwk 2.032 0.079 25.627 0.000 2.032 0.657
welf_supp 1.274 0.077 16.546 0.000 1.000 1.000
Q: Why do we have 9 degrees of freedom in this model?
7.6 Model diagnostics
Important things to check after running the model:
- Estimates
- Do the indicators load well on the factor(s)?
- Model fit [see next]
- Are the fit indices good?
- Heywood cases/ reasonable solution
- Are variances positive?
- Are the \(r^2\) values below 1?
- Are there any extreme standard errors?
# Returns the observed covariance matrix.
lavInspect(fit_ws_6, "sampstat")
$cov
gvslvl gvslvu gvhlth gvcldc gvjbvn gvpdlw
gvslvol 2.197
gvslvue 0.894 3.675
gvhlthc 1.353 0.800 2.169
gvcldcr 1.034 0.785 0.886 3.107
gvjbevn 1.197 1.476 1.285 1.226 5.139
gvpdlwk 1.060 0.806 0.946 1.689 1.156 3.095
# Returns the model estimated covariance matrix.
fitted(fit_ws_6)
$cov
gvslvl gvslvu gvhlth gvcldc gvjbvn gvpdlw
gvslvol 2.197
gvslvue 0.926 3.675
gvhlthc 1.204 0.875 2.169
gvcldcr 1.140 0.829 1.077 3.107
gvjbevn 1.328 0.966 1.255 1.188 5.139
gvpdlwk 1.163 0.846 1.099 1.041 1.212 3.095
# Returns the difference between the observed and estimated covariance matrix
residuals(fit_ws_6)
$type
[1] "raw"
$cov
gvslvl gvslvu gvhlth gvcldc gvjbvn gvpdlw
gvslvol 0.000
gvslvue -0.033 0.000
gvhlthc 0.149 -0.075 0.000
gvcldcr -0.106 -0.044 -0.192 0.000
gvjbevn -0.131 0.510 0.030 0.037 0.000
gvpdlwk -0.103 -0.040 -0.154 0.648 -0.056 0.000
# Returns the standard errors for the free parameters in the model.
lavTech(fit_ws_6, "se")
$lambda
[,1]
[1,] 0.00000000
[2,] 0.04571461
[3,] 0.03682999
[4,] 0.04244399
[5,] 0.05432472
[6,] 0.04241931
$theta
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0483618 0.0000000 0.00000000 0.00000000 0.0000000 0.00000000
[2,] 0.0000000 0.1077257 0.00000000 0.00000000 0.0000000 0.00000000
[3,] 0.0000000 0.0000000 0.04886228 0.00000000 0.0000000 0.00000000
[4,] 0.0000000 0.0000000 0.00000000 0.08062486 0.0000000 0.00000000
[5,] 0.0000000 0.0000000 0.00000000 0.00000000 0.1401613 0.00000000
[6,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.07930721
$psi
[,1]
[1,] 0.07699622
# Returns the parameter estimates with confidence intervals.
parameterEstimates(fit_ws_6, standardized=TRUE)
lhs op rhs est se z pvalue ci.lower ci.upper std.lv std.all std.nox
1 welf_supp =~ gvslvol 1.000 0.000 NA NA 1.000 1.000 1.129 0.761 0.761
2 welf_supp =~ gvslvue 0.727 0.046 15.908 0 0.638 0.817 0.821 0.428 0.428
3 welf_supp =~ gvhlthc 0.945 0.037 25.657 0 0.873 1.017 1.067 0.724 0.724
4 welf_supp =~ gvcldcr 0.895 0.042 21.086 0 0.812 0.978 1.010 0.573 0.573
5 welf_supp =~ gvjbevn 1.042 0.054 19.184 0 0.936 1.149 1.176 0.519 0.519
6 welf_supp =~ gvpdlwk 0.913 0.042 21.526 0 0.830 0.996 1.031 0.586 0.586
7 gvslvol ~~ gvslvol 0.923 0.048 19.088 0 0.828 1.018 0.923 0.420 0.420
8 gvslvue ~~ gvslvue 3.001 0.108 27.859 0 2.790 3.212 3.001 0.817 0.817
9 gvhlthc ~~ gvhlthc 1.031 0.049 21.109 0 0.936 1.127 1.031 0.476 0.476
10 gvcldcr ~~ gvcldcr 2.087 0.081 25.883 0 1.929 2.245 2.087 0.672 0.672
11 gvjbevn ~~ gvjbevn 3.755 0.140 26.790 0 3.480 4.030 3.755 0.731 0.731
12 gvpdlwk ~~ gvpdlwk 2.032 0.079 25.627 0 1.877 2.188 2.032 0.657 0.657
13 welf_supp ~~ welf_supp 1.274 0.077 16.546 0 1.123 1.425 1.000 1.000 1.000
Lavaan offers a quick way to check model fit and statistics
summary(fit_ws_6, # fitted model
fit.measures = TRUE, # returns commonly used fit measures
standardized = TRUE # indicates that we want standardized results
)
lavaan 0.6-10 ended normally after 28 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 12
Used Total
Number of observations 1740 1760
Model Test User Model:
Test statistic 326.956
Degrees of freedom 9
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 2621.015
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.878
Tucker-Lewis Index (TLI) 0.797
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -19550.568
Loglikelihood unrestricted model (H1) NA
Akaike (AIC) 39125.136
Bayesian (BIC) 39190.676
Sample-size adjusted Bayesian (BIC) 39152.553
Root Mean Square Error of Approximation:
RMSEA 0.142
90 Percent confidence interval - lower 0.129
90 Percent confidence interval - upper 0.156
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.061
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
welf_supp =~
gvslvol 1.000 1.129 0.761
gvslvue 0.727 0.046 15.908 0.000 0.821 0.428
gvhlthc 0.945 0.037 25.657 0.000 1.067 0.724
gvcldcr 0.895 0.042 21.086 0.000 1.010 0.573
gvjbevn 1.042 0.054 19.184 0.000 1.176 0.519
gvpdlwk 0.913 0.042 21.526 0.000 1.031 0.586
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.gvslvol 0.923 0.048 19.088 0.000 0.923 0.420
.gvslvue 3.001 0.108 27.859 0.000 3.001 0.817
.gvhlthc 1.031 0.049 21.109 0.000 1.031 0.476
.gvcldcr 2.087 0.081 25.883 0.000 2.087 0.672
.gvjbevn 3.755 0.140 26.790 0.000 3.755 0.731
.gvpdlwk 2.032 0.079 25.627 0.000 2.032 0.657
welf_supp 1.274 0.077 16.546 0.000 1.000 1.000
# we can also just extract the factor loadings.
# using lavaan terminology, the factors loadings are the lambdas
inspect(fit_ws_6, # fitted model
what="std")$lambda # standardized factors loadings
wlf_sp
gvslvol 0.761
gvslvue 0.428
gvhlthc 0.724
gvcldcr 0.573
gvjbevn 0.519
gvpdlwk 0.586
# more info on the factors loading can be obtained using the tidy SEM package
# first we extract all the estimated parameters
tidy_results <- table_results(fit_ws_6,
                              columns = c("label",
                                          "est_sig",
                                          "se",
                                          "confint"),
                              digits = 2
)

tidy_results %>% filter(str_detect(label, "welf_supp."))
label est_sig se confint
1 welf_supp.BY.gvslvol 1.00 0.00 [1.00, 1.00]
2 welf_supp.BY.gvslvue 0.73*** 0.05 [0.64, 0.82]
3 welf_supp.BY.gvhlthc 0.94*** 0.04 [0.87, 1.02]
4 welf_supp.BY.gvcldcr 0.89*** 0.04 [0.81, 0.98]
5 welf_supp.BY.gvjbevn 1.04*** 0.05 [0.94, 1.15]
6 welf_supp.BY.gvpdlwk 0.91*** 0.04 [0.83, 1.00]
# we can also take a look at residual variances
# using lavaan terminology, the residual variances are the thetas
theta <- round(inspect(fit_ws_6, "est")$theta, 3)
theta.std <- round(inspect(fit_ws_6, "std")$theta, 3)
r2 <- round(inspect(fit_ws_6, "r2"), 3)

data.frame(row.names = c(),                    # empty the row names
           Variables = colnames(theta),        # variable names
           "Residuals" = diag(theta),          # diagonal of theta
           "Std. Residuals" = diag(theta.std), # diagonal of std. theta
           "R Squared" = r2                    # R-squared
)
Variables Residuals Std..Residuals R.Squared
1 gvslvol 0.923 0.420 0.580
2 gvslvue 3.001 0.817 0.183
3 gvhlthc 1.031 0.476 0.524
4 gvcldcr 2.087 0.672 0.328
5 gvjbevn 3.755 0.731 0.269
6 gvpdlwk 2.032 0.657 0.343
7.7 Model Fit statistics
We have two different types of fit statistics.
- Global fit measures:
- They take into account how the entire model fits the data
- (some) rules of thumb: \(CFI/TLI > 0.95, RMSEA < 0.05, SRMR < 0.06\)
- current practice is: chi-square value + df + p-value, RMSEA, CFI and SRMR
- DO NOT cherry-pick your fit indices
- Check Hu and Bentler (1999)
- “Local” fit measures:
- looking at just one part of the model
- usually involves the use of Modification Indices (MIs)
- Check Thoemmes, Rosseel, and Textor (2018)
## Global fit measures ##
# we can select which global fit measures to extract
fitMeasures(fit_ws_6, c("logl","AIC", "BIC", "chisq", "df", "pvalue", "cfi", "tli","rmsea"), output = "matrix")
logl -19550.568
aic 39125.136
bic 39190.676
chisq 326.956
df 9.000
pvalue 0.000
cfi 0.878
tli 0.797
rmsea 0.142
Q: How does our model perform in terms of global fit measures?
We can get a better grasp of what is happening by taking a look at the local fit of our model.
## Local fit measures: modification indices ##
mi <- inspect(fit_ws_6, "mi")
mi.sorted <- mi[order(-mi$mi), ] # sort from high to low
mi.sorted[1:5, ]                 # only display some large MI values
lhs op rhs mi epc sepc.lv sepc.all sepc.nox
27 gvcldcr ~~ gvpdlwk 243.943 0.918 0.918 0.446 0.446
15 gvslvol ~~ gvhlthc 168.789 0.621 0.621 0.637 0.637
23 gvhlthc ~~ gvcldcr 58.187 -0.375 -0.375 -0.256 -0.256
21 gvslvue ~~ gvjbevn 47.839 0.607 0.607 0.181 0.181
25 gvhlthc ~~ gvpdlwk 39.471 -0.310 -0.310 -0.214 -0.214
# let's plot the modification indices
plot(mi.sorted$mi) # plot the MI values
abline(h=3.84) # add a horizontal reference line (chisq value for 1 df where p=0.05)
Q: What is the plot suggesting?
- In certain contexts, we can modify the model based on a review of:
- MIs in combination with EPCs (Expected Parameter Change); both need to be "substantial"
- Theory or the source of the data (e.g. review the content of the test items)
- However, modifying a CFA moves it away from a strictly confirmatory model
- The more modifications, the more exploratory the model becomes
- Maybe this model was not ready for a confirmatory modelling strategy?
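For instance, the largest MI above points to a residual covariance between gvcldcr and gvpdlwk (both family-related items). If, and only if, theory supports it, the model could be re-specified; a sketch (we are not claiming this is the right model):

```r
# one-factor model plus the residual covariance flagged by the largest MI
model_ws_6_mod <- '
  welf_supp =~ gvslvol + gvslvue + gvhlthc + gvcldcr + gvjbevn + gvpdlwk
  gvcldcr ~~ gvpdlwk
'

fit_ws_6_mod <- cfa(model_ws_6_mod, data = ess_df_selected)

# likelihood-ratio test of the two nested models
anova(fit_ws_6, fit_ws_6_mod)
```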
8 3-factor CFA model
We can also specify a model with more than 1 latent factor. Let’s use the welfare criticism items. First, let’s check the sample implied covariance matrix.
ess_df_selected <- ess_df %>% select(
  ## Economic criticism items ##
  sbstrec, # strain on economy
  sbbsntx, # too much taxes
  ## Social criticism items ##
  sbprvpv, # poverty
  sbeqsoc, # more equal society
  sbcwkfm, # work and family
  ## Moral criticism items ##
  sblazy,  # people lazy
  sblwcoa, # care for others
  sblwlka  # look after others
)

welfare_crit_cov <- cov(ess_df_selected, use = "pairwise.complete.obs")
welfare_crit_cor <- cov2cor(welfare_crit_cov)
corrplot::corrplot(welfare_crit_cor,
                   is.corr = FALSE,       # whether it is a correlation matrix
                   method = "circle",     # magnitude of covariances as circles
                   type = "upper",        # remove the bottom of the covariance matrix
                   addCoef.col = "black"  # add the coefficients to the plot
)
Next, let's fit our 3-factor model using lavaan. The syntax is similar to that of the 1-factor model.
model_wc <- '
## Economic criticism ##
wc_econo =~ sbstrec + sbbsntx
## Social criticism ##
wc_socia =~ sbprvpv + sbeqsoc + sbcwkfm
## Moral criticism ##
wc_moral =~ sblazy + sblwcoa + sblwlka
'

fit_wc <- cfa(model_wc,              # model formula
              data = ess_df_selected # data frame
)
summary(fit_wc, standardized=TRUE)
lavaan 0.6-10 ended normally after 36 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 19
Used Total
Number of observations 1679 1760
Model Test User Model:
Test statistic 71.348
Degrees of freedom 17
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
wc_econo =~
sbstrec 1.000 0.628 0.616
sbbsntx 1.001 0.093 10.817 0.000 0.629 0.629
wc_socia =~
sbprvpv 1.000 0.462 0.527
sbeqsoc 1.493 0.166 9.011 0.000 0.689 0.824
sbcwkfm 0.631 0.055 11.461 0.000 0.291 0.375
wc_moral =~
sblazy 1.000 0.749 0.710
sblwcoa 1.058 0.045 23.285 0.000 0.792 0.766
sblwlka 0.977 0.043 22.843 0.000 0.732 0.706
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
wc_econo ~~
wc_socia -0.024 0.011 -2.093 0.036 -0.082 -0.082
wc_moral 0.250 0.023 10.872 0.000 0.532 0.532
wc_socia ~~
wc_moral -0.034 0.012 -2.915 0.004 -0.099 -0.099
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.sbstrec 0.647 0.042 15.381 0.000 0.647 0.621
.sbbsntx 0.603 0.041 14.584 0.000 0.603 0.604
.sbprvpv 0.554 0.030 18.722 0.000 0.554 0.722
.sbeqsoc 0.224 0.051 4.404 0.000 0.224 0.320
.sbcwkfm 0.518 0.020 25.832 0.000 0.518 0.859
.sblazy 0.551 0.027 20.083 0.000 0.551 0.495
.sblwcoa 0.441 0.027 16.639 0.000 0.441 0.413
.sblwlka 0.539 0.027 20.320 0.000 0.539 0.502
wc_econo 0.395 0.045 8.689 0.000 1.000 1.000
wc_socia 0.213 0.029 7.328 0.000 1.000 1.000
wc_moral 0.561 0.039 14.477 0.000 1.000 1.000
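As with the 1-factor model, we can inspect the global fit of the 3-factor model (output omitted; try it yourself before moving on):

```r
# global fit measures for the 3-factor model
fitMeasures(fit_wc,
            c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"),
            output = "matrix")
```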
9 Plotting
Plotting SEMs is quite common, especially when the models are rather complex. Different packages can be used to plot SEM models estimated in lavaan. We are going to use the tidySEM package. Compared to other packages, it can plot both lavaan and Mplus models and it is fairly intuitive. However, you can also check semPlot or lavaanPlot.
# first, let's define our plot layout
lay <- get_layout("wc_econo", "", "", "wc_socia", "", "", "wc_moral", "",
                  "sbstrec", "sbbsntx", "sbprvpv", "sbeqsoc", "sbcwkfm", "sblazy", "sblwcoa", "sblwlka",
                  rows = 2)
# let's take a look at our plot layout.
lay
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "wc_econo" "" "" "wc_socia" "" "" "wc_moral" ""
[2,] "sbstrec" "sbbsntx" "sbprvpv" "sbeqsoc" "sbcwkfm" "sblazy" "sblwcoa" "sblwlka"
attr(,"class")
[1] "layout_matrix" "matrix" "array"
# let's plot our results
plot_wc <- graph_sem(model = fit_wc, # model fit
                     layout = lay,   # layout
                     angle = 170     # adjust the arrows
                     #label = "est_std", # get standardized results (not rounded)
)

plot_wc
With tidySEM, you can fully customize your plot. For instance, it is possible to highlight a specific model element, such as the low factor loading of sbcwkfm on wc_socia.
graph_data <- prepare_graph(fit_wc)
edges(graph_data) <- graph_data %>%
edges() %>%
mutate(colour = "black") %>%
mutate(colour = replace(colour, from == "wc_socia" & to == "sbcwkfm", "red"))
plot(graph_data,
layout = lay, # layout
#label = "est_std", # get standardized results (not rounded)
angle = 170 # adjust the arrows
)
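Analogously to edges(), tidySEM exposes a nodes() accessor for restyling individual nodes. A sketch building on the same graph_data (we assume here that latent factors carry the "oval" shape in the nodes table):

```r
# fill the latent factors to set them apart from the observed items
nodes(graph_data) <- graph_data %>%
  nodes() %>%
  mutate(fill = ifelse(shape == "oval", "lightblue", "white"))

plot(graph_data,
     layout = lay,
     angle = 170)
```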