Intro to Conjoint Experiments | Lab - 4
1 What we are going to do in this LAB
- Get the data ready for CJ Analysis
- Descriptives
- From Wide to Long
- Perform the analysis in R
- AMCE using cjoint
- AMCE using cregg
- Plotting
- Cregg and ggplot
- Fine tuning
- Diagnostics
- Frequencies
- Constrains
- Carry over
- Fatigue
1.1 Dataset used
- 2019 CEU Experimental Political Science dataset
- Q578 Q579 Q580: Choice CJ Task
- F-*-*: Conjoint features
- Q78: Employment status
- Q77: Race
- Q76: Education
- Q75: Gender
- Q74: Age
- Q581: Religiosity
2 Environment preparation
# ### Data import ###
# install.packages("readr") # read datasets
# install.packages("qualtRics") # read qualtrics datasets
# ### Data manipulation ###
# install.packages("dplyr") # pipes and data manipulation
# ### Visualization ###
# install.packages("ggplot2") # graphing capabilities
# ### Estimation ###
# install.packages("cjoint") # base amce package
# install.packages("cregg") # amce and mm
## Custom build functions
#library(devtools)
#devtools::install_github("albertostefanelli/cjoint") # fixes some problem with cjoint
### Data import ###
library("readr")
library("qualtRics")
### Data manipulation ###
library("dplyr")
### Visualization ###
library("ggplot2")
### Estimation ###
library("cjoint")
library("cregg")
3 Load the data: Non-Qualtrics data
<- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/experimental_political_science_2019_cleaned.csv")
df_base
head(df_base)
# A tibble: 6 × 48
...1 `Duration (in seconds)` Finished ResponseId Q578 Q579 Q580 Q271_1 Q271_2 Q271_3 Q271_4 Q271_5 Q271_6 Q78 Q77 Q76 Q75 Q74 Q581 `F-1-1` `F-1-1-1` `F-1-2` `F-1-1-2` `F-1-3` `F-1-1-3` `F-1-2-1` `F-1-2-2` `F-1-2-3` `F-2-1` `F-2-1-1` `F-2-2` `F-2-1-2` `F-2-3` `F-2-1-3` `F-2-2-1`
<dbl> <dbl> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 295 TRUE R_3QMq6EzoaEhT… Cand… Cand… Cand… 6 6 6 6 3 5 Stud… Cauc… Some… Male 22 Never Past P… None Policy… Welcome … Penal … The cand… Approxim… Acceptan… No proce… Past P… Approxim… Policy… Acceptan… Penal … The cand… Approxim…
2 2 358 TRUE R_2ZWYdunD4yrj… Cand… Cand… Cand… 6 3 3 6 1 - S… 7 - S… Unem… Cauc… Some… Male 32 Never Past P… Approxim… Penal … No proce… Policy… Welcome … Approxim… The cand… Acceptan… Past P… Approxim… Penal … No proce… Policy… Acceptan… Approxim…
3 3 310 TRUE R_3GEedrAF3Om8… Cand… Cand… Cand… 6 3 6 5 3 6 Stud… Asian Some… Male 21 Never Penal … The cand… Past P… None Policy… Acceptan… No proce… None Acceptan… Penal … The cand… Past P… Approxim… Policy… Collecti… The cand…
4 4 365 TRUE R_emv0H4GnLSxq… Cand… Cand… Cand… 6 5 5 4 - N… 5 5 Empl… Cauc… Some… Male 30 Regu… Past P… None Policy… Acceptan… Penal … The cand… None Collecti… No proce… Past P… Approxim… Policy… Acceptan… Penal … No proce… None
5 5 364 TRUE R_1CIruyJc7nV1… Cand… Cand… Cand… 5 4 - N… 5 5 6 5 Empl… Cauc… Bach… Male 30 Regu… Policy… Welcome … Past P… Approxim… Penal … The cand… Welcome … Approxim… The cand… Policy… Welcome … Past P… None Penal … The cand… Acceptan…
6 6 439 TRUE R_2TMFzzb17eua… Cand… Cand… Cand… 5 4 - N… 4 - N… 4 - N… 3 5 Empl… Hisp… Bach… Male 41 Some… Policy… Collecti… Penal … The cand… Past P… Approxim… Welcome … The cand… None Policy… Acceptan… Penal … No proce… Past P… None Welcome …
# ℹ 13 more variables: `F-2-2-2` <chr>, `F-2-2-3` <chr>, `F-3-1` <chr>, `F-3-1-1` <chr>, `F-3-2` <chr>, `F-3-1-2` <chr>, `F-3-3` <chr>, `F-3-1-3` <chr>, `F-3-2-1` <chr>, `F-3-2-2` <chr>, `F-3-2-3` <chr>, duration_mins <dbl>, scaled_duration_mins <dbl>
3.1 Get the data in the right format
- Transform the DV in numeric format.
- This is required by the read.df.qualtRics() function
- Recode the moderators if you plan to use them in the analysis
- We are going to use education as moderator
- I am interesting in comparing respondents with high and low levels of education
$Q578 <- as.numeric(as.factor(df_base$Q578))
df_base$Q579 <- as.numeric(as.factor(df_base$Q579))
df_base$Q580 <- as.numeric(as.factor(df_base$Q580))
df_base
$education_recoded <- ifelse(df_base$Q76=="Bachelor degree"| df_base$Q76=="PhD, JD, MD"| df_base$Q76=="Master degree", "Higher Education", "Lower Education")
df_base
$education_recoded <- as.factor(df_base$education_recoded) df_base
- Let’s take a look at the recoded data
- Everything seems in order
::describe(with(df_base, cbind(Q578,Q579,Q580,education_recoded))) psych
vars n mean sd median trimmed mad min max range skew kurtosis se
Q578 1 185 1.51 0.5 2 1.52 0 1 2 1 -0.05 -2.01 0.04
Q579 2 185 1.56 0.5 2 1.57 0 1 2 1 -0.23 -1.96 0.04
Q580 3 185 1.46 0.5 1 1.46 0 1 2 1 0.14 -1.99 0.04
education_recoded 4 184 1.47 0.5 1 1.46 0 1 2 1 0.13 -1.99 0.04
3.2 From wide to long
- We need to use a “staked” design because we have respondents answering multiple CJ task
- We are going to achieve this using the custom made function read.df.qualtRics()
- If the read.df.qualtRics() function does not work, install it from my github uncommenting the install_github() from the environment preparation section
- Four arguments
- data frame
- responses: the choice task variables
- covariates: the moderators
- respondentID: the respondent unique identifier
<- cjoint::read.df.qualtRics(df_base,
df_long_format # the DV for our conjoint tasks (Candidate 1 and Candidate 2)
responses=c("Q578","Q579","Q580"),
covariates=c("Q76"),
respondentID = "ResponseId"
)
head(df_long_format)
ResponseId Q76 task respondent profile Past.Political.Experience Past.Political.Experience.rowpos Penal.proceedings Penal.proceedings.rowpos Policy.Proposal Policy.Proposal.rowpos selected
1 R_3QMq6EzoaEhTjSI Some college 1 1 1 None 1 The candidate is under investigation for corruption 3 Welcome immigrants and organise human corridor 2 0
2 R_2ZWYdunD4yrjLI3 Some college 1 2 1 Approximately 10 years 1 No proceedings 2 Welcome immigrants and organise human corridor 3 1
3 R_3GEedrAF3Om8dOt Some college 1 3 1 None 2 The candidate is under investigation for corruption 1 Acceptance of immigrants conditional on certain requisites 3 0
4 R_emv0H4GnLSxqphv Some college 1 4 1 None 1 The candidate is under investigation for corruption 3 Acceptance of immigrants conditional on certain requisites 2 0
5 R_1CIruyJc7nV18gB Bachelor degree 1 5 1 Approximately 10 years 2 The candidate is under investigation for corruption 3 Welcome immigrants and organise human corridor 1 1
6 R_2TMFzzb17euaPQG Bachelor degree 1 6 1 Approximately 20 years 3 The candidate is under investigation for corruption 2 Collective expulsion of immigrants and closure of the border 1 1
respondentIndex
1 1
2 2
3 3
4 4
5 5
6 6
- The read.df.qualtRics transforms the textual information inside the CJ cells into factors with N levels
- Relevant attributes of the read.df.qualtRics function:
- selected: Our DV. If the profile was selected or not (in this case: Q578,Q579,Q580)
- respondent: unique ID
- task: which task the respondent is performing (cognitive fatigue)
- .rowpos: position in the CJ table row of the attribute (ordering effect)
- profile: which profile we are referring to (left-right carry over)
- a set of factors that will be our IVs
- Let’s see how it looks for respondent 1 in our DF
# let's order according to the respondent ID
<- df_long_format[order(df_long_format$respondent),]
respondent_1 # let's check respondent 1
1:6,] respondent_1[
ResponseId Q76 task respondent profile Past.Political.Experience Past.Political.Experience.rowpos Penal.proceedings Penal.proceedings.rowpos Policy.Proposal Policy.Proposal.rowpos selected
1 R_3QMq6EzoaEhTjSI Some college 1 1 1 None 1 The candidate is under investigation for corruption 3 Welcome immigrants and organise human corridor 2 0
186 R_3QMq6EzoaEhTjSI Some college 2 1 1 Approximately 10 years 1 The candidate has been convicted of corruption 3 Acceptance of immigrants conditional on certain requisites 2 1
371 R_3QMq6EzoaEhTjSI Some college 3 1 1 Approximately 20 years 1 No proceedings 3 Collective expulsion of immigrants and closure of the border 2 1
556 R_3QMq6EzoaEhTjSI Some college 1 1 2 Approximately 10 years 1 No proceedings 3 Acceptance of immigrants conditional on certain requisites 2 1
741 R_3QMq6EzoaEhTjSI Some college 2 1 2 Approximately 20 years 1 No proceedings 3 Welcome immigrants and organise human corridor 2 0
926 R_3QMq6EzoaEhTjSI Some college 3 1 2 Approximately 10 years 1 The candidate has been convicted of corruption 3 Welcome immigrants and organise human corridor 2 0
respondentIndex
1 1
186 1
371 1
556 1
741 1
926 1
4 Estimating the AMCE
4.1 AMCE: cjoint non-interactive effects (main effect)
- The baseline AMCE represent the averaged effect of varying one attributes on the probability that a profile is chosen by a respondent, marginalizing over all the other attributes
- Again: It is NOT a descriptive measures BUT a casual estimand
<- cjoint::amce(selected ~ Past.Political.Experience + Penal.proceedings + Policy.Proposal,
fit_1 data=df_long_format,
cluster=TRUE,
respondent.id="ResponseId"
)
summary(fit_1)
------------------------------------------
Average Marginal Component Effects (AMCE):
------------------------------------------
Attribute Level Estimate Std. Err z value Pr(>|z|)
Past.Political.Experience Approximately 20 years 0.011326 0.033881 0.33428 0.738164965318939492178174077707808464766
Past.Political.Experience None -0.126223 0.030292 -4.16694 0.000030871737981493673668234589690939629 ***
Penal.proceedings The candidate has been convicted of corruption -0.410547 0.033176 -12.37481 0.000000000000000000000000000000000035774 ***
Penal.proceedings The candidate is under investigation for corruption -0.296646 0.034319 -8.64375 0.000000000000000005439786938187670866052 ***
Policy.Proposal Collective expulsion of immigrants and closure of the border -0.271236 0.038193 -7.10169 0.000000000001232370465129786086419408608 ***
Policy.Proposal Welcome immigrants and organise human corridor -0.078578 0.038786 -2.02593 0.042771568929823734761530573678101063706 *
---
Number of Obs. = 1110
---
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
--------------------
AMCE Baseline Levels:
--------------------
Attribute Level
Past.Political.Experience Approximately 10 years
Penal.proceedings No proceedings
Policy.Proposal Acceptance of immigrants conditional on certain requisites
4.2 AMCE: cjoint Changing baseline
In certain cases, the used baselines are not meaningful and, thus, should be changed. For instance, in this case the baseline for political experiments is an average category (Approximately 10 years). However, i am more interested in understanding whether there is an linear effect of political experience on the probability of selecting a profile. The same goes for the attribute Policy Proposal. In this case, we want to estimate the effect of the other two attributes using as a baselien the level “Welcome immigrants and organise human corridor”.
<- list()
baselines # Change the baseline for "Political Experience"
$Past.Political.Experience <- "None"
baselines# Change the baseline for "Policy Proposal"
$Policy.Proposal <- "Welcome immigrants and organise human corridor"
baselines
<- cjoint::amce(selected ~ Past.Political.Experience + Penal.proceedings + Policy.Proposal,
fit_2 data=df_long_format,
cluster=TRUE,
respondent.id="ResponseId",
baselines=baselines
)
summary(fit_2)
------------------------------------------
Average Marginal Component Effects (AMCE):
------------------------------------------
Attribute Level Estimate Std. Err z value Pr(>|z|)
Past.Political.Experience Approximately 10 years 0.126223 0.030292 4.1669 0.000030871737981486341751043156467204653 ***
Past.Political.Experience Approximately 20 years 0.137549 0.032098 4.2853 0.000018248261254058011277857931364820843 ***
Penal.proceedings The candidate has been convicted of corruption -0.410547 0.033176 -12.3748 0.000000000000000000000000000000000035774 ***
Penal.proceedings The candidate is under investigation for corruption -0.296646 0.034319 -8.6437 0.000000000000000005439786938186825767992 ***
Policy.Proposal Acceptance of immigrants conditional on certain requisites 0.078578 0.038786 2.0259 0.042771568929824865801236910556326620281 *
Policy.Proposal Collective expulsion of immigrants and closure of the border -0.192659 0.038475 -5.0074 0.000000551674642912702002206341844736670 ***
---
Number of Obs. = 1110
---
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
--------------------
AMCE Baseline Levels:
--------------------
Attribute Level
Past.Political.Experience None
Penal.proceedings No proceedings
Policy.Proposal Welcome immigrants and organise human corridor
4.3 AMCE: cregg
# Change the baseline for "Political Experience"
$Past.Political.Experience <- relevel(df_long_format$Past.Political.Experience, ref="None")
df_long_format# Change the baseline for "Policy Proposal"
$Policy.Proposal <- relevel(df_long_format$Policy.Proposal, ref="Welcome immigrants and organise human corridor")
df_long_format
<- cregg::cj(df_long_format, selected ~
fit_3 +
Past.Political.Experience +
Penal.proceedings
Policy.Proposal, estimate = "amce",
id = ~ResponseId)
fit_3
outcome statistic feature level estimate std.error z p lower upper
1 selected amce Past.Political.Experience None 0.00000000 NA NA NA NA NA
2 selected amce Past.Political.Experience Approximately 10 years 0.12622317 0.03020953 4.178257 0.00002937517925197088758306612765736076653 0.067013584 0.1854328
3 selected amce Past.Political.Experience Approximately 20 years 0.13754911 0.03201086 4.296952 0.00001731626883035745841392036292738509928 0.074808984 0.2002892
4 selected amce Penal.proceedings No proceedings 0.00000000 NA NA NA NA NA
5 selected amce Penal.proceedings The candidate has been convicted of corruption -0.41054652 0.03308612 -12.408422 0.00000000000000000000000000000000002352443 -0.475394126 -0.3456989
6 selected amce Penal.proceedings The candidate is under investigation for corruption -0.29664601 0.03422617 -8.667227 0.00000000000000000442771248684538310556773 -0.363728073 -0.2295639
7 selected amce Policy.Proposal Welcome immigrants and organise human corridor 0.00000000 NA NA NA NA NA
8 selected amce Policy.Proposal Acceptance of immigrants conditional on certain requisites 0.07857776 0.03868088 2.031437 0.04221072751629999925127023630011535715312 0.002764623 0.1543909
9 selected amce Policy.Proposal Collective expulsion of immigrants and closure of the border -0.19265850 0.03837046 -5.021010 0.00000051400350106376047573279253546929546 -0.267863231 -0.1174538
5 Plotting: cregg and ggplot
plot(fit_3) +
theme(text = element_text(size=21)) +
theme(legend.position = "none")
5.1 Tuning ggplot
- Depending on the review where you are publishing they may ask you to change colours
- A HUGE repository for colours https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html
<- plot(fit_3)
plot_to_change
+ scale_colour_grey(start = 0, end = .5) +
plot_to_change theme_bw() + theme(text = element_text(size=18)) +
theme(legend.position = "none")
- We should get rid of points in the attributes
- Past.Political.Experience -> Political Experience
<- plot(fit_3)
plot_to_change
+ scale_y_discrete(labels=rev(c(
plot_to_change "Attribute: Past Political Experience",
"20 years",
"10 years",
"Ref: None",
"Attribute: Penal Proceeding",
"Investigation for corruption",
"Convicted for corruption",
"Ref: No Proceedings",
"Attribute: Policy on immigration",
"Expulsion of Immigrants",
"Conditional Acceptance",
"Ref: Unconditional Acceptance"))) +
theme(text = element_text(size=21)) +
scale_colour_grey(start = 0, end = .5) +
theme(legend.position = "none")
6 Diagnostic
6.1 Display Frequencies and Proportions
This is especially useful if you have marginal distributions of the attributes that are not uniform
plot(cregg::cj_freqs(df_long_format, selected ~
+
Past.Political.Experience +
Penal.proceedings
Policy.Proposal, estimate = "amce",
id = ~ResponseId)) +
theme(text = element_text(size=21))
6.2 Left/Right Diagnostics
- The distribution should be at random
- Meaning around .5 per each cell
# the function requires the by variable to be a factor
$profile <- as.factor(df_long_format$profile)
df_long_format
plot(cregg::cj(df_long_format, selected ~
+
Past.Political.Experience +
Penal.proceedings
Policy.Proposal, id = ~ResponseId,
by = ~profile,
estimate = "mm"),
group = "profile", vline = 0.5) +
theme(text = element_text(size=21))
6.3 Attributes’ order
Let’ssee if the attributes’ order had an impact on the probability of selecting a profile. The attributes’ order is randomized by design but it could be that seeing political experience as the first attribute in the conjoint table has an impact on how the decision making process of the respondent.
- We should not be able to detect any effect of the row position on our DV
- Let’s test Past political experience
- You should do it for all the attributes
$Past.Political.Experience.rowpos <- as.factor(df_long_format$Penal.proceedings.rowpos)
df_long_format
plot(cregg::cj(df_long_format, selected ~
+
Past.Political.Experience +
Penal.proceedings
Policy.Proposal, id = ~ResponseId,
by = ~Past.Political.Experience.rowpos,
estimate = "mm"),
group = "Past.Political.Experience.rowpos",
vline = 0.5)+
theme(text = element_text(size=21))
6.4 Respondent’s fatigue
Let’s now take a look at the effect of task on selecting a profile. We might expect that people got tired with the progression of the experiment and as such, we would see an effect of task numbering on their response behaviour. Again, we want the coefficients to overlap to have no significant different between tasks.
$task <- as.factor(df_long_format$task)
df_long_format
plot(cregg::cj(df_long_format, selected ~
+
Past.Political.Experience +
Penal.proceedings
Policy.Proposal, id = ~ResponseId,
by = ~task,
estimate = "mm"),
group = "task",
vline = 0.5)+
theme(text = element_text(size=21))