Intro to Conjoint Experiments | Lab - 4

1 What we are going to do in this LAB

  1. Get the data ready for CJ Analysis
    • Descriptives
    • From Wide to Long
  2. Perform the analysis in R
    • AMCE using cjoint
    • AMCE using cregg
  3. Plotting
    • Cregg and ggplot
    • Fine tuning
  4. Diagnostics
    • Frequencies
    • Constrains
    • Carry over
    • Fatigue

1.1 Dataset used

  1. 2019 CEU Experimental Political Science dataset
  2. Q578 Q579 Q580: Choice CJ Task
  3. F-*-*: Conjoint features
  4. Q78: Employment status
  5. Q77: Race
  6. Q76: Education
  7. Q75: Gender
  8. Q74: Age
  9. Q581: Religiosity

2 Environment preparation

# ### Data import ###
# install.packages("readr")     # read datasets
# install.packages("qualtRics") # read qualtrics datasets
# ### Data manipulation ###
# install.packages("dplyr")     # pipes and data manipulation
# ### Visualization ###
# install.packages("ggplot2")    # graphing capabilities
# ### Estimation ###
# install.packages("cjoint")    # base amce package
# install.packages("cregg")     # amce and mm 

## Custom build functions 
#library(devtools)
#devtools::install_github("albertostefanelli/cjoint") # fixes some problem with cjoint

### Data import ###
library("readr")     
library("qualtRics") 
### Data manipulation ###
library("dplyr")     
### Visualization ###
library("ggplot2")    
### Estimation ###
library("cjoint")   
library("cregg")     

3 Load the data: Non-Qualtrics data

df_base <- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/experimental_political_science_2019_cleaned.csv")

head(df_base)
# A tibble: 6 × 48
   ...1 `Duration (in seconds)` Finished ResponseId      Q578  Q579  Q580  Q271_1 Q271_2 Q271_3 Q271_4 Q271_5 Q271_6 Q78   Q77   Q76   Q75     Q74 Q581  `F-1-1` `F-1-1-1` `F-1-2` `F-1-1-2` `F-1-3` `F-1-1-3` `F-1-2-1` `F-1-2-2` `F-1-2-3` `F-2-1` `F-2-1-1` `F-2-2` `F-2-1-2` `F-2-3` `F-2-1-3` `F-2-2-1`
  <dbl>                   <dbl> <lgl>    <chr>           <chr> <chr> <chr> <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> <chr> <chr> <chr> <dbl> <chr> <chr>   <chr>     <chr>   <chr>     <chr>   <chr>     <chr>     <chr>     <chr>     <chr>   <chr>     <chr>   <chr>     <chr>   <chr>     <chr>    
1     1                     295 TRUE     R_3QMq6EzoaEhT… Cand… Cand… Cand… 6      6      6      6      3      5      Stud… Cauc… Some… Male     22 Never Past P… None      Policy… Welcome … Penal … The cand… Approxim… Acceptan… No proce… Past P… Approxim… Policy… Acceptan… Penal … The cand… Approxim…
2     2                     358 TRUE     R_2ZWYdunD4yrj… Cand… Cand… Cand… 6      3      3      6      1 - S… 7 - S… Unem… Cauc… Some… Male     32 Never Past P… Approxim… Penal … No proce… Policy… Welcome … Approxim… The cand… Acceptan… Past P… Approxim… Penal … No proce… Policy… Acceptan… Approxim…
3     3                     310 TRUE     R_3GEedrAF3Om8… Cand… Cand… Cand… 6      3      6      5      3      6      Stud… Asian Some… Male     21 Never Penal … The cand… Past P… None      Policy… Acceptan… No proce… None      Acceptan… Penal … The cand… Past P… Approxim… Policy… Collecti… The cand…
4     4                     365 TRUE     R_emv0H4GnLSxq… Cand… Cand… Cand… 6      5      5      4 - N… 5      5      Empl… Cauc… Some… Male     30 Regu… Past P… None      Policy… Acceptan… Penal … The cand… None      Collecti… No proce… Past P… Approxim… Policy… Acceptan… Penal … No proce… None     
5     5                     364 TRUE     R_1CIruyJc7nV1… Cand… Cand… Cand… 5      4 - N… 5      5      6      5      Empl… Cauc… Bach… Male     30 Regu… Policy… Welcome … Past P… Approxim… Penal … The cand… Welcome … Approxim… The cand… Policy… Welcome … Past P… None      Penal … The cand… Acceptan…
6     6                     439 TRUE     R_2TMFzzb17eua… Cand… Cand… Cand… 5      4 - N… 4 - N… 4 - N… 3      5      Empl… Hisp… Bach… Male     41 Some… Policy… Collecti… Penal … The cand… Past P… Approxim… Welcome … The cand… None      Policy… Acceptan… Penal … No proce… Past P… None      Welcome …
# ℹ 13 more variables: `F-2-2-2` <chr>, `F-2-2-3` <chr>, `F-3-1` <chr>, `F-3-1-1` <chr>, `F-3-2` <chr>, `F-3-1-2` <chr>, `F-3-3` <chr>, `F-3-1-3` <chr>, `F-3-2-1` <chr>, `F-3-2-2` <chr>, `F-3-2-3` <chr>, duration_mins <dbl>, scaled_duration_mins <dbl>

3.1 Get the data in the right format

  • Transform the DV in numeric format.
    • This is required by the read.df.qualtRics() function
  • Recode the moderators if you plan to use them in the analysis
    • We are going to use education as moderator
    • I am interesting in comparing respondents with high and low levels of education
df_base$Q578 <- as.numeric(as.factor(df_base$Q578))
df_base$Q579 <- as.numeric(as.factor(df_base$Q579))
df_base$Q580 <- as.numeric(as.factor(df_base$Q580))

df_base$education_recoded <- ifelse(df_base$Q76=="Bachelor degree"| df_base$Q76=="PhD, JD, MD"| df_base$Q76=="Master degree", "Higher Education", "Lower Education")

df_base$education_recoded <- as.factor(df_base$education_recoded)
  • Let’s take a look at the recoded data
  • Everything seems in order
psych::describe(with(df_base, cbind(Q578,Q579,Q580,education_recoded)))
                  vars   n mean  sd median trimmed mad min max range  skew kurtosis   se
Q578                 1 185 1.51 0.5      2    1.52   0   1   2     1 -0.05    -2.01 0.04
Q579                 2 185 1.56 0.5      2    1.57   0   1   2     1 -0.23    -1.96 0.04
Q580                 3 185 1.46 0.5      1    1.46   0   1   2     1  0.14    -1.99 0.04
education_recoded    4 184 1.47 0.5      1    1.46   0   1   2     1  0.13    -1.99 0.04

3.2 From wide to long

  • We need to use a “staked” design because we have respondents answering multiple CJ task
Conjointsdt Attributes and Levels

Figure 3.1: Conjointsdt Attributes and Levels

  • We are going to achieve this using the custom made function read.df.qualtRics()
  • If the read.df.qualtRics() function does not work, install it from my github uncommenting the install_github() from the environment preparation section
  • Four arguments
    • data frame
    • responses: the choice task variables
    • covariates: the moderators
    • respondentID: the respondent unique identifier
df_long_format <- cjoint::read.df.qualtRics(df_base, 
  # the DV for our conjoint tasks (Candidate 1 and Candidate 2)
  responses=c("Q578","Q579","Q580"), 
  covariates=c("Q76"),
  respondentID = "ResponseId"
  )

head(df_long_format)
         ResponseId             Q76 task respondent profile Past.Political.Experience Past.Political.Experience.rowpos                                   Penal.proceedings Penal.proceedings.rowpos                                              Policy.Proposal Policy.Proposal.rowpos selected
1 R_3QMq6EzoaEhTjSI    Some college    1          1       1                      None                                1 The candidate is under investigation for corruption                        3               Welcome immigrants and organise human corridor                      2        0
2 R_2ZWYdunD4yrjLI3    Some college    1          2       1    Approximately 10 years                                1                                      No proceedings                        2               Welcome immigrants and organise human corridor                      3        1
3 R_3GEedrAF3Om8dOt    Some college    1          3       1                      None                                2 The candidate is under investigation for corruption                        1   Acceptance of immigrants conditional on certain requisites                      3        0
4 R_emv0H4GnLSxqphv    Some college    1          4       1                      None                                1 The candidate is under investigation for corruption                        3   Acceptance of immigrants conditional on certain requisites                      2        0
5 R_1CIruyJc7nV18gB Bachelor degree    1          5       1    Approximately 10 years                                2 The candidate is under investigation for corruption                        3               Welcome immigrants and organise human corridor                      1        1
6 R_2TMFzzb17euaPQG Bachelor degree    1          6       1    Approximately 20 years                                3 The candidate is under investigation for corruption                        2 Collective expulsion of immigrants and closure of the border                      1        1
  respondentIndex
1               1
2               2
3               3
4               4
5               5
6               6
  • The read.df.qualtRics transforms the textual information inside the CJ cells into factors with N levels
  • Relevant attributes of the read.df.qualtRics function:
    • selected: Our DV. If the profile was selected or not (in this case: Q578,Q579,Q580)
    • respondent: unique ID
    • task: which task the respondent is performing (cognitive fatigue)
    • .rowpos: position in the CJ table row of the attribute (ordering effect)
    • profile: which profile we are referring to (left-right carry over)
    • a set of factors that will be our IVs
  • Let’s see how it looks for respondent 1 in our DF
# let's order according to the respondent ID
respondent_1 <- df_long_format[order(df_long_format$respondent),]
# let's check respondent 1 
respondent_1[1:6,]
           ResponseId          Q76 task respondent profile Past.Political.Experience Past.Political.Experience.rowpos                                   Penal.proceedings Penal.proceedings.rowpos                                              Policy.Proposal Policy.Proposal.rowpos selected
1   R_3QMq6EzoaEhTjSI Some college    1          1       1                      None                                1 The candidate is under investigation for corruption                        3               Welcome immigrants and organise human corridor                      2        0
186 R_3QMq6EzoaEhTjSI Some college    2          1       1    Approximately 10 years                                1      The candidate has been convicted of corruption                        3   Acceptance of immigrants conditional on certain requisites                      2        1
371 R_3QMq6EzoaEhTjSI Some college    3          1       1    Approximately 20 years                                1                                      No proceedings                        3 Collective expulsion of immigrants and closure of the border                      2        1
556 R_3QMq6EzoaEhTjSI Some college    1          1       2    Approximately 10 years                                1                                      No proceedings                        3   Acceptance of immigrants conditional on certain requisites                      2        1
741 R_3QMq6EzoaEhTjSI Some college    2          1       2    Approximately 20 years                                1                                      No proceedings                        3               Welcome immigrants and organise human corridor                      2        0
926 R_3QMq6EzoaEhTjSI Some college    3          1       2    Approximately 10 years                                1      The candidate has been convicted of corruption                        3               Welcome immigrants and organise human corridor                      2        0
    respondentIndex
1                 1
186               1
371               1
556               1
741               1
926               1

4 Estimating the AMCE

4.1 AMCE: cjoint non-interactive effects (main effect)

  • The baseline AMCE represent the averaged effect of varying one attributes on the probability that a profile is chosen by a respondent, marginalizing over all the other attributes
  • Again: It is NOT a descriptive measures BUT a casual estimand
fit_1 <- cjoint::amce(selected ~ Past.Political.Experience + Penal.proceedings + Policy.Proposal,
  data=df_long_format,
  cluster=TRUE, 
  respondent.id="ResponseId"
)

summary(fit_1)
------------------------------------------
Average Marginal Component Effects (AMCE):
------------------------------------------
                 Attribute                                                        Level  Estimate Std. Err   z value                                  Pr(>|z|)    
 Past.Political.Experience                                       Approximately 20 years  0.011326 0.033881   0.33428 0.738164965318939492178174077707808464766    
 Past.Political.Experience                                                         None -0.126223 0.030292  -4.16694 0.000030871737981493673668234589690939629 ***
         Penal.proceedings               The candidate has been convicted of corruption -0.410547 0.033176 -12.37481 0.000000000000000000000000000000000035774 ***
         Penal.proceedings          The candidate is under investigation for corruption -0.296646 0.034319  -8.64375 0.000000000000000005439786938187670866052 ***
           Policy.Proposal Collective expulsion of immigrants and closure of the border -0.271236 0.038193  -7.10169 0.000000000001232370465129786086419408608 ***
           Policy.Proposal               Welcome immigrants and organise human corridor -0.078578 0.038786  -2.02593 0.042771568929823734761530573678101063706   *
---
Number of Obs. = 1110
---
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

--------------------
AMCE Baseline Levels:
--------------------
                 Attribute                                                      Level
 Past.Political.Experience                                     Approximately 10 years
         Penal.proceedings                                             No proceedings
           Policy.Proposal Acceptance of immigrants conditional on certain requisites

4.2 AMCE: cjoint Changing baseline

In certain cases, the used baselines are not meaningful and, thus, should be changed. For instance, in this case the baseline for political experiments is an average category (Approximately 10 years). However, i am more interested in understanding whether there is an linear effect of political experience on the probability of selecting a profile. The same goes for the attribute Policy Proposal. In this case, we want to estimate the effect of the other two attributes using as a baselien the level “Welcome immigrants and organise human corridor”.

baselines <- list()
# Change the baseline for "Political Experience"
baselines$Past.Political.Experience <- "None"
# Change the baseline for "Policy Proposal"
baselines$Policy.Proposal <- "Welcome immigrants and organise human corridor"

fit_2 <- cjoint::amce(selected ~ Past.Political.Experience + Penal.proceedings + Policy.Proposal,
  data=df_long_format,
  cluster=TRUE, 
  respondent.id="ResponseId",
  baselines=baselines
)

summary(fit_2)
------------------------------------------
Average Marginal Component Effects (AMCE):
------------------------------------------
                 Attribute                                                        Level  Estimate Std. Err  z value                                  Pr(>|z|)    
 Past.Political.Experience                                       Approximately 10 years  0.126223 0.030292   4.1669 0.000030871737981486341751043156467204653 ***
 Past.Political.Experience                                       Approximately 20 years  0.137549 0.032098   4.2853 0.000018248261254058011277857931364820843 ***
         Penal.proceedings               The candidate has been convicted of corruption -0.410547 0.033176 -12.3748 0.000000000000000000000000000000000035774 ***
         Penal.proceedings          The candidate is under investigation for corruption -0.296646 0.034319  -8.6437 0.000000000000000005439786938186825767992 ***
           Policy.Proposal   Acceptance of immigrants conditional on certain requisites  0.078578 0.038786   2.0259 0.042771568929824865801236910556326620281   *
           Policy.Proposal Collective expulsion of immigrants and closure of the border -0.192659 0.038475  -5.0074 0.000000551674642912702002206341844736670 ***
---
Number of Obs. = 1110
---
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

--------------------
AMCE Baseline Levels:
--------------------
                 Attribute                                          Level
 Past.Political.Experience                                           None
         Penal.proceedings                                 No proceedings
           Policy.Proposal Welcome immigrants and organise human corridor

4.3 AMCE: cregg

# Change the baseline for "Political Experience"
df_long_format$Past.Political.Experience <- relevel(df_long_format$Past.Political.Experience, ref="None")
# Change the baseline for "Policy Proposal"
df_long_format$Policy.Proposal <- relevel(df_long_format$Policy.Proposal, ref="Welcome immigrants and organise human corridor")

fit_3 <- cregg::cj(df_long_format, selected ~ 
  Past.Political.Experience +
  Penal.proceedings+
  Policy.Proposal, 
  estimate = "amce",
  id = ~ResponseId)


fit_3
   outcome statistic                   feature                                                        level    estimate  std.error          z                                           p        lower      upper
1 selected      amce Past.Political.Experience                                                         None  0.00000000         NA         NA                                          NA           NA         NA
2 selected      amce Past.Political.Experience                                       Approximately 10 years  0.12622317 0.03020953   4.178257 0.00002937517925197088758306612765736076653  0.067013584  0.1854328
3 selected      amce Past.Political.Experience                                       Approximately 20 years  0.13754911 0.03201086   4.296952 0.00001731626883035745841392036292738509928  0.074808984  0.2002892
4 selected      amce         Penal.proceedings                                               No proceedings  0.00000000         NA         NA                                          NA           NA         NA
5 selected      amce         Penal.proceedings               The candidate has been convicted of corruption -0.41054652 0.03308612 -12.408422 0.00000000000000000000000000000000002352443 -0.475394126 -0.3456989
6 selected      amce         Penal.proceedings          The candidate is under investigation for corruption -0.29664601 0.03422617  -8.667227 0.00000000000000000442771248684538310556773 -0.363728073 -0.2295639
7 selected      amce           Policy.Proposal               Welcome immigrants and organise human corridor  0.00000000         NA         NA                                          NA           NA         NA
8 selected      amce           Policy.Proposal   Acceptance of immigrants conditional on certain requisites  0.07857776 0.03868088   2.031437 0.04221072751629999925127023630011535715312  0.002764623  0.1543909
9 selected      amce           Policy.Proposal Collective expulsion of immigrants and closure of the border -0.19265850 0.03837046  -5.021010 0.00000051400350106376047573279253546929546 -0.267863231 -0.1174538

5 Plotting: cregg and ggplot

plot(fit_3) +  
    theme(text = element_text(size=21)) + 
    theme(legend.position = "none")

5.1 Tuning ggplot

plot_to_change <- plot(fit_3)

plot_to_change + scale_colour_grey(start = 0, end = .5) +
  theme_bw() + theme(text = element_text(size=18)) + 
    theme(legend.position = "none")

  • We should get rid of points in the attributes
  • Past.Political.Experience -> Political Experience
plot_to_change <- plot(fit_3)

plot_to_change + scale_y_discrete(labels=rev(c(
    "Attribute: Past Political Experience",
    "20 years",
    "10 years",
    "Ref: None", 
    "Attribute: Penal Proceeding",
    "Investigation for corruption",
    "Convicted for corruption",
    "Ref: No Proceedings", 
    "Attribute: Policy on immigration", 
    "Expulsion of Immigrants",
    "Conditional Acceptance",
    "Ref: Unconditional Acceptance"))) + 
    theme(text = element_text(size=21)) +
    scale_colour_grey(start = 0, end = .5)  + 
    theme(legend.position = "none")

6 Diagnostic

6.1 Display Frequencies and Proportions

This is especially useful if you have marginal distributions of the attributes that are not uniform

plot(cregg::cj_freqs(df_long_format, selected ~ 
  Past.Political.Experience +
  Penal.proceedings+
  Policy.Proposal, 
  estimate = "amce",
  id = ~ResponseId)) + 
    theme(text = element_text(size=21))

6.2 Left/Right Diagnostics

  • The distribution should be at random
  • Meaning around .5 per each cell
# the function requires the by variable to be a factor
df_long_format$profile <-  as.factor(df_long_format$profile)

plot(cregg::cj(df_long_format, selected ~ 
  Past.Political.Experience +
  Penal.proceedings+
  Policy.Proposal, 
  id = ~ResponseId,
  by = ~profile,
  estimate = "mm"), 
  group = "profile", vline = 0.5) + 
  theme(text = element_text(size=21))

6.3 Attributes’ order

Let’ssee if the attributes’ order had an impact on the probability of selecting a profile. The attributes’ order is randomized by design but it could be that seeing political experience as the first attribute in the conjoint table has an impact on how the decision making process of the respondent.

  • We should not be able to detect any effect of the row position on our DV
  • Let’s test Past political experience
  • You should do it for all the attributes
df_long_format$Past.Political.Experience.rowpos <-  as.factor(df_long_format$Penal.proceedings.rowpos)

plot(cregg::cj(df_long_format, selected ~ 
  Past.Political.Experience +
  Penal.proceedings+
  Policy.Proposal, 
  id = ~ResponseId,
  by = ~Past.Political.Experience.rowpos,
  estimate = "mm"), 
    group = "Past.Political.Experience.rowpos", 
    vline = 0.5)+ 
theme(text = element_text(size=21))

6.4 Respondent’s fatigue

Let’s now take a look at the effect of task on selecting a profile. We might expect that people got tired with the progression of the experiment and as such, we would see an effect of task numbering on their response behaviour. Again, we want the coefficients to overlap to have no significant different between tasks.

df_long_format$task <-  as.factor(df_long_format$task)

plot(cregg::cj(df_long_format, selected ~ 
  Past.Political.Experience +
  Penal.proceedings+
  Policy.Proposal, 
  id = ~ResponseId,
  by = ~task,
  estimate = "mm"), 
    group = "task", 
    vline = 0.5)+ 
theme(text = element_text(size=21))

7 References