Intro to Conjoint Experiments | Solutions - 1
1 Environment preparation
# ### Data import ###
# install.packages("readr") # read datasets
# install.packages("qualtRics") # read qualtrics datasets
# install.packages("here") # absolute path management
# ### Data manipulation ###
# install.packages("dplyr") # pipes and data manipulation
# ### Visualization ###
# install.packages("ggplot2") # graphing capabilities
# ### Estimation ###
# install.packages("cjoint") # base amce package
# install.packages("cregg") # amce and mm
# install.packages("factorEx") # amce with non-uniform distribution
## Custom build functions
# library(devtools)
# install_github("albertostefanelli/cjoint") # fixes some problems with cjoint
### Data import ###
library("readr")
library("qualtRics")
library("here")
### Data manipulation ###
library("dplyr")
### Visualization ###
library("ggplot2")
### Estimation ###
library("cjoint")
library("cregg")
library("factorEx")
2 Data
We are going to use the data from Kirkland, Patricia A; Coppock, Alexander, 2017, “Replication Data for: Candidate Choice Without Party Labels: New Insights from Conjoint Survey Experiments”, https://doi.org/10.7910/DVN/WSUHI3. The data has no codebook, so we need to load it and work out how it is structured.
3 Exercise 1
- Load the data
- What’s the sample size?
- What does the variable contest_no refer to?
- How can the respondent-varying characteristics be identified?
- Do we have any info on the ‘quality’ of the data?
- What’s the main difference between how the data is organized here and how it was organized in Lab 1?
kc_data <- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/Kirkland_Coppock_mturk_replication.csv")
head(kc_data)
# A tibble: 6 × 18
contest_no win comp policy_index valence_index Gender Age Race Job Political Party democrat republican resp_pid_7 resp_pid_3_text resp_mturkid same_party mturk_clean
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
1 1 0 79 8 2 Male 65 Asian Small Business Owner State Legislator Independent 1 0 2 Democrat A121M38BLAUHOY 1 1
2 1 1 69 8 2 Male 55 Asian Small Business Owner None Independent 1 0 2 Democrat A121M38BLAUHOY 1 1
3 2 0 76 10 6 Male 35 Hispanic Small Business Owner State Legislator Independent 1 0 2 Democrat A121M38BLAUHOY 0 1
4 2 1 100 9 6 Male 65 Black Small Business Owner Mayor Republican 1 0 2 Democrat A121M38BLAUHOY 0 1
5 3 1 100 10 6 Female 45 White Educator Mayor non-partisan 1 0 2 Democrat A121M38BLAUHOY 1 1
6 3 0 82 9 4 Female 45 Hispanic Small Business Owner City Council Member non-partisan 1 0 2 Democrat A121M38BLAUHOY 1 1
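The printed rows suggest that the data is stored in long format, with one row per candidate profile and two rows sharing each contest_no. A minimal sketch of how this could be checked, using only the six rows printed above (re-typed by hand into a hypothetical `head_rows` data frame, so it says nothing about the remaining rows):

```r
# The six rows shown by head(kc_data), re-typed by hand (illustration only)
head_rows <- data.frame(
  resp_mturkid = rep("A121M38BLAUHOY", 6),
  contest_no   = c(1, 1, 2, 2, 3, 3),
  win          = c(0, 1, 0, 1, 1, 0)
)

# Rows per respondent-task pair: in long format each pair should appear
# once per profile shown in that task
rows_per_task <- aggregate(win ~ resp_mturkid + contest_no,
                           data = head_rows, FUN = length)
names(rows_per_task)[3] <- "n_profiles"

# Chosen profiles per respondent-task pair: exactly one if every task
# is a forced choice between the profiles shown
wins_per_task <- aggregate(win ~ resp_mturkid + contest_no,
                           data = head_rows, FUN = sum)

print(rows_per_task)
print(wins_per_task)
```

Running the same two `aggregate()` calls with `data = kc_data` would extend the check to the full dataset.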
As we will see, in conjoint experiments, we need to distinguish between the number of respondents, the number of tasks performed by each respondent, and the number of profiles shown in each conjoint task.
The number of rows is the total number of observations used for the analysis.
nrow(kc_data)
[1] 12032
To get the number of respondents, we divide this number by the number of conjoint tasks and by the number of profiles shown in each task.
# number of tasks
unique(kc_data$contest_no)
[1] 1 2 3 4 5
# number of profiles
table(kc_data$contest_no,kc_data$win)
0 1
1 1201 1203
2 1204 1205
3 1202 1203
4 1203 1203
5 1204 1204
(respondents <- 12032/5/2)
[1] 1203.2
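The result is not a whole number, which hints that the panel is not perfectly balanced: at least some respondent-task pairs must be missing a row. A small sketch that re-enters the cell counts from the `table()` output above (typed by hand into a hypothetical `counts` matrix) and shows where the imbalance sits, assuming each task is a forced choice between two profiles:

```r
# Cell counts copied from the table(contest_no, win) output above
counts <- matrix(
  c(1201, 1203,
    1204, 1205,
    1202, 1203,
    1203, 1203,
    1204, 1204),
  ncol = 2, byrow = TRUE,
  dimnames = list(contest_no = as.character(1:5), win = c("0", "1"))
)

total_rows    <- sum(counts)      # total number of rows in the data
rows_per_task <- rowSums(counts)  # rows observed for each task

# With two profiles per task and one winner, the win = 0 and win = 1
# counts should match within each task; flag the tasks where they do not
unbalanced <- counts[, "0"] != counts[, "1"]

print(total_rows / 5 / 2)  # same division as above
print(rows_per_task / 2)   # implied respondents per task
print(unbalanced)
```

A more direct respondent count would be `length(unique(kc_data$resp_mturkid))`, which does not rely on every respondent having completed every task.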
In this case, there is only one respondent-varying characteristic: the respondent's party identification (PID), stored in the columns prefixed with resp_.
kc_data |> dplyr::select(dplyr::starts_with("resp_"))
# A tibble: 12,032 × 3
resp_pid_7 resp_pid_3_text resp_mturkid
<dbl> <chr> <chr>
1 2 Democrat A121M38BLAUHOY
2 2 Democrat A121M38BLAUHOY
3 2 Democrat A121M38BLAUHOY
4 2 Democrat A121M38BLAUHOY
5 2 Democrat A121M38BLAUHOY
6 2 Democrat A121M38BLAUHOY
7 2 Democrat A121M38BLAUHOY
8 2 Democrat A121M38BLAUHOY
9 2 Democrat A121M38BLAUHOY
10 2 Democrat A121M38BLAUHOY
# … with 12,022 more rows
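Relying on the resp_ prefix is a naming convention, not a guarantee. One way to double-check is to test whether a column stays constant within each respondent: respondent-level variables should, profile-level variables should not. A minimal sketch on the six rows printed by head() earlier (re-typed by hand; `constant_within` is a hypothetical helper, not part of any package):

```r
# Selected columns from the six rows shown by head(kc_data), re-typed by hand
head_rows <- data.frame(
  resp_mturkid    = rep("A121M38BLAUHOY", 6),
  resp_pid_7      = rep(2, 6),
  resp_pid_3_text = rep("Democrat", 6),
  Age             = c(65, 55, 35, 65, 45, 45),
  same_party      = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# TRUE if x takes a single value within every respondent
constant_within <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v)) == 1))
}

check_cols <- c("resp_pid_7", "resp_pid_3_text", "Age", "same_party")
is_resp_level <- sapply(head_rows[check_cols], constant_within,
                        id = head_rows$resp_mturkid)
print(is_resp_level)
```

In these rows same_party changes across profiles, so it behaves like a profile-level variable even though it is built from the respondent's PID. The six rows cover a single respondent, so the check is only meaningful once it is run on the full `kc_data`.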
4 Design info (reconstructed)
Design feature | Value |
---|---|
Sample Size | 1203 |
N Tasks (contest_no) | 5 |
N Profiles | 2 |
Total Obs. | 12,032 |
5 Codebook (reconstructed)
Variable | Value |
---|---|
contest_no | CJ task |
win | Profile chosen |
Gender | CJ Attribute |
Age | CJ Attribute |
Job | CJ Attribute |
Political | CJ Attribute |
Race | CJ Attribute |
Party | CJ Attribute |
resp_mturkid | Respondent ID |
resp_pid_7 | Strength PID (1. Strong Dem – 7. Strong Rep) |
resp_pid_3_text | PID (Democrat, Pure Independent, Republican) |
same_party | Whether the respondent identifies with the same party as the candidate in the CJ |
comp | ?? |
policy_index | ?? |
valence_index | ?? |
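For the variables still marked ??, a first step could be to look at their observed ranges. The sketch below does this for the six rows printed by head() earlier (re-typed by hand), so the ranges describe only those rows and do not settle what the variables measure:

```r
# comp, policy_index and valence_index from the six rows shown by
# head(kc_data), re-typed by hand (illustration only)
head_rows <- data.frame(
  comp          = c(79, 69, 76, 100, 100, 82),
  policy_index  = c(8, 8, 10, 9, 10, 9),
  valence_index = c(2, 2, 6, 6, 6, 4)
)

# Minimum and maximum of each undocumented variable
ranges <- sapply(head_rows, range)
rownames(ranges) <- c("min", "max")
print(ranges)

# Number of distinct values, a rough hint at whether a variable is a
# short scale or closer to continuous
n_distinct_values <- sapply(head_rows, function(x) length(unique(x)))
print(n_distinct_values)
```

Applying the same calls to the corresponding columns of `kc_data`, and comparing the result with the replication materials, would give a firmer basis for filling in the codebook.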