Intro to Conjoint Experiments | Solutions - 1

1 Environment preparation

# ### Data import ###
# install.packages("readr")     # read datasets
# install.packages("qualtRics") # read qualtrics datasets
# install.packages("here")      # absolute path management
# ### Data manipulation ###
# install.packages("dplyr")     # pipes and data manipulation
# ### Visualization ###
# install.packages("ggplot2")    # graphing capabilities
# ### Estimation ###
# install.packages("cjoint")    # base amce package
# install.packages("cregg")     # amce and mm 
# install.packages("factorEx")  # amce with non-uniform distribution

## Custom-built functions
# library(devtools)
# install_github("albertostefanelli/cjoint") # fixes some problems with cjoint

### Data import ###
library("readr")     
library("qualtRics") 
library("here") 
### Data manipulation ###
library("dplyr")     
### Visualization ###
library("ggplot2")    
### Estimation ###
library("cjoint")   
library("cregg")     
library("factorEx")  
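
Before attaching the packages, it can help to check which ones are still missing. The lines below are a minimal sketch using base R only; the package names are taken from the library() calls above, and the object names pkgs, installed and missing_pkgs are made up for illustration.

```r
# Packages used in this lab (taken from the library() calls above)
pkgs <- c("readr", "qualtRics", "here", "dplyr", "ggplot2",
          "cjoint", "cregg", "factorEx")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need install.packages()
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)
```

An empty character vector means everything is already installed.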

2 Data

We are going to use the data from Kirkland, Patricia A; Coppock, Alexander, 2017, “Replication Data for: Candidate Choice Without Party Labels: New Insights from Conjoint Survey Experiments”, https://doi.org/10.7910/DVN/WSUHI3. The data come without a codebook, so we need to load them and work out how they are structured ourselves.

3 Exercise 1

  1. Load the data
  2. What’s the sample size?
  3. What does the variable contest_no refer to?
  4. How can the respondent-varying characteristics be identified?
  5. Do we have any info on the ‘quality’ of the data?
  6. What’s the main difference between how the data is organized here and how it was organized in Lab 1?

kc_data <- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/Kirkland_Coppock_mturk_replication.csv")

head(kc_data)
# A tibble: 6 × 18
  contest_no   win  comp policy_index valence_index Gender   Age Race     Job                  Political           Party        democrat republican resp_pid_7 resp_pid_3_text resp_mturkid   same_party mturk_clean
       <dbl> <dbl> <dbl>        <dbl>         <dbl> <chr>  <dbl> <chr>    <chr>                <chr>               <chr>           <dbl>      <dbl>      <dbl> <chr>           <chr>               <dbl>       <dbl>
1          1     0    79            8             2 Male      65 Asian    Small Business Owner State Legislator    Independent         1          0          2 Democrat        A121M38BLAUHOY          1           1
2          1     1    69            8             2 Male      55 Asian    Small Business Owner None                Independent         1          0          2 Democrat        A121M38BLAUHOY          1           1
3          2     0    76           10             6 Male      35 Hispanic Small Business Owner State Legislator    Independent         1          0          2 Democrat        A121M38BLAUHOY          0           1
4          2     1   100            9             6 Male      65 Black    Small Business Owner Mayor               Republican          1          0          2 Democrat        A121M38BLAUHOY          0           1
5          3     1   100           10             6 Female    45 White    Educator             Mayor               non-partisan        1          0          2 Democrat        A121M38BLAUHOY          1           1
6          3     0    82            9             4 Female    45 Hispanic Small Business Owner City Council Member non-partisan        1          0          2 Democrat        A121M38BLAUHOY          1           1

As we will see, in conjoint experiments we need to distinguish between the number of respondents, the number of tasks performed by each respondent, and the number of profiles shown in each conjoint task.
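
To make the three quantities concrete, here is a small sketch on a made-up design (3 respondents, 2 tasks, 2 profiles per task). The object toy and the columns profile and resp_id are hypothetical and are not part of kc_data.

```r
# Hypothetical toy design: 3 respondents, 2 tasks each, 2 profiles per task
toy <- expand.grid(
  profile    = 1:2,
  contest_no = 1:2,
  resp_id    = c("r1", "r2", "r3")
)

n_resp     <- length(unique(toy$resp_id))
n_tasks    <- length(unique(toy$contest_no))
n_profiles <- length(unique(toy$profile))

# In a complete long-format file, rows = respondents x tasks x profiles
nrow(toy) == n_resp * n_tasks * n_profiles
```

Each row is one profile seen by one respondent in one task, which is why the row count is the product of the three numbers when no observation is missing.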

The number of rows is the total number of observations used in the analysis.

nrow(kc_data)
[1] 12032

To know the number of respondents we need to divide this number by the number of conjoint tasks and the number of profiles.

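# number of tasks
unique(kc_data$contest_no)
[1] 1 2 3 4 5
# number of profiles per task (one chosen, one not chosen)
table(kc_data$contest_no,kc_data$win)
   
       0    1
  1 1201 1203
  2 1204 1205
  3 1202 1203
  4 1203 1203
  5 1204 1204
# implied number of respondents (not a whole number because the cells above are not perfectly balanced)
(respondents <- 12032/5/2)
[1] 1203.2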
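
Because the division above does not return a whole number, counting distinct respondent IDs is a useful cross-check. The sketch below uses made-up data in which one respondent completed only one of two tasks; the object toy and its values are hypothetical, and only the column names mirror kc_data.

```r
# Hypothetical toy data: respondent "r3" completed only one of two tasks
toy <- data.frame(
  resp_mturkid = c(rep("r1", 4), rep("r2", 4), rep("r3", 2)),
  contest_no   = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1),
  win          = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
)

# Dividing rows by tasks and profiles is only exact for complete data
implied <- nrow(toy) / 2 / 2

# Counting distinct IDs gives the actual number of respondents
n_respondents <- length(unique(toy$resp_mturkid))

# Rows per respondent; fewer than tasks x profiles flags incomplete data
rows_per_resp <- table(toy$resp_mturkid)
incomplete <- names(rows_per_resp)[rows_per_resp < 2 * 2]

print(c(implied = implied, distinct_ids = n_respondents))
print(incomplete)
```

On the real data the same idea would be applied to kc_data$resp_mturkid, with 5 tasks instead of 2.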

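Respondent-level variables can be identified by the resp_ prefix. In this case, there is only one respondent-varying characteristic, the respondent's party identification (PID), stored in two codings (resp_pid_7 and resp_pid_3_text); resp_mturkid is the respondent ID.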

kc_data |> dplyr::select(dplyr::starts_with("resp_"))
# A tibble: 12,032 × 3
   resp_pid_7 resp_pid_3_text resp_mturkid  
        <dbl> <chr>           <chr>         
 1          2 Democrat        A121M38BLAUHOY
 2          2 Democrat        A121M38BLAUHOY
 3          2 Democrat        A121M38BLAUHOY
 4          2 Democrat        A121M38BLAUHOY
 5          2 Democrat        A121M38BLAUHOY
 6          2 Democrat        A121M38BLAUHOY
 7          2 Democrat        A121M38BLAUHOY
 8          2 Democrat        A121M38BLAUHOY
 9          2 Democrat        A121M38BLAUHOY
10          2 Democrat        A121M38BLAUHOY
# … with 12,022 more rows
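
When variable names carry no helpful prefix, respondent-level variables can also be found empirically: they take a single value within each respondent, whereas profile attributes change from row to row. The following is a minimal sketch on made-up data; toy, constant_within and the values are hypothetical.

```r
# Hypothetical toy data: two respondents, two tasks, two profiles per task
toy <- data.frame(
  resp_mturkid    = rep(c("r1", "r2"), each = 4),
  resp_pid_3_text = rep(c("Democrat", "Republican"), each = 4),
  contest_no      = rep(c(1, 1, 2, 2), times = 2),
  Gender          = c("Male", "Female", "Male", "Male",
                      "Female", "Male", "Female", "Female"),
  stringsAsFactors = FALSE
)

# TRUE if the variable takes a single value within every respondent
constant_within <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v)) == 1))
}

candidates <- setdiff(names(toy), "resp_mturkid")
respondent_level <- vapply(
  candidates,
  function(v) constant_within(toy[[v]], toy$resp_mturkid),
  logical(1)
)
print(respondent_level)
```

With few tasks per respondent, a randomized attribute can be constant within every respondent by chance, so the result should be read as a hint rather than as proof.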

4 Design info (reconstructed)

Quantity               Value
Sample Size            1203
N Tasks (contest_no)   5
N Profiles             2
Total Obs.             12,032

5 Codebook (reconstructed)

Variable          Value
contest_no        CJ task
win               Profile chosen
Gender            CJ Attribute
Age               CJ Attribute
Race              CJ Attribute
Job               CJ Attribute
Political         CJ Attribute
Party             CJ Attribute
resp_mturkid      Respondent ID
resp_pid_7        Strength PID (1. Strong Dem – 7. Strong Rep)
resp_pid_3_text   PID (Democrat, Pure Independent, Republican)
same_party        Whether the respondent identifies with the same party as the candidate in the CJ profile
comp              ??
policy_index      ??
valence_index     ??
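
For the variables still marked ??, a first step is to look at their observed range and at whether they differ between the two profiles of the same task. The sketch below re-enters the six rows printed by head(kc_data) above, which all belong to one respondent; the object names head_rows, unknown, ranges and varies_within_task are made up for illustration, and nothing here settles what the variables mean.

```r
# The six rows printed by head(kc_data), restricted to the columns of interest
head_rows <- data.frame(
  contest_no    = c(1, 1, 2, 2, 3, 3),
  win           = c(0, 1, 0, 1, 1, 0),
  comp          = c(79, 69, 76, 100, 100, 82),
  policy_index  = c(8, 8, 10, 9, 10, 9),
  valence_index = c(2, 2, 6, 6, 6, 4)
)

unknown <- c("comp", "policy_index", "valence_index")

# Observed minimum and maximum of each undocumented variable
ranges <- sapply(head_rows[unknown], range)

# Does the variable differ between the two profiles of at least one task?
# Grouping by contest_no alone is enough here because there is one respondent;
# on the full data the grouping would also need the respondent ID.
varies_within_task <- sapply(
  head_rows[unknown],
  function(x) any(tapply(x, head_rows$contest_no,
                         function(v) length(unique(v)) > 1))
)

print(ranges)
print(varies_within_task)
```

A variable that differs within a task is recorded per profile rather than per task or per respondent, which narrows down what it could be measuring.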