Intro to Conjoint Experiments | Solutions - 1

1 Environment preparation

# ### Data import ###
# install.packages("readr")     # read datasets
# install.packages("qualtRics") # read qualtrics datasets
# install.packages("here")      # absolute path management
# ### Data manipulation ###
# install.packages("dplyr")     # pipes and data manipulation
# ### Visualization ###
# install.packages("ggplot2")    # graphing capabilities
# ### Estimation ###
# install.packages("cjoint")    # base amce package
# install.packages("cregg")     # amce and mm 
# install.packages("factorEx")  # amce with non-uniform distribution

## Custom build functions 
# library(devtools)
# install_github("albertostefanelli/cjoint") # fixes some problems with cjoint

### Data import ###
library("readr")     
library("qualtRics") 
library("here") 
### Data manipulation ###
library("dplyr")     
### Visualization ###
library("ggplot2")    
### Estimation ###
library("cjoint")   
library("cregg")     
library("factorEx")

2 Data

We are going to use the data from Kirkland, Patricia A; Coppock, Alexander, 2017, “Replication Data for: Candidate Choice Without Party Labels: New Insights from Conjoint Survey Experiments”, https://doi.org/10.7910/DVN/WSUHI3. The dataset comes without a codebook, so we need to load it and work out its structure ourselves.

3 Exercise 1

Load the data.
What is the sample size?
What does the variable contest_no refer to?
How can the respondent-varying characteristics be identified?
Do we have any information on the ‘quality’ of the data?
What is the main difference between how the data are organized here and how they were organized in Lab 1?

kc_data <- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/Kirkland_Coppock_mturk_replication.csv")

head(kc_data)

# A tibble: 6 × 18
  contest_no   win  comp policy_index valence_index Gender   Age Race     Job                  Political           Party        democrat republican resp_pid_7 resp_pid_3_text resp_mturkid   same_party mturk_clean
       <dbl> <dbl> <dbl>        <dbl>         <dbl> <chr>  <dbl> <chr>    <chr>                <chr>               <chr>           <dbl>      <dbl>      <dbl> <chr>           <chr>               <dbl>       <dbl>
1          1     0    79            8             2 Male      65 Asian    Small Business Owner State Legislator    Independent         1          0          2 Democrat        A121M38BLAUHOY          1           1
2          1     1    69            8             2 Male      55 Asian    Small Business Owner None                Independent         1          0          2 Democrat        A121M38BLAUHOY          1           1
3          2     0    76           10             6 Male      35 Hispanic Small Business Owner State Legislator    Independent         1          0          2 Democrat        A121M38BLAUHOY          0           1
4          2     1   100            9             6 Male      65 Black    Small Business Owner Mayor               Republican          1          0          2 Democrat        A121M38BLAUHOY          0           1
5          3     1   100           10             6 Female    45 White    Educator             Mayor               non-partisan        1          0          2 Democrat        A121M38BLAUHOY          1           1
6          3     0    82            9             4 Female    45 Hispanic Small Business Owner City Council Member non-partisan        1          0          2 Democrat        A121M38BLAUHOY          1           1

As we will see, in conjoint experiments we need to distinguish between the number of respondents, the number of tasks performed by each respondent, and the number of profiles shown in each conjoint task.

The number of rows is the total number of observations used for the analysis.

nrow(kc_data)

[1] 12032

To get the number of respondents, we divide this number by the number of conjoint tasks and by the number of profiles per task.

# number of tasks
unique(kc_data$contest_no)

[1] 1 2 3 4 5

# number of profiles per task
table(kc_data$contest_no,kc_data$win)

   
       0    1
  1 1201 1203
  2 1204 1205
  3 1202 1203
  4 1203 1203
  5 1204 1204

# rows / tasks / profiles per task
(respondents <- nrow(kc_data) / 5 / 2)

[1] 1203.2
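
The division returns 1203.2 rather than a whole number, which suggests that not every respondent contributed all 5 x 2 rows. A more direct count uses the respondent identifier. The sketch below illustrates the idea on a small made-up data frame (`toy`, with invented values) that mimics the layout of `kc_data`; it assumes that `resp_mturkid` uniquely identifies a respondent, which the dataset itself does not document.

```r
# Toy data mimicking the layout of kc_data (all values are made up).
# Respondent "C" is missing one task, so the design is unbalanced.
toy <- data.frame(
  resp_mturkid = rep(c("A", "B", "C"), times = c(10, 10, 8)),
  contest_no   = c(rep(1:5, each = 2), rep(1:5, each = 2), rep(1:4, each = 2)),
  win          = rep(c(0, 1), times = 14)
)

# Dividing rows by tasks and profiles gives a non-integer here as well.
nrow(toy) / 5 / 2

# Count distinct respondent IDs instead of dividing row counts.
n_respondents <- length(unique(toy$resp_mturkid))

# A complete respondent contributes tasks x profiles rows.
rows_per_respondent <- table(toy$resp_mturkid)
expected_rows <- length(unique(toy$contest_no)) * 2
incomplete <- names(rows_per_respondent)[rows_per_respondent != expected_rows]

n_respondents
incomplete
```

Replacing `toy` with `kc_data` applies the same check to the replication data; the respondents listed in `incomplete` would be the ones responsible for the fractional result.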

In this case, we have only one respondent-varying characteristic: the party identification (PID) of the respondent, stored in resp_pid_7 and resp_pid_3_text.

kc_data |> dplyr::select(dplyr::starts_with("resp_"))

# A tibble: 12,032 × 3
   resp_pid_7 resp_pid_3_text resp_mturkid  
        <dbl> <chr>           <chr>         
 1          2 Democrat        A121M38BLAUHOY
 2          2 Democrat        A121M38BLAUHOY
 3          2 Democrat        A121M38BLAUHOY
 4          2 Democrat        A121M38BLAUHOY
 5          2 Democrat        A121M38BLAUHOY
 6          2 Democrat        A121M38BLAUHOY
 7          2 Democrat        A121M38BLAUHOY
 8          2 Democrat        A121M38BLAUHOY
 9          2 Democrat        A121M38BLAUHOY
10          2 Democrat        A121M38BLAUHOY
# … with 12,022 more rows
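
Selecting on the `resp_` prefix relies on the naming convention of the dataset. A complementary check, sketched here under stated assumptions, is to test which columns stay constant within each respondent: those are respondent-level, while conjoint attributes change from profile to profile. The data frame `toy` and the helper `is_constant_within()` are hypothetical and exist only for illustration.

```r
# Toy data mimicking the layout of kc_data (all values are made up).
toy <- data.frame(
  resp_mturkid = rep(c("A", "B"), each = 4),
  resp_pid_7   = rep(c(2, 6), each = 4),
  contest_no   = rep(c(1, 1, 2, 2), times = 2),
  Gender       = c("Male", "Female", "Male", "Male",
                   "Female", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if x takes a single value within every id.
is_constant_within <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v))) == 1)
}

# Check every column except the identifier itself.
candidate_cols <- setdiff(names(toy), "resp_mturkid")
respondent_level <- candidate_cols[
  vapply(toy[candidate_cols], is_constant_within, logical(1),
         id = toy$resp_mturkid)
]

respondent_level
```

On real data, a column can be constant within respondents by chance, so the result is a starting point for the codebook rather than a definitive classification.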

4 Design info (reconstructed)

Quantity	Value
Sample Size	1,203
N Tasks (contest_no)	5
N Profiles per task	2
Total Obs.	12,032

5 Codebook (reconstructed)

Variable	Value
contest_no	CJ task
win	Profile chosen
Gender	CJ Attribute
Age	CJ Attribute
Job	CJ Attribute
Political	CJ Attribute
Race	CJ Attribute
Party	CJ Attribute
resp_mturkid	Respondent ID
resp_pid_7	Strength PID (1. Strong Dem – 7. Strong Rep)
resp_pid_3_text	PID (Democrat, Pure Independent, Republican)
same_party	Whether the respondent identifies with the same party as the candidate shown in the CJ profile
comp	??
policy_index	??
valence_index	??
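
For the entries marked ??, two quick diagnostics can narrow down what a variable might be: its observed range, and whether it differs between the two profiles of the same contest. The sketch below uses a made-up data frame (`toy`, invented values) with the same column names. It does not settle what `comp`, `policy_index` or `valence_index` measure, which would require the original replication materials.

```r
# Toy data mimicking the layout of kc_data (all values are made up).
toy <- data.frame(
  resp_mturkid = rep(c("A", "B"), each = 4),
  contest_no   = rep(c(1, 1, 2, 2), times = 2),
  comp         = c(79, 69, 76, 100, 100, 82, 90, 55),
  policy_index = c(8, 8, 10, 9, 10, 9, 7, 7)
)

unknown_vars <- c("comp", "policy_index")

# Observed minimum and maximum of each undocumented variable.
ranges <- sapply(toy[unknown_vars], range)

# One identifier per respondent-contest pair.
task_id <- interaction(toy$resp_mturkid, toy$contest_no, drop = TRUE)

# Share of contests in which the two profiles carry different values.
# A share above 0 means the variable varies at the profile level.
share_varying <- sapply(toy[unknown_vars], function(x) {
  mean(tapply(x, task_id, function(v) length(unique(v)) > 1))
})

ranges
share_varying
```

A variable that differs between profiles in most contests behaves like a profile-level rating or attribute, whereas one that never differs within a contest is more likely a task-level or respondent-level quantity.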