Intro to Conjoint Experiments | Lab - 5
1 Outline
- Interactive models
- Between Attributes
- Subgroups Analysis (AMCE and MM)
- Constrained design
- Non-uniform marginal distribution
- Design-based approaches
- Model-based approaches
- Power Analysis using simulations
- Mixture models
- Treatment heterogeneity
2 Dataset used
- 2019 CEU Experimental Political Science dataset
- Q578 Q579 Q580: Choice CJ Task
- F-*-*: Conjoint features
- ResponseId: id
- Q78: Employment status
- Q77: Race
- Q76: Education
- Q75: Gender
- Q74: Age
- Q581: Religiosity
- Immigration experiment from Hainmueller, Hopkins, and Yamamoto (2014)
- Partisan labels experiment from Kirkland and Coppock (2018)
3 Environment preparation
# ### Data import ###
# install.packages("readr") # read datasets
# install.packages("qualtRics") # read qualtrics datasets
# ### Data manipulation ###
# install.packages("dplyr") # pipes and data manipulation
# install.packages("stringr") # text manipulation
# ### Visualization ###
# install.packages("ggplot2") # graphing capabilities
# install.packages("ggimage") # integrating images in ggplot
# ### Estimation ###
# install.packages("cjoint") # base amce package
# install.packages("cregg") # amce and mm
# install.packages("factorEx") # amce with non-uniform distribution
# install.packages("flexmix") # mixture models
## Custom build functions
# library(devtools)
# devtools::install_github("albertostefanelli/cjoint") # fixes some problem with cjoint
# devtools::install_github("naoki-egami/factorEx", dependencies=TRUE) # amce with non-uniform distribution
# devtools::install_github("albertostefanelli/cjsimPWR") # power analysis using simulation
### Data import ###
library("readr")
library("qualtRics")
### Data manipulation ###
library("dplyr")
library("stringr")
### Visualization ###
library("ggplot2")
library("ggimage")
### Estimation ###
library("cjoint")
library("cregg")
library("factorEx")
library("flexmix")
library("cjsimPWR")
4 Load the data: Non-Qualtrics data
<- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/experimental_political_science_2019_cleaned.csv")
df_base
head(df_base)
# A tibble: 6 × 48
...1 `Duration (in seconds)` Finished ResponseId Q578 Q579 Q580 Q271_1 Q271_2 Q271_3 Q271_4 Q271_5 Q271_6 Q78 Q77 Q76 Q75 Q74 Q581 `F-1-1` `F-1-1-1` `F-1-2` `F-1-1-2` `F-1-3` `F-1-1-3` `F-1-2-1` `F-1-2-2` `F-1-2-3` `F-2-1` `F-2-1-1` `F-2-2` `F-2-1-2` `F-2-3` `F-2-1-3` `F-2-2-1`
<dbl> <dbl> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 295 TRUE R_3QMq6EzoaEhT… Cand… Cand… Cand… 6 6 6 6 3 5 Stud… Cauc… Some… Male 22 Never Past P… None Policy… Welcome … Penal … The cand… Approxim… Acceptan… No proce… Past P… Approxim… Policy… Acceptan… Penal … The cand… Approxim…
2 2 358 TRUE R_2ZWYdunD4yrj… Cand… Cand… Cand… 6 3 3 6 1 - S… 7 - S… Unem… Cauc… Some… Male 32 Never Past P… Approxim… Penal … No proce… Policy… Welcome … Approxim… The cand… Acceptan… Past P… Approxim… Penal … No proce… Policy… Acceptan… Approxim…
3 3 310 TRUE R_3GEedrAF3Om8… Cand… Cand… Cand… 6 3 6 5 3 6 Stud… Asian Some… Male 21 Never Penal … The cand… Past P… None Policy… Acceptan… No proce… None Acceptan… Penal … The cand… Past P… Approxim… Policy… Collecti… The cand…
4 4 365 TRUE R_emv0H4GnLSxq… Cand… Cand… Cand… 6 5 5 4 - N… 5 5 Empl… Cauc… Some… Male 30 Regu… Past P… None Policy… Acceptan… Penal … The cand… None Collecti… No proce… Past P… Approxim… Policy… Acceptan… Penal … No proce… None
5 5 364 TRUE R_1CIruyJc7nV1… Cand… Cand… Cand… 5 4 - N… 5 5 6 5 Empl… Cauc… Bach… Male 30 Regu… Policy… Welcome … Past P… Approxim… Penal … The cand… Welcome … Approxim… The cand… Policy… Welcome … Past P… None Penal … The cand… Acceptan…
6 6 439 TRUE R_2TMFzzb17eua… Cand… Cand… Cand… 5 4 - N… 4 - N… 4 - N… 3 5 Empl… Hisp… Bach… Male 41 Some… Policy… Collecti… Penal … The cand… Past P… Approxim… Welcome … The cand… None Policy… Acceptan… Penal … No proce… Past P… None Welcome …
# ℹ 13 more variables: `F-2-2-2` <chr>, `F-2-2-3` <chr>, `F-3-1` <chr>, `F-3-1-1` <chr>, `F-3-2` <chr>, `F-3-1-2` <chr>, `F-3-3` <chr>, `F-3-1-3` <chr>, `F-3-2-1` <chr>, `F-3-2-2` <chr>, `F-3-2-3` <chr>, duration_mins <dbl>, scaled_duration_mins <dbl>
# transform to numeric
$Q578 <- as.numeric(as.factor(df_base$Q578))
df_base$Q579 <- as.numeric(as.factor(df_base$Q579))
df_base$Q580 <- as.numeric(as.factor(df_base$Q580))
df_base
<- read.df.qualtRics(df_base,
df_long_format # the DV for our conjoint tasks (df_base 1 and Candidate 2)
responses=c("Q578","Q579","Q580"),
#covariates=c("Q76"),
respondentID = "ResponseId"
)
5 Interaction between attributes (ACIE) and trade-offs
Our expectation is that the effect of being accused of corruption is conditional on the time a politician has spent in politics.
<- cjoint::amce(selected ~
fit_1 +
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal * Past.Political.Experience,
Penal.proceedings data=df_long_format,
cluster=TRUE,
respondent.id="ResponseId"
)
summary(fit_1)
------------------------------------------
Average Marginal Component Effects (AMCE):
------------------------------------------
Attribute Level Estimate Std. Err z value Pr(>|z|)
Past.Political.Experience Approximately 20 years 0.0096664 0.034141 0.28313 0.777075131376317118636620762117672711611
Past.Political.Experience None -0.1279208 0.030551 -4.18718 0.000028244150852242063960261669874896029 ***
Penal.proceedings The candidate has been convicted of corruption -0.4107275 0.032990 -12.44993 0.000000000000000000000000000000000013998 ***
Penal.proceedings The candidate is under investigation for corruption -0.2957963 0.034377 -8.60452 0.000000000000000007663632871421139651151 ***
Policy.Proposal Collective expulsion of immigrants and closure of the border -0.2743198 0.038022 -7.21486 0.000000000000539910104817876547303644823 ***
Policy.Proposal Welcome immigrants and organise human corridor -0.0751002 0.038872 -1.93200 0.053359578223767793747622789624074357562
---
Number of Obs. = 1110
---
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
--------------------
AMCE Baseline Levels:
--------------------
Attribute Level
Past.Political.Experience Approximately 10 years
Penal.proceedings No proceedings
Policy.Proposal Acceptance of immigrants conditional on certain requisites
---------------------------------------------
Average Component Interaction Effects (ACIE):
---------------------------------------------
Attribute Level Estimate Std. Err z value Pr(>|z|)
Past.Political.Experience:Penal.proceedings Approximately 20 years:The candidate has been convicted of corruption -0.161383 0.077751 -2.07564 0.037927 *
Past.Political.Experience:Penal.proceedings None:The candidate has been convicted of corruption -0.066669 0.082882 -0.80438 0.421178
Past.Political.Experience:Penal.proceedings Approximately 20 years:The candidate is under investigation for corruption -0.089770 0.081424 -1.10251 0.270240
Past.Political.Experience:Penal.proceedings None:The candidate is under investigation for corruption -0.112070 0.083351 -1.34456 0.178768
---
Number of Obs. = 1110
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
--------------------
ACIE Baseline Levels:
--------------------
Attribute Level
Past.Political.Experience:Penal.proceedings Approximately 10 years:No proceedings
5.1 ACIE: Least liked profile
- Sometimes it is interesting to assess the combination of attributes that are the least liked by the respondents
- This is might be relevant in some context (e.g. new policy)
- To achieve this, we are going to fit a full saturated model
- Meaning, we fit a model with all the highest interactions possible
- In this case, we fit a model with all the 3rd order interactions
<- cjoint::amce(selected ~
fit_2 *Penal.proceedings*Policy.Proposal,
Past.Political.Experiencedata=df_long_format,
cluster=TRUE,
respondent.id="ResponseId"
)
summary(fit_2)
------------------------------------------
Average Marginal Component Effects (AMCE):
------------------------------------------
Attribute Level Estimate Std. Err z value Pr(>|z|)
Past.Political.Experience Approximately 20 years 0.013514 0.034073 0.39661 0.69165537662429610321623840718530118465
Past.Political.Experience None -0.125044 0.031668 -3.94863 0.00007859878292662227832268695415507409 ***
Penal.proceedings The candidate has been convicted of corruption -0.402841 0.032881 -12.25161 0.00000000000000000000000000000000016469 ***
Penal.proceedings The candidate is under investigation for corruption -0.288860 0.034657 -8.33494 0.00000000000000007753662744227048302961 ***
Policy.Proposal Collective expulsion of immigrants and closure of the border -0.280505 0.037839 -7.41315 0.00000000000012333536218225874866920267 ***
Policy.Proposal Welcome immigrants and organise human corridor -0.077665 0.039118 -1.98539 0.04710063098324948627215746910223970190 *
---
Number of Obs. = 1110
---
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
--------------------
AMCE Baseline Levels:
--------------------
Attribute Level
Past.Political.Experience Approximately 10 years
Penal.proceedings No proceedings
Policy.Proposal Acceptance of immigrants conditional on certain requisites
---------------------------------------------
Average Component Interaction Effects (ACIE):
---------------------------------------------
Attribute Level Estimate Std. Err z value Pr(>|z|)
Past.Political.Experience:Penal.proceedings Approximately 20 years:The candidate has been convicted of corruption -0.1689454 0.077037 -2.19303 0.0283051 *
Past.Political.Experience:Penal.proceedings None:The candidate has been convicted of corruption -0.0796609 0.084073 -0.94752 0.3433740
Past.Political.Experience:Penal.proceedings Approximately 20 years:The candidate is under investigation for corruption -0.1025972 0.080967 -1.26714 0.2051040
Past.Political.Experience:Penal.proceedings None:The candidate is under investigation for corruption -0.1279843 0.083486 -1.53300 0.1252759
Past.Political.Experience:Policy.Proposal Approximately 20 years:Collective expulsion of immigrants and closure of the border -0.0655741 0.088543 -0.74059 0.4589435
Past.Political.Experience:Policy.Proposal None:Collective expulsion of immigrants and closure of the border 0.0088892 0.081457 0.10913 0.9131019
Past.Political.Experience:Policy.Proposal Approximately 20 years:Welcome immigrants and organise human corridor -0.0644945 0.087605 -0.73620 0.4616113
Past.Political.Experience:Policy.Proposal None:Welcome immigrants and organise human corridor 0.0930505 0.088700 1.04904 0.2941581
Penal.proceedings:Policy.Proposal The candidate has been convicted of corruption:Collective expulsion of immigrants and closure of the border 0.0734913 0.071416 1.02905 0.3034552
Penal.proceedings:Policy.Proposal The candidate is under investigation for corruption:Collective expulsion of immigrants and closure of the border 0.0433472 0.077717 0.55775 0.5770117
Penal.proceedings:Policy.Proposal The candidate has been convicted of corruption:Welcome immigrants and organise human corridor -0.0856311 0.081183 -1.05479 0.2915212
Penal.proceedings:Policy.Proposal The candidate is under investigation for corruption:Welcome immigrants and organise human corridor -0.0452215 0.075772 -0.59681 0.5506367
Past.Political.Experience:Penal.proceedings:Policy.Proposal Approximately 20 years:The candidate has been convicted of corruption:Collective expulsion of immigrants and closure of the border -0.5391220 0.202312 -2.66481 0.0077032 **
Past.Political.Experience:Penal.proceedings:Policy.Proposal None:The candidate has been convicted of corruption:Collective expulsion of immigrants and closure of the border -0.0840575 0.217443 -0.38657 0.6990721
Past.Political.Experience:Penal.proceedings:Policy.Proposal Approximately 20 years:The candidate is under investigation for corruption:Collective expulsion of immigrants and closure of the border -0.3888759 0.206953 -1.87906 0.0602370
Past.Political.Experience:Penal.proceedings:Policy.Proposal None:The candidate is under investigation for corruption:Collective expulsion of immigrants and closure of the border -0.0770578 0.230378 -0.33448 0.7380138
Past.Political.Experience:Penal.proceedings:Policy.Proposal Approximately 20 years:The candidate has been convicted of corruption:Welcome immigrants and organise human corridor 0.0242687 0.204120 0.11889 0.9053590
Past.Political.Experience:Penal.proceedings:Policy.Proposal None:The candidate has been convicted of corruption:Welcome immigrants and organise human corridor 0.2053987 0.206518 0.99458 0.3199412
Past.Political.Experience:Penal.proceedings:Policy.Proposal Approximately 20 years:The candidate is under investigation for corruption:Welcome immigrants and organise human corridor 0.2400620 0.197758 1.21392 0.2247785
Past.Political.Experience:Penal.proceedings:Policy.Proposal None:The candidate is under investigation for corruption:Welcome immigrants and organise human corridor 0.1870717 0.202510 0.92377 0.3556086
---
Number of Obs. = 1110
Number of Respondents = 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
--------------------
ACIE Baseline Levels:
--------------------
Attribute Level
Past.Political.Experience:Penal.proceedings Approximately 10 years:No proceedings
Past.Political.Experience:Policy.Proposal Approximately 10 years:Acceptance of immigrants conditional on certain requisites
Penal.proceedings:Policy.Proposal No proceedings:Acceptance of immigrants conditional on certain requisites
Past.Political.Experience:Penal.proceedings:Policy.Proposal Approximately 10 years:No proceedings:Acceptance of immigrants conditional on certain requisites
- In this case, the last liked profile is the a high experience candidate that has been convicted and want to expel all the migrants and shut down the border
- The interpretation needs to be causal so we need to flip the previous statement and focus on the attributes
- Q: How can the “least liked” profile be casually interpreted?
6 Subgroup analysis (conditional AMCEs)
- Expectation: minority respondents display group interest such as they will be more in favour of accepting migrants
- Prepare the data
- Transform race into factor
- Use a reference category Caucasian
- Wide to Long
- Omit NAs
$Q77 <- factor(df_base$Q77)
df_base$Q77 <- relevel(df_base$Q77, "Caucasian")
df_base
<- read.df.qualtRics(df_base,
df_long_format responses=c("Q578","Q579","Q580"),
covariates=c("Q77"),
respondentID = "ResponseId"
)
# interactions work ONLY with compete observations
<- na.omit(df_long_format) df_long_format
6.1 Conditional AMCE
- Run the AMCE with the interaction between each Attribute and the race of the respondent
<-cregg::cj(df_long_format, selected ~
fit_3 +
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings, by = ~Q77,
id = ~ResponseId,
estimate = "amce")
<-cregg::cj(df_long_format, selected ~
fit_3_diff +
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings, by = ~Q77,
id = ~ResponseId,
estimate = "amce_diff")
plot(rbind(fit_3, fit_3_diff)) + ggplot2::facet_wrap(~BY)
- Some minority groups seem to display preference that support my hypothesis
- There is reference heterogeneity between respondents
6.2 Marginal Means
But again, this plot showcases differences in conjoint effect sizes (AMCEs) not descriptive differences in underlying preferences. A plot of the differences in MMs might be better
<-cregg::cj(df_long_format, selected ~
fit_4 +
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings, by = ~Q77,
id = ~ResponseId,
estimate = "mm")
<-cregg::cj(df_long_format, selected ~
fit_4_diff +
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings, by = ~Q77,
id = ~ResponseId,
estimate = "mm_diff")
plot(rbind(fit_4, fit_4_diff)) + ggplot2::facet_wrap(~BY, ncol = 3L)
And while the inferential differences may be small, the risk of using differences in conditional AMCEs versus differences in MMs is that both the size and even the direction of subgroup differences can be misleading when presented as differences in AMCEs
$Estimate <- "AMCE"
fit_3_diff$Estimate <- "MM"
fit_4_diff
plot(rbind(fit_3_diff, fit_4_diff)) + ggplot2::facet_wrap(~Estimate + BY, ncol = 4L)
6.3 F-Test
- We are going to perform an F-Test (also called Omnibus test) to formally assess if there is no interactive effect between Attributes and Education
- We compared a model with interaction and one without interaction to assess whether preferences significantly vary across different groups.
- If the significance test is <0.05, the data suggest that there is a significance difference between groups
- If the significance test is >0.05, the data suggest that the two groups show no difference when one of the attribute is changed.
- 3 steps
- Estimate the base line model without interactions
- Estimate the interaction model
- Perform a F-Test between the baseline model and the interaction model
- with cregg, we can do it in 1 function
::cj_anova(df_long_format, selected ~
cregg+
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings, by = ~Q77,
id = ~ResponseId)
Analysis of Deviance Table
Model 1: selected ~ Past.Political.Experience + Penal.proceedings + Policy.Proposal +
Penal.proceedings
Model 2: selected ~ Past.Political.Experience + Penal.proceedings + Policy.Proposal +
Q77 + Past.Political.Experience:Q77 + Penal.proceedings:Q77 +
Policy.Proposal:Q77
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 1103 223.87
2 1075 217.63 28 6.2434 1.1014 0.3273
7 Constrained Designs
- Immigration experiment from Hainmueller, Hopkins, and Yamamoto (2014)
- Load the data
- Specify the design with the makeDesign() function
- Run AMCE for unconstrained design
- Run AMCE for constrained design
data("immigrationconjoint")
data("immigrationdesign")
## You can also load a design from a .dat file from the Conjoint SDT
#immigrationdesign <- makeDesign(type="file", filename="immigrant.dat")
<- cjoint::amce(Chosen_Immigrant ~ Gender + Education + `Language Skills` +
fit_unconstrained `Country of Origin` + Job + `Job Experience` + `Job Plans` +
`Reason for Application` + `Prior Entry`,
data=immigrationconjoint,
cluster=TRUE,
respondent.id="CaseID",
design="uniform"
)
<- cjoint::amce(Chosen_Immigrant ~ Gender + Education + `Language Skills` +
fit_constrained `Country of Origin` + Job + `Job Experience` + `Job Plans` +
`Reason for Application` + `Prior Entry`,
data=immigrationconjoint,
cluster=TRUE,
respondent.id="CaseID",
design=immigrationdesign
)
# let's check the estimate for "Computer Programmer"
<- summary(fit_unconstrained)$amce$Estimate[23]
amce_unconstrained <- summary(fit_constrained)$amce$Estimate[23]
amce_constrained
cbind(amce_unconstrained,amce_constrained, round((amce_unconstrained-amce_constrained)*100,2))
amce_unconstrained amce_constrained
[1,] 0.05941493 0.07910121 -1.97
8 Non-uniform marginal distribution
We can also introduce non-uniform marginal distribution to improuve the external validity of our experiment. Let’s first fit a baseline model that uses an uniform distribution
# this function calculate the frequencies of our constrained design
<- cregg::cj_freqs(df_long_format,
freq_cj ~
selected +
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings,id = ~ResponseId
)
# this function transform the frequencies into proportion to be included in the _pAMCE() function
<- function(freq_cregg=freq_cregg){
function_marginal_distribution
<- freq_cregg %>%
data group_by(feature) %>%
mutate(prop=estimate/sum(estimate))
<- c()
marginal_f for (f in unique(data$feature)){
<- c()
marginal_l <- data %>% subset(feature==f)
subset_freq_cj_props $level <- factor(subset_freq_cj_props$level)
subset_freq_cj_props
for (l in subset_freq_cj_props$level){
<- subset_freq_cj_props %>% filter(level==l)
subset_freq_cj_props_filtered <- subset_freq_cj_props_filtered$prop
marginal_l[l]
}<- marginal_l
marginal_f[[f]]
}return(marginal_f)
}
# call the function
<- function_marginal_distribution(freq_cregg=freq_cj)
non_uniform_distribution
non_uniform_distribution
$Past.Political.Experience
Approximately 10 years Approximately 20 years None
0.3270270 0.3324324 0.3405405
$Penal.proceedings
No proceedings The candidate has been convicted of corruption The candidate is under investigation for corruption
0.3531532 0.3324324 0.3144144
$Policy.Proposal
Acceptance of immigrants conditional on certain requisites Collective expulsion of immigrants and closure of the border Welcome immigrants and organise human corridor
0.3333333 0.3207207 0.3459459
Let’s modify the distribution using something that makes more sense in the context of this experiment. Specificaly, we want to specify a different marginal distributionfor the attribute Political Experience and Pena Proceedings.
$Past.Political.Experience <- c("None" = 0.01,
non_uniform_distribution"Approximately 10 years" = 0.69,
"Approximately 20 years" = 0.3)
$Penal.proceedings <- c("No proceedings" = 0.9,
non_uniform_distribution"The candidate has been convicted of corruption" = 0.05,
"The candidate is under investigation for corruption" = 0.05)
non_uniform_distribution
$Past.Political.Experience
None Approximately 10 years Approximately 20 years
0.01 0.69 0.30
$Penal.proceedings
No proceedings The candidate has been convicted of corruption The candidate is under investigation for corruption
0.90 0.05 0.05
$Policy.Proposal
Acceptance of immigrants conditional on certain requisites Collective expulsion of immigrants and closure of the border Welcome immigrants and organise human corridor
0.3333333 0.3207207 0.3459459
# should sum to 1
sum(non_uniform_distribution$Past.Political.Experience)
[1] 1
- Integrate the new marginal distribution in the estimation
- We are using the function design_pAMCE() from the package factorEx (de la Cuesta, Egami, and Imai 2022)
- model_pAMCE() arguments
- formula: same as before
- df: same as before
- id: same as before
- target_dist: the non_uniform_distribution from function_marginal_distribution()
- target_type: marginal (but also join and partial)
- formula: same as before
<- factorEx::model_pAMCE(
model_marginal formula = selected ~
+
Past.Political.Experience +
Penal.proceedings +
Policy.Proposal
Penal.proceedings,reg = FALSE,
data = df_long_format,
cluster_id = df_long_format$ResponseId,
target_dist = non_uniform_distribution,
target_type = "marginal",
boot= 500,
numCores = 4,
)
summary(model_marginal)
----------------
Population AMCEs:
----------------
target_dist factor level Estimate Std. Error p value
target_1 Past.Political.Experience Approximately 20 years 0.08588296 0.04409969 0.051 .
target_1 Past.Political.Experience None -0.07412879 0.04805617 0.123
target_1 Penal.proceedings The candidate has been convicted of corruption -0.38264193 0.04488193 0.000 ***
target_1 Penal.proceedings The candidate is under investigation for corruption -0.25671930 0.04630749 0.000 ***
target_1 Policy.Proposal Collective expulsion of immigrants and closure of the border -0.30004056 0.06150532 0.000 ***
target_1 Policy.Proposal Welcome immigrants and organise human corridor -0.05917951 0.05411017 0.274
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- model_marginal$AMCE$Past.Political.Experience %>% filter(type=="target_1") %>% dplyr::select(estimate)
adjusted_marginal <- model_marginal$AMCE$Past.Political.Experience %>% filter(type=="sample") %>% dplyr::select(estimate)
unadjusted_marginal
cbind(adjusted_marginal, unadjusted_marginal, (adjusted_marginal-unadjusted_marginal)*100)
estimate estimate estimate
3 0.08588296 0.01421017 7.167279
4 -0.07412879 -0.12577195 5.164315
9 Power Analysis
The power analysis R-package cjsimPWR is not yet on CRAN and so you have to install the package directly for my GitHub. To do so, uncomment the following lines
# if(!require(devtools)) install.packages("devtools")
# library(devtools)
# devtools::install_github("albertostefanelli/cjsimPWR")
The current version of the power_sim() function can be utilized to calculate a priori power, Type S, and Type M error rates based on conjoint designs typically used in political science and neighboring disciplines. It can be used when the treatment effect (i.e., AMCE) is either (1) considered to be homogeneous across all respondents or (2) expected to differ substantially across different segments of the population.
Let’s start with an experiment with the following attributes with no subgroups (i.e., homogeneous treatment effects)
- Number of attributes (n_attributes): 3
- Number of levels Attribute 1 (n_levels): 2
- Number of levels Attribute 2 (n_levels): 3
- Number of levels Attribute 3 (n_levels): 5
- Number of respondents (units): 500
- Number of tasks per respondent (n_tasks): 5
- Hypothesized AMCE compared to reference level (true_coef):
- Attribute 1 - Level 1 (ref: 0): 0.20
- Attribute 2 - Level 1 (ref: 0): -0.1
- Attribute 2 - Level 2 (ref: 0): 0.10
- Attribute 3 - Level 1 (ref: 0): -0.03
- Attribute 3 - Level 2 (ref: 0) -0.1
- Attribute 3 - Level 3 (ref: 0): -0.1
- Attribute 3 - Level 4 (ref: 0): 0.1
- Individual level heterogeneity (sigma.u_k): 0.05
- Number of simulations (sim_runs): 100
- Seed for reproducibility of the simulation results (seed) : 2114
<- power_sim(
df_power n_attributes = 3,
n_levels = c(2, 3, 5),
units = 600,
n_tasks = 5,
true_coef = list(0.2, c(-0.1, 0.1), c(-0.03, -0.1, -0.1, 0.1)),
sigma.u_k = 0.05,
sim_runs = 100,
seed = 2114
)
|> print(n=40) df_power
# A tibble: 7 × 7
attrb level true_coef power typeS typeM seed
<chr> <dbl> <dbl> <chr> <chr> <chr> <dbl>
1 var_1 1 0.2 1.00 (0.00) 0.00 (0.00) 0.97 (0.07) 2114
2 var_2 1 -0.1 1.00 (0.00) 0.00 (0.00) 0.95 (0.17) 2114
3 var_2 2 0.1 1.00 (0.00) 0.00 (0.00) 1.02 (0.16) 2114
4 var_3 1 -0.03 0.33 (0.47) 0.00 (0.00) 1.89 (0.34) 2114
5 var_3 2 -0.1 1.00 (0.00) 0.00 (0.00) 0.96 (0.18) 2114
6 var_3 3 -0.1 1.00 (0.00) 0.00 (0.00) 0.96 (0.20) 2114
7 var_3 4 0.1 1.00 (0.00) 0.00 (0.00) 1.03 (0.19) 2114
Next, let’s calculate power for an experiment with a similar design but 3 subgroups of respondents (Democrat, Independent, Republican) with hypothesized differences in conditional AMCEs (i.e., heterogeneous treatment effects).
<- power_sim(
df_power_interaction n_attributes = 3,
n_levels = c(2, 3, 6),
n_tasks = 4,
group_name = c("Democrat", "Independent", "Republican"),
units = c(500, 200, 500),
true_coef = list("Democrat" = list(0.2, c(-0.1, 0.1), c(-0.1, -0.1, -0.1, 0.1, -0.03)),
"Independent" = list(0.1, c(-0.2, 0.05), c(-0.1, 0.1, 0.1, 0.3, 0.01)),
"Republican" = list(0.1, c(-0.1, -0.0005), c(-0.1, 0.2, -0.1, 0.1, -0.01))
),sigma.u_k = 0.05,
sim_runs = 100,
seed = 12
)
|> print(n=40) df_power_interaction
# A tibble: 24 × 8
id_grp attrb level true_coef power typeS typeM seed
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl>
1 Democrat var_1 1 0.2 1.00 (0.00) 0.00 (0.00) 0.97 (0.08) 12
2 Democrat var_2 1 -0.1 1.00 (0.00) 0.00 (0.00) 0.95 (0.23) 12
3 Democrat var_2 2 0.1 1.00 (0.00) 0.00 (0.00) 1.01 (0.21) 12
4 Democrat var_3 1 -0.1 0.97 (0.17) 0.00 (0.00) 0.98 (0.24) 12
5 Democrat var_3 2 -0.1 0.95 (0.22) 0.00 (0.00) 1.05 (0.25) 12
6 Democrat var_3 3 -0.1 0.96 (0.20) 0.00 (0.00) 1.00 (0.25) 12
7 Democrat var_3 4 0.1 0.97 (0.17) 0.00 (0.00) 1.05 (0.26) 12
8 Democrat var_3 5 -0.03 0.22 (0.42) 0.00 (0.00) 2.27 (0.39) 12
9 Independent var_1 1 0.1 0.98 (0.14) 0.00 (0.00) 1.00 (0.23) 12
10 Independent var_2 1 -0.2 1.00 (0.00) 0.00 (0.00) 0.94 (0.15) 12
11 Independent var_2 2 0.05 0.41 (0.49) 0.00 (0.00) 1.60 (0.40) 12
12 Independent var_3 1 -0.1 0.49 (0.50) 0.00 (0.00) 1.12 (0.28) 12
13 Independent var_3 2 0.1 0.57 (0.50) 0.00 (0.00) 1.25 (0.32) 12
14 Independent var_3 3 0.1 0.60 (0.49) 0.00 (0.00) 1.17 (0.29) 12
15 Independent var_3 4 0.3 1.00 (0.00) 0.00 (0.00) 1.00 (0.12) 12
16 Independent var_3 5 0.01 0.03 (0.17) 0.00 (0.00) 10.55 (1.22) 12
17 Republican var_1 1 0.1 1.00 (0.00) 0.00 (0.00) 0.98 (0.14) 12
18 Republican var_2 1 -0.1 1.00 (0.00) 0.00 (0.00) 0.96 (0.19) 12
19 Republican var_2 2 -0.0005 0.04 (0.20) 0.50 (0.58) 5.99 (103.52) 12
20 Republican var_3 1 -0.1 0.85 (0.36) 0.00 (0.00) 1.01 (0.23) 12
21 Republican var_3 2 0.2 1.00 (0.00) 0.00 (0.00) 1.04 (0.13) 12
22 Republican var_3 3 -0.1 0.91 (0.29) 0.00 (0.00) 0.94 (0.20) 12
23 Republican var_3 4 0.1 0.94 (0.24) 0.00 (0.00) 1.08 (0.28) 12
24 Republican var_3 5 -0.01 0.07 (0.26) 0.43 (0.53) 0.98 (7.57) 12
10 Conjoint Mixture Model
- We are going to use the Kirkland and Coppock (2018) conjoint experiment on partisan labels
- The authors use CJ to asses the impact of partisan labels in non-partisan elections
- Before getting into mixture modelling we need to load and prepare the data for the analysis
<- readr::read_csv("https://github.com/albertostefanelli/conjoint_class/raw/master/data/Kirkland_Coppock_mturk_replication.csv")
df_base
<- df_base %>%
df_base mutate_at(vars(Political,Job,Party,Gender,Age,Race), as.factor) %>%
filter(Party != "non-partisan") %>%
mutate_at(vars(Party), droplevels)
# Get same reference categories as Kirkland & Coppock (2018)
$Political <- relevel(df_base$Political, ref = "None")
df_base$Job <- relevel(df_base$Job, ref = "Educator")
df_base$Party <- relevel(df_base$Party, ref = "Independent")
df_base$Gender <- relevel(df_base$Gender, ref = "Female")
df_base$Age <- relevel(df_base$Age, ref = "35")
df_base$Race <- relevel(df_base$Race, ref = "White") df_base
- We are going to git a pooled model disregarding, for the moment, any group differences
# model formula
<- win ~ Party + Political + Job + Race + Age + Gender
frml
# Pooled Model
<- amce(df_base,
fit_1
frml,id = ~ resp_mturkid)
# let's extract the coefficents to plot only the effect of the attribute party
<- fit_1 %>%
m0pool filter(feature == "Party") %>%
mutate(model = "Pooled") %>%
::select(model, level, estimate, std.error, lower, upper)
dplyr
# Pooled Visualisation
<- ggplot(m0pool, aes(x = level, y = estimate)) +
vis1 geom_point() +
geom_segment(aes(x = level, xend = level,
y = lower, yend = upper)) +
theme_bw() +
theme(
plot.title = element_text(size = 10),
axis.text.x = element_text(size = 7)
+
) xlab("") + ylab("") +
geom_hline(yintercept = 0, color = "black", linetype = "dashed") +
scale_y_continuous(breaks = c(-0.3, -0.2, -0.1, 0, 0.1)) +
coord_flip(ylim = c(-0.35, 0.15))
vis1
- In the pooled model, respondents would rather vote for Independent candidates than either Democrats or Republicans. Such results are typical for conjoints with unobserved subgroups.
- Let’s now split the sample by respondent party ID and fit a model for each group. This is similar to what the cregg package does when you specify the grouping variable using by=Party
# Interaction: Democrats
<- filter(df_base, resp_pid_3_text == "Democrat")
mcp_d <- amce(mcp_d,
fit_2d
frml,id = ~ resp_mturkid,
level_order = "descending")
$resp_party <- "Democrat"
fit_2d
# Interaction: Republican
<- filter(df_base, resp_pid_3_text == "Republican")
mcp_r <- amce(mcp_r,
fit_2r
frml,id = ~ resp_mturkid,
level_order = "descending")
$resp_party <- "Republican"
fit_2r
# Interaction Model
<- rbind(fit_2d, fit_2r) %>%
m1inter filter(feature == "Party") %>%
mutate(model = "Interaction") %>%
mutate(Respondent = resp_party) %>%
::select(model, level, estimate, std.error, lower, upper, Respondent)
dplyr
# Interaction Visualisation
<- m1inter %>%
vis2 mutate(image = ifelse(Respondent == "Democrat",
"http://clipart-library.com/images/BiaKRMaBT.gif",
"https://i1.wp.com/gifgifs.com//animations/jobs-people/politicians/Republican_elephant.gif")) %>%
mutate(image = ifelse(level == "Independent",
NA, image)) %>%
ggplot(aes(x = level, y = estimate,
group = Respondent,
color = Respondent,
shape = Respondent)) +
geom_hline(yintercept = 0, color = "grey30", linetype = "dashed",
alpha = 0.8) +
geom_point(aes(y = estimate), position = position_dodge(width = 0.6)) +
geom_linerange(aes(xmin = level, xmax = level,
ymin = lower, ymax = upper),
position = position_dodge(width = 0.6)) +
geom_image(aes(y = estimate + 0.003,
image = image),
size= .07) +
theme_bw() +
xlab("") + ylab("") +
scale_y_continuous(breaks = c(-0.3, -0.2, -0.1, 0, 0.1)) +
coord_flip(ylim = c(-0.35, 0.15)) +
theme(
legend.position = "none",
plot.title = element_text(size = 10),
axis.text.x = element_text(size = 7)
+
) scale_colour_manual(
values = c("#1404BD", "#DE0100"),
aesthetics = c("colour", "fill")
)
vis2
- Splitting the sample by respondent’s party identification, we see a clear pattern: - Democrats are unlikely to vote for Republicans and vice-versa. - These sub-groups have an effect of different magnitude and direction. - Let’s now see if we can reproduce these results without observing respondent’s party identification
# tell flexmix that the task are nested within respondents
<- update(frml, ~ . | resp_mturkid)
f1_univ
# Model without concomitant variables
set.seed(1402)
<- flexmix(f1_univ, data = df_base, k = 2,
fmod0 model = FLXglm(family = "gaussian"),
#concomitant = FLXPmultinom(~ democrat + republican)
)
parameters(fmod0)[2:3, ]
Comp.1 Comp.2
coef.PartyDemocrat -0.28173962 0.005755052
coef.PartyRepublican 0.07770004 -0.287800220
# we need re-fit the model to obtain sd. errors
<- refit(fmod0)
rfmod0 <- summary(rfmod0) fit_3
$Comp.1
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.3064256 0.0906525 3.3802 0.0007243 ***
PartyDemocrat -0.3147047 0.0582583 -5.4019 0.00000006594 ***
PartyRepublican 0.1287631 0.0710434 1.8125 0.0699155 .
PoliticalCity Council Member 0.0616649 0.0612786 1.0063 0.3142698
PoliticalMayor 0.1823083 0.0610538 2.9860 0.0028263 **
PoliticalRepresentative in Congress 0.3347928 0.0640892 5.2239 0.00000017524 ***
PoliticalState Legislator 0.2139046 0.0618667 3.4575 0.0005452 ***
JobAttorney 0.2022774 0.0686573 2.9462 0.0032172 **
JobBusiness Executive 0.1547245 0.0703626 2.1990 0.0278808 *
JobPolice Officer 0.0021042 0.0685207 0.0307 0.9755022
JobSmall Business Owner 0.2528794 0.0825028 3.0651 0.0021760 **
RaceAsian 0.0055452 0.0674145 0.0823 0.9344443
RaceBlack 0.0276454 0.0558808 0.4947 0.6207975
RaceHispanic 0.0188299 0.0598405 0.3147 0.7530137
Age45 -0.0528786 0.0582919 -0.9071 0.3643355
Age55 0.0312026 0.0579032 0.5389 0.5899727
Age65 -0.0281696 0.0577519 -0.4878 0.6257136
GenderMale -0.0601059 0.0413175 -1.4547 0.1457433
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$Comp.2
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4737660 0.0370646 12.7822 < 0.00000000000000022 ***
PartyDemocrat -0.0080373 0.0246772 -0.3257 0.7446531
PartyRepublican -0.2723647 0.0251804 -10.8165 < 0.00000000000000022 ***
PoliticalCity Council Member 0.1577626 0.0264130 5.9729 0.0000000023303537643 ***
PoliticalMayor 0.2098926 0.0257427 8.1535 0.0000000000000003536 ***
PoliticalRepresentative in Congress 0.1104624 0.0299471 3.6886 0.0002255 ***
PoliticalState Legislator 0.1462051 0.0270628 5.4024 0.0000000657416942404 ***
JobAttorney -0.0485885 0.0275440 -1.7640 0.0777265 .
JobBusiness Executive -0.0509025 0.0270085 -1.8847 0.0594722 .
JobPolice Officer -0.0223847 0.0277967 -0.8053 0.4206453
JobSmall Business Owner -0.0183782 0.0289217 -0.6354 0.5251376
RaceAsian 0.0039160 0.0259045 0.1512 0.8798410
RaceBlack 0.0215717 0.0239588 0.9004 0.3679249
RaceHispanic -0.0245309 0.0245074 -1.0010 0.3168470
Age45 0.0707868 0.0255672 2.7687 0.0056287 **
Age55 0.0504756 0.0240597 2.0979 0.0359116 *
Age65 0.0222732 0.0243413 0.9150 0.3601730
GenderMale -0.0266516 0.0167021 -1.5957 0.1105564
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# extract estimates first component
<- fit_3@components[[1]]$Comp.1@.Data[2:3, ]
fit_3r
<- as.data.frame(fit_3r) %>%
fit_3r1 mutate(model = "CFMM",
level = str_remove(row.names(fit_3r), "Party"),
component = "Component 1")
# extract estimates second component
<- fit_3@components[[1]]$Comp.2@.Data[2:3, ]
fit_3d <- as.data.frame(fit_3d) %>%
fit_3d2 mutate(model = "CFMM",
level = str_remove(row.names(fit_3d), "Party"),
component = "Component 2")
# calculate CI
<- rbind(fit_3r1, fit_3d2) %>%
m2mix mutate(estimate = Estimate,
std.error = `Std. Error`,
lower = estimate - 1.96 * std.error,
upper = estimate + 1.96 * std.error) %>%
::select(model, level, estimate, std.error, lower, upper, component)
dplyr
$component <- forcats::fct_relevel(m2mix$component, "Component 1", "Component 2")
m2mix
<- m2mix %>%
m2mix_vis add_row(model = "CFMM",
level = "Independent",
estimate = 0,
std.error = NA,
lower = NA,
upper = NA,
component = "Component 1") %>%
add_row(model = "CFMM",
level = "Independent",
estimate = 0,
std.error = NA,
lower = NA,
upper = NA,
component = "Component 2") %>%
mutate(image = ifelse(component == "Component 2",
"http://clipart-library.com/images/BiaKRMaBT.gif",
"https://i1.wp.com/gifgifs.com//animations/jobs-people/politicians/Republican_elephant.gif")) %>%
mutate(image = ifelse(level == "Independent",
NA, image))
$level <- forcats::fct_relevel(m2mix_vis$level, rev(c("Independent", "Democrat", "Republican")))
m2mix_vis$component <- forcats::fct_relevel(m2mix_vis$component, "Component 2", "Component 1")
m2mix_vis
<- m2mix_vis %>%
vis3 ggplot(aes(x = level, y = estimate,
group = component,
color = component,
shape = component)) +
geom_hline(yintercept = 0, color = "grey30", linetype = "dashed",
alpha = 0.8) +
geom_point(aes(y = estimate), position = position_dodge(width = 0.6)) +
geom_linerange(aes(xmin = level, xmax = level,
ymin = lower, ymax = upper),
position = position_dodge(width = 0.6)) +
geom_image(aes(y = estimate + 0.003,
image = image),
size= .07) +
theme_bw() +
xlab("") + ylab("") +
coord_flip(ylim = c(-0.35, 0.15)) +
theme(
legend.position = "none",
plot.title = element_text(size = 10),
axis.text.x = element_text(size = 7)
+
) scale_colour_manual(
values = c("#1404BD", "#DE0100"),
aesthetics = c("colour", "fill")
)
## Composite visualisation
::ggarrange(vis1 +
ggpubrggtitle("Pooled model") +
theme(
axis.text.y = element_text(face = "bold",
size = 9)
), +
vis2 ggtitle("Split by observed cov.") +
theme(
axis.text.y = element_blank()
), +
vis3 ggtitle("CFMM (no covariates)")+
theme(
axis.text.y = element_blank()
),ncol = 3, nrow = 1,
widths = c(3, 3, 3),
common.legend = F)
- Using CFMM with two classes, we successfully identify the existing subgroups.
- Standard errors capture the effect provided by Kirkland and Coppock (2018), without the need to include respondent’s party.