Final Project
Decision Memo: Study Replication and Extension
CS112: Knowledge: Information-Based Decisions
Minerva Schools at KGI
Liudmyla Serohina
December 2020
To: Cai et al.
Summary:
After analysing the encouragement design study "Social Networks and the Decision to
Insure" (2015) by Cai et al.¹, I suggest examining the treatment effect of the intensive session on
insurance take-up rates. For this, I will use the replication data hosted on Harvard Dataverse². The
causality in the study is established by introducing insurance through sessions of different
intensity (simple or intensive), to which the households were randomly assigned. Even so,
there might have been significant differences between the households assigned to these sessions.
Thus, a question arose when studying the work by Cai et al.: Would the treatment effect of the
intensive session on insurance take-up change if we conduct matching on the covariates used in
the linear regression equation?
To answer this question, I first run the original linear regression and then perform three
types of matching on the covariates used in the regression: Mahalanobis distance matching,
genetic matching, and propensity score matching. I then compare the treatment effect estimated
on the best-matched data with the one estimated on the original data.
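To make the first of these matching procedures concrete, the following is a minimal sketch of nearest-neighbour Mahalanobis distance matching. It is not the code from this memo's appendices: the data are synthetic, only two covariates are used, matching is done with replacement, and all function and variable names are hypothetical choices made for illustration.

```python
import random

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def invert_2x2(m):
    """Inverse of a 2x2 matrix (enough for the two covariates used here)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_sq(x, y, s_inv):
    """Squared Mahalanobis distance between covariate vectors x and y."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return sum(diff[a] * s_inv[a][b] * diff[b]
               for a in range(len(diff)) for b in range(len(diff)))

def match_with_replacement(treated, controls, s_inv):
    """For each treated unit, return the index of its nearest control unit."""
    return [min(range(len(controls)),
                key=lambda j: mahalanobis_sq(t, controls[j], s_inv))
            for t in treated]

# Synthetic (age, rice area) pairs; these are NOT the replication data.
rng = random.Random(0)
treated = [(rng.gauss(50, 10), rng.gauss(12, 4)) for _ in range(30)]
controls = [(rng.gauss(48, 10), rng.gauss(11, 4)) for _ in range(60)]

# The distance is scaled by the inverse covariance of the pooled sample.
s_inv = invert_2x2(covariance_matrix(treated + controls))
matches = match_with_replacement(treated, controls, s_inv)

def mean_age(units):
    return sum(u[0] for u in units) / len(units)

# Compare the covariate gap between groups before and after matching.
matched_controls = [controls[j] for j in matches]
gap_before = abs(mean_age(treated) - mean_age(controls))
gap_after = abs(mean_age(treated) - mean_age(matched_controls))
print(f"mean age gap before matching: {gap_before:.2f}, after: {gap_after:.2f}")
```

Comparing a covariate gap before and after matching, as in the last lines, is one simple way to judge how well a matching procedure balanced the groups.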
Replication
To replicate the treatment effect from the original paper, I selected the following
regression equation (p. 90):
1 Cai, Jing, Alain De Janvry, and Elisabeth Sadoulet. 2015. "Social Networks and the Decision to Insure." American
Economic Journal: Applied Economics, 7 (2): 81-108.
2 Clarke, Emma, Isabelle Feldhaus, and Jinyi Zhu. “Replication Data for: Social Networks and the Decision to
Insure.” Harvard Dataverse. Harvard Dataverse, May 8, 2018.
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FPB2MLX.
The results of this regression are reported in the first column of Table 2 (p. 91). They
show that the take-up rate in the intensive first-round sessions is 14 percentage points higher than
in the simple sessions.
To check this, I ran a linear regression using the equation above with the following
covariates from the data set: intensive, male, age, agpop, ricearea_2010, literacy and village.
The treatment effect I get is approximately 15 percentage points (15.29). The difference of about 1
percentage point between this output and the original one might be due to, for instance, a
different approach to handling NAs and other missing values. Nonetheless, the value is
close to the original treatment effect, which confirms the results Cai et al. found.
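As a rough illustration of how the coefficient on a binary treatment indicator is read as a percentage-point difference in take-up, here is a minimal sketch of an ordinary least squares fit. It is not the regression code from Appendix G: the data are simulated, only one covariate (age) is included, village effects are omitted, and the helper `ols` and the simulated effect size are assumptions made for the example.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated households; these are NOT the replication data.
rng = random.Random(1)
n = 500
intensive = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
age = [rng.uniform(20, 70) for _ in range(n)]
# Assumed take-up probabilities: 0.30 in simple and 0.45 in intensive sessions.
takeup = [1.0 if rng.random() < 0.30 + 0.15 * d else 0.0 for d in intensive]

# Columns: intercept, treatment indicator, one covariate.
X = [[1.0, d, a] for d, a in zip(intensive, age)]
beta = ols(X, takeup)
print(f"estimated effect of the intensive session: {100 * beta[1]:.2f} percentage points")
```

Because the outcome is coded 0/1, the coefficient on `intensive` multiplied by 100 is the estimated difference in take-up rates in percentage points, holding the included covariate fixed.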
I visualized the linear regression in R so that these initial plots can later be compared
with the linear regression plots based on the matched covariates³.
Figures 1 and 2. 1) Effect of intensive session on insurance take-up, using unmatched data;
2) Effect of intensive session on insurance take-up (zoomed in to show the confidence band)
Extension
To extend the replication of the linear regression results, I perform three types of
matching: Mahalanobis (multivariate) distance matching, genetic matching, and propensity score
matching. Out of these matching procedures, I select the one that produces the smallest
3 The summary table for linear regression is supplied in Appendix A and the code is in Appendix G.