

Final Project

Decision Memo: Study Replication and Extension

CS112: Knowledge: Information-Based Decisions

Minerva Schools at KGI

Liudmyla Serohina

December 2020


DECISION MEMO: STUDY REPLICATION AND EXTENSION

To: Cai et al.

Summary:

After analysing the encouragement design study "Social Networks and the Decision to Insure" (2015) by Cai et al.,¹ I suggest examining the treatment effect of the intensive session on insurance take-up rates. For this, I use the replication data hosted on Harvard Dataverse.² The study establishes causality by introducing insurance through information sessions of different intensity (simple or intensive), to which households were randomly assigned. Random assignment balances the two groups only in expectation, so the households actually assigned to each session type may still differ in their observed characteristics. This raises a question about the work by Cai et al.: would the estimated treatment effect of the intensive session on insurance take-up change if we matched households on the covariates used in the linear regression?

To answer this question, I first rerun the original linear regression and then perform three types of matching on the covariates used in the regression: Mahalanobis distance matching, genetic matching and propensity score matching. Finally, I compare the treatment effect estimated on the best-matched data with the effect estimated on the original data.
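To make the first of these methods concrete, the sketch below matches each intensive-session household to the simple-session household with the smallest Mahalanobis distance d(u, v) = sqrt((u − v)ᵀ S⁻¹ (u − v)), where S is the sample covariance matrix of the covariates. It is a minimal illustration in Python, not taken from the memo's appendix code, and it assumes 1:1 nearest-neighbour matching with replacement and an estimate of the effect on the treated households; the function names and the toy data are hypothetical. Genetic matching generalizes this distance by searching for covariate weights that improve balance, while propensity score matching instead matches on estimated probabilities of receiving the treatment.

```python
# Hypothetical sketch: 1:1 nearest-neighbour Mahalanobis matching with replacement,
# estimating the average treatment effect on the treated (ATT).
# Pure Python for illustration; names and data are assumptions, not the memo's code.


def covariance(xs):
    """Sample covariance matrix of a list of equal-length covariate vectors."""
    n, k = len(xs), len(xs[0])
    means = [sum(x[j] for x in xs) / n for j in range(k)]
    return [[sum((x[i] - means[i]) * (x[j] - means[j]) for x in xs) / (n - 1)
             for j in range(k)] for i in range(k)]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination on [a | I]."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("covariance matrix is singular (collinear covariates?)")
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mahalanobis_sq(u, v, s_inv):
    """Squared Mahalanobis distance (u - v)' S^-1 (u - v)."""
    d = [a - b for a, b in zip(u, v)]
    k = len(d)
    return sum(d[i] * s_inv[i][j] * d[j] for i in range(k) for j in range(k))


def match_att(x, treat, y):
    """Match each treated unit to its nearest control; return (ATT, [(treated, control), ...])."""
    s_inv = invert(covariance(x))  # pooled covariance of all units
    controls = [i for i, t in enumerate(treat) if t == 0]
    pairs = []
    for i, t in enumerate(treat):
        if t == 1:
            # With replacement: one control may be the match for several treated units.
            j = min(controls, key=lambda c: mahalanobis_sq(x[i], x[c], s_inv))
            pairs.append((i, j))
    att = sum(y[i] - y[j] for i, j in pairs) / len(pairs)
    return att, pairs


if __name__ == "__main__":
    # Hypothetical covariates per household: [age, rice area]
    x = [[30, 1.0], [50, 2.0], [31, 1.1], [49, 2.1], [70, 5.0]]
    treat = [1, 1, 0, 0, 0]  # 1 = intensive session, 0 = simple session
    takeup = [1, 1, 0, 1, 0]
    att, pairs = match_att(x, treat, takeup)
    print(pairs, att)  # [(0, 2), (1, 3)] 0.5
```

Squared distances are enough for picking the nearest control, because taking the square root does not change which control is closest.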

Replication

To replicate the treatment effect from the original paper, I selected the following regression equation (p. 90):
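Written out with the variables listed in the next paragraph, and assuming that village enters as a set of fixed effects (the notation below is ours, not necessarily the paper's), the specification is a linear probability model:

$$
\text{takeup}_{ij} = \alpha + \beta \, \text{intensive}_{ij} + \gamma^{\top} X_{ij} + \eta_j + \varepsilon_{ij}
$$

Here i indexes households, j indexes villages, X_ij collects male, age, agpop, ricearea_2010 and literacy, η_j is the village fixed effect, and β is the treatment effect of the intensive session. Because take-up is a 0/1 outcome, β is a difference in take-up probabilities, so β = 0.14 corresponds to 14 percentage points.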

¹ Cai, Jing, Alain de Janvry, and Elisabeth Sadoulet. 2015. "Social Networks and the Decision to Insure." American Economic Journal: Applied Economics 7 (2): 81–108.

² Clarke, Emma, Isabelle Feldhaus, and Jinyi Zhu. 2018. "Replication Data for: Social Networks and the Decision to Insure." Harvard Dataverse, May 8, 2018. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FPB2MLX.


The results of this regression are reported in the first column of Table 2 (p. 91): the take-up rate in the intensive first-round sessions is 14 percentage points higher than in the simple sessions.

To check this, I estimated the equation above by linear regression, using the following variables from the data set: the treatment indicator intensive and the controls male, age, agpop, ricearea_2010, literacy and village. The treatment effect I obtain is approximately 15 percentage points (15.29). The difference of about 1 percentage point from the original estimate may stem from, for instance, a different treatment of missing values (NAs) during data cleaning. Nonetheless, my estimate is close to the original treatment effect, which supports the result Cai et al. found.
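The mechanics of this regression can be sketched without a statistics library. The snippet below is a minimal pure-Python illustration, not the memo's R code: it builds a design matrix with an intercept, the 0/1 treatment dummy, numeric controls and one dummy per village (the first village serving as the baseline), then solves the normal equations. The function names and data are hypothetical. Because take-up is binary, the coefficient on intensive is a difference in take-up probabilities, which is why a coefficient of 0.1529 reads as about 15 percentage points.

```python
# Hypothetical sketch: OLS for a linear probability model
#   takeup = a + b*intensive + c'X + village dummies + error
# Pure Python (no numpy/statsmodels); names and data are assumptions, not the memo's code.


def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def design_row(intensive, controls, village, villages):
    """Intercept, treatment dummy, controls, then one dummy per village except the first (baseline)."""
    dummies = [1.0 if village == v else 0.0 for v in villages[1:]]
    return [1.0, float(intensive)] + [float(c) for c in controls] + dummies


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical toy data: 4 households per session type, no controls, one village.
    intensive = [1, 1, 1, 1, 0, 0, 0, 0]
    takeup = [1, 1, 1, 0, 1, 0, 1, 0]
    beta = ols([design_row(t, [], "A", ["A"]) for t in intensive], takeup)
    print(f"effect of intensive: {beta[1] * 100:.1f} percentage points")  # 25.0
```

With no controls, the coefficient reduces to the difference in mean take-up between the two session types, as in the toy example (0.75 − 0.5 = 0.25); adding controls and village dummies changes it only to the extent that they are correlated with the treatment in the sample.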

I visualized the linear regression in R in order to compare these initial plots with the regression plots based on the matched data.³

Figures 1 and 2. (1) Effect of the intensive session on insurance take-up, using unmatched data; (2) Effect of the intensive session on insurance take-up (zoomed in to show the confidence band).

Extension

To extend the replication of the linear regression results, I perform three types of matching: Mahalanobis (multivariate) distance matching, genetic matching and propensity score matching. Out of these procedures, I select the one that produces the smallest

³ The summary table for the linear regression is given in Appendix A, and the code is in Appendix G.