Addressing sample selection bias for machine learning methods (replication data)
Dylan Brewer and Alyssa Carlson
Accepted at Journal of Applied Econometrics, 2023
This replication package contains the files required to reproduce the results, tables, and figures using MATLAB and Stata. The instructions are divided into three parts: replicating the simulation, replicating the results of Huang et al. (2006), and replicating the application.
Simulation
For reproducing the simulation results.
Files included in *\Simulation, with short descriptions:
SSML_simfunc: function that produces individual simulation runs
SSML_simulation: script that loops over SSML_simfunc for the different data-generating processes (DGPs) and multiple simulation runs
SSML_figures: script that generates all figures for the paper
SSML_compilefunc: function that compiles the results from SSML_simulation for the SSML_figures script
Steps for replicating the simulation:
Save SSML_simfunc, SSML_simulation, SSML_figures, and SSML_compilefunc to the same folder. This location will be referred to as the FILEPATH.
Create an OUTPUT folder inside the FILEPATH location.
Set the FILEPATH location inside SSML_simulation and SSML_figures to this folder.
Run SSML_simulation to produce simulation data and results.
Run SSML_figures to produce figures.
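Part of the simulation setup can be scripted. The Python sketch below is not part of the replication package, and its helper names are hypothetical. It assumes the four files carry MATLAB's `.m` extension and that MATLAB R2019a or later (which provides the `-batch` startup option) is on the PATH. It checks that the files sit in FILEPATH, creates the OUTPUT folder, and builds the command lines for running SSML_simulation and then SSML_figures. Setting FILEPATH inside the two scripts still has to be done by hand.

```python
from pathlib import Path

# File names from this README; the ".m" extension is an assumption.
SIM_FILES = ["SSML_simfunc", "SSML_simulation", "SSML_figures", "SSML_compilefunc"]


def prepare_simulation_folder(filepath: str) -> Path:
    """Check that the simulation files sit in FILEPATH and create FILEPATH/OUTPUT.

    Returns the OUTPUT folder path. Raises FileNotFoundError listing any
    expected .m file that is missing.
    """
    root = Path(filepath)
    missing = [name for name in SIM_FILES if not (root / f"{name}.m").is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {root}: {', '.join(missing)}")
    output = root / "OUTPUT"
    output.mkdir(exist_ok=True)  # safe to call again if OUTPUT already exists
    return output


def matlab_batch_commands() -> list[list[str]]:
    """Command lines that run the simulation first, then the figures.

    Relies on MATLAB's -batch option (R2019a and later); run them with
    FILEPATH as the working directory so the scripts are found.
    """
    return [["matlab", "-batch", script] for script in ("SSML_simulation", "SSML_figures")]
```

Each command can then be passed, in order, to `subprocess.run(cmd, cwd=FILEPATH, check=True)`, so the figures are only produced after the simulation has finished.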
Huang et al. (2006) replication
For reproducing the Huang et al. (2006) replication results.
Files included in *\HuangetalReplication, with short descriptions:
SSML_huangrep: script that replicates the results from Huang et al. (2006)
Application
For reproducing the application results.
The application uses confidential data from CQ Press. There is no batch download, so the files for each year must be downloaded by hand. For each year, download as many state outcomes as possible and name the files YYYYa.csv, YYYYb.csv, and so on (for example, 1970a.csv, 1970b.csv, 1970c.csv, 1970d.csv). See line 18 of G1_cqclean_202308.do for information on the expected file structure.
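Because the downloads are named by hand, a quick check before running the cleaning code can catch typos. This Python sketch is a convenience, not part of the package: the helper names are hypothetical, it assumes single-letter suffixes (so at most 26 files per year), and it does not check the file contents expected by G1_cqclean_202308.do. `csv_name` produces the name for the n-th download of a year, and `check_downloads` groups existing names by year and flags years whose suffixes skip a letter.

```python
import re
from collections import defaultdict
from string import ascii_lowercase

# Four-digit year followed by one lowercase letter, e.g. 1970a.csv
DOWNLOAD_NAME = re.compile(r"^(\d{4})([a-z])\.csv$")


def csv_name(year: int, index: int) -> str:
    """Name for the index-th (0-based) download of a year: (1970, 0) -> '1970a.csv'."""
    if not 0 <= index < len(ascii_lowercase):
        raise ValueError("a single-letter suffix allows at most 26 files per year")
    return f"{year}{ascii_lowercase[index]}.csv"


def check_downloads(filenames):
    """Group YYYYx.csv names by year and flag years whose suffixes skip a letter.

    Returns (by_year, gaps): by_year maps each year to its sorted suffix
    letters; gaps lists the years whose letters are not a, b, c, ... in order.
    Names that do not match the pattern are ignored.
    """
    by_year = defaultdict(list)
    for name in filenames:
        match = DOWNLOAD_NAME.match(name)
        if match:
            by_year[int(match.group(1))].append(match.group(2))
    gaps = []
    for year, letters in by_year.items():
        letters.sort()
        if letters != list(ascii_lowercase[: len(letters)]):
            gaps.append(year)
    return dict(by_year), sorted(gaps)
```

For a folder of downloads, something like `check_downloads(p.name for p in Path("path/to/downloads").glob("*.csv"))` (the folder name is a placeholder) returns the years found and any year with a missing letter.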
Steps for replicating the application:
Download the confidential data from CQ Press.
Change the working directory on line 18 of G0_main_202308.do to the application folder.
Change the local macro matlabpath on line 18 of G0_main_202308.do to the appropriate location.
Set the directory and file paths in G9_prediction_202308.m and G10_figures_202308.m as necessary.
Run G0_main_202308.do in Stata to run all programs.
All output (figures and tables) will be saved to the subdirectory *\Application\Output.
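Once the paths have been edited, the master do-file can also be launched from a script. This hypothetical Python helper is not part of the package. It assumes a Unix-style Stata installation whose executable is called `stata-mp` (pass `stata` or `stata-se` for other editions; Windows uses a different batch syntax, so check the Stata documentation there). It runs `G0_main_202308.do` in batch mode (`-b do`) from the application folder and then lists the files in the Output subdirectory.

```python
import subprocess
from pathlib import Path

MASTER_DO_FILE = "G0_main_202308.do"


def stata_batch_command(stata_exe: str = "stata-mp") -> list[str]:
    """Unix-style batch invocation: <stata_exe> -b do G0_main_202308.do."""
    return [stata_exe, "-b", "do", MASTER_DO_FILE]


def list_outputs(app_dir: str) -> list[str]:
    """Sorted names of the files in <app_dir>/Output (empty if the folder is absent)."""
    out = Path(app_dir) / "Output"
    if not out.is_dir():
        return []
    return sorted(p.name for p in out.iterdir() if p.is_file())


def run_application(app_dir: str, stata_exe: str = "stata-mp") -> list[str]:
    """Run the master do-file from the application folder, then list Output.

    Assumes the working directory, matlabpath, and MATLAB file paths have
    already been edited by hand as described in the steps above.
    """
    subprocess.run(stata_batch_command(stata_exe), cwd=Path(app_dir), check=True)
    return list_outputs(app_dir)
```

In batch mode Stata writes its log to `G0_main_202308.log` in the working directory; check that log for errors before relying on the listed outputs.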
Contact Dylan Brewer (firstname.lastname@example.org) or Alyssa Carlson (email@example.com) for help with replication.