Instructions for replication of "Heavy Tailed, but not Zipf: Firm and Establishment Size in the U.S."

Last modified: February 2023

This file documents the necessary steps to replicate all results in the paper and appendix of: “Heavy Tailed, but not Zipf: Firm and Establishment Size in the U.S.” by Illenin O. Kondo, Logan T. Lewis, and Andrea Stella.

DATA SOURCES

The tables and figures of this paper are produced using two different types of data:

1. Restricted-access microdata. This data is not provided in the replication files. The Census Bureau's Disclosure Review Board and Disclosure Avoidance Officers have reviewed this information product for unauthorized disclosure of confidential information and have approved the disclosure avoidance practices applied to this release. This research was performed at a Federal Statistical Research Data Center under FSRDC Project Number 2427 (CBDRB-FY22-P2427-R9685). We also include in the paper data from disclosures released under Project Number 1287 on 2017-08-30 (request 5949), 2017-09-08 (request 5949), 2017-12-12 (request 6227), and 2018-06-11 (request 6810). Researchers with access to the LBD and CMF can use the references above to request access to the code used to produce the results disclosed from the RDC. To gain access to the Census microdata, follow the directions on how to write a proposal for access to the data via a Federal Statistical Research Data Center: https://www.census.gov/topics/research/guidance/restricted-use-microdata/standard-application-process.html.

2. Business Dynamics Statistics (BDS): This data is publicly available from the Census Bureau at https://www.census.gov/programs-surveys/bds.html

TABLES AND FIGURES

For each table and figure, we specify whether the code is available in this replication packet or within the Census RDC.

Table 1-7: Produced with restricted-access Census data. Code available upon request by researchers with access to Census data.
Table 8: First column produced with BDS data; code available in replication packet in folder "/Tables 8, 9, 15". Remaining columns produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Table 9: First column produced with BDS data; code available in replication packet in folder "/Tables 8, 9, 15". Remaining columns produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Table 10: Code available in replication packet in folder "/Table 10".

Table 11: Produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Table 12: Code available in replication packet in folder "/Table 12".

Table 13: Code available in replication packet in folder "/Tables 13, 14".

Table 14: Code available in replication packet in folder "/Tables 13, 14".

Table 15: First column produced with BDS data; code available in replication packet in folder "/Tables 8, 9, 15". Remaining columns produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Table 16-20: Produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Table 21: Code available in replication packet in folder "/Table 21".

Table 22-27: Produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Table 28: Code available in replication packet in folder "/Table 28".

Table 29-30: Produced with restricted-access Census data. Code available upon request by researchers with access to Census data.

Figure 1-7: Code available in replication packet in folder "/Figures".

* Requires trandn.m, tn.m, ntail.m:
Zdravko Botev (2023).
Truncated Normal and Student's t-distribution toolbox (https://www.mathworks.com/matlabcentral/fileexchange/53796-truncated-normal-and-student-s-t-distribution-toolbox), MATLAB Central File Exchange.

NOTE

Those wishing to do a similar maximum-likelihood estimation on a different source of data should look to the code underlying Tables 13 and 14 (the Monte Carlo simulation exercise). The MLE is done on the simulated data, and the estimation itself is readily adapted to other datasets.
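
For readers who want a feel for this workflow before opening the packet, the following is a minimal, illustrative sketch of a Monte Carlo exercise with maximum-likelihood estimation. It is not the authors' code and does not reproduce Tables 13 and 14. It is written in Python rather than in the language of the packet, and it uses two textbook size distributions (Pareto and lognormal) chosen here only for illustration. All function names and parameter values (alpha = 1.1, x_min = 1, the sample size, and the number of replications) are assumptions made for this example.

```python
import math
import random


def simulate_pareto(n, alpha, x_min, rng):
    """Draw n Pareto(alpha, x_min) values by inverting the CDF."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def pareto_mle(data, x_min):
    """Closed-form MLE of the tail exponent for a known lower bound x_min."""
    return len(data) / sum(math.log(x / x_min) for x in data)


def pareto_loglik(data, alpha, x_min):
    """Log-likelihood of the sample under Pareto(alpha, x_min)."""
    n = len(data)
    return (n * math.log(alpha) + n * alpha * math.log(x_min)
            - (alpha + 1.0) * sum(math.log(x) for x in data))


def lognormal_mle(data):
    """MLE of (mu, sigma) for a lognormal: mean and std of the logs."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def lognormal_loglik(data, mu, sigma):
    """Log-likelihood of the sample under lognormal(mu, sigma)."""
    return sum(
        -math.log(x) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )


def run_monte_carlo(reps=100, n=2000, alpha=1.1, x_min=1.0, seed=0):
    """Simulate `reps` Pareto samples and estimate alpha on each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = simulate_pareto(n, alpha, x_min, rng)
        estimates.append(pareto_mle(sample, x_min))
    return estimates


if __name__ == "__main__":
    estimates = run_monte_carlo()
    print("mean estimated alpha:", sum(estimates) / len(estimates))
```

To adapt a sketch like this to another dataset, one would replace the simulated sample with the observed sizes and keep the estimation functions unchanged, then compare the fitted log-likelihoods across candidate distributions.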