Adita Mehta, Marc Rysman, and Timothy Simcoe, "Identifying the age profile of patent citations", Journal of Applied Econometrics, Vol. 25, No. 7, 2010, pp. 1179-1204.

There are two zip files. The first, mrs-data.zip, contains the two main data files, resultbb.csv and resultbb_1.csv. These are ASCII files in DOS format, so Unix/Linux users should use "unzip -a". The second, mrs-extra, contains a large number of other files, including two complete subdirectories, jhtexact and jhtreplication. Since many of these files are binary, Unix/Linux users should *not* use "unzip -a".

The basic data set is called resultbb. We provide it in ASCII (resultbb.csv) and STATA's .DTA format (resultbb.dta). To create this data set, we started with the NBER patent database file. Our basic observation was a patent in a year, and we were interested in how many citations were received by the patent (as measured by the application year of the cited patent). We were also interested in technological categories and in the application and grant years of the cited patent. A huge number of patents had identical data (i.e., the same grant year, application year, and other features, and zero citations). Instead of keeping each patent-year as a separate observation, we aggregated identical observations and kept track of how many patent-years fell into each group.

The data set contains 912,075 observations. Each observation is a group of patents in a year, characterized by five variables: the application year of the patents (appyear), the grant year of the patents (gyear), the calendar year of the observation (appyearc), the technological subcategory assigned by the NBER patent data project (subcat), and the number of citations received (ncites). The final variable is the number of patent-years that fit this description (_freq_). Hence, the data set contains six variables.

We include the programs we used to generate our results. Most are STATA programs.
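To illustrate what the _freq_ variable represents, here is a minimal Python sketch of collapsing identical patent-year records into frequency-weighted groups. It is not the code we used to build resultbb; the record type PatentYear and the function aggregate are hypothetical names, and the field names simply follow the variable names described above.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PatentYear(NamedTuple):
    """One patent observed in one calendar year (hypothetical record layout)."""
    appyear: int   # application year of the patent
    gyear: int     # grant year of the patent
    appyearc: int  # calendar year of the observation
    subcat: int    # NBER technological subcategory
    ncites: int    # citations received


def aggregate(rows: Iterable[PatentYear]) -> list:
    """Collapse identical patent-year rows into one row carrying a _freq_ count."""
    counts = Counter(rows)  # identical tuples are counted together
    return [
        {**row._asdict(), "_freq_": n}
        for row, n in sorted(counts.items())
    ]


# Toy example: two identical zero-citation rows and one row with 2 citations.
rows = [
    PatentYear(1980, 1982, 1985, 11, 0),
    PatentYear(1980, 1982, 1985, 11, 0),
    PatentYear(1980, 1982, 1985, 11, 2),
]
groups = aggregate(rows)
```

Here the two identical rows become a single group with _freq_ equal to 2, and the sum of _freq_ over all groups still equals the number of original patent-years.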
The program "mainresults.do" generates Table 1 and Figure 5. The program "changesOverTime.do" generates Figure 8, and the EXCEL file (also included) "changes over time.xls" uses output from this program to generate Table 4. The program "grantvsapp.do" generates Figures 3 and 4. We also include a second data set (resultbb_1.csv in ASCII and resultbb_1.dta in .DTA format), which we used to create Table 2 and Figure 6. This data set is sampled from resultbb, as described in the paper. The program "rep_bias.do" generates these results.

Describing how Table 3 is generated is somewhat complicated, since we used Gauss for some columns and STATA for others. The first column of Table 3 is simply copied from Jaffe, Hall, and Trajtenberg:

Hall, B.H., Jaffe, A.B., & Trajtenberg, M. (2002). The NBER patent citations data file: Lessons, insights, and methodological tools. In A. Jaffe & M. Trajtenberg (Eds.), Patents, Citations and Innovations: A Window on the Knowledge Economy. Cambridge: MIT Press.

Readers may find it easier to access the NBER Working Paper version:

Hall, B.H., Jaffe, A.B., & Trajtenberg, M. (2001). The NBER patent citation data file: Lessons, insights and methodological tools. NBER Working Paper 8498.

Column 2 uses files in the "jhtexact" directory. The program "jhtExactReplication.g" generates the results using the file "datamake.g" and the GAUSS data set contained in the files "jhtexact.dat" and "jhtexact.dht". In this data set, each observation is a group of patents. Each group is described by its grant year (gyear), the calendar year (which is also the grant year of the citing patents, gyearc), the technology category (subcatin, from 1 to 6), and the number of citations received (ncites). Unlike in the previous cases, ncites is a continuous variable: the average over all patents that fall in this group. The data set has 2,268 observations. For this directory, we provide our output file with the settings from the top of the Gauss file pasted in.
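The continuous ncites in jhtexact (an average per group) relates to the frequency-weighted counts in resultbb through a _freq_-weighted mean. The Python sketch below shows that relationship only; we do not claim jhtexact was built this way. The function group_means, the dictionary record layout, and the field name "cat" are hypothetical.

```python
from collections import defaultdict


def group_means(records, keys=("gyear", "gyearc", "cat")):
    """Return the _freq_-weighted mean of ncites for each group defined by `keys`."""
    cites = defaultdict(float)    # total citations in each group
    patents = defaultdict(float)  # total number of patents in each group
    for r in records:
        k = tuple(r[key] for key in keys)
        cites[k] += r["ncites"] * r["_freq_"]
        patents[k] += r["_freq_"]
    # Average citations per patent; groups with no patents are skipped.
    return {k: cites[k] / patents[k] for k in cites if patents[k] > 0}


# Toy data: in category 1, three patents with 0 citations and one with 2.
records = [
    {"gyear": 1980, "gyearc": 1985, "cat": 1, "ncites": 0, "_freq_": 3},
    {"gyear": 1980, "gyearc": 1985, "cat": 1, "ncites": 2, "_freq_": 1},
    {"gyear": 1980, "gyearc": 1985, "cat": 2, "ncites": 1, "_freq_": 4},
]
means = group_means(records)
```

In the toy data, category 1 has 2 citations spread over 4 patents, giving an average of 0.5, while category 2 averages 1.0.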
The files to generate Columns 3 and 4 appear in the directory "jhtReplication". The main program is a Gauss program called "jhtReplicationML.g", which calls the program "Datamake.g". Column 3 is created by setting the variable "citeapp" to zero (citeapp=0) near the top of the program. This tells the program to measure dates by grant year and causes it to use the data set contained in the files togaussg.dat and togaussg.dht. This data set has 136,693 observations. Like the data set "jhtexact", it has "gyear", "gyearc", and "subcat". Like "resultbb", it has a separate observation for each number of citations received ("ncites") and tracks the number of such patents ("_freq_"). Column 4 is generated by setting "citeapp" to one (citeapp=1). The program then uses "togaussa.dat". This data set has 70,600 observations. It is similar to "togaussg.dat", but it has one observation for each "appyear-appyearc-cat-ncites" group, along with the number of patents that fall in that group ("_freq_").

Note that the data set "resultbb" has many more observations than these later ones because it has a separate observation for each 2-digit technological subcategory, whereas the later ones have separate observations only for 1-digit categories. In practice, the first thing each program that uses "resultbb" does is collapse the data set down to 1-digit categories, so it would have been more efficient to store "resultbb" that way in the first place, but we didn't do that.

Finally, column 5 is generated by the program "industry.do" using "resultbb.dta", as is Figure 7.
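The collapse from 2-digit subcategories to 1-digit categories amounts to summing _freq_ within coarser groups. The Python sketch below assumes that the 1-digit category is the leading digit of the 2-digit subcategory code (e.g. subcat 46 maps to category 4); the function collapse_to_category is a hypothetical name, and the grouping mirrors the "appyear-appyearc-cat-ncites" groups described for togaussa.dat (so gyear is dropped).

```python
from collections import Counter


def collapse_to_category(records):
    """Sum _freq_ over subcategories that share a 1-digit category.

    Assumes the 1-digit category is the leading digit of the 2-digit
    subcategory code. Groups by appyear, appyearc, category and ncites,
    so gyear is dropped.
    """
    totals = Counter()
    for r in records:
        key = (r["appyear"], r["appyearc"], r["subcat"] // 10, r["ncites"])
        totals[key] += r["_freq_"]
    return [
        {"appyear": a, "appyearc": c, "cat": cat, "ncites": n, "_freq_": f}
        for (a, c, cat, n), f in sorted(totals.items())
    ]


# Toy data: subcategories 11 and 14 fall in category 1, subcategory 22 in category 2.
records = [
    {"appyear": 1980, "gyear": 1982, "appyearc": 1985, "subcat": 11, "ncites": 0, "_freq_": 5},
    {"appyear": 1980, "gyear": 1983, "appyearc": 1985, "subcat": 14, "ncites": 0, "_freq_": 2},
    {"appyear": 1980, "gyear": 1982, "appyearc": 1985, "subcat": 22, "ncites": 0, "_freq_": 1},
]
collapsed = collapse_to_category(records)
```

The three toy rows collapse to two: category 1 with _freq_ 7 and category 2 with _freq_ 1. In STATA, the analogous step would be to generate a category variable (for example with floor(subcat/10)) and then run collapse (sum) _freq_ with by() set to the same four grouping variables.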