Adam Nowak
;
Patrick S. Smith

textual analysis in real estate (replication data)

This paper incorporates text data from MLS listings into a hedonic pricing model. We show that the comments section of the MLS, which is populated by real estate agents who arguably have the most local market knowledge and know what homebuyers value, provides information that improves the performance of both in-sample and out-of-sample pricing estimates. Text is found to decrease pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach. By doing so, the implicit prices for various words and phrases are estimated. The estimation focuses on simultaneous variable selection and estimation for linear models in the presence of a large number of variables using a penalized regression. The LASSO procedure and variants are shown to outperform least-squares in out-of-sample testing.

Data and Resources

Suggested Citation

Nowak, Adam; Smith, Patrick S. (2017): Textual Analysis in Real Estate (replication data). Version: 1. Journal of Applied Econometrics. Dataset. http://dx.doi.org/10.15456/jae.2022326.0703267771