Predictions of stock returns improve greatly relative to low-dimensional forecasting regressions when the forecasts are based on factors estimated from large data sets, an approach known as the diffusion index (DI) model. However, when applied to text data, DI models do not perform well. This paper shows that simply using text data in a DI model does not improve equity-premium forecasts over the naive historical-average model, but that substantial gains are obtained when one selects the most predictive words before computing the factors and allows the dictionary to be updated over time.
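
To make the idea of selecting words before extracting factors concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a hypothetical word-count matrix `X` (periods by words) and a return series `y`, ranks words by absolute in-sample correlation with next-period returns (an assumed selection rule), extracts principal-component factors from the selected words only, and re-runs the selection at every forecast origin so that the dictionary can change over time. The function names and the parameters `k` (number of words kept) and `r` (number of factors) are illustrative.

```python
import numpy as np

def select_words(X, y_next, k):
    """Indices of the k columns of X with the largest absolute
    correlation with y_next (assumed selection rule)."""
    Xc = X - X.mean(axis=0)
    yc = y_next - y_next.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Columns with zero variance get correlation 0 instead of NaN.
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr))[:k]

def di_forecast(X, y, t, k=20, r=2):
    """Forecast y[t+1] from information available at time t.

    Training pairs are (X[s], y[s+1]) for s = 0, ..., t-1, so the
    dictionary is re-selected each time t advances.
    """
    X_train, y_next = X[:t], y[1:t + 1]
    words = select_words(X_train, y_next, k)

    # Standardize the selected words using training-sample moments only.
    mu = X_train[:, words].mean(axis=0)
    sd = X_train[:, words].std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X_train[:, words] - mu) / sd

    # Principal-component loadings from the SVD of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:r].T
    F = Z @ loadings

    # Predictive regression of next-period returns on the factors.
    A = np.column_stack([np.ones(t), F])
    beta, *_ = np.linalg.lstsq(A, y_next, rcond=None)

    # Factor values at time t, then the one-step-ahead forecast.
    f_t = ((X[t, words] - mu) / sd) @ loadings
    return float(beta[0] + f_t @ beta[1:]), words
```

Calling `di_forecast(X, y, t)` for successive values of `t` gives a sequence of out-of-sample forecasts that can be compared with the historical-average benchmark `y[:t + 1].mean()`. Setting `k` equal to the number of columns of `X` skips the selection step and corresponds to an unselected DI forecast under the same assumptions.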