Early prediction of university dropouts - a random forest approach

We predict university dropout using random forests based on conditional inference trees and a broad German data set covering a wide range of aspects of student life and study courses. We model the dropout decision as a binary classification (graduate or dropout) and focus on very early prediction by modeling students’ transition in three steps: from school (pre-study), through the study-decision phase (decision phase), to the first semesters at university (early study phase). We evaluate how predictive performance changes across the three models and observe a substantial increase when variables from the first study experiences are included, resulting in an AUC (area under the curve) of 0.86. Important predictors are the final grade at secondary school, as well as determinants associated with student satisfaction and with students’ subjective academic self-concept and self-assessment. A direct outcome of this research is information for universities wishing to implement early warning systems and more personalized counseling services to support students at risk of dropping out at an early stage of study.
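
To make the reported performance measure concrete, the following minimal sketch computes the AUC for a binary dropout classifier. It assumes the usual ROC interpretation of AUC, namely the probability that a randomly chosen dropout receives a higher predicted risk than a randomly chosen graduate. The function name `roc_auc`, the label coding, and the toy scores are illustrative assumptions; they are not taken from the data set or from the authors' code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of dropouts and graduates.

    labels: 1 = dropout, 0 = graduate (assumed coding, for illustration)
    scores: predicted dropout risk, higher means more at risk
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one dropout and one graduate")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # dropout ranked above graduate
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three dropouts, four graduates.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.917 (11 of 12 dropout/graduate pairs ordered correctly)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.86 means that in roughly 86 percent of dropout/graduate pairs the model assigns the higher risk to the student who drops out.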

Suggested Citation

Behr, Andreas; Giese, Marco; Teguim Kamdjou, Herve Donald; Theune, Katja (2019): Early prediction of university dropouts - a random forest approach. Version: 1. Journal of Economics and Statistics. Dataset. http://dx.doi.org/10.15456/jbnst.2019333.185049