Decision Tree-Based Early Warning System for Academic Failure: Comparative Analysis with Random Forest and Logistic Regression
Authors
| Issue | 2026 |
| Published | 16 February 2026 |
| Section | Articles |
Abstract
Student underachievement at the secondary education level remains a critical challenge that demands timely and interpretable interventions. This study develops and evaluates a Decision Tree-based model to predict student failure using the Student Performance dataset (n = 649). Two scenarios were investigated: an Early Warning Model (first-period grades) and a Mid-Term Model (first- and second-period grades). The Mid-Term Model delivers markedly higher predictive accuracy, underscoring the value of mid-term data for identifying students at risk. The Decision Tree was benchmarked against Random Forest and Logistic Regression using 10-fold cross-validation with nested hyperparameter tuning and synthetic oversampling. Logistic Regression achieved the highest accuracy (92.30%), followed by Random Forest (91.38%) and the Decision Tree (89.83%). However, a paired t-test found no statistically significant difference between the Decision Tree and either Logistic Regression (p = 0.0651) or Random Forest (p = 0.1476) at the 0.05 significance level. The Decision Tree was therefore selected as the optimal model: it offers full interpretability at no statistically significant cost to accuracy, challenging the assumption that “black-box” models are inherently superior. Further analysis confirmed that the first-period grade is the most influential predictor, which creates opportunities for early intervention. The model’s interpretable rules, which identify “double warning” and “hidden risk” cases, offer actionable insights for targeted strategies to prevent student failure.
Keywords: Decision Tree, Early Warning System, Academic Failure Prediction, Model Interpretability, Shapley Value Analysis
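The model comparison summarized in the abstract rests on a paired t-test over per-fold accuracies. The sketch below illustrates that kind of test using only the Python standard library. It is a plain paired t-test and may differ from the exact variant the authors used. The per-fold accuracy lists are made-up placeholders, not results from the study, and the names `paired_t_statistic`, `tree_acc`, and `logit_acc` are hypothetical.

```python
import math
import statistics

# Two-sided critical value of Student's t at alpha = 0.05 with 9 degrees
# of freedom (10 folds minus 1).
T_CRIT_DF9 = 2.262


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equally long lists of fold scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))


# Placeholder per-fold accuracies (illustrative only, NOT the paper's data).
tree_acc = [0.88, 0.91, 0.89, 0.92, 0.87, 0.90, 0.91, 0.88, 0.90, 0.92]
logit_acc = [0.91, 0.89, 0.93, 0.92, 0.90, 0.88, 0.94, 0.90, 0.92, 0.90]

t_stat = paired_t_statistic(logit_acc, tree_acc)
significant = abs(t_stat) > T_CRIT_DF9

print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

With these placeholder numbers the statistic stays below the critical value, so the gap in mean accuracy would not count as significant. This mirrors the reasoning in the abstract: a simpler, interpretable model can be preferred when the accuracy difference cannot be distinguished from fold-to-fold noise.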
