Evaluation of Machine Learning Algorithms for Predicting Phishing Attacks in Higher Education Environments
An Experimental Framework for Enhancing Cybersecurity in Academic Institutions
Authors
| Issue | 2026 |
| Published | 12 May 2026 |
| Section | Articles |
Abstract
This study evaluates the performance of four machine learning algorithms (Logistic Regression, Support Vector Machine, Random Forest, and XGBoost) in predicting phishing attacks within higher education environments. Because anonymized institutional datasets are scarce, the research uses a conceptual experimental design and a simulation-based approach that mirrors the characteristics of phishing incidents commonly encountered by academic users. The simulated dataset includes URL-based indicators, HTML features, email text elements, and behavioral metadata. The experimental protocol covers synthetic data generation, domain-specific feature engineering, stratified k-fold cross-validation, hyperparameter tuning via grid search, and performance evaluation using accuracy, precision, recall, F1-score, and ROC-AUC. The simulation results indicate that the ensemble-based models (Random Forest and XGBoost) outperform the linear and kernel-based models (Logistic Regression and Support Vector Machine), especially under the class imbalance typical of campus environments. The discussion covers implications for real-world campus cybersecurity operations, the limitations of conceptual simulations, and future research needs such as real-world validation and the integration of user behavior features. The main contribution is a complete experimental framework that can be executed with real institutional datasets and that provides guidance for model selection and deployment in higher education cybersecurity systems.
Keywords: phishing detection, machine learning, Random Forest, XGBoost, higher education
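
As an illustration of two protocol steps named in the abstract (stratified k-fold splitting and evaluation under class imbalance), the following is a minimal, standard-library-only sketch. It is not the study's implementation: the function names `stratified_folds` and `binary_metrics`, the 10% phishing rate, and the fold count of 5 are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive (phishing = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 10 phishing (1) among 100 messages.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)

# A trivial baseline that never flags a message as phishing.
always_legitimate = [0] * len(labels)
baseline = binary_metrics(labels, always_legitimate)
print(baseline)
```

With these assumed labels, each fold keeps the same 10% phishing share as the full set, and the never-flag baseline reaches high accuracy while its recall and F1 are zero. That gap is why the protocol reports precision, recall, and F1-score alongside accuracy when classes are imbalanced.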
