Improving the Accuracy of the C4.5 Algorithm in Heart Disease Prediction Using Bagging and Information Gain
Authors
| Issue | Vol. 12 No. 1 (2026) |
| Published | 2 June 2026 |
| Section | Articles |
| Pages | 1-13 |
Abstract
Class imbalance is a common challenge in data classification: the majority class significantly outnumbers the minority class, which degrades algorithm performance, particularly for the C4.5 algorithm. This study addresses the problem by combining Bootstrap Aggregation (Bagging) with Information Gain (IG). IG is used for feature selection, retaining only attributes whose gain exceeds a threshold of 0.02, while Bagging improves the stability and accuracy of the classification model. The experiment was conducted using a diabetes dataset from UCI with 10-fold cross-validation. The results showed that the C4.5+Bagging model achieved the highest accuracy at 95.96%, while the proposed C4.5+IG+Bagging combination reached an accuracy of 94.42%; both are a significant increase over the baseline C4.5 algorithm's accuracy of 89.04%. These findings demonstrate that the proposed combination is effective in improving classification performance on imbalanced data.
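The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names, and the one-level "stump" base learner (standing in for a full C4.5 tree) are all assumptions made for illustration, and attributes are treated as categorical. Only the 0.02 threshold comes from the abstract.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """IG of one categorical attribute: H(labels) minus the weighted entropy per value."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_features(rows, labels, threshold=0.02):
    """Indices of attributes whose information gain exceeds the threshold."""
    return [j for j in range(len(rows[0]))
            if information_gain([r[j] for r in rows], labels) > threshold]

def train_stump(rows, labels):
    """Hypothetical base learner: a one-level tree split on the highest-IG attribute."""
    best = max(range(len(rows[0])),
               key=lambda j: information_gain([r[j] for r in rows], labels))
    default = Counter(labels).most_common(1)[0][0]
    by_value = {}
    for r, y in zip(rows, labels):
        by_value.setdefault(r[best], []).append(y)
    table = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
    return lambda x: table.get(x[best], default)

def bagging_predict(train_fn, rows, labels, query, n_estimators=11, seed=0):
    """Train each model on a bootstrap sample, then majority-vote on the query."""
    rng = random.Random(seed)
    n = len(rows)
    votes = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = train_fn([rows[i] for i in idx], [labels[i] for i in idx])
        votes.append(model(query))
    return Counter(votes).most_common(1)[0][0]

# Toy data (assumed): attribute 0 determines the class, attribute 1 is constant.
rows = [("high", "a"), ("low", "a")] * 4
labels = [1, 0] * 4

selected = select_features(rows, labels)                 # attributes with IG > 0.02
reduced = [tuple(r[j] for j in selected) for r in rows]  # keep selected columns only
prediction = bagging_predict(train_stump, reduced, labels, ("high",))
```

In this sketch the constant attribute has zero information gain and is dropped before the ensemble is trained, which mirrors the order of operations the abstract describes (select attributes first, then bag the classifier).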
