A Comparative Study on Handling Imbalanced Data in Indonesian Hate Speech Detection Using FastText and BiLSTM
Authors
| Field | Detail |
|---|---|
| Issue | Vol. 11 No. 2 (2025) |
| Published | 2 December 2025 |
| Section | Articles |
| Pages | 136-149 |
Abstract
Online hate speech has become a serious threat to social harmony in Indonesia, and the number of cases has increased significantly in recent years. This study develops and evaluates an Indonesian hate speech detection system based on a Bidirectional Long Short-Term Memory (BiLSTM) deep learning model with FastText word embeddings. To address the class imbalance common in hate speech datasets, the study implements and compares three oversampling techniques: Random Oversampling, the Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The experiments use the Indonesian Hate Speech Superset, a dataset of 14,306 comments. Model performance is evaluated with Stratified K-Fold Cross-Validation using Accuracy, Precision, Recall, and F1-score. The results, visualized with confusion matrices, show that oversampling improves model performance, particularly Recall and F1-score. These findings support the development of hate speech classification systems that are fairer, more adaptive, and better suited to the unique characteristics of the Indonesian social media landscape.
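The three oversampling strategies compared in the study can be illustrated with a small, dependency-free Python sketch. It assumes that each comment has already been turned into a fixed-length numeric vector (for example, a hypothetical averaged FastText embedding), because SMOTE and ADASYN interpolate in a numeric feature space; the abstract does not state at which representation level the study applies oversampling. The helper names (`neighbours`, `interpolate`, `random_oversample`, `smote`, `adasyn`) are illustrative only and are not the study's code. In practice, the imbalanced-learn library provides `RandomOverSampler`, `SMOTE`, and `ADASYN` classes for this purpose.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbours(idx, pool, k):
    """Indices of the k points in `pool` closest to pool[idx], excluding idx itself."""
    dists = sorted((euclidean(pool[idx], pool[j]), j) for j in range(len(pool)) if j != idx)
    return [j for _, j in dists[:k]]


def interpolate(a, b, rng):
    """Return a random point on the line segment from a to b (the SMOTE step)."""
    gap = rng.random()
    return [x + gap * (y - x) for x, y in zip(a, b)]


def random_oversample(minority, n_new, rng):
    """Random oversampling: duplicate randomly chosen minority samples."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """SMOTE: interpolate between a random minority sample and one of its
    k nearest minority-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def adasyn(minority, majority, n_new, k, rng):
    """ADASYN: SMOTE-style interpolation, but minority points whose k nearest
    neighbours (in the whole data set) are mostly majority-class receive more
    synthetic samples. Rounding means the total can differ slightly from n_new."""
    data = minority + majority  # indices < len(minority) are minority points
    n_min = len(minority)
    # r_i: share of majority-class points among the k neighbours of minority point i
    ratios = [
        sum(1 for j in neighbours(i, data, k) if j >= n_min) / k for i in range(n_min)
    ]
    total = sum(ratios)
    if total == 0:
        raise ValueError("no minority point has majority-class neighbours")
    synthetic = []
    for i, r in enumerate(ratios):
        for _ in range(round(r / total * n_new)):  # g_i samples for point i
            j = rng.choice(neighbours(i, minority, k))
            synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D vectors standing in for (hypothetical) averaged FastText embeddings.
    minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.9, 0.8]]
    majority = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1],
                [1.0, 0.8], [0.8, 1.0], [1.1, 1.2], [0.95, 1.05]]
    n_new = len(majority) - len(minority)  # balance the two classes
    print("random:", random_oversample(minority, n_new, rng))
    print("smote: ", smote(minority, n_new, 2, rng))
    print("adasyn:", adasyn(minority, majority, n_new, 3, rng))
```

In this toy data only the point `[0.9, 0.8]` sits among majority-class neighbours, so ADASYN generates all of its synthetic samples around it, whereas SMOTE spreads them across the whole minority class. Concentrating new samples near the class boundary is the design difference between the two methods.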
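The evaluation protocol can be sketched in the same way. The following standard-library code, again hypothetical rather than the study's implementation, splits sample indices into stratified folds, computes Accuracy, Precision, Recall, and F1-score from confusion-matrix counts, and averages them across folds. It assumes a binary label with 1 = hate speech and reports positive-class metrics; the study may instead use macro or weighted averages. When oversampling is combined with cross-validation, it is normally applied inside the (hypothetical) `fit_predict` callback to the training indices only, so that the held-out fold keeps its original class distribution; the abstract does not say whether the study follows this order.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Split sample indices into k folds that each keep roughly the overall
    class proportions: each class is shuffled and dealt round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall and F1 from binary confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(labels, k, fit_predict, seed=0):
    """Mean metrics over k stratified folds.

    fit_predict(train_idx, test_idx) must train on train_idx (after any
    oversampling) and return one predicted label per index in test_idx.
    """
    folds = stratified_kfold(labels, k, seed)
    totals = defaultdict(float)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict(train_idx, test_idx)
        scores = metrics([labels[j] for j in test_idx], preds)
        for name, value in scores.items():
            totals[name] += value
    return {name: total / k for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 80 non-hate (0) and 20 hate (1) comments.
    labels = [0] * 80 + [1] * 20

    def majority_baseline(train_idx, test_idx):
        """Predict the most frequent training label for every test item."""
        train_labels = [labels[j] for j in train_idx]
        majority = max(set(train_labels), key=train_labels.count)
        return [majority] * len(test_idx)

    print(cross_validate(labels, 5, majority_baseline))
```

With this 80/20 split, the majority-class baseline reaches an Accuracy of about 0.8 while its Recall and F1-score are 0, which illustrates why Accuracy alone can be misleading on imbalanced data and why Recall and F1-score are the informative metrics here.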
