A Comparative Study on Handling Imbalanced Data in Indonesian Hate Speech Detection Using FastText and BiLSTM


Issue Vol. 11 No. 2 (2025)
Published 2 December 2025
Section Articles
Pages 136-149

Abstract

Online hate speech has become a serious threat to social harmony in Indonesia, with cases increasing significantly in recent years. This study develops and evaluates a system for detecting Indonesian hate speech using a Bidirectional Long Short-Term Memory (BiLSTM) deep learning model with FastText word embeddings. To address the data imbalance common in hate speech datasets, the study implements and compares three oversampling techniques: Random Oversampler, Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The research uses the Indonesian Hate Speech Superset, a dataset comprising 14,306 comments. Model performance is evaluated using Stratified K-fold Cross-Validation, with Accuracy, Precision, Recall, and F1-score as metrics. The results, visualized using a confusion matrix, show that applying oversampling techniques enhances model performance, particularly Recall and F1-score. These findings contribute to the development of hate speech classification systems that are fairer, more adaptive, and better suited to the unique characteristics of the Indonesian social media landscape.
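The evaluation setup named in the abstract (oversampling combined with stratified K-fold cross-validation and Precision, Recall, and F1-score) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, toy labels instead of the Indonesian Hate Speech Superset, and covers random oversampling only (SMOTE and ADASYN synthesize new feature vectors and are not shown). All function names are hypothetical. Oversampling only the training portion of each fold is a common practice for avoiding leakage into the held-out fold; the abstract does not state whether the study did this.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that keep the class ratio roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy imbalanced labels (assumption): 20 hate-speech (1) vs 80 non-hate (0).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=5)

for i, test_idx in enumerate(folds):
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # Balance the training portion only; the held-out fold stays untouched.
    balanced_idx = random_oversample(train_idx, labels, seed=i)
    train_counts = Counter(labels[idx] for idx in balanced_idx)
    test_counts = Counter(labels[idx] for idx in test_idx)
    print(f"fold {i}: train={dict(train_counts)} test={dict(test_counts)}")
```

In a full pipeline, a classifier would be trained on `balanced_idx` and its predictions on `test_idx` passed to `precision_recall_f1`, with the scores averaged across folds.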

How to Cite

Muhamad Faza, A. et al. 2025. A Comparative Study on Handling Imbalanced Data in Indonesian Hate Speech Detection Using FastText and BiLSTM. IJoICT (International Journal on Information and Communication Technology). 11, 2 (Dec. 2025), 136–149. DOI:https://doi.org/10.21108/ijoict.v11i2.9513.
