Speech to Text Correction for Indonesian Early Marriage Counseling Chatbots Using IndoRoBERTa and Mistral-7B

Abstract

Early marriage among underage individuals continues to draw significant attention in Lombok. As of 2021, the prevalence rate stood at 16.59%, indicating that this social issue remains unresolved within the region's communities. Limited access to counseling services, particularly in rural areas, poses a significant barrier to prevention efforts. This study introduces a virtual counseling chatbot designed to detect and correct Indonesian-language text errors during user interactions. The system integrates IndoRoBERTa for error detection and Mistral-7B-Instruct for refining speech-to-text transcriptions. IndoRoBERTa was trained on synthetic datasets to classify user input as correct or incorrect, while Mistral-7B-Instruct generates context-aware corrections. With an accuracy of 98.90%, IndoRoBERTa outperformed benchmark models such as BERT and RNN. The proposed chatbot offers an adaptive and accessible digital solution, especially for communities with limited access to conventional counseling services. This approach highlights the potential of AI-driven tools to support early intervention strategies and reduce the incidence of child marriage in underserved regions.

Keywords: Early Marriage, Virtual Counseling, IndoRoBERTa, Mistral-7B-Instruct, Speech to Text
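
The detect-then-correct flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `CorrectionResult` type, and the dictionary-based stubs are all hypothetical stand-ins. In a real system, `detect_error` would wrap a fine-tuned IndoRoBERTa classifier and `generate_correction` would call Mistral-7B-Instruct; here both are replaced by toy functions so the control flow can be run without any model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CorrectionResult:
    original: str    # transcript as received from speech-to-text
    flagged: bool    # True if the detector judged the text incorrect
    corrected: str   # corrected text, or the original if not flagged


def correct_transcript(
    text: str,
    detect_error: Callable[[str], bool],
    generate_correction: Callable[[str], str],
) -> CorrectionResult:
    """Run the detector first; call the (expensive) corrector only if needed."""
    flagged = detect_error(text)
    corrected = generate_correction(text) if flagged else text
    return CorrectionResult(original=text, flagged=flagged, corrected=corrected)


# --- Hypothetical stubs (assumptions, standing in for the real models) ---
# A tiny lookup of misspellings; a real detector would be a trained classifier.
KNOWN_FIXES = {"pernikhan": "pernikahan", "konsling": "konseling"}


def stub_detector(text: str) -> bool:
    """Flag the text if any word is a known misspelling."""
    return any(word.lower() in KNOWN_FIXES for word in text.split())


def stub_corrector(text: str) -> str:
    """Replace known misspellings word by word; leave other words untouched."""
    return " ".join(KNOWN_FIXES.get(word.lower(), word) for word in text.split())


if __name__ == "__main__":
    result = correct_transcript(
        "saya ingin konsling tentang pernikhan dini",
        stub_detector,
        stub_corrector,
    )
    print(result.flagged, "|", result.corrected)
```

Passing the detector and corrector as callables keeps the gating logic separate from the models, so either component could be swapped or tested independently.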

Citation
[1]
Firdhaus Dwi Sukma et al. 2025. Speech to Text Correction for Indonesian Early Marriage Counseling Chatbots Using IndoRoBERTa and Mistral-7B. Indonesian Journal on Computing (Indo-JC). 10, 1 (Oct. 2025). DOI:https://doi.org/10.21108/indojc.v10i1.9708.

