Detection and Classification of Cognitive Distortions in Mental Health Texts Using a Hybrid Natural Language Processing Approach
Authors
| Issue | 2026 |
| Published | 16 February 2026 |
| Section | Articles |
Abstract
This study develops a hybrid natural language processing system that detects cognitive distortions in Indonesian text, with the aim of supporting early mental health awareness. The proposed model combines rule-based keyword matching with a Random Forest classifier, using TF-IDF features extracted from the preprocessed Indonesian Mental Health Conversation dataset. Evaluated against manually labeled data across eight distortion categories, the hybrid approach outperforms the standalone methods, achieving a classification accuracy of 77.5% and an exact match rate of 76.67%. The system also showed robust and fair behavior, maintaining a balanced label distribution across categories and reaching a validation accuracy of 94% on the full dataset. To validate real-world applicability, the model was integrated into a reflective chatbot that identifies distorted thinking patterns in user input and retrieves contextually relevant responses. These findings indicate that combining linguistic theory with data-driven modeling yields an effective, interpretable, and scalable tool for cognitive distortion detection in informal Indonesian psychological text.
Keywords: cognitive distortion, natural language processing, hybrid model, Indonesian text, mental health
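
The abstract does not say how the rule-based layer and the classifier are combined, nor how the exact match rate is computed. The sketch below shows one plausible reading under stated assumptions: rule hits and classifier labels are merged by set union, and exact match means the predicted label set equals the gold label set. The lexicon, the label names, and the `classifier` callable are hypothetical stand-ins. In the study the classifier would be the Random Forest over TF-IDF features, which is not reproduced here.

```python
import re
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical lexicon: labels and Indonesian cue words are illustrative only,
# not the eight categories or the rules used in the study.
RULES: Dict[str, List[str]] = {
    "overgeneralization": ["selalu", "tidak pernah"],
    "should_statement": ["harus", "seharusnya"],
}


def rule_labels(text: str) -> Set[str]:
    """Return every label whose cue word or phrase occurs as whole tokens."""
    # Lowercase, drop punctuation, and pad with spaces so that cues only
    # match on token boundaries ("harus" must not match inside "seharusnya").
    padded = " " + " ".join(re.findall(r"\w+", text.lower())) + " "
    return {
        label
        for label, cues in RULES.items()
        if any(f" {cue} " in padded for cue in cues)
    }


def hybrid_predict(text: str, classifier: Callable[[str], Set[str]]) -> Set[str]:
    """Assumed fusion rule: union of rule-based hits and classifier labels."""
    return rule_labels(text) | set(classifier(text))


def exact_match_rate(predicted: Sequence[Set[str]], gold: Sequence[Set[str]]) -> float:
    """Fraction of texts whose predicted label set equals the gold label set."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

A union keeps the interpretable rule hits even when the statistical model misses them, at the cost of possibly more false positives. Other fusion schemes, such as using rule hits as extra features or as a fallback, are equally consistent with the abstract.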
