Hybrid Approach for Extractive Text Summarization of Indonesian News Articles using Machine Learning and Heuristic Features

Authors

  • Aqeela Nashwa Naysilla, Pignatelli Triputra University
  • Anjeli, Pignatelli Triputra University
  • Samuel, Pignatelli Triputra University
Issue 2026
Published 14 May 2026
Section Articles

Abstract

The rapid growth of Indonesian digital news content highlights the need for effective automated summarization methods tailored to morphologically rich, low-resource languages. This study proposes a linguistically informed hybrid approach to extractive text summarization designed specifically for the characteristics of the Indonesian language. The framework integrates machine learning classification with carefully engineered linguistic features to improve summary relevance while maintaining computational efficiency. The methodology combines Logistic Regression and TF-IDF vectorization with additional heuristic features, including positional weighting, keyword relevance, and sentence length scoring. The system is evaluated on a dataset of 750 Indonesian news documents (10,159 sentences) annotated by three linguistic experts and covering multiple news domains, which allows cross-domain behavior to be assessed. Experimental results show that the proposed approach achieves 82.53% classification accuracy with a classification F1-score of 0.640. The system is also computationally efficient, requiring only 0.18 seconds per document with a compact 124 MB model. Summarization quality evaluation further indicates competitive content preservation, with a ROUGE-1 F1-score of 0.778. Compared to traditional rule-based baselines, the hybrid system provides a more balanced trade-off between effectiveness and efficiency. Despite these advantages, performance varies across document structures, which indicates limitations in handling less structured content and suggests the need for improved structural adaptability and cross-domain robustness. Overall, this work contributes a practical and linguistically tailored summarization framework that supports scalable deployment for Indonesian digital news processing.

Keywords: text summarization, machine learning, hybrid classification, extractive summarization, low-resource languages
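
The feature combination described in the abstract (TF-IDF plus positional, keyword, and length heuristics, fed to a logistic model) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the feature definitions, the `IDEAL_LENGTH` constant, and the hand-set `WEIGHTS` and `BIAS` are all assumptions made for illustration. In the study the coefficients would be learned by Logistic Regression from the annotated sentences rather than fixed by hand.

```python
import math
import re
from collections import Counter

IDEAL_LENGTH = 20                 # hypothetical target sentence length (tokens)
WEIGHTS = [1.0, 1.0, 3.0, 1.0]    # hypothetical coefficients, NOT learned values
BIAS = -2.0                       # hypothetical intercept


def tokenize(text):
    """Lowercase and keep alphabetic runs (no Indonesian stemming here)."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_features(sentences, title):
    """One vector per sentence:
    [mean TF-IDF, position score, title-keyword overlap, length score]."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency, treating each sentence as one "document".
    df = Counter(term for toks in tokens for term in set(toks))
    title_terms = set(tokenize(title))
    features = []
    for i, toks in enumerate(tokens):
        tf = Counter(toks)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        tfidf = [(c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
                 for t, c in tf.items()]
        mean_tfidf = sum(tfidf) / len(tfidf) if tfidf else 0.0
        position = 1.0 - i / n    # earlier sentences score higher
        overlap = (len(title_terms & set(toks)) / len(title_terms)
                   if title_terms else 0.0)
        length = min(len(toks) / IDEAL_LENGTH, 1.0)
        features.append([mean_tfidf, position, overlap, length])
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def summarize(sentences, title, k=2):
    """Score each sentence with a logistic model and keep the top k,
    returned in their original document order."""
    feats = sentence_features(sentences, title)
    probs = [sigmoid(BIAS + sum(w * x for w, x in zip(WEIGHTS, f)))
             for f in feats]
    top = sorted(range(len(sentences)), key=lambda i: probs[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy example (invented text, not taken from the study's dataset).
title = "Harga bahan bakar naik"
sentences = [
    "Pemerintah menaikkan harga bahan bakar mulai pekan depan.",
    "Cuaca cerah.",
    "Kenaikan harga bahan bakar memicu protes di beberapa kota.",
    "Warga berharap ada solusi.",
]
summary = summarize(sentences, title, k=2)
print(summary)
```

With these placeholder weights the two sentences that share terms with the title are selected. A trained classifier would replace `WEIGHTS` and `BIAS` with fitted coefficients and would typically use the full TF-IDF vector rather than a single averaged value.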

How to Cite

[1]
Naysilla, A.N. et al. 2026. Hybrid Approach for Extractive Text Summarization of Indonesian News Articles using Machine Learning and Heuristic Features. JASMINE: Journal of Intelligent Systems and Machine Learning. (May 2026).
