Hybrid Approach for Extractive Text Summarization of Indonesian News Articles using Machine Learning and Heuristic Features

Aqeela Nashwa Naysilla; Anjeli Tedan; Samuel Karel Augusta Koesmendro

Authors

Aqeela Nashwa Naysilla, Pignatelli Triputra University
Anjeli Tedan, Pignatelli Triputra University
Samuel Karel Augusta Koesmendro, Pignatelli Triputra University

Issue	2026
Published	14 May 2026
Section	Articles

Abstract

The rapid growth of Indonesian digital news content highlights the need for effective automated summarization methods tailored to morphologically rich, low-resource languages. This study proposes a linguistically informed hybrid approach to extractive text summarization designed for the characteristics of the Indonesian language. The framework integrates machine learning classification with engineered linguistic features to improve summary relevance while maintaining computational efficiency. Specifically, it combines Logistic Regression and TF-IDF vectorization with heuristic features: positional weighting, keyword relevance, and sentence length scoring. The system is evaluated on a dataset of 750 Indonesian news documents (10,159 sentences) annotated by three linguistic experts and spanning multiple news domains, so that cross-domain behavior can be assessed. The proposed approach achieves 82.53% classification accuracy and a classification F1-score of 0.640. It requires only 0.18 seconds per document, with a compact model size of 124 MB. Summarization quality evaluation indicates competitive content preservation, with a ROUGE-1 F1-score of 0.778. Compared to traditional rule-based baselines, the hybrid system provides a more balanced trade-off between effectiveness and efficiency. Performance nevertheless varies across document structures, which indicates limitations in handling less structured content and a need for improved structural adaptability and cross-domain robustness. Overall, this work contributes a practical and linguistically tailored summarization framework that supports scalable deployment for Indonesian digital news processing.

Keywords: text summarization, machine learning, hybrid classification, extractive summarization, low-resource languages
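
The three heuristic features named in the abstract (positional weighting, keyword relevance, and sentence length scoring) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tokenizer, the feature formulas, the ideal_len parameter, and the hand-set weights and bias, which stand in for coefficients that a trained Logistic Regression model would learn from annotated sentences.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a simple stand-in for a real Indonesian tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentences, ideal_len=20):
    """Return one (position, keyword, length) feature tuple per sentence.

    ideal_len is a hypothetical target sentence length in tokens.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence frequency: number of sentences in which each term occurs.
    sf = Counter(t for toks in tokens for t in set(toks))
    feats = []
    for i, toks in enumerate(tokens):
        # Positional weighting: earlier sentences score higher.
        position = 1.0 - i / n
        # Keyword relevance: summed TF-IDF weight, treating each sentence
        # as a "document" and using a smoothed IDF.
        tf = Counter(toks)
        keyword = sum(
            (count / len(toks)) * (math.log((1 + n) / (1 + sf[term])) + 1.0)
            for term, count in tf.items()
        ) if toks else 0.0
        # Length scoring: 1.0 at ideal_len, falling linearly to 0.0.
        length = 1.0 - min(1.0, abs(len(toks) - ideal_len) / ideal_len)
        feats.append((position, keyword, length))
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def summarize(sentences, k=2, weights=(1.0, 1.0, 1.0), bias=-2.0):
    """Select the k highest-scoring sentences and return them in document order.

    weights and bias are placeholder values, not learned parameters.
    """
    feats = sentence_features(sentences)
    scores = [
        sigmoid(bias + sum(w * x for w, x in zip(weights, f))) for f in feats
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "Pemerintah mengumumkan kebijakan baru tentang harga bahan bakar pada hari Senin.",
    "Kebijakan harga bahan bakar itu berlaku mulai bulan depan.",
    "Cuaca cerah.",
    "Warga berharap kebijakan baru tidak menaikkan harga kebutuhan pokok.",
]
summary = summarize(sentences, k=2)
print(summary)
```

In a trained system the logistic scoring step would be replaced by a classifier fitted on the expert-annotated sentences; the sketch only shows how such features could be computed and combined into a per-sentence selection score.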

How to Cite

Naysilla, A.N. et al. 2026. Hybrid Approach for Extractive Text Summarization of Indonesian News Articles using Machine Learning and Heuristic Features. JASMINE: Journal of Intelligent Systems and Machine Learning. (May 2026).
