Issue | Vol. 10 No. 1 (2025) |
Release | 04 August 2025 |
Section | Articles |
This study aims to analyze and compare the performance of two text classification algorithms, Multinomial Naive Bayes (MNB) and Logistic Regression (LR), for film genre classification using multi-feature text data, both with and without hyperparameter optimization. Film genres play a crucial role in digital content recommendation systems; however, manual classification is subjective and time-consuming. The dataset, obtained from Letterboxd via Kaggle, includes film titles, descriptions, and themes. After preprocessing and text normalization (tokenization, lemmatization, and stemming), the text data were transformed into numerical features using the TF-IDF method. Two modeling scenarios were applied: the first used default parameters, and the second employed GridSearchCV to find the optimal hyperparameter settings. Model performance was evaluated using accuracy, precision, recall, and F1-score. The results indicate that the optimized LR model achieved the highest accuracy (0.847), followed by the optimized MNB model (0.837). This study concludes that hyperparameter optimization substantially improves model performance and that LR outperforms MNB for multi-feature, text-based genre classification.
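The pipeline summarized above (merge text fields, TF-IDF, train a classifier, tune a hyperparameter, score the result) can be illustrated with a small dependency-free sketch. It rests on several assumptions that do not come from the study. The three fields are simply concatenated into one document. Lemmatization and stemming are skipped. The toy records are hypothetical. Tuning uses a single hold-out split rather than the k-fold cross-validation that GridSearchCV performs. Only the MNB half is shown, tuned over its smoothing parameter `alpha`. All helper names (`combine_fields`, `train_mnb`, and so on) are illustrative only.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and keep alphabetic runs; lemmatization and stemming are omitted here.
    return re.findall(r"[a-z]+", text.lower())

def combine_fields(film):
    # Assumed merge strategy: concatenate title, description and theme into one document.
    return tokenize(" ".join([film["title"], film["description"], film["theme"]]))

def fit_idf(docs):
    # Smoothed IDF in the form scikit-learn uses by default: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(tokens, idf):
    # Raw term counts times IDF, then L2-normalised; terms unseen in training are dropped.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_mnb(vectors, labels, alpha):
    # Multinomial NB on fractional TF-IDF weights with additive (Lidstone) smoothing alpha.
    classes = sorted(set(labels))
    vocab = {t for vec in vectors for t in vec}
    weight = {c: defaultdict(float) for c in classes}
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            weight[label][t] += w
    prior, loglik = {}, {}
    for c in classes:
        prior[c] = math.log(labels.count(c) / len(labels))
        total = sum(weight[c].values()) + alpha * len(vocab)
        loglik[c] = {t: math.log((weight[c][t] + alpha) / total) for t in vocab}
    return classes, prior, loglik

def predict_mnb(model, vec):
    # Pick the class with the highest log prior plus weighted log likelihood.
    classes, prior, loglik = model
    def score(c):
        return prior[c] + sum(w * loglik[c][t] for t, w in vec.items() if t in loglik[c])
    return max(classes, key=score)

def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores (precision and recall computed per class).
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical toy records; the real Letterboxd/Kaggle columns are not reproduced here.
train = [
    ({"title": "Space Raiders", "description": "laser battles in deep space", "theme": "space war"}, "sci-fi"),
    ({"title": "Robot Planet", "description": "robots and aliens invade planet", "theme": "aliens"}, "sci-fi"),
    ({"title": "Party Pranks", "description": "silly jokes at office party", "theme": "party laughs"}, "comedy"),
    ({"title": "Clown School", "description": "clowns juggle puppets for school", "theme": "puppets"}, "comedy"),
]
valid = [
    ({"title": "Star Siege", "description": "aliens invade deep space", "theme": "space war"}, "sci-fi"),
    ({"title": "Office Party", "description": "clowns juggle at office party", "theme": "laughs"}, "comedy"),
]

train_tokens = [combine_fields(film) for film, _ in train]
idf = fit_idf(train_tokens)  # IDF is fitted on training data only
X_train = [tfidf(tokens, idf) for tokens in train_tokens]
y_train = [label for _, label in train]
X_valid = [tfidf(combine_fields(film), idf) for film, _ in valid]
y_valid = [label for _, label in valid]

# Single hold-out split over a small alpha grid (GridSearchCV would use k-fold CV instead).
results = {}
for alpha in [0.1, 0.5, 1.0]:
    model = train_mnb(X_train, y_train, alpha)
    preds = [predict_mnb(model, vec) for vec in X_valid]
    accuracy = sum(p == t for p, t in zip(preds, y_valid)) / len(y_valid)
    results[alpha] = (accuracy, macro_f1(y_valid, preds))
# Ties keep the first alpha in the grid, because max() returns the first maximal key.
best_alpha = max(results, key=lambda a: results[a])
print(best_alpha, results[best_alpha])
```

In scikit-learn, the same idea would typically be a `Pipeline` of `TfidfVectorizer` and `MultinomialNB` (or `LogisticRegression`, tuned over its inverse regularization strength `C`) passed to `GridSearchCV` with a `param_grid` keyed by step name, for example `{"clf__alpha": [0.1, 0.5, 1.0]}`. The study's actual grids are not given in the abstract.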