Poverty Level Prediction Based on E-Commerce Data Using Naïve Bayes Algorithm and Similarity-Based Feature Selection
The poverty rate is an important measure for any country because it indicates how well the economy is developing and how evenly economic prosperity is distributed among citizens. The Central Statistics Agency (BPS) measures the poverty rate in Indonesia using the basic needs approach, which is based on the ability to meet basic needs. Under this approach, poverty is measured through spending and is defined as the economic incapacity to satisfy food and non-food requirements. Thus, the poor are individuals whose monthly per capita spending is below the poverty threshold. In this study, we propose a machine learning method that uses Naïve Bayes with similarity-based feature selection on e-commerce data to predict the poverty level in Indonesia. The method is intended to complement the costly surveys and censuses conducted by BPS. Our experiments show little correspondence between the classifier's predictions and the actual poverty levels reported by BPS. A small number of features does not necessarily result in poor accuracy; conversely, using many features does not guarantee high accuracy.
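The pipeline described above (rank features with a similarity-based criterion, keep the top k, then classify with Naive Bayes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which similarity-based criterion was used, so the Fisher score stands in as one common example; the Gaussian likelihood, the function names, and the toy data (a made-up activity feature and a noise feature with "high"/"low" poverty labels) are hypothetical and are not the study's data or implementation.

```python
import math
from collections import defaultdict


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter.

    Assumption: used here as one example of a similarity-based criterion.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for rows in by_class.values():
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mean - overall_mean) ** 2
            within += len(vals) * var
        scores.append(between / (within + 1e-12))  # guard against zero scatter
    return scores


def select_top_k(X, y, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, indices):
    """Keep only the selected feature columns."""
    return [[row[j] for j in indices] for row in X]


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            stats = []
            for j in range(len(X[0])):
                vals = [r[j] for r in rows]
                mean = sum(vals) / len(vals)
                var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
                stats.append((mean, var))
            self.params[label] = (len(rows) / len(X), stats)
        return self

    def predict_one(self, row):
        best_label, best_logp = None, -math.inf
        for label, (prior, stats) in self.params.items():
            # log prior plus the sum of per-feature Gaussian log-likelihoods
            logp = math.log(prior)
            for x, (mean, var) in zip(row, stats):
                logp += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
X_toy = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.2, 3.5], [2.9, 4.0]]
y_toy = ["high", "high", "high", "low", "low", "low"]

selected = select_top_k(X_toy, y_toy, k=1)
model = GaussianNaiveBayes().fit(project(X_toy, selected), y_toy)
print(selected, model.predict_one([1.1]), model.predict_one([3.1]))
```

On this toy data a single well-chosen feature is enough to separate the two classes, which mirrors the abstract's remark that a small feature set need not hurt accuracy; it says nothing about how the method behaves on the actual e-commerce data.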