Comparison of Feature Selection Based on Computation Time and Classification Accuracy Using Support Vector Machine

  • Salmun K Nasib, Statistics Study Program, Gorontalo State University, Indonesia
  • Fadilah Istiqomah Pammus, Statistics Study Program, Gorontalo State University, Indonesia
  • Nurwan, Mathematics Study Program, Gorontalo State University, Indonesia
  • La Ode Nashar, Statistics Study Program, Gorontalo State University, Indonesia
Keywords: Sentiment Classification; Feature Selection; Chi-square; Mutual Information; Support Vector Machine

Abstract

The goal of this research is to compare Chi-Square feature selection with Mutual Information feature selection in terms of computation time and classification accuracy. In this research, public comments on Twitter are classified into positive, negative, and neutral sentiments using the Support Vector Machine (SVM) method. Sentiment classification has the drawback of involving a large number of features, so feature selection is needed to optimize classification performance. Chi-Square and Mutual Information are both feature selection methods that can improve the accuracy of sentiment classification. The Twitter data were collected with Python using an IDE application. The results show that sentiment classification with Chi-Square feature selection requires a computation time of 0.4375 seconds and achieves an accuracy of 78%, while sentiment classification with Mutual Information feature selection achieves an accuracy of 80% but requires a computation time of 252.75 seconds. It is therefore concluded that, in terms of computation time, Chi-Square feature selection is superior to Mutual Information feature selection, whereas in terms of classification accuracy, Mutual Information feature selection is more accurate than Chi-Square feature selection. Further research is recommended to use Mutual Information feature selection to obtain high accuracy in sentiment classification.
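The abstract contrasts two filter criteria for choosing which terms are passed to the SVM. As a minimal, stdlib-only sketch (not the authors' code), the snippet below scores each term with the textbook chi-square and mutual-information formulas over a 2x2 table of term presence versus class membership, counted one-vs-rest for the three sentiment classes. The toy tweets, the helper names (`contingency`, `select_top_k`), and the choice to score a term by its maximum value across classes are assumptions for illustration; the SVM training step itself is omitted.

```python
"""Hypothetical sketch: rank terms by chi-square and by mutual information
(textbook 2x2 formulas on term presence), then keep the top-k terms."""
import math
import time


def contingency(docs, labels, term, cls):
    """Counts (A, B, C, D) for one term and one class (one-vs-rest):
    A = term present & in class, B = term present & not in class,
    C = term absent & in class,  D = term absent & not in class."""
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        present, in_cls = term in tokens, y == cls
        if present and in_cls:
            a += 1
        elif present:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def mutual_information(a, b, c, d):
    """Mutual information (bits) between term presence and class membership."""
    n = a + b + c + d
    mi = 0.0
    # Each cell: (joint count, term-presence marginal, class marginal).
    for n_tc, n_t, n_c in ((a, a + b, a + c), (b, a + b, b + d),
                           (c, c + d, a + c), (d, c + d, b + d)):
        if n_tc:  # cells with zero count contribute 0 (0 * log 0 := 0)
            mi += n_tc / n * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_top_k(docs, labels, k, scorer):
    """Score each term by its best one-vs-rest class score (an assumption);
    return the top-k terms, all scores, and the elapsed seconds."""
    start = time.perf_counter()
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scores = {t: max(scorer(*contingency(docs, labels, t, c)) for c in classes)
              for t in vocab}
    top = sorted(vocab, key=lambda t: (-scores[t], t))[:k]  # ties: alphabetical
    return top, scores, time.perf_counter() - start


# Hypothetical, already-preprocessed tweets (sets of tokens) and their labels.
docs = [{"love", "great", "service"}, {"great", "fast"},
        {"bad", "slow", "service"}, {"bad", "rude"},
        {"ok", "service"}, {"ok", "update"}]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

for name, scorer in (("chi-square", chi_square),
                     ("mutual information", mutual_information)):
    top, scores, seconds = select_top_k(docs, labels, 3, scorer)
    print(f"{name}: top terms {top} in {seconds:.6f} s")
```

On this toy data both criteria keep `bad`, `great`, and `ok`, while `service`, which appears once in every class, scores zero under both. Because both scorers here work from exact counts, they run in comparable time, so the sketch is not expected to reproduce the timing gap reported in the abstract. An off-the-shelf route is scikit-learn's `SelectKBest` with `chi2` or `mutual_info_classif` followed by an SVM such as `LinearSVC`, although those scorers compute related statistics that differ in detail from the formulas above.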

Published: 2023-04-18
How to Cite
Nasib, S. K., Pammus, F. I., Nurwan, & Nashar, L. O. (2023). Comparison of Feature Selection Based on Computation Time and Classification Accuracy Using Support Vector Machine. Indonesian Journal of Applied Research (IJAR), 4(1), 63-74. https://doi.org/10.30997/ijar.v4i1.252