IMPLEMENTATION OF CONVOLUTIONAL NEURAL NETWORK (CNN) ALGORITHM IN MOBILE APPLICATION-BASED VOICE EMOTION CLASSIFICATION SYSTEM

Authors

  • Naufal Ammar Raihan, Universitas Sains Al-Qur'an
  • Muhamad Fuat Asnawi, Universitas Sains Al-Qur'an
  • Iman Ahmad Ihsannuddin, Universitas Sains Al-Qur'an
  • Nahar Mardiyantoro, Universitas Sains Al-Qur'an
  • Muhammad Alif Muwafiq Baihaqy, Universitas Sains Al-Qur'an

DOI:

https://doi.org/10.58641/cest.v4i2.211

Keywords:

CNN, voice emotion classification, Mel-spectrogram, TensorFlow Lite, Android

Abstract

The ability of machines to recognize emotions from voice is known as Speech Emotion Recognition (SER). This study developed a voice emotion classification system using a Convolutional Neural Network (CNN) and implemented it as an Android mobile application. The main problem addressed is how to recognize human emotions from voice signals accurately, efficiently, and in real time on mobile devices. Training was conducted in two stages: pre-training on the RAVDESS dataset and fine-tuning on the IndoWaveSentiment dataset. Audio data was converted into 128×128×1 Mel-spectrograms as input to the CNN. The CNN model consists of three convolution and pooling blocks, followed by dense and softmax layers. After training, the model was converted to TensorFlow Lite format and integrated with the Android application through a client-server architecture using Flask. The system also provides a SQLite-based history of classification results. The system recognizes four emotions: neutral, happy, disappointed, and surprised. Test results showed 96% accuracy on external test data and 55% on live recorded audio, an unweighted average of 75.5%. This indicates that the model performs very well under structured conditions but still needs improvement for real-world input.
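As a rough illustration of the architecture and figures summarized in the abstract, the sketch below traces how a 128×128×1 Mel-spectrogram would shrink through three convolution and pooling blocks, applies a softmax over the four emotion classes, and reproduces the reported average accuracy. The filter counts (32, 64, 128), the 'same' padding, the 2×2 pooling, the class order, and all function names are assumptions made for illustration; the abstract does not report them, and this is not the authors' implementation.

```python
import math

# Class order is an assumption; the abstract only names the four emotions.
EMOTIONS = ("neutral", "happy", "disappointed", "surprised")


def conv_pool_shape(height, width, filters, pool=2):
    """Output shape of one block: 'same'-padded convolution, then pooling.

    A stride-1 'same' convolution keeps height and width unchanged, so only
    the non-overlapping pooling step reduces the spatial size.
    """
    return height // pool, width // pool, filters


def trace_shapes(input_shape=(128, 128, 1), filters=(32, 64, 128)):
    """List the tensor shape at the input and after each conv/pool block."""
    height, width, channels = input_shape
    shapes = [(height, width, channels)]
    for n_filters in filters:  # hypothetical filter counts
        height, width, channels = conv_pool_shape(height, width, n_filters)
        shapes.append((height, width, channels))
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable emotion and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


shapes = trace_shapes()
# Number of values passed to the dense layer after flattening.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

# Unweighted mean of the two accuracies reported in the abstract.
mean_accuracy = (96 + 55) / 2

if __name__ == "__main__":
    for shape in shapes:
        print(shape)
    print("flattened features:", flattened)
    print("mean accuracy (%):", mean_accuracy)
    print(predict_label([0.1, 2.0, -1.0, 0.3]))
```

Under these assumed settings each block halves the spatial resolution, so the dense layer would receive a 16×16×128 feature map. The 75.5% figure is the plain mean of the two test settings, which weights the live-recording result equally with the external test set regardless of how many samples each contained.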


Published

2026-04-30

Section

Articles