Vietnamese Voice Classification based on Deep Learning Approach

Authors

  • Hung Bui Thanh Lecturer

Keywords:

Voice classification, Mel Spectrogram feature, Deep Learning, Convolutional Neural Network

Abstract

In the digital era, voice classification plays a meaningful role in many aspects of life. In this research, we propose a method for predicting the gender and region of a Vietnamese speaker from the sound spectrum using a deep learning approach. We first preprocess the raw dataset to bring all audio to the same frequency and time standard. We then extract Mel Spectrogram features and feed them into a deep learning model, a Convolutional Neural Network, for training and optimization. Our experiments on 37 samples taken from the VIVOS corpus audio dataset achieve an accuracy of 86.48% for predicting gender and 51.45% for predicting the region of the voice.
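
As an illustration of the preprocessing and feature ideas described in the abstract, the following minimal sketch shows two building blocks: forcing every clip to a fixed number of samples (the "time standard") and mapping frequencies onto the mel scale that a Mel Spectrogram is built on. It is not the authors' implementation. The sample rate, clip length, number of mel bands, and all function names are assumptions chosen for illustration, and the mel formula is the common HTK-style variant rather than one confirmed by the paper.

```python
import math


def fix_length(samples, target_len, pad_value=0.0):
    """Trim or right-pad a sequence of samples to exactly target_len items."""
    clipped = list(samples[:target_len])
    return clipped + [pad_value] * (target_len - len(clipped))


def hz_to_mel(freq_hz):
    """Convert Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 band-edge frequencies in Hz, evenly spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Hypothetical settings: 16 kHz audio, 3-second clips, 40 mel bands.
SAMPLE_RATE = 16000
CLIP_SECONDS = 3
N_MELS = 40

# A very short toy signal is padded with zeros up to the fixed clip length.
clip = fix_length([0.1, -0.2, 0.3], SAMPLE_RATE * CLIP_SECONDS)

# Band edges span 0 Hz up to the Nyquist frequency (half the sample rate).
edges = mel_band_edges(N_MELS, 0.0, SAMPLE_RATE / 2)
```

Because the edges are evenly spaced in mels rather than in Hz, the bands are narrow at low frequencies and wide at high frequencies, which is what distinguishes a Mel Spectrogram from a linear-frequency spectrogram. A complete pipeline would also need resampling, a short-time Fourier transform, and triangular filters placed on these edges before the result is passed to a CNN.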

Published

2022-01-29

How to Cite

Bui Thanh, H. (2022). Vietnamese Voice Classification based on Deep Learning Approach. International Journal of Machine Learning and Networked Collaborative Engineering, 4(4), 171–180. Retrieved from https://mlnce.net/index.php/Home/article/view/171