A study on the application of deep learning in Vietnamese speech recognition

Nguyen, Van Khoi (2025) A study on the application of deep learning in Vietnamese speech recognition. World Journal of Advanced Engineering Technology and Sciences, 15 (2). pp. 2894-2898. ISSN 2582-8266

WJAETS-2025-0877.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Speech recognition has become increasingly important in various real-world applications. However, Vietnamese presents unique linguistic challenges such as tones, syllabic structures, and complex morphology, which make speech recognition for this language significantly different from that for languages like English. In this paper, we propose a deep learning approach that combines a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network to recognize Vietnamese speech using the VIVOS dataset. The CNN component extracts spatial features from audio spectrograms, while the BiLSTM captures temporal dependencies in the speech signal in both the forward and backward directions. Experimental results show that the proposed CNN-BiLSTM model achieves a Word Error Rate (WER) of 14.7%, which the authors describe as competitive. These results highlight the potential of deep learning techniques for recognizing tonal languages such as Vietnamese.
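
The abstract reports its result as a Word Error Rate. As a reminder of what that metric measures, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words needed to turn the reference transcript into the recognizer output, and N is the number of words in the reference. The following is a minimal sketch of that standard definition, not the paper's evaluation code: the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration. With whitespace tokenization, each Vietnamese token is a written syllable, which may or may not match the unit the authors scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance.

    Tokenization by whitespace is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from VIVOS.
    reference = "tôi đi học"
    hypothesis = "tôi đi họp"
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

In the usage example, one of the three reference tokens is substituted, so the score is one third. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.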

Item Type: Article
Official URL: https://doi.org/10.30574/wjaets.2025.15.2.0877
Uncontrolled Keywords: Speech Recognition; Vietnamese; VIVOS; CNN; BiLSTM
Depositing User: Editor Engineering Section
Date Deposited: 16 Aug 2025 12:37
URI: https://eprint.scholarsrepository.com/id/eprint/4254