Vanaja, V. and Tunge, Venkatesham and Kanagala, Nithin Kumar and Bhumandla, Harsha Vardhan and Kana, Shruti (2025) AI powered voice synthesizer. World Journal of Advanced Engineering Technology and Sciences, 15 (2). pp. 663-671. ISSN 2582-8266
WJAETS-2025-0590.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
The AI Voice Synthesizer is a real-time, multilingual voice cloning system that uses state-of-the-art deep learning techniques to generate personalized speech with high naturalness and accuracy. It is built on Coqui.ai's open-source XTTSv2 framework. Users can synthesize speech in their own voice, or in any sampled voice, from just a few seconds of reference audio. The system then uses this voice profile to generate natural-sounding speech in multiple languages, including languages the original speaker has never spoken. This is a significant step forward for synthetic speech and human-computer interaction. Traditional text-to-speech (TTS) systems often suffer from a robotic tone, little personalization, limited language support, and high latency. In contrast, this project provides a lightweight, low-latency (<200 ms), user-friendly platform that supports cross-lingual, few-shot voice cloning. The system is modular and consists of four independent components: speaker embedding extraction, multilingual text processing, real-time speech synthesis, and a web-based front end. These components are integrated into a seamless workflow that is intuitive and accessible for non-technical users while remaining scalable and customizable for developers and researchers.
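The modular workflow described in the abstract can be illustrated with a minimal Python sketch. It assumes hypothetical function names (`extract_speaker_embedding`, `preprocess_text`, `synthesize`, `run_pipeline`) and replaces each neural component with a toy stand-in. It therefore shows only how independent stages might pass data to one another and how end-to-end latency could be checked against the stated 200 ms target. It is not the authors' implementation and not the XTTSv2 API.

```python
import time
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch: every name below is illustrative, not from the paper
# or from Coqui's XTTSv2 code. Each neural stage is replaced by a toy stand-in.

LATENCY_BUDGET_MS = 200.0  # latency target quoted in the abstract


@dataclass
class SpeakerProfile:
    embedding: List[float]  # fixed-size vector summarizing the reference voice


def extract_speaker_embedding(samples: List[float], dim: int = 4) -> SpeakerProfile:
    """Stage 1 (toy): average `dim` equal chunks of the reference audio.
    A real system would run a neural speaker encoder here."""
    if len(samples) < dim:
        raise ValueError("reference audio must contain at least `dim` samples")
    size = len(samples) // dim
    return SpeakerProfile([sum(samples[i * size:(i + 1) * size]) / size for i in range(dim)])


def preprocess_text(text: str, language: str) -> List[str]:
    """Stage 2 (toy): lowercase, split on whitespace, tag each token with its language."""
    return [f"{language}:{token}" for token in text.lower().split()]


def synthesize(tokens: List[str], profile: SpeakerProfile, frames_per_token: int = 3) -> List[float]:
    """Stage 3 (toy): emit a constant 'audio frame' per token, scaled by the voice profile."""
    level = sum(profile.embedding) / len(profile.embedding)
    return [level] * (len(tokens) * frames_per_token)


def run_pipeline(reference: List[float], text: str, language: str) -> Tuple[List[float], float]:
    """Chain the independent stages and report end-to-end latency in milliseconds."""
    start = time.perf_counter()
    profile = extract_speaker_embedding(reference)
    tokens = preprocess_text(text, language)
    audio = synthesize(tokens, profile)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return audio, elapsed_ms


if __name__ == "__main__":
    # Stage 4 (the web front end) would call run_pipeline once per request.
    reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    audio, ms = run_pipeline(reference, "Hola mundo", "es")
    print(len(audio), ms < LATENCY_BUDGET_MS)  # 2 tokens x 3 frames = 6 samples
```

Each stage is a plain function with a narrow input and output. Any one of them, such as the toy embedding, could therefore be swapped for a real model without touching the others. In Coqui's `TTS` Python package, XTTS-v2 is typically driven through a single high-level call, `tts_to_file`. That call takes the text, a reference clip via `speaker_wav`, and a `language` code. Consult the package documentation for the exact signature.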
| Field | Value |
|---|---|
| Item Type | Article |
| Official URL | https://doi.org/10.30574/wjaets.2025.15.2.0590 |
| Uncontrolled Keywords | Real-Time Speech Synthesis; Few-Shot Text-To-Speech; Multilingual TTS; Coqui.AI Speaker Embedding; Personalized Synthetic Voice; Real-Time Voice Cloning System |
| Depositing User | Editor Engineering Section |
| Date Deposited | 04 Aug 2025 16:25 |
| URI | https://eprint.scholarsrepository.com/id/eprint/3553 |