Soppari, Kavitha and Vangapally, Bhanu and Sohail, Syed Sameer and Dubba, Harish (2025) Text to image generation using BERT and GAN. International Journal of Science and Research Archive, 14 (1). pp. 720-725. ISSN 2582-8185
IJSRA-2025-0137.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Generating images from text is a challenging task that combines natural language processing and computer vision. Existing generative adversarial network (GAN)-based models usually employ text encoders pre-trained on image-text pairs. However, these encoders often fail to capture the semantic complexity of text unseen during pre-training, making it difficult to produce images that accurately match the supplied descriptions. To address this problem, we present a novel text-to-image generation model built on BERT, a highly successful pre-trained language model in natural language processing. Fine-tuning BERT on a large text corpus allows it to encode rich textual information, improving its suitability for image generation tasks. Experiments on the CUB_200_2011 dataset show that our approach outperforms baseline models on both qualitative and quantitative measures.
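The pipeline the abstract describes — a fine-tuned BERT text encoder whose sentence embedding conditions a GAN generator — can be sketched as follows. This is a minimal illustration under assumed dimensions, not the authors' implementation: the toy `TextEncoder` stands in for a fine-tuned BERT (which in practice would come from a library such as `transformers`), and the generator architecture is a generic conditional DCGAN-style upsampler.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Stand-in for a fine-tuned BERT: maps token ids to a sentence embedding.
    In the paper's setting this role is played by BERT's pooled output."""
    def __init__(self, vocab_size=1000, embed_dim=128, sent_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, sent_dim)

    def forward(self, token_ids):
        # Mean-pool token embeddings into a single sentence vector.
        return self.proj(self.embed(token_ids).mean(dim=1))

class Generator(nn.Module):
    """Conditional generator: concatenates noise with the text embedding
    and upsamples to a 64x64 RGB image."""
    def __init__(self, noise_dim=100, sent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + sent_dim, 512 * 4 * 4),
            nn.Unflatten(1, (512, 4, 4)),
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, noise, sent_emb):
        return self.net(torch.cat([noise, sent_emb], dim=1))

encoder = TextEncoder()
generator = Generator()
tokens = torch.randint(0, 1000, (2, 16))  # batch of 2 dummy tokenized captions
noise = torch.randn(2, 100)
images = generator(noise, encoder(tokens))
print(images.shape)  # torch.Size([2, 3, 64, 64])
```

Training would pair this generator with a discriminator that scores image–text pairs, so the adversarial loss pushes generated images to match their captions.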
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/ijsra.2025.14.1.0137 |
Uncontrolled Keywords: | Text to image generation; Multimodal data; BERT; GAN; High quality |
Depositing User: | Editor IJSRA |
Date Deposited: | 13 Jul 2025 14:19 |
URI: | https://eprint.scholarsrepository.com/id/eprint/635 |