Synthetic data generation in healthcare: Using GANs to overcome data scarcity and bias in machine learning

Faheem, Muhammad and Iqbal, Aqib (2025) Synthetic data generation in healthcare: Using GANs to overcome data scarcity and bias in machine learning. International Journal of Science and Research Archive, 16 (1). pp. 628-639. ISSN 2582-8185

Abstract

The growing use of machine learning (ML) in healthcare is constrained by data scarcity, privacy regulations, fragmented data systems, and demographic imbalances. These limitations reduce model accuracy, hinder generalizability, and contribute to algorithmic bias, particularly affecting minority populations and underrepresented disease categories. Generative Adversarial Networks (GANs) have emerged as a promising solution by enabling the creation of synthetic datasets that preserve data utility while enhancing privacy and fairness. This paper explores the use of GAN-based synthetic data in addressing data limitations within healthcare ML pipelines. It examines key GAN architectures suited for structured clinical data, electronic health records (EHRs), and medical imaging, highlighting their training processes and privacy-preserving capabilities. Applications across clinical research, epidemiology, rare disease modeling, and privacy-conscious data sharing are reviewed. The paper further evaluates synthetic data quality using utility metrics, privacy risk assessments, and fidelity–privacy tradeoffs. While synthetic data offers transformative potential, challenges remain in GAN stability, ethical governance, and validation standards. Future directions include integrating federated learning, enhancing explainability, and advancing differential privacy to ensure ethical and inclusive AI development in healthcare.

Item Type: Article
Official URL: https://doi.org/10.30574/ijsra.2025.16.1.2022
Uncontrolled Keywords: Synthetic Data; GANs (Generative Adversarial Networks); Healthcare; Machine Learning; Data Scarcity
Date Deposited: 01 Sep 2025 12:14
URI: https://eprint.scholarsrepository.com/id/eprint/4403