Pasupuleti, Naveen Srikanth (2025) Transforming healthcare through cloud-native machine learning architecture: A case study in AWS, Spark, and Kubernetes Implementation. World Journal of Advanced Research and Reviews, 26 (2). pp. 1622-1631. ISSN 2581-9615
WJARR-2025-1649.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
This article examines a case study in healthcare data infrastructure in which a data engineer transformed operations by implementing an integrated technology stack with advanced machine learning capabilities. To process diverse, high-volume patient data, the engineer architected a solution that uses AWS services, including S3, Redshift, and Lambda, to create a cloud-based data lake optimized for AI workloads. This foundation was augmented with Apache Spark for distributed processing, MLlib for scalable machine learning, Hadoop clusters for specialized workloads, and Kubernetes for container orchestration, yielding a flexible, resilient system capable of supporting sophisticated predictive models. The implementation featured automated ETL processes within a robust data pipeline, alongside purpose-built feature stores and model-serving infrastructure. A combination of SQL and NoSQL databases provided storage suited to a range of machine learning workloads, from natural language processing of clinical notes to computer vision for medical imaging. Despite obstacles including data inconsistency and latency issues, the solution delivered substantial improvements in operational efficiency and clinical outcomes through AI-powered predictive capabilities, demonstrating the potential of modern data engineering and machine learning approaches in healthcare settings.
| Item Type: | Article |
|---|---|
| Official URL: | https://doi.org/10.30574/wjarr.2025.26.2.1649 |
| Uncontrolled Keywords: | Data Lake Architecture; Distributed Computing; Container Orchestration; ETL Automation; Healthcare Analytics |
| Depositing User: | Editor WJARR |
| Date Deposited: | 20 Aug 2025 10:53 |
| URI: | https://eprint.scholarsrepository.com/id/eprint/2926 |