PAULRAJ, NISHANTH JOSEPH (2025) ML-driven data engineering pipeline for health informatics. World Journal of Advanced Engineering Technology and Sciences, 15 (2). pp. 765-773. ISSN 2582-8266
WJAETS-2025-0629.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
This article presents a comprehensive framework for implementing machine learning-driven data engineering pipelines in healthcare informatics. Healthcare data presents unique challenges, including high dimensionality, heterogeneity across sources, missing values, temporal dependencies, and strict privacy requirements. To address these challenges, we propose a four-layer architecture comprising data ingestion, data processing, ML modeling, and model management components. The pipeline uses Apache Spark and Delta Lake for robust data processing, modern ML frameworks for predictive modeling, and MLflow for model lifecycle management. We demonstrate the practical application of this architecture through a sepsis risk prediction use case, highlighting how temporal patterns in clinical data can support early intervention. The article also explores deep learning approaches for genomic data analysis and discusses critical implementation challenges, including data privacy, class imbalance, model explainability, and model drift. Throughout, we emphasize best practices that balance technical performance with clinical utility and regulatory compliance, providing a roadmap for healthcare organizations seeking to implement scalable ML solutions.
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/wjaets.2025.15.2.0629 |
Uncontrolled Keywords: | Healthcare Data Engineering; Machine Learning Pipelines; Clinical Predictive Modeling; Model Lifecycle Management; Sepsis Prediction |
Depositing User: | Editor Engineering Section |
Date Deposited: | 04 Aug 2025 16:25 |
URI: | https://eprint.scholarsrepository.com/id/eprint/3582 |