Jain, Souratn and Das, Jyotipriya (2025) Integrating data engineering and MLOps for scalable and resilient machine learning pipelines: frameworks, challenges, and future trends. World Journal of Advanced Engineering Technology and Sciences, 14 (1). pp. 241-253. ISSN 2582-8266
WJAETS-2025-0020.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
The combination of Data Engineering and MLOps has become a foundational practice for constructing efficient and secure ML pipelines. Data Engineering provides the solutions for handling data through ingestion, transformation, and storage, while MLOps provides the solutions for handling models through deployment, monitoring, and management. Together, these fields address the growing challenges of processing massive amounts of data and of training and deploying ML models for real-time use. This paper discusses the possibilities and trends of integrating data engineering and MLOps, examining the architectural patterns and toolchains most commonly used to optimize machine learning pipelines. Key issues addressed include data management problems arising from tools with limited data-processing functionality; slowdowns or interruptions in automated CI/CD pipelines; and data-use licensing, where ethical questions of data utilization and data fairness remain disputed. Non-trivial techniques that enable a scalable and robust application architecture, including pipeline design, service redundancy, and automatic coordination, are discussed along with example applications. Novel approaches to MLOps, namely serverless architectures, federated learning, and AI toolkits for managing pipelines, are described to illustrate likely future developments. As a synthesis of current literature and best practices in the field of ML, this paper offers practical advice on constructing resilient, high-performing systems. This work aims to extend the existing machine learning literature and to serve as a best-practice guide for organizations seeking operational effectiveness and progress in this new era of data-driven decision-making.
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/wjaets.2025.14.1.0020 |
Uncontrolled Keywords: | Data Engineering; MLOps; Machine Learning Pipelines; Scalability and Resilience; AI-driven Automation |
Depositing User: | Editor Engineering Section |
Date Deposited: | 27 Jul 2025 14:56 |
URI: | https://eprint.scholarsrepository.com/id/eprint/2319 |