Alva, Lingareddy (2025) Generative AI for self-optimizing and autonomous data pipelines. World Journal of Advanced Research and Reviews, 26 (2). pp. 1071-1079. ISSN 2581-9615
WJARR-2025-1667.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Generative AI technologies offer transformative potential for addressing fundamental challenges in data pipeline management across enterprise environments. This comprehensive exploration details how artificial intelligence can create self-optimizing, autonomous data pipelines capable of adapting to evolving data ecosystems without human intervention. The integration of machine learning techniques—including anomaly detection, reinforcement learning, and large language models—enables unprecedented capabilities in pipeline orchestration, from predictive failure prevention to dynamic resource allocation. These intelligent systems demonstrate substantial advancements in multiple dimensions: dramatically reducing processing times, preventing failures before occurrence, optimizing resource utilization, automating schema evolution, and significantly lowering operational costs. By leveraging established platforms like Apache Airflow, Apache Spark, and Kubernetes while introducing AI-powered middleware and Databricks' Generative AI capabilities (including Lakehouse IQ, Foundation Models, RAG pipelines, Custom AI Agents, and Auto-Documentation tools), this architecture enables incremental adoption pathways suitable for various organizational maturity levels. Despite remarkable progress, several considerations remain, including initial training requirements, integration with legacy infrastructure, explainability concerns in regulated sectors, and governance frameworks for autonomous systems. Future directions point toward streaming data optimization, federated learning approaches that preserve privacy, specialized language models for intuitive pipeline management, and hardware-aware optimizations for specialized computing environments. The convergence of data engineering with artificial intelligence represents a fundamental shift toward truly adaptive data infrastructure that minimizes operational burden while maximizing business value.
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/wjarr.2025.26.2.1667 |
Uncontrolled Keywords: | Generative AI; Autonomous data pipelines; Failure prediction; Resource optimization; Schema evolution |
Depositing User: | Editor WJARR |
Date Deposited: | 20 Aug 2025 10:46 |
URI: | https://eprint.scholarsrepository.com/id/eprint/2753 |