AI/ML optimized lakehouse architecture: A Comprehensive framework for modern data science

Aileni, Anvesh Reddy (2025) AI/ML optimized lakehouse architecture: A Comprehensive framework for modern data science. World Journal of Advanced Engineering Technology and Sciences, 15 (2). pp. 2099-2104. ISSN 2582-8266

WJAETS-2025-0754.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

The AI/ML optimized lakehouse architecture represents a transformative paradigm in modern data management, addressing the critical challenges posed by exponential data growth across enterprises. This comprehensive framework integrates the flexibility of data lakes with the performance and reliability of data warehouses, creating a unified platform that eliminates traditional system boundaries and redundancies. The architecture leverages open table formats such as Delta Lake, Apache Iceberg, and Apache Hudi to introduce enterprise-grade features including ACID transactions, schema evolution, and time-travel capabilities to previously unstructured data repositories. Through detailed analysis of implementation metrics across diverse industries, the framework demonstrates substantial improvements in query performance, data processing efficiency, model development cycles, and operational costs. ML-centric data pipelines built on this foundation show remarkable advancements in feature engineering capabilities, while integrated feature stores dramatically reduce redundancy and increase model deployment velocity. The lakehouse approach further transforms the machine learning lifecycle through streamlined experimentation, deployment, and monitoring processes, enabling organizations to achieve significantly higher model success rates and faster time-to-production. For enterprises seeking to harness the full potential of their data assets for advanced analytics and artificial intelligence applications, the lakehouse architecture provides a future-proof foundation that scales effectively with growing data volumes while maintaining necessary governance standards.
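
The three table-format features named in the abstract (atomic commits, schema evolution, and time travel) can be illustrated with a toy, in-memory model. The sketch below is only an assumption-laden illustration: the class ToyVersionedTable and its methods are hypothetical names invented here, and it does not reproduce the API or storage layout of Delta Lake, Apache Iceberg, or Apache Hudi, nor any implementation from the article. It stores a full snapshot per commit for simplicity, whereas real table formats track changes through metadata and transaction logs.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class ToyVersionedTable:
    """Hypothetical toy table log; NOT the Delta Lake, Iceberg, or Hudi API."""

    columns: list[str]
    # One entry per committed version: (schema, full row snapshot).
    _versions: list[tuple[list[str], list[dict[str, Any]]]] = field(
        default_factory=list, init=False
    )

    def __post_init__(self) -> None:
        # Version 0 is the empty table with the initial schema.
        self._versions.append((list(self.columns), []))

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def commit(
        self,
        new_rows: Iterable[dict[str, Any]],
        add_columns: Iterable[str] = (),
    ) -> int:
        """Append rows as one all-or-nothing commit; return the new version."""
        cols, rows = self._versions[-1]
        # Schema evolution: new columns are appended to the existing schema.
        new_cols = cols + [c for c in add_columns if c not in cols]
        batch = list(new_rows)
        # Validate the whole batch before publishing anything, so a bad
        # batch leaves the table at its previous version.
        for row in batch:
            unknown = set(row) - set(new_cols)
            if unknown:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
        # Older rows get None for columns that did not exist when written.
        snapshot = [{c: r.get(c) for c in new_cols} for r in rows + batch]
        self._versions.append((new_cols, snapshot))
        return self.latest_version

    def read(self, version: Optional[int] = None) -> list[dict[str, Any]]:
        """Time travel: read the table as of a given version (default latest)."""
        index = self.latest_version if version is None else version
        _, rows = self._versions[index]
        return [dict(r) for r in rows]


table = ToyVersionedTable(["id", "amount"])
v1 = table.commit([{"id": 1, "amount": 10.0}])
v2 = table.commit(
    [{"id": 2, "amount": 5.0, "region": "EU"}], add_columns=["region"]
)
```

Reading with table.read(v1) returns the single row as it existed before the region column was added, while table.read() returns both rows under the evolved schema. A commit containing an undeclared column raises ValueError and leaves the latest version unchanged, which is the all-or-nothing behaviour the toy is meant to show.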

Item Type: Article
Official URL: https://doi.org/10.30574/wjaets.2025.15.2.0754
Uncontrolled Keywords: Lakehouse architecture; Machine learning infrastructure; Feature engineering; Data pipelines; Model lifecycle management
Depositing User: Editor Engineering Section
Date Deposited: 04 Aug 2025 16:39
URI: https://eprint.scholarsrepository.com/id/eprint/4003