AI-enhanced self-healing Kubernetes for scalable cloud operations

Nunavath, Veeresh (2025) AI-enhanced self-healing Kubernetes for scalable cloud operations. World Journal of Advanced Engineering Technology and Sciences, 16 (2). 021-029. ISSN 2582-8266

Abstract

As cloud-native systems grow more complex and dynamic, their infrastructure must be resilient and autonomous. Built-in self-healing is one of the features that have helped Kubernetes overtake alternatives and become the de facto industry standard for orchestrating containerized applications. Still, such features are reactive and limited in scope. Integrating Artificial Intelligence (AI) into Kubernetes lets traditional self-healing evolve into predictive, adaptive, and autonomous functionality. This article reviews the architectural foundations, AI methodologies, implementation strategies, and security considerations required to build AI-enabled self-healing Kubernetes systems in scalable cloud environments. Key focus areas are machine learning for anomaly detection and failure prediction, reinforcement learning to learn policies, and natural language processing for log analysis. Implementation practices for deploying custom controllers, sidecar agents, and digital twins are discussed together with their performance and scalability trade-offs. The second part covers security challenges: model integrity, API access control, defences against data poisoning, and privacy compliance. Emerging directions include cross-cluster AI orchestration, explainable AI (XAI), and standardized frameworks. The architecture is presented together with an analysis of the related literature. Together, these lessons outline a roadmap for AI-enabled self-healing Kubernetes architectures that can take cloud operations to the next level.

Item Type: Article
Official URL: https://doi.org/10.30574/wjaets.2025.16.2.1255
Uncontrolled Keywords: Kubernetes; Self-Healing Systems; Artificial Intelligence; Cloud-Native Infrastructure; Anomaly Detection; Reinforcement Learning; AI Security; Container Orchestration; Explainable AI; Autonomous Operations
Date Deposited: 15 Sep 2025 05:24
URI: https://eprint.scholarsrepository.com/id/eprint/6004