AI-driven anomaly detection and root cause analysis: Using machine learning on logs, metrics, and traces to detect subtle performance anomalies, security threats, or failures in complex cloud environments

Guntupalli, Raviteja (2025) AI-driven anomaly detection and root cause analysis: Using machine learning on logs, metrics, and traces to detect subtle performance anomalies, security threats, or failures in complex cloud environments. World Journal of Advanced Research and Reviews, 26 (2). pp. 874-879. ISSN 2581-9615

Abstract

The growing complexity of modern cloud environments, with their dense service dependencies and dynamic scaling requirements, makes rapid anomaly detection and root cause analysis (RCA) both critical and difficult. Traditional rule-based monitoring frameworks cannot detect the subtle or novel anomalies that precede system outages or security breaches. This paper examines how artificial intelligence (AI), machine learning (ML), and deep learning, applied to logs, metrics, and traces, can automate anomaly detection and RCA in cloud-native platforms. It reviews the use of supervised, unsupervised, and reinforcement learning methods on diverse telemetry data for real-time detection of performance degradation, system errors, and anomalous usage patterns. Such systems can correlate incidents across distributed systems, pinpoint underlying causes, and recommend solutions faster than human operators can. The operational effects of these techniques are illustrated through real-world applications at Adobe, Uber, Zalando, and LinkedIn. The paper also details the ethical and technical challenges facing automated RCA systems, including model drift, the interpretability of complex models, and observability gaps. As cloud systems continue to expand, AI-driven anomaly detection becomes essential for maintaining resilience, optimizing performance, and strengthening cyber defense in both multi-cloud and hybrid cloud systems.

Item Type:	Article
Official URL:	https://doi.org/10.30574/wjarr.2025.26.2.1521
Uncontrolled Keywords:	Cloud monitoring; Anomaly detection; Root cause analysis; Machine learning; Deep learning; Observability; Logs; Metrics; Traces; AI operations; Security threats; Cloud resilience
Date Deposited:	27 Jul 2025 16:42
Related URLs:	https://journalwjarr.com/node/1560 https://doi.org/10.30574/wjarr.2025.26.2... https://journalwjarr.com/sites/default/f...
URI:	https://eprint.scholarsrepository.com/id/eprint/2681
