Gottepu, Anil Kumar (2025) Capturing and mitigating reliability issues in cloud computing: A comprehensive approach. World Journal of Advanced Engineering Technology and Sciences, 15 (3). pp. 1197-1206. ISSN 2582-8266
WJAETS-2025-1040.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
This article presents a comprehensive framework for addressing reliability challenges in modern cloud computing environments. It traces the evolution from traditional redundancy-based approaches to the predictive analytics and integrated security postures needed to maintain high availability in distributed systems. It examines how real-time monitoring, combined with machine learning techniques such as Modified Sequential Minimal Optimization and Weibull Distribution Analysis, can anticipate and prevent service disruptions before they affect users. It also analyzes architectural choices that minimize complexity and decouple components to contain failure propagation, and evaluates how service-level agreements must evolve to reflect multidimensional reliability requirements. An enterprise-scale case study demonstrates the practical implementation of these principles and their transformative impact on both technical metrics and business outcomes. The article highlights emerging trends in cloud reliability engineering, including observability platforms, AIOps capabilities, and reliability-as-code approaches, and identifies research gaps and future opportunities. By integrating technical, organizational, and economic perspectives into a holistic reliability strategy suited to increasingly complex cloud deployments, it contributes to the growing field of cloud resilience.
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/wjaets.2025.15.3.1040 |
Uncontrolled Keywords: | Cloud Reliability Engineering; Predictive Failure Analytics; Observability; Service Level Objectives; Chaos Engineering |
Depositing User: | Editor Engineering Section |
Date Deposited: | 16 Aug 2025 13:10 |
URI: | https://eprint.scholarsrepository.com/id/eprint/4683 |