Intelligent fault detection in snowflake-based big data pipelines using federated machine learning

Goli, Harsha Vardhan Reddy (2025) Intelligent fault detection in snowflake-based big data pipelines using federated machine learning. Global Journal of Engineering and Technology Advances, 23 (2). pp. 215-221. ISSN 2582-5003

GJETA-2025-0163.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

This article introduces a federated machine learning (FML) framework for detecting faults and anomalies in Snowflake-powered Big Data pipelines. Traditional fault detection systems typically rely on centralized log ingestion, which raises privacy and latency concerns. In the proposed approach, each data node instead trains a local model on its own telemetry and workload metadata, such as query failures, slowdowns, and unexpected I/O patterns. The local models are then combined in a privacy-preserving manner into a global anomaly detection model. On synthetic workloads designed to simulate financial and healthcare data lakes, the study reports that the FML approach improves fault detection precision by 22% compared to conventional centralized monitoring solutions. The system integrates with Snowflake’s metadata and query profiling layers, using external functions and Snowpipe for real-time data ingestion. The author also developed a Snowflake-native dashboard that visualizes detected anomalies and recommends mitigation strategies. The paper concludes with a discussion of the broader impact of secure, distributed AI systems on enterprise data management, arguing that combining Snowflake’s cloud scalability with federated learning can enhance fault detection, reduce downtime, and pave the way for autonomous data operations in modern data ecosystems.
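
The abstract does not say which aggregation scheme the framework uses, so the following is only a minimal sketch of the general idea: nodes share compact summaries of their telemetry rather than raw logs, and a coordinator combines them into a global anomaly detector. It assumes a single metric (query latency in milliseconds), a simple Gaussian z-score detector as a stand-in for the paper's models, and hypothetical names (`LocalSummary`, `summarize`, `aggregate`, `is_anomalous`) that do not come from the article. Sharing summaries on its own is not a full privacy guarantee; it only illustrates the data flow.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class LocalSummary:
    """Sufficient statistics a node shares instead of its raw telemetry."""
    count: int
    total: float
    total_sq: float


def summarize(latencies_ms):
    """Runs on each node: reduce local query latencies to three numbers."""
    return LocalSummary(
        count=len(latencies_ms),
        total=sum(latencies_ms),
        total_sq=sum(x * x for x in latencies_ms),
    )


def aggregate(summaries):
    """Runs on the coordinator: combine node summaries into a global mean and std."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations to aggregate")
    mean = sum(s.total for s in summaries) / n
    # Population variance from pooled sums; clamp tiny negative rounding errors.
    variance = max(sum(s.total_sq for s in summaries) / n - mean * mean, 0.0)
    return mean, sqrt(variance)


def is_anomalous(latency_ms, mean, std, threshold=3.0):
    """Flag a latency whose z-score against the global model exceeds the threshold."""
    if std == 0:
        return latency_ms != mean
    return abs(latency_ms - mean) / std > threshold


if __name__ == "__main__":
    # Hypothetical latencies observed on two nodes (assumed values, not from the paper).
    node_a = summarize([100, 110, 90, 105])
    node_b = summarize([95, 100, 105])
    mean, std = aggregate([node_a, node_b])
    print(is_anomalous(500, mean, std))
    print(is_anomalous(102, mean, std))
```

Because the pooled sums reproduce exactly the mean and variance that a centralized computation over all latencies would give, this toy detector loses nothing by keeping raw logs on the nodes. Real federated training of richer models, for example by averaging model weights, only approximates the centralized result.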

Item Type: Article
Official URL: https://doi.org/10.30574/gjeta.2025.23.2.0163
Uncontrolled Keywords: Federated Machine Learning; Snowflake; Big Data Pipelines; Fault Detection; Anomaly Detection; Privacy-Preserving AI; Real-Time Data Ingestion; Metadata; Data Integrity; Autonomous Data Operations
Depositing User: Editor Engineering Section
Date Deposited: 22 Aug 2025 09:09
URI: https://eprint.scholarsrepository.com/id/eprint/5623