Enhancing checkpointing and state recovery for large-scale stream processing

Mukkath, Shakir Poolakkal (2025) Enhancing checkpointing and state recovery for large-scale stream processing. World Journal of Advanced Research and Reviews, 26 (2). pp. 296-302. ISSN 2581-9615

WJARR-2025-1638.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

As real-time applications demand ever-lower latencies and greater fault tolerance, traditional checkpointing mechanisms in distributed streaming systems face new performance bottlenecks. This article examines recent advancements in reducing checkpointing overhead while maintaining high availability, focusing on incremental state snapshots, asynchronous commit techniques, and log-based recovery models. It highlights the shift towards intelligent state management strategies, where adaptive checkpoint intervals and event-driven rollback mechanisms optimize resource utilization. The discussion delves into emerging storage backends that offer hybrid memory-disk approaches, enabling near-instantaneous state recovery without excessive write amplification. The article presents new perspectives on leveraging event sourcing as a state recovery alternative, where historical data streams are reprocessed dynamically to restore lost computation. Additionally, it explores targeted recovery techniques including partial state rollback, causality tracking, compensating events, and incremental recovery prioritization. These innovations collectively transform fault-tolerant stream processing by minimizing recovery scope while maintaining consistency guarantees. Through case studies and theoretical analysis, this work demonstrates how modern approaches significantly reduce recovery times and resource requirements, advancing the field of high-performance stream processing architectures suitable for mission-critical applications.

Item Type: Article
Official URL: https://doi.org/10.30574/wjarr.2025.26.2.1638
Uncontrolled Keywords: Fault Tolerance; Stream Processing; Incremental Checkpointing; Event Sourcing; Distributed Recovery
Depositing User: Editor WJARR
Date Deposited: 27 Jul 2025 15:27
URI: https://eprint.scholarsrepository.com/id/eprint/2518