Consistency models for distributed deep learning: Tradeoffs between convergence and communication

Dash, Anjan Kumar (2025) Consistency models for distributed deep learning: Tradeoffs between convergence and communication. World Journal of Advanced Engineering Technology and Sciences, 15 (3). pp. 436-445. ISSN 2582-8266

Article PDF: WJAETS-2025-0892.pdf - Published Version (748kB)
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Ensuring model convergence in distributed deep learning systems often incurs substantial, and sometimes unnecessary, communication overhead. This article examines strong consistency, eventual consistency, and bounded staleness, covering the theory behind each model and its use across different machine learning workloads. Experiments show that relaxed consistency models greatly reduce the amount of communication needed, although they make outcomes more variable and can prolong training time. The article describes a dynamic framework that adjusts consistency requirements according to the training phase and gradient behavior, balancing efficiency with reliability. CNN and transformer models are compared, with CNNs tolerating relaxed consistency better. Compared with static methods, the framework combines gradient-based adaptation, phase-based consistency changes, topology-aware communication, and automatic tuning of the staleness bound to improve results when training on large datasets in a distributed environment.
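As a rough illustration of the adaptive bounded-staleness idea summarized in the abstract, the sketch below shows one way a staleness bound could be tuned from observed gradient behavior: tightening toward synchronous (strong-consistency) updates when gradients become volatile and relaxing toward eventual consistency when training is stable. This is not the paper's implementation; the class name, thresholds, and the coefficient-of-variation heuristic are illustrative assumptions.

```python
import math
from collections import deque

class AdaptiveStalenessController:
    """Sketch of a controller that auto-tunes a bounded-staleness limit
    from recent gradient norms (hypothetical design, not the paper's code)."""

    def __init__(self, min_bound=0, max_bound=16, window=50):
        self.min_bound = min_bound        # 0 => fully synchronous (strong consistency)
        self.max_bound = max_bound        # large bound => close to eventual consistency
        self.window = window
        self.grad_norms = deque(maxlen=window)
        self.bound = min_bound            # start conservatively in the early, unstable phase

    def observe(self, grad_norm: float) -> int:
        """Record a gradient norm and return the updated staleness bound."""
        self.grad_norms.append(grad_norm)
        if len(self.grad_norms) < self.window:
            return self.bound             # not enough history yet; stay conservative

        mean = sum(self.grad_norms) / len(self.grad_norms)
        var = sum((g - mean) ** 2 for g in self.grad_norms) / len(self.grad_norms)
        cv = math.sqrt(var) / (mean + 1e-12)   # coefficient of variation of recent gradients

        # Stable gradients -> relax consistency to save communication;
        # volatile gradients -> tighten back toward synchronous updates.
        if cv < 0.1:
            self.bound = min(self.max_bound, self.bound + 1)
        elif cv > 0.5:
            self.bound = max(self.min_bound, self.bound - 2)
        return self.bound

    def worker_may_proceed(self, worker_step: int, slowest_step: int) -> bool:
        """Bounded-staleness check: a worker may compute on parameters that are
        at most `bound` steps ahead of the slowest worker's view."""
        return (worker_step - slowest_step) <= self.bound
```

In use, each worker would report its gradient norm after a step and consult `worker_may_proceed` before the next one; with the bound at 0 this degenerates to fully synchronous training, and with a large bound it approximates eventual consistency.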

Item Type: Article
Official URL: https://doi.org/10.30574/wjaets.2025.15.3.0892
Uncontrolled Keywords: Distributed Deep Learning; Consistency Models; Communication Efficiency; Adaptive Framework; Parameter Staleness
Depositing User: Editor Engineering Section
Date Deposited: 16 Aug 2025 12:52
URI: https://eprint.scholarsrepository.com/id/eprint/4465