Dash, Anjan Kumar (2025) Consistency models for distributed deep learning: Tradeoffs between convergence and communication. World Journal of Advanced Engineering Technology and Sciences, 15 (3). pp. 436-445. ISSN 2582-8266
WJAETS-2025-0892.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Ensuring model convergence in distributed deep learning systems often incurs unnecessary communication overhead. This article examines strong consistency, eventual consistency, and bounded staleness, covering their theoretical foundations and their use across different machine learning workloads. Experiments show that relaxed consistency models substantially reduce the communication required, although they increase variability in outcomes and can prolong training time. The article describes a dynamic framework that adjusts consistency requirements according to training phase and gradient behavior, balancing efficiency and reliability. CNN and transformer models are compared, with CNNs tolerating relaxed consistency better. Compared to static methods, the framework provides gradient-based adaptation, phase-based consistency changes, topology-aware communication, and auto-tuning of the staleness bound, improving results when training on large datasets in distributed environments.
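The abstract's central idea, auto-tuning a bounded-staleness threshold from observed gradient behavior, can be illustrated with a minimal sketch. The code below is not the paper's implementation; the class name `AdaptiveStalenessController`, the coefficient-of-variation heuristic, and the bound range are all hypothetical choices made for illustration. It shows one plausible way a worker could decide when to synchronize with fresh parameters under an adaptively tightened or relaxed staleness bound.

```python
import random
import statistics


class AdaptiveStalenessController:
    """Hypothetical controller: tunes the bounded-staleness threshold from
    observed gradient norms. High gradient variance (early, unstable training)
    tightens the bound toward synchronous updates; low variance (late, stable
    training) relaxes it to reduce communication.
    """

    def __init__(self, s_min=0, s_max=16, window=50):
        self.s_min, self.s_max = s_min, s_max
        self.window = window
        self.grad_norms = []
        self.staleness_bound = s_min  # start conservative (near-synchronous)

    def observe(self, grad_norm):
        # Keep a sliding window of recent gradient norms.
        self.grad_norms.append(grad_norm)
        if len(self.grad_norms) > self.window:
            self.grad_norms.pop(0)

    def update_bound(self):
        if len(self.grad_norms) < 2:
            return self.staleness_bound
        # Coefficient of variation as a cheap proxy for gradient stability.
        mean = statistics.mean(self.grad_norms)
        cv = statistics.stdev(self.grad_norms) / (mean + 1e-12)
        # Map stability onto [s_min, s_max]: more stable -> larger bound.
        stability = max(0.0, 1.0 - cv)
        self.staleness_bound = round(self.s_min + stability * (self.s_max - self.s_min))
        return self.staleness_bound


def must_sync(local_version, global_version, bound):
    """Bounded staleness: a worker must pull fresh parameters once its
    local parameter version lags the global clock by more than `bound`."""
    return (global_version - local_version) > bound


if __name__ == "__main__":
    # Toy simulation: gradient norms stabilize as training progresses.
    ctrl = AdaptiveStalenessController()
    global_version, local_version, syncs = 0, 0, 0
    for step in range(500):
        ctrl.observe(random.gauss(1.0, max(0.05, 1.0 - step / 500)))
        bound = ctrl.update_bound()
        global_version += 1
        if must_sync(local_version, global_version, bound):
            local_version = global_version  # communication event
            syncs += 1
    print(f"final staleness bound: {bound}, synchronizations: {syncs}/500")
```

In this sketch, communication drops as the bound widens late in training, mirroring the abstract's claim that relaxed consistency trades some update freshness for lower communication volume.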
| Item Type: | Article |
|---|---|
| Official URL: | https://doi.org/10.30574/wjaets.2025.15.3.0892 |
| Uncontrolled Keywords: | Distributed Deep Learning; Consistency Models; Communication Efficiency; Adaptive Framework; Parameter Staleness |
| Depositing User: | Editor Engineering Section |
| Date Deposited: | 16 Aug 2025 12:52 |
| URI: | https://eprint.scholarsrepository.com/id/eprint/4465 |