Tyagi, Ankush Jitendrakumar (2025) Scaling deep learning models: Challenges and solutions for large-scale deployments. World Journal of Advanced Engineering Technology and Sciences, 16 (2). 010-020. ISSN 2582-8266
Abstract
Deep learning (DL) models have achieved state-of-the-art performance across numerous domains, including natural language processing, computer vision, and speech recognition. However, the transition from research to production, especially at large scales, presents formidable challenges. As model sizes balloon into billions of parameters and user demand scales exponentially, issues such as training time, inference latency, energy consumption, system reliability, and hardware constraints become significant obstacles. Efficiently scaling DL models is not just a matter of model architecture; it requires a multi-faceted approach encompassing algorithmic, infrastructural, and deployment-level strategies. Large-scale deployments must account for factors such as distributed training across heterogeneous hardware, maintaining inference throughput under real-time constraints, handling memory and communication bottlenecks, and ensuring deployment flexibility from cloud clusters to edge devices. The performance and cost-efficiency of DL systems at scale hinge upon techniques such as model and data parallelism, quantisation, mixed-precision training, and sharded inference. Additionally, orchestration tools like Kubernetes, together with specialised inference runtimes such as TensorRT and NVIDIA Triton, are critical for automated, scalable deployment pipelines. This paper presents a deep technical analysis of the core challenges inherent in scaling DL models, examines modern solutions and their trade-offs, and proposes an integrated framework to address real-world deployment needs. By combining innovations at both the model level and system infrastructure level, the goal is to enable resilient, scalable, and production-grade AI deployments.
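Of the techniques the abstract lists, mixed-precision training is the most self-contained to illustrate. The sketch below shows the standard PyTorch automatic-mixed-precision loop using `torch.cuda.amp`; it is a minimal illustration, not code from the paper, and the model shape, batch size, and learning rate are placeholder assumptions. It requires a CUDA device to run.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Hypothetical toy model and optimizer; placeholders for illustration only.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler()  # scales the loss so small fp16 gradients do not underflow

for step in range(100):
    # Synthetic batch standing in for a real data loader.
    inputs = torch.randn(32, 1024, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad(set_to_none=True)
    with autocast():  # runs eligible ops in float16, keeps reductions in float32
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # unscales gradients; skips the step on inf/nan
    scaler.update()                # adjusts the scale factor for the next step
```

The `GradScaler` multiplies the loss before backpropagation and unscales the gradients before the optimizer step, which is what lets half-precision training retain fp32-level stability while roughly halving activation memory and bandwidth, one of the cost levers the abstract points to.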
| Item Type: | Article |
|---|---|
| Official URL: | https://doi.org/10.30574/wjaets.2025.16.2.1252 |
| Uncontrolled Keywords: | Deep Learning Scalability; Large-Scale AI Deployment; Distributed Training; Inference Optimization; Model Parallelism |
| Date Deposited: | 15 Sep 2025 05:24 |
| URI: | https://eprint.scholarsrepository.com/id/eprint/6001 |