Perumal, Raghavan Krishnasamy Lakshmana (2025) Optimizing Large Language Model Deployment in Edge Computing Environments. International Journal of Science and Research Archive, 14 (3). pp. 1658-1669. ISSN 2582-8185
IJSRA-2025-0912.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Deploying large language models (LLMs) in edge computing environments is an emerging challenge at the intersection of AI and distributed systems. Running LLMs directly on edge devices can greatly reduce latency and improve privacy, enabling real-time intelligent applications without constant cloud connectivity. However, modern LLMs often consist of billions of parameters and require tens of gigabytes of memory and massive compute power, far exceeding what typical edge hardware can provide. In this paper, we present a comprehensive approach to optimizing LLM deployment in edge computing environments by combining four existing classes of optimization techniques (model compression, quantization, distributed inference, and federated learning) in a unified framework. Our insight is that a holistic combination of these techniques is necessary to deploy LLMs successfully in practical edge settings. We also provide new algorithmic solutions and empirical data to advance the state of the art.
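As a concrete illustration of one of the technique classes named in the abstract, the sketch below shows symmetric int8 post-training weight quantization, which shrinks weight storage roughly 4x relative to float32. This is a minimal, generic sketch for illustration only, not the algorithm proposed in the paper; the function names and the NumPy-based setup are assumptions made here.

```python
# Illustrative sketch (not the paper's method): symmetric per-tensor int8
# post-training quantization of a weight matrix.
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 using a single symmetric scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for inference."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((512, 512)).astype(np.float32)
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # int8 storage is 4x smaller than float32; report the reconstruction error.
    print(f"scale={s:.6f}, max abs error={np.abs(w - w_hat).max():.6f}")
```

A single per-tensor scale keeps the example short; production LLM quantizers typically use per-channel or per-group scales to limit the error introduced by outlier weights.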
| Item Type: | Article |
|---|---|
| Official URL: | https://doi.org/10.30574/ijsra.2025.14.3.0912 |
| Uncontrolled Keywords: | Large Language Models (LLMs); Edge Computing; Model Compression; Distributed Inference; Federated Learning |
| Depositing User: | Editor IJSRA |
| Date Deposited: | 17 Jul 2025 17:39 |
| URI: | https://eprint.scholarsrepository.com/id/eprint/1309 |