Evaluating Concurrency Impacts on OpenAI Language Models

Gupta, Shreyam Dutta (2025) Evaluating Concurrency Impacts on OpenAI Language Models. International Journal of Science and Research Archive, 14 (3). pp. 378-387. ISSN 2582-8185

Article PDF: IJSRA-2025-0647.pdf (Published Version, 1 MB)
Available under a Creative Commons Attribution-NonCommercial-ShareAlike license.

Abstract

While the OpenAI API documentation offers a range of guidelines and optimization techniques for reducing latency and improving performance in language model applications, it focuses largely on high-level principles rather than quantitative, comparative data under realistic load conditions. In this paper, we present an empirical evaluation of four OpenAI language models (o1-mini, o1-preview, GPT-4o, and GPT-4o-mini) across diverse task categories, including explanatory, creative, technical, translation, and coding prompts. Using asynchronous load testing at varying concurrency levels, we measure key performance metrics: average response time, throughput, and token efficiency. Our study not only validates the optimization principles discussed in the API documentation but also provides actionable insights and a data-driven framework for model selection in real-world scenarios. This comparative analysis enables practitioners to make informed decisions based on measured performance trade-offs, complementing and extending the theoretical recommendations in the OpenAI guidelines.
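As a rough illustration of the methodology the abstract describes, the sketch below shows one way to run an asynchronous load test with a fixed concurrency cap against the OpenAI chat completions API. The prompt set, request count, and concurrency level are illustrative placeholders, not the paper's exact configuration; the sketch assumes the openai Python client (v1.x) and an OPENAI_API_KEY in the environment.

```python
# Minimal sketch of an asynchronous load test against the OpenAI API.
# Assumes: pip install openai (v1.x) and OPENAI_API_KEY set in the environment.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Illustrative task mix; the paper covers five categories
# (explanatory, creative, technical, translation, coding).
PROMPTS = [
    "Explain how DNS resolution works.",      # explanatory
    "Write a haiku about concurrency.",       # creative
    "Translate 'good morning' into French.",  # translation
]

async def timed_request(model: str, prompt: str, sem: asyncio.Semaphore):
    """Send one chat completion; return (latency_s, completion_tokens)."""
    async with sem:  # cap the number of in-flight requests
        start = time.perf_counter()
        resp = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        latency = time.perf_counter() - start
        return latency, resp.usage.completion_tokens

async def load_test(model: str, concurrency: int, n_requests: int = 30):
    """Fire n_requests at the model, limited to `concurrency` at a time."""
    sem = asyncio.Semaphore(concurrency)
    jobs = [
        timed_request(model, PROMPTS[i % len(PROMPTS)], sem)
        for i in range(n_requests)
    ]
    wall_start = time.perf_counter()
    results = await asyncio.gather(*jobs)
    wall = time.perf_counter() - wall_start

    latencies = [lat for lat, _ in results]
    tokens = sum(tok for _, tok in results)
    print(f"{model} @ concurrency={concurrency}: "
          f"avg latency {sum(latencies) / len(latencies):.2f}s, "
          f"throughput {n_requests / wall:.2f} req/s, "
          f"{tokens / wall:.1f} tokens/s")

if __name__ == "__main__":
    asyncio.run(load_test("gpt-4o-mini", concurrency=8))
```

Sweeping the `concurrency` argument (e.g. 1, 4, 8, 16) and repeating the run per model yields the kind of average-response-time and throughput comparison the study reports.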

Item Type: Article
Official URL: https://doi.org/10.30574/ijsra.2025.14.3.0647
Uncontrolled Keywords: Generative AI; Language Models; Performance Evaluation; Latency Optimization; Token Efficiency
Depositing User: Editor IJSRA
Date Deposited: 16 Jul 2025 17:35
URI: https://eprint.scholarsrepository.com/id/eprint/1032