Optimizing GPU Utilization for AI Workloads on AWS EKS

Madabushini, Praneel (2025) Optimizing GPU Utilization for AI Workloads on AWS EKS. World Journal of Advanced Research and Reviews, 26 (1). pp. 1955-1963. ISSN 2581-9615

WJARR-2025-1233.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

This article explores strategies for optimizing GPU utilization for artificial intelligence workloads on Amazon Elastic Kubernetes Service (EKS). As organizations increasingly deploy computationally intensive AI applications, effective GPU resource management has become critical for balancing performance requirements against cost. The article examines four optimization domains: GPU instance selection and scheduling strategies, cost optimization and resource allocation techniques, performance enhancement using NVIDIA-specific tools, and model-level optimization methods. Its findings, together with industry benchmarks, show how proper instance type selection, combined with tools such as Karpenter and Cluster Autoscaler, creates a foundation for efficient GPU utilization. The article further explores how spot instances, precise resource allocation, and comprehensive monitoring can substantially reduce infrastructure costs. It also highlights the performance advantages of specialized NVIDIA tools such as TensorRT and Triton Inference Server, and examines how model-specific techniques, including mixed precision training, gradient accumulation, knowledge distillation, quantization, and pruning, can maximize computational efficiency while preserving model accuracy.
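
Of the model-level techniques named in the abstract, gradient accumulation is perhaps the easiest to illustrate in isolation. The sketch below is not taken from the article. It assumes a toy one-parameter linear model with a mean squared error loss, and all function names and data values are hypothetical. It shows that summing suitably weighted micro-batch gradients reproduces the full-batch gradient, which is why the technique can emulate a large batch on a memory-constrained GPU.

```python
# Gradient accumulation sketch for a 1-D linear model y = w * x with MSE loss.
# Assumption: names and data are illustrative only, not from the article.

def grad_mse(w, batch):
    """Gradient of the mean squared error w.r.t. w over one batch of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Accumulate micro-batch gradients, weighting each by its share of the data."""
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro = data[start:start + micro_batch_size]
        # Scale so the accumulated sum equals the mean over the whole batch.
        total += grad_mse(w, micro) * len(micro) / len(data)
    return total

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5

full = grad_mse(w, data)                                # one large batch
accum = accumulated_grad(w, data, micro_batch_size=2)   # two micro-batches

print(abs(full - accum) < 1e-9)
```

In a real training loop the same idea usually appears as several backward passes followed by a single optimizer step; only one micro-batch of activations needs to be in GPU memory at a time.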

Item Type: Article
Official URL: https://doi.org/10.30574/wjarr.2025.26.1.1233
Uncontrolled Keywords: GPU optimization; AWS EKS; Machine Learning Infrastructure; Inference Acceleration; Resource Allocation
Depositing User: Editor WJARR
Date Deposited: 25 Jul 2025 15:24
URI: https://eprint.scholarsrepository.com/id/eprint/1907