Multi-Modal product recognition in retail environments: Enhancing accuracy through integrated vision and OCR approaches

Patel, Saumil R (2025) Multi-Modal product recognition in retail environments: Enhancing accuracy through integrated vision and OCR approaches. World Journal of Advanced Research and Reviews, 25 (1). pp. 1837-1844. ISSN 2581-9615

WJARR-2025-0122.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

This research presents an artificial intelligence system that addresses critical operational challenges in modern retail environments. The integrated system combines computer vision and optical character recognition (OCR) to automate product identification, inventory tracking, and checkout. It was developed and implemented in response to the retail industry's need for automation amid labor shortages and rising operational costs. The system achieved 94.6% product recognition accuracy while processing 50-60 items per second, enabling real-time inventory management and automated checkout. Field testing across multiple retail locations showed a 35% reduction in inventory management time, a 40% decrease in checkout wait times, and a 25% improvement in stock accuracy. The dataset covers 538 distinct products, including challenging categories such as liquor bottles and grocery items, and the system applies optimization techniques intended to keep performance consistent across diverse retail environments. Implementing the system can yield substantial operational cost savings, a better customer experience, and improved inventory accuracy. The research demonstrates how AI-driven automation can address the retail industry's current challenges while providing a scalable foundation for future innovations in retail operations management.

Item Type: Article
Official URL: https://doi.org/10.30574/wjarr.2025.25.1.0122
Uncontrolled Keywords: Product Recognition; Computer Vision; OCR; Deep Learning; Retail Automation; YOLO; Multi-modal Learning; Vector Databases
Depositing User: Editor WJARR
Date Deposited: 11 Jul 2025 16:28
URI: https://eprint.scholarsrepository.com/id/eprint/363