Enhancing data processing with Apache Spark: A technical deep dive

Dulam, Avinash (2025) Enhancing data processing with Apache Spark: A technical deep dive. World Journal of Advanced Engineering Technology and Sciences, 15 (3). pp. 1279-1284. ISSN 2582-8266

Article PDF: WJAETS-2025-0910.pdf (Published Version)
Available under License Creative Commons Attribution Non-commercial Share Alike.


Abstract

Apache Spark has revolutionized big data processing by introducing a unified computing framework that addresses the challenges of distributed data processing, real-time analytics, and machine learning at scale. The framework's architecture, built on Resilient Distributed Datasets (RDDs), enables fault-tolerant parallel operations while providing sophisticated optimization techniques for enhanced performance. Through advanced features such as Structured Streaming, the DataFrame abstraction, and MLlib integration, Spark offers comprehensive solutions for modern data processing needs, from batch processing to real-time analytics, and supports organizations in managing rapidly growing data volumes without sacrificing processing efficiency or scalability. The platform's approach to data abstraction, combined with its robust optimization capabilities and integration with modern computing paradigms, establishes it as a cornerstone technology for enterprises seeking to harness big data while minimizing operational complexity and maximizing resource utilization across diverse processing environments.
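
As a brief illustration of the abstractions named in the abstract (not code taken from the paper itself), the following minimal PySpark sketch shows a batch DataFrame aggregation and an equivalent Structured Streaming query; the input file events.json, the column names, and the socket source on localhost:9999 are hypothetical and serve only the example.

    # Minimal illustrative sketch; assumes a local PySpark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark-deep-dive-sketch").getOrCreate()

    # Batch processing with the DataFrame API (query plan optimized by Catalyst).
    events = spark.read.json("events.json")          # hypothetical input file
    daily_counts = (events
        .withColumn("day", F.to_date("timestamp"))   # assumes a "timestamp" column
        .groupBy("day")
        .count())
    daily_counts.show()

    # Structured Streaming: the same DataFrame operations over an unbounded source.
    lines = (spark.readStream
        .format("socket")                            # toy source for illustration
        .option("host", "localhost")
        .option("port", 9999)
        .load())
    word_counts = (lines
        .select(F.explode(F.split("value", " ")).alias("word"))
        .groupBy("word")
        .count())
    query = (word_counts.writeStream
        .outputMode("complete")
        .format("console")
        .start())
    query.awaitTermination()

The point of the sketch is that the same DataFrame operations apply to both bounded and unbounded data, which is how Spark unifies the batch and streaming workloads the abstract describes.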

Item Type: Article
Official URL: https://doi.org/10.30574/wjaets.2025.15.3.0910
Uncontrolled Keywords: Distributed Computing; Data Processing Optimization; Stream Processing; Machine Learning Integration; Resource Management
Depositing User: Editor Engineering Section
Date Deposited: 16 Aug 2025 13:11
Related URLs:
URI: https://eprint.scholarsrepository.com/id/eprint/4704