Khuat, Quang Hai (2025) Mastering Apache Spark architecture: A guide to optimizing data processing workflows. World Journal of Advanced Engineering Technology and Sciences, 15 (1). pp. 910-923. ISSN 2582-8266
WJAETS-2025-0294.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
This article provides a comprehensive guide to mastering Apache Spark architecture and optimizing data processing workflows. It begins by exploring the fundamental components of Spark's distributed computing model, including the driver program, cluster manager, and executors. The discussion then delves into advanced topics such as resource management, data locality enhancement, and fault tolerance mechanisms. Particular attention is given to performance optimization techniques, including memory management strategies, shuffle operation improvements, and Spark SQL tuning for complex queries. The article also covers the effective use of the Spark Web UI for monitoring and identifying performance bottlenecks. Real-world case studies and quantitative analyses demonstrate the practical impact of these optimization techniques across various industries. Finally, the article examines emerging trends in the Spark ecosystem, including integration with cloud-native technologies and the importance of continuous learning for data engineers. This guide serves as an essential resource for data professionals seeking to harness the full potential of Apache Spark in building scalable and efficient big data processing solutions.
| Item Type: | Article |
|---|---|
| Official URL: | https://doi.org/10.30574/wjaets.2025.15.1.0294 |
| Uncontrolled Keywords: | Apache Spark Architecture; Data Processing Optimization; Distributed Computing; Fault Tolerance; Performance Tuning |
| Depositing User: | Editor Engineering Section |
| Date Deposited: | 04 Aug 2025 16:10 |
| URI: | https://eprint.scholarsrepository.com/id/eprint/2841 |