Salary Prediction Using TF-IDF and Ensemble Machine Learning: A Lightweight and Interpretable Approach

Pisolla, Anil and Moghul, Sameer Baig and Gaddam, Triveni and Iyengar, N Ch Sriman Narayana (2025) Salary Prediction Using TF-IDF and Ensemble Machine Learning: A Lightweight and Interpretable Approach. World Journal of Advanced Research and Reviews, 26 (2). pp. 4445-4453. ISSN 2581-9615

Article PDF
WJARR-2025-2102.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Salary prediction is more than a number: it informs decisions by job seekers, employers, and HR teams, and shapes expectations and negotiations. Traditional models rely on structured data such as job titles, experience levels, and locations, but overlook job descriptions, which carry additional signal in the form of skills, responsibilities, and industry-specific language. This study bridges that gap by combining structured and unstructured data in a single model. TF-IDF extracts key terms from job descriptions and assigns weights that highlight the most informative ones, while structured data is preprocessed with one-hot encoding and feature scaling. An ensemble learning approach strengthens predictions: Random Forest captures patterns, XGBoost refines them, and Linear Regression serves as a baseline. A meta-model, such as Logistic Regression, weights the base models' predictions to improve accuracy. Evaluated on Accuracy, Macro Average F1-score, and Weighted Average F1-score, the model outperforms the standalone approaches. The results indicate that integrating TF-IDF with ensemble learning yields a more accurate, scalable, and interpretable salary prediction system suited to real-world applications.
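
Two quantities named in the abstract have standard textbook definitions: the TF-IDF weight of a term and the macro and weighted averages of the F1-score. The sketch below illustrates both with the Python standard library only. It is not the authors' implementation, which this record does not describe. The function names, the whitespace tokenizer, the unsmoothed formula tf × ln(N / df), and the toy job descriptions and salary bands are all assumptions made for illustration. Libraries such as scikit-learn use a smoothed IDF variant by default, so their numbers would differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Classic TF-IDF: tf = count / document length, idf = ln(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (c / len(tokens)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def f1_averages(y_true, y_pred):
    """Return (macro F1, weighted F1) over the classes present in y_true."""
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    for label in support:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = support[label] - tp
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    # Macro: every class counts equally. Weighted: classes count by support.
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[c] * support[c] for c in support) / len(y_true)
    return macro, weighted


# Hypothetical job descriptions (not taken from the paper's dataset).
docs = [
    "senior python engineer machine learning",
    "junior python developer",
    "senior accountant tax reporting",
]
weights = tfidf(docs)

# Hypothetical salary bands and predictions.
y_true = ["high", "high", "high", "low"]
y_pred = ["high", "high", "low", "low"]
macro_f1, weighted_f1 = f1_averages(y_true, y_pred)
```

In the toy corpus, "engineer" appears in one description and "python" in two, so "engineer" receives the larger weight in the first document. The macro average treats the two salary bands equally, while the weighted average leans toward the more frequent "high" band, which is why the abstract reports both.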

Item Type: Article
Official URL: https://doi.org/10.30574/wjarr.2025.26.2.2102
Uncontrolled Keywords: Salary Prediction; TF-IDF (Term Frequency-Inverse Document Frequency); Ensemble Learning; Machine Learning
Depositing User: Editor WJARR
Date Deposited: 20 Aug 2025 11:52
URI: https://eprint.scholarsrepository.com/id/eprint/3751