Efficient and interpretable monkeypox detection using vision transformers with explainable visualizations

Akter, Sanjida and Mahmud, Mohammad Rasel and Islam, Md Ariful and Siddiqui, Md Ismail Hossain and Sakib, Anamul Haque (2025) Efficient and interpretable monkeypox detection using vision transformers with explainable visualizations. International Journal of Science and Research Archive, 15 (1). pp. 1811-1822. ISSN 2582-8185

IJSRA-2025-1162.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Monkeypox is a zoonotic disease that is difficult to diagnose because its skin lesions resemble those of other diseases such as measles and chickenpox. Traditional deep learning (DL) methods, especially convolutional neural networks (CNNs), often generalize poorly when trained on small, imbalanced datasets. They also tend to lack interpretability and computational efficiency, which limits their use in real-time, resource-constrained settings. This study introduces a lightweight, explainable DL framework based on EfficientFormerV2, which combines convolutional inductive biases with efficient token-mixing strategies. We used the publicly available Monkeypox Skin Image Dataset (MSID), which contains 770 images across four categories: Monkeypox, Chickenpox, Measles, and Normal. Through preprocessing and augmentation, we expanded the dataset to 4,000 images, improving class representation and reducing overfitting. We evaluated five models (EfficientFormerV2, T2T-ViT, DeiT, Xception, and MobileNetV4) with 10-fold stratified cross-validation, reporting F1-score, specificity, PR AUC, and Matthews Correlation Coefficient (MCC). EfficientFormerV2 performed best, achieving an F1-score of 98.73%, specificity of 99.63%, PR AUC of 99.86%, and MCC of 94.15%. We used Grad-CAM visualizations to produce class-specific heatmaps for interpretability. The framework combines an efficient architecture, data-centric augmentation, and explainable AI (XAI), offering high accuracy and low-latency predictions that make it suitable for real-time monkeypox screening, especially in low-resource settings.
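
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how F1-score, specificity, and MCC can be computed from a multi-class confusion matrix. It is not the authors' code. The abstract does not state how per-class scores were averaged, so macro-averaging is an assumption here, and the function names are hypothetical. The MCC uses the standard multi-class generalization, which reduces to the usual binary formula for two classes.

```python
import math


def per_class_counts(cm):
    """Return (tp, fp, fn, tn) for each class of a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def macro_f1(cm):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for tp, fp, fn, _ in per_class_counts(cm):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def macro_specificity(cm):
    """Unweighted mean of per-class specificity, TN / (TN + FP)."""
    scores = []
    for _, fp, _, tn in per_class_counts(cm):
        scores.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(cm):
    """Multi-class Matthews Correlation Coefficient."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                # total samples
    c = sum(cm[k][k] for k in range(n))            # correctly classified
    t = [sum(cm[k]) for k in range(n)]             # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted count per class
    num = c * s - sum(tk * pk for tk, pk in zip(t, p))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0
```

In a 10-fold stratified cross-validation, these functions would typically be applied to each fold's confusion matrix and the results averaged across folds; whether the study aggregated its scores this way is not stated in the abstract.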

Item Type: Article
Official URL: https://doi.org/10.30574/ijsra.2025.15.1.1162
Uncontrolled Keywords: Skin Lesion; Vision Transformer; Hybrid Deep Learning; Explainable AI (XAI); Monkeypox
Depositing User: Editor IJSRA
Date Deposited: 22 Jul 2025 23:44
URI: https://eprint.scholarsrepository.com/id/eprint/1719