LEVit-Skin: A balanced and interpretable transformer-CNN model for multi-class skin cancer diagnosis

Sakib, Anamul Haque and Siddiqui, Md Ismail Hossain and Akter, Sanjida and Sakib, Abdullah Al and Mahmud, Mohammad Rasel (2025) LEVit-Skin: A balanced and interpretable transformer-CNN model for multi-class skin cancer diagnosis. International Journal of Science and Research Archive, 15 (1). pp. 1860-1873. ISSN 2582-8185

Article PDF
IJSRA-2025-1166.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

Skin cancer is a major cause of death, making early detection essential. This study presents LEVit, an explainable and class-balanced deep learning framework designed for multiclass skin lesion classification. LEVit combines a hybrid Vision Transformer (ViT) with a Convolutional Neural Network (CNN). We evaluated LEVit on two benchmark dermoscopic datasets: HAM10000, which consists of 10,015 images across 7 classes, and ISIC 2019, with 25,331 images spanning 8 classes. Both datasets have notable class imbalances. To address this issue, we applied advanced augmentation techniques to oversample minority classes, ensuring a uniform class distribution and enhancing the model's ability to generalize. LEVit captures both local lesion textures and global spatial relationships through its integrated self-attention and convolutional modules. We compared its performance against four state-of-the-art models (NASNet, SqueezeNet, SE-Net, and Xception) across four metrics: F1 Score, Specificity, Matthews Correlation Coefficient (MCC), and Precision-Recall Area Under the Curve (PR AUC). LEVit achieved an F1 Score of 98.11% and a PR AUC of 98.57% on the ISIC 2019 dataset, and an F1 Score of 96.11% and a PR AUC of 96.62% on HAM10000. For interpretability, we used Grad-CAM to generate class-specific heatmaps, which highlight the key areas of lesions that influence the model's predictions. This work demonstrates that balanced training and a hybrid architecture can enhance both classification accuracy and interpretability in skin cancer diagnostics, addressing the limitations of existing models and paving the way for reliable clinical applications.
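
As an illustration of three of the metrics named in the abstract, the following minimal Python sketch computes F1 Score, Specificity, and MCC from a multiclass confusion matrix. It is not the authors' evaluation code. The function names are hypothetical, macro-averaging over classes is an assumption (the abstract does not state how per-class scores were aggregated), and PR AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, n_classes):
    """Build an n_classes x n_classes matrix; rows are true labels, columns are predictions."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_f1_and_specificity(cm):
    """Return (macro F1, macro specificity), treating each class one-vs-rest.

    Macro-averaging is an assumption made for this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    f1_scores, specificities = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp     # predicted k, true class differs
        tn = total - tp - fn - fp
        f1_denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / f1_denominator if f1_denominator else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return sum(f1_scores) / n, sum(specificities) / n


def multiclass_mcc(cm):
    """Multiclass generalization of the Matthews Correlation Coefficient."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                   # total number of samples
    c = sum(cm[k][k] for k in range(n))               # correctly classified samples
    t = [sum(cm[k]) for k in range(n)]                # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = sqrt((s * s - sum(pk * pk for pk in p)) *
                       (s * s - sum(tk * tk for tk in t)))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy labels for three classes, invented for illustration only.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(y_true, y_pred, n_classes=3)
    f1, spec = macro_f1_and_specificity(cm)
    print(f"macro F1={f1:.4f}  macro specificity={spec:.4f}  MCC={multiclass_mcc(cm):.4f}")
```

MCC is a useful companion to F1 on imbalanced data such as HAM10000 and ISIC 2019 because it uses every cell of the confusion matrix, including true negatives.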

Item Type: Article
Official URL: https://doi.org/10.30574/ijsra.2025.15.1.1166
Uncontrolled Keywords: Skin cancer; Vision transformer; Deep learning; Explainable AI (XAI); Medical imaging.
Depositing User: Editor IJSRA
Date Deposited: 22 Jul 2025 23:42
URI: https://eprint.scholarsrepository.com/id/eprint/1725