Sakib, Anamul Haque and Siddiqui, Md Ismail Hossain and Akter, Sanjida and Sakib, Abdullah Al and Mahmud, Mohammad Rasel (2025) LEVit-Skin: A balanced and interpretable transformer-CNN model for multi-class skin cancer diagnosis. International Journal of Science and Research Archive, 15 (1). pp. 1860-1873. ISSN 2582-8185
IJSRA-2025-1166.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial Share Alike.
Abstract
Skin cancer is a major cause of death, making early detection essential. This study presents LEVit, an explainable and class-balanced deep learning framework designed for multiclass skin lesion classification. LEVit is a hybrid architecture that combines a Vision Transformer (ViT) with a Convolutional Neural Network (CNN). We evaluated LEVit on two benchmark dermoscopic datasets: HAM10000, which consists of 10,015 images across 7 classes, and ISIC 2019, with 25,331 images spanning 8 classes. Both datasets have notable class imbalances. To address this issue, we applied advanced augmentation techniques to oversample minority classes, producing a uniform class distribution and enhancing the model's ability to generalize. LEVit captures both local lesion textures and global spatial relationships through its integrated convolutional and self-attention modules. We compared its performance against four state-of-the-art models (NASNet, SqueezeNet, SE-Net, and Xception) across four metrics: F1 Score, Specificity, Matthews Correlation Coefficient (MCC), and Precision-Recall Area Under the Curve (PR AUC). LEVit achieved an F1 Score of 98.11% and a PR AUC of 98.57% on the ISIC 2019 dataset, and an F1 Score of 96.11% and a PR AUC of 96.62% on HAM10000. For interpretability, we used Grad-CAM to generate class-specific heatmaps, which highlight the lesion regions that most influence the model's predictions. This work demonstrates that balanced training and a hybrid architecture can enhance both classification accuracy and interpretability in skin cancer diagnostics, addressing limitations of existing models and paving the way for reliable clinical applications.
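As an illustration of three of the metrics named in the abstract, the following is a minimal sketch of how F1 Score, Specificity, and MCC can be computed for a multiclass classifier from a confusion matrix. It is not the authors' evaluation code. The function names `confusion_matrix` and `multiclass_metrics`, the use of macro averaging over classes, and the toy labels are assumptions made for this example; the abstract does not state how per-class scores were averaged. PR AUC is omitted because it requires per-class prediction scores rather than hard labels.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, n_classes):
    """Build an n_classes x n_classes matrix; rows = true class, columns = predicted."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def multiclass_metrics(cm):
    """Return macro F1, macro specificity, and multiclass MCC (hypothetical helper)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(cm[k]) for k in range(n)]                        # row sums
    pred_counts = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # column sums

    f1_scores, specificities = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = true_counts[k] - tp
        fp = pred_counts[k] - tp
        tn = total - tp - fn - fp
        f1_den = 2 * tp + fp + fn
        f1_scores.append(2 * tp / f1_den if f1_den else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)

    # Multiclass generalization of the Matthews correlation coefficient.
    correct = sum(cm[k][k] for k in range(n))
    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0

    return {
        "macro_f1": sum(f1_scores) / n,              # assumption: unweighted class mean
        "macro_specificity": sum(specificities) / n,  # assumption: unweighted class mean
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels for three classes (made up for illustration, not from the paper).
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(multiclass_metrics(confusion_matrix(y_true, y_pred, 3)))
```

Macro averaging gives every class equal weight regardless of its size, which is one common choice when datasets are imbalanced; a weighted or micro average would yield different numbers on the same predictions.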
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/ijsra.2025.15.1.1166 |
Uncontrolled Keywords: | Skin cancer; Vision transformer; Deep learning; Explainable AI (XAI); Medical imaging. |
Depositing User: | Editor IJSRA |
Date Deposited: | 22 Jul 2025 23:42 |
URI: | https://eprint.scholarsrepository.com/id/eprint/1725 |