Abstract

Skin cancer, characterized by the uncontrolled growth and spread of abnormal skin cells, remains a significant global health challenge. It predominantly appears on sun-exposed areas like the face, neck, and arms, though it can also develop in less exposed regions. Melanoma, the most aggressive form of skin cancer, is particularly concerning due to its rapid metastasis if not detected early. This aggressive nature underscores the need for timely diagnosis and treatment. While advances have been made in understanding skin cancer mechanisms, the extensive surface area of the skin, as the body’s largest organ, makes it easy for lesions to go undetected. However, the advent of prospective databases of clinical and imaging data has equipped the medical community with highly sensitive and specific biomarkers for cancer diagnosis. These biomarkers facilitate early detection and accurate diagnosis through non-invasive methods, allowing for effective treatment of melanoma before it spreads. This thesis introduces tools for analyzing melanoma using clinical data and Whole Slide Imaging (WSI). It focuses on identifying early-stage melanoma patients through risk grouping and biomarker detection. By applying advanced survival analysis, pattern recognition techniques, and statistical clustering, we develop predictive and interpretability models aimed at improving early detection and diagnosis. The main outcomes of this thesis are five-fold. Firstly, we evaluate various survival analysis algorithms on cutaneous melanoma datasets, identifying the most effective methods. Our analysis reveals that, to date, tree-based methods still surpass deep learning models in performance on survival analysis datasets. Secondly, we introduce a Python library, SurvLIMEpy, to enhance model explainability for time-to-event data, demonstrating its validity through experiments with both simulated and large-scale melanoma datasets.
Thirdly, our analysis of feature importance in the trained models reveals that the features identified by SurvLIMEpy align closely with clinically relevant features reported in medical literature. Fourthly, we show that machine learning models for patient stratification outperform the AJCC staging system when combined with Quantile-Based Survival Clustering. Finally, we explore the use of AI to predict biomarker status from WSIs, utilizing self-supervised feature extractors and Multiple Instance Learning. The concluding sections discuss the effectiveness of the methodologies employed, their benefits, and their potential impact on the treatment of skin cancer patients, and outline future research directions to address this critical medical and scientific challenge.