Abstract
Advances in dermatological artificial intelligence research require high-quality, comprehensive datasets that mirror real-world clinical scenarios. We introduce a collection of 18,946 dermoscopic images spanning 2010 to 2016, collated at the Hospital Clínic in Barcelona, Spain. The BCN20000 dataset aims to address the problem of unconstrained classification of dermoscopic images of skin cancer, including lesions in hard-to-diagnose locations such as nails and mucosa, large lesions that do not fit in the aperture of the dermoscopy device, and hypopigmented lesions. The dataset covers eight key diagnostic categories in dermoscopy, providing a diverse range of lesions for training artificial intelligence models. Furthermore, a ninth out-of-distribution (OOD) class is present in the test set, composed of lesions that could not be distinctly classified as any of the other categories. By providing a comprehensive collection of varied images, BCN20000 helps bridge the gap between the training data for machine learning models and the day-to-day practice of medical practitioners. Additionally, we present a set of baseline classifiers based on state-of-the-art neural networks, which other researchers can extend for further experimentation.