Abstract

Histopathology is the gold standard for cancer diagnosis, where pathologists examine tissue samples to assess cellular morphology, tissue architecture, and biomarker expression. The digitization of histological slides has enabled the application of artificial intelligence to assist in this process. However, deploying deep learning models in pathology remains challenging due to the gigapixel-scale nature of whole slide images, variability introduced by staining protocols and tissue morphology, and the scarcity of high-quality annotations. This thesis focuses on the development of deep learning methodologies for the identification and characterization of cells in histopathological slides, addressing these core challenges. Inspired by the way pathologists assess slides, the proposed methods target both individual cells and their spatial organization within the tissue. A transformer-based model, CellNuc-DETR, is introduced for efficient cell nuclei detection and classification, achieving state-of-the-art performance while significantly reducing inference time. To address annotation scarcity in immunohistochemistry, the model is adapted using unsupervised domain adaptation techniques, enabling knowledge transfer from hematoxylin and eosin-stained slides without requiring annotated immunohistochemistry data. To incorporate broader spatial context into cell classification, the thesis explores graph-based representations of histological tissue. It proposes a novel graph construction strategy that integrates both cell-level and patch-level nodes, enabling the fusion of cellular features with visual and contextual information in a scalable manner. To support annotation-efficient learning, the thesis introduces Regularized Graph Infomax, a self-supervised learning algorithm designed for representation learning on graphs without manual annotations. 
This algorithm is then applied to the proposed cell–patch graphs to enable efficient, large-context cell classification with minimal supervision. Finally, the thesis demonstrates the clinical relevance of these methods in a study on endometrial carcinoma. A weakly supervised whole slide classifier is trained to predict molecular subtypes, and the developed cell detection and classification tools are used to extract cell-level biomarkers from predicted regions of interest. These biomarkers make it possible to quantify subtype-specific differences in the spatial organization of cells, linking computational predictions to interpretable biological features. By integrating transformers, graph-based representations, and self-supervised learning, this thesis introduces scalable and annotation-efficient methodologies for computational pathology. These approaches enhance the efficiency, robustness, and generalization of artificial intelligence models, contributing to the development of clinically relevant tools for cancer diagnosis and characterization.