Abstract

Program: Master's Degree in Telecommunications Engineering

Grade: A with honours (10.0/10.0)

Image retrieval in realistic scenarios targets large, dynamic datasets of unlabeled images. In these cases, training or fine-tuning a model every time new images are added to the database is neither efficient nor scalable. Convolutional Neural Networks trained for image classification over large datasets have proven to be effective feature extractors when transferred to the task of image retrieval. The most successful approaches are based on encoding the activations of convolutional layers, as these convey the spatial information of the image. Our proposal goes further and aims at a locality-aware encoding of these features that depends on the predicted image semantics, with the advantage of using only the knowledge contained inside the network. In particular, we employ Class Activation Maps (CAMs) to obtain the most discriminative regions of the image from a semantic perspective. Additionally, CAMs are used to generate object proposals during an unsupervised re-ranking stage after a first fast search. Our experiments on two publicly available datasets for instance retrieval, Oxford5k and Paris6k, demonstrate that our system is competitive with, and even outperforms, the current state of the art when using off-the-shelf models trained on the object classes of ImageNet.
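The two uses of CAMs mentioned above (weighting convolutional features by class-discriminative regions, and deriving object proposals for re-ranking) can be illustrated with a minimal pure-Python sketch. It assumes the standard CAM formulation for networks that end in global average pooling followed by a fully connected classifier: the CAM of a class is the channel-wise weighted sum of the last convolutional feature maps, using that class's classifier weights. The min-max normalization, the CAM-weighted sum pooling, the 0.5 threshold and all function names here are illustrative assumptions, not the exact pipeline of this thesis.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # one H x W spatial grid


def class_activation_map(features: List[Matrix], weights: List[float]) -> Matrix:
    """CAM for one class: sum over channels k of w_k * f_k(x, y).

    `features` holds K feature maps from the last conv layer; `weights` holds the
    K classifier weights of the chosen class (assumes a GAP + FC network head).
    """
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for f_k, w_k in zip(features, weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += w_k * f_k[y][x]
    return cam


def normalize_cam(cam: Matrix) -> Matrix:
    """Min-max normalize a CAM to [0, 1] (assumed normalization scheme)."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


def class_weighted_descriptor(features: List[Matrix], cam: Matrix) -> List[float]:
    """Sum-pool each channel weighted by the normalized CAM, then L2-normalize.

    Activations outside the class-discriminative regions contribute less.
    """
    h, w = len(cam), len(cam[0])
    desc = [
        sum(f_k[y][x] * cam[y][x] for y in range(h) for x in range(w))
        for f_k in features
    ]
    norm = math.sqrt(sum(d * d for d in desc))
    return [d / norm for d in desc] if norm > 0 else desc


def cam_box(cam: Matrix, threshold: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Hypothetical object proposal: tight box (x_min, y_min, x_max, y_max), in
    feature-map cells, around all cells whose normalized CAM >= threshold."""
    cells = [(x, y) for y, row in enumerate(cam) for x, v in enumerate(row) if v >= threshold]
    if not cells:
        return None
    xs = [c[0] for c in cells]
    ys = [c[1] for c in cells]
    return (min(xs), min(ys), max(xs), max(ys))


# Usage on a toy 2-channel, 2x2 feature map.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
cam = normalize_cam(class_activation_map(features, [1.0, 0.5]))
descriptor = class_weighted_descriptor(features, cam)
proposal = cam_box(cam)
```

In a retrieval setting, descriptors like these would be compared (for example by cosine similarity) for a fast first search, and the boxes would restrict a second, finer comparison during re-ranking; both steps are sketched here only at the level of individual functions.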
