Advisors: Eva Mohedano, Kevin McGuinness and Xavier Giro-i-Nieto

Program: Master's degree in Telecommunications Engineering (MET)

Grade: A (9.7/10.0)

This thesis introduces an architecture to improve the accuracy of a Convolutional Neural Network trained for image classification using visual saliency predictions from the original images. In this thesis the accuracy of a Convolutional Neural Network (CNN) trained for classification has been improved using saliency maps from the original images. The network had an AlexNet architecture and was trained using 1.2 million images from the Imagenet dataset. Two methods had been explored in order to exploit the information from the visual saliency predictions. The first methodologies implemented applied the saliency maps directly to the existing layers of the CNN, which in some cases were already trained for classification and in other they were initialized with random weights. In the second methodology the information from the saliency maps was merged from a new branch, trained at the same time as the initial CNN. In order to speed up the training of the networks the experiments were implemented using images reduced to 128x128. With this sizes the proposed model achieves 12.39% increase in Top-1 accuracy performance with respect to the original CNN, and additionally reduces the number of parameters needed compared to AlexNet. Regarding the original size images 227x227 a model that increases 1.72% Top-1 accuracy is proposed. To accelerate the training process of the network the images have been reduced. The methodology that provides the higher improvement in accuracy will be implemented using the original size of the images. The results will be compared to those obtained from the network trained only with the original images. All the methodologies proposed are implemented in a network previously trained for classification. Additionally the most successful methodologies will be implemented in the training of a network. The results will provide information about the best way to add saliency maps to improve the accuracy.