Abstract

This thesis explores the application of deep learning to the prediction of media interestingness. Two different models are investigated: one for predicting image interestingness and one for predicting video interestingness. For image interestingness, the ResNet50 network is fine-tuned to obtain the best results. First, additional layers are added to the network. Next, the model is trained and fine-tuned using data augmentation, dropout, class weights, and adjustments to other hyperparameters. For video interestingness, features are first extracted with a 3D convolutional network. Next, an LSTM network is trained and fine-tuned on these features. The final result is a binary label for each image or video: 1 for interesting, 0 for not interesting. Additionally, a confidence value is provided for each prediction. Finally, Mean Average Precision (MAP) is employed as the evaluation metric to assess the quality of the final results.
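
As an illustration of the evaluation metric, the following is a minimal sketch of how MAP could be computed from binary labels and confidence values. It assumes the common definition in which items are ranked by descending confidence and average precision is the mean of the precision values at each rank holding an interesting item. The function names, the grouping of items into ranked lists, and the example values are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, Sequence, Tuple


def average_precision(labels: Sequence[int], confidences: Sequence[float]) -> float:
    """Average precision for one ranked list of items.

    labels: 1 for interesting, 0 for not interesting (ground truth).
    confidences: predicted confidence that each item is interesting.
    """
    # Rank items from most to least confident.
    order = sorted(range(len(labels)), key=lambda i: confidences[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision among the top `rank` items.
            precision_sum += hits / rank
    # Assumed convention: a list without any interesting item scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    groups: Iterable[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """MAP: the mean of the average precision over all ranked lists."""
    scores = [average_precision(labels, confs) for labels, confs in groups]
    return sum(scores) / len(scores)


# Hypothetical example with two ranked lists (for instance, one per video).
example_groups = [
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    ([0, 1], [0.6, 0.4]),
]
map_score = mean_average_precision(example_groups)
```

In this sketch the confidence values only determine the ranking, so the metric rewards placing interesting items ahead of uninteresting ones rather than the exact confidence magnitudes.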