UPC at CVPRW ActivityNet Challenge 2016

Resource Type: Software
Date: 2016-06-18


Main Contributor: Alberto Montes (BSc thesis at UPC ETSETB TelecomBCN Spring 2016).

Secondary Contributors: Santiago Pascual de la Puente, Amaia Salvador, Ignasi Esquerra and Xavier Giró-i-Nieto.

This software contains our proposed solution for both the classification and detection tasks of the ActivityNet Challenge 2016. We propose a system consisting of two stages. First, each video is split into 16-frame clips, from which we individually extract both audio and visual features: visual features come from a pretrained 3D convolutional network (C3D), while audio is represented with MFCC coefficients. On top of these features, we train a recurrent neural network to predict the activity sequence of each video at the granularity of the 16-frame clips.
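The per-clip pipeline above (fuse C3D visual features with audio MFCCs, then run a recurrent network over the clip sequence) can be sketched as follows. This is a minimal plain-numpy illustration, not the trained model from the repo: the feature dimensions, hidden size, and number of classes are assumptions for the sake of the example, and the weights are random.

```python
import numpy as np

# Hypothetical dimensions -- the page does not state the exact sizes used.
C3D_DIM, MFCC_DIM, HIDDEN, N_CLASSES = 4096, 80, 512, 201
rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy RNN parameters (randomly initialized; the actual network was trained).
W_xh = rng.standard_normal((C3D_DIM + MFCC_DIM, HIDDEN)) * 0.01
W_hh = rng.standard_normal((HIDDEN, HIDDEN)) * 0.01
W_hy = rng.standard_normal((HIDDEN, N_CLASSES)) * 0.01

def predict_clip_sequence(visual, audio):
    """Per-clip class probabilities for a video of T 16-frame clips."""
    T = visual.shape[0]
    x = np.concatenate([visual, audio], axis=1)  # fuse both modalities per clip
    h = np.zeros(HIDDEN)
    probs = np.empty((T, N_CLASSES))
    for t in range(T):
        # Simple tanh RNN step over the clip sequence.
        h = np.tanh(x[t] @ W_xh + h @ W_hh)
        probs[t] = softmax(h @ W_hy)
    return probs

# A 10-clip video with random stand-in features.
p = predict_clip_sequence(rng.standard_normal((10, C3D_DIM)),
                          rng.standard_normal((10, MFCC_DIM)))
print(p.shape)  # one activity distribution per 16-frame clip
```

The output is one class distribution per clip, which supports both tasks: averaging over clips gives a video-level label for classification, while thresholding the per-clip predictions yields temporal segments for detection.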

Our submission obtained a mAP of 0.58741 in the classification task and a mAP of 0.22369 in the detection task, according to the ActivityNet 2016 leaderboard.

Find the software and details in our repo on GitHub.

People involved

Amaia Salvador, PhD Candidate
Xavier Giró, Associate Professor