Video Object Linguistic Grounding | Image Processing Group

Herrera-Palacio A, Ventura C, Giró-i-Nieto X. Video Object Linguistic Grounding. In ACM Multimedia Workshop on Multimodal Understanding and Learning for Embodied Applications (MULEA). Nice, France: ACM; 2019.


Abstract

The goal of this work is to segment, in a video sequence, the objects mentioned in a linguistic description of the scene. We adapt an existing deep neural network that achieves state-of-the-art performance in semi-supervised video object segmentation by adding a linguistic branch that generates an attention map over the video frames, making the segmentation of the objects temporally consistent along the sequence.
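
As a rough illustration of the idea of a linguistic branch producing an attention map over video frames, the following is a minimal NumPy sketch. It is not the architecture from the paper, whose details are not given on this page. It assumes that each frame has already been encoded into a feature map of shape (H, W, C) and that the description has been encoded into a single vector of length C. The function names `linguistic_attention` and `attend_video`, and the choice of scaled dot-product scoring with a spatial softmax, are hypothetical and chosen only for illustration.

```python
import numpy as np


def linguistic_attention(frame_features, sentence_embedding):
    """Spatial attention map for one frame (illustrative assumption, not the paper's method).

    frame_features: array of shape (H, W, C), visual features per location.
    sentence_embedding: array of shape (C,), encoding of the description.
    Returns an (H, W) map of non-negative weights that sum to 1.
    """
    h, w, c = frame_features.shape
    # Similarity between the sentence and the features at every location.
    scores = frame_features.reshape(h * w, c) @ sentence_embedding / np.sqrt(c)
    # Numerically stable softmax over all spatial locations.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights.reshape(h, w)


def attend_video(video_features, sentence_embedding):
    """Apply the same sentence embedding to every frame of a clip.

    video_features: array of shape (T, H, W, C).
    Returns (maps, attended) with shapes (T, H, W) and (T, H, W, C).
    """
    maps = np.stack(
        [linguistic_attention(frame, sentence_embedding) for frame in video_features]
    )
    # Re-weight the visual features by the attention at each location.
    attended = video_features * maps[..., None]
    return maps, attended
```

In this sketch the same sentence embedding conditions every frame, so locations whose features align with the description receive the largest weights throughout the clip. A real system would feed the re-weighted features into a segmentation decoder, which is omitted here.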

Paper in ACM Digital Library and UPCommons.
ACM Multimedia 2019 Workshop on Multimodal Understanding and Learning for Embodied Applications

Video Object Linguistic Grounding from Universitat Politècnica de Catalunya

Xavier Giro-i-Nieto and Amanda Duarte in ACM Multimedia 2019

Projects

	MALEGRA - Multimodal Signal Processing and Machine Learning on Graphs
	Language and Vision
