Recurrent Instance Segmentation using Sequences of Referring Expressions

Herrera-Palacio A, Ventura C, Silberer C, Sorodoc I-T, Boleda G, Giró-i-Nieto X. Recurrent Instance Segmentation using Sequences of Referring Expressions. In NeurIPS workshop on Visually Grounded Interaction and Language (ViGIL). Vancouver, Canada; 2019.


Abstract

The goal of this work is to segment the objects in an image that are referred to by a sequence of linguistic descriptions (referring expressions). We propose a deep neural network with recurrent layers that outputs a sequence of binary masks, one for each referring expression provided by the user. The recurrent layers in the architecture allow the model to condition each predicted mask on the previous ones, from a spatial perspective within the same image. Our multimodal approach uses off-the-shelf architectures to encode both the image and the referring expressions. The visual branch provides a tensor of pixel embeddings, which are concatenated with the phrase embeddings produced by a language encoder. Our experiments on the RefCOCO dataset for still images indicate that the proposed architecture successfully exploits the sequences of referring expressions to solve a pixel-wise instance segmentation task.
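The fusion and recurrence described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`fuse`, `segment_sequence`), the weight matrices, and the plain per-pixel tanh recurrence are all assumptions made for illustration; the abstract does not specify the recurrent layer type, and the real encoders are replaced here by precomputed embeddings.

```python
import numpy as np

def fuse(pixel_emb, phrase_emb):
    """Tile one phrase embedding over the spatial grid and concatenate it
    with the pixel embeddings along the channel axis.

    pixel_emb:  (H, W, C) tensor from the visual branch (assumed shape)
    phrase_emb: (D,) vector from the language encoder (assumed shape)
    """
    h, w, _ = pixel_emb.shape
    tiled = np.broadcast_to(phrase_emb, (h, w, phrase_emb.shape[0]))
    return np.concatenate([pixel_emb, tiled], axis=-1)  # (H, W, C + D)

def segment_sequence(pixel_emb, phrase_embs, w_in, w_state, w_out):
    """Predict one binary mask per referring expression.

    A hidden state is kept at every pixel and carried from one expression
    to the next, so each mask depends on the earlier steps. The tanh
    recurrence is a stand-in (assumption), not the layer used in the paper.
    """
    h, w, _ = pixel_emb.shape
    state = np.zeros((h, w, w_state.shape[0]))
    masks = []
    for phrase_emb in phrase_embs:
        fused = fuse(pixel_emb, phrase_emb)
        state = np.tanh(fused @ w_in + state @ w_state)  # (H, W, K)
        logits = state @ w_out                           # (H, W)
        masks.append(logits > 0)                         # threshold to a binary mask
    return np.stack(masks)                               # (N, H, W)
```

Called with `N` phrase embeddings, `segment_sequence` returns a boolean array of shape `(N, H, W)`, one mask per expression in input order.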

Recurrent Instance Segmentation with Linguistic Referring Expressions from Universitat Politècnica de Catalunya

Projects

	MALEGRA - Multimodal Signal Processing and Machine Learning on Graphs
	Deep learning
	Language and Vision

Image Processing Group
