Salvador A, Baradad M, Giró-i-Nieto X, Marqués F. Recurrent Semantic Instance Segmentation. Long Beach, CA, USA: NIPS 2017 Women in Machine Learning Workshop; In Press.


We present a recurrent model for end-to-end instance-aware semantic segmentation that sequentially generates pairs of masks and class predictions. The proposed system is trained end-to-end, requires no post-processing of its output, and is conceptually simpler than current methods that rely on object proposals \cite{FullyConvolutionalIASS}. While recent works \cite{Paredes,RenZ16} have proposed recurrent architectures for instance segmentation, these are trained and evaluated on a single category.

Our model (depicted in Figure \ref{arch}) is composed of a series of Convolutional LSTMs \cite{ConvLSTM} applied in a chain, with upsampling layers in between, to predict a sequence of binary masks and associated class probabilities. Skip connections are incorporated by concatenating the output of the corresponding convolutional layer in the base model with the upsampled output of the ConvLSTM. Binary masks are finally obtained with a $1\times1$ convolution with sigmoid activation. To obtain the category of each predicted mask, we concatenate the side outputs of all ConvLSTM layers and apply a per-channel max-pooling operation followed by a single fully-connected layer with softmax activation.
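
The computation described above can be sketched as follows. This is an illustration under stated assumptions rather than a specification of the model, and the notation is introduced here only for that purpose: $t$ indexes the predicted object, $l = 1,\dots,L$ indexes the ConvLSTM layers, $F^{l}$ is the base-model feature map used by the skip connection at layer $l$, $\mathrm{up}(\cdot)$ is the upsampling operation, $[\cdot\,;\cdot]$ is channel-wise concatenation, $\ast$ is convolution and $\odot$ is the element-wise product. The sketch assumes the standard ConvLSTM update of \cite{ConvLSTM} without peephole connections, that the first ConvLSTM receives the base-model output directly ($x_t^{1} = F^{1}$), that the mask is read from the last layer, and that the per-channel max-pooling is taken over all spatial positions; the description above does not fix these details.

\[
\begin{array}{rcl}
x_t^{l} & = & [\,\mathrm{up}(h_t^{l-1})\,;\,F^{l}\,], \qquad l = 2,\dots,L \\[2pt]
i_t^{l} & = & \sigma\left(W_{xi}^{l} \ast x_t^{l} + W_{hi}^{l} \ast h_{t-1}^{l} + b_{i}^{l}\right) \\[2pt]
f_t^{l} & = & \sigma\left(W_{xf}^{l} \ast x_t^{l} + W_{hf}^{l} \ast h_{t-1}^{l} + b_{f}^{l}\right) \\[2pt]
o_t^{l} & = & \sigma\left(W_{xo}^{l} \ast x_t^{l} + W_{ho}^{l} \ast h_{t-1}^{l} + b_{o}^{l}\right) \\[2pt]
g_t^{l} & = & \tanh\left(W_{xg}^{l} \ast x_t^{l} + W_{hg}^{l} \ast h_{t-1}^{l} + b_{g}^{l}\right) \\[2pt]
c_t^{l} & = & f_t^{l} \odot c_{t-1}^{l} + i_t^{l} \odot g_t^{l} \\[2pt]
h_t^{l} & = & o_t^{l} \odot \tanh\left(c_t^{l}\right) \\[6pt]
M_t & = & \sigma\left(w_{m} \ast h_t^{L} + b_{m}\right) \qquad \mbox{($1\times1$ convolution)} \\[2pt]
v_t & = & [\,\mathrm{maxpool}(h_t^{1})\,;\,\dots\,;\,\mathrm{maxpool}(h_t^{L})\,] \\[2pt]
p_t & = & \mathrm{softmax}\left(W_{c}\, v_t + b_{c}\right)
\end{array}
\]

Under these assumptions, each step $t$ yields one pair $(M_t, p_t)$: a per-pixel mask probability map and a distribution over categories, matching the sequence of binary masks and class probabilities described above.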

We train and evaluate our models on the Pascal VOC 2012 dataset \cite{PASCAL_VOC}. Figure \ref{result} shows some example predictions from our model, where object coloring indicates the order in which objects were found by the network (i.e., 0: dark blue, 1: green, 2: red, 3: light blue). Future work will aim at analyzing and understanding the behavior of the network on other datasets, comparing the system with state-of-the-art solutions, and studying the relationship between the object discovery patterns learned by our model and those of humans.