Multiple object tracking is a widely used task, with applications ranging from bioengineering to security. In this paper we propose a variation of RVOS that additionally estimates the center of each detected instance, by means of a second head in the decoder that is tasked with detecting the arithmetic center of the corresponding object’s bounding box. We trained the model with three variants of the cross-entropy loss, each adapted to tackle the class imbalance that arises because the center of an object is represented by only one pixel of the image, and obtained promising results.
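
To illustrate the class imbalance mentioned above, the following is a minimal sketch and not the loss used in this work: the three variants are not detailed here, so positive-class weighting is assumed as one generic way to adapt cross-entropy. The function names `center_target` and `weighted_bce`, the box format (x_min, y_min, x_max, y_max), and the default weight (ratio of negative to positive pixels) are all hypothetical choices made for the example.

```python
import numpy as np


def center_target(height, width, box):
    """Build a one-hot map marking the arithmetic center of a bounding box.

    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed format).
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) // 2  # midpoint, rounded down to a pixel index
    cy = (y_min + y_max) // 2
    target = np.zeros((height, width), dtype=np.float64)
    target[cy, cx] = 1.0  # exactly one positive pixel per object
    return target


def weighted_bce(probs, target, pos_weight=None, eps=1e-7):
    """Binary cross-entropy with an extra weight on the positive (center) pixel.

    If `pos_weight` is None, it defaults to n_negative / n_positive
    (an assumption for this sketch), so that the single center pixel
    contributes as much as all background pixels together.
    """
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    if pos_weight is None:
        n_pos = target.sum()
        n_neg = target.size - n_pos
        pos_weight = n_neg / max(n_pos, 1.0)
    loss = -(pos_weight * target * np.log(probs)
             + (1.0 - target) * np.log(1.0 - probs))
    return loss.mean()


# Usage: an 8x8 image with one object; the predictor outputs 0.5 everywhere.
target = center_target(8, 8, (2, 2, 6, 4))
probs = np.full((8, 8), 0.5)
plain = weighted_bce(probs, target, pos_weight=1.0)  # unweighted cross-entropy
balanced = weighted_bce(probs, target)               # center pixel up-weighted
```

With a weight of 1 the single center pixel accounts for only 1/64 of the loss in this example, so a model can reach a low loss by predicting background everywhere; up-weighting the positive pixel is one simple way to counteract that.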