Lin X, Casas J, Pardàs M. Temporally Coherent 3D Point Cloud Video Segmentation in Generic Scenes. IEEE Transactions on Image Processing. 2018;27(6):3087-3099.


Video segmentation is an important building block for high-level applications such as scene understanding and interaction analysis. State-of-the-art learning-based and model-based methods achieve outstanding results in this field, but they are either restricted to certain types of scenes or require a large amount of annotated training data to segment objects in generic scenes. RGBD data, widely available since the introduction of consumer depth sensors, provides real-world 3D geometry that 2D images lack. This explicit geometry greatly helps in computer vision tasks, but the lack of annotations for this type of data may hinder the extension of learning-based methods to RGBD. In this paper, we present a novel generic segmentation approach for 3D point cloud video (stream data) that thoroughly exploits the explicit geometry in RGBD. Our proposal is based only on low-level features, such as connectivity and compactness. We exploit temporal coherence by representing the rough estimate of the objects in a single frame with a hierarchical structure and propagating this hierarchy over time. The hierarchical structure provides an efficient way to establish temporal correspondences at different scales of object connectivity and to manage the splits and merges of objects over time. This allows the segmentation to be updated according to the evidence observed in the history. The proposed method is evaluated on several challenging datasets, with promising results.
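
As a rough illustration of the ideas named in the abstract, the sketch below groups points by connectivity at several distance scales and then matches segments between two frames. It is not the authors' algorithm. The function names, the distance thresholds, the voxel size, and the voxel-overlap matching rule are all assumptions chosen to keep the example small and dependency-free.

```python
from collections import Counter
from itertools import combinations
from math import dist, floor


def connected_components(points, radius):
    """Label points so that two points share a label when a chain of
    neighbours closer than `radius` links them (union-find, O(n^2))."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < radius:
            parent[find(i)] = find(j)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


def build_hierarchy(points, radii):
    """One labelling per connectivity scale, ordered from fine to coarse."""
    return {r: connected_components(points, r) for r in sorted(radii)}


def voxel_key(point, size):
    """Integer voxel coordinates of a point (hypothetical helper)."""
    return tuple(floor(c / size) for c in point)


def match_segments(prev_pts, prev_labels, curr_pts, curr_labels, voxel=0.5):
    """Map each current segment to the previous segment whose occupied
    voxels it overlaps most. Segments with no overlap are left unmatched."""
    prev_vox = {voxel_key(p, voxel): l for p, l in zip(prev_pts, prev_labels)}
    votes = {}
    for p, l in zip(curr_pts, curr_labels):
        key = voxel_key(p, voxel)
        if key in prev_vox:
            votes.setdefault(l, Counter())[prev_vox[key]] += 1
    return {l: c.most_common(1)[0][0] for l, c in votes.items()}


# Two small objects about one metre apart; frame_b is the same scene
# shifted slightly and listed in a different order.
frame_a = [(0.0, 0, 0), (0.05, 0, 0), (0.1, 0, 0), (1.0, 0, 0), (1.05, 0, 0)]
frame_b = [(1.02, 0, 0), (1.07, 0, 0), (0.02, 0, 0), (0.07, 0, 0), (0.12, 0, 0)]

hier_a = build_hierarchy(frame_a, radii=[0.08, 2.0])
hier_b = build_hierarchy(frame_b, radii=[0.08, 2.0])
matches = match_segments(frame_a, hier_a[0.08], frame_b, hier_b[0.08])
print(hier_a)
print(matches)
```

At the fine scale the two objects are separate segments, while at the coarse scale they merge into one, which is the kind of multi-scale structure a hierarchy can expose. The matching step then carries segment identities from one frame to the next even though the label numbers differ between frames.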