Human manipulation dataset

Resource Type: Dataset
Date: 2016-02-05



The classical 2D image segmentation challenges have recently shifted to RGB-D (or point cloud) 3D data. New challenges have also emerged, as the depth information is usually noisy, sparse, and unorganized.

The lack of ground truth labeling for 3D point cloud data may limit the development and comparative evaluation of 3D segmentation methods. A new ground truth labeling for a previously published dataset [1] is available below. The labeling focuses on interaction (merges and splits between object point clouds), which differentiates it from the few existing labeled RGB-D datasets, mostly oriented to Simultaneous Localization And Mapping (SLAM) tasks.

We provide the 3D point cloud ground truth labeling for the original dataset in [1]: 

  • The original dataset is a calibrated RGB-D data stream recorded with a Kinect sensor at VGA (640x480) resolution and a 30 Hz frame rate. It contains an extended set of examples of human manipulation actions performed by 8 actors.
  • The new ground truth labels are provided point-wise for the 3D point cloud in each frame. This yields 7 labeled sequences, varying from a single "manipulator-object" attachment to multiple attachments, from low motion to higher motion, from two attached objects to several attached objects, etc.
  • Each sequence has around 700 frames, and all of them (over 4000 frames in total) are labeled with 3D point cloud ground truth.
  • Three 3D point cloud ground truth labeling examples are shown below.



  • Sequence1
  • Sequence2
  • Sequence3
  • Sequence4
  • Sequence5
  • Sequence6
  • Sequence7

Dataset organization

We provide the labeled 3D point cloud for the foreground of each frame in these 7 sequences. The ground truth for each sequence is packaged as a zip file whose download link is available above. For each frame in a sequence, the ground truth labeling is saved as a .mat file named "ds_gt~frameNum.mat". Each .mat file contains a matrix with 5 columns. The first three columns are the 3D coordinates (x-y-z), in meters, of a point in the frame's point cloud. The 5th column is the ground truth label for that point.
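As a minimal sketch of how one might read these per-frame files in Python: the 5-column layout is taken from the description above, but the variable name stored inside each .mat file (`"ds_gt"` below) is an assumption — inspect a file with `scipy.io.whosmat` to confirm the actual name.

```python
import numpy as np
from scipy.io import loadmat


def split_frame(data):
    """Split an Nx5 ground-truth matrix into coordinates and labels.

    Columns 1-3 hold the x-y-z coordinates of each point, in meters;
    column 5 holds the per-point ground truth label.
    """
    xyz = data[:, 0:3]               # 3D point coordinates (meters)
    labels = data[:, 4].astype(int)  # per-point ground truth label
    return xyz, labels


def load_frame(mat_path, var_name="ds_gt"):
    """Load one frame's .mat file and split it.

    NOTE: the variable name 'ds_gt' inside the .mat file is an
    assumption; check the real name with scipy.io.whosmat(mat_path).
    """
    return split_frame(loadmat(mat_path)[var_name])
```

For example, `xyz, labels = load_frame("ds_gt~0001.mat")` would give an Nx3 array of points and a length-N label vector for that frame.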


[1] A. Pieropan, G. Salvi, K. Pauwels, and H. Kjellström, "Audio-visual classification and detection of human manipulation actions," in IROS, 2014.

People involved

Xiao Lin PhD Candidate
Josep R. Casas Associate Professor
Montse Pardàs Professor

Related Publications

X. Lin, J. R. Casas, and M. Pardàs, "3D point cloud segmentation oriented to the analysis of interactions," in 24th European Signal Processing Conference (EUSIPCO 2016), Budapest, Hungary, 2016. (10.54 MB)