Canton-Ferrer C. Human Motion Capture with Scalable Body Models. Casas J, Pardàs M. Universitat Politècnica de Catalunya (UPC); 2009.


Capturing and tracking human motion is becoming an active research topic because of the wide range of applications that can exploit this information, from action recognition to human-computer interfaces and biometrics. This PhD thesis addresses the problem of extracting the pose parameters of a human body in a multi-camera environment using Monte Carlo techniques.

Extracting the parameters that describe the pose of an articulated model of the human body from the information provided by multiple cameras can be efficiently tackled using the standard Bayesian prediction and update formulation. However, due to the high dimensionality of the pose space, standard techniques based on linear and Gaussian assumptions are not suitable. Instead, Monte Carlo methods, which rely on a sampled representation of the involved likelihood functions, offer a promising research direction. In this thesis, we present a number of contributions to this topic based on a coarse-to-fine analysis scheme. The input data to all the presented algorithms is a 3D reconstruction of the scene described by colored voxels, which combines the information provided by all camera views into a unified data representation.
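For reference, the Bayesian prediction and update formulation mentioned above can be written in its standard textbook form, together with the sampled (particle) approximation on which Monte Carlo methods rely. The notation is ours, not the thesis's: x_t denotes the pose at frame t and z_{1:t} the voxel observations up to that frame. The weight rule in the last line assumes that the particles x_t^j are drawn from the motion model p(x_t | x_{t-1}), as in the basic bootstrap particle filter.

```latex
\begin{aligned}
p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}) &= \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid \mathbf{z}_{1:t-1})\, d\mathbf{x}_{t-1} && \text{(prediction)}\\
p(\mathbf{x}_t \mid \mathbf{z}_{1:t}) &\propto p(\mathbf{z}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{z}_{1:t-1}) && \text{(update)}\\
p(\mathbf{x}_t \mid \mathbf{z}_{1:t}) &\approx \sum_{j=1}^{N} w_t^{j}\, \delta(\mathbf{x}_t - \mathbf{x}_t^{j}), \qquad w_t^{j} \propto p(\mathbf{z}_t \mid \mathbf{x}_t^{j}) && \text{(particle approximation)}
\end{aligned}
```

Only the likelihood p(z_t | x_t) has to be evaluated pointwise at each particle, which is why no linearity or Gaussianity assumption is needed.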

In a first stage, subjects are coarsely approximated by an ellipsoid, and their centroids are estimated and tracked. A novel approach achieving real-time performance, the Sparse Sampling algorithm, is presented based on a surface sampling of the objects in the scene. In this filtering scheme, an independent tracker is assigned to every target, and an exclusion mechanism is defined to avoid interference among targets. Finally, the obtained centroid positions are used to initialize a specific pose estimation algorithm.
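The abstract does not detail the Sparse Sampling algorithm, so the sketch below does not implement it. It only illustrates the general idea of one independent tracker per target combined with an exclusion mechanism, under stated assumptions: the state is a 3D centroid, a noisy centroid measurement (`observation`) stands in for the voxel-based likelihood, the motion model is a random walk, and exclusion is modeled as zeroing the weight of any particle closer than a hypothetical `exclusion_radius` to another target's current estimate. The names `track_step` and `gaussian_likelihood` and all parameters are hypothetical.

```python
import math
import random


def gaussian_likelihood(particle, observation, sigma):
    """Hypothetical likelihood: isotropic Gaussian around an observed 3D centroid."""
    d2 = sum((p - o) ** 2 for p, o in zip(particle, observation))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def track_step(particles, observation, other_centroids, exclusion_radius=0.3,
               motion_sigma=0.05, obs_sigma=0.1, rng=random):
    """One frame of a per-target centroid particle filter with a simple exclusion rule."""
    # Predict: random-walk motion model (an assumption, not taken from the thesis).
    predicted = [tuple(c + rng.gauss(0.0, motion_sigma) for c in p) for p in particles]
    # Weight: particles inside another target's exclusion zone get zero weight,
    # so this tracker cannot drift onto a neighboring subject.
    weights = []
    for p in predicted:
        w = gaussian_likelihood(p, observation, obs_sigma)
        if any(math.dist(p, c) < exclusion_radius for c in other_centroids):
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        weights, total = [1.0] * len(predicted), float(len(predicted))
    weights = [w / total for w in weights]
    # Estimate: weighted mean of the predicted particles.
    estimate = tuple(sum(w * p[k] for w, p in zip(weights, predicted)) for k in range(3))
    # Resample (multinomial) to obtain the particle set for the next frame.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate
```

Each target would run its own `track_step`, passing the latest estimates of all other targets as `other_centroids`; the zero-weight rule guarantees that no resampled particle lies inside another target's exclusion zone unless the degenerate fallback is triggered.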

Two pose estimation algorithms are presented, both based on the seminal principle of the annealed particle filter. The first is a low-cost approach to marker-based human motion capture, and the second is a markerless technique relying on likelihood functions computed directly on the 3D voxel representation. In both approaches, kinematic constraints are employed to avoid infeasible poses. Although these algorithms provide satisfactory results when dealing with accurate input data, they tend to lose track when processing noisy measurements and occluded body parts.
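As a rough illustration of the annealed particle filter principle (a sequence of layers in which the likelihood is raised to an increasing exponent while the diffusion noise shrinks), the sketch below processes one frame on a generic pose vector. It is not the thesis's implementation: the function `annealed_pf_step`, the exponent schedule `betas`, the noise schedule `noise`, and the callables `log_likelihood` and `is_feasible` are hypothetical, and kinematic constraints are modeled simply as a feasibility test that gives infeasible poses zero weight.

```python
import math
import random


def annealed_pf_step(particles, log_likelihood, is_feasible, betas=(0.2, 0.5, 1.0),
                     noise=(0.3, 0.15, 0.05), rng=random):
    """One frame of a simplified annealed particle filter (illustrative sketch).

    particles      -- list of pose vectors (tuples of floats)
    log_likelihood -- hypothetical callable: pose -> log p(z | pose)
    is_feasible    -- hypothetical kinematic-constraint check: pose -> bool
    betas          -- increasing exponents, one per annealing layer
    noise          -- decreasing diffusion standard deviations, one per layer
    """
    for beta, sigma in zip(betas, noise):
        # Diffuse the particles; the noise shrinks from layer to layer.
        particles = [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in particles]
        # Weight with the likelihood raised to beta; infeasible poses get zero weight.
        logw = [beta * log_likelihood(p) if is_feasible(p) else -math.inf for p in particles]
        m = max(logw)
        if m == -math.inf:
            raise ValueError("no feasible particle in this annealing layer")
        weights = [math.exp(lw - m) for lw in logw]  # subtract max for numerical stability
        particles = rng.choices(particles, weights=weights, k=len(particles))
    # Pose estimate: mean of the final (resampled) particle set.
    n = len(particles)
    estimate = tuple(sum(p[k] for p in particles) / n for k in range(len(particles[0])))
    return particles, estimate
```

Raising the likelihood to a small exponent in the early layers flattens it, so the particles can explore broadly before the later, sharper layers concentrate them around a mode.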

The scalability of the structure of the human body is exploited to define two robust alternatives for analyzing faulty data. The first, the Scalable Human Body Model-Annealed Particle Filter, is a filtering approach that adds an extra annealing level to the classical annealed particle filter: the body hierarchy annealing loop. In this way, the model is fitted progressively in a coarse-to-fine manner, yielding results that are both more efficient and more accurate. The second alternative employs a human body model hierarchy in which different limbs are added progressively to the model. This allows detecting the parts that are occluded (for instance, by furniture) and disregarding them in the likelihood evaluation step of the filtering scheme.
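The body hierarchy annealing loop can be pictured as an outer loop over levels of the body model wrapped around the usual annealing layers. The sketch below assumes a hypothetical partition of the pose vector into `levels` (for instance, torso parameters first, then limbs) and a hypothetical per-part likelihood `part_log_likelihood`. At each level, only the newly added parameters are perturbed, and the likelihood covers all body parts added so far. It illustrates the coarse-to-fine idea only and is not the Scalable Human Body Model-Annealed Particle Filter itself; `hierarchical_annealing` and its parameters are hypothetical.

```python
import math
import random


def hierarchical_annealing(particles, levels, part_log_likelihood, betas=(0.3, 1.0),
                           sigma=0.1, rng=random):
    """Coarse-to-fine fitting over a body-model hierarchy (illustrative sketch).

    levels              -- list of index lists; levels[k] holds the pose parameters
                           introduced at hierarchy level k (hypothetical partition)
    part_log_likelihood -- hypothetical callable (pose, k) -> log-likelihood of the
                           body parts that belong to level k
    """
    active = []  # hierarchy levels whose parts take part in the likelihood
    for k, dims in enumerate(levels):
        active.append(k)
        for beta in betas:  # inner annealing loop for this hierarchy level
            # Perturb only the parameters introduced at this level.
            particles = [tuple(x + rng.gauss(0.0, sigma) if i in dims else x
                               for i, x in enumerate(p)) for p in particles]
            # Tempered likelihood of all body parts added so far.
            logw = [beta * sum(part_log_likelihood(p, j) for j in active) for p in particles]
            m = max(logw)
            weights = [math.exp(lw - m) for lw in logw]
            particles = rng.choices(particles, weights=weights, k=len(particles))
    # Pose estimate: mean of the final particle set.
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(len(particles[0])))
```

Perturbing only the newly introduced parameters keeps the already-fitted coarse parts stable while the finer parts are searched, which is one possible reading of the progressive fitting described above.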

Finally, a new methodology is presented to evaluate all the systems proposed in this thesis. Existing methods based on computing the mean and variance of the estimation error tend to produce biased figures when a subset of the human body is not tracked properly. We propose two alternative metrics that avoid this situation and therefore allow a fairer comparison among algorithms.
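A small worked example with hypothetical numbers shows how a single lost body part distorts the mean and standard deviation of the error. Assume J = 10 joints, nine of them tracked with an error of 20 mm and one lost with an error of 500 mm:

```latex
\begin{aligned}
\bar{e} &= \frac{1}{J}\sum_{j=1}^{J} e_j = \frac{9 \cdot 20 + 500}{10} = 68\ \text{mm}\\
\sigma_e &= \sqrt{\frac{1}{J}\sum_{j=1}^{J} \left(e_j - \bar{e}\right)^2}
          = \sqrt{\frac{9 \cdot 48^2 + 432^2}{10}} = \sqrt{20736} = 144\ \text{mm}
\end{aligned}
```

Neither figure reveals that nine of the ten joints are tracked to within 20 mm. The example only illustrates the bias; it is not one of the two metrics proposed in the thesis.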