Abstract

Crowded video sequences, such as footage of demonstrations, pose a challenge for object extraction and tracking because of their complexity: they are shot outdoors, often under extreme illumination, with non-frontal faces, perspective effects, and complex backgrounds. The high number of occlusions makes tracking individuals difficult. To address these problems, a mutual-feedback spatial-temporal detection algorithm is proposed. Spatial detection is based on skin color classification and shape analysis with morphological tools; temporal tracking is based on optical flow analysis. The two stages cooperate, each feeding its results back to the other, which improves both spatial detection and temporal tracking. To handle multiple occlusions, a graph-based approach that exploits neighborhood consistency is introduced.