Abstract
This paper presents a human action recognition framework based on the theory of nonlinear dynamical systems. The ultimate aim of our method is to recognize actions from multi-view video. We estimate and represent human motion by means of a virtual skeleton model, which provides the basis for a view-invariant representation of human actions. Actions are modeled as a set of weighted dynamical systems associated with different model variables. We apply time-delay embeddings to the time series that result from the evolution of the model variables over time, reconstructing phase portraits of appropriate dimensions. These phase portraits characterize the underlying dynamical systems. We propose a distance to compare trajectories within the reconstructed phase portraits. These distances are used to train SVM models for action recognition. Additionally, we propose an efficient method to learn a set of weights reflecting the discriminative power of each model variable for each action class. Our approach behaves well on noisy data, even when action sequences last only a few frames. Experiments with marker-based and markerless motion capture data show the effectiveness of the proposed method. To the best of our knowledge, this contribution is the first to apply time-delay embeddings to data obtained from multi-view video.
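To illustrate the time-delay embedding step, the following minimal Python sketch maps a scalar time series x to delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau]), whose sequence traces a trajectory in the reconstructed phase portrait. The function name `delay_embed`, the parameters `dim` and `tau`, and the sample joint-angle values are assumptions made for illustration only; the sketch does not reproduce the paper's choice of embedding parameters, its distance, or its weighting scheme.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, tau: int) -> List[Tuple[float, ...]]:
    """Return the delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau]).

    dim is the embedding dimension and tau the delay in frames
    (both are hypothetical parameter names chosen for this sketch).
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    # Number of complete delay vectors that fit in the series.
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for this dim and tau")
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(n_points)
    ]


# Hypothetical joint-angle samples, one value per frame.
angle = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
portrait = delay_embed(angle, dim=3, tau=2)
```

Each model variable of the skeleton would yield one such trajectory. Note that a series of N frames produces only N - (dim - 1) * tau points, so short action sequences constrain the usable values of `dim` and `tau`.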