The 'old' remote control falls short of today's requirements given the digital convergence of living-room displays. Enriched options to watch, manage and interact with content on large displays demand improved means of interaction. Concurrently, gesture recognition is increasingly present in human-computer interaction, notably for gaming applications. In this paper we propose a gesture localization framework for the interactive display of audio-visual content. The proposed framework operates on range data captured by a single consumer depth camera. We focus on still gestures because they are generally user friendly (users do not have to make complex and tiring movements) and allow the problem to be formulated in terms of object localization. Our method is based on random forests, which have shown excellent performance on classification and regression tasks. In this work, however, we address a specific class of localization problems involving highly unbalanced data: positive examples occupy only a small fraction of space and time. We study the impact of this natural imbalance on random forest learning and propose a framework to robustly detect gestures in range images in real applications. Our experiments on offline data show the effectiveness of our approach. We also present a real-time application in which users control a TV display with a reduced set of still gestures.
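To see why the imbalance matters for tree-style learners, the following toy sketch (not the paper's code; the data and the single-split "stump" are hypothetical stand-ins for a forest's node split) trains a one-threshold classifier on data with 100 negatives and only 3 positives. With uniform sample weights the lowest-error rule is to predict "negative" everywhere, i.e. the rare gesture class is never detected; reweighting the classes to equal total mass makes the split recover the positive region:

```python
def best_stump(samples, weights):
    """Return the threshold t minimizing the weighted error of the rule
    'predict positive iff x >= t'. samples is a list of (x, label)."""
    xs = sorted({x for x, _ in samples})
    candidates = xs + [xs[-1] + 1]  # last candidate predicts all-negative
    best_err, best_t = float("inf"), None
    for t in candidates:
        err = sum(w for (x, y), w in zip(samples, weights)
                  if (x >= t) != (y == 1))
        if err < best_err:
            best_err, best_t = err, t
    return best_t

# Highly unbalanced toy data: 100 negatives spread over x = 0..9,
# only 3 positives clustered at x = 7, 8, 9 (overlapping the negatives).
negatives = [(x, 0) for x in range(10) for _ in range(10)]
positives = [(x, 1) for x in (7, 8, 9)]
samples = negatives + positives

# Uniform weights: ignoring all positives costs only 3 errors, so the
# stump degenerates to the all-negative rule (threshold past the data).
t_uniform = best_stump(samples, [1.0] * len(samples))

# Balanced weights: each class gets equal total weight, so missing the
# positives is now as costly as misclassifying the negative overlap.
w_pos = len(negatives) / len(positives)
balanced = [1.0 if y == 0 else w_pos for _, y in samples]
t_balanced = best_stump(samples, balanced)
```

Under uniform weighting `t_uniform` lands beyond all the data (the degenerate all-negative rule), while `t_balanced` splits at the start of the positive cluster; the same trade-off arises at every node of a forest trained on raw unbalanced range data.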

Demos and Resources