Abstract
This paper proposes a system that relates objects in an image using occlusion cues and arranges them according to depth. The system does not rely on a priori knowledge of the scene structure and focuses on detecting special points, such as T-junctions and highly convex contours, to infer the depth relationships between objects in the scene. The system makes extensive use of the binary partition tree as hierarchical region-based image representation jointly with a new approach for candidate T-junction estimation. Since some regions may not involve T-junctions, occlusion is also detected by examining convex shapes on region boundaries. Combining T-junctions and convexity leads to a system which only relies on low level depth cues and does not rely on semantic information. However, it shows a similar or better performance with the state-of-the-art while not assuming any type of scene.