Abstract
Advisors: Xavier Giró-i-Nieto (UPC) and Matthias Zeppelzauer (TU Wien)
Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)
Aggregating point-based features currently yields highly competitive results in object recognition. The aggregation process, typically an average or max-pooling of the features, generates a single vector that represents the image or the region that contains the object.
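The aggregation step described above can be illustrated with a minimal sketch. The function names and the toy 4-dimensional descriptors are assumptions made for this example only; real local descriptors such as SIFT are much longer, and the thesis may pool them differently.

```python
from typing import List, Sequence


def average_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average every dimension over all local descriptors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def max_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Keep the maximum of every dimension over all local descriptors."""
    return [max(column) for column in zip(*features)]


# Three hypothetical local descriptors extracted from one image or region.
local_features = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 4.0],
    [2.0, 4.0, 1.0, 1.0],
]

# Either pooling turns a variable number of descriptors into one fixed-length vector.
avg_vector = average_pool(local_features)  # [2.0, 2.0, 1.0, 3.0]
max_vector = max_pool(local_features)      # [3.0, 4.0, 2.0, 4.0]
```

Whatever the number of points detected, both poolings return a vector of the same length, which is what allows a standard classifier to be applied afterwards.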
The aggregated point-based features typically describe the texture around the points with descriptors such as SIFT. These descriptors have limitations for wiry and textureless objects. A possible solution is to add shape-based information. Shape descriptors have previously been used to encode shape information and thus recognise these types of objects. However, they generally require an alignment step that matches every point of one shape to the points of the others, which makes the similarity assessment computationally expensive.
We propose to enrich location and texture-based features with shape-based ones. Two main architectures are explored. The first enriches the SIFT descriptors with shape information before they are aggregated. The second builds the standard Bag of Words histogram, concatenates a shape histogram to it, and classifies the result as a single vector.
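The difference between the two architectures can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the toy 2-dimensional "texture" descriptors standing in for SIFT, the per-point shape values, the codebooks and the precomputed shape histogram are all hypothetical, and nearest-codeword counting is used as a generic Bag of Words step rather than the exact procedure of the thesis.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to the descriptor."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Count codeword assignments and normalise the counts to sum to 1."""
    counts = [0.0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def enriched_descriptors(texture: Sequence[Vector], shape: Sequence[Vector]) -> List[List[float]]:
    """Architecture 1: append shape values to each local descriptor before aggregation."""
    return [list(t) + list(s) for t, s in zip(texture, shape)]


def bow_plus_shape(texture: Sequence[Vector], codebook: Sequence[Vector],
                   shape_histogram: Vector) -> List[float]:
    """Architecture 2: concatenate the texture BoW histogram with a shape histogram."""
    return bow_histogram(texture, codebook) + list(shape_histogram)


# Hypothetical toy data: three points, 2-D texture descriptors, 1-D shape values.
texture = [[0.0, 0.1], [0.9, 1.0], [1.0, 0.8]]
shape = [[0.2], [0.7], [0.9]]
texture_codebook = [[0.0, 0.0], [1.0, 1.0]]
enriched_codebook = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
shape_histogram = [0.25, 0.75]  # assumed to be computed elsewhere

enriched_vector = bow_histogram(enriched_descriptors(texture, shape), enriched_codebook)
bow_s_vector = bow_plus_shape(texture, texture_codebook, shape_histogram)
```

In the first case the shape information changes which codeword each point is assigned to, whereas in the second it only adds extra dimensions to the final vector passed to the classifier.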
We evaluate the proposed techniques and the novel features on the Caltech-101 dataset.
Results show that shape features increase the final performance. Of the two architectures, our extension of the Bag of Words with a shape-based histogram (BoW+S) performs better. However, for a high number of shape features, the performance of the BoW+S and enriched SIFT architectures tends to converge.
Final grade: A with honors (10/10)