Abstract

Advisors: Xavier Giró-i-Nieto (UPC) and Omar Pera (Pixable)

Degree: Electronic Engineering (5 years) at Telecom BCN-ETSETB (UPC)

This final degree thesis summarizes the work carried out during an internship at Pixable Inc. in New York City, together with the tasks related to the MediaEval 2013 evaluation campaign, in which I participated as part of the Universitat Politècnica de Catalunya (UPC) team. The main focus of my work was the Photofeed service, a cloud-based photo archive service.

The popularisation of cloud photo storage has opened new opportunities and challenges for the organization and extension of photo collections. In my thesis I have developed a computationally light solution for clustering web photos by social event. The proposal first oversegments each user's photo collection based on temporal cues, as previously proposed in the PhotoTOC algorithm [Platt et al., PACRIM 2003]. In a second stage, the resulting mini-clusters are merged based on contextual metadata such as geolocation, keywords and user IDs.
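The two-stage idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Photo` fields, the function names, the fixed one-hour gap (PhotoTOC itself uses an adaptive threshold on time gaps) and the merge criteria with their thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class Photo:
    # Hypothetical record; field names are assumptions for this sketch.
    user_id: str
    timestamp: float  # seconds since epoch
    lat: float
    lon: float
    tags: frozenset = frozenset()


def split_by_time(photos, max_gap_s=3600.0):
    """Stage 1: over-segment one user's photos at every large time gap.

    Simplification: a fixed gap replaces PhotoTOC's adaptive threshold.
    """
    clusters = []
    for p in sorted(photos, key=lambda q: q.timestamp):
        if clusters and p.timestamp - clusters[-1][-1].timestamp <= max_gap_s:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    h = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def centroid(cluster):
    n = len(cluster)
    return sum(p.lat for p in cluster) / n, sum(p.lon for p in cluster) / n


def similar(a, b, max_km=1.0, min_jaccard=0.5, max_dt_s=86400.0):
    """Assumed merge test: close in time, and close in space or sharing tags."""
    if abs(a[0].timestamp - b[0].timestamp) > max_dt_s:
        return False
    (lat_a, lon_a), (lat_b, lon_b) = centroid(a), centroid(b)
    near = haversine_km(lat_a, lon_a, lat_b, lon_b) <= max_km
    tags_a = set().union(*(p.tags for p in a))
    tags_b = set().union(*(p.tags for p in b))
    union = tags_a | tags_b
    jaccard = len(tags_a & tags_b) / len(union) if union else 0.0
    return near or jaccard >= min_jaccard


def merge_clusters(clusters):
    """Stage 2: greedily merge mini-clusters that look like the same event."""
    merged = []
    for c in clusters:
        for m in merged:
            if similar(m, c):
                m.extend(c)
                break
        else:
            merged.append(list(c))
    return merged
```

In this sketch, `split_by_time` would be run once per user and the resulting mini-clusters from all users concatenated before calling `merge_clusters`, so that photos of the same event taken by different users can end up together.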

Mail classification is closely related to photo clustering, and additional tasks in this field were carried out for the company Contactive. To solve the problems Contactive was facing in mail analysis, I developed methods for automatically identifying signature blocks and reply lines in plain-text email messages. This analysis has many potential applications, such as preprocessing email for text-to-speech systems, anonymizing email corpora, improving automatic content-based mail classifiers, and email threading. The approach applies machine learning methods to a sequential representation of an email message, in which each email is represented as a sequence of lines and each line as a set of features.
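The sequential representation can be sketched as follows. This is only an illustration: the specific features, the regular expressions and the function names are assumptions, not the feature set used in the thesis, and the learning step itself is omitted. Each line becomes a dictionary of features, extended with the features of its neighbouring lines so that a classifier could exploit the line order.

```python
import re

# Hypothetical patterns chosen for this sketch.
REPLY_HEADER = re.compile(r"^On .+ wrote:\s*$")
SIG_SEPARATOR = re.compile(r"^--\s*$")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def line_features(line, index, total):
    """Describe a single line by a small set of boolean features."""
    stripped = line.strip()
    return {
        "blank": stripped == "",
        "quoted": stripped.startswith(">"),
        "reply_header": bool(REPLY_HEADER.match(stripped)),
        "sig_separator": bool(SIG_SEPARATOR.match(stripped)),
        "has_phone": bool(PHONE.search(stripped)),
        "has_email": bool(EMAIL.search(stripped)),
        "in_last_10_lines": total - index <= 10,
    }


def email_to_sequence(text, window=1):
    """Turn an email into a sequence of per-line feature dictionaries.

    Each line also receives copies of the features of up to `window`
    previous and next lines, prefixed with "prevN_" or "nextN_".
    """
    lines = text.splitlines()
    base = [line_features(l, i, len(lines)) for i, l in enumerate(lines)]
    sequence = []
    for i, feats in enumerate(base):
        row = dict(feats)
        for off in range(1, window + 1):
            for sign, name in ((-1, "prev"), (1, "next")):
                j = i + sign * off
                if 0 <= j < len(base):
                    for key, value in base[j].items():
                        row[f"{name}{off}_{key}"] = value
        sequence.append(row)
    return sequence
```

The resulting list of dictionaries could then be passed to any classifier or sequence labeller that predicts, for each line, whether it belongs to a signature block, a reply line or the message body.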

Final grade: A with honors (10/10)

 

Daniel Manchon