SGR14 - Image and Video Processing Group

Type: National
Start: Jan 2014
End: Apr 2017
Responsible: Josep R. Casas
URL: Image Processing Group

Reference

Ref. 2014 SGR 1421, Generalitat de Catalunya, Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR)

Description

The GPI has been recognized continuously as a Consolidated Research Group by the Catalan Government since 1999 (calls: 1999-2001, 2002-2005, 2005-2008, 2009-2013, 2014-2016). SGR14 - Image and Video Processing Group was the active GPI project in the 2014 AGAUR call.

This is a baseline project integrating the research lines of the GPI group in both Video Analysis and Video Representation. Graph and tree notions have naturally appeared in the research of the group for various image/data analysis and representation tasks. In some cases, the graph structure is already present in the data but, on many occasions, the structure has to be inferred from the raw data. In this field, the Group has extensive knowledge of tree-based representations of data through the structuring of level lines (Max/Min-tree [Salembier98]) or of elementary regions (Binary Partition Tree [Salembier00, Vilaplana08, Calderero10]), and of graph estimation techniques to represent, for example, 3D scenes [Casas06] or the human body [Canton11, Navarro12]. Some of these estimation approaches look for homogeneity or correlation in the data, whereas others search for discontinuities and lack of correlation. Can we bridge the gap between both approaches and get an improved representation? Combining homogeneity and correlation with discontinuities and lack of correlation could improve the estimation of data structures. The combination of tree-based representations with classical graph-based descriptions is also considered. In trees, the relationship between nodes generally represents inclusion, whereas in graphs the relationship generally expresses the notion of neighborhood. Little progress has been made toward combining both relationships. This project treats the issue as essential, since it opens the door to the notions of multi-resolution or hierarchical graphs.
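As a purely illustrative sketch (not the Group's implementation), the following Python code shows how an inclusion tree can be inferred from a neighborhood graph in the spirit of a Binary Partition Tree. The function name `build_bpt`, the integer region identifiers, and the merging criterion (smallest difference of mean values between neighboring regions) are all assumptions made for this example.

```python
# Minimal sketch: infer a binary partition tree from a region adjacency graph.
# All names are hypothetical; the similarity criterion (difference of region
# means) is an assumption chosen for simplicity.

def build_bpt(values, edges):
    """values: {region_id (int): mean value}; edges: (a, b) neighbor pairs.
    Returns (parent, children) dicts describing the inclusion tree."""
    mean = dict(values)
    size = {r: 1 for r in values}
    neighbors = {r: set() for r in values}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    parent, children = {}, {}
    next_id = max(values) + 1

    # Keep merging while at least one pair of neighboring regions remains.
    while any(neighbors[r] for r in neighbors):
        # Select the most similar pair of neighbors (ties broken by ids).
        a, b = min(
            ((p, q) for p in neighbors for q in neighbors[p] if p < q),
            key=lambda pair: (abs(mean[pair[0]] - mean[pair[1]]), pair),
        )
        new = next_id
        next_id += 1
        size[new] = size[a] + size[b]
        mean[new] = (mean[a] * size[a] + mean[b] * size[b]) / size[new]

        # Neighborhood relation: the merged region inherits both neighbor sets.
        neighbors[new] = (neighbors[a] | neighbors[b]) - {a, b}
        for n in neighbors[new]:
            neighbors[n] -= {a, b}
            neighbors[n].add(new)
        del neighbors[a], neighbors[b]

        # Inclusion relation: the merged region is the parent of both regions.
        parent[a] = parent[b] = new
        children[new] = (a, b)
    return parent, children


# Four elementary regions in a chain 0-1-2-3 with two clearly distinct groups.
parent, children = build_bpt(
    {0: 10.0, 1: 12.0, 2: 50.0, 3: 53.0},
    [(0, 1), (1, 2), (2, 3)],
)
```

In this toy case, regions 0 and 1 merge first, then 2 and 3, and the two resulting nodes merge into the root. The `neighbors` dictionary carries the graph (neighborhood) relationship at every merging step, while `parent` and `children` carry the tree (inclusion) relationship, which is the pairing of relationships discussed above.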

Once the graph or tree representation is constructed, it has to be processed to extract the pertinent information in the context of the application. Up to now, the Group has developed strategies to populate graphs and trees with application-specific features and to process these features. However, the sizes of the graphs were relatively limited and the processing approaches were fairly simple, including detection by thresholding or functional optimization via graph cut. The applications we want to be able to deal with in the future (such as search and retrieval in huge databases, earth monitoring through high-resolution remote sensing, and high-throughput applications in genomics and connectomics) will generate much larger graph or tree representations. Furthermore, the relevant features will be more difficult to estimate and often noisy, or even missing. Therefore, we want to take our ability to process “graph signals” and “tree signals” to the next level. Here, “graph and tree signals” are to be understood as graphs and trees that have been populated with features attached to either the vertices or the edges. Most of classical graph theory has focused on the analysis of the graph structure, whereas here we want to process the signals that are defined on a graph or tree support. Therefore, another major objective of the Group is to develop a full toolkit for graph signal processing. This topic is clearly emerging in the scientific community and we would like to participate and contribute to this developing field. GPI will develop signal processing tools in the context of graphs and trees, including convolution-based and morphological filters, frequency analysis and transforms, wavelet decomposition and processing in the transformed domain, down-sampling and up-sampling, interpolation, etc. Higher-level processing tools are also of interest, in particular segmentation and classification algorithms.
Note that, in this context, we are referring to segmentation or classification of the graph or tree structures themselves and not, as classically done, to using a graph structure to segment or classify the raw initial information (a typical example is using graph cut to segment an image). Simple generic models based on 3D Spatial-Color Gaussian Mixture Models will be used for basic graph segmentation (i.e. foreground-background), whereas higher-level or targeted models such as planar structures or part-based models will inform graph-based algorithms for producing the segmentation from the target information.
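To make the notion of a graph signal more concrete, here is a minimal Python sketch of a signal attached to the vertices of a small graph, together with one convolution-like filter and two morphological filters. It is only an illustration under stated assumptions: the function names, the dictionary-based graph layout, the use of the combinatorial Laplacian L = D - A, and the step size `alpha` are choices made for this example and are not taken from the project.

```python
# Sketch of elementary graph signal processing on a vertex-valued signal.
# adj maps each vertex to {neighbor: edge weight}; x maps each vertex to a value.
# All names and parameters here are hypothetical.

def laplacian_apply(adj, x):
    """Return Lx for the combinatorial Laplacian L = D - A:
    (Lx)[v] = sum over neighbors u of w(v, u) * (x[v] - x[u])."""
    return {v: sum(w * (x[v] - x[u]) for u, w in adj[v].items()) for v in adj}

def smooth(adj, x, alpha=0.25, steps=1):
    """Low-pass filter: apply (I - alpha * L) to the signal `steps` times.
    alpha is assumed small (e.g. below 1 / max weighted degree) so that
    the filter does not amplify any component of the signal."""
    for _ in range(steps):
        lx = laplacian_apply(adj, x)
        x = {v: x[v] - alpha * lx[v] for v in adj}
    return x

def dilate(adj, x):
    """Morphological dilation: maximum over each vertex and its neighbors."""
    return {v: max([x[v]] + [x[u] for u in adj[v]]) for v in adj}

def erode(adj, x):
    """Morphological erosion: minimum over each vertex and its neighbors."""
    return {v: min([x[v]] + [x[u] for u in adj[v]]) for v in adj}


# Path graph 0-1-2-3 with unit weights and an impulse at vertex 2.
adj = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0},
}
x = {0: 0.0, 1: 0.0, 2: 4.0, 3: 0.0}

smoothed = smooth(adj, x)  # the impulse spreads to the neighbors of vertex 2
dilated = dilate(adj, x)
eroded = erode(adj, x)
```

The smoothing step is a first-order polynomial in the Laplacian, which is one common way to define convolution-like filters on graphs. Because the filters only use the adjacency structure, the same code applies to a tree, since a tree is a particular case of graph.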

The theoretical foundations of this baseline project are at the same time a logical extension of the Group's previous background and the means to participate in and contribute to the emerging field of graph signal processing. Because of our long-term interest in hierarchical graph and tree representations, we believe that we will have a unique perspective in this field and a high contribution potential. This research work will allow us to face new challenges in the applications of which the Group has good knowledge.

Publications

Duarte A, Surís D, Salvador A, Torres J, Giró-i-Nieto X. Temporal-aware Cross-modal Embeddings for Video and Audio Retrieval. In: NIPS 2017 Women in Machine Learning Workshop (WiML). Long Beach, CA, USA; 2017.
Fernàndez D, Varas D, Espadaler J, Ferreira J, Woodward A, Rodríguez D, Giró-i-Nieto X, Riveiro JCarlos, Bou E. ViTS: Video Tagging System from Massive Web Multimedia Collections. In: ICCV 2017 Workshop on Web-scale Vision and Social Media. Venice, Italy; 2017.
Lidon A, Bolaños M, Dimiccoli M, Radeva P, Garolera M, Giró-i-Nieto X. Semantic Summarization of Egocentric Photo Stream Events. In: ACM Multimedia 2017 Workshop on Lifelogging Tools and Applications. Mountain View, CA, USA: ACM; 2017.
Varas D. Region-based Particle Filter Leveraged with a Hierarchical Co-clustering. Advisor: Marqués F. Signal Theory and Communications Department; 2016.
Gurrin C, Giró-i-Nieto X, Radeva P, Dimiccoli M, Johansen H, Joho H, Singh VK. LTA 2016 - The First Workshop on Lifelogging Tools and Applications. In: ACM Multimedia. Amsterdam, The Netherlands: ACM; 2016.
Pan J, McGuinness K, Sayrol E, O'Connor N, Giró-i-Nieto X. Shallow and Deep Convolutional Networks for Saliency Prediction. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR. Las Vegas, NV, USA: Computer Vision Foundation / IEEE; 2016.
Cabezas F, Carlier A, Salvador A, Giró-i-Nieto X, Charvillat V. Quality Control in Crowdsourced Object Segmentation. In: IEEE International Conference on Image Processing (ICIP); 2015.
Pan J, Giró-i-Nieto X. End-to-end Convolutional Network for Saliency Prediction. Boston, MA (USA): arXiv; 2015.
Mohedano E, Healy G, McGuinness K, Giró-i-Nieto X, O'Connor N, Smeaton AF. Improving Object Segmentation by using EEG signals and Rapid Serial Visual Presentation. Multimedia Tools and Applications. 2015.
Salvador A, Zeppelzauer M, Manchon-Vizuete D, Calafell A, Giró-i-Nieto X. Cultural Event Recognition with Visual ConvNets and Temporal Models. In: CVPR ChaLearn Looking at People Workshop 2015; 2015.
