SGR17 - Image and Video Processing Group

Type: National
Start: Jan 2017
End: Sep 2021
Responsible: Josep R. Casas
URL: Image Processing Group


Ref. 2017 SGR 1768, Generalitat de Catalunya, Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR)


The GPI has been a Consolidated Research Group of the Catalan Government continuously since 1999 (calls: 1999-2001, 2002-2005, 2005-2008, 2009-2013, 2014-2016, 2017-2021). SGR17 - Image and Video Processing Group was the active GPI project in the 2017 call of the AGAUR.


Scientific challenges and objectives

The development of technologies for the capture, storage, search, distribution, transfer, analysis and visualization of ever-growing heterogeneous datasets poses tremendous difficulties as well as opportunities, and has become a major trend in the field of Information and Communication Technology. The importance of the related research has been recognized at the European level, as in the Horizon 2020 program, where it finds its place both in the Industrial Leadership pillar, for example in the activity line “Content technologies and information management”, and in the Societal Challenges, relating to the need for structuring data in all sectors of the digital economy (health, climate, transport, energy, etc.). At the Spanish level, this issue has also been recognized as one of the important challenges of the “Plan Estatal de Investigación Científica y Técnica y de Innovación”. In particular, the challenge for “Economía y Sociedad Digital” highlights the need for the “Development, Innovation and Adoption of Solutions and Technologies for Open/Linked/Big Data”. Furthermore, the application of vision analysis technologies is already having an industrial impact. In the Barcelona area, Advanced Driver Assistance Systems (ADAS) have been adopted as a strategic objective by the Catalan automotive industry (Seat, Volkswagen) and by UPC, which launched the CARNET initiative. CARNET is currently getting involved in a 2018 KIC on Urban Mobility, led at UPC by E. Sayrol, a member of our group.

GPI will develop and study tools combining, on the one hand, graph signal representation and processing with, on the other hand, machine learning technology. Machine learning will serve several purposes: providing a classification decision, learning a mapping or a model to be used in a data processing architecture, learning features that outperform handcrafted ones, or aggregating several features to create a signal to be further processed. We envision four main objectives focusing on how graph representation and processing will interact with machine learning.
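To make the combination of graph representation and learned features concrete, the following minimal sketch (not part of the project itself; the function names and the use of plain numpy are assumptions for illustration) builds a k-nearest-neighbour similarity graph from per-node feature vectors and forms its combinatorial Laplacian, the basic object on which graph signal processing tools operate:

```python
import numpy as np

def knn_graph(features, k=3):
    """Symmetric k-nearest-neighbour adjacency matrix from per-node
    feature vectors, with Gaussian edge weights (illustrative sketch)."""
    n = len(features)
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    sigma2 = np.median(d2) + 1e-12            # bandwidth from median distance
    w = np.exp(-d2 / sigma2)
    np.fill_diagonal(w, 0.0)                  # no self-loops
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(-w[i])[:k]:       # keep k strongest neighbours
            adj[i, j] = adj[j, i] = w[i, j]   # symmetrize
    return adj

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - W."""
    return np.diag(adj.sum(1)) - adj

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 5))   # stand-in for features learned by a network
L = laplacian(knn_graph(feats))
print(L.shape)                     # (10, 10)
```

In this picture, the features populating each node could come from any of the learning strategies listed below, while the Laplacian supports the filtering and smoothing operations discussed in the objectives.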

  • Graph-based multimodal representation and machine learning classification: First, in the context of multimodal data representation and analysis, graphs will be used to represent the multimodal data and a detection/classification task will be performed by machine learning. The results will be compared to more classical methods that are not based on machine learning. This approach will be explored for multimodal (faces, overlaid text, speech, etc.) person identification as well as for visual question answering in the context of search and retrieval applications. A similar processing strategy will be used for lifelogging or well-being & mental health monitoring, where multimodal graph representation will combine physiological signals with egocentric vision.

  • Graph combination through machine learning mappings: Machine learning can also be used to estimate a model or a mapping function. For example, in the context of biomedical applications, graph representations will be used to combine MRI images with other information about the patient, and regression will characterize the progress of a phenomenon such as gray matter evolution in Alzheimer's disease. The inference of a model through machine learning techniques also plays an essential role in the merging or fusion of graphs. For example, in the case of multi-view plus depth information, graphs may represent the relation between the 2D segmented regions of the different views. Furthermore, merging of graphs representing multiple modalities in biomedical, ADAS or remote sensing applications will be tackled in our research, since only very simple merging algorithms are currently used. We will investigate whether machine learning techniques can provide better aggregation models.
  • Machine learned features: Deep learning architectures are currently being extensively studied, in particular for their ability to infer very powerful features and overcome limitations related to handcrafted features. Our research group has already gained some experience in this field, in particular for saliency prediction, and this research will build on that knowledge to further investigate deep learning architectures and to apply them to various fields including saliency prediction, specific object detection (e.g. automotive applications) and characterization of stages in Alzheimer's disease.
  • Aggregated machine learned features processed over graph representations: Assuming a graph representing an image or a video sequence has been inferred and its nodes populated with features, we will not use machine learning to make a decision on the presence of a certain object or event, but rather as a means to aggregate the various features and estimate a likelihood function that will re-populate the graph structure. Then, taking advantage of the correlation described by the graph (in particular for hierarchical graphs), the likelihood values will be further processed with graph signal tools, such as graph filters or graph morphology, to improve the robustness of the final detection. This approach will be investigated in particular for foreground/background segregation in video sequences and for object detection in remote sensing applications (hyper-spectral and SAR images).
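The last objective, regularizing per-node likelihoods with a graph filter, can be sketched as follows. This is an illustrative toy, not the project's actual pipeline: the function name `graph_lowpass`, the diffusion-style filter, and the chain-graph example are all assumptions chosen to show how graph correlation can suppress an outlier likelihood.

```python
import numpy as np

def graph_lowpass(adj, signal, alpha=0.5, iters=10):
    """Simple low-pass graph filter: repeatedly mix each node's value
    with the degree-normalized average of its neighbours, while pulling
    back toward the original signal (illustrative sketch)."""
    deg = adj.sum(1)
    p = adj / np.maximum(deg, 1e-12)[:, None]   # row-stochastic transition matrix
    x = signal.astype(float)
    for _ in range(iters):
        x = (1 - alpha) * signal + alpha * (p @ x)
    return x

# Toy chain graph: 5 nodes in a line, with noisy foreground likelihoods.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

noisy = np.array([0.9, 0.1, 0.8, 0.9, 0.85])   # node 1 is an isolated outlier
smooth = graph_lowpass(adj, noisy)
print(smooth.round(2))
```

After filtering, the outlier node is pulled toward its neighbours' likelihoods while confident regions are largely preserved, which is the kind of robustness gain the objective above targets before the final detection step.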