DeeLight: Efficient Deep Learning for Video Sequences and Point Clouds

Type: National
Start: Sep 2021
End: Aug 2024
Responsible: Xavier Giro-i-Nieto & Philippe Salembier

Data, computation and large neural networks are the three key components behind the recent success of deep learning methods. The recent history of deep learning research points to an ever-growing availability of high-quality data, either in the form of datasets or simulated environments. The generalized version of Moore's law suggests a similar trend for computational power, with the cost of each unit of computation decreasing exponentially over time. Large-scale computation and data were necessary conditions to unlock the potential of deep learning, but they alone do not explain the recent advances in the field.

On the one hand, the classic supervised learning approach, in which all training data is available together with human-generated labels, does not scale to large data volumes. The cost of human annotation is often the most limiting factor when bringing deep learning solutions into production. This limitation has motivated advances in data-efficient machine learning, in fields such as self-supervised learning, distillation, incremental learning, learning from simulators, and unsupervised learning. All these approaches aim to extract more value from a fixed annotation budget.
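As a concrete illustration of one of the data-efficiency techniques named above, knowledge distillation trains a small student network to match the softened output distribution of a larger teacher, so the student can learn from the teacher's predictions rather than from additional labeled data. The sketch below is a minimal NumPy implementation of the temperature-scaled distillation loss; the function names, logits, and temperature value are illustrative assumptions, not part of this project.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-top classes.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between the softened teacher and student distributions,
    # the core term of the classic distillation objective.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# A student that exactly matches the teacher incurs zero loss;
# any mismatch yields a positive penalty to minimize during training.
teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher))               # ~0.0
print(distillation_loss([0.0, 0.0, 0.0], teacher) > 0)   # True
```

In practice this term is combined with the standard cross-entropy on whatever labeled data is available, weighted by a mixing coefficient.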

On the other hand, progress in neural architectures and their learning optimization has been driven by the need to make efficient use of the available computational power. Adopting lighter computational solutions not only cuts down training time, but also reduces the memory requirements and energy consumption of the specialized hardware used for training: GPUs and TPUs.

These two challenges, data efficiency and computation efficiency, become especially pivotal when processing large amounts of data, as is the case with video and point cloud sequences. High-dimensional data volumes are central to a wide variety of computer vision applications, ranging from large-scale deployments for mobility in smart cities to household monitoring systems assisting elderly people living alone.

Our research project aims at developing light deep learning solutions from both the data and computation perspectives, focusing on computer vision applications, in which our team is highly experienced. The main hypothesis is that the generic approaches proposed for classic problems in machine learning cannot be applied off-the-shelf to high-dimensional tasks involving sequences of video and point clouds. Note that our proposal diverges from most existing research efforts, which target accuracy and have shown that it increases by adding computation and data to the training of deep neural networks. From our perspective, the main challenge of these high-accuracy systems actually lies in their two supporting pillars: data and computation.

Acknowledgements for the publications listed below:

This work was supported by the Spanish Research Agency (AEI) under project PID2020-117142GB-I00 of the call MCIN/AEI/10.13039/501100011033.


Caselles P, Ramon E, Garcia J, Giró-i-Nieto X, Moreno F, Triginer G. SIRA: Relightable Avatars from a Single Image. In: Winter Conference on Applications of Computer Vision (WACV); 2023.
Mosella-Montoro A, Ruiz-Hidalgo J. SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters. In: IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR). New Orleans, USA; 2022.
Geleta M, Puntí C, McGuinness K, Pons J, Canton-Ferrer C, Giró-i-Nieto X. PixInWav: Residual Steganography for Hiding Pixels in Audio. In: ICASSP; 2022.
Bonet D, Ortega A, Ruiz-Hidalgo J, Shekkizhar S. Channel Redundancy and Overlap in Convolutional Neural Networks with Channel-Wise NNK Graphs. In: International Conference on Acoustics, Speech and Signal Processing; 2022.
Schürholt K, Knyazev B, Giró-i-Nieto X, Borth D. Hyper-Representations for Pre-Training and Transfer Learning. In: NeurIPS 2022 - Neural Information Processing Systems; 2022.
Duarte A. Data and methods for a visual understanding of sign languages. PhD thesis (advisors: Torres J, Giró-i-Nieto X). Signal Theory and Communications; 2022.
Duarte A, Albanie S, Giró-i-Nieto X, Varol G. Sign Language Video Retrieval with Free-Form Textual Queries. In: CVPR 2022 - CVF/IEEE Conference on Computer Vision and Pattern Recognition; 2022.
Domenech T. Hiding Images in their Spoken Narratives (advisors: McGuinness K, Pons J, Giró-i-Nieto X); 2022.
Salgueiro L, Marcello J, Vilaplana V. Single-image super-resolution of Sentinel-2 low resolution bands with residual dense convolutional neural networks. Remote Sensing. 2021;13(24):5007.
Abadal S, Salgueiro L, Marcello J, Vilaplana V. A Dual Network for Super-Resolution and Semantic Segmentation of Sentinel-2 imagery. Remote Sensing. 2021;13(22):4547.