DeeLight: Efficient Deep Learning for Video Sequences and Point Clouds

Type: National
Start: Sep 2021
End: Aug 2024
Responsible: Xavier Giro-i-Nieto & Philippe Salembier

Reference

PID2020-117142GB-I00

Description

Data, computation and large neural networks are the three key components explaining the recent success of deep learning methods. The recent history of deep learning research hints at an ever-growing availability of high quality data, either in the form of datasets or simulated environments. The generalized version of Moore's law suggests a similar trend for computational power, with the cost of each unit of computation decreasing exponentially over time. Large scale computation and data were necessary conditions to unlock the potential of deep learning, but they alone do not explain the recent advances in the field. 

On the one hand, the classic supervised learning approach, in which all training data is available together with human-generated labels, does not scale to heavy data volumes. The cost of human annotation is often the most limiting factor when bringing deep learning solutions into production. This limitation has motivated advances in data-efficient machine learning, in fields such as self-supervised learning, distillation, incremental learning, learning from simulators and unsupervised learning. All these approaches aim to make more efficient use of a given annotation budget.

On the other hand, progress in neural architectures and their learning optimization has been driven by the need to make efficient use of the available computational power. Adopting lighter computational solutions not only cuts down training time, but also reduces the memory requirements and energy consumption of the specialized hardware used for training: GPUs and TPUs.

These two challenges, data and computation efficiency, become especially pivotal when processing large amounts of data, as is the case for video and point cloud sequences. High-dimensional data volumes are central to computer vision and to a large variety of applications, ranging from large-scale deployments for mobility in smart cities to household monitoring systems assisting elderly people who live alone.

Our research project aims at developing light deep learning solutions from both the data and computation perspectives, focusing on computer vision applications, in which our team is highly experienced. The main hypothesis is that the generic approaches proposed for classic machine learning problems cannot be applied off-the-shelf to high-dimensional tasks involving sequences of videos and point clouds. Note that our proposal diverges from most existing research efforts, which aim at accuracy and have shown that it increases by adding computation and data to the training of deep neural networks. From our perspective, the main challenge for these high-performance systems lies not in accuracy itself but in its two supporting pillars: data and computation.

Acknowledgements for the publications listed below:

This work was supported by the Spanish Research Agency (AEI) under project PID2020-117142GB-I00 of the call MCIN/ AEI /10.13039/501100011033.

Publications

Gené-Mola J, Ferrer-Ferrer M, Hemming J, Dalfsen P, Hoog D, Sanz-Cortiella R, Rosell-Polo JR, Morros JR, Vilaplana V, Ruiz-Hidalgo J, et al. AmodalAppleSize_RGB-D dataset: RGB-D images of apple trees annotated with modal and amodal segmentation masks for fruit detection, visibility and size estimation. Data in Brief. 2024;52.
Gené-Mola J, Felip-Pomés M, Net-Barnés F, Morros JR, Miranda JC, J. Satorra A, L. Jones A, J. Sanahuja L, Ruiz-Hidalgo J, Gregorio E. Video-Based Fruit Detection and Tracking for Apple Counting and Mapping. In: IEEE International Workshop on Metrology for Agriculture and Forestry (MetroAgriFor); 2023.
Ferrer-Ferrer M, Ruiz-Hidalgo J, Gregorio E, Vilaplana V, Morros JR, Gené-Mola J. Simultaneous Fruit Detection and Size Estimation Using Multitask Deep Neural Networks. Biosystems Engineering. 2023;233:63-75.
Morros JR, Broquetas A, Mateo A, Puig J, Davins M. Real-time lane classification and accident detection for safer micromobility. In: 11th International Congress on Transportation Research. Heraklion, Crete; 2023.
Hurtado C, Shekkizhar S, Ruiz-Hidalgo J, Ortega A. Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island, Greece; 2023.
Gené-Mola J, Ferrer-Ferrer M, Gregorio E, Blok PM, Hemming J, Morros JR, Rosell-Polo JR, Vilaplana V, Ruiz-Hidalgo J. Looking behind occlusions: A study on amodal segmentation for robust on-tree apple fruit size estimation. Computers and Electronics in Agriculture. 2023;209.
Caselles P, Ramon E, Garcia J, Giró-i-Nieto X, Moreno F, Triginer G. SIRA: Relightable Avatars from a Single Image. In: Winter Conference on Applications of Computer Vision (WACV); 2023.
Schürholt K, Taskiran D, Knyazev B, Giró-i-Nieto X, Borth D. Model Zoos: A Dataset of Diverse Populations of Neural Network Models. In: NeurIPS 2022 Track Datasets and Benchmarks. New Orleans, Louisiana, USA; 2022.
Mosella-Montoro A, Ruiz-Hidalgo J. SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters. In: IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR). New Orleans, USA; 2022.

Collaborators