DeeLight: Efficient Deep Learning for Video Sequences and Point Clouds

Type: National
Start: Sep 2021
End: Aug 2024
Responsible: Xavier Giro-i-Nieto & Philippe Salembier
URL:

Reference

PID2020-117142GB-I00

Description

Data, computation and large neural networks are the three key components that explain the recent success of deep learning methods. The recent history of deep learning research hints at an ever-growing availability of high-quality data, either in the form of datasets or simulated environments. The generalized version of Moore's law suggests a similar trend for computational power, with the cost of each unit of computation decreasing exponentially over time. Large-scale computation and data were necessary conditions to unlock the potential of deep learning, but they alone do not explain the recent advances in the field.

On the one hand, the classic supervised learning approach, in which all training data is available together with human-generated labels, does not scale to large data volumes. The cost of human annotation is often the most limiting factor when bringing deep learning solutions into production. This limitation has motivated advances in data-efficient machine learning, in fields such as self-supervised learning, distillation, incremental learning, learning from simulators, and unsupervised learning. All these approaches aim at making more efficient use of the available annotation budget.

On the other hand, progress in neural architectures and their learning optimization has been driven by the need to make efficient use of the available computational power. Adopting lighter computational solutions not only cuts down training time, but also reduces the memory requirements and energy consumption of the specialized hardware used for training: GPUs and TPUs.

These two challenges, data efficiency and computation efficiency, become especially pivotal when processing large amounts of data, as is the case with video and point cloud sequences. High-dimensional data volumes are central to computer vision and to a large variety of applications, ranging from large-scale deployments for mobility in smart cities to household monitoring systems assisting elderly people living alone.

Our research project aims at developing light deep learning solutions from both the data and computation perspectives, focusing on computer vision applications, in which our team is highly experienced. The main hypothesis is that the generic approaches proposed for classic machine learning problems cannot be applied off-the-shelf to high-dimensional tasks that involve sequences of video and point clouds. Notice that our proposal diverges from most existing research efforts aiming at accuracy, which has been shown to increase by adding computation and data to the training of deep neural networks. From our perspective, the main challenge for these high-performance systems lies precisely in their two supporting pillars: data and computation.

Publications

Ramon E, Triginer G, Escur J, Pumarola A, Garcia J, Giró-i-Nieto X, Moreno F. H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction. In: International Conference on Computer Vision (ICCV). Virtual: IEEE/CVF; In Press.
Mañas O, Lacoste A, Giró-i-Nieto X, Vazquez D, Rodríguez P. Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data. In: International Conference on Computer Vision (ICCV). Virtual: IEEE/CVF; In Press.

Collaborators