Multimodal Deep Reinforcement Learning

Type: European
Start: Sep 2019
End: Aug 2021
Responsible: Xavier Giro-i-Nieto
URL: Doctoral INPhINIT Fellowships


Most recent breakthroughs in artificial intelligence are based on deep learning techniques trained over huge annotated datasets. These models are usually trained in a supervised manner and, while very effective for perceptual sensing tasks, their performance is upper-bounded by the annotators' knowledge and they involve high annotation costs. Many research efforts address more realistic scenarios based on the reinforcement learning (RL) paradigm. In RL, an agent executes a sequence of actions in a responsive environment whose feedback is used as a guiding signal for learning. As with perceptual sensing tasks, reinforcement learning agents have greatly benefited from recent advances in deep learning. In this case, the necessary training data are often generated through computation, in particular by running virtual environments. The MareNostrum supercomputer of the Barcelona Supercomputing Center (BSC-CNS) offers a unique infrastructure of thousands of CPUs that accelerates both data collection and agent training. This project will consider methods capable of taking advantage of these vast computational resources.
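As a rough illustration of the agent-environment loop described above, the following minimal sketch trains a tabular Q-learning agent on a toy corridor. It is not the project's method: the `Corridor` environment, the function names and all hyperparameters are hypothetical choices made only for this example, and a lookup table stands in for the deep networks the project would use.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, reward at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: the environment's reward is the only learning signal."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Move the estimate toward the reward plus the discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action per state (1 means 'move right')."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train())` returns the action the agent prefers in each cell. In this toy setting the training data are produced entirely by running the simulated environment, which is the property that makes large-scale computation useful for RL.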

The main goal of this research project is to develop RL agents that can learn new tasks through visually grounded language, mimicking the learning process of babies. We will target skills that can only be learned through a structured exploration of the environment guided by human feedback which, during the training phase, will be simulated by virtual environments such as Google DeepMind Lab or MIT's VirtualHome. The results will enable more natural communication between humans and robots, allowing the latter to quickly adapt the interaction skills learned in virtual environments to the real world.

Apply for this PhD grant before 6th February 2019. A mobility rule applies: candidates must have resided or carried out their main activity in Catalonia or Spain for less than 12 months in the last 3 years.


Campos V, Trott A, Xiong C, Socher R, Giró-i-Nieto X, Torres J. Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills. In: International Conference on Machine Learning (ICML); 2020.