PhD position - Deep Reinforcement Learning for UxV perception and control

SL-DRT-17-1103

RESEARCH FIELD

Computer science and software

ABSTRACT

Reinforcement learning is a computational approach to solving sequential decision-making problems in which an agent tries to maximize the total reward it receives while interacting with a dynamic and uncertain environment. In the reinforcement learning paradigm the agent is not told how to behave; instead, it must learn by trial and error the policy (sequence of actions) that yields the most reward.

Reinforcement learning has many applications, including games, robotics, language processing, neural network architecture design, personalized web services and computer vision. In this thesis, we will focus on its application to UxV perception and control using only the embedded camera as input sensor. Specifically, we are interested in learning to navigate in a real, dynamic environment without planning, in learning to track objects of interest, and in executing a sequence of tasks using only the raw video signal. The results can be used for the control of an autonomous vehicle or for domestic robots in smart-home applications.

In the last few years we have witnessed a renaissance of reinforcement learning. The combination of this paradigm with deep neural networks (deep reinforcement learning) is the main ingredient of many recent breakthroughs: AlphaGo, the first computer Go program to defeat human Go masters, is one striking example. Nevertheless, there is room for improvement when applying this paradigm to problems where the observations and actions lie in high-dimensional spaces; current algorithms are unstable and difficult to train in this setting. Recent advances in computer vision and deep learning show that deep models help solve many computer vision tasks, but to make use of these models, most available deep reinforcement learning algorithms need to be scaled up. Finally, the reward, a key component of the reinforcement learning paradigm, is not easy to define for the tasks we are interested in, and a good reward definition is crucial for the system to learn efficiently.

The main contribution of this thesis will be to scale up policy optimization algorithms so that they can handle deep models and remain efficient in high-dimensional observation and action spaces. The resulting algorithm will be used to control a UxV from the raw video signal only and will be demonstrated in a real-world environment.
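To make the policy-optimization setting concrete, the sketch below shows a minimal REINFORCE-style policy-gradient update for a small convolutional policy acting directly on raw image frames. It is an illustrative sketch only, not part of the project description: the environment interface (reset/step), the 84x84 grayscale input, the network architecture and the hyperparameters are assumptions made for the example.

# Minimal REINFORCE-style policy-gradient sketch for a convolutional policy
# acting on raw image observations. The environment interface (`env`), the
# image size, the action count and all hyperparameters are illustrative
# assumptions, not details of the proposed thesis work.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PixelPolicy(nn.Module):
    """Maps an 84x84 grayscale frame to a distribution over discrete actions."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 9 * 9, n_actions)  # 84x84 input -> 9x9 feature map

    def forward(self, frame):
        return self.head(self.features(frame))  # action logits

def reinforce_episode(env, policy, optimizer, gamma=0.99):
    """Collect one episode and apply a single policy-gradient update."""
    log_probs, rewards = [], []
    obs, done = env.reset(), False  # hypothetical environment interface
    while not done:
        frame = torch.as_tensor(obs, dtype=torch.float32).view(1, 1, 84, 84)
        dist = Categorical(logits=policy(frame))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action.item())
        rewards.append(reward)

    # Discounted returns, accumulated backwards from the end of the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Gradient ascent on expected return, i.e. descent on the negative objective.
    loss = -(torch.cat(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return sum(rewards)

Keeping updates of this kind stable and sample-efficient when the policy is a deep model and both observations and actions are high-dimensional is precisely the difficulty the thesis will address.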

LOCATION

Département Intelligence Ambiante et Systèmes Interactifs (LIST)

Vision & Ingénierie des Contenus (SAC)

Saclay

CONTACT PERSON

RABARISOA Jaonary

CEA

DRT/DIASI//LVIC

CEA Saclay - Nano-INNOV, Bât. 861 - PC 173 - F-91191 Gif-sur-Yvette Cedex, France

Phone number: 00169080129

Email: jaonary.rabarisoa@cea.fr

UNIVERSITY / GRADUATE SCHOOL

Paris-Saclay

Interfaces: Approches interdisciplinaires / fondements; applications et innovation

START DATE

01-10-2017

THESIS SUPERVISOR

FILLIAT David

ENSTA ParisTech

Informatique et Ingénierie de Systèmes

828 Boulevard des Maréchaux, 91120 Palaiseau

Phone number: 01 81 87 20 34

Email: david.filliat@ensta-paristech.fr

« The age limit is 26 years for PhD offers and 30 years for post-doc offers. »


EMPLOYER

CEA Tech

APPLICATION DEADLINE

2017-10-01
