Models, code, and papers for "Filipe Mutz":

Hindsight policy gradients

Feb 20, 2019
Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Juergen Schmidhuber

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.

* Accepted to ICLR 2019 
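
The idea of judging a trajectory against goals other than the intended one can be illustrated with a minimal sketch. It assumes a hypothetical sparse reward of 1 whenever the visited state equals the goal, and the names `hindsight_returns`, `states`, and `candidate_goals` are illustrative only. It covers just the re-evaluation of rewards and is not the gradient estimator from the paper.

```python
def hindsight_returns(states, candidate_goals):
    """For each candidate goal, compute the sparse return this trajectory
    would have earned had that goal been the intended one.

    Assumption: reward is 1 when the visited state equals the goal, else 0.
    """
    return {
        goal: sum(1 for state in states if state == goal)
        for goal in candidate_goals
    }


# A trajectory collected while pursuing goal (2, 2), which was never reached.
trajectory = [(0, 0), (0, 1), (1, 1)]
returns = hindsight_returns(trajectory, [(2, 2), (0, 1), (1, 1)])
print(returns)
```

Although the intended goal yields no reward, the same trajectory still carries a learning signal for the goals it happened to reach, which is the kind of information the abstract refers to.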

Training Agents using Upside-Down Reinforcement Learning

Dec 05, 2019
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber

Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative, Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), which solves RL problems primarily using supervised learning techniques. Many of its main principles are outlined in a companion report [34]. Here we present the first concrete implementation of UDRL and demonstrate its feasibility on certain episodic learning problems. Experimental results show that its performance can be surprisingly competitive with, and even exceed that of, traditional baseline algorithms developed over decades of research.

* NNAISENSE Technical Report. 17 pages, 6 figures 
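
One way to picture how an RL problem can be recast as supervised learning is sketched below. The abstract does not spell out the inputs, so this sketch assumes that each training input pairs a state with a command made of a desired return and a horizon, and that the target is the action actually taken. The function name `build_training_pairs` and the data layout are hypothetical.

```python
def build_training_pairs(states, actions, rewards):
    """Turn one recorded episode into supervised (input, target) pairs.

    Assumption: the input is (state, observed return, horizon) for every
    segment starting at step t1, and the target is the action taken at t1.
    """
    pairs = []
    num_steps = len(actions)
    for t1 in range(num_steps):
        segment_return = 0.0
        for t2 in range(t1, num_steps):
            segment_return += rewards[t2]   # return collected from t1 through t2
            horizon = t2 - t1 + 1           # number of steps in the segment
            pairs.append(((states[t1], segment_return, horizon), actions[t1]))
    return pairs


episode_pairs = build_training_pairs(["s0", "s1"], ["a0", "a1"], [1.0, 2.0])
for model_input, target_action in episode_pairs:
    print(model_input, "->", target_action)
```

A classifier or regressor fitted to such pairs predicts an action from a state and a command, so no value function or policy gradient is involved in the fitting step itself.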

Map Memorization and Forgetting in the IARA Autonomous Car

Oct 04, 2018
Thomas Teixeira, Filipe Mutz, Vinicius B. Cardoso, Lucas Veronese, Claudine Badue, Thiago Oliveira-Santos, Alberto F. De Souza

In this work, we present a novel strategy for correcting imperfections in occupancy grid maps called map decay. The objective of map decay is to correct invalid occupancy probabilities of map cells that are unobservable by sensors. The strategy was inspired by an analogy between the memory architecture believed to exist in the human brain and the maps maintained by an autonomous vehicle. It consists of merging sensory information obtained during runtime (online) with a priori data from a high-precision map constructed offline. In map decay, cells observed by sensors are updated using traditional occupancy grid mapping techniques, and unobserved cells are adjusted so that their occupancy probabilities tend to the values found in the offline map. This strategy is grounded in the idea that the most precise information available about an unobservable cell is the value found in the high-precision offline map. Map decay was successfully tested and is still in use in the IARA autonomous vehicle from Universidade Federal do Espírito Santo.
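
The update rule for unobserved cells can be sketched as follows. The abstract only states that their probabilities tend to the offline values, so the linear interpolation used here, the `rate` parameter, and the function name `decay_step` are assumptions made for illustration rather than the authors' implementation. Observed cells are left untouched, standing in for the regular occupancy grid update.

```python
def decay_step(online, offline, observed, rate=0.1):
    """Apply one decay step to a flattened occupancy grid.

    online:   current occupancy probabilities (online map)
    offline:  occupancy probabilities from the high-precision offline map
    observed: True where a sensor saw the cell in this cycle
    rate:     hypothetical step size in (0, 1]; not taken from the paper
    """
    updated = []
    for p_online, p_offline, seen in zip(online, offline, observed):
        if seen:
            # Observed cells keep the value produced by the usual sensor update.
            updated.append(p_online)
        else:
            # Unobserved cells move a fraction of the way toward the offline value.
            updated.append(p_online + rate * (p_offline - p_online))
    return updated


grid = [0.9, 0.9]
for _ in range(50):
    grid = decay_step(grid, offline=[0.1, 0.1], observed=[True, False])
print(grid)
```

Under this assumed rule, a stale reading in a cell that is no longer visible fades toward the offline value over repeated cycles, while the cell that is still being observed keeps its sensor-derived probability.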

A Model-Predictive Motion Planner for the IARA Autonomous Car

Nov 09, 2017
Vinicius Cardoso, Josias Oliveira, Thomas Teixeira, Claudine Badue, Filipe Mutz, Thiago Oliveira-Santos, Lucas Veronese, Alberto F. De Souza

We present the Model-Predictive Motion Planner (MPMP) of the Intelligent Autonomous Robotic Automobile (IARA). IARA is a fully autonomous car that uses a path planner to compute a path from its current position to the desired destination. Using this path, the current position, a goal on the path, and a map, IARA's MPMP is able to compute smooth trajectories from its current position to the goal in less than 50 ms. MPMP computes the poses of these trajectories so that they follow the path closely and, at the same time, keep a safe distance from any obstacles. Our experiments have shown that MPMP is able to compute trajectories that precisely follow a path produced by a human driver (distance of 0.15 m on average) while smoothly driving IARA at speeds of up to 32.4 km/h (9 m/s).

* IEEE International Conference on Robotics and Automation (ICRA 2017), 2017, pp. 225-230 
* This is a preprint. Accepted by 2017 IEEE International Conference on Robotics and Automation (ICRA) 

Self-Driving Cars: A Survey

Jan 14, 2019
Claudine Badue, Rânik Guidolini, Raphael Vivacqua Carneiro, Pedro Azevedo, Vinicius Brito Cardoso, Avelino Forechi, Luan Ferreira Reis Jesus, Rodrigo Ferreira Berriel, Thiago Meireles Paixão, Filipe Mutz, Thiago Oliveira-Santos, Alberto Ferreira De Souza

We survey research on self-driving cars published in the literature, focusing on autonomous cars developed since the DARPA challenges that are equipped with an autonomy system that can be categorized as SAE level 3 or higher. The architecture of the autonomy system of self-driving cars is typically organized into the perception system and the decision-making system. The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacle mapping, moving obstacle detection and tracking, road mapping, and traffic signalization detection and recognition, among others. The decision-making system is commonly partitioned as well into many subsystems responsible for tasks such as route planning, path planning, behavior selection, motion planning, and control. In this survey, we present the typical architecture of the autonomy system of self-driving cars. We also review research on relevant methods for perception and decision making. Furthermore, we present a detailed description of the architecture of the autonomy system of UFES's car, IARA. Finally, we list prominent autonomous research cars developed by technology companies and reported in the media.
