Models, code, and papers for "Marco Hutter":

Robust Feature Selection by Mutual Information Distributions

Aug 07, 2014
Marco Zaffalon, Marcus Hutter

Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.

* Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002) 

Robust Inference of Trees

Nov 25, 2005
Marco Zaffalon, Marcus Hutter

This paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.

* Annals of Mathematics and Artificial Intelligence, 45 (2005) 215-239 
* 26 pages, 7 figures 
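
For orientation, the "traditional approach" the abstract contrasts itself with can be sketched as a maximum-weight spanning tree over pairwise empirical mutual information (commonly known as the Chow-Liu procedure). This is a minimal illustration only, not the paper's imprecise-Dirichlet algorithm; the function names `empirical_mi` and `max_mi_tree` and the toy data layout (one tuple of discrete values per sample) are assumptions made for this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def empirical_mi(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def max_mi_tree(samples):
    """Return edges of a maximum-weight spanning tree over pairwise MI.

    `samples` is a list of equal-length tuples, one per observation.
    Kruskal's algorithm with a small union-find is used.
    """
    m = len(samples[0])
    cols = list(zip(*samples))
    weighted = sorted(
        ((empirical_mi(cols[i], cols[j]), i, j)
         for i, j in combinations(range(m), 2)),
        reverse=True,
    )
    parent = list(range(m))

    def find(u):
        # Find the component representative, halving paths on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Toy data: variable 1 copies variable 0, variable 2 is only weakly related.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
edges = max_mi_tree(samples)
```

With scarce data the empirical weights are noisy, so the selected edges can change from sample to sample; that instability is what the interval-valued approach in the abstract is meant to address.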

Distribution of Mutual Information from Complete and Incomplete Data

Mar 15, 2004
Marcus Hutter, Marco Zaffalon

Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(1/n^3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.

* Computational Statistics & Data Analysis, Vol.48, No.3, March 2005, pages 633-657 
* 26 pages, LaTeX, 5 figures, 4 tables 
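
To make the contrast between descriptive and inductive mutual information concrete, here is a minimal sketch for complete data. It assumes the well-known closed form for the posterior mean under a Dirichlet prior, E[I] = (1/n) Σ_ij n_ij [ψ(n_ij+1) − ψ(n_i+ +1) − ψ(n_+j +1) + ψ(n+1)], where the n_ij are observed counts plus prior counts and ψ is the digamma function. The symmetric `prior` parameter, the function names, and the hand-rolled `digamma` are assumptions of this sketch, not code from the paper.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift the argument up until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def empirical_mi(counts):
    """Descriptive (plug-in) mutual information of a contingency table, in nats."""
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    return sum(nij / n * math.log(nij * n / (row[i] * col[j]))
               for i, r in enumerate(counts)
               for j, nij in enumerate(r) if nij > 0)


def posterior_mean_mi(counts, prior=1.0):
    """Posterior mean of mutual information under a symmetric Dirichlet prior.

    `prior` is the (assumed) pseudo-count added to every cell.
    """
    post = [[nij + prior for nij in r] for r in counts]
    n = sum(map(sum, post))
    row = [sum(r) for r in post]
    col = [sum(c) for c in zip(*post)]
    total = sum(nij * (digamma(nij + 1) - digamma(row[i] + 1)
                       - digamma(col[j] + 1) + digamma(n + 1))
                for i, r in enumerate(post)
                for j, nij in enumerate(r))
    return total / n


table = [[10, 10], [10, 10]]  # counts that look perfectly independent
plug_in = empirical_mi(table)
bayes_mean = posterior_mean_mi(table)
```

For this table the plug-in value is exactly zero, while the posterior mean is small but positive, reflecting the residual uncertainty about dependence at this sample size.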

Bayesian Treatment of Incomplete Discrete Data applied to Mutual Information and Feature Selection

Jun 24, 2003
Marcus Hutter, Marco Zaffalon

Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.

* Proceedings of the 26th German Conference on Artificial Intelligence (KI-2003) 396-406 
* 11 pages, 1 figure 

Deep Value Model Predictive Control

Oct 08, 2019
Farbod Farshidian, David Hoeller, Marco Hutter

In this paper, we introduce an actor-critic algorithm called Deep Value Model Predictive Control (DMPC), which combines model-based trajectory optimization with value function estimation. The DMPC actor is a Model Predictive Control (MPC) optimizer with an objective function defined in terms of a value function estimated by the critic. We show that our MPC actor is an importance sampler, which minimizes an upper bound of the cross-entropy to the state distribution of the optimal sampling policy. In our experiments with a Ballbot system, we show that our algorithm can work with sparse and binary reward signals to efficiently solve obstacle avoidance and target reaching tasks. Compared to previous work, we show that including the value function in the running cost of the trajectory optimizer speeds up the convergence. We also discuss the necessary strategies to robustify the algorithm in practice.

* Accepted for publication in the Conference on Robot Learning (CoRL) 2019, Osaka. 10 pages (+5 supplementary) 

MPC-Net: A First Principles Guided Policy Search

Sep 11, 2019
Jan Carius, Farbod Farshidian, Marco Hutter

We present an Imitation Learning approach for the control of dynamical systems with a known model. Our policy search method is guided by solutions from Model Predictive Control (MPC). Contrary to approaches that minimize a distance metric between the guiding demonstrations and the learned policy, our loss function corresponds to the minimization of the control Hamiltonian, which derives from the principle of optimality. Our algorithm, therefore, directly attempts to solve the HJB optimality equation with a parameterized class of control laws. The loss function's explicit encoding of physical constraints manifests in an improved constraint satisfaction metric of the learned controller. We train a mixture-of-expert neural network architecture for controlling a quadrupedal robot and show that this policy structure is well suited for such multimodal systems. The learned policy can successfully stabilize different gaits on the real walking robot from less than 10 min of demonstration data.


Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning

Jan 22, 2019
Joonho Lee, Jemin Hwangbo, Marco Hutter

The ability to recover from a fall is an essential feature for a legged robot to navigate in challenging environments robustly. Until today, there has been very little progress on this topic. Current solutions mostly build upon (heuristically) predefined trajectories, resulting in unnatural behaviors and requiring considerable effort in engineering system-specific components. In this paper, we present an approach based on model-free Deep Reinforcement Learning (RL) to control recovery maneuvers of quadrupedal robots using a hierarchical behavior-based controller. The controller consists of four neural network policies including three behaviors and one behavior selector to coordinate them. Each of them is trained individually in simulation and deployed directly on a real system. We experimentally validate our approach on the quadrupedal robot ANYmal, which is a dog-sized quadrupedal system with 12 degrees of freedom. With our method, ANYmal manifests dynamic and reactive recovery behaviors to recover from an arbitrary fall configuration within less than 5 seconds. We tested the recovery maneuver more than 100 times, and the success rate was higher than 97 %.


Feedback MPC for Torque-Controlled Legged Robots

May 15, 2019
Ruben Grandia, Farbod Farshidian, René Ranftl, Marco Hutter

The computational power of mobile robots is currently insufficient to achieve torque level whole-body Model Predictive Control (MPC) at the update rates required for complex dynamic systems such as legged robots. This problem is commonly circumvented by using a fast tracking controller to compensate for model errors between updates. In this work, we show that the feedback policy from a Differential Dynamic Programming (DDP) based MPC algorithm is a viable alternative to bridge the gap between the low MPC update rate and the actuation command rate. We propose to augment the DDP approach with a relaxed barrier function to address inequality constraints arising from the friction cone. A frequency-dependent cost function is used to reduce the sensitivity to high-frequency model errors and actuator bandwidth limits. We demonstrate that our approach can find stable locomotion policies for the torque-controlled quadruped, ANYmal, both in simulation and on hardware.

* submitted to IROS 2019 
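
As a rough illustration of the relaxed barrier idea mentioned in the abstract, the sketch below uses one common form from the literature: a log barrier that is extended quadratically below a threshold so the penalty stays finite when the constraint is violated. The parameter names `mu` and `delta`, the friction coefficient value, and the cone-margin helper are assumptions for this sketch; the paper's exact formulation may differ.

```python
import math


def relaxed_log_barrier(h, mu=1.0, delta=0.1):
    """Penalty for an inequality constraint h >= 0.

    For h > delta this is the usual log barrier -mu * ln(h). For h <= delta it
    switches to a quadratic that matches the barrier's value, slope, and
    curvature at h = delta, so it stays finite for infeasible h.
    """
    if h > delta:
        return -mu * math.log(h)
    z = (h - 2.0 * delta) / delta
    return mu * (0.5 * (z * z - 1.0) - math.log(delta))


def friction_cone_margin(fx, fy, fz, friction=0.7):
    """Constraint value h for a contact force; h >= 0 means inside the cone.

    `friction` is a hypothetical friction coefficient chosen for illustration.
    """
    return friction * fz - math.hypot(fx, fy)


# A force well inside the cone is penalized less than one outside it.
inside = relaxed_log_barrier(friction_cone_margin(1.0, 0.0, 10.0))
outside = relaxed_log_barrier(friction_cone_margin(9.0, 0.0, 10.0))
```

Because the penalty is finite and twice differentiable everywhere, it can be added to the cost of a DDP-style solver without requiring a feasible initial trajectory.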

Cable-Driven Actuation for Highly Dynamic Robotic Systems

Jun 27, 2018
Jemin Hwangbo, Vassilios Tsounis, Hendrik Kolvenbach, Marco Hutter

This paper presents design and experimental evaluations of an articulated robotic limb called Capler-Leg. The key element of Capler-Leg is its single-stage cable-pulley transmission combined with a high-gap radius motor. Our cable-pulley system is designed to be as light-weight as possible and to additionally serve as the primary cooling element, thus significantly increasing the power density and efficiency of the overall system. The total weight of active elements on the leg, i.e. the stators and the rotors, contributes more than 60% of the total leg weight, which is an order of magnitude higher than most existing robots. The resulting robotic leg has low inertia, high torque transparency, low manufacturing cost, no backlash, and a low number of parts. The Capler-Leg system itself serves as an experimental setup for evaluating the proposed cable-pulley design in terms of robustness and efficiency. A continuous jump experiment shows a remarkable 96.5 % recuperation rate, measured at the battery output. This means that almost all the mechanical energy output used during push-off is returned to the battery during touch-down.


Control of a Quadrotor with Reinforcement Learning

Jul 17, 2017
Jemin Hwangbo, Inkyu Sa, Roland Siegwart, Marco Hutter

In this paper, we present a method to control a quadrotor with a neural network trained using reinforcement learning techniques. With reinforcement learning, a common network can be trained to directly map state to actuator command making any predefined control structure obsolete for training. Moreover, we present a new learning algorithm which differs from the existing ones in certain aspects. Our algorithm is conservative but stable for complicated tasks. We found that it is more applicable to controlling a quadrotor than existing algorithms. We demonstrate the performance of the trained policy both in simulation and with a real quadrotor. Experiments show that our policy network can react to step response relatively accurately. With the same policy, we also demonstrate that we can stabilize the quadrotor in the air even under very harsh initialization (manually throwing it upside-down in the air with an initial velocity of 5 m/s). Computation time of evaluating the policy is only 7 µs per time step, which is two orders of magnitude less than common trajectory optimization algorithms with an approximated model.


Limits of Learning about a Categorical Latent Variable under Prior Near-Ignorance

Apr 29, 2009
Alberto Piatti, Marco Zaffalon, Fabio Trojani, Marcus Hutter

In this paper, we consider the coherent theory of (epistemic) uncertainty of Walley, in which beliefs are represented through sets of probability distributions, and we focus on the problem of modeling prior ignorance about a categorical random variable. In this setting, it is a known result that a state of prior ignorance is not compatible with learning. To overcome this problem, another state of beliefs, called near-ignorance, has been proposed. Near-ignorance resembles ignorance very closely, by satisfying some principles that can arguably be regarded as necessary in a state of ignorance, and allows learning to take place. This paper provides new and substantial evidence that near-ignorance also cannot really be regarded as a way out of the problem of starting statistical inference in conditions of very weak beliefs. The key to this result is focusing on a setting characterized by a variable of interest that is latent. We argue that such a setting is by far the most common case in practice, and we provide, for the case of categorical latent variables (and general manifest variables), a condition that, if satisfied, prevents learning from taking place under prior near-ignorance. This condition is shown to be easily satisfied even in the most common statistical problems. We regard these results as a strong form of evidence against the possibility of adopting a condition of prior near-ignorance in real statistical problems.

* International Journal of Approximate Reasoning, 50:4 (2009) pages 597-611 
* 27 LaTeX pages 

Whole-Body MPC for a Dynamically Stable Mobile Manipulator

Feb 27, 2019
Maria Vittoria Minniti, Farbod Farshidian, Ruben Grandia, Marco Hutter

Autonomous mobile manipulation is at the cutting edge of modern robotic technology, offering the dual advantage of mobility provided by a mobile platform and dexterity afforded by the manipulator. A common approach for controlling these systems is based on task space control. In a nutshell, a task space controller defines a map from user-defined end-effector references to the actuation commands based on an optimization problem over the distance between the reference trajectories and the physically consistent motions. The optimization, however, ignores the effect of the current decision on the future error, which limits the applicability of the approach for dynamically stable platforms. In contrast, the Model Predictive Control (MPC) approach offers the capability of foreseeing the future and making a trade-off between the current and future tracking errors. Here, we transcribe the task at the end-effector space, which makes the task description more natural for the user. Furthermore, we show how the MPC-based controller skillfully incorporates the reference forces at the end-effector in the control problem. To this end, we showcase here the advantages of using this MPC approach for controlling a ball-balancing mobile manipulator, Rezero. We validate our controller on the hardware for tasks such as end-effector pose tracking and door opening.


DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning

Sep 18, 2019
Vassilios Tsounis, Mitja Alge, Joonho Lee, Farbod Farshidian, Marco Hutter

This paper addresses the problem of legged locomotion in non-flat terrain. As legged robots such as quadrupeds are to be deployed in terrains with geometries which are difficult to model and predict, the need arises to equip them with the capability to generalize well to unforeseen situations. In this work, we propose a novel technique for training neural-network policies for terrain-aware locomotion, which combines state-of-the-art methods for model-based motion planning and reinforcement learning. Our approach is centered on formulating Markov decision processes using the evaluation of dynamic feasibility criteria in place of physical simulation. We thus employ policy-gradient methods to independently train policies which respectively plan and execute foothold and base motions in 3D environments using both proprioceptive and exteroceptive measurements. We apply our method within a challenging suite of simulated terrain scenarios which contain features such as narrow bridges, gaps and stepping-stones, and train policies which succeed in locomoting effectively in all cases.

* Submitted to IEEE Robotics and Automation Letters (RA-L) and IEEE International Conference on Robotics and Automation (ICRA) 2020 in Paris, France 

Locomotion Planning through a Hybrid Bayesian Trajectory Optimization

Mar 09, 2019
Tim Seyde, Jan Carius, Ruben Grandia, Farbod Farshidian, Marco Hutter

Locomotion planning for legged systems requires reasoning about suitable contact schedules. The contact sequence and timings constitute a hybrid dynamical system and prescribe a subset of achievable motions. State-of-the-art approaches cast motion planning as an optimal control problem. In order to decrease computational complexity, one common strategy separates footstep planning from motion optimization and plans contacts using heuristics. In this paper, we propose to learn contact schedule selection from high-level task descriptors using Bayesian optimization. A bi-level optimization is defined in which a Gaussian process model predicts the performance of trajectories generated by a motion planning nonlinear program. The agent, therefore, retains the ability to reason about suitable contact schedules, while explicit computation of the corresponding gradients is avoided. We delineate the algorithm in its general form and provide results for planning single-legged hopping. Our method is capable of learning contact schedule transitions that align with human intuition. It performs competitively against a heuristic baseline in predicting task appropriate contact schedules.

* Accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA) 2019 

Frequency-Aware Model Predictive Control

Feb 08, 2019
Ruben Grandia, Farbod Farshidian, Alexey Dosovitskiy, René Ranftl, Marco Hutter

Transferring solutions found by trajectory optimization to robotic hardware remains a challenging task. When the optimization fully exploits the provided model to perform dynamic tasks, the presence of unmodeled dynamics renders the motion infeasible on the real system. Model errors can be a result of model simplifications, but also naturally arise when deploying the robot in unstructured and nondeterministic environments. Predominantly, compliant contacts and actuator dynamics lead to bandwidth limitations. While classical control methods provide tools to synthesize controllers that are robust to a class of model errors, such a notion is missing in modern trajectory optimization, which is solved in the time domain. We propose frequency-shaped cost functions to achieve robust solutions in the context of optimal control for legged robots. Through simulation and hardware experiments we show that motion plans can be made compatible with bandwidth limits set by actuators and contact dynamics. The smoothness of the model predictive solutions can be continuously tuned without compromising the feasibility of the problem. Experiments with the quadrupedal robot ANYmal, which is driven by highly-compliant series elastic actuators, showed significantly improved tracking performance of the planned motion, torque, and force trajectories and enabled the machine to walk robustly on terrain with unmodeled compliance.

* IEEE Robotics and Automation Letters 2019 

Rolling in the Deep -- Hybrid Locomotion for Wheeled-Legged Robots using Online Trajectory Optimization

Sep 16, 2019
Marko Bjelonic, Prajish K. Sankar, C. Dario Bellicoso, Heike Vallery, Marco Hutter

Wheeled-legged robots have the potential for highly agile and versatile locomotion. The combination of legs and wheels might be a solution for any real-world application requiring rapid, and long-distance mobility skills on challenging terrain. In this paper, we present an online trajectory optimization framework for wheeled quadrupedal robots capable of executing hybrid walking-driving locomotion strategies. By breaking down the optimization problem into a wheel and base trajectory planning, locomotion planning for high dimensional wheeled-legged robots becomes more tractable, can be solved in real-time on-board in a model predictive control fashion, and becomes robust against unpredicted disturbances. The reference motions are tracked by a hierarchical whole-body controller that sends torque commands to the robot. Our approach is verified on a quadrupedal robot that is fully torque-controlled, including the non-steerable wheels attached to its legs. The robot performs hybrid locomotion with different gait sequences on flat and rough terrain. In addition, we validated the robotic platform at the Defense Advanced Research Projects Agency (DARPA) Subterranean Challenge, where the robot rapidly maps, navigates, and explores dynamic underground environments.


Walking Posture Adaptation for Legged Robot Navigation in Confined Spaces

Jan 31, 2019
Russell Buchanan, Tirthankar Bandyopadhyay, Marko Bjelonic, Lorenz Wellhausen, Marco Hutter, Navinda Kottege

Legged robots have the ability to adapt their walking posture to navigate confined spaces due to their high degrees of freedom. However, this has not been exploited in most common multilegged platforms. This paper presents a deformable bounding box abstraction of the robot model, with accompanying mapping and planning strategies, that enable a legged robot to autonomously change its body shape to navigate confined spaces. The mapping is achieved using robot-centric multi-elevation maps generated with distance sensors carried by the robot. The path planning is based on the trajectory optimisation algorithm CHOMP which creates smooth trajectories while avoiding obstacles. The proposed method has been tested in simulation and implemented on the hexapod robot Weaver, which is 33cm tall and 82cm wide when walking normally. We demonstrate navigating under 25cm overhanging obstacles, through 70cm wide gaps and over 22cm high obstacles in both artificial testing spaces and realistic environments, including a subterranean mining tunnel.

* IEEE RA-L/ICRA2019 

Learning agile and dynamic motor skills for legged robots

Jan 24, 2019
Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, Marco Hutter

Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots is mainly limited to simulation, and only few and comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.

* Science Robotics 4.26 (2019): eaau5872 

Whole-Body Nonlinear Model Predictive Control Through Contacts for Quadrupeds

Dec 07, 2017
Michael Neunert, Markus Stäuble, Markus Giftthaler, Carmine D. Bellicoso, Jan Carius, Christian Gehring, Marco Hutter, Jonas Buchli

In this work we present a whole-body Nonlinear Model Predictive Control approach for Rigid Body Systems subject to contacts. We use a full dynamic system model which also includes explicit contact dynamics. Therefore, contact locations, sequences and timings are not prespecified but optimized by the solver. Yet, thorough numerical and software engineering allows for running the nonlinear Optimal Control solver at rates up to 190 Hz on a quadruped for a time horizon of half a second. This outperforms the state of the art by at least one order of magnitude. Hardware experiments in the form of periodic and non-periodic tasks are performed on two quadrupeds with different actuation systems. The obtained results underline the performance, transferability and robustness of the approach.

* Submitted to "Robotics and Automation: Letters" / "International Conference on Robotics and Automation 2018" 
