Research papers and code for "Sehoon Ha":
While recent advances in deep reinforcement learning have achieved impressive results in learning motor skills, many of the trained policies only succeed from a limited set of initial states. We propose a technique to break a complex robotic task down into simpler subtasks and train them sequentially such that the robot can expand its existing skill set gradually. Our key idea is to build a tree of local control policies represented by neural networks, which we refer to as Relay Neural Networks. Starting from the root policy that attempts to achieve the task from a small set of initial states, each subsequent policy expands the set of successful initial states by driving the new states to existing "good" states. Our algorithm utilizes the value function of the policy to determine whether a state is "good" under each policy. We take advantage of many existing policy search algorithms that learn the value function simultaneously with the policy, such as those that use actor-critic representations or those that use the advantage function to reduce variance. We demonstrate that the relay networks can solve complex continuous control problems for underactuated dynamic systems.
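As a rough sketch of the execution scheme described above, the snippet below treats each relay node as a (policy, value function, threshold) triple and hands control to the parent policy once the parent's value function marks the current state as "good". The class and function names, the threshold test, and the gym-style environment interface are illustrative assumptions, not the paper's implementation.

```python
# Sketch of executing a tree of relay policies: each node stores a policy, its
# value function, and a "good"-state threshold (all names are assumptions).

class RelayNode:
    def __init__(self, policy, value_fn, threshold, parent=None):
        self.policy = policy        # maps state -> action
        self.value_fn = value_fn    # maps state -> estimated return
        self.threshold = threshold  # value above which a state counts as "good"
        self.parent = parent        # node whose "good" set this policy drives toward

def select_node(nodes, state):
    """Pick a node whose value function already considers the state 'good'."""
    for node in nodes:
        if node.value_fn(state) >= node.threshold:
            return node
    # Otherwise fall back to the node with the highest value estimate.
    return max(nodes, key=lambda n: n.value_fn(state))

def run_episode(env, nodes, max_steps=1000):
    state = env.reset()
    node = select_node(nodes, state)
    for _ in range(max_steps):
        state, reward, done, _ = env.step(node.policy(state))
        # Relay: switch to the parent once its value function accepts the state.
        if node.parent is not None and node.parent.value_fn(state) >= node.parent.threshold:
            node = node.parent
        if done:
            break
```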

Being able to fall safely is a necessary motor skill for humanoids performing highly dynamic tasks, such as running and jumping. We propose a new method to learn a policy that minimizes the maximal impulse during the fall. The optimization solves both a discrete contact planning problem and a continuous optimal control problem. Once trained, the policy can compute the optimal next contacting body part (e.g., left foot, right foot, or hands), contact location and timing, and the required joint actuation. We represent the policy as a mixture of actor-critic neural networks, which consists of n control policies and their corresponding value functions. Each actor-critic pair is associated with one of the n possible contacting body parts. During execution, the policy corresponding to the highest value function is executed, and the associated body part becomes the next contact with the ground. With this mixture-of-actor-critic architecture, the discrete contact sequence planning is solved through the selection of the best critics, while the continuous control problem is solved through the optimization of the actors. We show that our policy can achieve comparable, sometimes even higher, rewards than a recursive search of the action space using dynamic programming, while running 50 to 400 times faster during online execution.
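A minimal sketch of the selection rule described above might look as follows, with one actor and one critic per candidate contacting body part; the highest critic value picks both the next contact and the policy to execute. The names and the dictionary-based interface are assumptions made for illustration.

```python
# One (actor, critic) pair per candidate contacting body part; the best critic
# decides the next contact, and its actor supplies the continuous control.

CONTACT_PARTS = ["left_foot", "right_foot", "hands"]

def select_contact_and_action(actors, critics, state):
    """actors, critics: dicts mapping body part -> callable(state)."""
    values = {part: critics[part](state) for part in CONTACT_PARTS}
    best_part = max(values, key=values.get)   # discrete contact planning: best critic
    action = actors[best_part](state)         # continuous control: corresponding actor
    return best_part, action, values
```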

Deep reinforcement learning offers the promise of automatic acquisition of robotic control policies that directly map sensory inputs to low-level actions. In the domain of robotic locomotion, it could make it possible for locomotion skills to be learned with minimal engineering and without even needing to construct a model of the robot. However, applying deep reinforcement learning methods on real-world robots is exceptionally difficult, due both to the sample complexity and, just as importantly, the sensitivity of such methods to hyperparameters. While hyperparameter tuning can be performed in parallel in simulated domains, it is usually impractical to tune hyperparameters directly on real-world robotic platforms, especially legged platforms like quadrupedal robots that can be damaged through extensive trial-and-error learning. We develop a stable deep RL algorithm that extends soft actor-critic, requires minimal hyperparameter tuning, and requires only a modest number of trials to learn multilayer neural network policies. We then apply this method to learn walking gaits on a real-world Minitaur robot. Our method can learn to walk from scratch directly in the real world in two hours of training, without any model or simulation, and the resulting policy is robust to moderate variations in the environment. We further show that our algorithm achieves state-of-the-art performance on four standard simulated benchmarks.

* Videos: https://sites.google.com/view/minitaur-locomotion/ . arXiv admin note: substantial text overlap with arXiv:1812.05905
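For context, the sample efficiency that makes such real-world training feasible comes from off-policy learning with a replay buffer, along the lines of the generic skeleton below. This is an illustration only, not the paper's training code; the `agent` interface, step counts, and batch size are assumptions.

```python
import random
from collections import deque

# Generic off-policy training loop: every real-world transition is stored and
# reused many times for gradient updates, which keeps the number of trials on the
# physical robot small.  All names and numbers are illustrative assumptions.

def train_on_robot(env, agent, total_steps=100_000, updates_per_step=1, batch_size=256):
    replay_buffer = deque(maxlen=1_000_000)
    state = env.reset()
    for _ in range(total_steps):
        action = agent.sample_action(state)              # stochastic policy keeps exploring
        next_state, reward, done, _ = env.step(action)
        replay_buffer.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
        if len(replay_buffer) >= batch_size:
            for _ in range(updates_per_step):
                agent.update(random.sample(replay_buffer, batch_size))  # e.g. a SAC update
```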
Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. In this framework, the actor aims to simultaneously maximize expected return and entropy; that is, to succeed at the task while acting as randomly as possible. We extend SAC to incorporate a number of modifications that accelerate training and improve stability with respect to the hyperparameters, including a constrained formulation that automatically tunes the temperature hyperparameter. We systematically evaluate SAC on a range of benchmark tasks, as well as challenging real-world tasks such as locomotion for a quadrupedal robot and robotic manipulation with a dexterous hand. With these improvements, SAC achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving similar performance across different random seeds. These results suggest that SAC is a promising candidate for learning in real-world robotics tasks.

* arXiv admin note: substantial text overlap with arXiv:1801.01290
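The constrained formulation for automatic temperature tuning mentioned above is, in common implementations, realized by a small gradient step on the temperature itself; a hedged PyTorch sketch is shown below. The target-entropy heuristic, the learning rate, and the variable names are assumptions, not necessarily the exact formulation used in the paper.

```python
import torch

# Automatic temperature (entropy coefficient) tuning as commonly implemented for
# SAC: alpha is adjusted by gradient descent on
#   J(alpha) = E[ -alpha * (log_pi(a|s) + target_entropy) ].
# Parameter values and names are illustrative assumptions.

action_dim = 8                        # e.g. number of actuated joints
target_entropy = -float(action_dim)   # common heuristic: -|A|

log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_pi_batch):
    """log_pi_batch: log-probabilities of actions sampled from the current policy."""
    alpha_loss = -(log_alpha.exp() * (log_pi_batch + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()      # temperature used in the actor and critic losses
```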
The Series Elastic Actuator (SEA), an actuator system equipped with a compliant element, has contributed not only to advances in human-interacting robots but also to a wide range of improvements across robotics. Nevertheless, its performance is still limited: the elastic spring adopted to provide compliance lowers the frequency bandwidth of force/torque generation, and the bandwidth decreases even further when large torques must be delivered. This weakness, in turn, stems from the limitations of the motor and motor drive, such as torque and velocity limits. In this paper, mathematical tools are provided to analyze the impact of these limitations on the performance of the SEA as a transmission system. A novel criterion called Maximum Torque Transmissibility (MTT) is defined to assess the ability of the SEA to fully utilize the maximum continuous motor torque. Moreover, a new bandwidth concept, the maximum torque frequency bandwidth, which indicates the maximum frequency up to which the SEA can generate its maximum torque, is proposed based on MTT. The proposed MTT can serve as a single performance criterion, so various design parameters, including the load condition, mechanical design parameters, and controller parameters of an SEA, can be evaluated with it. Experimental results under various conditions verify that MTT precisely indicates the performance limits of the SEA and can be used to accurately analyze the limitations of its controller.
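To make the interplay between motor limits and the spring concrete, the snippet below sketches a simplified blocked-output SEA model in which the maximum sinusoidal torque amplitude at each frequency is bounded by the motor torque limit and by the motor velocity limit acting through the spring stiffness. This is a back-of-the-envelope illustration of the effect the paper analyzes, not the MTT criterion itself; the model, parameter values, and names are assumptions.

```python
import numpy as np

# Assumes a blocked output, a linear spring of stiffness k, and motor inertia J
# reflected to the spring side; all parameter values are made up for illustration.

def max_transmittable_torque(omega, k, J, tau_motor_max, vel_motor_max):
    """Maximum sinusoidal output-torque amplitude at frequency omega [rad/s]."""
    omega = np.asarray(omega, dtype=float)
    # Producing T*sin(omega*t) through the spring requires motor motion (T/k)*sin(omega*t),
    # hence a motor velocity amplitude omega*T/k and a torque amplitude T*|1 - J*omega^2/k|.
    with np.errstate(divide="ignore"):
        torque_bound = tau_motor_max / np.abs(1.0 - J * omega**2 / k)
        velocity_bound = np.where(omega > 0.0, k * vel_motor_max / omega, np.inf)
    return np.minimum(torque_bound, velocity_bound)

# Example: torque the actuator can still deliver across a band of frequencies.
omegas = np.linspace(0.1, 200.0, 1000)
T_max = max_transmittable_torque(omegas, k=300.0, J=0.05,
                                 tau_motor_max=30.0, vel_motor_max=20.0)
```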

This paper develops an accurate force control algorithm for series elastic actuators (SEAs) based on a novel force estimation scheme called the transmission force observer (TFOB). The proposed method is designed to improve the poor force measurement of the SEA caused by nonlinearities of the elastic transmission and by the noise and error of its deformation sensor. This paper first analyzes the limitations of conventional methods for SEA transmission force sensing and then investigates their stochastic characteristics, which provide the basis for accurate force control performance when incorporated with the TFOB. In particular, a tuning parameter is introduced from a holistic closed-loop system analysis in the frequency domain, which gives a guideline for attaining optimal performance of the force-controlled SEA system. The proposed algorithm is experimentally verified on an actual SEA hardware setup.
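Since the TFOB itself is not reproduced here, the sketch below only illustrates the general idea of fusing a model-based torque prediction with the noisy spring-deflection measurement through a single tuning parameter. It is a plain first-order blend, not the paper's observer; the class, the blending rule, and all names are illustrative assumptions.

```python
# Generic blend of a model-based torque prediction and the (noisy) spring-deflection
# measurement; the gain plays the role of a single tuning parameter.

class SimpleForceObserver:
    def __init__(self, k_spring, blend_gain):
        self.k = k_spring        # nominal spring stiffness [Nm/rad]
        self.gain = blend_gain   # tuning parameter: 0 = trust the model, 1 = trust the sensor
        self.tau_hat = 0.0       # current torque estimate [Nm]

    def update(self, deflection_meas, tau_model):
        """deflection_meas: measured spring deflection [rad];
        tau_model: torque predicted from motor-side dynamics [Nm]."""
        tau_sensor = self.k * deflection_meas                                  # direct, noisy estimate
        self.tau_hat = (1.0 - self.gain) * tau_model + self.gain * tau_sensor  # weighted blend
        return self.tau_hat
```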
