Models, code, and papers for "Masayoshi Tomizuka":
We present an endpoint box regression module(epBRM), which is designed for predicting precise 3D bounding boxes using raw LiDAR 3D point clouds. The proposed epBRM is built with sequence of small networks and is computationally lightweight. Our approach can improve a 3D object detection performance by predicting more precise 3D bounding box coordinates. The proposed approach requires 40 minutes of training to improve the detection performance. Moreover, epBRM imposes less than 12ms to network inference time for up-to 20 objects. The proposed approach utilizes a spatial transformation mechanism to simplify the box regression task. Adopting spatial transformation mechanism into epBRM makes it possible to improve the quality of detection with a small sized network. We conduct in-depth analysis of the effect of various spatial transformation mechanisms applied on raw LiDAR 3D point clouds. We also evaluate the proposed epBRM by applying it to several state-of-the-art 3D object detection systems. We evaluate our approach on KITTI dataset, a standard 3D object detection benchmark for autonomous vehicles. The proposed epBRM enhances the overlaps between ground truth bounding boxes and detected bounding boxes, and improves 3D object detection. Our proposed method evaluated in KITTI test server outperforms current state-of-the-art approaches.
This paper introduces a framework to plan grasps with multi-fingered hands. The framework includes a multi-dimensional iterative surface fitting (MDISF) for grasp planning and a grasp trajectory optimization (GTO) for grasp imagination. The MDISF algorithm searches for optimal contact regions and hand configurations by minimizing the collision and surface fitting error, and the GTO algorithm generates optimal finger trajectories to reach the highly ranked grasp configurations and avoid collision with the environment. The proposed grasp planning and imagination framework considers the collision avoidance and the kinematics of the hand-robot system, and is able to plan grasps and trajectories of different categories efficiently with gradient-based methods using the captured point cloud. The found grasps and trajectories are robust to sensing noises and underlying uncertainties. The effectiveness of the proposed framework is verified by both simulations and experiments.
Direct design of a robot's rendered dynamics, such as in impedance control, is now a well-established control mode in uncertain environments. When the physical interaction port variables are not measured directly, dynamic and kinematic models are required to relate the measured variables to the interaction port variables. A typical example is serial manipulators with joint torque sensors, where the interaction occurs at the end-effector. As interactive robots perform increasingly complex tasks, they will be intermittently coupled with additional dynamic elements such as tools, grippers, or workpieces, some of which should be compensated and brought to the robot side of the interaction port, making the inverse dynamics multimodal. Furthermore, there may also be unavoidable and unmeasured external input when the desired system cannot be totally isolated. Towards semi-autonomous robots, capable of handling such applications, a multimodal Gaussian process regression approach to manipulator dynamic modelling is developed. A sampling-based approach clusters different dynamic modes from unlabelled data, also allowing the seperation of perturbed data with significant, irregular external input. The passivity of the overall approach is shown analytically, and experiments examine the performance and safety of this approach on a test actuator.
Human-robot interactions have been recognized to be a key element of future industrial collaborative robots (co-robots). Unlike traditional robots that work in structured and deterministic environments, co-robots need to operate in highly unstructured and stochastic environments. To ensure that co-robots operate efficiently and safely in dynamic uncertain environments, this paper introduces the robot safe interaction system. In order to address the uncertainties during human-robot interactions, a unique parallel planning and control architecture is proposed, which has a long term global planner to ensure efficiency of robot behavior, and a short term local planner to ensure real time safety under uncertainties. In order for the robot to respond immediately to environmental changes, fast algorithms are used for real-time computation, i.e., the convex feasible set algorithm for the long term optimization, and the safe set algorithm for the short term optimization. Several test platforms are introduced for safe evaluation of the developed system in the early phase of deployment. The effectiveness and the efficiency of the proposed method have been verified in experiment with an industrial robot manipulator.
Probabilistic vehicle trajectory prediction is essential for robust safety of autonomous driving. Current methods for long-term trajectory prediction cannot guarantee the physical feasibility of predicted distribution. Moreover, their models cannot adapt to the driving policy of the predicted target human driver. In this work, we propose to overcome these two shortcomings by a Bayesian recurrent neural network model consisting of Bayesian-neural-network-based policy model and known physical model of the scenario. Bayesian neural network can ensemble complicated output distribution, enabling rich family of trajectory distribution. The embedded physical model ensures feasibility of the distribution. Moreover, the adopted gradient-based training method allows direct optimization for better performance in long prediction horizon. Furthermore, a particle-filter-based parameter adaptation algorithm is designed to adapt the policy Bayesian neural network to the predicted target online. Effectiveness of the proposed methods is verified with a toy example with multi-modal stochastic feedback gain and naturalistic car following data.
Current autonomous driving systems are composed of a perception system and a decision system. Both of them are divided into multiple subsystems built up with lots of human heuristics. An end-to-end approach might clean up the system and avoid huge efforts of human engineering, as well as obtain better performance with increasing data and computation resources. Compared to the decision system, the perception system is more suitable to be designed in an end-to-end framework, since it does not require online driving exploration. In this paper, we propose a novel end-to-end approach for autonomous driving perception. A latent space is introduced to capture all relevant features useful for perception, which is learned through sequential latent representation learning. The learned end-to-end perception model is able to solve the detection, tracking, localization and mapping problems altogether with only minimum human engineering efforts and without storing any maps online. The proposed method is evaluated in a realistic urban driving simulator, with both camera image and lidar point cloud as sensor inputs. The codes and videos of this work are available at our github repo and project website.
Pushing is a useful robotic capability for positioning and reorienting objects. The ability to accurately predict the effect of pushes can enable efficient trajectory planning and complicated object manipulation. Physical prediction models for planar pushing have long been established, but their assumptions and requirements usually don't hold in most practical settings. Data-driven approaches can provide accurate predictions for offline data, but they often have generalizability issues. In this paper, we propose a combined prediction model and an online learning framework for planar push prediction. The combined model consists of a neural network module and analytical components with a low-dimensional parameter. We train the neural network offline using pre-collected pushing data. In online situations, the low-dimensional analytical parameter is learned directly from online pushes to quickly adapt to the new environments. We test our combined model and learning framework on real pushing experiments. Our experimental results show that our model is able to quickly adapt to new environments while achieving similar final prediction performance as that of pure neural network models.
Accurately predicting future behaviors of surrounding vehicles is an essential capability for autonomous vehicles in order to plan safe and feasible trajectories. The behaviors of others, however, are full of uncertainties. Both rational and irrational behaviors exist, and the autonomous vehicles need to be aware of this in their prediction module. The prediction module is also expected to generate reasonable results in the presence of unseen and corner scenarios. Two types of prediction models are typically used to solve the prediction problem: learning-based model and planning-based model. Learning-based model utilizes real driving data to model the human behaviors. Depending on the structure of the data, learning-based models can predict both rational and irrational behaviors. But the balance between them cannot be customized, which creates challenges in generalizing the prediction results. Planning-based model, on the other hand, usually assumes human as a rational agent, i.e., it anticipates only rational behavior of human drivers. In this paper, a generic prediction architecture is proposed to address various rationalities in human behavior. We leverage the advantages from both learning-based and planning-based prediction models. The proposed approach is able to predict continuous trajectories that well-reflect possible future situations of other drivers. Moreover, the prediction performance remains stable under various unseen driving scenarios. A case study under a real-world roundabout scenario is provided to demonstrate the performance and capability of the proposed prediction architecture.
Effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are critical for intelligent systems such as autonomous vehicles and wheeled mobile robotics navigating in complex scenarios to achieve safe and high-quality decision making, motion planning and control. Due to the uncertain nature of the future, it is desired to make inference from a probability perspective instead of deterministic prediction. In this paper, we propose a conditional generative neural system (CGNS) for probabilistic trajectory prediction to approximate the data distribution, with which realistic, feasible and diverse future trajectory hypotheses can be sampled. The system combines the strengths of conditional latent space learning and variational divergence minimization, and leverages both static context and interaction information with soft attention mechanisms. We also propose a regularization method for incorporating soft constraints into deep neural networks with differentiable barrier functions, which can regulate and push the generated samples into the feasible regions. The proposed system is evaluated on several public benchmark datasets for pedestrian trajectory prediction and a roundabout naturalistic driving dataset collected by ourselves. The experiment results demonstrate that our model achieves better performance than various baseline approaches in terms of prediction accuracy.
Urban autonomous driving decision making is challenging due to complex road geometry and multi-agent interactions. Current decision making methods are mostly manually designing the driving policy, which might result in sub-optimal solutions and is expensive to develop, generalize and maintain at scale. On the other hand, with reinforcement learning (RL), a policy can be learned and improved automatically without any manual designs. However, current RL methods generally do not work well on complex urban scenarios. In this paper, we propose a framework to enable model-free deep reinforcement learning in challenging urban autonomous driving scenarios. We design a specific input representation and use visual encoding to capture the low-dimensional latent states. Several state-of-the-art model-free deep RL algorithms are implemented into our framework, with several tricks to improve their performance. We evaluate our method in a challenging roundabout task with dense surrounding vehicles in a high-definition driving simulator. The result shows that our method can solve the task well and is significantly better than the baseline.
Automatic assembly has broad applications in industries. Traditional assembly tasks utilize predefined trajectories or tuned force control parameters, which make the automatic assembly time-consuming, difficult to generalize, and not robust to uncertainties. In this paper, we propose a learning framework for high precision industrial assembly. The framework combines both the supervised learning and the reinforcement learning. The supervised learning utilizes trajectory optimization to provide the initial guidance to the policy, while the reinforcement learning utilizes actor-critic algorithm to establish the evaluation system even the supervisor is not accurate. The proposed learning framework is more efficient compared with the reinforcement learning and achieves better stability performance than the supervised learning. The effectiveness of the method is verified by both the simulation and experiment.
Precision grasps with multi-fingered hands are important for precise placement and in-hand manipulation tasks. Searching precision grasps on the object represented by point cloud, is challenging due to the complex object shape, high-dimensionality, collision and undesired properties of the sensing and positioning. This paper proposes an optimization model to search for precision grasps with multi-fingered hands. The model takes noisy point cloud of the object as input and optimizes the grasp quality by iteratively searching for the palm pose and finger joints positions. The collision between the hand and the object is approximated and penalized by a series of least-squares. The collision approximation is able to handle the point cloud representation of the objects with complex shapes. The proposed optimization model is able to locate collision-free optimal precision grasps efficiently. The average computation time is 0.50 sec/grasp. The searching is robust to the incompleteness and noise of the point cloud. The effectiveness of the algorithm is demonstrated by experiments.
In order to enable high-quality decision making and motion planning of intelligent systems such as robotics and autonomous vehicles, accurate probabilistic predictions for surrounding interactive objects is a crucial prerequisite. Although many research studies have been devoted to making predictions on a single entity, it remains an open challenge to forecast future behaviors for multiple interactive agents simultaneously. In this work, we take advantage of the Generative Adversarial Network (GAN) due to its capability of distribution learning and propose a generic multi-agent probabilistic prediction and tracking framework which takes the interactions among multiple entities into account, in which all the entities are treated as a whole. However, since GAN is very hard to train, we make an empirical research and present the relationship between training performance and hyperparameter values with a numerical case study. The results imply that the proposed model can capture both the mean, variance and multi-modalities of the groundtruth distribution. Moreover, we apply the proposed approach to a real-world task of vehicle behavior prediction to demonstrate its effectiveness and accuracy. The results illustrate that the proposed model trained by adversarial learning can achieve a better prediction performance than other state-of-the-art models trained by traditional supervised learning which maximizes the data likelihood. The well-trained model can also be utilized as an implicit proposal distribution for particle filtered based Bayesian state estimation.
For autonomous agents to successfully operate in real world, the ability to anticipate future motions of surrounding entities in the scene can greatly enhance their safety levels since potentially dangerous situations could be avoided in advance. While impressive results have been shown on predicting each agent's behavior independently, we argue that it is not valid to consider road entities individually since transitions of vehicle states are highly coupled. Moreover, as the predicted horizon becomes longer, modeling prediction uncertainties and multi-modal distributions over future sequences will turn into a more challenging task. In this paper, we address this challenge by presenting a multi-modal probabilistic prediction approach. The proposed method is based on a generative model and is capable of jointly predicting sequential motions of each pair of interacting agents. Most importantly, our model is interpretable, which can explain the underneath logic as well as obtain more reliability to use in real applications. A complicate real-world roundabout scenario is utilized to implement and examine the proposed method.
Human robot collaboration (HRC) is becoming increasingly important as the paradigm of manufacturing is shifting from mass production to mass customization. The introduction of HRC can significantly improve the flexibility and intelligence of automation. However, due to the stochastic and time-varying nature of human collaborators, it is challenging for the robot to efficiently and accurately identify the plan of human and respond in a safe manner. To address this challenge, we propose an integrated human robot collaboration framework in this paper which includes both plan recognition and trajectory prediction. Such a framework enables the robots to perceive, predict and adapt their actions to the human's plan and intelligently avoid collisions with the human based on the predicted human trajectory. Moreover, by explicitly leveraging the hierarchical relationship between the plan and trajectories, more robust plan recognition performance can be achieved. Experiments are conducted on an industrial robot to verify the proposed framework, which shows that our proposed framework can not only assure safe HRC, but also improve the time efficiency of the HRC team, and the plan recognition module is not sensitive to noises.
The decision and planning system for autonomous driving in urban environments is hard to design. Most current methods are to manually design the driving policy, which can be sub-optimal and expensive to develop and maintain at scale. Instead, with imitation learning we only need to collect data and then the computer will learn and improve the driving policy automatically. However, existing imitation learning methods for autonomous driving are hardly performing well for complex urban scenarios. Moreover, the safety is not guaranteed when we use a deep neural network policy. In this paper, we proposed a framework to learn the driving policy in urban scenarios efficiently given offline connected driving data, with a safety controller incorporated to guarantee safety at test time. The experiments show that our method can achieve high performance in realistic three-dimensional simulations of urban driving scenarios, with only hours of data collection and training on a single consumer GPU.
Although deep reinforcement learning (deep RL) methods have lots of strengths that are favorable if applied to autonomous driving, real deep RL applications in autonomous driving have been slowed down by the modeling gap between the source (training) domain and the target (deployment) domain. Unlike current policy transfer approaches, which generally limit to the usage of uninterpretable neural network representations as the transferred features, we propose to transfer concrete kinematic quantities in autonomous driving. The proposed robust-control-based (RC) generic transfer architecture, which we call RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on disturbance observer (DOB). The deep RL policies trained with known nominal dynamics model are transfered directly to the target domain, DOB-based robust tracking control is applied to tackle the modeling gap including the vehicle dynamics errors and the external disturbances such as side forces. We provide simulations validating the capability of the proposed method to achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing and obstacle avoidance.
In a given scenario, simultaneously and accurately predicting every possible interaction of traffic participants is an important capability for autonomous vehicles. The majority of current researches focused on the prediction of an single entity without incorporating the environment information. Although some approaches aimed to predict multiple vehicles, they either predicted each vehicle independently with no considerations on possible interaction with surrounding entities or generated discretized joint motions which cannot be directly used in decision making and motion planning for autonomous vehicle. In this paper, we present a probabilistic framework that is able to jointly predict continuous motions for multiple interacting road participants under any driving scenarios and is capable of forecasting the duration of each interaction, which can enhance the prediction performance and efficiency. The proposed traffic scene prediction framework contains two hierarchical modules: the upper module and the lower module. The upper module forecasts the intention of the predicted vehicle, while the lower module predicts motions for interacting scene entities. An exemplar real-world scenario is used to implement and examine the proposed framework.
Point set registration is a powerful method that enables robots to manipulate deformable objects. By mapping the point cloud of the current object to the pre-trained point cloud, a transformation function can be constructed. The manipulator's trajectory for pre-trained shapes can be warped with this transformation function, yielding a feasible trajectory for the new shape. However, usually this transformation function regards objects as discrete points, and dismisses the topological structures. Therefore, it risks over-stretching or over-compression during manipulation. To tackle this problem, this paper proposes a tangent space point set registration method. A tangent space representation of an object is constructed by defining an angle for each node on the object. Point set registration algorithm runs in this newly-constructed tangent space, yielding a tangent space trajectory. The trajectory is then converted back to Cartesian space and carried out by the robot. Compared to its counterpart in Cartesian space, tangent space point set registration is safer and more robust, succeeding in a series of experiments such as rope straightening, rope knotting, cloth folding and unfolding.