Models, code, and papers for "Ram Vasudevan":
Illumination effects in images, specifically cast shadows and shading, have been shown to decrease the performance of deep neural networks on a large number of vision-based detection, recognition and segmentation tasks in urban driving scenes. A key factor that contributes to this performance gap is the lack of `time-of-day' diversity within real, labeled datasets. There have been impressive advances in the realm of image-to-image translation in transferring previously unseen visual effects into a dataset, specifically in day-to-night translation. However, it is not easy to constrain what visual effects, let alone illumination effects, are transferred from one dataset to another during the training process. To address this problem, we propose a deep learning framework, called Shadow Transfer, that can relight complex outdoor scenes by transferring realistic shadow, shading, and other lighting effects onto a single image. The novelty of the proposed framework is that it is both self-supervised and designed to operate on sensor and label information that is easily available in autonomous vehicle datasets. We show the effectiveness of this method on both synthetic and real datasets, and we provide experiments that demonstrate that the proposed method produces images of higher visual quality than state-of-the-art image-to-image translation methods.
Highway driving places significant demands on human drivers and autonomous vehicles (AVs) alike due to high speeds and the complex interactions in dense traffic. Merging onto the highway poses additional challenges by limiting the amount of time available for decision-making. Predicting others' trajectories accurately and quickly is crucial to safely executing these maneuvers. Many existing prediction methods based on neural networks have focused on modeling interactions to achieve better accuracy while assuming the existence of observation windows over 3s long. This paper proposes a novel probabilistic model for trajectory prediction that performs competitively with as little as 400ms of observations. The proposed method fits a low-dimensional car-following model to observed behavior and introduces nonconvex regularization terms that enforce realistic driving behaviors in the predictions. The resulting inference procedure allows for real-time forecasts up to 10s into the future while accounting for interactions between vehicles. Experiments on dense traffic in the NGSIM dataset demonstrate that the proposed method achieves state-of-the-art performance with both highly constrained and more traditional observation windows.
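The fitting step can be illustrated with a simple linear car-following model (a Helly-style law used here purely as a stand-in; the abstract does not specify the exact model, and all parameter values and the noise level below are invented):

```python
import numpy as np

# Synthetic data and a least-squares fit of a Helly-style model,
# a = c1*(gap - d0 - T*v) + c2*dv, illustrating how a low-dimensional
# car-following model can be fit to observed behavior.
rng = np.random.default_rng(1)
n = 200
gap = rng.uniform(5.0, 50.0, n)    # distance to the lead vehicle [m]
v = rng.uniform(5.0, 30.0, n)      # ego speed [m/s]
dv = rng.uniform(-5.0, 5.0, n)     # lead speed minus ego speed [m/s]

c1, d0, T, c2 = 0.3, 5.0, 1.5, 0.5
accel = c1 * (gap - d0 - T * v) + c2 * dv + rng.normal(0.0, 0.05, n)

# The model is linear in the features [gap, 1, v, dv], so ordinary
# least squares recovers the lumped coefficients [c1, -c1*d0, -c1*T, c2].
X = np.column_stack([gap, np.ones(n), v, dv])
theta = np.linalg.lstsq(X, accel, rcond=None)[0]
print(theta)
```

The paper layers nonconvex regularization terms on top of such a fit to keep the predicted behavior realistic; an unregularized least-squares fit like this one is only the core of the procedure.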
In applications such as autonomous driving, it is important to understand, infer, and anticipate the intention and future behavior of pedestrians. This ability allows vehicles to avoid collisions and improve ride safety and quality. This paper proposes a biomechanically inspired recurrent neural network (Bio-LSTM) that can predict the location and 3D articulated body pose of pedestrians in a global coordinate frame, given possibly inaccurate 3D poses and locations estimated in prior frames. The proposed network is able to predict poses and global locations for multiple pedestrians simultaneously, for pedestrians up to 45 meters from the cameras (urban intersection scale). The outputs of the proposed network are full-body 3D meshes represented in Skinned Multi-Person Linear (SMPL) model parameters. The proposed approach relies on a novel objective function that incorporates the periodicity of human walking (gait), the mirror symmetry of the human body, and the change of ground reaction forces in a human gait cycle. This paper presents prediction results on the PedX dataset, a large-scale, in-the-wild dataset collected at real urban intersections with heavy pedestrian traffic. Results show that the proposed network can successfully learn the characteristics of pedestrian gait and produce accurate and consistent 3D pose predictions.
Soft robots are challenging to model due to their nonlinear behavior. However, their soft bodies make it possible to safely observe their behavior under random control inputs, making them amenable to large-scale data collection and system identification. This paper implements and evaluates a system identification method based on Koopman operator theory. This theory offers a way to represent a nonlinear system as a linear system in the infinite-dimensional space of real-valued functions called observables, enabling models of nonlinear systems to be constructed via linear regression of observed data. The approach does not suffer from some of the shortcomings of other nonlinear system identification methods, which typically require the manual tuning of training parameters and have limited convergence guarantees. A dynamic model of a pneumatic soft robot arm is constructed via this method, and used to predict the behavior of the real system. The total normalized-root-mean-square error (NRMSE) of its predictions over twelve validation trials is lower than that of several other identified models including a neural network, NLARX, nonlinear Hammerstein-Wiener, and linear state space model.
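The core of this identification procedure, extended dynamic mode decomposition over a dictionary of observables, can be sketched as follows (the observable dictionary, the toy Duffing-style dynamics, and the numbers are illustrative choices, not the paper's):

```python
import numpy as np

def lift(x):
    # Monomial observables up to degree 2 for a 2-state system (an
    # illustrative dictionary; the paper's observables may differ).
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2])

def f(x, dt=0.01):
    # Example nonlinear dynamics, standing in for the soft-robot
    # snapshot data the paper collects under random inputs.
    x1, x2 = x
    return np.array([x1 + dt * x2, x2 + dt * (x1 - x1**3 - 0.5 * x2)])

# Collect snapshot pairs from random states.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Y = np.array([f(x) for x in X])

# Lift the data and fit the finite-dimensional Koopman approximation
# K by linear regression: lift(y) ~ K @ lift(x).
Psi_X = np.array([lift(x) for x in X])
Psi_Y = np.array([lift(y) for y in Y])
K = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)[0].T

# Predict one step with the linear model and compare to the truth.
x0 = np.array([0.3, -0.2])
pred = (K @ lift(x0))[:2]   # first two observables are the state itself
x_next = f(x0)
print(np.abs(pred - x_next).max())
```

Because the regression is linear, the model is cheap to fit on large datasets, which is exactly what the safe, random excitation of soft robots makes available.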
This paper presents an interconnected control-planning strategy for redundant manipulators, subject to system and environmental constraints. The method incorporates low-level control characteristics and high-level planning components into a robust strategy for manipulators acting in complex environments, subject to joint limits. This strategy is formulated using an adaptive control rule, the estimated dynamic model of the robotic system and the nullspace of the linearized constraints. A path is generated that takes into account the capabilities of the platform. The proposed method is computationally efficient, enabling its implementation on a real multi-body robotic system. Through experimental results with a 7 DOF manipulator, we demonstrate the performance of the method in real-world scenarios.
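A minimal sketch of the nullspace mechanism used by such strategies for redundant manipulators (the Jacobian here is a random placeholder, and the paper's method additionally folds in adaptive control and constraint handling):

```python
import numpy as np

# Redundancy resolution: track a 6-D task velocity with a 7-DOF arm
# while a secondary objective acts in the Jacobian nullspace.
rng = np.random.default_rng(2)
J = rng.standard_normal((6, 7))          # placeholder task Jacobian
x_dot = np.array([0.1, 0.0, 0.05, 0.0, 0.0, 0.0])

q = rng.uniform(-1.0, 1.0, 7)
q0_dot = -q                              # secondary task: drift joints to midrange

J_pinv = np.linalg.pinv(J)
N = np.eye(7) - J_pinv @ J               # projector onto null(J)
q_dot = J_pinv @ x_dot + N @ q0_dot

# The nullspace motion leaves the task-space velocity untouched.
print(np.allclose(J @ q_dot, x_dot))
```

Projecting the secondary objective through N is what lets the planner respect joint limits without sacrificing the commanded end-effector motion.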
Identifying and predicting dangerous factors within a scene are two key tasks in autonomous driving, especially in a crowded urban environment. To navigate safely in such environments, risk assessment is needed to quantify the risk associated with taking a certain action. Risk assessment and planning is usually done by first tracking and predicting trajectories of other agents, such as vehicles and pedestrians, and then choosing an action to avoid collision in the future. However, few existing risk assessment algorithms handle occlusion and other sensory limitations effectively. This paper explores the possibility of efficient risk assessment under occlusion via both forward and backward reachability. The proposed algorithm can not only identify where the risk-induced factors are, but also be used for motion planning by executing low-level commands, such as throttle. The proposed method is evaluated on various four-way highly occluded intersections with up to five other vehicles in the scene. Compared with other risk assessment algorithms, the proposed method shows better efficiency, meaning that the ego vehicle reaches the goal at a higher speed. In addition, it also lowers the median collision rate by 7.5x.
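The forward-reachability half of such a check can be sketched in one dimension along the ego path (a deliberately simplified interval model with invented geometry and speeds; the paper's reachability analysis is richer and also runs backward in time):

```python
import numpy as np

def occluded_reach(occ_lo, occ_hi, v_max, t):
    # Forward reachable interval of a hypothetical unseen vehicle that
    # may be anywhere in the occluded stretch [occ_lo, occ_hi] ahead,
    # moving with any speed in [0, v_max].
    return occ_lo, occ_hi + v_max * t

def ego_interval(x0, v, t, half_len=2.0):
    # Interval occupied by the ego vehicle at time t at constant speed.
    x = x0 + v * t
    return x - half_len, x + half_len

def risky(occ_lo, occ_hi, v_max, x0, v, horizon, dt=0.1):
    # Flag any time at which the two intervals can overlap.
    for t in np.arange(0.0, horizon, dt):
        r_lo, r_hi = occluded_reach(occ_lo, occ_hi, v_max, t)
        e_lo, e_hi = ego_interval(x0, v, t)
        if e_lo <= r_hi and r_lo <= e_hi:
            return True
    return False

# An occluded stretch 30-40 m ahead, checked over a 5 s horizon.
print(risky(30.0, 40.0, 15.0, 0.0, 10.0, 5.0))
```

A planner can then slow the ego vehicle until the reachable set of every hypothetical occluded vehicle stays clear of the ego's own forward interval.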
Safety guarantees are valuable in the control of walking robots, as falling can be both dangerous and costly. Unfortunately, set-based tools for generating safety guarantees (such as sums-of-squares optimization) are typically restricted to simplified, low-dimensional models of walking robots. For more complex models, methods based on hybrid zero dynamics can ensure the local stability of a pre-specified limit cycle, but provide limited guarantees. This paper combines the benefits of both approaches by using sums-of-squares optimization on a hybrid zero dynamics manifold to generate a guaranteed safe set for a 10-dimensional walking robot model. Along with this set, this paper describes how to generate a controller that maintains safety by modifying the manifold parameters when on the edge of the safe set. The proposed approach, which is applied to a bipedal Rabbit model, provides a roadmap for applying sums-of-squares verification techniques to high dimensional systems. This opens the door for a broad set of tools that can generate safety guarantees and regulating controllers for complex walking robot models.
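The sums-of-squares machinery behind such safe sets rests on certificates of the following generic form (a standard invariance sketch, not the paper's exact program, and stated up to the usual strictness caveats):

```latex
% If both expressions below are sums of squares, then on the boundary
% {V = 1} we have dV/dt <= 0, so trajectories of xdot = f(x) cannot
% leave the set {x : V(x) <= 1}.
\begin{align*}
  &-\nabla V(x)\cdot f(x) \;-\; \sigma(x)\,\bigl(1 - V(x)\bigr) \;\in\; \Sigma[x],\\
  &\sigma(x) \;\in\; \Sigma[x],
\end{align*}
\text{where } \Sigma[x] \text{ denotes the cone of sums-of-squares polynomials.}
```

Searching for the polynomial $V$ and multiplier $\sigma$ is a semidefinite program, which is why restricting attention to the low-dimensional hybrid zero dynamics manifold is what makes the 10-dimensional walking model tractable.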
Navigating safely in urban environments remains a challenging problem for autonomous vehicles. Occlusion and limited sensor range can pose significant challenges to safely navigate among pedestrians and other vehicles in the environment. Enabling vehicles to quantify the risk posed by unseen regions allows them to anticipate future possibilities, resulting in increased safety and ride comfort. This paper proposes an algorithm that takes advantage of the known road layouts to forecast, quantify, and aggregate risk associated with occlusions and limited sensor range. This allows us to make predictions of risk induced by unobserved vehicles even in heavily occluded urban environments. The risk can then be used either by a low-level planning algorithm to generate better trajectories, or by a high-level one to plan a better route. The proposed algorithm is evaluated on intersection layouts from real-world map data with up to five other vehicles in the scene, and verified to reduce collision rates by 4.8x compared to a baseline method while improving driving comfort.
Locomotion in the real world involves unexpected perturbations, and therefore requires strategies to maintain stability to successfully execute desired behaviours. Ensuring the safety of locomoting systems therefore necessitates a quantitative metric for stability. Due to the difficulty of determining the set of perturbations that induce failure, researchers have used a variety of features as a proxy to describe stability. This paper utilises recent advances in dynamical systems theory to develop a personalised, automated framework to compute the set of perturbations from which a system can avoid failure, which is known as the basin of stability. The approach tracks human motion to synthesise a control input that is analysed to measure the basin of stability. The utility of this analysis is verified on a Sit-to-Stand task performed by 15 individuals. The experiment illustrates that the computed basin of stability for each individual can successfully differentiate between less and more stable Sit-to-Stand strategies.
Urban environments pose a significant challenge for autonomous vehicles (AVs) as they must safely navigate while in close proximity to many pedestrians. It is crucial for the AV to correctly understand and predict the future trajectories of pedestrians to avoid collision and plan a safe path. Deep neural networks (DNNs) have shown promising results in accurately predicting pedestrian trajectories, relying on large amounts of annotated real-world data to learn pedestrian behavior. However, collecting and annotating these large real-world pedestrian datasets is costly in both time and labor. This paper describes a novel method using a stochastic sampling-based simulation to train DNNs for pedestrian trajectory prediction with social interaction. Our novel simulation method can generate vast amounts of automatically-annotated, realistic, and naturalistic synthetic pedestrian trajectories based on small amounts of real annotation. We then use such synthetic trajectories to train an off-the-shelf state-of-the-art deep learning approach, Social GAN (Generative Adversarial Network), to perform pedestrian trajectory prediction. Our proposed architecture, trained only using synthetic trajectories, achieves better prediction results compared to those trained on human-annotated real-world data using the same network. Our work demonstrates the effectiveness and potential of using simulation as a substitute for human annotation efforts to train high-performing prediction algorithms such as DNNs.
Controlling soft robots with precision is a challenge due in large part to the difficulty of constructing models that are amenable to model-based control design techniques. Koopman Operator Theory offers a way to construct explicit linear dynamical models of soft robots and to control them using established model-based linear control methods. This method is data-driven, yet unlike other data-driven models such as neural networks, it yields an explicit control-oriented linear model rather than just a "black-box" input-output mapping. This work describes this Koopman-based system identification method and its application to model predictive controller design. A model and MPC controller of a pneumatic soft robot arm were constructed via the method, and their performance was evaluated over several trajectory following tasks in the real world. On all of the tasks, the Koopman-based MPC controller outperformed a benchmark MPC controller based on a linear state-space model of the same system.
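Once a lifted linear model is in hand, the unconstrained core of such a predictive controller reduces to a finite-horizon LQ problem (a sketch; the A and B below are illustrative placeholders for the Koopman-identified matrices, and a practical MPC additionally handles input constraints, which this Riccati recursion does not):

```python
import numpy as np

# Lifted linear model z+ = A z + B u with placeholder matrices.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.01]])

def finite_horizon_gains(A, B, Q, R, N=20):
    # Backward Riccati recursion; gains[k] is the optimal state
    # feedback at step k of the horizon.
    P = Q.copy()
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

gains = finite_horizon_gains(A, B, Q, R)

# Roll out the finite-horizon optimal policy from an initial lifted state.
z = np.array([1.0, 0.0])
for K in gains:
    u = -K @ z
    z = A @ z + B @ u
print(np.linalg.norm(z))
```

The appeal of the Koopman route is precisely that this entire pipeline, identification and control alike, stays within linear algebra even though the underlying robot is nonlinear.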
The compliant structure of soft robotic systems enables a variety of novel capabilities in comparison to traditional rigid-bodied robots. A subclass of soft fluid-driven actuators known as fiber reinforced elastomeric enclosures (FREEs) is particularly well suited as actuators for these types of systems. FREEs are inherently soft and can impart spatial forces without imposing a rigid structure. Furthermore, they can be configured to produce a large variety of force and moment combinations. In this paper we explore the potential of combining multiple differently configured FREEs in parallel to achieve fully controllable multi-dimensional soft actuation. To this end, we propose a novel methodology to represent and calculate the generalized forces generated by soft actuators as a function of their internal pressure. This methodology relies on the notion of a state dependent fluid Jacobian that yields a linear expression for force. We employ this concept to construct the set of all possible forces that can be generated by a soft system in a given state. This force zonotope can be used to inform the design and control of parallel combinations of soft actuators. The approach is verified experimentally with the parallel combination of three carefully designed actuators constrained to a 2-DOF testing rig. The force predictions matched measured values with root-mean-square errors of less than 1.5 N in force and 8 x 10^(-3) N·m in moment, demonstrating the utility of the presented methodology.
Recent work has focused on generating synthetic imagery to increase the size and variability of training data for learning visual tasks in urban scenes. This includes increasing the occurrence of occlusions or varying environmental and weather effects. However, few have addressed modeling variation in the sensor domain. Sensor effects can degrade real images, limiting generalizability of network performance on visual tasks trained on synthetic data and tested in real environments. This paper proposes an efficient, automatic, physically-based augmentation pipeline to vary sensor effects (chromatic aberration, blur, exposure, noise, and color cast) for synthetic imagery. In particular, this paper illustrates that augmenting synthetic training datasets with the proposed pipeline reduces the domain gap between synthetic and real domains for the task of object detection in urban driving scenes.
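Minimal, physically-motivated stand-ins for four of the sensor effects named above can be sketched as follows (the functional forms and parameter values are illustrative simplifications, not the paper's pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, sigma=0.02):
    # Additive Gaussian read noise on a float image in [0, 1].
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def exposure(img, gain=1.3):
    # Simple exposure change as a linear gain before clipping.
    return np.clip(img * gain, 0.0, 1.0)

def color_cast(img, gains=(1.1, 1.0, 0.9)):
    # Per-channel gains approximating a color cast.
    return np.clip(img * np.asarray(gains), 0.0, 1.0)

def chromatic_aberration(img, shift=1):
    # Shift the red channel by a pixel to mimic lateral aberration.
    out = img.copy()
    out[..., 0] = np.roll(img[..., 0], shift, axis=1)
    return out

img = rng.uniform(0.0, 1.0, size=(8, 8, 3))
aug = chromatic_aberration(color_cast(exposure(add_noise(img))))
print(aug.shape)
```

Randomizing such parameters per training image is the basic mechanism by which an augmentation pipeline of this kind injects sensor-domain variability into clean synthetic renders.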
Performance on benchmark datasets has drastically improved with advances in deep learning. Still, cross-dataset generalization performance remains relatively low due to the domain shift that can occur between two different datasets. This domain shift is especially exaggerated between synthetic and real datasets. Significant research has been done to reduce this gap, specifically via modeling variation in the spatial layout of a scene, such as occlusions, and scene environmental factors, such as time of day and weather effects. However, few works have addressed modeling the variation in the sensor domain as a means of reducing the synthetic to real domain gap. The camera or sensor used to capture a dataset introduces artifacts into the image data that are unique to the sensor model, suggesting that sensor effects may also contribute to domain shift. To address this, we propose a learned augmentation network composed of physically-based augmentation functions. Our proposed augmentation pipeline transfers specific effects of the sensor model (chromatic aberration, blur, exposure, noise, and color temperature) from a real dataset to a synthetic dataset. We provide experiments that demonstrate that augmenting synthetic training datasets with the proposed learned augmentation framework reduces the domain gap between synthetic and real domains for object detection in urban driving scenes.
Recent work has shown that convolutional neural networks (CNNs) can be applied successfully in disparity estimation, but these methods still suffer from errors in regions of low-texture, occlusions and reflections. Concurrently, deep learning for semantic segmentation has shown great progress in recent years. In this paper, we design a CNN architecture that combines these two tasks to improve the quality and accuracy of disparity estimation with the help of semantic segmentation. Specifically, we propose a network structure in which these two tasks are highly coupled. One key novelty of this approach is the two-stage refinement process. Initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network. The proposed model is trained using an unsupervised approach, in which images from one half of the stereo pair are warped and compared against images from the other camera. Another key advantage of the proposed approach is that a single network is capable of outputting disparity estimates and semantic labels. These outputs are of great use in autonomous vehicle operation; since real-time performance is key in driving applications, obtaining both outputs from a single network increases the viability of the approach. Experiments on KITTI and Cityscapes datasets show that our model can achieve state-of-the-art results and that leveraging embedding learned from semantic segmentation improves the performance of disparity estimation.
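The unsupervised warping signal described above can be sketched with a nearest-neighbour warp (the paper's network uses differentiable sampling; function names and the synthetic data here are illustrative):

```python
import numpy as np

def reconstruct_left(right, disp):
    # For rectified stereo, left pixel (x, y) images the same point as
    # right pixel (x - d, y); gather with clamped integer disparities.
    h, w = right.shape[:2]
    xs = np.clip(np.arange(w)[None, :] - np.round(disp).astype(int), 0, w - 1)
    ys = np.repeat(np.arange(h)[:, None], w, axis=1)
    return right[ys, xs]

def photometric_loss(left, right, disp):
    # Mean absolute reconstruction error: the L1 photometric loss that
    # supervises disparity without ground-truth labels.
    return np.abs(left - reconstruct_left(right, disp)).mean()

rng = np.random.default_rng(3)
right = rng.uniform(size=(6, 10))
disp = np.full((6, 10), 3.0)
left = reconstruct_left(right, disp)   # synthetic consistent left view
print(photometric_loss(left, right, disp))
```

When the predicted disparity is correct the warped right image reproduces the left image, so minimizing this loss drives the disparity estimates toward the truth without labels.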
In this paper, we present an approach for designing feedback controllers for polynomial systems that maximize the size of the time-limited backwards reachable set (BRS). We rely on the notion of occupation measures to pose the synthesis problem as an infinite dimensional linear program (LP) and provide finite dimensional approximations of this LP in terms of semidefinite programs (SDPs). The solution to each SDP yields a polynomial control policy and an outer approximation of the largest achievable BRS. In contrast to traditional Lyapunov based approaches which are non-convex and require feasible initialization, our approach is convex and does not require any form of initialization. The resulting time-varying controllers and approximated reachable sets are well-suited for use in a trajectory library or feedback motion planning algorithm. We demonstrate the efficacy and scalability of our approach on five nonlinear systems.
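For reference, the uncontrolled core of the occupation-measure formulation can be sketched as follows (a hedged outline in the style of the region-of-attraction LP of Henrion and Korda, on which this line of work builds; the controlled version in the paper introduces additional measures encoding u):

```latex
% Infinite-dimensional LP over nonnegative measures whose optimal value
% upper-bounds the volume of the time-limited BRS of xdot = f(x) into a
% target set supporting mu_T.
\begin{align*}
\sup_{\mu_0,\;\mu,\;\mu_T,\;\hat\mu_0}\quad
  & \mu_0(X) \\
\text{s.t.}\quad
  & \partial_t \mu + \nabla_x\!\cdot(f\mu)
      = \delta_0 \otimes \mu_0 - \delta_T \otimes \mu_T
      && \text{(Liouville equation, weakly)} \\
  & \mu_0 + \hat\mu_0 = \lambda_X
      && \text{($\lambda_X$: Lebesgue measure on $X$)} \\
  & \mu_0,\ \mu,\ \mu_T,\ \hat\mu_0 \;\ge\; 0.
\end{align*}
```

Truncating the measures to finite moment sequences turns each level of this LP into a semidefinite program, whose dual polynomials supply both the outer approximation of the BRS and the polynomial control policy.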
An accurate characterization of pose uncertainty is essential for safe autonomous navigation. Early pose uncertainty characterization methods proposed by Smith, Self, and Cheeseman (SSC), used coordinate-based first-order methods to propagate uncertainty through non-linear functions such as pose composition (head-to-tail), pose inversion, and relative pose extraction (tail-to-tail). Characterizing uncertainty in the Lie Algebra of the special Euclidean group results in better uncertainty estimates. However, existing approaches assume that individual poses are independent. Since factors in a pose graph induce correlation, this independence assumption is usually not reflected in reality. In addition, prior work has focused primarily on the pose composition operation. This paper develops a framework for modeling the uncertainty of jointly distributed poses and describes how to perform the equivalent of the SSC pose operations while characterizing uncertainty in the Lie Algebra. Evaluation on simulated and open-source datasets shows that the proposed methods result in more accurate uncertainty estimates. An accompanying C++ library implementation is also released. This is a pre-print of a paper submitted to IEEE TRO in 2019.
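A planar sketch of head-to-tail composition with correlated uncertainty illustrates the idea (SE(2) only, with a right-perturbation convention and invented covariances; the paper and its C++ library handle SE(3) and the full set of SSC operations, and its conventions may differ):

```python
import numpy as np

def rot(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s], [s, c]])

def compose(p1, p2):
    # Head-to-tail composition of SE(2) poses (x, y, theta).
    dx, dy = rot(p1[2]) @ p2[:2]
    return np.array([p1[0] + dx, p1[1] + dy, p1[2] + p2[2]])

def inverse(p):
    t = -rot(p[2]).T @ p[:2]
    return np.array([t[0], t[1], -p[2]])

def adjoint(p):
    # SE(2) adjoint for twists ordered (rho_x, rho_y, omega).
    A = np.eye(3)
    A[:2, :2] = rot(p[2])
    A[0, 2], A[1, 2] = p[1], -p[0]
    return A

def compose_cov(p2, S1, S2, S12):
    # First-order covariance of T1*T2 under right perturbations:
    # xi_c ~ Ad(T2^-1) xi_1 + xi_2. S12 is the cross-covariance that
    # the classical independence assumption sets to zero.
    A = adjoint(inverse(p2))
    return A @ S1 @ A.T + S2 + A @ S12 + S12.T @ A.T

p1 = np.array([1.0, 2.0, np.pi / 4])
p2 = np.array([0.5, 0.0, np.pi / 8])
S1 = np.diag([0.01, 0.01, 0.005])
S2 = np.diag([0.02, 0.02, 0.004])
S12 = 0.5 * np.diag([0.01, 0.01, 0.004])  # correlation from shared factors
pc = compose(p1, p2)
Sc = compose_cov(p2, S1, S2, S12)
print(pc, np.linalg.eigvalsh(Sc))
```

Dropping the two S12 terms recovers the usual independent-pose propagation, which is exactly the approximation the paper's jointly distributed treatment corrects.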
An accurate depth map of the environment is critical to the safe operation of autonomous robots and vehicles. Currently, either light detection and ranging (LIDAR) or stereo matching algorithms are used to acquire such depth information. However, a high-resolution LIDAR is expensive and produces sparse depth maps at long range; stereo matching algorithms are able to generate denser depth maps but are typically less accurate than LIDAR at long range. This paper combines these approaches together to generate high-quality dense depth maps. Unlike previous approaches that are trained using ground-truth labels, the proposed model adopts a self-supervised training process. Experiments show that the proposed method is able to generate high-quality dense depth maps and performs robustly even with low-resolution inputs. This shows the potential to reduce cost by using lower-resolution LIDARs in concert with stereo systems while maintaining high-resolution output.