Research papers and code for "Jia Pan":
It is well known that a deep understanding of co-workers' behavior and preferences is important for effective collaboration. In this work, we present a method for smooth human-robot collaboration in close proximity that takes the human's behavior into account when planning the robot's trajectory. In particular, we first use an occupancy map to summarize the human's movement preferences over time; this prior information is then incorporated into an optimization-based motion planner via two cost terms: 1) avoidance of the workspace previously occupied by the human, to reduce interruptions and increase the task success rate; and 2) a tendency to keep a safe distance between the human and the robot, to improve safety. In the experiments, we compare the collaboration performance of planners using different combinations of human-aware cost terms: the avoidance term alone, both the avoidance and safe-distance terms, and a baseline with no human-related terms. The generated trajectories are tested in both simulated and real-world environments, and the results show that our method significantly increases collaborative task success rates while remaining human-friendly.
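
A rough illustration of the two cost terms (a minimal Python sketch, not the authors' implementation: the grid layout, units, and the hinge-style safety penalty are all assumptions):

    import numpy as np

    def human_aware_costs(waypoints, occupancy, human_pos,
                          cell_size=0.05, d_safe=0.5):
        """Score a candidate trajectory against human-derived costs.

        waypoints : (N, 2) planar trajectory points in meters
        occupancy : (H, W) time-accumulated human occupancy in [0, 1]
        human_pos : (2,) current human position in meters
        """
        # Cost 1: penalize cells the human has previously occupied.
        idx = np.clip((waypoints / cell_size).astype(int),
                      0, np.array(occupancy.shape) - 1)
        avoidance = occupancy[idx[:, 0], idx[:, 1]].sum()

        # Cost 2: penalize waypoints closer to the human than d_safe.
        dists = np.linalg.norm(waypoints - human_pos, axis=1)
        safety = np.maximum(0.0, d_safe - dists).sum()
        return avoidance, safety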

We present a novel approach for collision-free global navigation for continuous-time multi-agent systems with general linear dynamics. Our approach is general and can perform collision-free navigation in 2D and 3D workspaces with narrow passages and crowded regions. As a pre-computation step, we compute multiple bridges in the narrow or tight regions of the workspace using kinodynamic RRT algorithms. Each bridge has geometric characteristics that enable us to calculate a collision-free trajectory for each agent using simple interpolation at runtime. Moreover, we combine interpolated bridge trajectories with local multi-agent navigation algorithms to compute global collision-free paths for each agent. The overall approach combines the performance benefits of coupled multi-agent algorithms with the precomputed bridge trajectories to handle challenging scenarios. In practice, our approach can handle tens to hundreds of agents in real time on a single CPU core in 2D and 3D workspaces.
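
At runtime, evaluating a precomputed bridge reduces to interpolation along its stored trajectory. A minimal sketch of that step, assuming a 2D bridge stored as timestamped waypoints and plain linear interpolation (the paper's bridges carry richer geometric structure and dynamics):

    import numpy as np

    def follow_bridge(bridge, t):
        """Interpolate an agent position along a precomputed bridge.

        bridge : (M, 3) rows of (time, x, y), times strictly increasing
        t      : query time
        """
        times = bridge[:, 0]
        return np.array([np.interp(t, times, bridge[:, 1]),
                         np.interp(t, times, bridge[:, 2])])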

We present an efficient algorithm for motion planning and control of robot systems with a high number of degrees of freedom (DOF). These include high-DOF soft robots and articulated robots interacting with a deformable environment. Our approach takes dynamics constraints into account and presents a novel technique to accelerate the forward dynamics computation using a data-driven method. We precompute the forward dynamics function of the robot system on a hierarchical adaptive grid. Furthermore, we exploit the properties of underactuated robot systems and perform these computations for only a few DOFs. We provide error bounds for our approximate forward dynamics computation and use our approach for optimization-based motion planning and reinforcement-learning-based feedback control. Our formulation is applied to the motion planning of two high-DOF robot systems: a line-actuated elastic robot arm and an underwater swimming robot. Compared to prior techniques based on exact dynamics computation, we observe one to two orders of magnitude improvement in performance.
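
The core idea of replacing exact dynamics evaluation with table lookup can be sketched as follows. This toy example tabulates a hypothetical single-joint dynamics function on a uniform grid (the paper uses a hierarchical adaptive grid with error bounds) and interpolates at query time:

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    # Hypothetical 1-DOF forward dynamics: acceleration from state (q, qd).
    def forward_dynamics(q, qd):
        return -9.8 * np.sin(q) - 0.1 * qd  # toy pendulum-like dynamics

    # Offline: tabulate the dynamics on a grid over the state space.
    q_axis = np.linspace(-np.pi, np.pi, 64)
    qd_axis = np.linspace(-5.0, 5.0, 64)
    Q, QD = np.meshgrid(q_axis, qd_axis, indexing="ij")
    approx = RegularGridInterpolator((q_axis, qd_axis),
                                     forward_dynamics(Q, QD))

    # Online: a cheap interpolation replaces the exact dynamics call.
    print(approx([[0.3, 1.2]]), forward_dynamics(0.3, 1.2))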

* 7 pages
In this paper, we present a general approach to automatically visual servo-control the position and shape of a deformable object whose deformation parameters are unknown. The servo-control is achieved by online learning of a model mapping between the robotic end-effector's movement and the object's deformation measurement. The model is learned using Gaussian Process Regression (GPR) to handle the mapping's highly nonlinear nature, and, once learned, it is used to predict the required control at each time step. To overcome GPR's high computational cost on long manipulation sequences, we implement a fast online GPR that selectively removes uninformative observations from the regression process. We validate the performance of our controller on a set of deformable object manipulation tasks and demonstrate that our method achieves effective and accurate servo-control for general deformable objects with a wide variety of goal settings. Experiment videos are available at https://sites.google.com/view/mso-fogpr
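
A minimal sketch of a budgeted online GPR, assuming an RBF kernel and a simple redundancy-based removal rule (the paper's criterion for selecting uninformative observations may differ):

    import numpy as np

    class BudgetedGPR:
        """Online GPR that keeps at most `budget` observations."""

        def __init__(self, budget=50, ls=1.0, noise=1e-2):
            self.budget, self.ls, self.noise = budget, ls, noise
            self.X, self.y = [], []

        def _k(self, A, B):  # RBF kernel matrix
            d = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
            return np.exp(-0.5 * d / self.ls ** 2)

        def add(self, x, y):
            self.X.append(np.asarray(x, float))
            self.y.append(float(y))
            if len(self.X) > self.budget:
                X = np.array(self.X)
                K = self._k(X, X) - np.eye(len(X))
                drop = int(np.argmax(K.max(axis=1)))  # most redundant point
                del self.X[drop], self.y[drop]

        def predict(self, x):
            X, y = np.array(self.X), np.array(self.y)
            K = self._k(X, X) + self.noise * np.eye(len(X))
            k = self._k(np.atleast_2d(np.asarray(x, float)), X)[0]
            return k @ np.linalg.solve(K, y)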

* Submitted to IEEE Robotics and Automation Letters (RAL)
High-speed, low-latency obstacle avoidance that is insensitive to sensor noise is essential for multiple decentralized robots to function reliably in cluttered and dynamic environments. While other distributed multi-agent collision avoidance systems exist, they require online geometric optimization with tedious parameter tuning and perfect sensing. We present a novel end-to-end framework to generate a reactive collision avoidance policy for efficient distributed multi-agent navigation. Our method formulates an agent's navigation strategy as a deep neural network mapping from observed noisy sensor measurements to the agent's steering commands in terms of movement velocity. We train the network on a large number of frames of collision avoidance data collected by repeatedly running a multi-agent simulator with different parameter settings. We validate the learned policy in a set of simulated and real scenarios with noisy measurements and demonstrate that it yields a robust navigation strategy that is insensitive to imperfect sensing and works reliably in all situations. We also show that our method generalizes well to scenarios that do not appear in the training data, including scenes with static obstacles and agents of different sizes. Videos are available at https://sites.google.com/view/deepmaca.
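
The policy itself is simply a network from sensor readings to a velocity command. A minimal PyTorch stand-in, where the ray count, goal encoding, and layer sizes are illustrative assumptions rather than the paper's architecture:

    import torch
    import torch.nn as nn

    class AvoidancePolicy(nn.Module):
        def __init__(self, n_rays=180, goal_dim=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_rays + goal_dim, 128), nn.ReLU(),
                nn.Linear(128, 64), nn.ReLU(),
                nn.Linear(64, 2),  # output: (v_x, v_y) velocity command
            )

        def forward(self, scan, goal):
            return self.net(torch.cat([scan, goal], dim=-1))

    policy = AvoidancePolicy()
    v = policy(torch.randn(1, 180), torch.tensor([[1.0, 0.0]]))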

* IEEE Robotics and Automation Letters 2(2): 656-663 (2017)
We propose a novel unifying scheme for the parallel implementation of articulated robot dynamics algorithms. It is based on a unified Lie group notation for deriving the equations of motion of articulated robots, in which various well-known forward dynamics algorithms differ only in their joint inertia matrix inversion strategies. This scheme leads to a unified abstraction of state-of-the-art forward dynamics algorithms as combinations of block bi-diagonal and/or block tri-diagonal systems, which can be solved efficiently by parallel all-prefix-sum operations (scan) and parallel odd-even elimination (OEE), respectively. We implement the proposed scheme on an Nvidia CUDA GPU platform for a comparative study of three algorithms, namely the hybrid articulated-body inertia algorithm (ABIA), the parallel joint space inertia inversion algorithm (JSIIA), and the constrained force algorithm (CFA), and analyze their performance.

We propose a new parallel framework for the fast computation of inverse and forward dynamics of articulated robots based on prefix sums (scans). We revisit the well-known recursive Newton-Euler formulation of robot dynamics and show that the forward-backward propagation process for robot inverse dynamics is equivalent to two scan operations on certain semigroups. We then show that state-of-the-art forward dynamics algorithms can be cast almost completely into a sequence of scan operations, with the unscannable parts clearly identified; this suggests a serial-parallel hybrid approach for systems with a moderate number of links. We implement our scan-based algorithms on the Nvidia CUDA platform and compare their performance with multithreaded CPU-based recursive algorithms, demonstrating a significant level of acceleration.
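
The key observation is that a propagation pass becomes a prefix scan once the per-link update is expressed as an associative operator. A small sketch using affine maps as the semigroup, a simplified stand-in for the actual Newton-Euler operators:

    def inclusive_scan(xs, op):
        """Sequential reference for an inclusive scan with associative op
        (on a GPU this becomes a parallel all-prefix-sum)."""
        out = [xs[0]]
        for x in xs[1:]:
            out.append(op(out[-1], x))
        return out

    # Semigroup element: affine map y = a * y_prev + b, encoded as (a, b).
    # Scanning their composition evaluates a linear recurrence in one pass,
    # analogous to a forward propagation sweep over the links.
    compose = lambda f, g: (g[0] * f[0], g[0] * f[1] + g[1])
    print(inclusive_scan([(0.9, 1.0)] * 5, compose))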

The complex physical properties of highly deformable materials such as clothes pose significant challenges for autonomous robotic manipulation systems. We present a novel visual feedback dictionary-based method for manipulating deformable objects towards a desired configuration. Our approach is based on visual servoing, and we use an efficient technique to extract key features from the RGB sensor stream in the form of a histogram of deformable model features. These histogram features serve as high-level representations of the state of the deformable material. Next, we collect manipulation data and build a visual feedback dictionary that maps velocities in the high-dimensional feature space to the velocities of the robotic end-effectors. We have evaluated our approach on a set of complex manipulation tasks and human-robot manipulation tasks on different cloth pieces with varying material characteristics.
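
Conceptually, the dictionary is queried at runtime by matching the desired feature-space velocity against the stored pairs. A minimal sketch, assuming a weighted k-nearest-neighbor lookup (the paper's mapping may be constructed differently):

    import numpy as np

    def dictionary_control(feat_vel, D_feat, D_ctrl, k=3):
        """Look up an end-effector velocity from a feedback dictionary.

        feat_vel : (F,) desired velocity in histogram-feature space
        D_feat   : (N, F) stored feature-space velocities
        D_ctrl   : (N, C) corresponding end-effector velocities
        """
        d = np.linalg.norm(D_feat - feat_vel, axis=1)
        nn = np.argsort(d)[:k]
        w = 1.0 / (d[nn] + 1e-8)
        return (w[:, None] * D_ctrl[nn]).sum(0) / w.sum()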

* The video is available at goo.gl/mDSC4H
We present a novel method to compute the approximate global penetration depth (PD) between two non-convex geometric models. Our approach consists of two phases: offline precomputation and run-time queries. In the first phase, our formulation uses a novel sampling algorithm to precompute an approximation of the high-dimensional contact space between the pair of models. Compared with prior random sampling algorithms for contact space approximation, our propagation sampling considerably speeds up the precomputation and yields a high-quality approximation. At run time, we perform a nearest-neighbor query and a local projection to efficiently compute the translational or generalized PD. We demonstrate the performance of our approach on complex 3D benchmarks with tens or hundreds of thousands of triangles, observing significant improvements over previous methods in terms of accuracy and modest improvements in run-time performance.
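
A stripped-down view of the run-time query, keeping only the nearest-neighbor step (the paper additionally applies a local projection to refine the estimate and also supports generalized PD):

    import numpy as np

    def approx_pd(query, contact_samples):
        """Approximate translational PD as the distance from the query
        configuration to the nearest precomputed contact-space sample."""
        d = np.linalg.norm(contact_samples - query, axis=1)
        i = int(np.argmin(d))
        return d[i], contact_samples[i]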

* 10 pages; acknowledgements added
We present a novel approach for robust manipulation of high-DOF deformable objects such as cloth. Our approach uses a random-forest-based controller that maps the observed visual features of the cloth to an optimal control action of the manipulator. The topological structure of this random-forest-based controller is determined automatically from the training data, which consists of visual features and optimal control actions. This enables us to integrate the overall process of training data classification and controller optimization into an imitation learning (IL) approach. Our approach enables the learning of a robust control policy for cloth manipulation with guarantees on convergence. We have evaluated our approach on different multi-task cloth manipulation benchmarks such as flattening, folding, and twisting. In practice, our approach works well with different deformable features, learned either for the specific task or via deep learning. Moreover, our controller outperforms simple and piecewise-linear controllers in terms of robustness to noise. In addition, our approach is easy to implement and does not require much parameter tuning.
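
As a rough illustration of the controller's form, the following sketch fits a random forest on (visual feature, optimal action) pairs with scikit-learn; the feature dimensions and the synthetic data are placeholders:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    features = rng.normal(size=(500, 10))  # e.g. learned deformable features
    actions = features[:, :3] * 0.1        # hypothetical optimal actions

    # Fit the forest-based controller, then map a new observation to control.
    controller = RandomForestRegressor(n_estimators=50).fit(features, actions)
    u = controller.predict(features[:1])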

Existing shape estimation methods for deformable object manipulation suffer from being offline, model-dependent, noise-sensitive, or occlusion-sensitive, and are thus not appropriate for manipulation tasks requiring high precision. In this paper, we present a real-time shape estimation approach for autonomous robotic manipulation of 3D deformable objects. Our method fulfills all the requirements for high-quality deformable object manipulation: it is real-time, model-free, and robust to noise and occlusion. These advantages are achieved with a joint tracking and reconstruction framework, in which we track the object deformation by aligning a reference shape model with the stream input from an RGB-D camera, and simultaneously update the reference shape model according to the newly captured RGB-D data. We have evaluated the quality and robustness of our real-time shape estimation pipeline on a set of deformable manipulation tasks implemented on physical robots. Videos are available at https://lifeisfantastic.github.io/DeformShapeEst/

To achieve human-like dexterity with anthropomorphic robotic hands, it is essential to understand the biomechanics and control strategies of the human hand, so as to reduce the number of actuators without losing hand flexibility. To this end, in this article we propose a new interpretation of the working mechanism of extension at the metacarpophalangeal (MCP) joint and the underlying control strategies of the human hand. Based on this interpretation, we propose a highly flexible finger design that achieves independent movement of the interphalangeal (IP) joints and the MCP joint. We also incorporate fingertip hyperextension into the design, which helps the robotic finger adopt compliant and adaptive postures for touching and pinching. In addition, human thumb muscle functions are reconstructed in the proposed robotic hand by replacing 9 human muscle tendons with 3 cables in a task-oriented design, realizing all 33 static and stable grasping postures. Videos are available at https://sites.google.com/view/szwd

In this paper, we present a decentralized sensor-level collision avoidance policy for multi-robot systems that shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario, multi-stage training framework to learn an optimal policy. The policy is trained simultaneously over a large number of robots in rich, complex environments using a policy-gradient-based reinforcement learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy's robustness and effectiveness. We validate the learned sensor-level collision avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios, including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamics characteristics different from those of the simulated agents, demonstrating the controller's robustness to the sim-to-real modeling error. Finally, we show that the collision avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation of a single robot working in a dense human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. Videos are available at https://sites.google.com/view/hybridmrca

In this paper, we present a general learning-based framework to automatically visual servo-control the position and shape of a deformable object with unknown deformation parameters. The servo-control is accomplished by learning a feedback controller that determines the robotic end-effector's movement according to the deformable object's current status. This status encodes the object's deformation behavior using a set of observed visual features, which are either manually designed or automatically extracted from the robot's sensor stream. A feedback control policy is then optimized to efficiently push the object toward the desired feature status. The feedback policy can be learned either online or offline. Our online policy learning is based on Gaussian Process Regression (GPR), which achieves fast and accurate manipulation and is robust to small perturbations. An offline imitation learning framework is also proposed to obtain a control policy that is robust to large perturbations in the human-robot interaction. We validate the performance of our controller on a set of deformable object manipulation tasks and demonstrate that our method achieves effective and accurate servo-control for general deformable objects with a wide variety of goal settings.

* arXiv admin note: text overlap with arXiv:1709.07218, arXiv:1710.06947, arXiv:1802.09661
Pick-and-place regrasping is an important manipulation skill for a robot. It helps a robot accomplish tasks that cannot be achieved with a single grasp, due to constraints such as kinematics or collisions between the robot and the environment. Previous work on pick-and-place regrasping leveraged only flat surfaces for intermediate placements and is thus limited in its ability to reorient an object. In this paper, we extend the reorientation capability of pick-and-place regrasping by adding a vertical pin on the working surface and using it as the intermediate location for regrasping. In particular, our method automatically computes the stable placements of an object leaning against a vertical pin, finds several force-closure grasps, generates a graph of regrasp actions, and searches for a regrasp sequence. To compare regrasping performance with and without pins, we evaluate the success rate and the length of the regrasp sequences while performing tasks on various models. Experiments on reorientation and assembly tasks validate the benefit of using support pins for regrasping.
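
Once the regrasp graph is built, finding a regrasp sequence is a plain graph search. A minimal sketch using breadth-first search over (placement, grasp) nodes, which returns a shortest sequence of pick-and-place actions:

    from collections import deque

    def regrasp_sequence(edges, start, goal):
        """BFS over a regrasp graph; an edge means one pick-and-place
        action connects the two (placement, grasp) states."""
        adj = {}
        for a, b in edges:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
        prev, queue = {start: None}, deque([start])
        while queue:
            u = queue.popleft()
            if u == goal:  # walk back through predecessors
                path = []
                while u is not None:
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for v in adj.get(u, []):
                if v not in prev:
                    prev[v] = u
                    queue.append(v)
        return None  # goal unreachable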

* 14 pages, 20 figures
Audio-visual speech recognition (AVSR) is considered one of the most promising solutions for robust speech recognition, especially in noisy environments. In this paper, we propose a novel multimodal attention-based method for audio-visual speech recognition that automatically learns a fused representation from both modalities based on their importance. Our method is realized using state-of-the-art sequence-to-sequence (Seq2seq) architectures. Experimental results show relative improvements from 2% up to 36% over the auditory modality alone, depending on the signal-to-noise ratio (SNR). Compared to traditional feature concatenation methods, our approach achieves better recognition performance under both clean and noisy conditions. We believe the modality attention based end-to-end method can be easily generalized to other multimodal tasks with correlated information.
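
A minimal PyTorch sketch of modality attention: score each modality's feature vector, softmax the scores, and fuse by a weighted sum. Sizes and the scoring function are illustrative assumptions, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    class ModalityAttention(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, audio, visual):                # both (B, dim)
            feats = torch.stack([audio, visual], dim=1)  # (B, 2, dim)
            w = torch.softmax(self.score(feats), dim=1)  # modality weights
            return (w * feats).sum(dim=1)                # fused (B, dim)

    fuse = ModalityAttention()
    fused = fuse(torch.randn(4, 256), torch.randn(4, 256))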

Attention-based end-to-end (E2E) speech recognition models such as Listen, Attend, and Spell (LAS) can achieve better results than traditional hybrid automatic speech recognition (ASR) models on large vocabulary continuous speech recognition (LVCSR) tasks. LAS combines the acoustic, pronunciation, and language model components of a traditional ASR system into a single neural network. However, such architectures are hard to use for streaming speech recognition because of their bidirectional listener architecture and attention mechanism. In this work, we propose a latency-controlled bidirectional long short-term memory (LC-BLSTM) listener to reduce the delay of the listener's forward computation. On the attention side, we propose adaptive monotonic chunk-wise attention (AMoChA) to make LAS online. We explore how each part performs when used alone and obtain results comparable to or better than the LAS baseline. By combining the two methods, we successfully stream the LAS baseline with only 3.5% relative degradation in character error rate (CER) on our Mandarin corpus. We believe our methods can have the same effect on other languages.

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system. The RNN transducer (RNN-T) is one of the popular end-to-end methods. Previous studies have shown that RNN-T is difficult to train and that a very complex training process is needed for reasonable performance. In this paper, we explore RNN-T for a Chinese large vocabulary continuous speech recognition (LVCSR) task and aim to simplify the training process while maintaining performance. First, a new learning rate decay strategy is proposed to accelerate model convergence. Second, we find that adding convolutional layers at the beginning of the network and using ordered data allows us to discard the encoder pre-training process without loss of performance. Furthermore, we design experiments to find a balance among GPU memory usage, training cycle length, and model performance. Finally, we achieve a 16.9% character error rate (CER) on our test set, a 2% absolute improvement over a strong BLSTM CE system with a language model trained on the same text corpus.

This paper focuses on the challenging task of learning 3D object surface reconstruction from single RGB images. Existing methods achieve varying degrees of success by using different geometric representations; however, each has its own drawbacks, and none reconstructs surfaces of complex topology well. To this end, we propose a skeleton-bridged, stage-wise learning approach to address the challenge. We use the skeleton because of its topology-preserving property and its lower learning complexity. To learn the skeleton from an input image, we design a deep architecture whose decoder is based on a novel design of parallel streams for synthesizing curve-like and surface-like skeleton points, respectively. We use point cloud, volume, and mesh shape representations in our stage-wise learning, in order to take their respective advantages. We also use the input image at multiple stages to correct prediction errors that may accumulate at each stage. We conduct intensive experiments to investigate the efficacy of our proposed approach. Qualitative and quantitative results on representative object categories of both simple and complex topology demonstrate the superiority of our approach over existing ones. We will make our ShapeNet-Skeleton dataset publicly available.

* 8-page paper, 3-page supplementary material; CVPR oral
In this paper, we present a robotic navigation algorithm with a natural language interface, which enables a robot to safely traverse a changing environment with moving people by following human instructions such as "go to the restaurant and keep away from people". We first classify human instructions into three types: the goal, the constraints, and uninformative phrases. Next, we ground the extracted goal and constraint items dynamically during navigation, in order to handle target objects that are too far away for sensor observation as well as the appearance of moving obstacles such as people. In particular, for a goal phrase (e.g., "go to the restaurant"), we ground it to a location in a predefined semantic map and treat it as the goal of a global motion planner, which plans a collision-free path in the workspace for the robot to follow. For a constraint phrase (e.g., "keep away from people"), we dynamically add the corresponding constraint to a local planner by adjusting the values of a local costmap according to the results returned by the object detection module; the updated costmap is then used to compute a local collision avoidance control for safe navigation. By combining natural language processing, motion planning, and computer vision, our system is demonstrated to successfully follow natural language navigation instructions in both simulated and real-world scenarios. Videos are available at https://sites.google.com/view/snhi
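
A toy version of the constraint-grounding step might adjust the local costmap as follows, assuming a 2D costmap and metric detection centers (cell size, penalty, and clipping range are placeholders):

    import numpy as np

    def apply_keep_away(costmap, detections, radius,
                        cell=0.05, penalty=100):
        """Inflate costmap cells around detected objects (e.g. people)
        named in a "keep away from ..." constraint."""
        H, W = costmap.shape
        ys, xs = np.mgrid[0:H, 0:W]
        for cx, cy in detections:  # detection centers in meters
            d = np.hypot(xs * cell - cx, ys * cell - cy)
            costmap[d < radius] += penalty
        return np.clip(costmap, 0, 255)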
