Research papers and code for "Shaoshan Liu":
In contrast to manned missions, deploying autonomous robots for space exploration reduces safety concerns while extending exploration range, since robotic missions require no return transportation. In addition, employing robots in these missions also reduces mission complexity and cost because no onboard life-support systems are needed: robots can withstand and operate in harsh conditions, for instance extreme temperature, pressure, and radiation, where humans cannot survive. In this article, we introduce the environment on Mars, review existing autonomous driving techniques deployed on Earth, and explore the technologies required to enable future commercial autonomous space robotic explorers. Last but not least, we also present one of the most urgent technical challenges for autonomous space explorers, namely onboard computing power.

* 16 pages, 6 figures, accepted for publication in IEEE Potentials Magazine
Robotics systems are complex, often consisting of basic services such as SLAM for localization and mapping, Convolutional Neural Networks for scene understanding, and speech recognition for user interaction. Meanwhile, robots are mobile and usually have tight energy constraints, so integrating these services onto an embedded platform with around 10 W of power consumption is critical to the proliferation of mobile robots. In this paper, we present a case study on integrating real-time localization, vision, and speech recognition services on a mobile SoC, the Nvidia Jetson TX1, within a power envelope of about 10 W. In addition, we explore whether offloading some of these services to the cloud can yield further energy savings while still meeting real-time requirements.
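
As a rough illustration of the offload trade-off explored above, the sketch below weighs local compute energy against radio transmission energy and cloud latency. All numbers and function names are hypothetical placeholders, not measurements from the paper.

```python
# Hypothetical sketch of the offload decision: run a service locally on
# the SoC, or ship its input to the cloud. All numbers are illustrative.

def local_energy_j(compute_time_s, soc_power_w=10.0):
    """Energy to run the service on the mobile SoC."""
    return compute_time_s * soc_power_w

def offload_energy_j(payload_bytes, uplink_bps, radio_power_w=2.0):
    """Energy spent keeping the radio on while uploading the input."""
    tx_time_s = payload_bytes * 8 / uplink_bps
    return tx_time_s * radio_power_w

def should_offload(compute_time_s, payload_bytes, uplink_bps,
                   cloud_rtt_s, deadline_s):
    """Offload only if it saves energy AND still meets the deadline."""
    e_local = local_energy_j(compute_time_s)
    e_cloud = offload_energy_j(payload_bytes, uplink_bps)
    latency_cloud = payload_bytes * 8 / uplink_bps + cloud_rtt_s
    return e_cloud < e_local and latency_cloud <= deadline_s

# Example: speech recognition on a 1 s audio clip (16 kHz, 16-bit mono).
print(should_offload(compute_time_s=0.5, payload_bytes=32_000,
                     uplink_bps=5_000_000, cloud_rtt_s=0.1, deadline_s=0.3))
```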

* 12 pages, 8 figures
Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, space exploration, and street view map generation. Essentially, BA is a joint non-linear optimization problem, one which can consume a significant amount of time and power, especially for large problems. Previous approaches to optimizing BA performance rely heavily on parallel processing or distributed computing, which trade higher power consumption for higher performance. In this paper we propose π-BA, the first hardware-software co-designed BA engine on an embedded FPGA-SoC that exploits custom hardware for higher performance and power efficiency. Specifically, based on our key observation that not all points appear in all images of a BA problem, we designed and implemented a Co-Observation Optimization technique to accelerate BA operations with optimized usage of memory and computation resources. Experimental results confirm that π-BA outperforms existing software implementations in terms of both performance and power consumption.
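
The co-observation structure the paper exploits is easiest to see in a toy version of the problem. The sketch below, a minimal stand-in and not the paper's FPGA implementation, sets up a tiny BA-style joint optimization in which residuals exist only for (camera, point) pairs that were actually observed together; cameras are simplified to 2D translations.

```python
# Toy bundle-adjustment sketch: 2 cameras, 3 points, and an observation
# list in which point 2 is never seen by camera 0 -- the sparsity that
# a Co-Observation-style optimization can exploit. (Illustrative only.)
import numpy as np
from scipy.optimize import least_squares

observations = [  # (camera_index, point_index, observed 2D position)
    (0, 0, np.array([1.0, 2.0])),
    (0, 1, np.array([3.0, 1.0])),
    (1, 1, np.array([2.5, 1.2])),
    (1, 2, np.array([0.5, 0.5])),  # seen by camera 1 only
]

def residuals(params):
    cams = [np.zeros(2), params[:2]]  # camera 0 fixed to remove gauge freedom
    pts = params[2:].reshape(3, 2)    # 3 point positions
    # Residuals exist only for co-observed (camera, point) pairs.
    return np.concatenate([pts[j] + cams[i] - z for i, j, z in observations])

solution = least_squares(residuals, x0=np.zeros(8))  # joint non-linear fit
print(solution.cost)
```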

* in Proceedings of IEEE FCCM 2019
Autonomous vehicle safety and reliability are paramount requirements when developing autonomous vehicles, and they are guaranteed by massive functional and performance tests. Conducting these tests on real vehicles is extremely expensive and time-consuming, so it is imperative to develop a simulation platform to perform them. For simulation, we can utilize the Robot Operating System (ROS) for data playback to test newly developed algorithms. However, due to the massive amount of simulation data, performing simulation on a single machine is not practical. Hence, a high-performance distributed simulation platform is a critical piece of autonomous driving development. In this paper we present our experiences of building a production distributed autonomous driving simulation platform, built upon the Spark distributed framework for distributed computing management and ROS for data playback simulation.
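
A minimal PySpark sketch of the platform's core idea follows: partition a large set of recorded driving logs across the cluster and run one playback-based test per slice. The bag paths and the worker stub are hypothetical; a real worker would drive a ROS playback of the log through the algorithm under test.

```python
# Hypothetical sketch: fan ROS bag playback out across a Spark cluster,
# one simulation task per recorded log. replay_and_grade() is a stub.
from pyspark.sql import SparkSession

def replay_and_grade(bag_path):
    """Worker-side stub: play one bag through the algorithm under test.

    A real worker would launch a ROS data playback, feed the new
    algorithm, and return pass/fail metrics for this log segment.
    """
    return {"bag": bag_path, "collisions": 0, "disengagements": 0}

spark = SparkSession.builder.appName("driving-sim").getOrCreate()
bags = ["s3://bucket/logs/run_%04d.bag" % i for i in range(1000)]  # placeholders
results = (spark.sparkContext
           .parallelize(bags, numSlices=100)  # 100 parallel playback tasks
           .map(replay_and_grade)
           .collect())
print(sum(r["collisions"] for r in results))
```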

* 12 pages, 7 figures
When you need to enable deep learning on low-cost embedded SoCs, is it better to port an existing deep learning framework or to build one from scratch? In this paper, we share our practical experiences of building an embedded inference engine using the ARM Compute Library (ACL). The results show that, contrary to conventional wisdom, for simple models it takes much less development time to build an inference engine from scratch than to port an existing framework. In addition, by utilizing ACL, we managed to build an inference engine that outperforms TensorFlow by 25%. Our conclusion is that, on embedded devices, we will most likely use very simple deep learning models for inference, and with well-developed building blocks such as ACL, building the engine from scratch may be better in both performance and development time.
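
To make the "build it from scratch" argument concrete: for a simple, fixed model, an inference engine is little more than an ordered list of layer kernels. The NumPy sketch below illustrates that shape of approach; the paper's actual engine composes ARM Compute Library kernels in C++, and the layer sizes here are arbitrary.

```python
# Minimal from-scratch inference engine for a fixed two-layer model.
# No graph runtime, no framework -- just kernels applied in order.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense(w, b):
    # Each layer is a kernel closure over its weights.
    return lambda x: x @ w + b

rng = np.random.default_rng(0)  # stand-in weights; a real engine loads trained ones
model = [
    dense(rng.standard_normal((784, 128)), np.zeros(128)),
    relu,
    dense(rng.standard_normal((128, 10)), np.zeros(10)),
]

def infer(x, layers):
    for layer in layers:
        x = layer(x)
    return x

print(infer(rng.standard_normal(784), model).argmax())
```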

* 4 pages, 4 figures
Simultaneous Localization And Mapping (SLAM) is the problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. How to run SLAM robustly and durably on mobile, or even IoT-grade, devices is the main challenge faced by the industry today. The main problems we need to address are: 1) how to accelerate the SLAM pipeline to meet real-time requirements; and 2) how to reduce SLAM energy consumption to extend battery life. After delving into the problem, we found that feature extraction is indeed the bottleneck for both performance and energy consumption. Hence, in this paper, we design, implement, and evaluate a hardware ORB feature extractor and show that our design strikes a good balance between performance and energy consumption compared with ARM Krait and Intel Core i5 implementations.
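
The bottleneck claim is easy to check in software before committing to hardware: time the feature-extraction stage in isolation. The snippet below, a sketch using OpenCV's ORB on a stand-in image rather than the paper's methodology, measures the software baseline that a hardware extractor is meant to beat.

```python
# Time ORB feature extraction in isolation to gauge the SLAM front-end
# bottleneck. The random frame is a stand-in for real camera input.
import time
import cv2
import numpy as np

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
orb = cv2.ORB_create(nfeatures=1000)

start = time.perf_counter()
for _ in range(100):
    keypoints, descriptors = orb.detectAndCompute(frame, None)
elapsed = (time.perf_counter() - start) / 100
print(f"ORB extraction: {elapsed * 1e3:.2f} ms/frame")
```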

We describe the computing tasks involved in autonomous driving and examine existing autonomous driving computing platform implementations. To enable autonomous driving, the computing stack needs to simultaneously provide high performance, low power consumption, and low thermal dissipation, all at low cost. We discuss possible approaches to designing computing platforms that will meet these needs.

* 7 pages, 4 figures, accepted by IEEE Computer Magazine
Enabling full robotic workloads with diverse behaviors on mobile systems with stringent resource and energy constraints remains a challenge. In recent years, attempts have been made to deploy single-accelerator-based computing platforms (such as GPU, DSP, or FPGA) to address this challenge, but with little success. The core problem is two-fold: first, different robotic tasks require different accelerators, and second, managing multiple accelerators simultaneously is overwhelming for developers. In this paper, we propose PIRT, the first robotic runtime framework to efficiently manage dynamic task executions on mobile systems with multiple accelerators, as well as on the cloud, to achieve better performance and energy savings. With PIRT, we enable a robot to simultaneously perform autonomous navigation with localization at 25 FPS, obstacle detection at 3 FPS, route planning, large-map generation, and scene understanding, traveling at a maximum speed of 5 miles per hour, all within an 11 W computing power envelope.
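
A heavily simplified sketch of the runtime's central idea, routing each task to the resource that suits it, appears below. The task list mirrors the workloads named above, but the accelerator table and dispatch interface are hypothetical, not PIRT's actual API.

```python
# Hypothetical task-to-accelerator routing table and dispatcher. A real
# runtime would hand payloads to device queues or cloud RPC stubs.
from concurrent.futures import ThreadPoolExecutor

ACCELERATOR_FOR = {              # task -> preferred execution resource
    "localization": "dsp",       # e.g., feature extraction on a DSP
    "obstacle_detection": "gpu",
    "route_planning": "cpu",
    "map_generation": "cloud",   # offloaded; too heavy for the robot
    "scene_understanding": "cloud",
}

def dispatch(task, payload):
    backend = ACCELERATOR_FOR[task]
    # Here we only report the routing decision instead of executing it.
    return f"{task} -> {backend}"

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(dispatch, t, None) for t in ACCELERATOR_FOR]
    for f in futures:
        print(f.result())
```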

Autonomous driving clouds provide essential services to support autonomous vehicles. Today these services include, but are not limited to, distributed simulation tests for new algorithm deployment, offline deep learning model training, and High-Definition (HD) map generation. These services require infrastructure support including distributed computing, distributed storage, and heterogeneous computing. In this paper, we present the details of how we implement a unified autonomous driving cloud infrastructure and how we support these services on top of it.

* 8 pages, 12 figures
In this paper, we present the Trifo Visual Inertial Odometry (Trifo-VIO), a tightly-coupled filtering-based stereo VIO system using both points and lines. Line features help improve system robustness in challenging scenarios where point features cannot be reliably detected or tracked, e.g., low-texture environments or lighting changes. In addition, we propose a novel lightweight filtering-based loop closing technique to reduce accumulated drift without global bundle adjustment or pose graph optimization. We formulate loop closure as EKF updates to optimally relocate the current sliding window maintained by the filter to past keyframes. We also present the Trifo Ironsides dataset, a new visual-inertial dataset featuring high-quality synchronized stereo camera and IMU data from the Ironsides sensor [3], with various motion types and textures and millimeter-accuracy ground truth. To validate the performance of the proposed system, we conduct extensive comparisons with state-of-the-art approaches (OKVIS, VINS-MONO, and S-MSCKF) using both the public EuRoC dataset and the Trifo Ironsides dataset.
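
The primitive underlying the loop-closing formulation above is the standard EKF measurement update; a loop-closure match enters as one more measurement that pulls the current sliding window toward a past keyframe. The sketch below shows the generic update only, with placeholder matrices; the paper defines the actual state, residual, and Jacobian.

```python
# Standard EKF measurement update. In a Trifo-VIO-style loop closure,
# z would encode the match against a past keyframe; here all matrices
# are placeholders for a generic 2-state toy.
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """State x, covariance P, measurement z, predicted measurement
    h = h(x), measurement Jacobian H, measurement noise R."""
    y = z - h                           # innovation (residual)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x + K @ y                   # relocate the state estimate
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

x, P = np.zeros(2), np.eye(2)
x, P = ekf_update(x, P, z=np.array([1.0]), h=np.array([0.0]),
                  H=np.array([[1.0, 0.0]]), R=np.array([[0.1]]))
print(x)
```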

In this paper, we present the PerceptIn Robotics Vision System (PIRVS), a visual-inertial computing hardware with an embedded simultaneous localization and mapping (SLAM) algorithm. The PIRVS hardware is equipped with a multi-core processor, a global-shutter stereo camera, and an IMU with precise hardware synchronization. The PIRVS software features a novel and flexible sensor fusion approach that not only tightly integrates visual measurements with inertial measurements but also loosely couples with additional sensor modalities. It runs in real-time on both a PC and the PIRVS hardware. We perform a thorough evaluation of the proposed system using multiple public visual-inertial datasets. Experimental results demonstrate that our system achieves accuracy comparable to state-of-the-art visual-inertial algorithms on a PC, while being more efficient on the PIRVS hardware.

Autonomous driving is not a single technology but rather a complex system integrating many technologies, which makes teaching autonomous driving a challenging task. Indeed, most existing autonomous driving classes focus on just one of the technologies involved. This not only fails to provide comprehensive coverage, but also sets a high entry barrier for students with different technology backgrounds. In this paper, we present a modular, integrated approach to teaching autonomous driving. Specifically, we organize the technologies used in autonomous driving into modules, described in the textbook we have developed as well as in a series of multimedia online lectures that provide a technical overview of each module. Then, once the students have understood these modules, the experimental platforms for integration we have developed allow them to fully understand how the modules interact with each other. To verify this teaching approach, we present three case studies: an introductory class on autonomous driving for students with only a basic technology background; a new session in an existing embedded systems class demonstrating how embedded system technologies can be applied to autonomous driving; and an industry professional training session to quickly bring experienced engineers up to speed on autonomous driving. The results show that students can maintain a high level of interest and make great progress by starting with familiar concepts before moving on to other modules.

The rise of robotic applications has led to the generation of a huge volume of unstructured data, whereas current cloud infrastructure was designed to process limited amounts of structured data. To address this problem, we propose a learn-memorize-recall-reduce paradigm for robotic cloud computing. The learning stage converts incoming unstructured data into structured data; the memorization stage provides effective storage for the massive amount of data; the recall stage provides efficient means to retrieve the raw data; and the reduction stage provides means to make sense of this massive amount of unstructured data with limited computing resources.
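
A minimal end-to-end sketch of the four stages on a single record follows. The function bodies are stubs with hypothetical fields; the point is only the data flow from unstructured input to structured records that are cheap to store, retrieve, and reduce.

```python
# Stub pipeline for the learn-memorize-recall-reduce paradigm.
# Fields like "objects" and "frame_id" are illustrative placeholders.

def learn(frame):
    """Convert unstructured data to structured data (e.g., run detectors)."""
    return {"objects": ["car", "pedestrian"], "frame_id": 42}

def memorize(record, store):
    """Store the structured record; raw data can go to cold storage."""
    store[record["frame_id"]] = record

def recall(frame_id, store):
    """Retrieve by structured key instead of scanning raw data."""
    return store.get(frame_id)

def reduce_stage(store):
    """The 'reduce' stage: summarize the corpus with bounded compute."""
    counts = {}
    for record in store.values():
        for obj in record["objects"]:
            counts[obj] = counts.get(obj, 0) + 1
    return counts

store = {}
memorize(learn(frame=None), store)
print(recall(42, store), reduce_stage(store))
```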

* 6 pages, 7 figures