Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yunhui Liu

$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation

May 29, 2024

Weitian Zhang, Yichao Yan, Yunhui Liu, Xingdong Sheng, Xiaokang Yang

This paper aims to introduce 3D Gaussian for efficient, expressive, and editable digital avatar generation. This task faces two major challenges: (1) The unstructured nature of 3D Gaussian makes it incompatible with current generation pipelines; (2) the expressive animation of 3D Gaussian in a generative setting that involves training with multiple subjects remains unexplored. In this paper, we propose a novel avatar generation method named $E^3$Gen, to effectively address these challenges. First, we propose a novel generative UV features plane representation that encodes unstructured 3D Gaussian onto a structured 2D UV space defined by the SMPL-X parametric model. This novel representation not only preserves the representation ability of the original 3D Gaussian but also introduces a shared structure among subjects to enable generative learning of the diffusion model. To tackle the second challenge, we propose a part-aware deformation module to achieve robust and accurate full-body expressive pose control. Extensive experiments demonstrate that our method achieves superior performance in avatar generation and enables expressive full-body pose control and editing.

Via

Access Paper or Ask Questions

IPAD: Industrial Process Anomaly Detection Dataset

Apr 23, 2024

Jinfan Liu, Yichao Yan, Junjie Li, Weiming Zhao, Pengzhi Chu, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames, and existing large-scale VAD researches primarily focus on road traffic and human activity scenes. In industrial scenes, there are often a variety of unpredictable anomalies, and the VAD method can play a significant role in these scenarios. However, there is a lack of applicable datasets and methods specifically tailored for industrial production scenarios due to concerns regarding privacy and security. To bridge this gap, we propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios. The industrial processes in our dataset are chosen through on-site factory research and discussions with engineers. This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage. Moreover, we annotate the key feature of the industrial process, ie, periodicity. Based on the proposed dataset, we introduce a period memory module and a sliding window inspection mechanism to effectively investigate the periodic information in a basic reconstruction model. Our framework leverages LoRA adapter to explore the effective migration of pretrained models, which are initially trained using synthetic data, into real-world scenarios. Our proposed dataset and method will fill the gap in the field of industrial video anomaly detection and drive the process of video understanding tasks as well as smart factory deployment.

Via

Access Paper or Ask Questions

Rethinking Clothes Changing Person ReID: Conflicts, Synthesis, and Optimization

Apr 19, 2024

Junjie Li, Guanshuo Wang, Fufu Yu, Yichao Yan, Qiong Jia, Shouhong Ding, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

Clothes-changing person re-identification (CC-ReID) aims to retrieve images of the same person wearing different outfits. Mainstream researches focus on designing advanced model structures and strategies to capture identity information independent of clothing. However, the same-clothes discrimination as the standard ReID learning objective in CC-ReID is persistently ignored in previous researches. In this study, we dive into the relationship between standard and clothes-changing~(CC) learning objectives, and bring the inner conflicts between these two objectives to the fore. We try to magnify the proportion of CC training pairs by supplementing high-fidelity clothes-varying synthesis, produced by our proposed Clothes-Changing Diffusion model. By incorporating the synthetic images into CC-ReID model training, we observe a significant improvement under CC protocol. However, such improvement sacrifices the performance under the standard protocol, caused by the inner conflict between standard and CC. For conflict mitigation, we decouple these objectives and re-formulate CC-ReID learning as a multi-objective optimization (MOO) problem. By effectively regularizing the gradient curvature across multiple objectives and introducing preference restrictions, our MOO solution surpasses the single-task training paradigm. Our framework is model-agnostic, and demonstrates superior performance under both CC and standard ReID protocols.

Via

Access Paper or Ask Questions

DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

Mar 25, 2024

Yichuan Li, Junkai Zhao, Yixiao Li, Zheng Wu, Rui Cao, Masayoshi Tomizuka, Yunhui Liu

Figure 1 for DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

Figure 2 for DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

Figure 3 for DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

Figure 4 for DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

Efficiency and reliability are critical in robotic bin-picking as they directly impact the productivity of automated industrial processes. However, traditional approaches, demanding static objects and fixed collisions, lead to deployment limitations, operational inefficiencies, and process unreliability. This paper introduces a Dynamic Bin-Picking Framework (DBPF) that challenges traditional static assumptions. The DBPF endows the robot with the reactivity to pick multiple moving arbitrary objects while avoiding dynamic obstacles, such as the moving bin. Combined with scene-level pose generation, the proposed pose selection metric leverages the Tendency-Aware Manipulability Network optimizing suction pose determination. Heuristic task-specific designs like velocity-matching, dynamic obstacle avoidance, and the resight policy, enhance the picking success rate and reliability. Empirical experiments demonstrate the importance of these components. Our method achieves an average 84% success rate, surpassing the 60% of the most comparable baseline, crucially, with zero collisions. Further evaluations under diverse dynamic scenarios showcase DBPF's robust performance in dynamic bin-picking. Results suggest that our framework offers a promising solution for efficient and reliable robotic bin-picking under dynamics.

* 8 pages, 5 figures. This paper has been accepted by IEEE RA-L on 2024-03-24. See the supplementary video at youtube: https://youtu.be/n5af2VsKhkg

Via

Access Paper or Ask Questions

Inter-X: Towards Versatile Human-Human Interaction Analysis

Dec 26, 2023

Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang

The analysis of the ubiquitous human-human interactions is pivotal for understanding humans as social beings. Existing human-human interaction datasets typically suffer from inaccurate body motions, lack of hand gestures and fine-grained textual descriptions. To better perceive and generate human-human interactions, we propose Inter-X, a currently largest human-human interaction dataset with accurate body movements and diverse interaction patterns, together with detailed hand gestures. The dataset includes ~11K interaction sequences and more than 8.1M frames. We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions, semantic interaction categories, interaction order, and the relationship and personality of the subjects. Based on the elaborate annotations, we propose a unified benchmark composed of 4 categories of downstream tasks from both the perceptual and generative directions. Extensive experiments and comprehensive analysis show that Inter-X serves as a testbed for promoting the development of versatile human-human interaction analysis. Our dataset and benchmark will be publicly available for research purposes.

* Project page: https://liangxuy.github.io/inter-x/

Via

Access Paper or Ask Questions

End-to-End Deep Visual Control for Mastering Needle-Picking Skills With World Models and Behavior Cloning

Mar 07, 2023

Hongbin Lin, Bin Li, Xiangyu Chu, Qi Dou, Yunhui Liu, Kwok Wai Samuel Au

Figure 1 for End-to-End Deep Visual Control for Mastering Needle-Picking Skills With World Models and Behavior Cloning

Figure 2 for End-to-End Deep Visual Control for Mastering Needle-Picking Skills With World Models and Behavior Cloning

Figure 3 for End-to-End Deep Visual Control for Mastering Needle-Picking Skills With World Models and Behavior Cloning

Figure 4 for End-to-End Deep Visual Control for Mastering Needle-Picking Skills With World Models and Behavior Cloning

Needle picking is a challenging surgical task in robot-assisted surgery due to the characteristics of small slender shapes of needles, needles' variations in shapes and sizes, and demands for millimeter-level control. Prior works, heavily relying on the prior of needles (e.g., geometric models), are hard to scale to unseen needles' variations. In addition, visual tracking errors can not be minimized online using their approaches. In this paper, we propose an end-to-end deep visual learning framework for needle-picking tasks where both visual and control components can be learned jointly online. Our proposed framework integrates a state-of-the-art reinforcement learning framework, Dreamer, with behavior cloning (BC). Besides, two novel techniques, i.e., Virtual Clutch and Dynamic Spotlight Adaptation (DSA), are introduced to our end-to-end visual controller for needle-picking tasks. We conducted extensive experiments in simulation to evaluate the performance, robustness, variation adaptation, and effectiveness of individual components of our method. Our approach, trained by 8k demonstration timesteps and 140k online policy timesteps, can achieve a remarkable success rate of 80%, a new state-of-the-art with end-to-end vision-based surgical robot learning for delicate operations tasks. Furthermore, our method effectively demonstrated its superiority in generalization to unseen dynamic scenarios with needle variations and image disturbance, highlighting its robustness and versatility. Codes and videos are available at https://sites.google.com/view/dreamerbc.

* First manuscript submitted to IROS 2023 on March 1, 2023

Via

Access Paper or Ask Questions

Two-Stage Grasping: A New Bin Picking Framework for Small Objects

Mar 07, 2023

Hanwen Cao, Jianshu Zhou, Junda Huang, Yichuan Li, Ng Cheng Meng, Rui Cao, Qi Dou, Yunhui Liu

Figure 1 for Two-Stage Grasping: A New Bin Picking Framework for Small Objects

Figure 2 for Two-Stage Grasping: A New Bin Picking Framework for Small Objects

Figure 3 for Two-Stage Grasping: A New Bin Picking Framework for Small Objects

Figure 4 for Two-Stage Grasping: A New Bin Picking Framework for Small Objects

This paper proposes a novel bin picking framework, two-stage grasping, aiming at precise grasping of cluttered small objects. Object density estimation and rough grasping are conducted in the first stage. Fine segmentation, detection, grasping, and pushing are performed in the second stage. A small object bin picking system has been realized to exhibit the concept of two-stage grasping. Experiments have shown the effectiveness of the proposed framework. Unlike traditional bin picking methods focusing on vision-based grasping planning using classic frameworks, the challenges of picking cluttered small objects can be solved by the proposed new framework with simple vision detection and planning.

* ICRA 2023

Via

Access Paper or Ask Questions

Open-source High-precision Autonomous Suturing Framework With Visual Guidance

Oct 04, 2022

Hongbin Lin, Bin Li, Yunhui Liu, Kwok Wai Samuel Au

Figure 1 for Open-source High-precision Autonomous Suturing Framework With Visual Guidance

Autonomous surgery has attracted increasing attention for revolutionizing robotic patient care, yet remains a distant and challenging goal. In this paper, we propose an image-based framework for high-precision autonomous suturing operation. We first build an algebraic geometric algorithm to achieve accurate needle pose estimation, then design the corresponding keypoint-based calibration network for joint-offset compensation, and further plan and control suture trajectory. Our solution ranked first among all competitors in the AccelNet Surgical Robotics Challenge. The source code is opened here to accelerate future autonomous surgery research.

Via

Access Paper or Ask Questions

GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Sep 12, 2022

Gilbert Feng, Hongbo Zhang, Zhongyu Li, Xue Bin Peng, Bhuvan Basireddy, Linzhu Yue, Zhitao Song, Lizhi Yang, Yunhui Liu, Koushil Sreenath, Sergey Levine

Figure 1 for GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Figure 2 for GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Figure 3 for GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Figure 4 for GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Recent years have seen a surge in commercially-available and affordable quadrupedal robots, with many of these platforms being actively used in research and industry. As the availability of legged robots grows, so does the need for controllers that enable these robots to perform useful skills. However, most learning-based frameworks for controller development focus on training robot-specific controllers, a process that needs to be repeated for every new robot. In this work, we introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots. Our framework synthesizes general-purpose locomotion controllers that can be deployed on a large variety of quadrupedal robots with similar morphologies. We present a simple but effective morphology randomization method that procedurally generates a diverse set of simulated robots for training. We show that by training a controller on this large set of simulated robots, our models acquire more general control strategies that can be directly transferred to novel simulated and real-world robots with diverse morphologies, which were not observed during training.

* First two authors contributed equally

Via

Access Paper or Ask Questions

AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Aug 03, 2022

Ziyi Wang, Bo Lu, Yonghao Long, Fangxun Zhong, Tak-Hong Cheung, Qi Dou, Yunhui Liu

Figure 1 for AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Figure 2 for AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Figure 3 for AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Figure 4 for AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy

Computer-assisted minimally invasive surgery has great potential in benefiting modern operating theatres. The video data streamed from the endoscope provides rich information to support context-awareness for next-generation intelligent surgical systems. To achieve accurate perception and automatic manipulation during the procedure, learning based technique is a promising way, which enables advanced image analysis and scene understanding in recent years. However, learning such models highly relies on large-scale, high-quality, and multi-task labelled data. This is currently a bottleneck for the topic, as available public dataset is still extremely limited in the field of CAI. In this paper, we present and release the first integrated dataset (named AutoLaparo) with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery. Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures. Specifically, three different yet highly correlated tasks are formulated in the dataset, including surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. In addition, we provide experimental results with state-of-the-art models as reference benchmarks for further model developments and evaluations on this dataset. The dataset is available at https://autolaparo.github.io.

* Accepted at MICCAI 2022

Via

Access Paper or Ask Questions