Fang Zhao

OccupancyDETR: Making Semantic Scene Completion as Straightforward as Object Detection

Sep 22, 2023
Yupeng Jia, Jie He, Runze Chen, Fang Zhao, Haiyong Luo

Visual-based 3D semantic occupancy perception (also known as 3D semantic scene completion) is a new perception paradigm for robotic applications like autonomous driving. Compared with Bird's Eye View (BEV) perception, it extends the vertical dimension, significantly enhancing the ability of robots to understand their surroundings. For this very reason, however, the computational demand of current 3D semantic occupancy perception methods generally surpasses that of BEV perception methods and 2D perception methods. We propose a novel 3D semantic occupancy perception method, OccupancyDETR, which consists of a DETR-like object detection module and a 3D occupancy decoder module. The integration of object detection simplifies our method structurally: instead of predicting the semantics of each voxel, it identifies objects in the scene and their respective 3D occupancy grids. This speeds up our method, reduces the required resources, and leverages object detection algorithms, giving our approach notable performance on small objects. We demonstrate the effectiveness of our proposed method on the SemanticKITTI dataset, showcasing an mIoU of 23 and a processing speed of 6 frames per second, thereby presenting a promising solution for real-time 3D semantic scene completion.
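
The mIoU figure quoted in the abstract is a per-class intersection-over-union averaged across classes. The sketch below shows that calculation on flat lists of integer voxel labels. The function name `mean_iou` is hypothetical, and this is not the SemanticKITTI evaluation script, which also handles ignored and unknown voxels that are skipped here.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Voxels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # classes absent from both are skipped (assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.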

The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation

Jul 27, 2023
Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu

Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, suffer inevitably from real-world corruptions and perturbations and struggle to provide reliable depth predictions in such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge, an academic competition designed to facilitate and advance robust OoD depth estimation. This challenge was developed based on the newly established KITTI-C and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis on robust self-supervised and robust fully-supervised depth estimation, respectively. Out of more than two hundred participants, nine unique and top-performing solutions have emerged, with novel designs spanning the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses, along with insightful observations, are provided to better understand the rationale behind each design. We hope this challenge can lay a solid foundation for future research on robust and reliable depth estimation and beyond. The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.

* Technical Report; 65 pages, 34 figures, 24 tables; Code at https://github.com/ldkong1205/RoboDepth

Learning Anchor Transformations for 3D Garment Animation

Apr 03, 2023
Fang Zhao, Zekun Li, Shaoli Huang, Junwu Weng, Tianfei Zhou, Guo-Sen Xie, Jue Wang, Ying Shan

This paper proposes an anchor-based deformation model, namely AnchorDEF, to predict 3D garment animation from a body motion sequence. It deforms a garment mesh template by a mixture of rigid transformations with extra nonlinear displacements. A set of anchors around the mesh surface is introduced to guide the learning of rigid transformation matrices. Once the anchor transformations are found, per-vertex nonlinear displacements of the garment template can be regressed in a canonical space, which reduces the complexity of deformation space learning. By explicitly constraining the transformed anchors to satisfy the consistencies of position, normal and direction, the physical meaning of learned anchor transformations in space is guaranteed for better generalization. Furthermore, an adaptive anchor updating scheme is proposed to optimize the anchor positions by being aware of local mesh topology for learning representative anchor transformations. Qualitative and quantitative experiments on different types of garments demonstrate that AnchorDEF achieves state-of-the-art performance on 3D garment deformation prediction in motion, especially for loose-fitting garments.

* Accepted to CVPR 2023. Project page: https://semanticdh.github.io/AnchorDEF
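
The "mixture of rigid transformations with extra nonlinear displacements" in the abstract can be illustrated with a small sketch. It assumes the displacement is added to the template vertex in canonical space and the result is then blended across K anchor transforms, each given as a 3x3 rotation and a translation. The function name `deform_vertex`, the argument layout, and the exact order of operations are assumptions for illustration, not the published AnchorDEF formulation.

```python
def deform_vertex(v, d, weights, rotations, translations):
    """Blend K rigid transforms (R_k, t_k) of the displaced template vertex v + d."""
    # Nonlinear displacement applied in canonical space (assumption).
    p = [v[i] + d[i] for i in range(3)]
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        for i in range(3):
            rotated = sum(R[i][j] * p[j] for j in range(3))
            out[i] += w * (rotated + t[i])  # weighted sum of transformed points
    return out
```

With two identity rotations, translations `[1, 0, 0]` and `[0, 1, 0]`, and equal weights of 0.5, the vertex is shifted by the average translation.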

HMC: Hierarchical Mesh Coarsening for Skeleton-free Motion Retargeting

Mar 20, 2023
Haoyu Wang, Shaoli Huang, Fang Zhao, Chun Yuan, Ying Shan

We present a simple yet effective method for skeleton-free motion retargeting. Previous methods transfer motion between high-resolution meshes, failing to preserve the inherent local-part motions in the mesh. Addressing this issue, our proposed method learns the correspondence in a coarse-to-fine fashion by integrating the retargeting process with a mesh-coarsening pipeline. First, we propose a mesh-coarsening module that coarsens the mesh representations for better motion transfer. This module improves the ability to handle small-part motion and preserves the local motion interdependence between neighboring mesh vertices. Furthermore, we leverage a hierarchical refinement procedure to complement missing mesh details by gradually improving the low-resolution mesh output with a higher-resolution one. We evaluate our method on several well-known 3D character datasets, and it yields an average improvement of 25% on point-wise mesh Euclidean distance (PMD) against the state-of-the-art method. Moreover, our qualitative results show that our method is significantly helpful in preserving the moving consistency of different body parts on the target character due to disentangling body-part structures and mesh details in a hierarchical way.
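
The PMD metric named in the abstract compares corresponding vertices of a predicted and a reference mesh. The sketch below assumes PMD is the mean Euclidean distance over vertex pairs. The paper's exact normalization (for example, squared distances or per-frame averaging) may differ, and the function name is hypothetical.

```python
import math


def pointwise_mesh_distance(pred_vertices, gt_vertices):
    """Mean Euclidean distance between corresponding vertices of two meshes."""
    if len(pred_vertices) != len(gt_vertices) or not pred_vertices:
        raise ValueError("meshes must be non-empty with matching vertex counts")
    total = sum(math.dist(p, g) for p, g in zip(pred_vertices, gt_vertices))
    return total / len(pred_vertices)
```

Because the comparison is point-wise, both meshes must share the same vertex ordering.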

Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry

Mar 15, 2023
Jiaxu Zhang, Junwu Weng, Di Kang, Fang Zhao, Shaoli Huang, Xuefei Zhe, Linchao Bao, Ying Shan, Jue Wang, Zhigang Tu

Good motion retargeting cannot be achieved without reasonable consideration of source-target differences at both the skeleton and shape geometry levels. In this work, we propose a novel Residual RETargeting network (R2ET) structure, which relies on two neural modification modules, to adjust the source motions to fit the target skeletons and shapes progressively. In particular, a skeleton-aware module is introduced to preserve the source motion semantics. A shape-aware module is designed to perceive the geometries of target characters to reduce interpenetration and contact-missing. Driven by our explored distance-based losses that explicitly model the motion semantics and geometry, these two modules can learn residual motion modifications on the source motion to generate plausible retargeted motion in a single inference without post-processing. To balance these two modifications, we further present a balancing gate to conduct linear interpolation between them. Extensive experiments on the public dataset Mixamo demonstrate that our R2ET achieves state-of-the-art performance and provides a good balance between preserving motion semantics and attenuating interpenetration and contact-missing. Code is available at https://github.com/Kebii/R2ET.

* CVPR 2023
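
The balancing gate described in the abstract interpolates linearly between the two residual modifications. A minimal sketch of that idea follows, assuming a scalar gate in [0, 1] and motions stored as flat lists of numbers. The names `gated_residual`, `delta_skeleton`, and `delta_shape`, and the way the blended residual is added to the source motion, are illustrative assumptions rather than the R2ET code.

```python
def gated_residual(source, delta_skeleton, delta_shape, gate):
    """Add a gate-weighted blend of two residuals to the source motion."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    # gate = 0 keeps only the skeleton-aware residual,
    # gate = 1 keeps only the shape-aware residual.
    return [
        s + (1.0 - gate) * a + gate * b
        for s, a, b in zip(source, delta_skeleton, delta_shape)
    ]
```

In the paper the gate is presented as part of the network, so a learned, possibly per-joint value would replace the fixed scalar used here.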

Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation

Mar 22, 2022
Tianfei Zhou, Meijie Zhang, Fang Zhao, Jianwu Li

Learning semantic segmentation from weakly-labeled (e.g., image tags only) data is challenging since it is hard to infer dense object regions from sparse semantic tags. Despite being broadly studied, most current efforts directly learn from limited semantic annotations carried by individual images or image pairs, and struggle to obtain integral localization maps. Our work alleviates this from a novel perspective, by exploring rich semantic contexts synergistically among abundant weakly-labeled training data for network learning and inference. In particular, we propose regional semantic contrast and aggregation (RCA). RCA is equipped with a regional memory bank to store massive, diverse object patterns appearing in training data, which acts as strong support for exploration of dataset-level semantic structure. Particularly, we propose i) semantic contrast to drive network learning by contrasting massive categorical object regions, leading to a more holistic object pattern understanding, and ii) semantic aggregation to gather diverse relational contexts in the memory to enrich semantic representations. In this manner, RCA earns a strong capability of fine-grained semantic understanding, and eventually establishes new state-of-the-art results on two popular benchmarks, i.e., PASCAL VOC 2012 and COCO 2014.

* Accepted to CVPR 2022. Code: https://github.com/maeve07/RCA.git

Fine-Grained Trajectory-based Travel Time Estimation for Multi-city Scenarios Based on Deep Meta-Learning

Jan 20, 2022
Chenxing Wang, Fang Zhao, Haichao Zhang, Haiyong Luo, Yanjun Qin, Yuchen Fang

Travel Time Estimation (TTE) is indispensable in intelligent transportation systems (ITS). It is important to achieve fine-grained Trajectory-based Travel Time Estimation (TTTE) for multi-city scenarios, namely to accurately estimate the travel time of a given trajectory across multiple cities. However, this faces great challenges due to complex factors including dynamic temporal dependencies and fine-grained spatial dependencies. To tackle these challenges, we propose a meta-learning-based framework, MetaTTE, to continuously provide accurate travel time estimation over time by leveraging a well-designed deep neural network model called DED, which consists of a Data preprocessing module and an Encoder-Decoder network module. By introducing meta-learning techniques, the generalization ability of MetaTTE is enhanced using a small number of examples, which opens up new opportunities to achieve consistent performance on TTTE when traffic conditions and road networks change over time in the future. The DED model adopts an encoder-decoder network to capture fine-grained spatial and temporal representations. Extensive experiments on two real-world datasets confirm that MetaTTE outperforms six state-of-the-art baselines, improving accuracy over the best baseline by 29.35% and 25.93% on the Chengdu and Porto datasets, respectively.

STformer: A Noise-Aware Efficient Spatio-Temporal Transformer Architecture for Traffic Forecasting

Dec 06, 2021
Yanjun Qin, Yuchen Fang, Haiyong Luo, Liang Zeng, Fang Zhao, Chenxing Wang

Traffic forecasting plays an indispensable role in the intelligent transportation system, which makes daily travel more convenient and safer. However, the dynamic evolution of spatio-temporal correlations makes accurate traffic forecasting very difficult. Existing work mainly employs graph neural networks (GNNs) and deep time series models (e.g., recurrent neural networks) to capture complex spatio-temporal patterns in the dynamic traffic system. For the spatial patterns, it is difficult for GNNs to extract global spatial information, i.e., information from remote sensors in road networks. Although self-attention can be used to extract global spatial information, as in previous work, it comes with huge resource consumption. For the temporal patterns, traffic data have not only easy-to-recognize daily and weekly trends but also difficult-to-recognize short-term noise caused by incidents (e.g., car accidents and thunderstorms). Prior traffic models struggle to distinguish intricate temporal patterns in time series and thus to capture accurate temporal dependence. To address the above issues, we propose a novel noise-aware efficient spatio-temporal Transformer architecture for accurate traffic forecasting, named STformer. STformer consists of two components: the noise-aware temporal self-attention (NATSA) and the graph-based sparse spatial self-attention (GBS3A). NATSA separates the high-frequency component and the low-frequency component of the time series to remove noise and capture stable temporal dependence using a learnable filter and temporal self-attention, respectively. GBS3A replaces the full query in vanilla self-attention with a graph-based sparse query to decrease time and memory usage. Experiments on four real-world traffic datasets show that STformer outperforms state-of-the-art baselines with lower computational cost.
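
To make the high-frequency and low-frequency split concrete, the sketch below separates a series into a smooth trend and a residual. It swaps in a fixed centered moving average for the learnable filter described in the abstract, so it illustrates the decomposition only and is not the NATSA module. The function name and `window` parameter are hypothetical.

```python
def split_frequencies(series, window=3):
    """Return (low, high): a centered moving average and the residual."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    low = []
    for i in range(len(series)):
        # The window is truncated at the series boundaries.
        start, stop = max(0, i - half), min(len(series), i + half + 1)
        low.append(sum(series[start:stop]) / (stop - start))
    # Whatever the smoother removes is treated as the high-frequency part.
    high = [x - l for x, l in zip(series, low)]
    return low, high
```

A single spike such as the `10` in `[1, 1, 10, 1, 1]` ends up mostly in the high-frequency residual, while the two components always sum back to the input.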

CDGNet: A Cross-Time Dynamic Graph-based Deep Learning Model for Traffic Forecasting

Dec 06, 2021
Yuchen Fang, Yanjun Qin, Haiyong Luo, Fang Zhao, Liang Zeng, Bo Hui, Chenxing Wang

Traffic forecasting is important in intelligent transportation systems of webs and beneficial to traffic safety, yet it is very challenging because of the complex and dynamic spatio-temporal dependencies in real-world traffic systems. Prior methods use a pre-defined or learnable static graph to extract spatial correlations. However, static graph-based methods fail to capture the evolution of the traffic network. Researchers subsequently generated a dynamic graph for each time slice to reflect the changes in spatial correlations, but they follow the paradigm of independently modeling spatio-temporal dependencies, ignoring the cross-time spatial influence. In this paper, we propose a novel cross-time dynamic graph-based deep learning model, named CDGNet, for traffic forecasting. The model is able to effectively capture the cross-time spatial dependence between each time slice and its historical time slices by utilizing the cross-time dynamic graph. Meanwhile, we design a gating mechanism to sparsify the cross-time dynamic graph, which conforms to the sparse spatial correlations in the real world. In addition, we propose a novel encoder-decoder architecture to incorporate the cross-time dynamic graph-based GCN for multi-step traffic forecasting. Experimental results on three real-world public traffic datasets demonstrate that CDGNet outperforms the state-of-the-art baselines. We additionally provide a qualitative study to analyze the effectiveness of our architecture.

* 10 pages
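
The effect of sparsifying a dense graph can be shown with a very small sketch. It uses a hard threshold on edge weights as a stand-in for the gating mechanism in the abstract, whose actual form is not given on this page. The function name `sparsify_graph` and the `threshold` parameter are hypothetical.

```python
def sparsify_graph(adjacency, threshold):
    """Zero out edges whose weight falls below the threshold."""
    return [
        [w if w >= threshold else 0.0 for w in row]
        for row in adjacency
    ]
```

With a threshold of 0.1, a weak edge of weight 0.05 is dropped while the stronger edges are left unchanged.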

DMGCRN: Dynamic Multi-Graph Convolution Recurrent Network for Traffic Forecasting

Dec 04, 2021
Yanjun Qin, Yuchen Fang, Haiyong Luo, Fang Zhao, Chenxing Wang

Traffic forecasting is a problem of intelligent transportation systems (ITS) and is crucial for individuals and public agencies. Therefore, researchers pay great attention to handling the complex spatio-temporal dependencies of traffic systems for accurate forecasting. However, there are two challenges: 1) Most traffic forecasting studies mainly focus on modeling correlations of neighboring sensors and ignore correlations of remote sensors, e.g., business districts with similar spatio-temporal patterns; 2) Prior methods which use a static adjacency matrix in graph convolutional networks (GCNs) are not enough to reflect the dynamic spatial dependence in traffic systems. Moreover, fine-grained methods which use self-attention to model dynamic correlations of all sensors ignore hierarchical information in road networks and have quadratic computational complexity. In this paper, we propose a novel dynamic multi-graph convolution recurrent network (DMGCRN) to tackle the above issues, which can model the spatial correlations of distance, the spatial correlations of structure, and the temporal correlations simultaneously. We not only use the distance-based graph to capture spatial information from nodes that are close in distance but also construct a novel latent graph which encodes the structure correlations among roads to capture spatial information from nodes that are similar in structure. Furthermore, we divide the neighbors of each sensor into coarse-grained regions, and dynamically assign different weights to each region at different times. Meanwhile, we integrate the dynamic multi-graph convolution network into the gated recurrent unit (GRU) to capture temporal dependence. Extensive experiments on three real-world traffic datasets demonstrate that our proposed algorithm outperforms state-of-the-art baselines.

* 10 pages
