Models, code, and papers for "Jiacheng Li":

Connection Sensitive Attention U-NET for Accurate Retinal Vessel Segmentation

Mar 13, 2019
Ruirui Li, Mingming Li, Jiacheng Li

We develop a connection sensitive attention U-Net (CSAU) for accurate retinal vessel segmentation. This method improves the recent attention U-Net for semantic segmentation with four key improvements: (1) a connection sensitive loss that models structural properties to improve the accuracy of pixel-wise segmentation; (2) an attention gate with a novel neural network structure and a concatenating DOWN-Link to learn better attention weights on fine vessels; (3) integration of the connection sensitive loss and the attention gate to further improve accuracy on detailed vessels by additionally concatenating attention weights to features before output; (4) connection sensitive accuracy metrics that reflect segmentation performance on boundaries and thin vessels. Our method can effectively improve on state-of-the-art vessel segmentation methods, which struggle in the presence of abnormalities, bifurcations, and microvasculature. The connection sensitive loss integrates tightly with the proposed attention U-Net to accurately (i) segment retinal vessels and (ii) preserve the connectivity of thin vessels by modeling their structural properties. Our method achieves the leading position on the DRIVE, STARE, and HRF datasets among state-of-the-art methods.


Informative Visual Storytelling with Cross-modal Rules

Aug 05, 2019
Jiacheng Li, Haizhou Shi, Siliang Tang, Fei Wu, Yueting Zhuang

Existing methods in the visual storytelling field often generate generic descriptions, leaving much of the meaningful content in the images unnoticed. This failure to generate informative stories can be attributed to the model's inability to capture enough meaningful concepts. The categories of these concepts include entities, attributes, actions, and events, which are in some cases crucial to grounded storytelling. To solve this problem, we propose a method that mines cross-modal rules to help the model infer these informative concepts given certain visual input. We first build multimodal transactions by concatenating the CNN activations and the word indices. Then we use an association rule mining algorithm to mine the cross-modal rules, which are used for concept inference. With the help of the cross-modal rules, the generated stories are more grounded and informative. Besides, our proposed method has the advantages of interpretability, expandability, and transferability, indicating potential for wider application. Finally, we leverage these concepts in our encoder-decoder framework with the attention mechanism. We conduct several experiments on the VIsual StoryTelling (VIST) dataset, the results of which demonstrate the effectiveness of our approach in terms of both automatic metrics and human evaluation. Additional experiments show that our mined cross-modal rules, used as additional knowledge, help the model perform better when trained on a small dataset.
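The rule-mining step described in this abstract can be illustrated with a minimal sketch. It assumes each transaction is a pair of item sets (discretized visual items and caption words) and mines only single-antecedent rules of the form visual item -> word using support and confidence thresholds. The item names (`v12`, `v40`, ...), the thresholds, and the function name `mine_cross_modal_rules` are hypothetical, and the paper's actual mining algorithm is not specified here.

```python
from collections import Counter
from itertools import product


def mine_cross_modal_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine single-antecedent rules (visual_item -> word).

    transactions: list of (visual_items, words) pairs, each an iterable of hashable items.
    Returns a list of (visual_item, word, support, confidence) tuples.
    """
    n = len(transactions)
    visual_counts = Counter()
    pair_counts = Counter()
    for visual_items, words in transactions:
        visual_set, word_set = set(visual_items), set(words)
        visual_counts.update(visual_set)
        # Count every co-occurrence of a visual item with a word in this transaction.
        pair_counts.update(product(visual_set, word_set))

    rules = []
    for (v, w), count in pair_counts.items():
        support = count / n                    # fraction of transactions containing both
        confidence = count / visual_counts[v]  # estimate of P(word | visual item)
        if support >= min_support and confidence >= min_confidence:
            rules.append((v, w, support, confidence))
    # Highest confidence first, then highest support, then by name for determinism.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Toy transactions: hypothetical activation ids paired with caption words.
transactions = [
    ({"v12", "v40"}, {"beach", "sand"}),
    ({"v12"}, {"beach"}),
    ({"v40", "v7"}, {"sand", "dog"}),
    ({"v12", "v7"}, {"beach", "dog"}),
]
rules = mine_cross_modal_rules(transactions, min_support=0.5, min_confidence=0.9)
for v, w, s, c in rules:
    print(f"{v} -> {w}  support={s:.2f} confidence={c:.2f}")
```

At inference time, a rule whose antecedent fires on an image would suggest its word as a candidate concept; how those concepts enter the decoder is outside this sketch.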

* 9 pages, to appear in ACM Multimedia 2019 

Mechatronic Design of a Dribbling System for RoboCup Small Size Robot

May 24, 2019
Zheyuan Huang, Yunkai Wang, Lingyun Chen, Jiacheng Li, Zexi Chen, Rong Xiong

RoboCup SSL is an excellent platform for researching artificial intelligence and robotics. The dribbling system is an essential component, central to advanced soccer skills such as trapping and dribbling. In this paper, we design a new dribbling system for SSL robots, covering both the mechatronic design and the control algorithms. For the mechatronic design, we analyse and present the 3-touch-point model with a simulation in ADAMS. In the motor controller algorithm, we use reinforcement learning to control the torque output. Finally, we verify the results on the robot.

* RCAR 2019. arXiv admin note: substantial text overlap with arXiv:1905.09157 

DARTS+: Improved Differentiable Architecture Search with Early Stopping

Sep 13, 2019
Hanwen Liang, Shifeng Zhang, Jiacheng Sun, Xingqiu He, Weiran Huang, Kechen Zhuang, Zhenguo Li

Recently, there has been growing interest in automating the process of neural architecture design, and the Differentiable Architecture Search (DARTS) method makes the process feasible within a few GPU days. In particular, a hyper-network called the one-shot model is introduced, over which the architecture can be searched continuously with gradient descent. However, the performance of DARTS is often observed to collapse when the number of search epochs becomes large. Meanwhile, many "skip-connects" are found in the selected architectures. In this paper, we claim that the cause of the collapse is the cooperation and competition in the bi-level optimization in DARTS, where the architecture parameters and model weights are updated alternately. Therefore, we propose a simple and effective algorithm, named "DARTS+", to avoid the collapse and improve the original DARTS by "early stopping" the search procedure when a certain criterion is met. We demonstrate that the proposed early stopping criterion is effective in avoiding the collapse issue. We also conduct experiments on benchmark datasets and show the effectiveness of our DARTS+ algorithm, which achieves 2.32% test error on CIFAR10, 14.87% on CIFAR100, and 23.7% on ImageNet. We further remark that the idea of "early stopping" is implicitly included in some existing DARTS variants, which manually set a small number of search epochs, while we give an explicit criterion for "early stopping".


Towards Making the Most of BERT in Neural Machine Translation

Aug 30, 2019
Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Yong Yu, Weinan Zhang, Lei Li

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (Cnmt) that is the key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed Cnmt consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previous pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show that Cnmt gains up to 3 BLEU on the WMT14 English-German language pair, which even surpasses the previous state-of-the-art pre-training aided NMT by 1.4 BLEU. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU.


ZJUNlict Extended Team Description Paper for RoboCup 2019

May 22, 2019
Zheyuan Huang, Lingyun Chen, Jiacheng Li, Yunkai Wang, Zexi Chen, Licheng Wen, Jianyang Gu, Peng Hu, Rong Xiong

Team ZJUNlict won the championship in the Small Size League of RoboCup 2018, and this paper describes in detail the work and effort the team has contributed. There are three main optimizations for the mechanical part, which accounted for most of our incredible goals: "Touching Point Optimization", "Damping System Optimization", and "Dribbler Optimization". For the electrical part, we implemented "Direct Torque Control" and an "Efficient Radio Communication Protocol", which are credited with stabilizing the dribbler and providing more secure communication between the robots and the computer. Our software group contributed as much as our hardware group, with "Vision Lost Compensation" to predict movement with a Kalman filter, and an "Interception Prediction Algorithm" to achieve some skills and improve our ball possession rate.
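To make the idea of predicting movement with a Kalman filter while vision is lost more concrete, here is a minimal sketch of a one-dimensional constant-velocity Kalman filter. The state model, the noise values, and the class name `ConstantVelocityKalman1D` are assumptions chosen for illustration; the abstract does not describe the team's actual filter.

```python
class ConstantVelocityKalman1D:
    """Tracks position and velocity along one axis from position measurements."""

    def __init__(self, process_var=1e-4, meas_var=1e-2):
        self.p, self.v = 0.0, 0.0                 # state: position, velocity
        self.P = [[100.0, 0.0], [0.0, 100.0]]     # large initial uncertainty (assumed)
        self.q, self.r = process_var, meas_var    # assumed noise variances

    def predict(self, dt):
        """Propagate the state by dt with F = [[1, dt], [0, 1]] and Q = q * I."""
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.p

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain
        y = z - self.p                 # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


# Object moving at 2 units per frame; vision is available for frames 0..20.
kf = ConstantVelocityKalman1D()
kf.update(0.0)
for t in range(1, 21):
    kf.predict(1.0)
    kf.update(2.0 * t)

# Vision lost for three frames: keep predicting without measurement updates.
lost_frames = [kf.predict(1.0) for _ in range(3)]
print(lost_frames)
```

During the lost frames only `predict` is called, so the estimate coasts along the last estimated velocity while its covariance grows.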

* ZJUNlict Extended Team Description Paper for RoboCup 2019 Small Size League 

Phrase Grounding by Soft-Label Chain Conditional Random Field

Sep 01, 2019
Jiacheng Liu, Julia Hockenmaier

The phrase grounding task aims to ground each entity mention in a given caption of an image to a corresponding region in that image. Although there are clear dependencies between how different mentions of the same caption should be grounded, previous structured prediction methods that aim to capture such dependencies need to resort to approximate inference or non-differentiable losses. In this paper, we formulate phrase grounding as a sequence labeling task where we treat candidate regions as potential labels, and use neural chain Conditional Random Fields (CRFs) to model dependencies among regions for adjacent mentions. In contrast to standard sequence labeling tasks, the phrase grounding task is defined such that there may be multiple correct candidate regions. To address this multiplicity of gold labels, we define so-called Soft-Label Chain CRFs, and present an algorithm that enables convenient end-to-end training. Our method establishes a new state-of-the-art on phrase grounding on the Flickr30k Entities dataset. Analysis shows that our model benefits both from the entity dependencies captured by the CRF and from the soft-label training regime. Our code is available at github.com/liujch1998/SoftLabelCCRF

* 11 pages, 5 figures, accepted by EMNLP-IJCNLP 2019 

Estimating Risk Levels of Driving Scenarios through Analysis of Driving Styles for Autonomous Vehicles

Apr 23, 2019
Songlin Xu, Jiacheng Zhu

To operate safely on the road, autonomous vehicles need not only to identify the objects in front of them, but also to estimate the risk level of those objects automatically. Different objects clearly pose different levels of danger to autonomous vehicles, so an evaluation system is needed to automatically determine the danger level of an object for the vehicle. Such a system would be too subjective and incomplete if it were defined entirely by humans. We therefore propose a framework based on a nonparametric Bayesian learning method, a sticky hierarchical Dirichlet process hidden Markov model (sticky HDP-HMM), to discover the relationship between driving scenarios and driving styles. We use the analysis of driving styles of autonomous vehicles to reflect the risk levels that driving scenarios pose to the vehicles. In this framework, we first use the sticky HDP-HMM to extract driving styles from the dataset and obtain different clusters; an evaluation system is then proposed to evaluate and rank the urgency levels of the clusters. Finally, we map the driving scenarios to the ranking results and thus obtain clusters of driving scenarios at different risk levels. More importantly, we find the relationship between driving scenarios and driving styles. The experiment shows that our framework can cluster and rank driving styles of different urgency levels and find the relationship between driving scenarios and driving styles, and that the conclusions fit people's common sense when driving. Furthermore, this framework can be used by autonomous vehicles to estimate the risk levels of driving scenarios and help them make precise and safe decisions.


Spherical Latent Spaces for Stable Variational Autoencoders

Oct 12, 2018
Jiacheng Xu, Greg Durrett

A hallmark of variational autoencoders (VAEs) for text processing is their combination of powerful encoder-decoder models, such as LSTMs, with simple latent distributions, typically multivariate Gaussians. These models pose a difficult optimization problem: there is an especially bad local optimum where the variational posterior always equals the prior and the model does not use the latent variable at all, a kind of "collapse" which is encouraged by the KL divergence term of the objective. In this work, we experiment with another choice of latent distribution, namely the von Mises-Fisher (vMF) distribution, which places mass on the surface of the unit hypersphere. With this choice of prior and posterior, the KL divergence term now only depends on the variance of the vMF distribution, giving us the ability to treat it as a fixed hyperparameter. We show that doing so not only averts the KL collapse, but consistently gives better likelihoods than Gaussians across a range of modeling conditions, including recurrent language modeling and bag-of-words document modeling. An analysis of the properties of our vMF representations shows that they learn richer and more nuanced structures in their latent representations than their Gaussian counterparts.
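The claim that the KL term depends only on the concentration of the vMF distribution can be checked numerically. The sketch below assumes a uniform prior on the unit hypersphere (an assumption, since the abstract does not name the prior) and uses the standard vMF density with concentration `kappa`; the function names and the power-series evaluation of the Bessel function are illustrative choices, not the authors' implementation. The mean direction never enters the computation, which is why `kappa` can be treated as a fixed hyperparameter.

```python
import math


def log_bessel_i(order, x, terms=200):
    """log I_order(x) via the series sum_k (x/2)^(2k+order) / (k! * Gamma(k+order+1)).

    Adequate for moderate x (roughly x < 100 with 200 terms); requires x > 0.
    """
    log_half_x = math.log(x / 2.0)
    log_terms = [
        (2 * k + order) * log_half_x - math.lgamma(k + 1) - math.lgamma(k + order + 1)
        for k in range(terms)
    ]
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def kl_vmf_uniform(kappa, dim):
    """KL( vMF(mu, kappa) || Uniform(S^(dim-1)) ) for dim >= 2; mu does not appear."""
    v = dim / 2.0
    log_i_lo = log_bessel_i(v - 1.0, kappa)
    log_i_hi = log_bessel_i(v, kappa)
    mean_resultant = math.exp(log_i_hi - log_i_lo)  # E[mu^T x] under the vMF
    # Log normalizer of the vMF density.
    log_norm_vmf = (v - 1.0) * math.log(kappa) - v * math.log(2 * math.pi) - log_i_lo
    # Log surface area of the unit sphere S^(dim-1).
    log_area = math.log(2.0) + v * math.log(math.pi) - math.lgamma(v)
    return kappa * mean_resultant + log_norm_vmf + log_area


def kl_vmf_uniform_3d(kappa):
    """Closed form for dim = 3, used here only as a cross-check."""
    return math.log(kappa / math.sinh(kappa)) + kappa / math.tanh(kappa) - 1.0


for kappa in (1.0, 5.0, 20.0):
    print(kappa, kl_vmf_uniform(kappa, 3), kl_vmf_uniform_3d(kappa))
```

Because the value is a constant once `kappa` and the dimension are fixed, the KL term contributes no gradient that could pull the posterior onto the prior.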

* To appear in EMNLP 2018; 11 pages; Code release: https://github.com/jiacheng-xu/vmf_vae_nlp 

Bottom-up Object Detection by Grouping Extreme and Center Points

Feb 03, 2019
Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl

With the advent of deep learning, object detection drifted from a bottom-up to a top-down recognition problem. State-of-the-art algorithms enumerate a near-exhaustive list of object locations and classify each as object or not. In this paper, we show that bottom-up approaches still perform competitively. We detect four extreme points (top-most, left-most, bottom-most, right-most) and one center point of objects using a standard keypoint estimation network. We group the five keypoints into a bounding box if they are geometrically aligned. Object detection is then a purely appearance-based keypoint estimation problem, without region classification or implicit feature learning. The proposed method performs on par with state-of-the-art region-based detection methods, with a bounding box AP of 43.2% on COCO test-dev. In addition, our estimated extreme points directly span a coarse octagonal mask, with a COCO Mask AP of 18.9%, much better than the Mask AP of vanilla bounding boxes. Extreme point guided segmentation further improves this to 34.6% Mask AP.
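The grouping step can be sketched as a brute-force enumeration over candidate extreme points, keeping a combination only when the points are ordered consistently and the geometric center of the implied box has a high center score. The consistency checks, the threshold value, and the `center_score` callback are assumptions made for illustration; the abstract only states that keypoints are grouped when they are geometrically aligned.

```python
from itertools import product


def group_extreme_points(tops, lefts, bottoms, rights, center_score, center_thresh=0.1):
    """Group candidate extreme points into boxes.

    Points are (x, y) tuples in image coordinates (y grows downward).
    center_score(cx, cy) returns the center-heatmap response at that location.
    Returns a list of ((x_min, y_min, x_max, y_max), score).
    """
    boxes = []
    for t, l, b, r in product(tops, lefts, bottoms, rights):
        # The left and right points must lie vertically between top and bottom.
        if not (t[1] <= l[1] <= b[1] and t[1] <= r[1] <= b[1]):
            continue
        # The top and bottom points must lie horizontally between left and right.
        if not (l[0] <= t[0] <= r[0] and l[0] <= b[0] <= r[0]):
            continue
        cx = (l[0] + r[0]) / 2.0
        cy = (t[1] + b[1]) / 2.0
        score = center_score(cx, cy)
        if score >= center_thresh:
            boxes.append(((l[0], t[1], r[0], b[1]), score))
    return boxes


def toy_center_score(cx, cy):
    """Hypothetical center heatmap with a single peak near (5, 5)."""
    return 0.9 if abs(cx - 5) <= 1 and abs(cy - 5) <= 1 else 0.0


boxes = group_extreme_points(
    tops=[(5, 0), (50, 2)],   # the second top point is a distractor
    lefts=[(0, 4)],
    bottoms=[(6, 10)],
    rights=[(10, 5)],
    center_score=toy_center_score,
)
print(boxes)
```

The enumeration is quartic in the number of candidates per heatmap, so in practice it is only feasible when each list holds a small number of top-scoring peaks.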


A Tempt to Unify Heterogeneous Driving Databases using Traffic Primitives

May 13, 2018
Jiacheng Zhu, Wenshuo Wang, Ding Zhao

A multitude of publicly available driving datasets and data platforms have been released for autonomous vehicles (AV). However, the heterogeneity of these databases in size, structure, and driving context makes existing datasets practically ineffective, owing to a lack of uniform frameworks and searchable indexes. To overcome these limitations of existing public datasets, this paper proposes a data unification framework based on traffic primitives with the ability to automatically unify and label heterogeneous traffic data. This is achieved in two steps: 1) carefully arranging raw multidimensional time series driving data into a relational database, and 2) automatically extracting labeled and indexed traffic primitives from the traffic data through a Bayesian nonparametric learning method. Finally, we evaluate the effectiveness of the framework using collected real vehicle data.

* 6 pages, 7 figures, 1 table, ITSC 2018 

Floor-SP: Inverse CAD for Floorplans by Sequential Room-wise Shortest Path

Aug 19, 2019
Jiacheng Chen, Chen Liu, Jiaye Wu, Yasutaka Furukawa

This paper proposes a new approach for automated floorplan reconstruction from RGBD scans, a major milestone in indoor mapping research. The approach, dubbed Floor-SP, formulates a novel optimization problem, where room-wise coordinate descent sequentially solves dynamic programming to optimize the floorplan graph structure. The objective function consists of data terms guided by deep neural networks, consistency terms encouraging adjacent rooms to share corners and walls, and the model complexity term. The approach does not require corner/edge detection with thresholds, unlike most other methods. We have evaluated our system on production-quality RGBD scans of 527 apartments or houses, including many units with non-Manhattan structures. Qualitative and quantitative evaluations demonstrate a significant performance boost over the current state-of-the-art. Please refer to our project website http://jcchen.me/floor-sp/ for code and data.

* 10 pages, 9 figures, accepted to ICCV 2019 

Discourse-Aware Neural Extractive Model for Text Summarization

Oct 30, 2019
Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu

Recently BERT has been adopted in state-of-the-art text summarization models for document encoding. However, such BERT-based extractive models use the sentence as the minimal selection unit, which often results in redundant or uninformative phrases in the generated summaries. As BERT is pre-trained on sentence pairs, not documents, the long-range dependencies between sentences are not well captured. To address these issues, we present a graph-based discourse-aware neural summarization model, DiscoBert. By utilizing discourse segmentation to extract discourse units (instead of sentences) as candidates, DiscoBert provides a fine-grained granularity for extractive selection, which helps reduce redundancy in extracted summaries. Based on this, two discourse graphs are further proposed: (i) RST Graph based on RST discourse trees; and (ii) Coreference Graph based on coreference mentions in the document. DiscoBert first encodes the extracted discourse units with BERT, and then uses a graph convolutional network to capture the long-range dependencies among discourse units through the constructed graphs. Experimental results on two popular summarization datasets demonstrate that DiscoBert outperforms state-of-the-art methods by a significant margin.


Probabilistic Trajectory Prediction for Autonomous Vehicles with Attentive Recurrent Neural Process

Oct 17, 2019
Jiacheng Zhu, Shenghao Qin, Wenshuo Wang, Ding Zhao

Predicting the behavior of surrounding vehicles is critical for autonomous vehicles negotiating multi-vehicle interaction scenarios. Most existing approaches require a tedious training process with large amounts of data and may fail to capture the propagating uncertainty in interaction behaviors. The multi-vehicle behaviors are assumed to be generated from a stochastic process. This paper proposes an attentive recurrent neural process (ARNP) approach to overcome the above limitations, which uses a neural process (NP) to learn a distribution of multi-vehicle interaction behavior. Our proposed model inherits the flexibility of neural networks while maintaining Bayesian probabilistic characteristics. Constructed by incorporating NPs with recurrent neural networks (RNNs), the ARNP model predicts the distribution of a target vehicle trajectory conditioned on the observed long-term sequential data of all surrounding vehicles. This approach is verified by learning and predicting lane-changing trajectories in complex traffic scenarios. Experimental results demonstrate that our proposed method outperforms previous counterparts in terms of accuracy and uncertainty expressiveness. Moreover, the meta-learning nature of NPs enables our proposed ARNP model to capture global information of all observations, thereby being able to adapt to new targets efficiently.

* 7 pages, 5 figures, submitted to ICRA 2020 

A General Framework of Learning Multi-Vehicle Interaction Patterns from Videos

Jul 17, 2019
Chengyuan Zhang, Jiacheng Zhu, Wenshuo Wang, Ding Zhao

Semantic learning and understanding of multi-vehicle interaction patterns in a cluttered driving environment are essential but challenging for autonomous vehicles to make proper decisions. This paper presents a general framework to gain insights into intricate multi-vehicle interaction patterns from bird's-eye view traffic videos. We adopt a Gaussian velocity field to describe the time-varying multi-vehicle interaction behaviors and then use deep autoencoders to learn associated latent representations for each temporal frame. Then, we utilize a hidden semi-Markov model with a hierarchical Dirichlet process as a prior to segment these sequential representations into granular components, also called traffic primitives, corresponding to interaction patterns. Experimental results demonstrate that our proposed framework can extract traffic primitives from videos, thus providing a semantic way to analyze multi-vehicle interaction patterns, even for cluttered driving scenarios that are far messier than human beings can cope with.

* 2019 IEEE Intelligent Transportation Systems Conference (ITSC) 

Neyman-Pearson classification: parametrics and power enhancement

Jun 16, 2018
Xin Tong, Lucy Xia, Jiacheng Wang, Yang Feng

The Neyman-Pearson (NP) paradigm in binary classification seeks classifiers that achieve a minimal type II error while keeping the prioritized type I error under a user-specified level. This paradigm serves naturally in applications such as severe disease diagnosis and spam detection, where people have clear priorities over the two error types. Despite recent advances in NP classification, the NP oracle inequalities, a core theoretical criterion to evaluate classifiers under the NP paradigm, were established only for classifiers based on nonparametric assumptions with bounded feature support. In this work, we address the challenges arising from unbounded feature support in parametric settings and develop NP classification theory and methodology under these settings. Concretely, we propose a new parametric NP classifier, NP-sLDA, which satisfies the NP oracle inequalities. Furthermore, we construct an adaptive sample splitting scheme that can be applied universally to existing NP classifiers, and this adaptive strategy greatly enhances the power of these classifiers. Through extensive numerical experiments and real data studies, we demonstrate the competence of NP-sLDA and the new sample splitting scheme.
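The trade-off at the heart of the NP paradigm can be illustrated with a simple thresholding sketch: given scores from any classifier, pick the smallest threshold whose empirical type I error on held-out class-0 data stays at or below the level `alpha`. This is only an empirical illustration under assumed names (`np_threshold`, `error_rates`); it is neither the NP-sLDA classifier nor the adaptive sample splitting scheme from the paper, and it offers no high-probability guarantee on the population type I error.

```python
import math


def np_threshold(class0_scores, alpha):
    """Smallest threshold with empirical type I error <= alpha (0 < alpha < 1).

    A point is predicted as class 1 when its score is strictly above the threshold.
    """
    ordered = sorted(class0_scores)
    n = len(ordered)
    # At most floor(alpha * n) class-0 scores may lie strictly above the threshold.
    allowed = math.floor(alpha * n)
    return ordered[n - allowed - 1]


def error_rates(threshold, class0_scores, class1_scores):
    """Empirical (type I, type II) error rates of the thresholding rule."""
    type_i = sum(s > threshold for s in class0_scores) / len(class0_scores)
    type_ii = sum(s <= threshold for s in class1_scores) / len(class1_scores)
    return type_i, type_ii


# Toy integer scores so the arithmetic is exact.
class0 = [1, 2, 3, 4, 5, 6, 7, 8]
class1 = [5, 7, 9, 10]

threshold = np_threshold(class0, alpha=0.25)
type_i, type_ii = error_rates(threshold, class0, class1)
print(threshold, type_i, type_ii)
```

Lowering `alpha` raises the threshold, which tightens the type I error at the cost of a larger type II error; the power enhancement discussed in the abstract concerns reducing that cost.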

* 31 pages 

Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method

Feb 25, 2018
Jinyue Su, Jiacheng Xu, Xipeng Qiu, Xuanjing Huang

Generating plausible and fluent sentences with desired properties has long been a challenge. Most recent work uses recurrent neural networks (RNNs) and their variants to predict the following words given the previous sequence and a target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs sampling. The candidate sentences are revised and updated iteratively, with newly sampled words replacing old ones. Our experiments show the effectiveness of the proposed method in generating plausible and diverse sentences.
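The iterative revision described here can be sketched as a Gibbs-style sampler: sweep over the positions and, at each one, resample the word in proportion to a score of the whole sentence with that word substituted. The scoring function below is a toy stand-in for the combination of a language model and a discriminator; `toy_score`, `VOCAB`, `POSITIVE`, and `gibbs_revise` are hypothetical names, not components of the paper.

```python
import random

VOCAB = ["the", "movie", "was", "good", "great", "bad", "very"]
POSITIVE = {"good", "great"}


def toy_score(words):
    """Unnormalized score: rewards the target property, penalizes repeated words."""
    property_term = 1.0 + 4.0 * sum(w in POSITIVE for w in words)
    fluency_term = 0.2 ** sum(a == b for a, b in zip(words, words[1:]))
    return property_term * fluency_term


def gibbs_revise(sentence, vocab, score, sweeps=5, seed=0):
    """Resample one position at a time, conditioned on all other positions."""
    rng = random.Random(seed)
    words = list(sentence)
    for _ in range(sweeps):
        for i in range(len(words)):
            # Score every candidate sentence that differs only at position i.
            candidates = [words[:i] + [w] + words[i + 1:] for w in vocab]
            weights = [score(c) for c in candidates]
            # Draw the replacement in proportion to the candidate scores.
            words = rng.choices(candidates, weights=weights, k=1)[0]
    return words


start = ["the", "movie", "was", "bad"]
revised = gibbs_revise(start, VOCAB, toy_score, sweeps=5, seed=1)
print(" ".join(revised))
```

Because each position is resampled from its conditional distribution rather than set greedily, repeated runs with different seeds yield different sentences, which is one plausible reading of the diversity the abstract reports.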

* published in The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018 
