Models, code, and papers for "Wenxi Chen":

Enhancement Mask for Hippocampus Detection and Segmentation

Feb 12, 2019
Dengsheng Chen, Wenxi Liu, You Huang, Tong Tong, Yuanlong Yu

Detection and segmentation of the hippocampal structures in volumetric brain images is a challenging problem in the area of medical imaging. In this paper, we propose a two-stage 3D fully convolutional neural network that efficiently detects and segments the hippocampal structures. In particular, our approach first localizes the hippocampus from the whole volumetric image while obtaining a proposal for a rough segmentation. After localization, we apply the proposal as an enhancement mask to extract the fine structure of the hippocampus. The proposed method has been evaluated on a public dataset and compares with state-of-the-art approaches. Results indicate the effectiveness of the proposed method, which yields mean Dice Similarity Coefficients (i.e. DSC) of $0.897$ and $0.900$ for the left and right hippocampus, respectively. Furthermore, extensive experiments manifest that the proposed enhancement mask layer has remarkable benefits for accelerating training process and obtaining more accurate segmentation results.

* 2018 IEEE International Conference on Information and Automation (ICIA) 
* This paper has been published in the proceedings of IEEE International Conference on Information and Automation 2018 

  Click for Model/Code and Paper
Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

Aug 14, 2018
Tianshui Chen, Wenxi Wu, Yuefang Gao, Le Dong, Xiaonan Luo, Liang Lin

Object categories inherently form a hierarchy with different levels of concept abstraction, especially for fine-grained categories. For example, birds (Aves) can be categorized according to a four-level hierarchy of order, family, genus, and species. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image recognition primarily focus on categories of one certain level and usually overlook this correlation information. In this work, we investigate simultaneously predicting categories of different levels in the hierarchy and integrating this structured correlation information into the deep neural network by developing a novel Hierarchical Semantic Embedding (HSE) framework. Specifically, the HSE framework sequentially predicts the category score vector of each level in the hierarchy, from highest to lowest. At each level, it incorporates the predicted score vector of the higher level as prior knowledge to learn finer-grained feature representation. During training, the predicted score vector of the higher level is also employed to regularize label prediction by using it as soft targets of corresponding sub-categories. To evaluate the proposed framework, we organize the 200 bird species of the Caltech-UCSD birds dataset with the four-level category hierarchy and construct a large-scale butterfly dataset that also covers four level categories. Extensive experiments on these two and the newly-released VegFru datasets demonstrate the superiority of our HSE framework over the baseline methods and existing competitors.

* Accepted at ACM MM 2018 as oral presentation 

  Click for Model/Code and Paper
Deformable Object Tracking with Gated Fusion

Sep 27, 2018
Wenxi Liu, Yibing Song, Dengsheng Chen, Yuanlong Yu, Shengfeng He, Rynson W. H. Lau

The tracking-by-detection framework receives growing attentions through the integration with the Convolutional Neural Network (CNN). Existing methods, however, fail to track objects with severe appearance variations. This is because the traditional convolutional operation is performed on fixed grids, and thus may not be able to find the correct response while the object is changing pose or under varying environmental conditions. In this paper, we propose a deformable convolution layer to enrich the target appearance representations in the tracking-by-detection framework. We aim to capture the target appearance variations via deformable convolution and supplement its original appearance through residual learning. Meanwhile, we propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance. The enriched feature representation through deformable convolution facilitates the discrimination of the CNN classifier on the target object and background. Extensive experiments on the standard benchmarks show that the proposed tracker performs favorably against state-of-the-art methods.


  Click for Model/Code and Paper
Tell Me About Yourself: Using an AI-Powered Chatbot to Conduct Conversational Surveys

May 25, 2019
Ziang Xiao, Michelle X. Zhou, Q. Vera Liao, Gloria Mark, Changyan Chi, Wenxi Chen, Huahai Yang

The rise of increasingly more powerful chatbots offers a new way to collect information through conversational surveys, where a chatbot asks open-ended questions, interprets a user's free-text responses, and probes answers when needed. To investigate the effectiveness and limitations of such a chatbot in conducting surveys, we conducted a field study involving about 600 participants. In this study, half of the participants took a typical online survey on Qualtrics and the other half interacted with an AI-powered chatbot to complete a conversational survey. Our detailed analysis of over 5200 free-text responses revealed that the chatbot drove a significantly higher level of participant engagement and elicited significantly better quality responses in terms of relevance, depth, and readability. Based on our results, we discuss design implications for creating AI-powered chatbots to conduct effective surveys and beyond.

* Currently under review 

  Click for Model/Code and Paper
On the Necessity and Effectiveness of Learning the Prior of Variational Auto-Encoder

May 31, 2019
Haowen Xu, Wenxiao Chen, Jinlin Lai, Zhihan Li, Youjian Zhao, Dan Pei

Using powerful posterior distributions is a popular approach to achieving better variational inference. However, recent works showed that the aggregated posterior may fail to match unit Gaussian prior, thus learning the prior becomes an alternative way to improve the lower-bound. In this paper, for the first time in the literature, we prove the necessity and effectiveness of learning the prior when aggregated posterior does not match unit Gaussian prior, analyze why this situation may happen, and propose a hypothesis that learning the prior may improve reconstruction loss, all of which are supported by our extensive experiment results. We show that using learned Real NVP prior and just one latent variable in VAE, we can achieve test NLL comparable to very deep state-of-the-art hierarchical VAE, outperforming many previous works with complex hierarchical VAE architectures.


  Click for Model/Code and Paper
A Multimodal Alerting System for Online Class Quality Assurance

Sep 01, 2019
Jiahao Chen, Hang Li, Wenxin Wang, Wenbiao Ding, Gale Yan Huang, Zitao Liu

Online 1 on 1 class is created for more personalized learning experience. It demands a large number of teaching resources, which are scarce in China. To alleviate this problem, we build a platform (marketplace), i.e., \emph{Dahai} to allow college students from top Chinese universities to register as part-time instructors for the online 1 on 1 classes. To warn the unqualified instructors and ensure the overall education quality, we build a monitoring and alerting system by utilizing multimodal information from the online environment. Our system mainly consists of two key components: banned word detector and class quality predictor. The system performance is demonstrated both offline and online. By conducting experimental evaluation of real-world online courses, we are able to achieve 74.3\% alerting accuracy in our production environment.

* The 20th International Conference on Artificial Intelligence in Education(AIED), 2019 

  Click for Model/Code and Paper
REFIT: a Unified Watermark Removal Framework for Deep Learning Systems with Limited Data

Nov 17, 2019
Xinyun Chen, Wenxiao Wang, Chris Bender, Yiming Ding, Ruoxi Jia, Bo Li, Dawn Song

Deep neural networks (DNNs) have achieved tremendous success in various fields; however, training these models from scratch could be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect the pre-trained deep neural networks from potential copyright infringements; however, they could be vulnerable to adversaries who aim at removing the watermarks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on the knowledge of the watermarks and even the watermarking schemes. Firstly, we demonstrate that by properly designing the learning rate schedule for fine-tuning, an adversary is always able to remove the watermarks. Furthermore, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose to incorporate two techniques: (1) an adaption of the elastic weight consolidation (EWC) algorithm, which is originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled data samples used for AU do not necessarily need to be drawn from the same distribution as the benign data for model evaluation. The experimental results demonstrate that our fine-tuning based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlights the importance of further investigation of the watermarking problem.

* Early version in ICML 2019 Workshop on Security and Privacy of Machine Learning. The first two authors contribute equally 

  Click for Model/Code and Paper
Deep Surface Normal Estimation with Hierarchical RGB-D Fusion

Apr 06, 2019
Jin Zeng, Yanfeng Tong, Yunmu Huang, Qiong Yan, Wenxiu Sun, Jing Chen, Yongtian Wang

The growing availability of commodity RGB-D cameras has boosted the applications in the field of scene understanding. However, as a fundamental scene understanding task, surface normal estimation from RGB-D data lacks thorough investigation. In this paper, a hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image. Specifically, the features from color image and depth are successively integrated at multiple scales to ensure global surface smoothness while preserving visually salient details. Meanwhile, the depth features are re-weighted with a confidence map estimated from depth before merging into the color branch to avoid artifacts caused by input depth corruption. Additionally, a hybrid multi-scale loss function is designed to learn accurate normal estimation given noisy ground-truth dataset. Extensive experimental results validate the effectiveness of the fusion strategy and the loss design, outperforming state-of-the-art normal estimation schemes.


  Click for Model/Code and Paper
Getting Robots Unfrozen and Unlost in Dense Pedestrian Crowds

Sep 30, 2018
Tingxiang Fan, Xinjing Cheng, Jia Pan, Pinxin Long, Wenxi Liu, Ruigang Yang, Dinesh Manocha

We aim to enable a mobile robot to navigate through environments with dense crowds, e.g., shopping malls, canteens, train stations, or airport terminals. In these challenging environments, existing approaches suffer from two common problems: the robot may get frozen and cannot make any progress toward its goal, or it may get lost due to severe occlusions inside a crowd. Here we propose a navigation framework that handles the robot freezing and the navigation lost problems simultaneously. First, we enhance the robot's mobility and unfreeze the robot in the crowd using a reinforcement learning based local navigation policy developed in our previous work~\cite{long2017towards}, which naturally takes into account the coordination between the robot and the human. Secondly, the robot takes advantage of its excellent local mobility to recover from its localization failure. In particular, it dynamically chooses to approach a set of recovery positions with rich features. To the best of our knowledge, our method is the first approach that simultaneously solves the freezing problem and the navigation lost problem in dense crowds. We evaluate our method in both simulated and real-world environments and demonstrate that it outperforms the state-of-the-art approaches. Videos are available at https://sites.google.com/view/rlslam.


  Click for Model/Code and Paper
Geometry-aware Similarity Learning on SPD Manifolds for Visual Recognition

Aug 17, 2016
Zhiwu Huang, Ruiping Wang, Xianqiu Li, Wenxian Liu, Shiguang Shan, Luc Van Gool, Xilin Chen

Symmetric Positive Definite (SPD) matrices have been widely used for data representation in many visual recognition tasks. The success mainly attributes to learning discriminative SPD matrices with encoding the Riemannian geometry of the underlying SPD manifold. In this paper, we propose a geometry-aware SPD similarity learning (SPDSL) framework to learn discriminative SPD features by directly pursuing manifold-manifold transformation matrix of column full-rank. Specifically, by exploiting the Riemannian geometry of the manifold of fixed-rank Positive Semidefinite (PSD) matrices, we present a new solution to reduce optimizing over the space of column full-rank transformation matrices to optimizing on the PSD manifold which has a well-established Riemannian structure. Under this solution, we exploit a new supervised SPD similarity learning technique to learn the transformation by regressing the similarities of selected SPD data pairs to their ground-truth similarities on the target SPD manifold. To optimize the proposed objective function, we further derive an algorithm on the PSD manifold. Evaluations on three visual classification tasks show the advantages of the proposed approach over the existing SPD-based discriminant learning methods.


  Click for Model/Code and Paper
Robust Tracking Using Region Proposal Networks

May 30, 2017
Jimmy Ren, Zhiyang Yu, Jianbo Liu, Rui Zhang, Wenxiu Sun, Jiahao Pang, Xiaohao Chen, Qiong Yan

Recent advances in visual tracking showed that deep Convolutional Neural Networks (CNN) trained for image classification can be strong feature extractors for discriminative trackers. However, due to the drastic difference between image classification and tracking, extra treatments such as model ensemble and feature engineering must be carried out to bridge the two domains. Such procedures are either time consuming or hard to generalize well across datasets. In this paper we discovered that the internal structure of Region Proposal Network (RPN)'s top layer feature can be utilized for robust visual tracking. We showed that such property has to be unleashed by a novel loss function which simultaneously considers classification accuracy and bounding box quality. Without ensemble and any extra treatment on feature maps, our proposed method achieved state-of-the-art results on several large scale benchmarks including OTB50, OTB100 and VOT2016. We will make our code publicly available.


  Click for Model/Code and Paper
Accurate Single Stage Detector Using Recurrent Rolling Convolution

Apr 19, 2017
Jimmy Ren, Xiaohao Chen, Jianbo Liu, Wenxiu Sun, Jiahao Pang, Qiong Yan, Yu-Wing Tai, Li Xu

Most of the recent successful methods in accurate object detection and localization used some variants of R-CNN style two stage Convolutional Neural Networks (CNN) where plausible regions were proposed in the first stage then followed by a second stage for decision refinement. Despite the simplicity of training and the efficiency in deployment, the single stage detection methods have not been as competitive when evaluated in benchmarks consider mAP for high IoU thresholds. In this paper, we proposed a novel single stage end-to-end trainable object detection network to overcome this limitation. We achieved this by introducing Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors which are "deep in context". We evaluated our method in the challenging KITTI dataset which measures methods under IoU threshold of 0.7. We showed that with RRC, a single reduced VGG-16 based model already significantly outperformed all the previously published results. At the time this paper was written our models ranked the first in KITTI car detection (the hard level), the first in cyclist detection and the second in pedestrian detection. These results were not reached by the previous single stage methods. The code is publicly available.

* CVPR 2017 

  Click for Model/Code and Paper
ICFVR 2017: 3rd International Competition on Finger Vein Recognition

Jan 04, 2018
Yi Zhang, Houjun Huang, Haifeng Zhang, Liao Ni, Wei Xu, Nasir Uddin Ahmed, Md. Shakil Ahmed, Yilun Jin, Yingjie Chen, Jingxuan Wen, Wenxin Li

In recent years, finger vein recognition has become an important sub-field in biometrics and been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, International Competition on Finger Vein Recognition(ICFVR) is held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically improved the EER from 2.64% to 0.483% compared to the winner of last year. In this paper, we introduce the process and results of ICFVR 2017 and give insights on development of state-of-art finger vein recognition algorithms.

* 8 pages, 15 figures 

  Click for Model/Code and Paper
Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications

Feb 12, 2018
Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao

To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of its Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we proposed Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-arts supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We come up with a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with solid theoretical explanation.

* 12 pages (including references), 17 figures, submitted to WWW 2018: The 2018 Web Conference, April 23--27, 2018, Lyon, France. The contents discarded from the conference version due to the 9-page limitation are also included in this version 

  Click for Model/Code and Paper