Models, code, and papers for "Xuan Li":

Joint Demosaicing and Super-Resolution (JDSR): Network Design and Perceptual Optimization

Nov 08, 2019
Xuan Xu, Yanfang Ye, Xin Li

Image demosaicing and super-resolution are two important tasks in the color imaging pipeline. So far they have mostly been studied independently in the open literature of deep learning; little is known about the potential benefit of formulating a joint demosaicing and super-resolution (JDSR) problem. In this paper, we propose an end-to-end optimization solution to the JDSR problem and demonstrate its practical significance in computational imaging. Our technical contributions are mainly two-fold. On network design, we have developed a Densely-connected Squeeze-and-Excitation Residual Network (DSERN) for JDSR. For the first time, we address the issue of spatio-spectral attention for color images and discuss how to achieve better information flow by smooth activation for JDSR. Experimental results show that a moderate PSNR/SSIM gain can be achieved by DSERN over previous naive network architectures. On perceptual optimization, we propose to leverage the latest ideas, including a relativistic discriminator and a pre-excitation perceptual loss function, to further improve the visual quality of reconstructed images. Our extensive experimental results show that the Texture-enhanced Relativistic average Generative Adversarial Network (TRaGAN) can produce both subjectively more pleasant images and objectively lower perceptual distortion scores than a standard GAN for JDSR. We have verified the benefit of JDSR to high-quality image reconstruction from real-world Bayer pattern data collected by the NASA Mars Curiosity rover.


Wasserstein Identity Testing

Oct 28, 2017
Shichuan Deng, Wenzheng Li, Xuan Wu

Uniformity testing and the more general identity testing are well-studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider identity testing in Wasserstein distance (a.k.a. transportation distance and earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called "Doubling Condition", we provide nearly instance-optimal sample complexity.
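
The motivation above can be illustrated with a small sketch in Python: two samples on the real line whose supports do not overlap are at the maximum possible $L_1$ distance, yet they are very close in Wasserstein distance. The function names are hypothetical, and the code only computes distances between empirical distributions; it is not the tester proposed in the paper. It relies on the standard fact that, for two equal-size samples in one dimension, the optimal transport plan matches sorted points in order.

```python
from collections import Counter

def l1_distance(xs, ys):
    """L1 distance between the empirical distributions of two samples."""
    px, py = Counter(xs), Counter(ys)
    support = set(px) | set(py)
    return sum(abs(px[v] / len(xs) - py[v] / len(ys)) for v in support)

def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size samples on the real line.

    For equal-size 1-D samples the optimal transport plan matches sorted
    points in order, so the distance is the mean absolute gap.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

p = [0.0, 1.0, 2.0, 3.0]
q = [v + 0.01 for v in p]   # same shape, shifted by a tiny amount

l1 = l1_distance(p, q)        # supports are disjoint, so this is maximal
w1 = wasserstein1_1d(p, q)    # every point moves by only 0.01
```

Here the $L_1$ distance cannot tell a tiny shift apart from a completely different distribution, while the Wasserstein distance reflects how far the mass actually has to move.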


On better training the infinite restricted Boltzmann machines

Oct 14, 2017
Xuan Peng, Xunzhang Gao, Xiang Li

The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It has the useful property of automatically deciding the size of the hidden layer according to the specific training data. With sufficient training, the iRBM can achieve performance competitive with that of the classic RBM. However, learning the iRBM converges slowly because the iRBM is sensitive to the ordering of its hidden units: the learned filters change slowly from the left-most hidden unit to the right. To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed. The key idea of the proposed training strategy is to randomly regroup the hidden units before each gradient descent step. Potentially, a mixture of infinitely many iRBMs with different permutations of the hidden units can be achieved by this learning method, which has an effect similar to dropout in preventing the model from over-fitting. The original iRBM is also modified to be capable of discriminative training. To evaluate the impact of our method on the convergence speed of learning and the model's generalization ability, several experiments have been performed on the binarized MNIST and CalTech101 Silhouettes datasets. Experimental results indicate that the proposed training strategy can greatly accelerate learning and enhance the generalization ability of iRBMs.
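
The regrouping idea can be sketched as follows. This is a minimal illustration, assuming that "regrouping" amounts to applying one random permutation to the hidden units before a gradient step; the abstract does not give the exact procedure, and the function name and the data layout (`weights[i][j]` connecting visible unit `i` to hidden unit `j`) are hypothetical.

```python
import random

def regroup_hidden_units(weights, hidden_bias, rng):
    """Apply one random permutation to the hidden units.

    weights[i][j] connects visible unit i to hidden unit j, so permuting
    hidden units means permuting the columns of `weights` and the entries
    of `hidden_bias` with the same permutation.
    """
    n_hidden = len(hidden_bias)
    perm = list(range(n_hidden))
    rng.shuffle(perm)
    new_weights = [[row[j] for j in perm] for row in weights]
    new_bias = [hidden_bias[j] for j in perm]
    return new_weights, new_bias, perm

rng = random.Random(0)
W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 2 visible units x 3 hidden units
b = [1.0, 2.0, 3.0]
W2, b2, perm = regroup_hidden_units(W, b, rng)
```

In a training loop this call would come right before each gradient computation, so that no hidden unit keeps a fixed position in the ordering the iRBM is sensitive to.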

* Submitted to Machine Learning 

Lazy stochastic principal component analysis

Sep 21, 2017
Michael Wojnowicz, Dinh Nguyen, Li Li, Xuan Zhao

Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from an operation on a small square matrix whose size depends only on the target dimensionality.

* To be published in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW) 

An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset

Feb 14, 2020
Zhicheng Li, Zhihao Gu, Xuan Di, Rongye Shi

The Waymo Open Dataset has been released recently, providing a platform to crowdsource some fundamental challenges for automated vehicles (AVs), such as 3D detection and tracking. While the dataset provides a large amount of high-quality and multi-source driving information, people in academia are more interested in the underlying driving policy programmed in Waymo self-driving cars, which is inaccessible due to AV manufacturers' proprietary protection. Accordingly, academic researchers have to make various assumptions to implement AV components in their models or simulations, which may not represent the realistic interactions in real-world traffic. Thus, this paper introduces an approach to learn a long short-term memory (LSTM)-based model for imitating the behavior of Waymo's self-driving model. The proposed model has been evaluated based on Mean Absolute Error (MAE). The experimental results show that our model outperforms several baseline models in driving action prediction. Also, a visualization tool is presented for verifying the performance of the model.


MV-C3D: A Spatial Correlated Multi-View 3D Convolutional Neural Networks

Jun 15, 2019
Qi Xuan, Fuxian Li, Yi Liu, Yun Xiang

With the development of deep neural networks, 3D object recognition is becoming increasingly popular in the computer vision community. Many multi-view based methods have been proposed to improve category recognition accuracy. These approaches mainly rely on multi-view images rendered around the whole circumference of the object. In real-world applications, however, 3D objects are mostly observed from partial viewpoints within a narrower range. Therefore, we propose a multi-view based 3D convolutional neural network, which takes only a subset of contiguous multi-view images as input and can still maintain high accuracy. Moreover, our model takes these view images as a joint variable to better learn spatially correlated features using 3D convolution and 3D max-pooling layers. Experimental results on the ModelNet10 and ModelNet40 datasets show that our MV-C3D technique can achieve outstanding performance with multi-view images captured from partial angles within a narrower range. The results on the 3D rotated real image dataset MIRO further demonstrate that MV-C3D is more adaptable in real-world scenarios. The classification accuracy can be further improved by increasing the number of view images.

* 11 pages, 11 figures 

Training and Testing Object Detectors with Virtual Images

Dec 22, 2017
Yonglin Tian, Xuan Li, Kunfeng Wang, Fei-Yue Wang

In the area of computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive labeled data. However, collecting and annotating images from the real world demands a great deal of labor and money, and is usually too passive to build datasets with specific characteristics, such as small object area and high occlusion level. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training the DPM (Deformable Parts Model) and Faster R-CNN detectors, we show that the performance of models can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing the trained models from a specific aspect using intentionally designed virtual datasets, in order to discover the flaws of trained models. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.

* To be published in IEEE/CAA Journal of Automatica Sinica 

Purine: A bi-graph based deep learning framework

Apr 16, 2015
Min Lin, Shuo Li, Xuan Luo, Shuicheng Yan

In this paper, we introduce a novel deep learning framework, termed Purine. In Purine, a deep network is expressed as a bipartite graph (bi-graph), which is composed of interconnected operators and data tensors. With the bi-graph abstraction, networks are easily solvable with an event-driven task dispatcher. We then demonstrate that different parallelism schemes over GPUs and/or CPUs on single or multiple PCs can be universally implemented by graph composition. This frees researchers from coding for various parallelization schemes, and the same dispatcher can be used for solving different graphs. Scheduled by the task dispatcher, memory transfers are fully overlapped with other computations, which greatly reduces the communication overhead and helps us achieve approximately linear acceleration.

* Submitted to ICLR 2015 workshop 

Thick-Net: Parallel Network Structure for Sequential Modeling

Nov 19, 2019
Yu-Xuan Li, Jin-Yuan Liu, Liang Li, Xiang Guan

Recurrent neural networks have been widely used in sequence learning tasks. In previous studies, the performance of the model has always been improved by either wider or deeper structures. However, the former becomes more prone to overfitting, while the latter is difficult to optimize. In this paper, we propose a simple new model named Thick-Net, by expanding the network from another dimension: thickness. Multiple parallel values are obtained via more sets of parameters in each hidden state, and the maximum value is selected as the final output among parallel intermediate outputs. Notably, Thick-Net can efficiently avoid overfitting, and is easier to optimize than the vanilla structures due to the large dropout affiliated with it. Our model is evaluated on four sequential tasks: the adding problem, permuted sequential MNIST, text classification, and language modeling. The results of these tasks demonstrate that our model can not only improve accuracy with faster convergence but also facilitate a better generalization ability.
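
The "thickness" idea can be sketched as a recurrent step with several parallel parameter sets, keeping the element-wise maximum of their outputs. This is a minimal sketch under assumptions: the abstract does not say where the nonlinearity sits or how the parameter sets are organized, so the `tanh` placement, the function names, and the `(W_x, W_h, bias)` layout are all hypothetical.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def thick_step(x, h_prev, param_sets):
    """One recurrent step with len(param_sets) parallel parameter sets.

    Each set (W_x, W_h, bias) produces a candidate hidden vector; the
    element-wise maximum across candidates is returned.
    """
    candidates = []
    for w_x, w_h, bias in param_sets:
        pre = [a + b + c
               for a, b, c in zip(matvec(w_x, x), matvec(w_h, h_prev), bias)]
        candidates.append([math.tanh(p) for p in pre])
    return [max(vals) for vals in zip(*candidates)]

zeros = [[0.0, 0.0], [0.0, 0.0]]
set_a = ([[1.0], [-1.0]], zeros, [0.0, 0.0])   # favours hidden unit 0
set_b = ([[-1.0], [1.0]], zeros, [0.0, 0.0])   # favours hidden unit 1
h = thick_step([1.0], [0.0, 0.0], [set_a, set_b])
```

Because only the winning candidate contributes to each hidden unit, the other parameter sets receive no gradient for that unit at that step, which is one way to read the abstract's remark about the dropout-like behaviour.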

* 11 pages, 4 figures 

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

Jul 18, 2018
Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the nature of the monotonic alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism to make decisions whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence speed and higher stability than the baseline attention method. Besides, the method of forward attention with transition agent can also help improve the naturalness of synthetic speech and control the speed of synthetic speech effectively.
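
The recursive computation can be sketched as below. This is a minimal illustration assuming the recursion has the usual forward-algorithm form, in which position n at the current step can only be reached by staying at n or moving forward from n - 1, and the result is renormalized. The function name, the initialization on the first phone, and the example probabilities are assumptions; the transition agent is not modelled.

```python
def forward_attention_step(prev_alpha, probs):
    """One decoder step of a monotonic forward recursion.

    prev_alpha: forward attention weights from the previous step.
    probs: ordinary (e.g. softmax) attention probabilities at this step.
    Position n can only be reached from n (stay) or n - 1 (move forward).
    """
    unnorm = []
    for n, y in enumerate(probs):
        stay = prev_alpha[n]
        move = prev_alpha[n - 1] if n > 0 else 0.0
        unnorm.append((stay + move) * y)
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("no monotonic path has nonzero probability")
    return [u / total for u in unnorm]

alpha = [1.0, 0.0, 0.0, 0.0]          # assumed start: aligned to the first phone
step_probs = [0.1, 0.2, 0.3, 0.4]     # baseline attention favours the last phone
alpha = forward_attention_step(alpha, step_probs)
```

Even though the baseline attention puts most of its mass on the last two phones, the recursion assigns them zero weight, because no monotonic path can reach them in a single step from the first phone.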

* 5 pages, 3 figures, 2 tables. Published in IEEE International Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP2018) 

Automated Detecting and Placing Road Objects from Street-level Images

Sep 17, 2019
Chaoquan Zhang, Hongchao Fan, Wanzhi Li, Bo Mao, Xuan Ding

Navigation services utilized by autonomous vehicles or ordinary users require the availability of detailed information about road-related objects and their geolocations, especially at road intersections. However, these road intersections are mainly represented as point elements without detailed information, or are not available at all in current versions of crowdsourced mapping databases including OpenStreetMap (OSM). This study develops an approach to automatically detect road objects from street-level images and place them at the right location. Our processing pipeline relies on two convolutional neural networks: the first segments the images, while the second detects and classifies the specific objects. Moreover, to locate the detected objects, we establish an attributed topological binary tree (ATBT) based on urban grammar for each image to depict the coherent relations of topologies, attributes and semantics of the road objects. Then the ATBT is further matched with map features on OSM to determine the correct placement location. The proposed method has been applied to a case study in Berlin, Germany. We validate the effectiveness of our method on two object classes: traffic signs and traffic lights. Experimental results demonstrate that the proposed approach provides near-precise localization results in terms of completeness and positional accuracy. Among many potential applications, the output may be combined with other sources of data to guide autonomous vehicles.


A Learning-based Framework for Hybrid Depth-from-Defocus and Stereo Matching

Aug 06, 2018
Zhang Chen, Xinqing Guo, Siyuan Li, Xuan Cao, Jingyi Yu

Depth from defocus (DfD) and stereo matching are the two most studied passive depth sensing schemes. The techniques are essentially complementary: DfD can robustly handle repetitive textures that are problematic for stereo matching, whereas stereo matching is insensitive to defocus blur and can handle a large depth range. In this paper, we present a unified learning-based technique to conduct hybrid DfD and stereo matching. Our input is image triplets: a stereo pair and a defocused image of one of the stereo views. We first apply depth-guided light field rendering to construct a comprehensive training dataset for such hybrid sensing setups. Next, we adopt the hourglass network architecture to separately conduct depth inference from DfD and stereo. Finally, we explore different connection methods between the two separate networks for integrating them into a unified solution to produce high-fidelity 3D disparity maps. Comprehensive experiments on real and synthetic data show that our new learning-based hybrid 3D sensing technique can significantly improve accuracy and robustness in 3D reconstruction.


The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research

Dec 22, 2017
Xuan Li, Kunfeng Wang, Yonglin Tian, Lan Yan, Fei-Yue Wang

Video image datasets play an essential role in the design and evaluation of traffic vision algorithms. Nevertheless, a longstanding inconvenience concerning image datasets is that manually collecting and annotating large-scale diversified datasets from real scenes is time-consuming and prone to error. For that reason, virtual datasets have begun to function as a proxy for real datasets. In this paper, we propose to construct large-scale artificial scenes for traffic vision research and generate a new virtual dataset called "ParallelEye". First, street map data is used to build a 3D scene model of the Zhongguancun Area, Beijing. Then, computer graphics, virtual reality, and rule modeling technologies are utilized to synthesize large-scale, realistic virtual urban traffic scenes, in which the fidelity and geography match the real world well. Furthermore, the Unity3D platform is used to render the artificial scenes and generate accurate ground-truth labels, e.g., semantic/instance segmentation, object bounding box, object tracking, optical flow, and depth. The environmental conditions in artificial scenes can be controlled completely. As a result, we present a viable implementation pipeline for constructing large-scale artificial scenes for traffic vision research. The experimental results demonstrate that this pipeline is able to generate photorealistic virtual datasets with low modeling time and high-accuracy labeling.

* To be published in IEEE ITSC 2017 

Time-aware Gradient Attack on Dynamic Network Link Prediction

Nov 24, 2019
Jinyin Chen, Jian Zhang, Zhi Chen, Min Du, Feifei Li, Qi Xuan

In network link prediction, it is possible to hide a target link from being predicted with a small perturbation on network structure. This observation may be exploited in many real world scenarios, for example, to preserve privacy, or to exploit financial security. There have been many recent studies to generate adversarial examples to mislead deep learning models on graph data. However, none of the previous work has considered the dynamic nature of real-world systems. In this work, we present the first study of adversarial attack on dynamic network link prediction (DNLP). The proposed attack method, namely time-aware gradient attack (TGA), utilizes the gradient information generated by deep dynamic network embedding (DDNE) across different snapshots to rewire a few links, so as to make DDNE fail to predict target links. We implement TGA in two ways: one is based on traversal search, namely TGA-Tra; and the other is simplified with greedy search for efficiency, namely TGA-Gre. We conduct comprehensive experiments which show the outstanding performance of TGA in attacking DNLP algorithms.

* 7 pages 

Boosting Image Recognition with Non-differentiable Constraints

Oct 02, 2019
Xuan Li, Yuchen Lu, Peng Xu, Jizong Peng, Christian Desrosiers, Xue Liu

In this paper, we study the problem of image recognition with non-differentiable constraints. A lot of real-life recognition applications require a rich output structure with deterministic constraints that are discrete or modeled by a non-differentiable function. A prime example is recognizing digit sequences, which are restricted by such rules (e.g., container code detection, social insurance number recognition, etc.). We investigate the usefulness of adding non-differentiable constraints in learning for the task of digit sequence recognition. Toward this goal, we synthesize six different datasets from MNIST and Cropped SVHN, with three discrete rules inspired by real-life protocols. To deal with the non-differentiability of these rules, we propose a reinforcement learning approach based on the policy gradient method. We find that incorporating this rule-based reinforcement can effectively increase the accuracy for all datasets and provide a good inductive bias which improves the model even with limited data. On one of the datasets, MNIST_Rule2, models trained with rule-based reinforcement increase the accuracy by 4.7% for 2000 samples and 23.6% for 500 samples. We further test our model against synthesized adversarial examples, e.g., blocking out digits, and observe that adding our rule-based reinforcement increases the model robustness with a relatively smaller performance drop.
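
The reason a policy gradient helps with a non-differentiable rule can be sketched on a toy problem: the rule is only ever evaluated as a reward, never differentiated. The check-digit rule, the function names, and the learning rate below are hypothetical and are not taken from the paper; for simplicity the sketch uses the exact expected gradient of a softmax policy over ten digits rather than sampled estimates.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rule_reward(prefix, digit):
    """Hypothetical check-digit rule: last digit = sum of prefix mod 10."""
    return 1.0 if digit == sum(prefix) % 10 else 0.0

def policy_gradient_step(logits, prefix, lr=1.0):
    """Exact policy-gradient ascent step on the expected rule reward.

    For a softmax policy, d E[R] / d logit_k = pi_k * (R_k - E[R]).
    The reward itself is never differentiated.
    """
    probs = softmax(logits)
    rewards = [rule_reward(prefix, d) for d in range(10)]
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [z + lr * p * (r - expected)
            for z, p, r in zip(logits, probs, rewards)]

prefix = [4, 9]                      # rule-consistent last digit is 3
logits = [0.0] * 10                  # start from a uniform policy
before = softmax(logits)[3]
for _ in range(50):
    logits = policy_gradient_step(logits, prefix)
after = softmax(logits)[3]
```

After the updates, the probability of the rule-consistent digit is higher than under the initial uniform policy, even though `rule_reward` is a discrete function with no useful gradient.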


A Fast and Precise Method for Large-Scale Land-Use Mapping Based on Deep Learning

Aug 09, 2019
Xuan Yang, Zhengchao Chen, Baipeng Li, Dailiang Peng, Pan Chen, Bing Zhang

The land-use map is an important data product that reflects how humans use and transform land, and it can provide a valuable reference for land-use planning. With traditional image classification methods, producing a large-scale, high spatial resolution (HSR) land-use map is a big project that requires a lot of human labor, time, and financial expenditure. The rise of deep learning provides a new solution to these problems. This paper proposes a fast and precise method that can achieve large-scale land-use classification based on a deep convolutional neural network (DCNN). We optimize the data tiling method and the structure of the DCNN for multi-channel data and the splicing edge effect, which are unique to remote sensing deep learning, and improve the accuracy of land-use classification. We apply our improved methods in the Guangdong Province of China using GF-1 images, and achieve a land-use classification accuracy of 81.52%. It takes only 13 hours to complete the work, which would take several months of human labor.

* Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2019 

Progressive Learning of Low-Precision Networks

May 28, 2019
Zhengguang Zhou, Wengang Zhou, Xutao Lv, Xuan Huang, Xiaoyu Wang, Houqiang Li

Recent years have witnessed great advances of deep learning in a variety of vision tasks. Many state-of-the-art deep neural networks suffer from large size and high complexity, which makes them difficult to deploy on resource-limited platforms such as mobile devices. To this end, low-precision neural networks, which quantize weights or activations into a low-bit format, are widely studied. Though efficient, low-precision networks are usually hard to train and suffer severe accuracy degradation. In this paper, we propose a new training strategy that expands low-precision networks during training and removes the expanded parts for network inference. First, we equip each low-precision convolutional layer with an ancillary full-precision convolutional layer based on a low-precision network structure, which can guide the network to good local minima. Second, a decay method is introduced to gradually reduce the output of the added full-precision convolution, which keeps the resulting topology the same as the original low-precision one. Experiments on the SVHN, CIFAR and ILSVRC-2012 datasets show that the proposed method can bring faster convergence and higher accuracy for low-precision neural networks.
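
The expand-then-decay strategy can be sketched for a single output unit: the ancillary full-precision branch is added to the low-precision branch with a factor that shrinks to zero, at which point the branch can be dropped without changing the output. The 1-bit quantizer, the linear schedule, and all names below are assumptions made for illustration; the abstract does not specify them.

```python
def quantize_sign(weights):
    """Hypothetical 1-bit quantizer: sign of each weight times the mean magnitude."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def expanded_output(x, w_low, w_full, decay):
    """Low-precision branch plus an ancillary full-precision branch scaled by `decay`."""
    return dot(quantize_sign(w_low), x) + decay * dot(w_full, x)

def linear_decay(step, total_steps):
    """Assumed schedule: 1.0 at step 0, falling linearly to 0.0 at total_steps."""
    return max(0.0, 1.0 - step / total_steps)

x = [1.0, -2.0, 0.5]
w_low = [0.3, -0.1, 0.2]
w_full = [0.05, 0.02, -0.01]
start = expanded_output(x, w_low, w_full, linear_decay(0, 100))
final = expanded_output(x, w_low, w_full, linear_decay(100, 100))
low_only = dot(quantize_sign(w_low), x)
```

Once the decay factor reaches zero, `final` equals `low_only`, which is the sense in which the inference-time network has the same topology as the original low-precision one.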

* 10 pages, 8 figures 

Fast Non-local Stereo Matching based on Hierarchical Disparity Prediction

Sep 28, 2015
Xuan Luo, Xuejiao Bai, Shuo Li, Hongtao Lu, Sei-ichiro Kamata

Stereo matching is the key step in estimating depth from two or more images. Recently, some tree-based non-local stereo matching methods have been proposed, which achieved state-of-the-art performance. The algorithms employed tree structures to aggregate cost and thus improved the performance and reduced the computation load of stereo matching. However, the computational complexity of these tree-based algorithms is still high because they search over the entire disparity range. In addition, the extreme greediness of the minimum spanning tree (MST) causes poor performance in large areas with similar colors but varying disparities. In this paper, we propose an efficient stereo matching method using a hierarchical disparity prediction (HDP) framework to dramatically reduce the disparity search range so as to speed up the tree-based non-local stereo methods. Our disparity prediction scheme works on a graph pyramid derived from the image whose disparity is to be estimated. We utilize the disparity of an upper graph to predict a small disparity range for the lower graph. Some independent disparity trees (DT) are generated to form a disparity prediction forest (HDPF) over which the cost aggregation is made. When combined with the state-of-the-art tree-based methods, our scheme not only dramatically speeds up the original methods but also improves their performance by alleviating the second drawback of the tree-based methods. This is partially because our DTs overcome the extreme greediness of the MST. Extensive experimental results on some benchmark datasets demonstrate the effectiveness and efficiency of our framework. For example, the segment-tree based stereo matching becomes about 25.57 times faster and 2.2% more accurate over the Middlebury 2006 full-size dataset.
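
The search-range reduction can be sketched per pixel: a disparity estimated at the coarser level predicts a narrow window of candidates at the finer level, so the matching cost is evaluated on a handful of values instead of the full range. This is a simplified illustration only; the scale factor of 2 between levels, the window radius, the toy cost function, and the function names are assumptions, and the graph pyramid and disparity trees of the paper are not modelled.

```python
def predicted_range(coarse_disparity, radius, max_disparity, scale=2):
    """Candidate disparities at the finer level, centred on the upscaled coarse estimate."""
    centre = coarse_disparity * scale
    low = max(0, centre - radius)
    high = min(max_disparity, centre + radius)
    return range(low, high + 1)

def best_disparity(cost, candidates):
    """Pick the candidate with the lowest matching cost (ties go to the smallest)."""
    return min(candidates, key=cost)

def toy_cost(d):
    """Toy matching cost with its minimum at d = 21."""
    return abs(d - 21)

full_range = range(0, 64)
candidates = predicted_range(coarse_disparity=10, radius=2, max_disparity=63)
d_full = best_disparity(toy_cost, full_range)
d_fast = best_disparity(toy_cost, candidates)
```

In this toy case the reduced search evaluates 5 candidates instead of 64 and still finds the same minimum; the saving only holds when the coarse estimate is close enough for the true disparity to fall inside the window.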

* 9 pages 

An End-to-end Video Text Detector with Online Tracking

Aug 20, 2019
Hongyuan Yu, Chengquan Zhang, Xuan Li, Junyu Han, Errui Ding, Liang Wang

Video text detection is considered one of the most difficult tasks in document analysis due to the following two challenges: 1) the difficulties caused by video scenes, i.e., motion blur, illumination changes, and occlusion; 2) the properties of text, including variations in fonts, languages, orientations, and shapes. Most existing methods attempt to enhance the performance of video text detection by cooperating with video text tracking, but treat these two tasks separately. In this work, we propose an end-to-end video text detection model with online tracking to address these two challenges. Specifically, in the detection branch, we adopt ConvLSTM to capture spatial structure information and motion memory. In the tracking branch, we convert the tracking problem to text instance association, and an appearance-geometry descriptor with a memory mechanism is proposed to generate a robust representation of text instances. By integrating these two branches into one trainable framework, they can promote each other and the computational cost is significantly reduced. Experiments on existing video text benchmarks including ICDAR2013 Video, Minetto and YVT demonstrate that the proposed method significantly outperforms state-of-the-art methods. Our method improves F-score by about 2 on all datasets and it can run in real time at 24.36 fps on a TITAN Xp.


Ground-truth dataset and baseline evaluations for image base-detail separation algorithms

Feb 18, 2016
Xuan Dong, Boyan Bonev, Weixin Li, Weichao Qiu, Xianjie Chen, Alan Yuille

Base-detail separation is a fundamental computer vision problem consisting of modeling a smooth base layer with the coarse structures, and a detail layer containing the texture-like structures. One of the challenges of estimating the base is to preserve sharp boundaries between objects or parts to avoid halo artifacts. Many methods have been proposed to address this problem, but there is no ground-truth dataset of real images for quantitative evaluation. We propose a procedure to construct such a dataset, and provide two datasets: Pascal Base-Detail and Fashionista Base-Detail, containing 1000 and 250 images, respectively. Our assumption is that the base is piecewise smooth and we label the appearance of each piece by a polynomial model. The pieces are objects and parts of objects, obtained from human annotations. Finally, we propose a way to evaluate methods with our base-detail ground-truth and compare the performance of seven state-of-the-art algorithms.

* This paper has been withdrawn by the author due to some un-proper examples 
