Models, code, and papers for "Xiao Shu":

Quality Adaptive Low-Rank Based JPEG Decoding with Applications

Jan 06, 2016
Xiao Shu, Xiaolin Wu

Small compression noise, despite being imperceptible to human eyes, can adversely affect the results of many image restoration processes if left unaccounted for. In particular, compression noise is highly detrimental to inverse operators of a high-boosting (sharpening) nature, such as deblurring and superresolution against a convolution kernel. By incorporating the non-linear DCT quantization mechanism into the formulation for image restoration, we propose a new sparsity-based convex programming approach for joint compression noise removal and image restoration. Experimental results demonstrate significant performance gains of the new approach over existing image restoration methods.
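
The abstract refers to the non-linear DCT quantization mechanism. As a minimal sketch (not the authors' formulation), the code below shows why JPEG-style quantization is a many-to-one map: a decoded coefficient only tells us that the original value lay in an interval one quantization step wide. Clamping a restored coefficient back into that interval is one common way to express this as a convex constraint; the function names and the clamping step are assumptions made for illustration.

```python
import math


def quantize(coeff: float, step: float) -> int:
    """Round-to-nearest quantization index of one DCT coefficient."""
    return math.floor(coeff / step + 0.5)


def dequantize(index: int, step: float) -> float:
    """Value a decoder reconstructs from a quantization index."""
    return index * step


def quantization_interval(index: int, step: float) -> tuple:
    """Range of original coefficients that all map to the same index."""
    return ((index - 0.5) * step, (index + 0.5) * step)


def project_onto_interval(estimate: float, index: int, step: float) -> float:
    """Clamp a restored coefficient so it stays consistent with the
    quantization index stored in the JPEG file (illustrative assumption)."""
    lo, hi = quantization_interval(index, step)
    return min(max(estimate, lo), hi)


if __name__ == "__main__":
    step = 16.0
    index = quantize(37.0, step)
    print(index, dequantize(index, step), quantization_interval(index, step))
    print(project_onto_interval(45.0, index, step))
```

A restoration step that sharpens the image may push a coefficient outside its interval; the projection pulls it back to the nearest value that the encoder could actually have seen.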

Deep Learning with Inaccurate Training Data for Image Restoration

Nov 18, 2018
Bolin Liu, Xiao Shu, Xiaolin Wu

In many applications of deep learning, particularly those in image restoration, it is either very difficult, prohibitively expensive, or outright impossible to obtain paired training data precisely as in the real world. In such cases, one is forced to use synthesized paired data to train the deep convolutional neural network (DCNN). However, due to the unavoidable generalization error in statistical learning, the synthetically trained DCNN often performs poorly on real world data. To overcome this problem, we propose a new general training method that can compensate for, to a large extent, the generalization errors of synthetically trained DCNNs.

On Numerosity of Deep Convolutional Neural Networks

Jul 11, 2018
Xiaolin Wu, Xi Zhang, Xiao Shu

Subitizing, or the sense of small natural numbers, is a cognitive construct so primary and critical to the survival and well-being of humans and primates that it is considered, and has been shown, to be innate; it responds to visual stimuli prior to the development of any symbolic skills, language or arithmetic. Given the highly acclaimed successes of deep convolutional neural networks (DCNNs) in tasks of visual intelligence, one would expect that DCNNs can learn subitizing. But somewhat surprisingly, our carefully crafted extensive experiments, which are similar to those of cognitive psychology, demonstrate that DCNNs cannot, even with strong supervision, see through superficial variations in visual representations and distill the abstract notion of natural number, a task that children perform with high accuracy and confidence. The DCNN black box learners driven by very large training sets are apparently still confused by geometric variations and fail to grasp the topological essence of subitizing. In sharp contrast to the failures of black box learning, by incorporating a mechanism of mathematical morphology into convolutional kernels, we are able to construct a recurrent convolutional neural network that can perform subitizing deterministically. Our findings in this study of cognitive computing, without and with priors from human knowledge, are discussed; they are, we believe, significant and thought-provoking for AI research, because visual-based numerosity is a minimal benchmark for human cognition.

Demoiréing of Camera-Captured Screen Images Using Deep Convolutional Neural Network

Apr 11, 2018
Bolin Liu, Xiao Shu, Xiaolin Wu

Taking photos of optoelectronic displays is a direct and spontaneous way of transferring data and keeping records, and it is widely practiced. However, due to the analog signal interference between the pixel grids of the display screen and the camera sensor array, objectionable moiré (alias) patterns appear in captured screen images. As the moiré patterns are structured and highly variant, they are difficult to remove completely without affecting the underlying latent image. In this paper, we propose a deep convolutional neural network (DCNN) approach for demoiréing screen photos. The proposed DCNN consists of a coarse-scale network and a fine-scale network. In the coarse-scale network, the input image is first downsampled and then processed by stacked residual blocks to remove the moiré artifacts. After that, the fine-scale network upsamples the demoiréd low-resolution image back to the original resolution. Extensive experimental results demonstrate that the proposed technique can efficiently remove the moiré patterns from camera-acquired screen images and that it outperforms existing techniques.

Learning-Based Dequantization For Image Restoration Against Extremely Poor Illumination

Mar 20, 2018
Chang Liu, Xiaolin Wu, Xiao Shu

All existing image enhancement methods, such as HDR tone mapping, cannot recover A/D quantization losses due to insufficient or excessive lighting (the underflow and overflow problems). The loss of image details due to A/D quantization is complete and cannot be recovered by traditional image processing methods, but the modern data-driven machine learning approach offers a much-needed cure to the problem. In this work we propose a novel approach to restore and enhance images acquired in low and uneven lighting. First, the poor illumination is algorithmically compensated by emulating the effects of artificial supplementary lighting. Then a DCNN trained using only synthetic data recovers the detail lost to quantization.

Fast Screening Algorithm for Rotation and Scale Invariant Template Matching

Jul 19, 2017
Bolin Liu, Xiao Shu, Xiaolin Wu

This paper presents a generic pre-processor for expediting conventional template matching techniques. Instead of locating the patch in the reference image that best matches a query template via exhaustive search, the proposed algorithm rules out regions with no possible matches with minimal computational effort. Although it works on simple patch features, such as mean, variance and gradient, the fast pre-screening is highly discriminative. Its computational efficiency is gained by using a novel octagonal-star-shaped template and the inclusion-exclusion principle to extract and compare patch features. Moreover, it can handle arbitrary rotation and scaling of reference images effectively. Extensive experiments demonstrate that the proposed algorithm greatly reduces the search space while never missing the best match.
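
To make the inclusion-exclusion idea concrete, here is a minimal sketch of pre-screening by patch mean using a summed-area table. It is a simplification under stated assumptions: it uses square patches rather than the paper's octagonal-star-shaped template, compares only the mean, and does not handle rotation or scaling. The function names and the tolerance parameter are hypothetical.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, height, width):
    """Sum of a rectangle from four table lookups (inclusion-exclusion)."""
    bottom, right = top + height, left + width
    return (sat[bottom][right] - sat[top][right]
            - sat[bottom][left] + sat[top][left])


def screen_by_mean(img, template_mean, size, tol):
    """Keep only patch positions whose mean is within tol of the template's."""
    sat = integral_image(img)
    area = size * size
    candidates = []
    for top in range(len(img) - size + 1):
        for left in range(len(img[0]) - size + 1):
            mean = box_sum(sat, top, left, size, size) / area
            if abs(mean - template_mean) <= tol:
                candidates.append((top, left))
    return candidates


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(screen_by_mean(image, template_mean=7.0, size=2, tol=0.5))
```

Each patch mean costs four lookups regardless of patch size, so a slower exact matcher only needs to run on the positions that survive the screen.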

Independence Promoted Graph Disentangled Networks

Nov 26, 2019
Yanbei Liu, Xiao Wang, Shu Wu, Zhitao Xiao

We address the problem of disentangled representation learning with independent latent factors in graph convolutional networks (GCNs). Current methods usually learn a node representation by describing its neighborhood as a perceptual whole in a holistic manner while ignoring the entanglement of the latent factors. However, a real-world graph is formed by the complex interaction of many latent factors (e.g., the same hobby, education or work in a social network), and little effort has been made toward exploring disentangled representations in GCNs. In this paper, we propose novel Independence Promoted Graph Disentangled Networks (IPGDN) to learn disentangled node representations while enhancing the independence among node representations. In particular, we first present disentangled representation learning by a neighborhood routing mechanism, and then employ the Hilbert-Schmidt Independence Criterion (HSIC) to enforce independence between the latent representations, which is effectively integrated into a graph convolutional framework as a regularizer at the output layer. Experimental studies on real-world graphs validate our model and demonstrate that our algorithms outperform the state of the art by a wide margin in different network applications, including semi-supervised graph classification, graph clustering and graph visualization.
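
For readers unfamiliar with HSIC, the following is a minimal sketch of the standard biased empirical estimator, tr(KHLH) / (n - 1)^2, with Gaussian kernels. The abstract does not state which estimator, kernel or bandwidth IPGDN uses, so those choices (and all function names) are assumptions here; the sketch only illustrates the quantity that such a regularizer would drive toward zero.

```python
import math


def gaussian_gram(samples, sigma):
    """Gram matrix of a Gaussian kernel over a list of vectors."""
    n = len(samples)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            gram[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return gram


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H K H."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between paired samples xs and ys."""
    n = len(xs)
    kc = center(gaussian_gram(xs, sigma))
    lc = center(gaussian_gram(ys, sigma))
    # tr(K H L H) equals the sum of elementwise products of the
    # centered matrices, because both are symmetric and H is idempotent.
    trace = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


if __name__ == "__main__":
    factor_a = [[0.0], [1.0], [2.0], [3.0]]
    constant = [[5.0]] * 4
    print(hsic(factor_a, factor_a), hsic(factor_a, constant))
```

A representation that carries no information about another (here, a constant) gives a value of zero, while strongly dependent representations give a clearly positive value.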

Single Image Reflection Removal Using Deep Encoder-Decoder Network

Jan 31, 2018
Zhixiang Chi, Xiaolin Wu, Xiao Shu, Jinjin Gu

An image of a scene captured through a piece of transparent and reflective material, such as glass, is often spoiled by a superimposed reflection layer. While separating the reflection from a familiar object in an image is mentally not difficult for humans, it is a challenging, ill-posed problem in computer vision. In this paper, we propose a novel deep convolutional encoder-decoder method to remove the objectionable reflection by learning a map between image pairs with and without reflection. For training the neural network, we model the physical formation of reflections in images and synthesize a large number of photo-realistic reflection-tainted images from reflection-free images collected online. Extensive experimental results show that, although the neural network learns only from synthetic data, the proposed method is effective on real-world images, and it significantly outperforms the other tested state-of-the-art techniques.

Variational Regularized Transmission Refinement for Image Dehazing

Feb 19, 2019
Qiaoling Shu, Chuansheng Wu, Zhe Xiao, Ryan Wen Liu

High-quality dehazing performance is highly dependent upon accurate estimation of the transmission map. In this work, a coarse estimate is first obtained by a weighted fusion of two different transmission maps, which are generated from foreground and sky regions, respectively. A hybrid variational model with promoted regularization terms is then proposed to help refine the transmission map. The resulting complicated optimization problem is effectively solved via an alternating direction algorithm. The final haze-free image is then obtained from the refined transmission map and the atmospheric scattering model. Our dehazing framework is capable of preserving important image details while suppressing undesirable artifacts, even for hazy images with large sky regions. Experiments on both synthetic and realistic images illustrate that the proposed method is competitive with, or even outperforms, state-of-the-art dehazing techniques under different imaging conditions.
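
The last step mentioned above, recovering the haze-free image from a transmission map and the atmospheric scattering model, can be sketched as follows. The model is the widely used I = J * t + A * (1 - t), inverted as J = (I - A) / max(t, t_min) + A. This sketch assumes a single-channel image stored as nested lists and a scalar airlight A; the lower bound t_min is a common safeguard and its default value is an assumption, not a value from the paper.

```python
def add_haze(clear, transmission, airlight):
    """Forward scattering model: I = J * t + A * (1 - t)."""
    return [[j * t + airlight * (1.0 - t) for j, t in zip(row_j, row_t)]
            for row_j, row_t in zip(clear, transmission)]


def recover_radiance(hazy, transmission, airlight, t_min=0.1):
    """Invert the model per pixel, clamping t from below so that
    near-zero transmission does not amplify noise without bound."""
    return [[(i - airlight) / max(t, t_min) + airlight
             for i, t in zip(row_i, row_t)]
            for row_i, row_t in zip(hazy, transmission)]


if __name__ == "__main__":
    scene = [[0.2, 0.5], [0.8, 0.1]]
    t_map = [[0.9, 0.5], [0.3, 0.7]]
    hazy = add_haze(scene, t_map, airlight=0.95)
    print(recover_radiance(hazy, t_map, airlight=0.95))
```

Because the inversion divides by the transmission, errors in the transmission map are magnified where t is small, which is why the accuracy of the refined map matters so much for the final image.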

* 5 pages, 5 figures 

Dual Skew Divergence Loss for Neural Machine Translation

Aug 22, 2019
Fengshun Xiao, Yingting Wu, Hai Zhao, Rui Wang, Shu Jiang

For neural sequence model training, maximum likelihood (ML) has been commonly adopted to optimize model parameters with respect to the corresponding objective. However, in the case of sequence prediction tasks like neural machine translation (NMT), training with the ML-based cross entropy loss would often lead to models that overgeneralize and plunge into local optima. In this paper, we propose an extended loss function called dual skew divergence (DSD), which aims to give a better tradeoff between generalization ability and error avoidance during NMT training. Our empirical study indicates that switching to DSD loss after the convergence of ML training helps the model skip the local optimum and stimulates a stable performance improvement. The evaluations on WMT 2014 English-German and English-French translation tasks demonstrate that the proposed loss indeed helps bring about better translation performance than several baselines.

* 9 pages

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

Mar 09, 2016
Shuran Song, Jianxiong Xiao

We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent. We introduce Deep Sliding Shapes, a 3D ConvNet formulation that takes a 3D volumetric scene from an RGB-D image as input and outputs 3D object bounding boxes. In our approach, we propose the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D. In particular, we handle objects of various sizes by training an amodal RPN at two different scales and an ORN to regress 3D bounding boxes. Experiments show that our algorithm outperforms the state of the art by 13.8 in mAP and is 200x faster than the original Sliding Shapes. All source code and pre-trained models will be available at GitHub.

Tracking Revisited using RGBD Camera: Baseline and Benchmark

Dec 12, 2012
Shuran Song, Jianxiong Xiao

Although there has been significant progress in the past decade, tracking is still a very challenging computer vision task, due to problems such as occlusion and model drift. Recently, the increased popularity of depth sensors (e.g., Microsoft Kinect) has made it easy to obtain depth data at low cost. This may be a game changer for tracking, since depth information can be used to prevent model drift and handle occlusion. In this paper, we construct a benchmark dataset of 100 RGBD videos with high diversity, including deformable objects, various occlusion conditions and moving cameras. We propose a very simple but strong baseline model for RGBD tracking, and present a quantitative comparison of several state-of-the-art tracking algorithms. Experimental results show that including depth information and reasoning about occlusion significantly improves tracking performance. The datasets, evaluation details, source code for the baseline algorithm, and instructions for submitting new models will be made available online after acceptance.

IM2HEIGHT: Height Estimation from Single Monocular Imagery via Fully Residual Convolutional-Deconvolutional Network

Feb 28, 2018
Lichao Mou, Xiao Xiang Zhu

In this paper we tackle a novel problem, namely height estimation from a single monocular remote sensing image, which is inherently ambiguous and technically ill-posed, with a large source of uncertainty coming from the overall scale. We propose a fully convolutional-deconvolutional network architecture, trained end-to-end and encompassing residual learning, to model the ambiguous mapping between monocular remote sensing images and height maps. Specifically, it is composed of two parts, i.e., a convolutional sub-network and a deconvolutional sub-network. The former corresponds to a feature extractor that transforms the input remote sensing image into a high-level multidimensional feature representation, whereas the latter plays the role of a height generator that produces a height map from the features extracted by the convolutional sub-network. Moreover, to preserve fine edge details of estimated height maps, we introduce a skip connection to the network, which is able to shuttle low-level visual information, e.g., object boundaries and edges, directly across the network. To demonstrate the usefulness of single-view height prediction, we show a practical example of instance segmentation of buildings using an estimated height map. This paper, for the first time in the remote sensing community, attempts to estimate height from monocular vision. The proposed network is validated using a large-scale high-resolution aerial image data set covering an area of Berlin. Both visual and quantitative analysis of the experimental results demonstrate the effectiveness of our approach.

Hierarchical Contextualized Representation for Named Entity Recognition

Nov 19, 2019
Ying Luo, Fengshun Xiao, Hai Zhao

Named entity recognition (NER) models are typically based on the architecture of Bi-directional LSTM (BiLSTM). The constraints of its sequential nature and the modeling of a single input prevent the full utilization of global information from a larger scope, not only in the entire sentence, but also in the entire document (dataset). In this paper, we address these two deficiencies and propose a model augmented with hierarchical contextualized representation: sentence-level representation and document-level representation. At the sentence level, we take the different contributions of words in a single sentence into consideration to enhance the sentence representation learned from an independent BiLSTM via a label embedding attention mechanism. At the document level, a key-value memory network is adopted to record the document-aware information for each unique word, which is sensitive to the similarity of context information. Our two-level hierarchical contextualized representations are fused with each input token embedding and the corresponding hidden state of the BiLSTM, respectively. The experimental results on three benchmark NER datasets (CoNLL-2003 and Ontonotes 5.0 English datasets, CoNLL-2002 Spanish dataset) show that we establish new state-of-the-art results.

* Accepted by AAAI 2020 

Robot In a Room: Toward Perfect Object Recognition in Closed Environments

Jul 09, 2015
Shuran Song, Linguang Zhang, Jianxiong Xiao

While general object recognition is still far from being solved, this paper proposes a way for a robot to recognize every object at almost human-level accuracy. Our key observation is that many robots will stay in a relatively closed environment (e.g., a house or an office). By constraining a robot to stay in a limited territory, we can ensure that the robot has seen most objects before and that new objects are introduced slowly. Furthermore, we can build a 3D map of the environment to reliably subtract the background and make recognition easier. We propose extremely robust algorithms to obtain a 3D map and enable humans to collectively annotate objects. At test time, our algorithm can recognize all objects very reliably, and query humans from a crowdsourcing platform if confidence is low or new objects are identified. This paper explains design decisions in building such a system, and constructs a benchmark for extensive evaluation. Experiments suggest that making robot vision appear to be working from an end user's perspective is a reachable goal today, as long as the robot stays in a closed environment. By formulating this task, we hope to lay the foundation of a new direction in vision for robotics. Code and data will be available upon acceptance.

Region-Manipulated Fusion Networks for Pancreatitis Recognition

Jul 03, 2019
Jian Wang, Xiaoyao Li, Xiangbo Shu, Weiqin Li

This work is a first attempt to automatically recognize pancreatitis in CT scan images. Unlike traditional object recognition, pancreatitis recognition is challenging due to the fine-grained and non-rigid appearance variability of the local diseased regions. To this end, we propose customized Region-Manipulated Fusion Networks (RMFN) to capture the key characteristics of local lesions for pancreatitis recognition. Specifically, to effectively highlight the imperceptible lesion regions, a novel region-manipulated scheme in RMFN is proposed to reinforce the lesion regions while weakening the non-lesion regions by continually aggregating multi-scale local information onto feature maps. The proposed scheme can be flexibly integrated into existing neural networks, such as AlexNet and VGG. To evaluate the performance of the proposed method, a real CT image database of pancreatitis is collected from hospitals (the database will be made available later). Experimental results on this database demonstrate the effectiveness of the proposed method for pancreatitis recognition.

DSGN: Deep Stereo Geometry Network for 3D Object Detection

Jan 10, 2020
Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

Most state-of-the-art 3D object detectors heavily rely on LiDAR sensors and there remains a large gap in terms of performance between image-based and LiDAR-based methods, caused by inappropriate representation for the prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), reduces this gap significantly by detecting 3D objects on a differentiable volumetric representation -- 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates the depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (about 10 higher in terms of AP) and even achieves comparable performance with a few LiDAR-based methods on the KITTI 3D object detection leaderboard. Code will be made publicly available.

Multi-Robot Deep Reinforcement Learning with Macro-Actions

Sep 19, 2019
Yuchen Xiao, Joshua Hoffman, Tian Xia, Christopher Amato

In many real-world multi-robot tasks, high-quality solutions often require a team of robots to perform asynchronous actions under decentralized control. Multi-agent reinforcement learning methods have difficulty learning decentralized policies because the environment appears to be non-stationary, as other agents are also learning at the same time. In this paper, we address this challenge by proposing a macro-action-based decentralized multi-agent double deep recurrent Q-net (MacDec-MADDRQN), which creates a new double Q-updating rule to train each decentralized Q-net using a centralized Q-net for action selection. A generalized version of MacDec-MADDRQN with two separate training environments, called Parallel-MacDec-MADDRQN, is also presented to cope with the uncertainty in adopting either centralized or decentralized exploration. The advantages and the practical nature of our methods are demonstrated by achieving near-centralized results in simulation experiments and permitting real robots to accomplish a warehouse tool delivery task in an efficient way.

Fast Point R-CNN

Aug 16, 2019
Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

We present a unified, efficient and effective framework for point-cloud based 3D object detection. Our two-stage approach utilizes both voxel representation and raw point cloud data to exploit their respective advantages. The first-stage network, with voxel representation as input, consists only of light convolutional operations, producing a small number of high-quality initial predictions. The coordinates and indexed convolutional features of each point in the initial predictions are effectively fused with an attention mechanism, preserving both accurate localization and context information. The second stage works on interior points with their fused features to further refine the prediction. Our method is evaluated on the KITTI dataset, in terms of both 3D and Bird's Eye View (BEV) detection, and achieves state-of-the-art results with a 15 FPS detection rate.
