Models, code, and papers for "Xu Sun":

Structure Regularization for Structured Prediction: Theories and Experiments

Jan 30, 2015
Xu Sun

While there are many studies on weight regularization, studies on structure regularization are rare. Many existing structured prediction systems focus on increasing the level of structural dependencies within the model. However, this trend may be misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method can also substantially accelerate training. The method and the theoretical results apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly competitive tasks, achieving state-of-the-art accuracies with substantially faster training.
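
The decomposition step described in this abstract can be illustrated with a minimal sketch. It assumes a sequence-labeling sample and fixed-length contiguous chunks; the function name `decompose`, the parameter `max_len`, and the toy sentence are hypothetical, and the paper's own decomposition scheme may differ.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[str], Sequence[str]]


def decompose(sample: Sample, max_len: int) -> List[Sample]:
    """Split one labelled sequence into contiguous mini-samples.

    Assumption: fixed-length contiguous chunks. The paper's own scheme
    may choose split points differently (for example, at random).
    """
    tokens, labels = sample
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    return [
        (list(tokens[i:i + max_len]), list(labels[i:i + max_len]))
        for i in range(0, len(tokens), max_len)
    ]


# Toy sample: five tokens with part-of-speech labels.
sentence = (
    ["the", "cat", "sat", "down", "quietly"],
    ["DT", "NN", "VBD", "RP", "RB"],
)
mini_samples = decompose(sentence, max_len=2)
for tokens, labels in mini_samples:
    print(tokens, labels)
```

Each mini-sample carries shorter-range dependencies than the original sequence, which is the sense in which the structure becomes simpler; training then runs over the mini-samples instead of the full samples.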

Exact Decoding on Latent Variable Conditional Models is NP-Hard

Jun 18, 2014
Xu Sun

Latent variable conditional models, including latent conditional random fields as a special case, are popular models for many natural language processing and vision processing tasks. The computational complexity of exact decoding/inference in latent conditional random fields is unclear. In this paper, we clarify the computational complexity of exact decoding. We analyze the complexity and demonstrate that it is an NP-hard problem even in a sequential labeling setting. Furthermore, we propose the latent-dynamic inference (LDI-Naive) method and its bounded version (LDI-Bounded), which are able to perform exact or almost-exact inference by using top-$n$ search and dynamic programming.

Transfer Deep Learning for Low-Resource Chinese Word Segmentation with a Novel Neural Network

Sep 14, 2017
Jingjing Xu, Xu Sun

Recent studies have shown the effectiveness of neural networks for Chinese word segmentation. However, these models rely on large-scale data and are less effective for low-resource datasets because of insufficient training data. We propose a transfer learning method to improve low-resource word segmentation by leveraging high-resource corpora. First, we train a teacher model on high-resource corpora and then use the learned knowledge to initialize a student model. Second, a weighted data similarity method is proposed to train the student model on low-resource data. Experimental results show that our work significantly improves the performance on low-resource datasets: by 2.3% and 1.5% F-score on the PKU and CTB datasets. Furthermore, this paper achieves state-of-the-art results: 96.1% and 96.2% F-score on the PKU and CTB datasets.

F-Score Driven Max Margin Neural Network for Named Entity Recognition in Chinese Social Media

Apr 11, 2017
Hangfeng He, Xu Sun

We focus on named entity recognition (NER) for Chinese social media. With massive unlabeled text and a quite limited labelled corpus, we propose a semi-supervised learning model based on a B-LSTM neural network. To take advantage of traditional methods in NER such as CRF, we combine transition probability with deep learning in our model. To bridge the gap between label accuracy and F-score of NER, we construct a model which can be directly trained on F-score. Considering the instability of the F-score driven method and the meaningful information provided by label accuracy, we propose an integrated method to train on both F-score and label accuracy. Our integrated model yields a 7.44% improvement over the previous state-of-the-art result.
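
The gap between label accuracy and F-score mentioned in this abstract can be seen on a small example. The sketch below only computes the two metrics; it does not implement the paper's F-score driven training. It assumes BIO tagging and exact-match entity-level F1, and the helper names and toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def label_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of positions whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a span counts only if boundaries and type match."""
    g, p = extract_entities(gold), extract_entities(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
pred = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]
print(label_accuracy(gold, pred), entity_f1(gold, pred))
```

A single wrong tag leaves label accuracy at 5/6 but breaks one of the two entities, so entity-level F1 drops to 0.5.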

Conditional Random Fields with Decode-based Learning: Simpler and Faster

Apr 18, 2018
Xu Sun, Shuming Ma

Conditional random fields (CRF) are one of the best-known approaches for structured classification. CRF is a structured gradient-based method, which has high accuracy but also drawbacks: very slow training, difficulty of implementation in tasks with complex structures, and no support for decode-based optimization (which is important in many cases). To address these issues, we propose a simple and fast solution, a decode-based probabilistic online learning method, called CRF with decode-based learning (DBL-CRF). The proposed DBL-CRF decodes the output candidates, derives probabilities, and conducts efficient online learning. The method has similar probabilistic information to CRF, but supports decode-based optimization and does not need gradient computation. We show that this method trains fast, is very simple to implement, achieves top accuracy, and has theoretical guarantees of convergence. Experiments on well-known tasks show that our method has better accuracy and much faster speed than our strong baseline CRF systems. The code is available at

Hybrid Oracle: Making Use of Ambiguity in Transition-based Chinese Dependency Parsing

Feb 06, 2018
Xuancheng Ren, Xu Sun

In the training of transition-based dependency parsers, an oracle is used to predict a transition sequence for a sentence and its gold tree. However, the transition system may exhibit ambiguity, that is, there can be multiple correct transition sequences that form the gold tree. We propose to make use of this property in the training of neural dependency parsers, and present the Hybrid Oracle. The new oracle gives all the correct transitions for a parsing state, which are used in the cross entropy loss function to provide a better supervisory signal. It is also used to generate different transition sequences for a sentence to better explore the training data and improve the generalization ability of the parser. Evaluations show that the parsers trained using the hybrid oracle outperform the parsers using the traditional oracle in Chinese dependency parsing. We provide analysis from a linguistic view. The code is available at .

A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction

Dec 18, 2017
Yi Zhang, Xu Sun

Abbreviation is a common phenomenon across languages, especially in Chinese. In most cases, if an expression can be abbreviated, its abbreviation is used more often than its fully expanded form, since people tend to convey information in the most concise way. For various language processing tasks, abbreviation is an obstacle to improving performance, as the textual form of an abbreviation does not express useful information unless it is expanded to the full form. Abbreviation prediction means associating the fully expanded forms with their abbreviations. However, due to the lack of abbreviation corpora, such a task is limited in current studies, especially considering that general abbreviation prediction should also include those full form expressions that do not have valid abbreviations, namely the negative full forms (NFFs). Corpora incorporating negative full forms for general abbreviation prediction are few in number. In order to promote research in this area, we build a dataset for general Chinese abbreviation prediction, which needs a few preprocessing steps, and evaluate several different models on the built dataset. The dataset is available at

A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification

Oct 06, 2017
Shuming Ma, Xu Sun

Text summarization and text simplification are two major ways to simplify text for poor readers, including children, non-native speakers, and the functionally illiterate. Text summarization aims to produce a brief summary of the main ideas of the text, while text simplification aims to reduce the linguistic complexity of the text and retain the original meaning. Recently, most approaches for text summarization and text simplification have been based on the sequence-to-sequence model, which has achieved much success in many text generation tasks. However, although the generated simplified texts are literally similar to the source texts, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and simplified texts for text summarization and text simplification. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. In addition, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms the state-of-the-art systems on two benchmark corpora.

A Generic Online Parallel Learning Framework for Large Margin Models

Mar 02, 2017
Shuming Ma, Xu Sun

To speed up the training process, many existing systems use parallel technology for online learning algorithms. However, most research focuses mainly on stochastic gradient descent (SGD) instead of other algorithms. We propose a generic online parallel learning framework for large margin models, and also analyze our framework on popular large margin algorithms, including MIRA and Structured Perceptron. Our framework is lock-free and easy to implement on existing systems. Experiments show that systems with our framework can gain near-linear speedup by increasing the number of running threads, with no loss in accuracy.

Lock-Free Parallel Perceptron for Graph-based Dependency Parsing

Mar 02, 2017
Xu Sun, Shuming Ma

Dependency parsing is an important NLP task. A popular approach for dependency parsing is the structured perceptron. Still, graph-based dependency parsing has a time complexity of $O(n^3)$, and it suffers from slow training. To deal with this problem, we propose a parallel algorithm called parallel perceptron. The parallel algorithm can make full use of a multi-core computer, which saves a lot of training time. Based on experiments, we observe that dependency parsing with parallel perceptron can achieve 8-fold faster training than traditional structured perceptron methods when using 10 threads, with no loss at all in accuracy.

A New Recurrent Neural CRF for Learning Non-linear Edge Features

Nov 14, 2016
Shuming Ma, Xu Sun

Conditional Random Field (CRF) and recurrent neural models have achieved success in structured prediction. More recently, there has been a marriage of CRF and recurrent neural models, so that we can gain from both non-linear dense features and the globally normalized CRF objective. These recurrent neural CRF models mainly focus on encoding node features in CRF undirected graphs. However, edge features prove important to CRF in structured prediction. In this work, we introduce a new recurrent neural CRF model, which learns non-linear edge features, and thus encodes non-linear features completely. We compare our model with different neural models in well-known structured prediction tasks. Experiments show that our model outperforms state-of-the-art methods in NP chunking, shallow parsing, Chinese word segmentation and POS tagging.

GPU Accelerated Cascade Hashing Image Matching for Large Scale 3D Reconstruction

May 23, 2018
Tao Xu, Kun Sun, Wenbing Tao

Image feature point matching is a key step in Structure from Motion (SFM). However, it is becoming more and more time consuming because the number of images is getting larger and larger. In this paper, we propose a GPU accelerated image matching method with improved Cascade Hashing. First, we propose a Disk-Memory-GPU data exchange strategy and optimize the load order of data, so that the proposed method can deal with big data. Next, we parallelize the Cascade Hashing method on GPU. An improved parallel reduction and an improved parallel hashing ranking are proposed to fulfill this task. Finally, extensive experiments show that our image matching is about 20 times faster than SiftGPU on the same graphics card, nearly 100 times faster than the CPU CasHash method and hundreds of times faster than the CPU Kd-Tree based matching method. Furthermore, we introduce the epipolar constraint to the proposed method, and use the epipolar geometry to guide the feature matching procedure, which further reduces the matching cost.

Mining Commonsense Facts from the Physical World

Feb 11, 2020
Yanyan Zou, Wei Lu, Xu Sun

Textual descriptions of the physical world implicitly mention commonsense facts, while commonsense knowledge bases explicitly represent such facts as triples. Compared to the dramatically increasing amount of text data, the coverage of existing knowledge bases is far from complete. Most prior studies on populating knowledge bases focus mainly on Freebase. Automatically completing commonsense knowledge bases to improve their coverage remains under-explored. In this paper, we propose a new task of mining commonsense facts from raw text that describes the physical world. We build an effective new model that fuses information from both sequence text and existing knowledge base resources. Then we create two large annotated datasets, each with approximately 200k instances, for commonsense knowledge base completion. Empirical results demonstrate that our model significantly outperforms baselines.

Towards Real Scene Super-Resolution with Raw Images

May 29, 2019
Xiangyu Xu, Yongrui Ma, Wenxiu Sun

Most existing super-resolution methods do not perform well in real scenarios due to the lack of realistic training data and the information loss of the model input. To solve the first problem, we propose a new pipeline to generate realistic training data by simulating the imaging process of digital cameras. To remedy the information loss of the input, we develop a dual convolutional neural network to exploit the originally captured radiance information in raw images. In addition, we propose to learn a spatially-variant color transformation, which enables more effective color correction. Extensive experiments demonstrate that super-resolution with raw data helps recover fine details and clear structures, and more importantly, the proposed network and data generation pipeline achieve superior results for single image super-resolution in real scenarios.

* Accepted in CVPR 2019, project page: 

A Real-Time Tiny Detection Model for Stem End and Blossom End of Navel Orange

May 24, 2019
Xiaoye Sun, Shaoyun Xu, Gongyan Li

To distinguish the stem end and blossom end of a navel orange from its black spot, we propose a real-time tiny detection model (RTTD) with low computational cost, a compact architecture and high detection accuracy. In particular, based on the characteristics of the data, we apply pure dense connectivity to limit and simplify the design of the model architecture and use k-means clustering to set the sizes and aspect ratios of the default boxes. The architecture of the model is based on deeply supervised object detectors (DSOD); it reduces some components, such as dense blocks and prediction layers, for efficiency, and adds some auxiliary structures, such as a Squeeze-and-Excitation layer and Swish, for accuracy. We also create a dataset in Pascal VOC format annotated with the three types of detection targets: stem end, blossom end and black spot. Experimental results on our orange dataset confirm that RTTD is competitive with state-of-the-art one-stage detectors such as SSD, DSOD, YOLOv2, YOLOv3, RFB and FSSD, and it achieves 87.479% mAP at 131 FPS with only 5.812M parameters.
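
The use of k-means to pick default box sizes and aspect ratios can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it clusters (width, height) pairs with plain Lloyd iterations and squared Euclidean distance, starts from hand-picked centers, and uses made-up box dimensions. The function name `kmeans_boxes` is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (width, height) of an annotated box


def kmeans_boxes(boxes: Sequence[Box], centers: Sequence[Box],
                 iters: int = 10) -> List[Box]:
    """Cluster box shapes; each final center suggests one default box."""
    centers = list(centers)
    for _ in range(iters):
        groups: List[List[Box]] = [[] for _ in centers]
        for w, h in boxes:
            # Assign the box to the nearest center (squared Euclidean).
            j = min(
                range(len(centers)),
                key=lambda k: (w - centers[k][0]) ** 2 + (h - centers[k][1]) ** 2,
            )
            groups[j].append((w, h))
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


# Made-up annotation sizes in pixels: three small square boxes, three wide ones.
boxes = [(10, 10), (12, 12), (11, 11), (40, 20), (42, 22), (44, 18)]
defaults = kmeans_boxes(boxes, centers=[(10, 10), (40, 20)])
for w, h in defaults:
    print(f"size=({w}, {h}) aspect_ratio={w / h}")
```

Detectors in this family sometimes cluster with an IoU-based distance instead of Euclidean distance; the abstract does not say which one RTTD uses.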

Learning Deformable Kernels for Image and Video Denoising

Apr 15, 2019
Xiangyu Xu, Muchen Li, Wenxiu Sun

Most of the classical denoising methods restore clear results by selecting and averaging pixels in the noisy input. Instead of relying on hand-crafted selecting and averaging strategies, we propose to explicitly learn this process with deep neural networks. Specifically, we propose deformable 2D kernels for image denoising where the sampling locations and kernel weights are both learned. The proposed kernel naturally adapts to image structures and can effectively reduce oversmoothing artifacts. Furthermore, we develop 3D deformable kernels for video denoising to more efficiently sample pixels across the spatial-temporal space. Our method is able to solve the misalignment issues caused by large motion in dynamic scenes. To better train our video denoising model, we introduce the trilinear sampler and a new regularization term. We demonstrate that the proposed method performs favorably against the state-of-the-art image and video denoising approaches on both synthetic and real-world data.

* 10 pages 

Learning with Batch-wise Optimal Transport Loss for 3D Shape Recognition

Mar 21, 2019
Lin Xu, Han Sun, Yuai Liu

Deep metric learning is essential for visual recognition. The widely used pair-wise (or triplet) based loss objectives cannot make full use of semantic information in training samples or give enough attention to hard samples during optimization. Thus, they often suffer from a slow convergence rate and inferior performance. In this paper, we show how to learn an importance-driven distance metric via optimal transport programming from batches of samples. It can automatically emphasize hard examples and lead to significant improvements in convergence. We propose a new batch-wise optimal transport loss and integrate it into an end-to-end deep metric learning framework. We use it to learn the distance metric and deep feature representation jointly for recognition. Empirical results on visual retrieval and classification tasks with six benchmark datasets, i.e., MNIST, CIFAR10, SHREC13, SHREC14, ModelNet10, and ModelNet40, demonstrate the superiority of the proposed method. It can accelerate convergence significantly while achieving state-of-the-art recognition performance. For example, in 3D shape recognition experiments, we show that our method can achieve better recognition performance within only 5 epochs than what can be obtained by mainstream 3D shape recognition approaches after 200 epochs.

* 10 pages, 4 figures. Accepted by CVPR 2019

Limited Gradient Descent: Learning With Noisy Labels

Dec 06, 2018
Yi Sun, Yan Tian, Yiping Xu

Label noise may handicap the generalization of classifiers, and the effective learning of the main pattern from samples with noisy labels is an important issue. Recent studies have shown that deep neural networks tend to prioritize the learning of simple patterns over the memorization of noise patterns. This suggests the need for a method to search for the best generalization that learns the main pattern until noise begins to be memorized. An intuitive idea is to use a supervised approach to find the stop timing of learning by, for example, employing a clean verification set. In practice, however, a clean verification set is sometimes difficult to obtain. To solve this problem, we propose an unsupervised method called limited gradient descent to estimate the best stop timing. We modify the labels of a few samples in a noisy dataset to be almost false labels, creating a reverse pattern. By monitoring the learning progress of the noisy samples and the reverse samples, we can determine the stop timing of learning. In this paper, we also provide some sufficient conditions on learning with noisy labels. Experimental results on CIFAR-10 demonstrate that our approach has a similar generalization performance to supervised methods. For uncomplicated datasets, such as MNIST, we add a relabeling strategy to further improve generalization and achieve state-of-the-art performance.

A Two-Stream Variational Adversarial Network for Video Generation

Dec 03, 2018
Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires the model to generate realistic content and motion simultaneously. Existing methods generate both motion and content together using a single generator network, but this approach may fail on complex videos. In this paper, we propose a two-stream video generation model that separates content and motion generation into two parallel generators, called Two-Stream Variational Adversarial Network (TwoStreamVAN). Our model outputs a realistic video given an input action label by progressively generating and fusing motion and content features at multiple scales using adaptive motion kernels. In addition, to better evaluate video generation models, we design a new synthetic human action dataset to bridge the difficulty gap between over-complicated human action datasets and simple toy datasets. Our model significantly outperforms existing methods on the standard Weizmann Human Action and MUG Facial Expression datasets, as well as our new dataset.

HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Nov 22, 2018
Shipeng Wang, Jian Sun, Zongben Xu

Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience embodied in human-designed optimizers, and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of "learning to optimize" with the traditional Adam optimizer. Given a network for training, the parameter update generated by HyperAdam in each iteration is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is shown to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs and LSTMs.
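
The idea of combining several Adam updates computed with different decay rates can be sketched in a few lines. This is only an illustration: in HyperAdam the combination weights and decay rates are produced by a learned recurrent network, whereas here they are fixed by hand. The class `AdamMoments`, the function `combined_step`, the decay rates, the weights, and the toy objective are all assumptions.

```python
import math
from typing import List, Sequence


class AdamMoments:
    """Tracks Adam moment estimates for one (beta1, beta2) pair."""

    def __init__(self, beta1: float, beta2: float, dim: int, eps: float = 1e-8):
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = [0.0] * dim
        self.v = [0.0] * dim
        self.t = 0

    def direction(self, grad: Sequence[float]) -> List[float]:
        """Return the bias-corrected Adam direction m_hat / (sqrt(v_hat) + eps)."""
        self.t += 1
        out = []
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            out.append(m_hat / (math.sqrt(v_hat) + self.eps))
        return out


def combined_step(params, grad, candidates, weights, lr=0.1):
    """Apply a weighted combination of the candidates' Adam directions."""
    directions = [c.direction(grad) for c in candidates]
    return [
        p - lr * sum(w * d[i] for w, d in zip(weights, directions))
        for i, p in enumerate(params)
    ]


# Toy objective f(x) = x^2 with gradient 2x, starting from x = 3.
x = [3.0]
candidates = [AdamMoments(0.9, 0.999, dim=1), AdamMoments(0.5, 0.9, dim=1)]
weights = [0.5, 0.5]  # fixed here; learned per task in HyperAdam
trajectory = []
for _ in range(100):
    grad = [2 * xi for xi in x]
    x = combined_step(x, grad, candidates, weights)
    trajectory.append(x[0])
print(trajectory[0], trajectory[-1])
```

With weights that sum to one, the first step equals a plain Adam step, because every bias-corrected candidate direction is approximately the sign of the gradient at t = 1; the candidates only start to disagree once their moment estimates diverge.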
