Models, code, and papers for "Hu Xu":

A Unified Framework for Clustering Constrained Data without Locality Property

Oct 02, 2018
Hu Ding, Jinhui Xu

In this paper, we consider a class of constrained clustering problems of points in $\mathbb{R}^{d}$, where $d$ could be rather high. A common feature of these problems is that their optimal clusterings no longer have the locality property (due to the additional constraints), which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present in this paper a unified framework, called {\em Peeling-and-Enclosing (PnE)}, to iteratively solve two variants of the constrained clustering problems, {\em constrained $k$-means clustering} ($k$-CMeans) and {\em constrained $k$-median clustering} ($k$-CMedian). Our framework is based on two standalone geometric techniques, called {\em Simplex Lemma} and {\em Weaker Simplex Lemma}, for $k$-CMeans and $k$-CMedian, respectively. The simplex lemma (or weaker simplex lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high dimensional data. If $k$ and $\frac{1}{\epsilon}$ are fixed numbers, our framework generates, in nearly linear time ({\em i.e.,} $O(n(\log n)^{k+1}d)$), $O((\log n)^{k})$ $k$-tuple candidates for the $k$ mean or median points, and one of them induces a $(1+\epsilon)$-approximation for $k$-CMeans or $k$-CMedian, where $n$ is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best $k$-tuple candidate), we obtain a $(1+\epsilon)$-approximation for each of the constrained clustering problems. We expect that our technique will be applicable to other constrained clustering problems without locality.

  Click for Model/Code and Paper
An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking

May 03, 2018
Puyang Xu, Qi Hu

We highlight a practical yet rarely discussed problem in dialogue state tracking (DST), namely handling unknown slot values. Previous approaches generally assume predefined candidate lists and thus are not designed to output unknown values, especially when the spoken language understanding (SLU) module is absent as in many end-to-end (E2E) systems. We describe in this paper an E2E architecture based on the pointer network (PtrNet) that can effectively extract unknown slot values while still obtains state-of-the-art accuracy on the standard DSTC2 benchmark. We also provide extensive empirical evidence to show that tracking unknown values can be challenging and our approach can bring significant improvement with the help of an effective feature dropout technique.

* Accepted by ACL 2018 

  Click for Model/Code and Paper
Efficient Reinforcement Learning Using Recursive Least-Squares Methods

Jun 03, 2011
H. He, D. Hu, X. Xu

The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension of RLS-TD(0) from lambda=0 to general lambda within interval [0,1], so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(lambda) are proved for ergodic Markov chains. Compared to the existing LS-TD(lambda) algorithm, RLS-TD(lambda) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(lambda) is analyzed and verified by learning prediction experiments of Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(lambda) algorithm in the critic network of the adaptive heuristic critic method. Unlike conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments of the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with conventional AHC. From the experimental results, it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(lambda). Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(lambda) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods.

* Journal Of Artificial Intelligence Research, Volume 16, pages 259-292, 2002 

  Click for Model/Code and Paper
Generalized tensor regression with covariates on multiple modes

Oct 21, 2019
Zhuoyan Xu, Jiaxin Hu, Miaoyan Wang

We consider the problem of tensor-response regression given covariates on multiple modes. Such data problems arise frequently in applications such as neuroimaging, network analysis, and spatial-temporal modeling. We propose a new family of tensor response regression models that incorporate covariates, and establish the theoretical accuracy guarantees. Unlike earlier methods, our estimation allows high-dimensionality in both the tensor response and the covariate matrices on multiple modes. An efficient alternating updating algorithm is further developed. Our proposal handles a broad range of data types, including continuous, count, and binary observations. Through simulation and applications to two real datasets, we demonstrate the outperformance of our approach over the state-of-art.

* 25 pages, 6 figures 

  Click for Model/Code and Paper
Controlled CNN-based Sequence Labeling for Aspect Extraction

May 29, 2019
Lei Shu, Hu Xu, Bing Liu

One key task of fine-grained sentiment analysis on reviews is to extract aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using a modified CNN called controlled CNN (Ctrl). The modified CNN has two types of control modules. Through asynchronous parameter updating, it prevents over-fitting and boosts CNN's performance significantly. This model achieves state-of-the-art results on standard aspect extraction datasets. To the best of our knowledge, this is the first paper to apply control modules to aspect extraction.

  Click for Model/Code and Paper
Unseen Class Discovery in Open-world Classification

Jan 17, 2018
Lei Shu, Hu Xu, Bing Liu

This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising.

  Click for Model/Code and Paper
DOC: Deep Open Classification of Text Documents

Sep 25, 2017
Lei Shu, Hu Xu, Bing Liu

Traditional supervised learning makes the closed-world assumption that the classes appeared in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in dynamic open environments where some new/test documents may not belong to any of the training classes, identifying these novel documents during classification presents an important problem. This problem is called open-world classification or open classification. This paper proposes a novel deep learning based approach. It outperforms existing state-of-the-art techniques dramatically.

* accepted at EMNLP 2017 

  Click for Model/Code and Paper
Lifelong Learning CRF for Supervised Aspect Extraction

Apr 29, 2017
Lei Shu, Hu Xu, Bing Liu

This paper makes a focused contribution to supervised aspect extraction. It shows that if the system has performed aspect extraction from many past domains and retained their results as knowledge, Conditional Random Fields (CRF) can leverage this knowledge in a lifelong learning manner to extract in a new domain markedly better than the traditional CRF without using this prior knowledge. The key innovation is that even after CRF training, the model can still improve its extraction with experiences in its applications.

* Accepted at ACL 2017. arXiv admin note: text overlap with arXiv:1612.07940 

  Click for Model/Code and Paper
A New Path to Construct Parametric Orientation Field: Sparse FOMFE Model and Compressed Sparse FOMFE Model

Jul 01, 2014
Jinwei Xu, Jiankun Hu, Xiuping Jia

Orientation field, representing the fingerprint ridge structure direction, plays a crucial role in fingerprint-related image processing tasks. Orientation field is able to be constructed by either non-parametric or parametric methods. In this paper, the advantages and disadvantages regarding to the existing non-parametric and parametric approaches are briefly summarized. With the further investigation for constructing the orientation field by parametric technique, two new models - sparse FOMFE model and compressed sparse FOMFE model are introduced, based on the rapidly developing signal sparse representation and compressed sensing theories. The experiments on high-quality fingerprint image dataset (plain and rolled print) and poor-quality fingerprint image dataset (latent print) demonstrate their feasibilities to construct the orientation field in a sparse or even compressed sparse mode. The comparisons among the state-of-art orientation field modeling approaches show that the proposed two models have the potential availability in big data-oriented fingerprint indexing tasks.

  Click for Model/Code and Paper
A Fully Automated Latent Fingerprint Matcher with Embedded Self-learning Segmentation Module

Jun 26, 2014
Jinwei Xu, Jiankun Hu, Xiuping Jia

Latent fingerprint has the practical value to identify the suspects who have unintentionally left a trace of fingerprint in the crime scenes. However, designing a fully automated latent fingerprint matcher is a very challenging task as it needs to address many challenging issues including the separation of overlapping structured patterns over the partial and poor quality latent fingerprint image, and finding a match against a large background database that would have different resolutions. Currently there is no fully automated latent fingerprint matcher available to the public and most literature reports have utilized a specialized latent fingerprint matcher COTS3 which is not accessible to the public. This will make it infeasible to assess and compare the relevant research work which is vital for this research community. In this study, we target to develop a fully automated latent matcher for adaptive detection of the region of interest and robust matching of latent prints. Unlike the manually conducted matching procedure, the proposed latent matcher can run like a sealed black box without any manual intervention. This matcher consists of the following two modules: (i) the dictionary learning-based region of interest (ROI) segmentation scheme; and (ii) the genetic algorithm-based minutiae set matching unit. Experimental results on NIST SD27 latent fingerprint database demonstrates that the proposed matcher outperforms the currently public state-of-art latent fingerprint matcher.

  Click for Model/Code and Paper
An Improved Historical Embedding without Alignment

Oct 19, 2019
Xiaofei Xu, Ke Deng, Fei Hu, Li Li

Many words have evolved in meaning as a result of cultural and social change. Understanding such changes is crucial for modelling language and cultural evolution. Low-dimensional embedding methods have shown promise in detecting words' meaning change by encoding them into dense vectors. However, when exploring semantic change of words over time, these methods require the alignment of word embeddings across different time periods. This process is computationally expensive, prohibitively time consuming and suffering from contextual variability. In this paper, we propose a new and scalable method for encoding words from different time periods into one dense vector space. This can greatly improve performance when it comes to identifying words that have changed in meaning over time. We evaluated our method on dataset from Google Books N-gram. Our method outperformed three other popular methods in terms of the number of words correctly identified to have changed in meaning. Additionally, we provide an intuitive visualization of the semantic evolution of some words extracted by our method

* 9 pages, 5 figures, conference 

  Click for Model/Code and Paper
Exploring Weight Symmetry in Deep Neural Networks

Jan 10, 2019
Xu Shell Hu, Sergey Zagoruyko, Nikos Komodakis

We propose to impose symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines. Due to significant reduction in the number of parameters as a result of the symmetry constraints, one would expect a dramatic drop in accuracy. Surprisingly, we show that this is not the case, and, depending on network size, symmetry can have little or no negative effect on network accuracy, especially in deep overparameterized networks. We propose several ways to impose local symmetry in recurrent and convolutional neural networks, and show that our symmetry parameterizations satisfy universal approximation property for single hidden layer networks. We extensively evaluate these parameterizations on CIFAR, ImageNet and language modeling datasets, showing significant benefits from the use of symmetry. For instance, our ResNet-101 with channel-wise symmetry has almost 25% less parameters and only 0.2% accuracy loss on ImageNet. Code for our experiments is available at

  Click for Model/Code and Paper
Supervised Complementary Entity Recognition with Augmented Key-value Pairs of Knowledge

May 29, 2017
Hu Xu, Lei Shu, Philip S. Yu

Extracting opinion targets is an important task in sentiment analysis on product reviews and complementary entities (products) are one important type of opinion targets that may work together with the reviewed product. In this paper, we address the problem of Complementary Entity Recognition (CER) as a supervised sequence labeling with the capability of expanding domain knowledge as key-value pairs from unlabeled reviews, by automatically learning and enhancing knowledge-based features. We use Conditional Random Field (CRF) as the base learner and augment CRF with knowledge-based features (called the Knowledge-based CRF or KCRF for short). We conduct experiments to show that KCRF effectively improves the performance of supervised CER task.

  Click for Model/Code and Paper
A Unified End-to-End Framework for Efficient Deep Image Compression

Feb 11, 2020
Jiaheng Liu, Guo Lu, Zhihao Hu, Dong Xu

Image compression is a widely used technique to reduce the spatial redundancy in images. Recently, learning based image compression has achieved significant progress by using the powerful representation ability from neural networks. However, the current state-of-the-art learning based image compression methods suffer from the huge computational complexity, which limits their capacity for practical applications. In this paper, we propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies, including a channel attention module, a Gaussian mixture model and a decoder-side enhancement module. Specifically, we design an auto-encoder style network for learning based image compression. To improve the coding efficiency, we exploit the channel relationship between latent representations by using the channel attention module. Besides, the Gaussian mixture model is introduced for the entropy model and improves the accuracy for bitrate estimation. Furthermore, we introduce the decoder-side enhancement module to further improve image compression performance. Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance. Simultaneously, our EDIC method boosts the coding performance significantly while bringing slightly increased computational complexity. More importantly, experimental results demonstrate that the proposed approach outperforms the current state-of-the-art image compression methods and is up to more than 150 times faster in terms of decoding speed when compared with Minnen's method. The proposed framework also successfully improves the performance of the recent deep video compression system DVC.

* have been submitted to journal 

  Click for Model/Code and Paper
Towards More Efficient and Effective Inference: The Joint Decision of Multi-Participants

Jan 19, 2020
Hui Zhu, Zhulin An, Kaiqiang Xu, Xiaolong Hu, Yongjun Xu

Existing approaches to improve the performances of convolutional neural networks by optimizing the local architectures or deepening the networks tend to increase the size of models significantly. In order to deploy and apply the neural networks to edge devices which are in great demand, reducing the scale of networks are quite crucial. However, It is easy to degrade the performance of image processing by compressing the networks. In this paper, we propose a method which is suitable for edge devices while improving the efficiency and effectiveness of inference. The joint decision of multi-participants, mainly contain multi-layers and multi-networks, can achieve higher classification accuracy (0.26% on CIFAR-10 and 4.49% on CIFAR-100 at most) with similar total number of parameters for classical convolutional neural networks.

  Click for Model/Code and Paper
Constructing the F-Graph with a Symmetric Constraint for Subspace Clustering

Dec 17, 2019
Kai Xu, Xiao-Jun Wu, Wen-Bo Hu

Based on further studying the low-rank subspace clustering (LRSC) and L2-graph subspace clustering algorithms, we propose a F-graph subspace clustering algorithm with a symmetric constraint (FSSC), which constructs a new objective function with a symmetric constraint basing on F-norm, whose the most significant advantage is to obtain a closed-form solution of the coefficient matrix. Then, take the absolute value of each element of the coefficient matrix, and retain the k largest coefficients per column, set the other elements to 0, to get a new coefficient matrix. Finally, FSSC performs spectral clustering over the new coefficient matrix. The experimental results on face clustering and motion segmentation show FSSC algorithm can not only obviously reduce the running time, but also achieve higher accuracy compared with the state-of-the-art representation-based subspace clustering algorithms, which verifies that the FSSC algorithm is efficacious and feasible.

* 9 pages, 1 figure 

  Click for Model/Code and Paper
Few-Features Attack to Fool Machine Learning Models through Mask-Based GAN

Nov 12, 2019
Feng Chen, Yunkai Shang, Bo Xu, Jincheng Hu

GAN is a deep-learning based generative approach to generate contents such as images, languages and speeches. Recently, studies have shown that GAN can also be applied to generative adversarial attack examples to fool the machine-learning models. In comparison with the previous non-learning adversarial example attack approaches, the GAN-based adversarial attack example approach can generate the adversarial samples quickly using the GAN architecture every time facing a new sample after training, but meanwhile needs to perturb the attack samples in great quantities, which results in the unpractical application in reality. To address this issue, we propose a new approach, named Few-Feature-Attack-GAN (FFA-GAN). FFA-GAN has a significant time-consuming advantage than the non-learning adversarial samples approaches and a better non-zero-features performance than the GANbased adversarial sample approaches. FFA-GAN can automatically generate the attack samples in the black-box attack through the GAN architecture instead of the evolutional algorithms or the other non-learning approaches. Besides, we introduce the mask mechanism into the generator network of the GAN architecture to optimize the constraint issue, which can also be regarded as the sparsity problem of the important features. During the training, the different weights of losses of the generator are set in the different training phases to ensure the divergence of the two above mentioned parallel networks of the generator. Experiments are made respectively on the structured data sets KDD-Cup 1999 and CIC-IDS 2017, in which the dimensions of the data are relatively low, and also on the unstructured data sets MNIST and CIFAR-10 with the data of the relatively high dimensions. The results of the experiments demonstrate the effectiveness and the robustness of our proposed approach.

  Click for Model/Code and Paper
Spoofing and Anti-Spoofing with Wax Figure Faces

Oct 12, 2019
Shan Jia, Xin Li, Chuanbo Hu, Zhengquan Xu

We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years. Compared to widely studied 2D face presentation attacks (e.g. printed photos and video replays), 3D face presentation attacks are more challenging because face recognition systems (FRS) is more easily confused by the 3D characteristics of materials similar to real faces. Existing 3D face spoofing databases, mostly based on 3D facial masks, are restricted to small data size and suffer from poor authenticity due to the difficulty and expense of mask production. In this work, we introduce a wax figure face database (WFFD) as a novel and super-realistic 3D face presentation attack. This database contains 2300 image pairs (totally 4600) and 745 subjects including both real and wax figure faces with high diversity from online collections. On one hand, our experiments have demonstrated the spoofing potential of WFFD on three popular FRSs. On the other hand, we have developed a multi-feature voting scheme for wax figure face detection (anti-spoofing), which combines three discriminative features at the decision level. The proposed detection method was compared against several face PAD approaches and found to outperform other competing methods. Surprisingly, our fusion-based detection method achieves an Average Classification Error Rate (ACER) of 11.73\% on the WFFD database, which is even better than human-based detection.

  Click for Model/Code and Paper
Modeling Multi-Action Policy for Task-Oriented Dialogues

Aug 30, 2019
Lei Shu, Hu Xu, Bing Liu, Piero Molino

Dialogue management (DM) plays a key role in the quality of the interaction with the user in a task-oriented dialogue system. In most existing approaches, the agent predicts only one DM policy action per turn. This significantly limits the expressive power of the conversational agent and introduces unwanted turns of interactions that may challenge users' patience. Longer conversations also lead to more errors and the system needs to be more robust to handle them. In this paper, we compare the performance of several models on the task of predicting multiple acts for each turn. A novel policy model is proposed based on a recurrent cell called gated Continue-Act-Slots (gCAS) that overcomes the limitations of the existing models. Experimental results show that gCAS outperforms other approaches. The code is available at

* 7 

  Click for Model/Code and Paper