Research papers and code for "Jun Wang":
Recent years have witnessed the wide application of hashing to large-scale image retrieval. However, most existing hashing methods are based on hand-crafted features, which might not be optimally compatible with the hashing procedure. Recently, deep hashing methods have been proposed to perform simultaneous feature learning and hash-code learning with deep neural networks, and they have shown better performance than traditional hashing methods with hand-crafted features. Most of these deep hashing methods are supervised, with the supervision given in the form of triplet labels. For another common application scenario, in which supervision is given as pairwise labels, no method has existed for simultaneous feature learning and hash-code learning. In this paper, we propose a novel deep hashing method, called deep pairwise-supervised hashing (DPSH), to perform simultaneous feature learning and hash-code learning for applications with pairwise labels. Experiments on real datasets show that our DPSH method outperforms other methods and achieves state-of-the-art performance in image retrieval applications.

* IJCAI 2016
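
Below is a minimal PyTorch sketch of a pairwise-likelihood deep hashing objective of the kind DPSH optimizes: real-valued codes from a network, a logistic likelihood over pairwise labels, and a quantization penalty. The toy `HashNet`, the code length, and the weight `eta` are illustrative assumptions, not the architecture or hyper-parameters used in the paper.

```python
import torch
import torch.nn as nn

class HashNet(nn.Module):
    """Toy feature-learning network that outputs real-valued codes u_i
    (a stand-in for the CNN used for feature learning)."""
    def __init__(self, in_dim=512, code_len=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, code_len))

    def forward(self, x):
        return self.net(x)

def pairwise_hashing_loss(u, s, eta=10.0):
    """Negative log-likelihood of pairwise labels plus a quantization penalty.
    u: (n, L) real-valued network outputs; s: (n, n) similarity labels in {0, 1}."""
    theta = 0.5 * u @ u.t()                            # theta_ij = 0.5 * u_i^T u_j
    # log(1 + exp(theta)) - s * theta, computed stably with softplus
    likelihood = (torch.nn.functional.softplus(theta) - s * theta).mean()
    b = torch.sign(u)                                  # target binary codes
    quantization = ((u - b) ** 2).mean()               # keep outputs near +-1
    return likelihood + eta * quantization

if __name__ == "__main__":
    net = HashNet()
    x = torch.randn(8, 512)                            # dummy features
    s = (torch.rand(8, 8) > 0.5).float()               # dummy pairwise labels
    loss = pairwise_hashing_loss(net(x), s)
    loss.backward()
    print(float(loss))
```
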
Recent progress in Generative Adversarial Networks (GANs) has shown promising signs that GAN training can be improved via architectural changes. Despite some early success, the design of GAN architectures currently requires human expertise and laborious trial-and-error testing, and often draws inspiration from its image classification counterpart. In this paper, we present the first neural architecture search algorithm specifically suited to GAN training: automated neural architecture search for deep generative models, or AGAN for short. For unsupervised image generation tasks on CIFAR-10, our algorithm finds an architecture that outperforms state-of-the-art models under the same regularization techniques. For supervised tasks, the automatically searched architectures also achieve highly competitive performance, outperforming the best human-invented architectures at resolution $32\times32$. Moreover, we empirically demonstrate that the modules learned by AGAN are transferable to other image generation tasks such as STL-10.

Modeling personality is a challenging problem with applications spanning computer games, virtual assistants, online shopping, and education. Many techniques have been tried, ranging from neural networks to computational cognitive architectures. However, most approaches rely on examples with hand-crafted features and scenarios. Here, we approach learning a personality by training agents with a Deep Q-Network (DQN) model on rewards based on psychoanalysis, against a hand-coded AI in the game of Pong. As a result, we obtain four agents, each with its own personality. We then define the happiness of an agent, which can be seen as a measure of alignment with the agent's objective function, and study it when agents play both against the hand-coded AI and against each other. We find that agents that achieve higher happiness during testing against the hand-coded AI have lower happiness when competing against each other. This suggests that higher happiness in testing is a sign of overfitting to interactions with the hand-coded AI, and leads to worse performance against agents with different personalities.

* 5 pages, 3 figures, Presented at the Aligned Artificial Intelligence Workshop at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA; changed the footnote to workshop
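
As a rough illustration of how a personality can be encoded through reward shaping and how "happiness" can be read off the shaped reward, consider the toy sketch below. The event names and per-personality weights are hypothetical stand-ins, not the psychoanalysis-based rewards used in the paper.

```python
import random

# Hypothetical Pong events and per-personality reward weights.
PERSONALITIES = {
    "aggressive": {"hit_ball": 0.2, "win_point": 1.0, "lose_point": -0.5, "long_rally": 0.0},
    "cautious":   {"hit_ball": 0.5, "win_point": 0.5, "lose_point": -1.0, "long_rally": 0.1},
}

def shaped_reward(events, personality):
    """Personality-specific reward: a weighted sum of the game events at one step."""
    weights = PERSONALITIES[personality]
    return sum(weights[e] for e in events)

def happiness(episode_events, personality):
    """Happiness as alignment with the agent's own objective:
    the average shaped reward collected per step of an episode."""
    rewards = [shaped_reward(ev, personality) for ev in episode_events]
    return sum(rewards) / max(len(rewards), 1)

if __name__ == "__main__":
    random.seed(0)
    # One fake episode: each step emits a random subset of events.
    episode = [random.sample(list(PERSONALITIES["aggressive"]), k=random.randint(0, 2))
               for _ in range(100)]
    for p in PERSONALITIES:
        print(p, round(happiness(episode, p), 3))
```
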
Visual data, such as an image or a sequence of video frames, is often naturally represented as a point set. In this paper, we consider the fundamental problem of finding, within a collection of sets, the set nearest to a query set. This problem has obvious applications in large-scale visual retrieval and recognition, as well as in applied fields beyond computer vision. One challenge stands out in solving the problem: set representation and the measure of similarity. In particular, the query set and the sets in the collection can have varying cardinalities, and the collection is large enough that a linear scan is impractical. We propose a simple representation scheme that encodes both statistical and structural information of the sets. The derived representations are integrated into a kernel framework for flexible similarity measurement. To process query sets, we adopt a learning-to-hash pipeline that turns the kernel representations into hash bits based on simple learners, using multiple kernel learning. Experiments on two visual retrieval datasets show unambiguously that our set-to-set hashing framework outperforms prior methods that do not account for the set-to-set search setting.

* 9 pages
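
A tiny NumPy sketch of one possible set representation and kernel in this spirit: each variable-cardinality point set is summarized by first- and second-order statistics (mean and covariance), and an RBF kernel on the flattened summaries measures set-to-set similarity. The exact encoding, kernels, and hashing learners in the paper differ; this only illustrates the idea.

```python
import numpy as np

def set_signature(points):
    """Fixed-length summary of a variable-cardinality point set:
    per-dimension mean plus the upper triangle of the covariance."""
    points = np.asarray(points, dtype=float)
    mu = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    iu = np.triu_indices(points.shape[1])
    return np.concatenate([mu, cov[iu]])

def rbf_set_kernel(set_a, set_b, gamma=0.1):
    """RBF kernel between two sets, computed on their signatures."""
    d = set_signature(set_a) - set_signature(set_b)
    return np.exp(-gamma * np.dot(d, d))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(37, 4))                      # 37 points in R^4
    collection = [rng.normal(loc=m, size=(rng.integers(10, 60), 4))
                  for m in (0.0, 0.1, 2.0)]               # sets of varying cardinality
    sims = [rbf_set_kernel(query, s) for s in collection]
    print("nearest set index:", int(np.argmax(sims)))
```
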
This article explores character-level convolutional neural networks for Chinese text classification. We constructed a large-scale Chinese-language dataset, and the results show that a character-level convolutional neural network works better on the Chinese-character corpus than on its corresponding pinyin-format dataset. This is the first time a character-level convolutional neural network has been applied to Chinese text classification.

* MSc Thesis, 44 pages
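
A minimal PyTorch sketch of a character-level CNN classifier of the kind described: characters are embedded, 1-D convolutions of several widths run over the character sequence, and max-pooled features feed a classifier. The vocabulary size, depths, and kernel widths are illustrative, not the thesis' exact configuration.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level CNN: embed characters, apply 1-D convolutions over
    the character sequence, max-pool over time, then classify."""
    def __init__(self, vocab_size=6000, emb_dim=64, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, 128, kernel_size=k, padding=k // 2) for k in (3, 5, 7))
        self.fc = nn.Linear(128 * 3, num_classes)

    def forward(self, char_ids):                   # char_ids: (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)   # (batch, emb_dim, seq_len)
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))

if __name__ == "__main__":
    model = CharCNN()
    batch = torch.randint(1, 6000, (4, 512))       # 4 dummy character-id sequences
    print(model(batch).shape)                      # torch.Size([4, 10])
```
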
In practical machine learning systems, graph-based data representations have been widely used in various learning paradigms, ranging from unsupervised clustering to supervised classification. Besides applications with natural graph or network-structured data, such as social network analysis and relational learning, many other applications involve a critical step of converting data vectors into an adjacency graph. In particular, a sparse subgraph extracted from the original graph is often required for both theoretical and practical reasons. Previous studies clearly show that the performance of different learning algorithms, e.g., clustering and classification, benefits from such sparse subgraphs with balanced node connectivity. However, existing graph construction methods are either computationally expensive or deliver unsatisfactory performance. In this paper, we utilize a scalable method called the auction algorithm, together with its parallel extension, to recover a sparse yet nearly balanced subgraph at significantly reduced computational cost. An empirical study and comparison with state-of-the-art approaches clearly demonstrate the superiority of the proposed method in both efficiency and accuracy.

* Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)
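
For reference, below is a compact NumPy implementation of the classical serial auction algorithm for the assignment problem, the basic mechanism that the graph-construction method above builds on; the b-matching formulation and the parallel extension used in the paper are not shown.

```python
import numpy as np

def auction_assignment(value, eps=1e-2):
    """Bertsekas-style auction for a max-value assignment.
    value: (n, n) matrix, value[i, j] = benefit of assigning person i to object j.
    Returns assigned[i] = object given to person i (eps-optimal)."""
    n = value.shape[0]
    prices = np.zeros(n)
    owner = -np.ones(n, dtype=int)        # owner[j]   = person currently holding object j
    assigned = -np.ones(n, dtype=int)     # assigned[i] = object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        net = value[i] - prices           # net benefit of each object for person i
        j = int(np.argmax(net))
        best = net[j]
        net[j] = -np.inf
        second = net.max()
        prices[j] += best - second + eps  # the bid raises the price of the best object
        if owner[j] >= 0:                 # evict the previous owner, who must re-bid
            assigned[owner[j]] = -1
            unassigned.append(owner[j])
        owner[j] = i
        assigned[i] = j
    return assigned

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.random((5, 5))
    a = auction_assignment(v)
    print("assignment:", a, "total value:", v[np.arange(5), a].sum())
```
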
Feature selection is one of the most prominent learning tasks, especially for high-dimensional datasets in which the goal is to understand the mechanisms underlying the learning problem. However, most feature selection methods deliver just a flat set of relevant features and provide no further information on what kind of structures, e.g., feature groupings, might underlie that set. In this paper we propose a new learning paradigm whose goal is to uncover the structures that underlie the set of relevant features for a given learning problem. We uncover two types of feature sets: non-replaceable features, which carry important information about the target variable and cannot be replaced by other features, and functionally similar feature sets, which can be used interchangeably in learned models, given the presence of the non-replaceable features, with no change in predictive performance. To do so, we propose a new learning algorithm that learns a number of disjoint models, using a model-disjointness regularization constraint together with a constraint on the predictive agreement of the disjoint models. We explore the behavior of our approach on a number of high-dimensional datasets and show that, as expected by construction, the learned models satisfy a number of properties: model disjointness, high predictive agreement, and predictive performance similar to that of models learned on the full set of relevant features. The ability to structure the set of relevant features in such a manner can become a valuable tool in different applications of scientific knowledge discovery.
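
A rough scikit-learn sketch of the flavor of this idea: several sparse models are learned on mutually disjoint feature subsets and compared on predictive performance. Here disjointness is enforced greedily by masking out the features already used by earlier models, which is only a simplified stand-in for the joint regularized formulation with the predictive-agreement constraint proposed in the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

def disjoint_lasso_models(X, y, n_models=3, alpha=0.1):
    """Greedily fit Lasso models on mutually disjoint feature subsets."""
    available = np.ones(X.shape[1], dtype=bool)
    models = []
    for _ in range(n_models):
        Xm = X.copy()
        Xm[:, ~available] = 0.0               # hide the features used by earlier models
        m = Lasso(alpha=alpha).fit(Xm, y)
        used = np.flatnonzero(m.coef_)        # features selected by this model
        models.append((m, used))
        available[used] = False               # later models may not reuse them
    return models

if __name__ == "__main__":
    X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                           noise=1.0, random_state=0)
    for k, (m, used) in enumerate(disjoint_lasso_models(X, y)):
        print(f"model {k}: {len(used)} features, R^2 = {m.score(X, y):.3f}")
```
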

Deep 3-dimensional (3D) Convolutional Networks (ConvNets) have shown promising performance on video recognition tasks because of their powerful spatio-temporal information fusion ability. However, their extremely intensive requirements on memory access and computing power prohibit them from being used in resource-constrained scenarios, such as portable and edge devices. In this paper, we first propose a two-stage Fully Separable Block (FSB) to significantly compress the model size of 3D ConvNets. We then develop a feature enhancement approach named Temporal Residual Gradient (TRG) to improve the performance of the compressed model on video tasks, providing higher accuracy, faster convergence, and better robustness. Moreover, to further decrease the computing workload, we propose a hybrid Fast Algorithm (hFA) to drastically reduce the computational complexity of the convolutions. These methods are combined to design a light-weight and efficient ConvNet for video recognition tasks. Experiments on a popular dataset report a 2.3x compression rate, 3.6x workload reduction, and 6.3% top-1 accuracy gain over the state-of-the-art SlowFast model, which is already a highly compact model. The proposed methods also show good adaptability to traditional 3D ConvNets, yielding a 7.4x more compact model, 11.0x less workload, and 3.0% higher accuracy.
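
The PyTorch sketch below shows one plausible reading, stated as an assumption, of what "fully separable" 3D convolution means for compression: a dense 3D convolution is factorized into a depthwise spatial stage, a depthwise temporal stage, and a pointwise channel-mixing convolution. The paper's exact two-stage FSB, the TRG feature enhancement, and the hFA fast algorithm are not reproduced here.

```python
import torch
import torch.nn as nn

class FullySeparable3d(nn.Module):
    """Factorized replacement for a dense Conv3d(c_in, c_out, 3x3x3):
    depthwise spatial (1x3x3) + depthwise temporal (3x1x1) + pointwise 1x1x1."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.spatial = nn.Conv3d(c_in, c_in, (1, 3, 3), padding=(0, 1, 1), groups=c_in)
        self.temporal = nn.Conv3d(c_in, c_in, (3, 1, 1), padding=(1, 0, 0), groups=c_in)
        self.pointwise = nn.Conv3d(c_in, c_out, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                          # x: (batch, c_in, T, H, W)
        return self.act(self.pointwise(self.temporal(self.spatial(x))))

if __name__ == "__main__":
    dense = nn.Conv3d(64, 64, 3, padding=1)
    separable = FullySeparable3d(64, 64)
    count = lambda m: sum(p.numel() for p in m.parameters())
    print("dense params:", count(dense), "separable params:", count(separable))
    print(separable(torch.randn(1, 64, 8, 32, 32)).shape)
```
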

Unlike image or video data, which can easily be labeled by humans, sensor data annotation is a time-consuming process. Traditional methods of human activity recognition, however, require a large amount of such strictly labeled data for training classifiers. In this paper, we present an attention-based convolutional neural network for human activity recognition from weakly labeled data. The proposed attention model can focus on the labeled activity within a long sequence of sensor data while filtering out a large amount of background noise. In experiments on the weakly labeled dataset, we show that our attention model outperforms classical deep learning methods in accuracy. In addition, we determine the specific locations of the labeled activity in a long sequence of weakly labeled data by converting the compatibility scores generated by the attention model into a compatibility density. Our method greatly facilitates the process of sensor data annotation and makes data collection easier.

* 6 pages, 5 figures
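
A minimal PyTorch sketch of attention-weighted temporal pooling over 1-D convolutional features, conveying the idea of focusing on the labeled activity inside a long, weakly labeled sensor window; the channel counts are illustrative, and the conversion of compatibility scores into a density described in the paper is not reproduced.

```python
import torch
import torch.nn as nn

class AttnHAR(nn.Module):
    """1-D CNN with soft attention over time for weakly labeled sensor windows."""
    def __init__(self, in_channels=3, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 64, 9, padding=4), nn.ReLU(),
            nn.Conv1d(64, 64, 9, padding=4), nn.ReLU())
        self.attn = nn.Conv1d(64, 1, 1)             # one compatibility score per time step
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):                           # x: (batch, channels, time)
        h = self.features(x)                        # (batch, 64, time)
        scores = self.attn(h)                        # (batch, 1, time)
        weights = torch.softmax(scores, dim=2)       # attention over time steps
        pooled = (h * weights).sum(dim=2)            # (batch, 64)
        return self.fc(pooled), weights.squeeze(1)

if __name__ == "__main__":
    model = AttnHAR()
    logits, attn = model(torch.randn(2, 3, 1024))    # two long accelerometer windows
    print(logits.shape, attn.shape)                  # (2, 6) and (2, 1024)
```
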
We consider the problem of minimizing a smooth convex function by reducing the optimization to computing the Nash equilibrium of a particular zero-sum convex-concave game. Zero-sum games can be solved using online learning dynamics, where a classical technique is to simulate two no-regret algorithms that play against each other; after $T$ rounds, the average iterate is guaranteed to solve the original optimization problem with error decaying as $O(\log T/T)$. In this paper we show that the technique can be enhanced to a rate of $O(1/T^2)$ by extending recent work [RS13, SALS15] that leverages optimistic learning to speed up equilibrium computation. The optimization algorithm derived from this analysis coincides exactly with the well-known Nesterov accelerated gradient method [N83a], and indeed the same story allows us to recover several variants of Nesterov's algorithm via small tweaks. We are also able to establish the accelerated linear rate for functions that are both strongly convex and smooth. This methodology unifies a number of different iterative optimization methods: we show that the Heavy Ball algorithm is precisely the non-optimistic variant of Nesterov's method, and recent prior work already established a similar perspective on Frank-Wolfe [AW17, ALLW18].
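
For concreteness, the accelerated method that the analysis recovers is the classical Nesterov scheme; a short NumPy sketch of it on a smooth least-squares objective is given below. The game-theoretic derivation via two optimistic no-regret players is not reproduced here, and the momentum schedule shown is one standard textbook choice.

```python
import numpy as np

def nesterov_agd(grad, x0, L, T=200):
    """Nesterov's accelerated gradient method for an L-smooth convex function.
    grad: gradient oracle, x0: starting point, L: smoothness constant."""
    x, y = x0.copy(), x0.copy()
    for t in range(1, T + 1):
        x_next = y - grad(y) / L                        # gradient step from the extrapolated point
        y = x_next + (t - 1) / (t + 2) * (x_next - x)   # momentum / extrapolation step
        x = x_next
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 10))
    b = rng.normal(size=30)
    grad = lambda x: A.T @ (A @ x - b)                  # gradient of 0.5 * ||Ax - b||^2
    L = np.linalg.norm(A, 2) ** 2                       # smoothness constant of the quadratic
    x_star = np.linalg.lstsq(A, b, rcond=None)[0]
    x_hat = nesterov_agd(grad, np.zeros(10), L)
    print("distance to least-squares solution:", np.linalg.norm(x_hat - x_star))
```
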

To automatically test web applications, crawling-based techniques are usually adopted to mine behavior models, explore the state spaces, or detect the violated invariants of the applications. In existing crawlers, however, rules for identifying the topics of input text fields, such as login IDs, passwords, emails, dates, and phone numbers, have to be manually configured. Moreover, rules written for one application are often unsuitable for another. In addition, when several rules conflict and match an input text field to more than one topic, it can be difficult to determine which rule suggests a better match. This paper presents a natural-language approach that automatically identifies the topics of input fields encountered during crawling by semantically comparing them with the input fields in a labeled corpus. In our evaluation with 100 real-world forms, the proposed approach demonstrated performance comparable to the rule-based one. Our experiments also show that the accuracy of the rule-based approach can be improved by up to 19% when integrated with our approach.

* 11 pages
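
As a toy illustration of matching an encountered input field against a labeled corpus by textual similarity: the hypothetical corpus entries below and the token-level Jaccard similarity are illustrative assumptions, much simpler than the semantic natural-language comparison used in the paper.

```python
# Hypothetical labeled corpus: tokens observed around fields of each topic.
CORPUS = {
    "login_id": {"user", "name", "login", "account", "id"},
    "password": {"password", "pass", "pwd", "secret"},
    "email":    {"email", "mail", "address", "e-mail"},
    "date":     {"date", "birthday", "dob", "day", "month", "year"},
    "phone":    {"phone", "mobile", "tel", "telephone", "number"},
}

def field_tokens(field):
    """Collect lower-cased tokens from a field's name, id, label and placeholder."""
    text = " ".join(field.get(k, "") for k in ("name", "id", "label", "placeholder"))
    return {tok for tok in text.lower().replace("_", " ").split() if tok}

def identify_topic(field):
    """Pick the corpus topic with the highest Jaccard similarity to the field's tokens."""
    tokens = field_tokens(field)
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return max(CORPUS, key=lambda topic: jaccard(tokens, CORPUS[topic]))

if __name__ == "__main__":
    field = {"name": "user_email", "placeholder": "Your e-mail address"}
    print(identify_topic(field))    # -> email
```
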
Multi-objective optimisation is regarded as one of the most promising ways of dealing with constrained optimisation problems in evolutionary optimisation. This paper presents a theoretical investigation of a multi-objective evolutionary algorithm for solving the 0-1 knapsack problem. Two initialisation methods are considered in the algorithm: local search initialisation and greedy search initialisation. The solution quality of the algorithm is then analysed in terms of its approximation ratio.
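
A small Python sketch of the multi-objective treatment of the constrained problem: profit is maximised and weight minimised simultaneously, a Pareto archive is maintained under bit-flip mutation, and the initial solution is built greedily by profit/weight ratio. This is a generic GSEMO-style illustration, not the exact algorithm analysed in the paper.

```python
import random

def knapsack_moea(profits, weights, capacity, iters=5000, seed=0):
    """Bi-objective view of 0-1 knapsack: maximise profit, minimise weight.
    Keeps an archive of non-dominated solutions; returns the best feasible one."""
    random.seed(seed)
    n = len(profits)

    def evaluate(x):
        p = sum(pi for pi, xi in zip(profits, x) if xi)
        w = sum(wi for wi, xi in zip(weights, x) if xi)
        return p, w

    def dominates(a, b):                            # a, b are (profit, weight) pairs
        return a[0] >= b[0] and a[1] <= b[1] and a != b

    # Greedy initialisation by profit/weight ratio.
    x, w_total = [0] * n, 0
    for i in sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True):
        if w_total + weights[i] <= capacity:
            x[i], w_total = 1, w_total + weights[i]
    archive = [(tuple(x), evaluate(x))]

    for _ in range(iters):
        parent, _ = random.choice(archive)
        child = list(parent)
        for i in range(n):                          # standard 1/n bit-flip mutation
            if random.random() < 1.0 / n:
                child[i] ^= 1
        fc = evaluate(child)
        if any(dominates(f, fc) or f == fc for _, f in archive):
            continue                                # child is dominated or duplicated
        archive = [(s, f) for s, f in archive if not dominates(fc, f)]
        archive.append((tuple(child), fc))

    feasible = [(f[0], s) for s, f in archive if f[1] <= capacity]
    return max(feasible) if feasible else None

if __name__ == "__main__":
    profits = [10, 5, 15, 7, 6, 18, 3]
    weights = [2, 3, 5, 7, 1, 4, 1]
    print(knapsack_moea(profits, weights, capacity=15))
```
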

Deep learning has been successfully applied to the single-image super-resolution (SISR) task with great performance in recent years. However, most convolutional neural network based SR models require heavy computation, which limits their real-world applications. In this work, a lightweight SR network, named Adaptive Weighted Super-Resolution Network (AWSRN), is proposed for SISR to address this issue. A novel local fusion block (LFB) is designed in AWSRN for efficient residual learning; it consists of stacked adaptive weighted residual units (AWRUs) and a local residual fusion unit (LRFU). Moreover, an adaptive weighted multi-scale (AWMS) module is proposed to make full use of features in the reconstruction layer. AWMS consists of several convolutions at different scales, and redundant scale branches can be removed according to the contribution of the adaptive weights in AWMS, yielding a lighter network. Experimental results on commonly used datasets show that the proposed lightweight AWSRN achieves performance superior to state-of-the-art methods on x2, x3, x4, and x8 scale factors with similar parameters and computational overhead. Code is available at: https://github.com/ChaofWang/AWSRN

* 9 pages, 6 figures
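
As a hedged sketch of the "adaptive weighted" idea, the PyTorch residual unit below scales its identity and residual branches by learnable scalars; the actual AWRU, LRFU, and AWMS definitions may differ in detail, so the linked repository should be treated as the reference implementation.

```python
import torch
import torch.nn as nn

class AWRU(nn.Module):
    """Residual unit with learnable scalar weights on both branches:
    out = w_x * x + w_r * conv_block(x)."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.w_x = nn.Parameter(torch.ones(1))   # weight of the identity branch
        self.w_r = nn.Parameter(torch.ones(1))   # weight of the residual branch

    def forward(self, x):
        return self.w_x * x + self.w_r * self.body(x)

if __name__ == "__main__":
    block = nn.Sequential(*[AWRU(32) for _ in range(4)])   # a stack of units, as inside an LFB
    print(block(torch.randn(1, 32, 48, 48)).shape)
```
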
We consider a new variant of AMSGrad [RKK18], a popular adaptive gradient-based optimization algorithm that is widely used in training deep neural networks. Our new variant assumes that mini-batch gradients in consecutive iterations have some underlying structure, which makes the gradients sequentially predictable. By exploiting this predictability together with ideas from the field of optimistic online learning, the new algorithm can accelerate convergence and enjoys a tighter regret bound. We conduct experiments on training various neural networks on several datasets to show that the proposed method speeds up convergence in practice.
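
A NumPy sketch of the general idea: the standard AMSGrad update, followed by an extra correction that exploits a prediction of the next gradient. Here the prediction is simply the previous gradient, and the correction is a simplified form in the spirit of optimistic gradient methods, not the paper's exact update rule.

```python
import numpy as np

def optimistic_amsgrad(grad, x0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, T=300):
    """AMSGrad plus a simplified optimistic correction using a gradient prediction."""
    x = x0.copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    g_prev = np.zeros_like(x)                    # gradient prediction for the next step
    for _ in range(T):
        g = grad(x)
        m = b1 * m + (1 - b1) * g                # first moment
        v = b2 * v + (1 - b2) * g * g            # second moment
        v_hat = np.maximum(v_hat, v)             # AMSGrad: non-decreasing second moment
        denom = np.sqrt(v_hat) + eps
        x = x - lr * m / denom                   # standard AMSGrad step
        x = x - lr * (g - g_prev) / denom        # optimistic correction vs. the prediction
        g_prev = g
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(40, 10))
    b = rng.normal(size=40)
    grad = lambda x: A.T @ (A @ x - b) / len(b)  # least-squares gradient
    x_hat = optimistic_amsgrad(grad, np.zeros(10))
    print("final loss:", 0.5 * np.mean((A @ x_hat - b) ** 2))
```
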

One of the well-known formulations of human perception is a hierarchical inference model based on the interaction between conceptual knowledge and sensory stimuli from the partially observable environment. This model helps humans learn inductive biases and guides their behavior by minimizing the surprise of their observations. However, most model-based reinforcement learning still lacks support for object-based physical reasoning. In this paper, we propose Object-based Perception Control (OPC), which combines learning to perceive objects from the scene with learning to control the objects in the perceived environment, under the free-energy principle. Extensive experiments on high-dimensional pixel environments show that OPC outperforms several strong baselines in accumulated reward and the quality of perceptual grouping.

Humans have consciousness, the ability to perceive events and objects: a mental model of the world, developed from the most impoverished of visual stimuli, that enables humans to make rapid decisions and take actions. Although the spatial and temporal aspects of different scenes are generally diverse, the underlying physics works the same way across environments, so learning an abstract description of the shared physical dynamics helps humans understand the world. In this paper, we explore building this mental world with neural network models through multi-task learning, which we call the meta-world model. We show through extensive experiments that our proposed meta-world models successfully capture the common dynamics over compact representations of visually different environments from Atari games. We also demonstrate that agents equipped with our meta-world model possess the ability of visual self-recognition, i.e., they recognize themselves in a reflected, mirrored environment derived from the classic mirror self-recognition test (MSR).

Deep transfer learning has attracted significant research interest. It makes use of pre-trained models learned in a source domain and applies them to tasks in a target domain. Model-based deep transfer learning is arguably the most frequently used approach. However, very little work has been devoted to enhancing deep transfer learning by focusing on the influence of data. In this work, we propose an instance-based approach to improve deep transfer learning in the target domain. Specifically, we take a pre-trained model learned in a source domain and use it to estimate the influence of each training sample in the target domain. We then optimize the target-domain training data by removing the training samples that would lower the performance of the pre-trained model. Finally, we fine-tune the pre-trained model with the optimized training data, or build a new model that is partially initialized from the pre-trained model and fine-tune it with the optimized training data in the target domain. With this approach, transfer learning helps deep models learn more useful features. Extensive experiments demonstrate the effectiveness of our approach in further boosting deep learning models on typical high-level computer vision tasks, such as image classification.

Deep learning models learn to fit the training data while being expected to generalize well to testing data. Most works aim to find such models by creatively designing architectures and fine-tuning parameters. To adapt to particular tasks, hand-crafted information such as image priors has also been incorporated into end-to-end learning. However, very little progress has been made on investigating how an individual training sample influences the generalization ability of a model. In other words, to achieve high generalization accuracy, do we really need all the samples in a training dataset? In this paper, we demonstrate that deep learning models such as convolutional neural networks may not favor all training samples, and that generalization accuracy can be further improved by dropping the unfavorable ones. Specifically, the influence of removing a training sample is quantifiable, and we propose a Two-Round Training approach aimed at achieving higher generalization accuracy: we locate unfavorable samples after the first round of training, and then retrain the model from scratch on the reduced training dataset in the second round. Since our approach is essentially different from fine-tuning or further training, the computational cost should not be a concern. Our extensive experimental results indicate that, under identical settings, the proposed approach boosts the performance of well-known networks on both high-level computer vision problems, such as image classification, and low-level vision problems, such as image denoising.
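
A minimal scikit-learn sketch of the two-round pipeline on synthetic data: train, score each training sample, drop the worst fraction, and retrain from scratch on the reduced set. Here the "unfavorable" samples are simply those with the highest training loss, which is only a crude stand-in for the influence-based criterion used in the paper; the model, dataset, and drop fraction are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def two_round_training(X, y, drop_fraction=0.1):
    """Round 1: train and score each sample by its per-sample loss.
    Round 2: retrain from scratch without the highest-loss samples."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    proba = model.predict_proba(X)[np.arange(len(y)), y]
    losses = -np.log(np.clip(proba, 1e-12, None))           # per-sample cross-entropy
    keep = losses.argsort()[: int(len(y) * (1 - drop_fraction))]
    return LogisticRegression(max_iter=1000).fit(X[keep], y[keep])

if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    two_round = two_round_training(X_tr, y_tr)
    print("baseline acc: ", baseline.score(X_te, y_te))
    print("two-round acc:", two_round.score(X_te, y_te))
```
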

Recently, hidden Markov models (HMMs) have achieved promising results for offline handwritten Chinese text recognition. However, because the large vocabulary of Chinese characters is modeled with a uniform and fixed number of hidden states per character, a high amount of memory and computation is required. In this study, we address this issue with parsimonious HMMs built via state tying, which fully exploits the similarities among different Chinese characters. A two-step algorithm with a data-driven question set is adopted to generate the tied-state pool using a likelihood measure. The proposed parsimonious HMMs, with both Gaussian mixture models (GMMs) and deep neural networks (DNNs) as the emission distributions, not only lead to a compact model but also improve recognition accuracy, through data sharing for the tied states and reduced confusion among state classes. Tested on the ICDAR-2013 competition database, in the best configuration the new parsimonious DNN-HMM yields a relative character error rate (CER) reduction of 6.2%, a 25% reduction in model size, and a 60% reduction in decoding time over the conventional DNN-HMM. In the compact setting of 1-state HMMs on average, our parsimonious DNN-HMM significantly outperforms the conventional DNN-HMM with a relative CER reduction of 35.5%.

* Accepted by ICFHR2018
In this article, we mathematically study several GAN-related topics, including the Inception score, label smoothing, gradient vanishing, and the -log(D(x)) alternative. An advanced version is included in arXiv:1703.02000, "Activation Maximization Generative Adversarial Nets": please refer to Section 6 of 1703.02000 for a detailed analysis of the Inception score, and to its appendix for the discussions of label smoothing, gradient vanishing, and the -log(D(x)) alternative.

* An advanced version is included in arXiv:1703.02000 "Activation Maximization Generative Adversarial Nets"
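
For reference, the Inception score discussed above is $\exp(\mathbb{E}_x\,\mathrm{KL}(p(y|x)\,\|\,p(y)))$. The small NumPy sketch below computes it from a matrix of classifier class probabilities, one row per generated image; in practice the probabilities come from a pretrained Inception network, which is not included here.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception score from class probabilities p(y|x).
    probs: (num_images, num_classes), each row a softmax output of the classifier."""
    probs = np.clip(probs, eps, 1.0)
    marginal = probs.mean(axis=0)                                    # p(y)
    kl = (probs * (np.log(probs) - np.log(marginal))).sum(axis=1)    # KL(p(y|x) || p(y))
    return float(np.exp(kl.mean()))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Confident, diverse predictions -> high score; uniform predictions -> score near 1.
    confident = np.eye(10)[rng.integers(0, 10, size=500)] * 0.9 + 0.01
    uniform = np.full((500, 10), 0.1)
    print(inception_score(confident), inception_score(uniform))
```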