Models, code, and papers for "Xi Zhang":

Deep convolutional neural networks (DCNN) have enjoyed great successes in many signal processing applications because they can learn complex, non-linear causal relationships from input to output. In this light, DCNNs are well suited for the task of sequential prediction of multidimensional signals, such as images, and have the potential of improving the performance of traditional linear predictors. In this research we investigate how far DCNNs can push the envelop in terms of prediction precision. We propose, in a case study, a two-stage deep regression DCNN framework for nonlinear prediction of two-dimensional image signals. In the first-stage regression, the proposed deep prediction network (PredNet) takes the causal context as input and emits a prediction of the present pixel. Three PredNets are trained with the regression objectives of minimizing $\ell_1$, $\ell_2$ and $\ell_\infty$ norms of prediction residuals, respectively. The second-stage regression combines the outputs of the three PredNets to generate an even more precise and robust prediction. The proposed deep regression model is applied to lossless predictive image coding, and it outperforms the state-of-the-art linear predictors by appreciable margin.

Imbalanced Learning is an important learning algorithm for the classification models, which have enjoyed much popularity on many applications. Typically, imbalanced learning algorithms can be partitioned into two types, i.e., data level approaches and algorithm level approaches. In this paper, the focus is to develop a robust synthetic minority oversampling technique which falls the umbrella of data level approaches. On one hand, we proposed a method to generate synthetic samples in a high dimensional feature space, instead of a linear sampling space. On the other hand, in the proposed imbalanced learning framework, Gaussian Mixture Model is employed to distinguish the outliers from minority class instances and filter out the synthetic majority class instances. Last and more importantly, an adaptive optimization method is proposed to optimize these parameters in sampling process. By doing so, an effectiveness and efficiency imbalanced learning framework is developed.

Recently a number of CNN-based techniques were proposed to remove image compression artifacts. As in other restoration applications, these techniques all learn a mapping from decompressed patches to the original counterparts under the ubiquitous L2 metric. However, this approach is incapable of restoring distinctive image details which may be statistical outliers but have high semantic importance (e.g., tiny lesions in medical images). To overcome this weakness, we propose to incorporate an L-infinity fidelity criterion in the design of neural network so that no small, distinctive structures of the original image can be dropped or distorted. Moreover, our anti-artifacts neural network is designed to work on a range of compression bit rates, rather than a fixed one as in the past. Experimental results demonstrate that the proposed method outperforms the state-of-the-art methods in L-infinity error metric and perceptual quality, while being competitive in L2 error metric as well. It can restore subtle image details that are otherwise destroyed or missed by other algorithms. Our research suggests a new machine learning paradigm of ultra high fidelity image compression that is ideally suited for applications in medicine, space, and sciences.

This paper presents Aicyber's system for NLPCC 2017 shared task 2. It is formed by a voting of three deep learning based system trained on character-enhanced word vectors and a well known bag-of-word model.

In November 2016 we submitted to arXiv our paper "Automated Inference on Criminality Using Face Images". It generated a great deal of discussions in the Internet and some media outlets. Our work is only intended for pure academic discussions; how it has become a media consumption is a total surprise to us. Although in agreement with our critics on the need and importance of policing AI research for the general good of the society, we are deeply baffled by the ways some of them mispresented our work, in particular the motive and objective of our research.

Sparse Subspace Clustering (SSC) has achieved state-of-the-art clustering quality by performing spectral clustering over a $\ell^{1}$-norm based similarity graph. However, SSC is a transductive method which does not handle with the data not used to construct the graph (out-of-sample data). For each new datum, SSC requires solving $n$ optimization problems in O(n) variables for performing the algorithm over the whole data set, where $n$ is the number of data points. Therefore, it is inefficient to apply SSC in fast online clustering and scalable graphing. In this letter, we propose an inductive spectral clustering algorithm, called inductive Sparse Subspace Clustering (iSSC), which makes SSC feasible to cluster out-of-sample data. iSSC adopts the assumption that high-dimensional data actually lie on the low-dimensional manifold such that out-of-sample data could be grouped in the embedding space learned from in-sample data. Experimental results show that iSSC is promising in clustering out-of-sample data.

Given the success of the deep convolutional neural networks (DCNNs) in applications of visual recognition and classification, it would be tantalizing to test if DCNNs can also learn spatial concepts, such as straightness, convexity, left/right, front/back, relative size, aspect ratio, polygons, etc., from varied visual examples of these concepts that are simple and yet vital for spatial reasoning. Much to our dismay, extensive experiments of the type of cognitive psychology demonstrate that the data-driven deep learning (DL) cannot see through superficial variations in visual representations and grasp the spatial concept in abstraction. The root cause of failure turns out to be the learning methodology, not the computational model of the neural network itself. By incorporating task-specific convolutional kernels, we are able to construct DCNNs for spatial cognition tasks that can generalize to input images not drawn from the same distribution of the training set. This work raises a precaution that without manually-incorporated priors or features DCCNs may fail spatial cognitive tasks at rudimentary level.

This paper studies distributed estimation and inference for a general statistical problem with a convex loss that could be non-differentiable. For the purpose of efficient computation, we restrict ourselves to stochastic first-order optimization, which enjoys low per-iteration complexity. To motivate the proposed method, we first investigate the theoretical properties of a straightforward Divide-and-Conquer Stochastic Gradient Descent (DC-SGD) approach. Our theory shows that there is a restriction on the number of machines and this restriction becomes more stringent when the dimension $p$ is large. To overcome this limitation, this paper proposes a new multi-round distributed estimation procedure that approximates the Newton step only using stochastic subgradient. The key component in our method is the proposal of a computationally efficient estimator of $\Sigma^{-1} w$, where $\Sigma$ is the population Hessian matrix and $w$ is any given vector. Instead of estimating $\Sigma$ (or $\Sigma^{-1}$) that usually requires the second-order differentiability of the loss, the proposed First-Order Newton-type Estimator (FONE) directly estimates the vector of interest $\Sigma^{-1} w$ as a whole and is applicable to non-differentiable losses. Our estimator also facilitates the inference for the empirical risk minimizer. It turns out that the key term in the limiting covariance has the form of $\Sigma^{-1} w$, which can be estimated by FONE.

This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the na\"ive divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch, and then aggregates the estimators via averaging. However, this method only works when $n=o(m^2)$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem under distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.

With the arrival of the big data era, more and more data are becoming readily available in various real-world applications and those data are usually highly heterogeneous. Taking computational medicine as an example, we have both Electronic Health Records (EHR) and medical images for each patient. For complicated diseases such as Parkinson's and Alzheimer's, both EHR and neuroimaging information are very important for disease understanding because they contain complementary aspects of the disease. However, EHR and neuroimage are completely different. So far the existing research has been mainly focusing on one of them. In this paper, we proposed a framework, Memory-Based Graph Convolution Network (MemGCN), to perform integrative analysis with such multi-modal data. Specifically, GCN is used to extract useful information from the patients' neuroimages. The information contained in the patient EHRs before the acquisition of each brain image is captured by a memory network because of its sequential nature. The information contained in each brain image is combined with the information read out from the memory network to infer the disease state at the image acquisition timestamp. To further enhance the analytical power of MemGCN, we also designed a multi-hop strategy that allows multiple reading and updating on the memory can be performed at each iteration. We conduct experiments using the patient data from the Parkinson's Progression Markers Initiative (PPMI) with the task of classification of Parkinson's Disease (PD) cases versus controls. We demonstrate that superior classification performance can be achieved with our proposed framework, comparing with existing approaches involving a single type of data.

Subitizing, or the sense of small natural numbers, is a cognitive construct so primary and critical to the survival and well-being of humans and primates that is considered and proven to be innate; it responds to visual stimuli prior to the development of any symbolic skills, language or arithmetic. Given highly acclaimed successes of deep convolutional neural networks (DCNN) in tasks of visual intelligence, one would expect that DCNNs can learn subitizing. But somewhat surprisingly, our carefully crafted extensive experiments, which are similar to those of cognitive psychology, demonstrate that DCNNs cannot, even with strong supervision, see through superficial variations in visual representations and distill the abstract notion of natural number, a task that children perform with high accuracy and confidence. The DCNN black box learners driven by very large training sets are apparently still confused by geometric variations and fail to grasp the topological essence in subitizing. In sharp contrast to the failures of the black box learning, by incorporating a mechanism of mathematical morphology into convolutional kernels, we are able to construct a recurrent convolutional neural network that can perform subitizing deterministically. Our findings in this study of cognitive computing, without and with prior of human knowledge, are discussed; they are, we believe, significant and thought-provoking in the interests of AI research, because visual-based numerosity is a benchmark of minimum sort for human cognition.

The investment on the stock market is prone to be affected by the Internet. For the purpose of improving the prediction accuracy, we propose a multi-task stock prediction model that not only considers the stock correlations but also supports multi-source data fusion. Our proposed model first utilizes tensor to integrate the multi-sourced data, including financial Web news, investors' sentiments extracted from the social network and some quantitative data on stocks. In this way, the intrinsic relationships among different information sources can be captured, and meanwhile, multi-sourced information can be complemented to solve the data sparsity problem. Secondly, we propose an improved sub-mode coordinate algorithm (SMC). SMC is based on the stock similarity, aiming to reduce the variance of their subspace in each dimension produced by the tensor decomposition. The algorithm is able to improve the quality of the input features, and thus improves the prediction accuracy. And the paper utilizes the Long Short-Term Memory (LSTM) neural network model to predict the stock fluctuation trends. Finally, the experiments on 78 A-share stocks in CSI 100 and thirteen popular HK stocks in the year 2015 and 2016 are conducted. The results demonstrate the improvement on the prediction accuracy and the effectiveness of the proposed model.

Large-scale mobile traffic analytics is becoming essential to digital infrastructure provisioning, public transportation, events planning, and other domains. Monitoring city-wide mobile traffic is however a complex and costly process that relies on dedicated probes. Some of these probes have limited precision or coverage, others gather tens of gigabytes of logs daily, which independently offer limited insights. Extracting fine-grained patterns involves expensive spatial aggregation of measurements, storage, and post-processing. In this paper, we propose a mobile traffic super-resolution technique that overcomes these problems by inferring narrowly localised traffic consumption from coarse measurements. We draw inspiration from image processing and design a deep-learning architecture tailored to mobile networking, which combines Zipper Network (ZipNet) and Generative Adversarial neural Network (GAN) models. This enables to uniquely capture spatio-temporal relations between traffic volume snapshots routinely monitored over broad coverage areas (`low-resolution') and the corresponding consumption at 0.05 km $^2$ level (`high-resolution') usually obtained after intensive computation. Experiments we conduct with a real-world data set demonstrate that the proposed ZipNet(-GAN) infers traffic consumption with remarkable accuracy and up to 100$\times$ higher granularity as compared to standard probing, while outperforming existing data interpolation techniques. To our knowledge, this is the first time super-resolution concepts are applied to large-scale mobile traffic analysis and our solution is the first to infer fine-grained urban traffic patterns from coarse aggregates.

We study zero-shot learning (ZSL) as a transfer learning problem, and focus on the two key aspects of ZSL, model effectiveness and model adaptation. For effective modeling, we adopt the boosting strategy to learn a zero-shot classifier from weak models to a strong model. For adaptable knowledge transfer, we devise a Semantic Correlation Regularization (SCR) approach to regularize the boosted model to be consistent with the inter-class semantic correlations. With SCR embedded in the boosting objective, and with a self-controlled sample selection for learning robustness, we propose a unified framework, Boosted Zero-shot classification with Semantic Correlation Regularization (BZ-SCR). By balancing the SCR-regularized boosted model selection and the self-controlled sample selection, BZ-SCR is capable of capturing both discriminative and adaptable feature-to-class semantic alignments, while ensuring the reliability and adaptability of the learned samples. The experiments on two ZSL datasets show the superiority of BZ-SCR over the state-of-the-arts.

This article is a sequel to our earlier work [25]. The main objective of our research is to explore the potential of supervised machine learning in face-induced social computing and cognition, riding on the momentum of much heralded successes of face processing, analysis and recognition on the tasks of biometric-based identification. We present a case study of automated statistical inference on sociopsychological perceptions of female faces controlled for race, attractiveness, age and nationality. Our empirical evidences point to the possibility of training machine learning algorithms, using example face images characterized by internet users, to predict perceptions of personality traits and demeanors.

We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $n$-armed bandit game. This problem has a wide range of applications, e.g., A/B testing, crowdsourcing, simulation optimization. Our goal is to develop a PAC algorithm, which, with probability at least $1-\delta$, identifies a set of $K$ arms with the aggregate regret at most $\epsilon$. The notion of aggregate regret for multiple-arm identification was first introduced in \cite{Zhou:14} , which is defined as the difference of the averaged expected rewards between the selected set of arms and the best $K$ arms. In contrast to \cite{Zhou:14} that only provides instance-independent sample complexity, we introduce a new hardness parameter for characterizing the difficulty of any given instance. We further develop two algorithms and establish the corresponding sample complexity in terms of this hardness parameter. The derived sample complexity can be significantly smaller than state-of-the-art results for a large class of instances and matches the instance-independent lower bound upto a $\log(\epsilon^{-1})$ factor in the worst case. We also prove a lower bound result showing that the extra $\log(\epsilon^{-1})$ is necessary for instance-dependent algorithms using the introduced hardness parameter.

It is a key to construct a similarity graph in graph-oriented subspace learning and clustering. In a similarity graph, each vertex denotes a data point and the edge weight represents the similarity between two points. There are two popular schemes to construct a similarity graph, i.e., pairwise distance based scheme and linear representation based scheme. Most existing works have only involved one of the above schemes and suffered from some limitations. Specifically, pairwise distance based methods are sensitive to the noises and outliers compared with linear representation based methods. On the other hand, there is the possibility that linear representation based algorithms wrongly select inter-subspaces points to represent a point, which will degrade the performance. In this paper, we propose an algorithm, called Locally Linear Representation (LLR), which integrates pairwise distance with linear representation together to address the problems. The proposed algorithm can automatically encode each data point over a set of points that not only could denote the objective point with less residual error, but also are close to the point in Euclidean space. The experimental results show that our approach is promising in subspace learning and subspace clustering.

Low-frequency words place a major challenge for automatic speech recognition (ASR). The probabilities of these words, which are often important name entities, are generally under-estimated by the language model (LM) due to their limited occurrences in the training data. Recently, we proposed a word-pair approach to deal with the problem, which borrows information of frequent words to enhance the probabilities of low-frequency words. This paper presents an extension to the word-pair method by involving multiple `predicting words' to produce better estimation for low-frequency words. We also employ this approach to deal with out-of-language words in the task of multi-lingual speech recognition.

In the new wave of artificial intelligence, deep learning is impacting various industries. As a closely related area, optimization algorithms greatly contribute to the development of deep learning. But the reverse applications are still insufficient. Is there any efficient way to solve certain optimization problem through deep learning? The key is to convert the optimization to a representation suitable for deep learning. In this paper, a traveling salesman problem (TSP) is studied. Considering that deep learning is good at image processing, an image representation method is proposed to transfer a TSP to an image. Based on samples of a 10 city TSP, a fully convolutional network (FCN) is used to learn the mapping from a feasible region to an optimal solution. The training process is analyzed and interpreted through stages. A visualization method is presented to show how a FCN can understand the training task of a TSP. Once the training is completed, no significant effort is required to solve a new TSP and the prediction is obtained on the scale of milliseconds. The results show good performance in finding the global optimal solution. Moreover, the developed FCN model has been demonstrated on TSP's with different city numbers, proving excellent generalization performance.

When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. Learning to recognize individual faces from group photos, each captioned with the correct but unordered list of people in it, is another example where a mismatch between training and inference objectives occurs. In both cases, the natural training-time loss would involve a combinatorial problem -- dynamic programming-based global sequence alignment and weighted bipartite graph matching, respectively -- but solutions to combinatorial problems are not differentiable with respect to their input parameters, so surrogate, differentiable losses are used instead. Here, we show how to perform gradient descent over combinatorial optimization algorithms that involve continuous parameters, for example edge weights, and can be efficiently expressed as integer, linear, or mixed-integer linear programs. We demonstrate usefulness of gradient descent over combinatorial optimization in sequence-to-sequence modeling using differentiable encoder-decoder architecture with softmax or Gumbel-softmax, and in weakly supervised learning involving a convolutional, residual feed-forward network for image classification.