Models, code, and papers for "Xin Zhou":

##### STN-Homography: estimate homography parameters directly

Jun 06, 2019
Qiang Zhou, Xin Li

In this paper, we introduce the STN-Homography model to directly estimate the homography matrix between image pair. Different most CNN-based homography estimation methods which use an alternative 4-point homography parameterization, we use prove that, after coordinate normalization, the variance of elements of coordinate normalized $3\times3$ homography matrix is very small and suitable to be regressed well with CNN. Based on proposed STN-Homography, we use a hierarchical architecture which stacks several STN-Homography models and successively reduce the estimation error. Effectiveness of the proposed method is shown through experiments on MSCOCO dataset, in which it significantly outperforms the state-of-the-art. The average processing time of our hierarchical STN-Homography with 1 stage is only 4.87 ms on the GPU, and the processing time for hierarchical STN-Homography with 3 stages is 17.85 ms. The code will soon be open sourced.

##### A Continuous Opinion Dynamic Model in Co-evolving Networks--A Novel Group Decision Approach

May 17, 2017
Qingxing Dong, Xin Zhou

Opinion polarization is a ubiquitous phenomenon in opinion dynamics. In contrast to the traditional consensus oriented group decision making (GDM) framework, this paper proposes a framework with the co-evolution of both opinions and relationship networks to improve the potential consensus level of a group and help the group reach a stable state. Taking the bound of confidence and the degree of individual's persistence into consideration, the evolution of the opinion is driven by the relationship among the group. Meanwhile, the antagonism or cooperation of individuals presented by the network topology also evolve according to the dynamic opinion distances. Opinions are convergent and the stable state will be reached in this co-evolution mechanism. We further explored this framework through simulation experiments. The simulation results verify the influence of the level of persistence on the time cost and indicate the influence of group size, the initial topology of networks and the bound of confidence on the number of opinion clusters.

* 24 pages, 3 figures
##### Causal nearest neighbor rules for optimal treatment regimes

Nov 22, 2017
Xin Zhou, Michael R. Kosorok

The estimation of optimal treatment regimes is of considerable interest to precision medicine. In this work, we propose a causal $k$-nearest neighbor method to estimate the optimal treatment regime. The method roots in the framework of causal inference, and estimates the causal treatment effects within the nearest neighborhood. Although the method is simple, it possesses nice theoretical properties. We show that the causal $k$-nearest neighbor regime is universally consistent. That is, the causal $k$-nearest neighbor regime will eventually learn the optimal treatment regime as the sample size increases. We also establish its convergence rate. However, the causal $k$-nearest neighbor regime may suffer from the curse of dimensionality, i.e. performance deteriorates as dimensionality increases. To alleviate this problem, we develop an adaptive causal $k$-nearest neighbor method to perform metric selection and variable selection simultaneously. The performance of the proposed methods is illustrated in simulation studies and in an analysis of a chronic depression clinical trial.

##### Imbalanced classification: an objective-oriented review

Feb 11, 2020
Yang Feng, Min Zhou, Xin Tong

A common issue for classification in scientific research and industry is the existence of imbalanced classes. When sample sizes of different classes are imbalanced in training data, naively implementing a classification method often leads to unsatisfactory prediction results on test data. Multiple resampling techniques have been proposed to address the class imbalance issues. Yet, there is no general guidance on when to use each technique. In this article, we provide an objective-oriented review of the common resampling techniques for binary classification under imbalanced class sizes. The learning objectives we consider include the classical paradigm that minimizes the overall classification error, the cost-sensitive learning paradigm that minimizes a cost-adjusted weighted type I and type II errors, and the Neyman-Pearson paradigm that minimizes the type II error subject to a type I error constraint. Under each paradigm, we investigate the combination of the resampling techniques and a few state-of-the-art classification methods. For each pair of resampling techniques and classification methods, we use simulation studies to study the performance under different evaluation metrics. From these extensive simulation experiments, we demonstrate under each classification paradigm, the complex dynamics among resampling techniques, base classification methods, evaluation metrics, and imbalance ratios. For practitioners, the take-away message is that with imbalanced data, one usually should consider all the combinations of resampling techniques and the base classification methods.

* 31 pages, 24 figures
##### Searching for Stage-wise Neural Graphs In the Limit

Dec 30, 2019
Xin Zhou, Dejing Dou, Boyang Li

Search space is a key consideration for neural architecture search. Recently, Xie et al. (2019) found that randomly generated networks from the same distribution perform similarly, which suggests we should search for random graph distributions instead of graphs. We propose graphon as a new search space. A graphon is the limit of Cauchy sequence of graphs and a scale-free probabilistic distribution, from which graphs of different number of nodes can be drawn. By utilizing properties of the graphon space and the associated cut-distance metric, we develop theoretically motivated techniques that search for and scale up small-capacity stage-wise graphs found on small datasets to large-capacity graphs that can handle ImageNet. The scaled stage-wise graphs outperform DenseNet and randomly wired Watts-Strogatz networks, indicating the benefits of graphon theory in NAS applications.

##### Automatic Differentiation for Complex Valued SVD

Sep 09, 2019
Zhou-Quan Wan, Shi-Xin Zhang

In this note, we report the back propagation formula for complex valued singular value decompositions (SVD). This formula is an important ingredient for a complete automatic differentiation(AD) infrastructure in terms of complex numbers, and it is also the key to understand and utilize AD in tensor networks.

* 4.2 pages
##### Multimodal Machine Translation with Reinforcement Learning

May 07, 2018
Xin Qian, Ziyi Zhong, Jieli Zhou

Multimodal machine translation is one of the applications that integrates computer vision and language processing. It is a unique task given that in the field of machine translation, many state-of-the-arts algorithms still only employ textual information. In this work, we explore the effectiveness of reinforcement learning in multimodal machine translation. We present a novel algorithm based on the Advantage Actor-Critic (A2C) algorithm that specifically cater to the multimodal machine translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We experiment our proposed algorithm on the Multi30K multilingual English-German image description dataset and the Flickr30K image entity dataset. Our model takes two channels of inputs, image and text, uses translation evaluation metrics as training rewards, and achieves better results than supervised learning MLE baseline models. Furthermore, we discuss the prospects and limitations of using reinforcement learning for machine translation. Our experiment results suggest a promising reinforcement learning solution to the general task of multimodal sequence to sequence learning.

##### Matrix Completion via Max-Norm Constrained Optimization

Apr 28, 2017
T. Tony Cai, Wen-Xin Zhou

Matrix completion has been well studied under the uniform sampling model and the trace-norm regularized methods perform well both theoretically and numerically in such a setting. However, the uniform sampling model is unrealistic for a range of applications and the standard trace-norm relaxation can behave very poorly when the underlying sampling scheme is non-uniform. In this paper we propose and analyze a max-norm constrained empirical risk minimization method for noisy matrix completion under a general sampling model. The optimal rate of convergence is established under the Frobenius norm loss in the context of approximately low-rank matrix reconstruction. It is shown that the max-norm constrained method is minimax rate-optimal and yields a unified and robust approximate recovery guarantee, with respect to the sampling distributions. The computational effectiveness of this method is also discussed, based on first-order algorithms for solving convex optimizations involving max-norm regularization.

* 33 pages
##### A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

Sep 24, 2013
T. Tony Cai, Wen-Xin Zhou

We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed.

* 33 pages, 3 figures
##### Nonconvex Regularized Robust Regression with Oracle Properties in Polynomial Time

Jul 09, 2019
Xiaoou Pan, Qiang Sun, Wen-Xin Zhou

This paper investigates tradeoffs among optimization errors, statistical rates of convergence and the effect of heavy-tailed random errors for high-dimensional adaptive Huber regression with nonconvex regularization. When the additive errors in linear models have only bounded second moment, our results suggest that adaptive Huber regression with nonconvex regularization yields statistically optimal estimators that satisfy oracle properties as if the true underlying support set were known beforehand. Computationally, we need as many as O(log s + log log d) convex relaxations to reach such oracle estimators, where s and d denote the sparsity and ambient dimension, respectively. Numerical studies lend strong support to our methodology and theory.

* 55 pages
##### On the approximation ability of evolutionary optimization with application to minimum set cover

Jan 08, 2012
Yang Yu, Xin Yao, Zhi-Hua Zhou

Evolutionary algorithms (EAs) are heuristic algorithms inspired by natural evolution. They are often used to obtain satisficing solutions in practice. In this paper, we investigate a largely underexplored issue: the approximation performance of EAs in terms of how close the solution obtained is to an optimal solution. We study an EA framework named simple EA with isolated population (SEIP) that can be implemented as a single- or multi-objective EA. We analyze the approximation performance of SEIP using the partial ratio, which characterizes the approximation ratio that can be guaranteed. Specifically, we analyze SEIP using a set cover problem that is NP-hard. We find that in a simple configuration, SEIP efficiently achieves an $H_n$-approximation ratio, the asymptotic lower bound, for the unbounded set cover problem. We also find that SEIP efficiently achieves an $(H_k-\frac{k-1}/{8k^9})$-approximation ratio, the currently best-achievable result, for the k-set cover problem. Moreover, for an instance class of the k-set cover problem, we disclose how SEIP, using either one-bit or bit-wise mutation, can overcome the difficulty that limits the greedy algorithm.

##### Detailed Proofs of Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight

Feb 21, 2020
Zhepei Wang, Xin Zhou, Chao Xu, Fei Gao

This technical report provides detailed theoretical analysis of the algorithm used in \textit{Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight}. An assumption is provided to ensure that settings for the objective function are meaningful. What's more, we explore the structure of the optimization problem and analyze the global/local convergence rate of the employed algorithm.

* Supplementary material for paper submitted to RA-L/IROS 2020
##### Automatic Differentiable Monte Carlo: Theory and Application

Nov 20, 2019
Shi-Xin Zhang, Zhou-Quan Wan, Hong Yao

Differentiable programming has emerged as a key programming paradigm empowering rapid developments of deep learning while its applications to important computational methods such as Monte Carlo remain largely unexplored. Here we present the general theory enabling infinite-order automatic differentiation on expectations computed by Monte Carlo with unnormalized probability distributions, which we call "automatic differentiable Monte Carlo" (ADMC). By implementing ADMC algorithms on computational graphs, one can also leverage state-of-the-art machine learning frameworks and techniques to traditional Monte Carlo applications in statistics and physics. We illustrate the versatility of ADMC by showing some applications: fast search of phase transitions and accurately finding ground states of interacting many-body models in two dimensions. ADMC paves a promising way to innovate Monte Carlo in various aspects to achieve higher accuracy and efficiency, e.g. easing or solving the sign problem of quantum many-body models through ADMC.

* 11.5 pages + supplemental materials, 4 figures
##### Pose Estimation for Texture-less Shiny Objects in a Single RGB Image Using Synthetic Training Data

Sep 23, 2019
Chen Chen, Xin Jiang, Weiguo Zhou, Yun-Hui Liu

In the industrial domain, the pose estimation of multiple texture-less shiny parts is a valuable but challenging task. In this particular scenario, it is impractical to utilize keypoints or other texture information because most of them are not actual features of the target but the reflections of surroundings. Moreover, the similarity of color also poses a challenge in segmentation. In this article, we propose to divide the pose estimation process into three stages: object detection, features detection and pose optimization. A convolutional neural network was utilized to perform object detection. Concerning the reliability of surface texture, we leveraged the contour information for estimating pose. Since conventional contour-based methods are inapplicable to clustered metal parts due to the difficulties in segmentation, we use the dense discrete points along the metal part edges as semantic keypoints for contour detection. Afterward, we exploit both keypoint information and CAD model to calculate the 6D pose of each object in view. A typical implementation of deep learning methods not only requires a large amount of training data, but also relies on intensive human labor for labeling the datasets. Therefore, we propose an approach to generate datasets and label them automatically. Despite not using any real-world photos for training, a series of experiments showed that the algorithm built on synthetic data perform well in the real environment.

##### A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification

Apr 10, 2019
Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song

In this paper, we propose a variational approach to weakly supervised document-level multi-aspect sentiment classification. Instead of using user-generated ratings or annotations provided by domain experts, we use target-opinion word pairs as "supervision." These word pairs can be extracted by using dependency parsers and simple rules. Our objective is to predict an opinion word given a target word while our ultimate goal is to learn a sentiment polarity classifier to predict the sentiment polarity of each aspect given a document. By introducing a latent variable, i.e., the sentiment polarity, to the objective function, we can inject the sentiment polarity classifier to the objective via the variational lower bound. We can learn a sentiment polarity classifier by optimizing the lower bound. We show that our method can outperform weakly supervised baselines on TripAdvisor and BeerAdvocate datasets and can be comparable to the state-of-the-art supervised method with hundreds of labels per aspect.

* Accepted by NAACL-HLT 2019
##### Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Apr 05, 2019
Xin Li, Yiming Zhou, Zheng Pan, Jiashi Feng

Achieving good speed and accuracy trade-off on a target platform is very important in deploying deep neural networks in real world scenarios. However, most existing automatic architecture search approaches only concentrate on high performance. In this work, we propose an algorithm that can offer better speed/accuracy trade-off of searched networks, which is termed "Partial Order Pruning". It prunes the architecture search space with a partial order assumption to automatically search for the architectures with the best speed and accuracy trade-off. Our algorithm explicitly takes profile information about the inference speed on the target platform into consideration. With the proposed algorithm, we present several Dongfeng (DF) networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching decoder architectures, our DF-Seg real-time segmentation networks yield state-of-the-art speed/accuracy trade-off on both the target embedded device and the high-end GPU.

* Accepted to CVPR 2019
##### From Rank Estimation to Rank Approximation: Rank Residual Constraint for Image Denoising

Aug 14, 2018
Zhiyuan Zha, Xin Yuan, Tao Yue, Jiaotao Zhou

In this paper, we propose a novel approach for the rank minimization problem, termed rank residual constraint (RRC). Different from existing low-rank based approaches, such as the well-known weighted nuclear norm minimization (WNNM) and nuclear norm minimization (NNM), which aim to estimate the underlying low-rank matrix directly from the corrupted observation, we progressively approximate or approach the underlying low-rank matrix via minimizing the rank residual. By integrating the image nonlocal self-similarity (NSS) prior with the proposed RRC model, we develop an iterative algorithm for image denoising. To this end, we first present a recursive based nonlocal means method to obtain a good reference of the original image patch groups, and then the rank residual of the image patch groups between this reference and the noisy image is minimized to achieve a better estimate of the desired image. In this manner, both the reference and the estimated image in each iteration are improved gradually and jointly. Based on the group-based sparse representation model, we further provide a theoretical analysis on the feasibility of the proposed RRC model. Experimental results demonstrate that the proposed RRC model outperforms many state-of-the-art denoising methods in both the objective and perceptual qualities.

##### A Novel Co-design Peta-scale Heterogeneous Cluster for Deep Learning Training

May 18, 2018
Xin Chen, Hua Zhou, Yuxiang Gao, Yu Zhu

Large scale deep Convolution Neural Networks (CNNs) increasingly demands the computing power. It is key for researchers to own a great powerful computing platform to leverage deep learning (DL) advancing.On the other hand, as the commonly-used accelerator, the commodity GPUs cards of new generations are more and more expensive. Consequently, it is of importance to design an affordable distributed heterogeneous system that provides powerful computational capacity and develop a well-suited software that efficiently utilizes its computational capacity. In this paper, we present our co-design distributed system including a peta-scale GPU cluster, called "Manoa". Based on properties and topology of Manoa, we first propose job server framework and implement it, named "MiMatrix". The central node of MiMatrix, referred to as the job server, undertakes all of controlling, scheduling and monitoring, and I/O tasks without weight data transfer for AllReduce processing in each iteration. Therefore, MiMatrix intrinsically solves the bandwidth bottleneck of central node in parameter server framework that is widely used in distributed DL tasks. Meanwhile, we also propose a new AllReduce algorithm, GPUDirect RDMA-Aware AllReduce~(GDRAA), in which both computation and handshake message are O(1) and the number of synchronization is two in each iteration that is a theoretical minimum number. Owe to the dedicated co-design distributed system, MiMatrix efficiently makes use of the Manoa's computational capacity and bandwidth. We benchmark Manoa Resnet50 and Resenet101 on Imagenet-1K dataset. Some of results have demonstrated state-of-the-art.

* 23 pages, 4 figures, 1 table
##### Deep Neural Network Capacity

Feb 18, 2018
Aosen Wang, Hua Zhou, Wenyao Xu, Xin Chen

In recent years, deep neural network exhibits its powerful superiority on information discrimination in many computer vision applications. However, the capacity of deep neural network architecture is still a mystery to the researchers. Intuitively, larger capacity of neural network can always deposit more information to improve the discrimination ability of the model. But, the learnable parameter scale is not feasible to estimate the capacity of deep neural network. Due to the overfitting, directly increasing hidden nodes number and hidden layer number are already demonstrated not necessary to effectively increase the network discrimination ability. In this paper, we propose a novel measurement, named "total valid bits", to evaluate the capacity of deep neural networks for exploring how to quantitatively understand the deep learning and the insights behind its super performance. Specifically, our scheme to retrieve the total valid bits incorporates the skilled techniques in both training phase and inference phase. In the network training, we design decimal weight regularization and 8-bit forward quantization to obtain the integer-oriented network representations. Moreover, we develop adaptive-bitwidth and non-uniform quantization strategy in the inference phase to find the neural network capacity, total valid bits. By allowing zero bitwidth, our adaptive-bitwidth quantization can execute the model reduction and valid bits finding simultaneously. In our extensive experiments, we first demonstrate that our total valid bits is a good indicator of neural network capacity. We also analyze the impact on network capacity from the network architecture and advanced training skills, such as dropout and batch normalization.

* There is an error in Average Valid Bits computation in figure 1 in page 2
##### Distributed learning with regularized least squares

Mar 11, 2017
Shao-Bo Lin, Xin Guo, Ding-Xuan Zhou

We study distributed learning with the least squares regularization scheme in a reproducing kernel Hilbert space (RKHS). By a divide-and-conquer approach, the algorithm partitions a data set into disjoint data subsets, applies the least squares regularization scheme to each data subset to produce an output function, and then takes an average of the individual output functions as a final global estimator or predictor. We show with error bounds in expectation in both the $L^2$-metric and RKHS-metric that the global output function of this distributed learning is a good approximation to the algorithm processing the whole data in one single machine. Our error bounds are sharp and stated in a general setting without any eigenfunction assumption. The analysis is achieved by a novel second order decomposition of operator differences in our integral operator approach. Even for the classical least squares regularization scheme in the RKHS associated with a general kernel, we give the best learning rate in the literature.

* 28 pages