Models, code, and papers for "Xin Zhou":

In this paper, we introduce the STN-Homography model to directly estimate the homography matrix between image pair. Different most CNN-based homography estimation methods which use an alternative 4-point homography parameterization, we use prove that, after coordinate normalization, the variance of elements of coordinate normalized $3\times3$ homography matrix is very small and suitable to be regressed well with CNN. Based on proposed STN-Homography, we use a hierarchical architecture which stacks several STN-Homography models and successively reduce the estimation error. Effectiveness of the proposed method is shown through experiments on MSCOCO dataset, in which it significantly outperforms the state-of-the-art. The average processing time of our hierarchical STN-Homography with 1 stage is only 4.87 ms on the GPU, and the processing time for hierarchical STN-Homography with 3 stages is 17.85 ms. The code will soon be open sourced.

Opinion polarization is a ubiquitous phenomenon in opinion dynamics. In contrast to the traditional consensus oriented group decision making (GDM) framework, this paper proposes a framework with the co-evolution of both opinions and relationship networks to improve the potential consensus level of a group and help the group reach a stable state. Taking the bound of confidence and the degree of individual's persistence into consideration, the evolution of the opinion is driven by the relationship among the group. Meanwhile, the antagonism or cooperation of individuals presented by the network topology also evolve according to the dynamic opinion distances. Opinions are convergent and the stable state will be reached in this co-evolution mechanism. We further explored this framework through simulation experiments. The simulation results verify the influence of the level of persistence on the time cost and indicate the influence of group size, the initial topology of networks and the bound of confidence on the number of opinion clusters.

The estimation of optimal treatment regimes is of considerable interest to precision medicine. In this work, we propose a causal $k$-nearest neighbor method to estimate the optimal treatment regime. The method roots in the framework of causal inference, and estimates the causal treatment effects within the nearest neighborhood. Although the method is simple, it possesses nice theoretical properties. We show that the causal $k$-nearest neighbor regime is universally consistent. That is, the causal $k$-nearest neighbor regime will eventually learn the optimal treatment regime as the sample size increases. We also establish its convergence rate. However, the causal $k$-nearest neighbor regime may suffer from the curse of dimensionality, i.e. performance deteriorates as dimensionality increases. To alleviate this problem, we develop an adaptive causal $k$-nearest neighbor method to perform metric selection and variable selection simultaneously. The performance of the proposed methods is illustrated in simulation studies and in an analysis of a chronic depression clinical trial.

A common issue for classification in scientific research and industry is the existence of imbalanced classes. When sample sizes of different classes are imbalanced in training data, naively implementing a classification method often leads to unsatisfactory prediction results on test data. Multiple resampling techniques have been proposed to address the class imbalance issues. Yet, there is no general guidance on when to use each technique. In this article, we provide an objective-oriented review of the common resampling techniques for binary classification under imbalanced class sizes. The learning objectives we consider include the classical paradigm that minimizes the overall classification error, the cost-sensitive learning paradigm that minimizes a cost-adjusted weighted type I and type II errors, and the Neyman-Pearson paradigm that minimizes the type II error subject to a type I error constraint. Under each paradigm, we investigate the combination of the resampling techniques and a few state-of-the-art classification methods. For each pair of resampling techniques and classification methods, we use simulation studies to study the performance under different evaluation metrics. From these extensive simulation experiments, we demonstrate under each classification paradigm, the complex dynamics among resampling techniques, base classification methods, evaluation metrics, and imbalance ratios. For practitioners, the take-away message is that with imbalanced data, one usually should consider all the combinations of resampling techniques and the base classification methods.

Search space is a key consideration for neural architecture search. Recently, Xie et al. (2019) found that randomly generated networks from the same distribution perform similarly, which suggests we should search for random graph distributions instead of graphs. We propose graphon as a new search space. A graphon is the limit of Cauchy sequence of graphs and a scale-free probabilistic distribution, from which graphs of different number of nodes can be drawn. By utilizing properties of the graphon space and the associated cut-distance metric, we develop theoretically motivated techniques that search for and scale up small-capacity stage-wise graphs found on small datasets to large-capacity graphs that can handle ImageNet. The scaled stage-wise graphs outperform DenseNet and randomly wired Watts-Strogatz networks, indicating the benefits of graphon theory in NAS applications.

In this note, we report the back propagation formula for complex valued singular value decompositions (SVD). This formula is an important ingredient for a complete automatic differentiation(AD) infrastructure in terms of complex numbers, and it is also the key to understand and utilize AD in tensor networks.

Multimodal machine translation is one of the applications that integrates computer vision and language processing. It is a unique task given that in the field of machine translation, many state-of-the-arts algorithms still only employ textual information. In this work, we explore the effectiveness of reinforcement learning in multimodal machine translation. We present a novel algorithm based on the Advantage Actor-Critic (A2C) algorithm that specifically cater to the multimodal machine translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We experiment our proposed algorithm on the Multi30K multilingual English-German image description dataset and the Flickr30K image entity dataset. Our model takes two channels of inputs, image and text, uses translation evaluation metrics as training rewards, and achieves better results than supervised learning MLE baseline models. Furthermore, we discuss the prospects and limitations of using reinforcement learning for machine translation. Our experiment results suggest a promising reinforcement learning solution to the general task of multimodal sequence to sequence learning.

Matrix completion has been well studied under the uniform sampling model and the trace-norm regularized methods perform well both theoretically and numerically in such a setting. However, the uniform sampling model is unrealistic for a range of applications and the standard trace-norm relaxation can behave very poorly when the underlying sampling scheme is non-uniform. In this paper we propose and analyze a max-norm constrained empirical risk minimization method for noisy matrix completion under a general sampling model. The optimal rate of convergence is established under the Frobenius norm loss in the context of approximately low-rank matrix reconstruction. It is shown that the max-norm constrained method is minimax rate-optimal and yields a unified and robust approximate recovery guarantee, with respect to the sampling distributions. The computational effectiveness of this method is also discussed, based on first-order algorithms for solving convex optimizations involving max-norm regularization.

We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed.

This paper investigates tradeoffs among optimization errors, statistical rates of convergence and the effect of heavy-tailed random errors for high-dimensional adaptive Huber regression with nonconvex regularization. When the additive errors in linear models have only bounded second moment, our results suggest that adaptive Huber regression with nonconvex regularization yields statistically optimal estimators that satisfy oracle properties as if the true underlying support set were known beforehand. Computationally, we need as many as O(log s + log log d) convex relaxations to reach such oracle estimators, where s and d denote the sparsity and ambient dimension, respectively. Numerical studies lend strong support to our methodology and theory.

Evolutionary algorithms (EAs) are heuristic algorithms inspired by natural evolution. They are often used to obtain satisficing solutions in practice. In this paper, we investigate a largely underexplored issue: the approximation performance of EAs in terms of how close the solution obtained is to an optimal solution. We study an EA framework named simple EA with isolated population (SEIP) that can be implemented as a single- or multi-objective EA. We analyze the approximation performance of SEIP using the partial ratio, which characterizes the approximation ratio that can be guaranteed. Specifically, we analyze SEIP using a set cover problem that is NP-hard. We find that in a simple configuration, SEIP efficiently achieves an $H_n$-approximation ratio, the asymptotic lower bound, for the unbounded set cover problem. We also find that SEIP efficiently achieves an $(H_k-\frac{k-1}/{8k^9})$-approximation ratio, the currently best-achievable result, for the k-set cover problem. Moreover, for an instance class of the k-set cover problem, we disclose how SEIP, using either one-bit or bit-wise mutation, can overcome the difficulty that limits the greedy algorithm.

This technical report provides detailed theoretical analysis of the algorithm used in \textit{Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight}. An assumption is provided to ensure that settings for the objective function are meaningful. What's more, we explore the structure of the optimization problem and analyze the global/local convergence rate of the employed algorithm.

Differentiable programming has emerged as a key programming paradigm empowering rapid developments of deep learning while its applications to important computational methods such as Monte Carlo remain largely unexplored. Here we present the general theory enabling infinite-order automatic differentiation on expectations computed by Monte Carlo with unnormalized probability distributions, which we call "automatic differentiable Monte Carlo" (ADMC). By implementing ADMC algorithms on computational graphs, one can also leverage state-of-the-art machine learning frameworks and techniques to traditional Monte Carlo applications in statistics and physics. We illustrate the versatility of ADMC by showing some applications: fast search of phase transitions and accurately finding ground states of interacting many-body models in two dimensions. ADMC paves a promising way to innovate Monte Carlo in various aspects to achieve higher accuracy and efficiency, e.g. easing or solving the sign problem of quantum many-body models through ADMC.

In the industrial domain, the pose estimation of multiple texture-less shiny parts is a valuable but challenging task. In this particular scenario, it is impractical to utilize keypoints or other texture information because most of them are not actual features of the target but the reflections of surroundings. Moreover, the similarity of color also poses a challenge in segmentation. In this article, we propose to divide the pose estimation process into three stages: object detection, features detection and pose optimization. A convolutional neural network was utilized to perform object detection. Concerning the reliability of surface texture, we leveraged the contour information for estimating pose. Since conventional contour-based methods are inapplicable to clustered metal parts due to the difficulties in segmentation, we use the dense discrete points along the metal part edges as semantic keypoints for contour detection. Afterward, we exploit both keypoint information and CAD model to calculate the 6D pose of each object in view. A typical implementation of deep learning methods not only requires a large amount of training data, but also relies on intensive human labor for labeling the datasets. Therefore, we propose an approach to generate datasets and label them automatically. Despite not using any real-world photos for training, a series of experiments showed that the algorithm built on synthetic data perform well in the real environment.

In this paper, we propose a variational approach to weakly supervised document-level multi-aspect sentiment classification. Instead of using user-generated ratings or annotations provided by domain experts, we use target-opinion word pairs as "supervision." These word pairs can be extracted by using dependency parsers and simple rules. Our objective is to predict an opinion word given a target word while our ultimate goal is to learn a sentiment polarity classifier to predict the sentiment polarity of each aspect given a document. By introducing a latent variable, i.e., the sentiment polarity, to the objective function, we can inject the sentiment polarity classifier to the objective via the variational lower bound. We can learn a sentiment polarity classifier by optimizing the lower bound. We show that our method can outperform weakly supervised baselines on TripAdvisor and BeerAdvocate datasets and can be comparable to the state-of-the-art supervised method with hundreds of labels per aspect.

Achieving good speed and accuracy trade-off on a target platform is very important in deploying deep neural networks in real world scenarios. However, most existing automatic architecture search approaches only concentrate on high performance. In this work, we propose an algorithm that can offer better speed/accuracy trade-off of searched networks, which is termed "Partial Order Pruning". It prunes the architecture search space with a partial order assumption to automatically search for the architectures with the best speed and accuracy trade-off. Our algorithm explicitly takes profile information about the inference speed on the target platform into consideration. With the proposed algorithm, we present several Dongfeng (DF) networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching decoder architectures, our DF-Seg real-time segmentation networks yield state-of-the-art speed/accuracy trade-off on both the target embedded device and the high-end GPU.

In this paper, we propose a novel approach for the rank minimization problem, termed rank residual constraint (RRC). Different from existing low-rank based approaches, such as the well-known weighted nuclear norm minimization (WNNM) and nuclear norm minimization (NNM), which aim to estimate the underlying low-rank matrix directly from the corrupted observation, we progressively approximate or approach the underlying low-rank matrix via minimizing the rank residual. By integrating the image nonlocal self-similarity (NSS) prior with the proposed RRC model, we develop an iterative algorithm for image denoising. To this end, we first present a recursive based nonlocal means method to obtain a good reference of the original image patch groups, and then the rank residual of the image patch groups between this reference and the noisy image is minimized to achieve a better estimate of the desired image. In this manner, both the reference and the estimated image in each iteration are improved gradually and jointly. Based on the group-based sparse representation model, we further provide a theoretical analysis on the feasibility of the proposed RRC model. Experimental results demonstrate that the proposed RRC model outperforms many state-of-the-art denoising methods in both the objective and perceptual qualities.

Large scale deep Convolution Neural Networks (CNNs) increasingly demands the computing power. It is key for researchers to own a great powerful computing platform to leverage deep learning (DL) advancing.On the other hand, as the commonly-used accelerator, the commodity GPUs cards of new generations are more and more expensive. Consequently, it is of importance to design an affordable distributed heterogeneous system that provides powerful computational capacity and develop a well-suited software that efficiently utilizes its computational capacity. In this paper, we present our co-design distributed system including a peta-scale GPU cluster, called "Manoa". Based on properties and topology of Manoa, we first propose job server framework and implement it, named "MiMatrix". The central node of MiMatrix, referred to as the job server, undertakes all of controlling, scheduling and monitoring, and I/O tasks without weight data transfer for AllReduce processing in each iteration. Therefore, MiMatrix intrinsically solves the bandwidth bottleneck of central node in parameter server framework that is widely used in distributed DL tasks. Meanwhile, we also propose a new AllReduce algorithm, GPUDirect RDMA-Aware AllReduce~(GDRAA), in which both computation and handshake message are O(1) and the number of synchronization is two in each iteration that is a theoretical minimum number. Owe to the dedicated co-design distributed system, MiMatrix efficiently makes use of the Manoa's computational capacity and bandwidth. We benchmark Manoa Resnet50 and Resenet101 on Imagenet-1K dataset. Some of results have demonstrated state-of-the-art.

In recent years, deep neural network exhibits its powerful superiority on information discrimination in many computer vision applications. However, the capacity of deep neural network architecture is still a mystery to the researchers. Intuitively, larger capacity of neural network can always deposit more information to improve the discrimination ability of the model. But, the learnable parameter scale is not feasible to estimate the capacity of deep neural network. Due to the overfitting, directly increasing hidden nodes number and hidden layer number are already demonstrated not necessary to effectively increase the network discrimination ability. In this paper, we propose a novel measurement, named "total valid bits", to evaluate the capacity of deep neural networks for exploring how to quantitatively understand the deep learning and the insights behind its super performance. Specifically, our scheme to retrieve the total valid bits incorporates the skilled techniques in both training phase and inference phase. In the network training, we design decimal weight regularization and 8-bit forward quantization to obtain the integer-oriented network representations. Moreover, we develop adaptive-bitwidth and non-uniform quantization strategy in the inference phase to find the neural network capacity, total valid bits. By allowing zero bitwidth, our adaptive-bitwidth quantization can execute the model reduction and valid bits finding simultaneously. In our extensive experiments, we first demonstrate that our total valid bits is a good indicator of neural network capacity. We also analyze the impact on network capacity from the network architecture and advanced training skills, such as dropout and batch normalization.

We study distributed learning with the least squares regularization scheme in a reproducing kernel Hilbert space (RKHS). By a divide-and-conquer approach, the algorithm partitions a data set into disjoint data subsets, applies the least squares regularization scheme to each data subset to produce an output function, and then takes an average of the individual output functions as a final global estimator or predictor. We show with error bounds in expectation in both the $L^2$-metric and RKHS-metric that the global output function of this distributed learning is a good approximation to the algorithm processing the whole data in one single machine. Our error bounds are sharp and stated in a general setting without any eigenfunction assumption. The analysis is achieved by a novel second order decomposition of operator differences in our integral operator approach. Even for the classical least squares regularization scheme in the RKHS associated with a general kernel, we give the best learning rate in the literature.