Models, code, and papers for "Xiaodong He":

Simulator imperfection, often known as model error, is ubiquitous in practical data assimilation problems. Despite the enormous efforts dedicated to addressing this problem, properly handling simulator imperfection in data assimilation remains to be a challenging task. In this work, we propose an approach to dealing with simulator imperfection from a point of view of functional approximation that can be implemented through a certain machine learning method, such as kernel-based learning adopted in the current work. To this end, we start from considering a class of supervised learning problems, and then identify similarities between supervised learning and variational data assimilation. These similarities found the basis for us to develop an ensemble-based learning framework to tackle supervised learning problems, while achieving various advantages of ensemble-based methods over the variational ones. After establishing the ensemble-based learning framework, we proceed to investigate the integration of ensemble-based learning into an ensemble-based data assimilation framework to handle simulator imperfection. In the course of our investigations, we also develop a strategy to tackle the issue of multi-modality in supervised-learning problems, and transfer this strategy to data assimilation problems to help improve assimilation performance. For demonstration, we apply the ensemble-based learning framework and the integrated, ensemble-based data assimilation framework to a supervised learning problem and a data assimilation problem with an imperfect forward simulator, respectively. The experiment results indicate that both frameworks achieve good performance in relevant case studies, and that functional approximation through machine learning may serve as a viable way to account for simulator imperfection in data assimilation problems.

In this paper, a new statistic feature of the discrete short-time amplitude spectrum is discovered by experiments for the signals of unvoiced pronunciation. For the random-varying short-time spectrum, this feature reveals the relationship between the amplitude's average and its standard for every frequency component. On the other hand, the association between the amplitude distributions for different frequency components is also studied. A new model representing such association is inspired by the normalized histogram of amplitude. By mathematical analysis, the new statistic feature discovered is proved to be necessary evidence which supports the proposed model, and also can be direct evidence for the widely used hypothesis of "identical distribution of amplitude for all frequencies".

We improve existing results in the field of compressed sensing and matrix completion when sampled data may be grossly corrupted. We introduce three new theorems. 1) In compressed sensing, we show that if the m \times n sensing matrix has independent Gaussian entries, then one can recover a sparse signal x exactly by tractable \ell1 minimimization even if a positive fraction of the measurements are arbitrarily corrupted, provided the number of nonzero entries in x is O(m/(log(n/m) + 1)). 2) In the very general sensing model introduced in "A probabilistic and RIPless theory of compressed sensing" by Candes and Plan, and assuming a positive fraction of corrupted measurements, exact recovery still holds if the signal now has O(m/(log^2 n)) nonzero entries. 3) Finally, we prove that one can recover an n \times n low-rank matrix from m corrupted sampled entries by tractable optimization provided the rank is on the order of O(m/(n log^2 n)); again, this holds when there is a positive fraction of corrupted samples.

We show that a character-level encoder-decoder framework can be successfully applied to question answering with a structured knowledge base. We use our model for single-relation question answering and demonstrate the effectiveness of our approach on the SimpleQuestions dataset (Bordes et al., 2015), where we improve state-of-the-art accuracy from 63.9% to 70.9%, without use of ensembles. Importantly, our character-level model has 16x fewer parameters than an equivalent word-level model, can be learned with significantly less data compared to previous work, which relies on data augmentation, and is robust to new entities in testing.

In recent years, different types of adversarial examples from different fields have emerged endlessly, including purely natural ones without perturbations. A variety of defenses are proposed and then broken quickly. Two fundamental questions need to be asked: What's the reason for the existence of adversarial examples and are adversarial examples unsolvable? In this paper, we will show the reason for the existence of adversarial examples is there are non-isomorphic natural explanations that can all explain data set. Specifically, for two natural explanations of being true and provable, G\"odel's sentence is an adversarial example but ineliminable. It can't be solved by the re-accumulation of data set or the re-improvement of learning algorithm. Finally, from the perspective of computability, we will prove the incomputability for adversarial examples, which are unrecognizable.

Evolutionary stochastic gradient descent (ESGD) was proposed as a population-based approach that combines the merits of gradient-aware and gradient-free optimization algorithms for superior overall optimization performance. In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR). In this variant, we assume the existence of a well-trained acoustic model and use it as an anchor in the parent population whose good "gene" will propagate in the evolution to the offsprings. We propose an ESGD algorithm leveraging the anchor models such that it guarantees the best fitness of the population will never degrade from the anchor model. Experiments on 50-hour Broadcast News (BN50) and 300-hour Switchboard (SWB300) show that the ESGD with anchors can further improve the loss and ASR performance over the existing well-trained acoustic models.

Mobile edge computing (MEC) emerges recently as a promising solution to relieve resource-limited mobile devices from computation-intensive tasks, which enables devices to offload workloads to nearby MEC servers and improve the quality of computation experience. Nevertheless, by considering a MEC system consisting of multiple mobile users with stochastic task arrivals and wireless channels in this paper, the design of computation offloading policies is challenging to minimize the long-term average computation cost in terms of power consumption and buffering delay. A deep reinforcement learning (DRL) based decentralized dynamic computation offloading strategy is investigated to build a scalable MEC system with limited feedback. Specifically, a continuous action space-based DRL approach named deep deterministic policy gradient (DDPG) is adopted to learn efficient computation offloading policies independently at each mobile user. Thus, powers of both local execution and task offloading can be adaptively allocated by the learned policies from each user's local observation of the MEC system. Numerical results are illustrated to demonstrate that efficient policies can be learned at each user, and performance of the proposed DDPG based decentralized strategy outperforms the conventional deep Q-network (DQN) based discrete power control strategy and some other greedy strategies with reduced computation cost. Besides, the power-delay tradeoff is also analyzed for both the DDPG based and DQN based strategies.

Social robots, also known as service or assistant robots, have been developed to improve the quality of human life in recent years. The design of socially capable and intelligent robots can vary, depending on the target user groups. In this work, we assess the effect of social robots' roles, functions, and communication approaches in the context of a social agent providing service or entertainment to users with developmental disabilities. In this paper, we describe an exploratory study of interface design for a social robot that assists people suffering from developmental disabilities. We developed series of prototypes and tested one in a user study that included three residents with various function levels. This entire study had been recorded for the following qualitative data analysis. Results show that each design factor played a different role in delivering information and in increasing engagement. We also note that some of the fundamental design principles that would work for ordinary users did not apply to our target user group. We conclude that social robots could benefit our target users, and acknowledge that these robots were not suitable for certain scenarios based on the feedback from our users.

Canonical correlation analysis (CCA) is a fundamental statistical tool for exploring the correlation structure between two sets of random variables. In this paper, motivated by recent success of applying CCA to learn low dimensional representations of high dimensional objects, we propose to quantify the estimation loss of CCA by the excess prediction loss defined through a prediction-after-dimension-reduction framework. Such framework suggests viewing CCA estimation as estimating the subspaces spanned by the canonical variates. Interestedly, the proposed error metrics derived from the excess prediction loss turn out to be closely related to the principal angles between the subspaces spanned by the population and sample canonical variates respectively. We characterize the non-asymptotic minimax rates under the proposed metrics, especially the dependency of the minimax rates on the key quantities including the dimensions, the condition number of the covariance matrices, the canonical correlations and the eigen-gap, with minimal assumptions on the joint covariance matrix. To the best of our knowledge, this is the first finite sample result that captures the effect of the canonical correlations on the minimax rates.

Kernel PCA is a widely used nonlinear dimension reduction technique in machine learning, but storing the kernel matrix is notoriously challenging when the sample size is large. Inspired by Yi et al. [2016], where the idea of partial matrix sampling followed by nonconvex optimization is proposed for matrix completion and robust PCA, we apply a similar approach to memory-efficient Kernel PCA. In theory, with no assumptions on the kernel matrix in terms of eigenvalues or eigenvectors, we established a model-free theory for the low-rank approximation based on any local minimum of the proposed objective function. As interesting byproducts, when the underlying positive semidefinite matrix is assumed to be low-rank and highly structured, corollaries of our main theorem improve the state-of-the-art results of Ge et al. [2016, 2017] for nonconvex matrix completion with no spurious local minima. Numerical experiments also show that our approach is competitive in terms of approximation accuracy compared to the well-known Nystr\"{o}m algorithm for Kernel PCA.

Minimizing the nuclear norm of a matrix has been shown to be very efficient in reconstructing a low-rank sampled matrix. Furthermore, minimizing the sum of nuclear norms of matricizations of a tensor has been shown to be very efficient in recovering a low-Tucker-rank sampled tensor. In this paper, we propose to recover a low-TT-rank sampled tensor by minimizing a weighted sum of nuclear norms of unfoldings of the tensor. We provide numerical results to show that our proposed method requires significantly less number of samples to recover to the original tensor in comparison with simply minimizing the sum of nuclear norms since the structure of the unfoldings in the TT tensor model is fundamentally different from that of matricizations in the Tucker tensor model.

This paper addresses the problem of predicting popularity of comments in an online discussion forum using reinforcement learning, particularly addressing two challenges that arise from having natural language state and action spaces. First, the state representation, which characterizes the history of comments tracked in a discussion at a particular point, is augmented to incorporate the global context represented by discussions on world events available in an external knowledge source. Second, a two-stage Q-learning framework is introduced, making it feasible to search the combinatorial action space while also accounting for redundancy among sub-actions. We experiment with five Reddit communities, showing that the two methods improve over previous reported results on this task.

We consider the problem of low canonical polyadic (CP) rank tensor completion. A completion is a tensor whose entries agree with the observed entries and its rank matches the given CP rank. We analyze the manifold structure corresponding to the tensors with the given rank and define a set of polynomials based on the sampling pattern and CP decomposition. Then, we show that finite completability of the sampled tensor is equivalent to having a certain number of algebraically independent polynomials among the defined polynomials. Our proposed approach results in characterizing the maximum number of algebraically independent polynomials in terms of a simple geometric structure of the sampling pattern, and therefore we obtain the deterministic necessary and sufficient condition on the sampling pattern for finite completability of the sampled tensor. Moreover, assuming that the entries of the tensor are sampled independently with probability $p$ and using the mentioned deterministic analysis, we propose a combinatorial method to derive a lower bound on the sampling probability $p$, or equivalently, the number of sampled entries that guarantees finite completability with high probability. We also show that the existing result for the matrix completion problem can be used to obtain a loose lower bound on the sampling probability $p$. In addition, we obtain deterministic and probabilistic conditions for unique completability. It is seen that the number of samples required for finite or unique completability obtained by the proposed analysis on the CP manifold is orders-of-magnitude lower than that is obtained by the existing analysis on the Grassmannian manifold.

In this paper, we analyze the fundamental conditions for low-rank tensor completion given the separation or tensor-train (TT) rank, i.e., ranks of unfoldings. We exploit the algebraic structure of the TT decomposition to obtain the deterministic necessary and sufficient conditions on the locations of the samples to ensure finite completability. Specifically, we propose an algebraic geometric analysis on the TT manifold that can incorporate the whole rank vector simultaneously in contrast to the existing approach based on the Grassmannian manifold that can only incorporate one rank component. Our proposed technique characterizes the algebraic independence of a set of polynomials defined based on the sampling pattern and the TT decomposition, which is instrumental to obtaining the deterministic condition on the sampling pattern for finite completability. In addition, based on the proposed analysis, assuming that the entries of the tensor are sampled independently with probability $p$, we derive a lower bound on the sampling probability $p$, or equivalently, the number of sampled entries that ensures finite completability with high probability. Moreover, we also provide the deterministic and probabilistic conditions for unique completability.

Based on the in-depth analysis of the essence and features of vague phenomena, this paper focuses on establishing the axiomatical foundation of membership degree theory for vague phenomena, presents an axiomatic system to govern membership degrees and their interconnections. On this basis, the concept of vague partition is introduced, further, the concept of fuzzy set introduced by Zadeh in 1965 is redefined based on vague partition from the perspective of axiomatization. The thesis defended in this paper is that the relationship among vague attribute values should be the starting point to recognize and model vague phenomena from a quantitative view.

Recently, dropout has seen increasing use in deep learning. For deep convolutional neural networks, dropout is known to work well in fully-connected layers. However, its effect in pooling layers is still not clear. This paper demonstrates that max-pooling dropout is equivalent to randomly picking activation based on a multinomial distribution at training time. In light of this insight, we advocate employing our proposed probabilistic weighted pooling, instead of commonly used max-pooling, to act as model averaging at test time. Empirical evidence validates the superiority of probabilistic weighted pooling. We also compare max-pooling dropout and stochastic pooling, both of which introduce stochasticity based on multinomial distributions at pooling stage.

Recently, dropout has seen increasing use in deep learning. For deep convolutional neural networks, dropout is known to work well in fully-connected layers. However, its effect in convolutional and pooling layers is still not clear. This paper demonstrates that max-pooling dropout is equivalent to randomly picking activation based on a multinomial distribution at training time. In light of this insight, we advocate employing our proposed probabilistic weighted pooling, instead of commonly used max-pooling, to act as model averaging at test time. Empirical evidence validates the superiority of probabilistic weighted pooling. We also empirically show that the effect of convolutional dropout is not trivial, despite the dramatically reduced possibility of over-fitting due to the convolutional architecture. Elaborately designing dropout training simultaneously in max-pooling and fully-connected layers, we achieve state-of-the-art performance on MNIST, and very competitive results on CIFAR-10 and CIFAR-100, relative to other approaches without data augmentation. Finally, we compare max-pooling dropout and stochastic pooling, both of which introduce stochasticity based on multinomial distributions at pooling stage.

Image composition is one of the most important applications in image processing. However, the inharmonious appearance between the spliced region and background degrade the quality of the image. Thus, we address the problem of Image Harmonization: Given a spliced image and the mask of the spliced region, we try to harmonize the "style'' of the pasted region with the background (non-spliced region). Previous approaches have been focusing on learning directly by the neural network. In this work, we start from an empirical observation: the differences can only be found in the spliced region between the spliced image and the harmonized result while they share the same semantic information and the appearance in the non-spliced region. Thus, in order to learn the feature map in the masked region and the others individually, we propose a novel attention module named Spatial-Separated Attention Module (S2AM). Furthermore, we design a novel image harmonization framework by inserting the S2AM in the coarser low-level features of the Unet structure in two different ways. Besides image harmonization, we make a big step for harmonizing the composite image without the specific mask under previous observation. The experiments show that the proposed S2AM performs better than other state-of-the-art attention modules in our task. Moreover, we demonstrate the advantages of our model against other state-of-the-art image harmonization methods via criteria from multiple points of view. Code is available at https://github.com/vinthony/s2am

A novel way of matching two images with shifting transformation is studied. The approach is based on the presentation of the virtual edge current in images, and also the study of virtual electromagnetic interaction between two related images inspired by electromagnetism. The edge current in images is proposed as a discrete simulation of the physical current, which is based on the significant edge line extracted by Canny-like edge detection. Then the virtual interaction of the edge currents between related images is studied by imitating the electro-magnetic interaction between current-carrying wires. Based on the virtual interaction force between two related images, a novel method is presented and applied in image matching for shifting transformation. The preliminary experimental results indicate the effectiveness of the proposed method.

A novel model for image segmentation is proposed, which is inspired by the carrier immigration mechanism in physical P-N junction. The carrier diffusing and drifting are simulated in the proposed model, which imitates the physical self-balancing mechanism in P-N junction. The effect of virtual carrier immigration in digital images is analyzed and studied by experiments on test images and real world images. The sign distribution of net carrier at the model's balance state is exploited for region segmentation. The experimental results for both test images and real-world images demonstrate self-adaptive and meaningful gathering of pixels to suitable regions, which prove the effectiveness of the proposed method for image region segmentation.