Models, code, and papers for "Xi Chen":

The estimation of unknown values of parameters (or hidden variables, control variables) that characterise a physical system often relies on the comparison of measured data with synthetic data produced by some numerical simulator of the system as the parameter values are varied. This process often encounters two major difficulties: the generation of synthetic data for each considered set of parameter values can be computationally expensive if the system model is complicated; and the exploration of the parameter space can be inefficient and/or incomplete, a typical example being when the exploration becomes trapped in a local optimum of the objection function that characterises the mismatch between the measured and synthetic data. A method to address both these issues is presented, whereby: a surrogate model (or proxy), which emulates the computationally expensive system simulator, is constructed using deep recurrent networks (DRN); and a nested sampling (NS) algorithm is employed to perform efficient and robust exploration of the parameter space. The analysis is performed in a Bayesian context, in which the samples characterise the full joint posterior distribution of the parameters, from which parameter estimates and uncertainties are easily derived. The proposed approach is compared with conventional methods in some numerical examples, for which the results demonstrate that one can accelerate the parameter estimation process by at least an order of magnitude.

Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by learning directly from image input. A deep neural network is used as a function approximator and requires no specific state information. However, one drawback of using only images as input is that this approach requires a prohibitively large amount of training time and data for the model to learn the state feature representation and approach reasonable performance. This is not feasible in real-world applications, especially when the data are expansive and training phase could introduce disasters that affect human safety. In this work, we use a human demonstration approach to speed up training for learning features and use the resulting pre-trained model to replace the neural network in the deep RL Deep Q-Network (DQN), followed by human interaction to further refine the model. We empirically evaluate our approach by using only a human demonstration model and modified DQN with human demonstration model included in the Microsoft AirSim car simulator. Our results show that (1) pre-training with human demonstration in a supervised learning approach is better and much faster at discovering features than DQN alone, (2) initializing the DQN with a pre-trained model provides a significant improvement in training time and performance even with limited human demonstration, and (3) providing the ability for humans to supply suggestions during DQN training can speed up the network's convergence on an optimal policy, as well as allow it to learn more complex policies that are harder to discover by random exploration.

In this short note we consider a dynamic assortment planning problem under the capacitated multinomial logit (MNL) bandit model. We prove a tight lower bound on the accumulated regret that matches existing regret upper bounds for all parameters (time horizon $T$, number of items $N$ and maximum assortment capacity $K$) up to logarithmic factors. Our results close an $O(\sqrt{K})$ gap between upper and lower regret bounds from existing works.

The gesture recognition using motion capture data and depth sensors has recently drawn more attention in vision recognition. Currently most systems only classify dataset with a couple of dozens different actions. Moreover, feature extraction from the data is often computational complex. In this paper, we propose a novel system to recognize the actions from skeleton data with simple, but effective, features using deep neural networks. Features are extracted for each frame based on the relative positions of joints (PO), temporal differences (TD), and normalized trajectories of motion (NT). Given these features a hybrid multi-layer perceptron is trained, which simultaneously classifies and reconstructs input data. We use deep autoencoder to visualize learnt features, and the experiments show that deep neural networks can capture more discriminative information than, for instance, principal component analysis can. We test our system on a public database with 65 classes and more than 2,000 motion sequences. We obtain an accuracy above 95% which is, to our knowledge, the state of the art result for such a large dataset.

We consider the dynamic assortment optimization problem under the multinomial logit model (MNL) with unknown utility parameters. The main question investigated in this paper is model mis-specification under the $\varepsilon$-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length $T$, we assume that customers make purchases according to a well specified underlying multinomial logit choice model in a ($1-\varepsilon$)-fraction of the time periods, and make arbitrary purchasing decisions instead in the remaining $\varepsilon$-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active elimination strategy. We establish both upper and lower bounds on the regret, and show that our policy is optimal up to logarithmic factor in T when the assortment capacity is constant. Furthermore, we develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter $\varepsilon$. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds (UCB) and Thompson sampling.

Priors in Bayesian analyses often encode informative domain knowledge that can be useful in making the inference process more efficient. Occasionally, however, priors may be unrepresentative of the parameter values for a given dataset, which can result in inefficient parameter space exploration, or even incorrect inferences, particularly for nested sampling (NS) algorithms. Simply broadening the prior in such cases may be inappropriate or impossible in some applications. Hence a previous solution of this problem, known as posterior repartitioning (PR), redefines the prior and likelihood while keeping their product fixed, so that the posterior inferences and evidence estimates remain unchanged, but the efficiency of the NS process is significantly increased. In its most practical form, PR raises the prior to some power $\beta$, which is introduced as an auxiliary variable that must be determined on a case-by-case basis, usually by lowering $\beta$ from unity according to some pre-defined `annealing schedule' until the resulting inferences converge to a consistent solution. We present here an alternative Bayesian `automated PR' method, in which $\beta$ is instead treated as a hyperparameter that is inferred from the data alongside the original parameters of the problem, and then marginalised over to obtain the final inference. We show through numerical examples that this approach provides a robust and efficient `hands-off' solution to addressing the issue of unrepresentative priors in Bayesian inference using NS. Moreover, we show that for problems with representative priors the method has a negligible computational overhead relative to standard nesting sampling, which suggests that it should be used in as a matter of course in all NS analyses.

To guarantee the safe and efficient motion planning of autonomous driving under dynamic traffic environment, the autonomous vehicle should be equipped with not only the optimal but also a long term efficient policy to deal with complex scenarios. The first challenge is that to acquire the optimal planning trajectory means to sacrifice the planning efficiency. The second challenge is that most search based planning method cannot find the desired trajectory in extreme scenario. In this paper, we propose a data driven approach for motion planning to solve the above challenges. We transform the lane change mission into Mixed Integer Quadratic Problem with logical constraints, allowing the planning module to provide feasible, safe and comfortable actions in more complex scenario. Furthermore, we propose a hierarchical learning structure to guarantee online, fast and more generalized motion planning. Our approach's performance is demonstrated in the simulated lane change scenario and compared with related planning method.

Compared with other semantic segmentation tasks, portrait segmentation requires both higher precision and faster inference speed. However, this problem has not been well studied in previous works. In this paper, we propose a lightweight network architecture, called Boundary-Aware Network (BANet) which selectively extracts detail information in boundary area to make high-quality segmentation output with real-time( >25FPS) speed. In addition, we design a new loss function called refine loss which supervises the network with image level gradient information. Our model is able to produce finer segmentation results which has richer details than annotations.

With the increased affordability and availability of whole-genome sequencing, large-scale and high-throughput gene expression is widely used to characterize diseases, including cancers. However, establishing specificity in cancer diagnosis using gene expression data continues to pose challenges due to the high dimensionality and complexity of the data. Here we present models of deep learning (DL) and apply them to gene expression data for the diagnosis and categorization of cancer. In this study, we have developed two DL models using messenger ribonucleic acid (mRNA) datasets available from the Genomic Data Commons repository. Our models achieved 98% accuracy in cancer detection, with false negative and false positive rates below 1.7%. In our results, we demonstrated that 18 out of 32 cancer-typing classifications achieved more than 90% accuracy. Due to the limitation of a small sample size (less than 50 observations), certain cancers could not achieve a higher accuracy in typing classification, but still achieved high accuracy for the cancer detection task. To validate our models, we compared them with traditional statistical models. The main advantage of our models over traditional cancer detection is the ability to use data from various cancer types to automatically form features to enhance the detection and diagnosis of a specific cancer type.

This paper studies distributed estimation and inference for a general statistical problem with a convex loss that could be non-differentiable. For the purpose of efficient computation, we restrict ourselves to stochastic first-order optimization, which enjoys low per-iteration complexity. To motivate the proposed method, we first investigate the theoretical properties of a straightforward Divide-and-Conquer Stochastic Gradient Descent (DC-SGD) approach. Our theory shows that there is a restriction on the number of machines and this restriction becomes more stringent when the dimension $p$ is large. To overcome this limitation, this paper proposes a new multi-round distributed estimation procedure that approximates the Newton step only using stochastic subgradient. The key component in our method is the proposal of a computationally efficient estimator of $\Sigma^{-1} w$, where $\Sigma$ is the population Hessian matrix and $w$ is any given vector. Instead of estimating $\Sigma$ (or $\Sigma^{-1}$) that usually requires the second-order differentiability of the loss, the proposed First-Order Newton-type Estimator (FONE) directly estimates the vector of interest $\Sigma^{-1} w$ as a whole and is applicable to non-differentiable losses. Our estimator also facilitates the inference for the empirical risk minimizer. It turns out that the key term in the limiting covariance has the form of $\Sigma^{-1} w$, which can be estimated by FONE.

In this paper, we study the dynamic assortment optimization problem under a finite selling season of length $T$. At each time period, the seller offers an arriving customer an assortment of substitutable products under a cardinality constraint, and the customer makes the purchase among offered products according to a discrete choice model. Most existing work associates each product with a real-valued fixed mean utility and assumes a multinomial logit choice (MNL) model. In many practical applications, feature/contexutal information of products is readily available. In this paper, we incorporate the feature information by assuming a linear relationship between the mean utility and the feature. In addition, we allow the feature information of products to change over time so that the underlying choice model can also be non-stationary. To solve the dynamic assortment optimization under this changing contextual MNL model, we need to simultaneously learn the underlying unknown coefficient and makes the decision on the assortment. To this end, we develop an upper confidence bound (UCB) based policy and establish the regret bound on the order of $\widetilde O(d\sqrt{T})$, where $d$ is the dimension of the feature and $\widetilde O$ suppresses logarithmic dependence. We further established the lower bound $\Omega(d\sqrt{T}/K)$ where $K$ is the cardinality constraint of an offered assortment, which is usually small. When $K$ is a constant, our policy is optimal up to logarithmic factors. In the exploitation phase of the UCB algorithm, we need to solve a combinatorial optimization for assortment optimization based on the learned information. We further develop an approximation algorithm and an efficient greedy heuristic. The effectiveness of the proposed policy is further demonstrated by our numerical studies.

This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the na\"ive divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch, and then aggregates the estimators via averaging. However, this method only works when $n=o(m^2)$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem under distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.

Two of the leading approaches for model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work, however, it is not well-understood why they work, since empirically, the $Q$-values they estimate are very inaccurate. A partial explanation may be that $Q$-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, that "soft" (entropy-regularized) $Q$-learning is exactly equivalent to a policy gradient method. We also point out a connection between $Q$-learning methods and natural policy gradient methods. Experimentally, we explore the entropy-regularized versions of $Q$-learning and policy gradients, and we find them to perform as well as (or slightly better than) the standard variants on the Atari benchmark. We also show that the equivalence holds in practical settings by constructing a $Q$-learning method that closely matches the learning dynamics of A3C without using a target network or $\epsilon$-greedy exploration schedule.

With the development of dialog techniques, conversational search has attracted more and more attention as it enables users to interact with the search engine in a natural and efficient manner. However, comparing with the natural language understanding in traditional task-oriented dialog which focuses on slot filling and tracking, the query understanding in E-commerce conversational search is quite different and more challenging due to more diverse user expressions and complex intentions. In this work, we define the real-world problem of query tracking in E-commerce conversational search, in which the goal is to update the internal query after each round of interaction. We also propose a self attention based neural network to handle the task in a machine comprehension perspective. Further more we build a novel E-commerce query tracking dataset from an operational E-commerce Search Engine, and experimental results on this dataset suggest that our proposed model outperforms several baseline methods by a substantial gain for Exact Match accuracy and F1 score, showing the potential of machine comprehension like model for this task.

We study a stylized dynamic assortment planning problem during a selling season of finite length $T$, by considering a nested multinomial logit model with $M$ nests and $N$ items per nest. Our policy simultaneously learns customers' choice behavior and makes dynamic decisions on assortments based on the current knowledge. It achieves the regret at the order of $\tilde{O}(\sqrt{MNT}+MN^2)$, where $M$ is the number of nests and $N$ is the number of products in each nest. We further provide a lower bound result of $\Omega(\sqrt{MT})$, which shows the optimality of the upper bound when $T>M$ and $N$ is small. However, the $N^2$ term in the upper bound is not ideal for applications where $N$ is large as compared to $T$. To address this issue, we further generalize our first policy by introducing a discretization technique, which leads to a regret of $\tilde{O}(\sqrt{M}T^{2/3}+MNT^{1/3})$ with a specific choice of discretization granularity. It improves the previous regret bound whenever $N>T^{1/3}$. We provide numerical results to demonstrate the empirical performance of both proposed policies.

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of $O(\sqrt{T\log\log T})$, which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.

We consider a non-stationary sequential stochastic optimization problem, in which the underlying cost functions change over time under a variation budget constraint. We propose an $L_{p,q}$-variation functional to quantify the change, which yields less variation for dynamic function sequences whose changes are constrained to short time periods or small subsets of input domain. Under the $L_{p,q}$-variation constraint, we derive both upper and matching lower regret bounds for smooth and strongly convex function sequences, which generalize previous results in Besbes et al. (2015). Furthermore, we provide an upper bound for general convex function sequences with noisy gradient feedback, which matches the optimal rate as $p\to\infty$. Our results reveal some surprising phenomena under this general variation functional, such as the curse of dimensionality of the function domain. The key technical novelties in our analysis include affinity lemmas that characterize the distance of the minimizers of two convex functions with bounded Lp difference, and a cubic spline based construction that attains matching lower bounds.

It is an important task to reliably detect and track multiple moving objects for video surveillance and monitoring. However, when occlusion occurs in nonlinear motion scenarios, many existing methods often fail to continuously track multiple moving objects of interest. In this paper we propose an effective approach for detection and tracking of multiple moving objects with occlusion. Moving targets are initially detected using a simple yet efficient block matching technique, providing rough location information for multiple object tracking. More accurate location information is then estimated for each moving object by a nonlinear tracking algorithm. Considering the ambiguity caused by the occlusion among multiple moving objects, we apply an unscented Kalman filtering (UKF) technique for reliable object detection and tracking. Different from conventional Kalman filtering (KF), which cannot achieve the optimal estimation in nonlinear tracking scenarios, UKF can be used to track both linear and nonlinear motions due to the unscented transform. Further, it estimates the velocity information for each object to assist to the object detection algorithm, effectively delineating multiple moving objects of occlusion. The experimental results demonstrate that the proposed method can correctly detect and track multiple moving objects with nonlinear motion patterns and occlusions.

We study the active learning problem of top-$k$ ranking from multi-wise comparisons under the popular multinomial logit model. Our goal is to identify the top-$k$ items with high probability by adaptively querying sets for comparisons and observing the noisy output of the most preferred item from each comparison. To achieve this goal, we design a new active ranking algorithm without using any information about the underlying items' preference scores. We also establish a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm. These two results together show that the proposed algorithm is nearly instance optimal (similar to instance optimal [FLN03], but up to polylog factors). Our work extends the existing literature on rank aggregation in three directions. First, instead of studying a static problem with fixed data, we investigate the top-$k$ ranking problem in an active learning setting. Second, we show our algorithm is nearly instance optimal, which is a much stronger theoretical guarantee. Finally, we extend the pairwise comparison to the multi-wise comparison, which has not been fully explored in ranking literature.

Rank aggregation based on pairwise comparisons over a set of items has a wide range of applications. Although considerable research has been devoted to the development of rank aggregation algorithms, one basic question is how to efficiently collect a large amount of high-quality pairwise comparisons for the ranking purpose. Because of the advent of many crowdsourcing services, a crowd of workers are often hired to conduct pairwise comparisons with a small monetary reward for each pair they compare. Since different workers have different levels of reliability and different pairs have different levels of ambiguity, it is desirable to wisely allocate the limited budget for comparisons among the pairs of items and workers so that the global ranking can be accurately inferred from the comparison results. To this end, we model the active sampling problem in crowdsourced ranking as a Bayesian Markov decision process, which dynamically selects item pairs and workers to improve the ranking accuracy under a budget constraint. We further develop a computationally efficient sampling policy based on knowledge gradient as well as a moment matching technique for posterior approximation. Experimental evaluations on both synthetic and real data show that the proposed policy achieves high ranking accuracy with a lower labeling cost.