Research papers and code for "Rui Luo":
We propose a new sampler that integrates the protocol of parallel tempering with the Nos\'e-Hoover (NH) dynamics. The proposed method can efficiently draw representative samples from complex posterior distributions with multiple isolated modes in the presence of noise arising from stochastic gradient. It potentially facilitates deep Bayesian learning on large datasets where complex multimodal posteriors and mini-batch gradient are encountered.

Click to Read Paper and Get Code
We propose {graphical sure screening}, or GRASS, a very simple and computationally-efficient screening procedure for recovering the structure of a Gaussian graphical model in the high-dimensional setting. The GRASS estimate of the conditional dependence graph is obtained by thresholding the elements of the sample covariance matrix. The proposed approach possesses the sure screening property: with very high probability, the GRASS estimated edge set contains the true edge set. Furthermore, with high probability, the size of the estimated edge set is controlled. We provide a choice of threshold for GRASS that can control the expected false positive rate. We illustrate the performance of GRASS in a simulation study and on a gene expression data set, and show that in practice it performs quite competitively with more complex and computationally-demanding techniques for graph estimation.

Click to Read Paper and Get Code
In this paper, we propose a new sampler for Bayesian learning that can efficiently draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is done by simulating a collection of replicas in parallel with different temperatures. When evolving the Nos\'e-Hoover dynamics, the sampler adaptively neutralizes the mini-batch noise. To approximate the detailed balance, configuration exchange is performed periodically between adjacent replicas according to a noise-aware test of acceptance. While its effectiveness on complex multimodal posteriors has been illustrated by testing over synthetic distributions, experiments on deep Bayesian neural network learning have shown its significant improvements over strong baselines for image classification.

Click to Read Paper and Get Code
In recent year, tremendous strides have been made in face detection thanks to deep learning. However, most published face detectors deteriorate dramatically as the faces become smaller. In this paper, we present the Small Faces Attention (SFA) face detector to better detect faces with small scale. First, we propose a new scale-invariant face detection architecture which pays more attention to small faces, including 4-branch detection architecture and small faces sensitive anchor design. Second, feature maps fusion strategy is applied in SFA by partially combining high-level features into low-level features to further improve the ability of finding hard faces. Third, we use multi-scale training and testing strategy to enhance face detection performance in practice. Comprehensive experiments show that SFA significantly improves face detection performance, especially on small faces. Our real-time SFA face detector can run at 5 FPS on a single GPU as well as maintain high performance. Besides, our final SFA face detector achieves state-of-the-art detection performance on challenging face detection benchmarks, including WIDER FACE and FDDB datasets, with competitive runtime speed. Both our code and models will be available to the research community.

* 10 pages, 0 figures, 0 tables, 41 references
Click to Read Paper and Get Code
Volatility is a quantity of measurement for the price movements of stocks or options which indicates the uncertainty within financial markets. As an indicator of the level of risk or the degree of variation, volatility is important to analyse the financial market, and it is taken into consideration in various decision-making processes in financial activities. On the other hand, recent advancement in deep learning techniques has shown strong capabilities in modelling sequential data, such as speech and natural language. In this paper, we empirically study the applicability of the latest deep structures with respect to the volatility modelling problem, through which we aim to provide an empirical guidance for the theoretical analysis of the marriage between deep learning techniques and financial applications in the future. We examine both the traditional approaches and the deep sequential models on the task of volatility prediction, including the most recent variants of convolutional and recurrent networks, such as the dilated architecture. Accordingly, experiments with real-world stock price datasets are performed on a set of 1314 daily stock series for 2018 days of transaction. The evaluation and comparison are based on the negative log likelihood (NLL) of real-world stock price time series. The result shows that the dilated neural models, including dilated CNN and Dilated RNN, produce most accurate estimation and prediction, outperforming various widely-used deterministic models in the GARCH family and several recently proposed stochastic models. In addition, the high flexibility and rich expressive power are validated in this study.

* NIPS 2018, Workshop on Challenges and Opportunities for AI in Financial Services
Click to Read Paper and Get Code
In this paper, we show that the recent integration of statistical models with deep recurrent neural networks provides a new way of formulating volatility (the degree of variation of time series) models that have been widely used in time series analysis and prediction in finance. The model comprises a pair of complementary stochastic recurrent neural networks: the generative network models the joint distribution of the stochastic volatility process; the inference network approximates the conditional distribution of the latent variables given the observables. Our focus here is on the formulation of temporal dynamics of volatility over time under a stochastic recurrent neural network framework. Experiments on real-world stock price datasets demonstrate that the proposed model generates a better volatility estimation and prediction that outperforms stronge baseline methods, including the deterministic models, such as GARCH and its variants, and the stochastic MCMC-based models, and the Gaussian-process-based, on the average negative log-likelihood measure.

Click to Read Paper and Get Code
Recently, the rapid development of word embedding and neural networks has brought new inspiration to various NLP and IR tasks. In this paper, we describe a staged hybrid model combining Recurrent Convolutional Neural Networks (RCNN) with highway layers. The highway network module is incorporated in the middle takes the output of the bi-directional Recurrent Neural Network (Bi-RNN) module in the first stage and provides the Convolutional Neural Network (CNN) module in the last stage with the input. The experiment shows that our model outperforms common neural network models (CNN, RNN, Bi-RNN) on a sentiment analysis task. Besides, the analysis of how sequence length influences the RCNN with highway layers shows that our model could learn good representation for the long text.

* Neu-IR '16 SIGIR Workshop on Neural Information Retrieval
Click to Read Paper and Get Code
In this paper, we propose a novel sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for the purpose of multimodal Bayesian learning. It simulates a noisy dynamical system by incorporating both a continuously-varying tempering variable and the Nos\'e-Hoover thermostats. A significant benefit is that it is not only able to efficiently generate i.i.d. samples when the underlying posterior distributions are multimodal, but also capable of adaptively neutralising the noise arising from the use of mini-batches. While the properties of the approach have been studied using synthetic datasets, our experiments on three real datasets have also shown its performance gains over several strong baselines for Bayesian learning with various types of neural networks plunged in.

Click to Read Paper and Get Code
In this paper, we propose Hard Person Identity Mining (HPIM) that attempts to refine the hard example mining to improve the exploration efficacy in person re-identification. It is motivated by following observation: the more attributes some people share, the more difficult to separate their identities. Based on this observation, we develop HPIM via a transferred attribute describer, a deep multi-attribute classifier trained from the source noisy person attribute datasets. We encode each image into the attribute probabilistic description in the target person re-ID dataset. Afterwards in the attribute code space, we consider each person as a distribution to generate his view-specific attribute codes in different practical scenarios. Hence we estimate the person-specific statistical moments from zeroth to higher order, which are further used to calculate the central moment discrepancies between persons. Such discrepancy is a ground to choose hard identity to organize proper mini-batches, without concerning the person representation changing in metric learning. It presents as a complementary tool of hard example mining, which helps to explore the global instead of the local hard example constraint in the mini-batch built by randomly sampled identities. Extensive experiments on two person re-identification benchmarks validated the effectiveness of our proposed algorithm.

Click to Read Paper and Get Code
Humans are capable of attributing latent mental contents such as beliefs, or intentions to others. The social skill is critical in everyday life to reason about the potential consequences of their behaviors so as to plan ahead. It is known that humans use this reasoning ability recursively, i.e. considering what others believe about their own beliefs. In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policy, to which each agent finds the best response and then improve their own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenario when there is one Nash equilibrium. Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about how the opponents believe about what the agent believes. We expect our work to contribute a new idea of modeling the opponents to the multi-agent reinforcement learning community.

* ICLR 2019
Click to Read Paper and Get Code
Recognizing pedestrian attributes is an important task in computer vision community due to it plays an important role in video surveillance. Many algorithms has been proposed to handle this task. The goal of this paper is to review existing works using traditional methods or based on deep learning networks. Firstly, we introduce the background of pedestrian attributes recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criterion. Thirdly, we analyse the concept of multi-task learning and multi-label learning, and also explain the relations between these two learning algorithms and pedestrian attribute recognition. We also review some popular network architectures which have widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attributes group, part-based, \emph{etc}. Fifthly, we shown some applications which takes pedestrian attributes into consideration and achieve better performance. Finally, we summarized this paper and give several possible research directions for pedestrian attributes recognition. The project page of this paper can be found from the following website: \url{https://sites.google.com/view/ahu-pedestrianattributes/}.

* Check our project page for High Resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes/
Click to Read Paper and Get Code
The tracking-by-detection framework requires a set of positive and negative training samples to learn robust tracking models for precise localization of target objects. However, existing tracking models mostly treat different samples independently while ignores the relationship information among them. In this paper, we propose a novel structure-aware deep neural network to overcome such limitations. In particular, we construct a graph to represent the pairwise relationships among training samples, and additionally take the natural language as the supervised information to learn both feature representations and classifiers robustly. To refine the states of the target and re-track the target when it is back to view from heavy occlusion and out of view, we elaborately design a novel subnetwork to learn the target-driven visual attentions from the guidance of both visual and natural language cues. Extensive experiments on five tracking benchmark datasets validated the effectiveness of our proposed method.

Click to Read Paper and Get Code
Incorporating various modes of information into the machine learning procedure is becoming a new trend. And data from various source can provide more information than single one no matter they are heterogeneous or homogeneous. Existing deep learning based algorithms usually directly concatenate features from each domain to represent the input data. Seldom of them take the quality of data into consideration which is a key issue in related multimodal problems. In this paper, we propose an efficient quality-aware deep neural network to model the weight of data from each domain using deep reinforcement learning (DRL). Specifically, we take the weighting of each domain as a decision-making problem and teach an agent learn to interact with the environment. The agent can tune the weight of each domain through discrete action selection and obtain a positive reward if the saliency results are improved. The target of the agent is to achieve maximum rewards after finished its sequential action selection. We validate the proposed algorithms on multimodal saliency detection in a coarse-to-fine way. The coarse saliency maps are generated from an encoder-decoder framework which is trained with content loss and adversarial loss. The final results can be obtained via adaptive weighting of maps from each domain. Experiments conducted on two kinds of salient object detection benchmarks validated the effectiveness of our proposed quality-aware deep neural network.

Click to Read Paper and Get Code
Existing multi-agent reinforcement learning methods are limited typically to a small number of agents. When the agent number increases largely, the learning becomes intractable due to the curse of the dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result to solve the Ising model via model-free reinforcement learning methods.

* ICML 2018 (Full paper + Long talk)
Click to Read Paper and Get Code
The success of many natural language processing (NLP) tasks is bound by the number and quality of annotated data, but there is often a shortage of such training data. In this paper, we ask the question: "Can we combine a neural network (NN) with regular expressions (RE) to improve supervised learning for NLP?". In answer, we develop novel methods to exploit the rich expressiveness of REs at different levels within a NN, showing that the combination significantly enhances the learning effectiveness when a small number of training examples are available. We evaluate our approach by applying it to spoken language understanding for intent detection and slot filling. Experimental results show that our approach is highly effective in exploiting the available training data, giving a clear boost to the RE-unaware NN.

* 11 Pages, 2 Figures, Accepted by ACL 2018
Click to Read Paper and Get Code
Person re-identification plays an important role in realistic video surveillance with increasing demand for public safety. In this paper, we propose a novel framework with rules of updating images for person re-identification in real-world surveillance system. First, Image Pool is generated by using mean-shift tracking method to automatically select video frame fragments of the target person. Second, features extracted from Image Pool by convolutional network work together to re-rank original ranking list of the main image and matching results will be generated. In addition, updating rules are designed for replacing images in Image Pool when a new image satiating with our updating critical formula in video system. These rules fall into two categories: if the new image is from the same camera as the previous updated image, it will replace one of assist images; otherwise, it will replace the main image directly. Experiments are conduced on Market-1501, iLIDS-VID and PRID-2011 and our ITSD datasets to validate that our framework outperforms on rank-1 accuracy and mAP for person re-identification. Furthermore, the update ability of our framework provides consistently remarkable accuracy rate in real-world surveillance system.

* 23 pages,5 figures. submitted to JVCI
Click to Read Paper and Get Code
Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction. We show that the dynamic transition matrix can effectively characterize the noise in the training data built by distant supervision. The transition matrix can be effectively trained using a novel curriculum learning based method without any direct supervision about the noise. We thoroughly evaluate our approach under a wide range of extraction scenarios. Experimental results show that our approach consistently improves the extraction results and outperforms the state-of-the-art in various evaluation scenarios.

* 10 pages, accepted by ACL 2017
Click to Read Paper and Get Code
A fundamental trade-off between effectiveness and efficiency needs to be balanced when designing an online question answering system. Effectiveness comes from sophisticated functions such as extractive machine reading comprehension (MRC), while efficiency is obtained from improvements in preliminary retrieval components such as candidate document selection and paragraph ranking. Given the complexity of the real-world multi-document MRC scenario, it is difficult to jointly optimize both in an end-to-end system. To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension. Specifically, irrelevant documents and paragraphs are first filtered out with simple functions for efficiency consideration. Then we jointly train three modules on the remaining texts for better tracking the answer: the document extraction, the paragraph extraction and the answer extraction. Experiment results show that the proposed method outperforms the previous state-of-the-art methods on two large-scale multi-document benchmark datasets, i.e., TriviaQA and DuReader. In addition, our online system can stably serve typical scenarios with millions of daily requests in less than 50ms.

* Accepted at AAAI 2019
Click to Read Paper and Get Code
As architecture, system, data management, and machine learning communities pay greater attention to innovative big data and data-driven artificial intelligence (in short, AI) algorithms, architecture, and systems, the pressure of benchmarking rises. However, complexity, diversity, frequently changed workloads, and rapid evolution of big data, especially AI systems raise great challenges in benchmarking. First, for the sake of conciseness, benchmarking scalability, portability cost, reproducibility, and better interpretation of performance data, we need understand what are the abstractions of frequently-appearing units of computation, which we call dwarfs, among big data and AI workloads. Second, for the sake of fairness, the benchmarks must include diversity of data and workloads. Third, for co-design of software and hardware, the benchmarks should be consistent across different communities. Other than creating a new benchmark or proxy for every possible workload, we propose using dwarf-based benchmarks--the combination of eight dwarfs--to represent diversity of big data and AI workloads. The current version--BigDataBench 4.0 provides 13 representative real-world data sets and 47 big data and AI benchmarks, including seven workload types: online service, offline analytics, graph analytics, AI, data warehouse, NoSQL, and streaming. BigDataBench 4.0 is publicly available from http://prof.ict.ac.cn/BigDataBench. Also, for the first time, we comprehensively characterize the benchmarks of seven workload types in BigDataBench 4.0 in addition to traditional benchmarks like SPECCPU, PARSEC and HPCC in a hierarchical manner and drill down on five levels, using the Top-Down analysis from an architecture perspective.

* 24 pages, 10 figures
Click to Read Paper and Get Code