Research papers and code for "Yi Mao":
Incorporating domain knowledge into the modeling process is an effective way to improve learning accuracy. However, as it is provided by humans, domain knowledge can only be specified with some degree of uncertainty. We propose to explicitly model such uncertainty through probabilistic constraints over the parameter space. In contrast to hard parameter constraints, our approach remains effective even when the domain knowledge is inaccurate, and generally results in superior modeling accuracy. We focus on generative and conditional modeling where the parameters are assigned a Dirichlet or Gaussian prior, and demonstrate the framework with experiments on both synthetic and real-world data.
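
A minimal sketch of the general idea (not the paper's exact formulation): uncertain domain knowledge about multinomial parameters can be encoded as a Dirichlet prior whose mean carries the expert's guess and whose concentration `s` encodes how much we trust it. The function name and the prior construction here are illustrative assumptions.

```python
# Soft domain knowledge as a Dirichlet prior: MAP estimation trades off
# observed counts against the (possibly inaccurate) expert belief.
import numpy as np

def map_multinomial(counts, prior_mean, s):
    """MAP estimate under Dirichlet(alpha) with alpha = s * prior_mean + 1."""
    alpha = s * np.asarray(prior_mean) + 1.0
    return (counts + alpha - 1.0) / (counts.sum() + (alpha - 1.0).sum())

counts = np.array([2.0, 1.0, 0.0])           # sparse observed data
prior_mean = np.array([0.2, 0.3, 0.5])       # uncertain expert belief
print(map_multinomial(counts, prior_mean, s=10.0))  # pulled toward the prior
print(map_multinomial(counts, prior_mean, s=0.0))   # reduces to the MLE
```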

* Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method is able to efficiently hypothesize the semantic meaning of new words and add them to its word dictionary so that they can be used to describe images that contain these novel concepts. Our method has an image captioning module based on m-RNN with several improvements. In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task. We propose methods to prevent overfitting to the new concepts. In addition, three novel concept datasets are constructed for this new task. In the experiments, we show that our method effectively learns novel visual concepts from a few examples without disturbing the previously learned concepts. The project page is http://www.stat.ucla.edu/~junhua.mao/projects/child_learning.html
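
A hedged sketch of the transposed-weight-sharing idea: the output (word-prediction) layer reuses the transpose of the word-embedding matrix, so learning a new concept mainly means learning one new embedding row. The class name and sizes are illustrative, and details differ from the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TiedDecoder(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def encode(self, word_ids):
        return self.embed(word_ids)

    def logits(self, hidden):
        # Output projection shares the transpose of the embedding weights.
        return hidden @ self.embed.weight.t()

dec = TiedDecoder(vocab_size=1000, dim=64)
h = torch.randn(2, 64)          # hidden states from the captioner
print(dec.logits(h).shape)      # torch.Size([2, 1000])
```

With tied weights, adding a novel word only grows the embedding table, which is one reason the scheme suits few-shot concept learning.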

* ICCV 2015 camera ready version. We add many more novel visual concepts to the NVC dataset and have released it; see http://www.stat.ucla.edu/~junhua.mao/projects/child_learning.html
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieve significant performance improvements over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. The project page of this work is www.stat.ucla.edu/~junhua.mao/m-RNN.html.
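
A minimal sketch of the multimodal fusion described above (layer sizes are illustrative, not the paper's exact setup): the recurrent state, the current word embedding, and the CNN image feature are projected into one multimodal layer that feeds the next-word softmax.

```python
import torch
import torch.nn as nn

class MultimodalLayer(nn.Module):
    def __init__(self, word_dim=256, rnn_dim=256, img_dim=4096,
                 m_dim=512, vocab=10000):
        super().__init__()
        self.w_proj = nn.Linear(word_dim, m_dim)
        self.r_proj = nn.Linear(rnn_dim, m_dim)
        self.i_proj = nn.Linear(img_dim, m_dim)
        self.out = nn.Linear(m_dim, vocab)

    def forward(self, word_emb, rnn_state, img_feat):
        # Fuse the three modalities in a shared multimodal space.
        m = torch.tanh(self.w_proj(word_emb) + self.r_proj(rnn_state)
                       + self.i_proj(img_feat))
        return self.out(m)      # logits for the next word

layer = MultimodalLayer()
logits = layer(torch.randn(1, 256), torch.randn(1, 256), torch.randn(1, 4096))
print(logits.shape)             # torch.Size([1, 10000])
```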

* ICLR 2015
* Add a simple strategy that significantly boosts image captioning performance. More details are in Section 8 of the paper. The code and related data are available at https://github.com/mjhucla/mRNN-CR . arXiv admin note: substantial text overlap with arXiv:1410.1090
Text documents are complex high dimensional objects. To effectively visualize such data it is important to reduce its dimensionality and visualize the low dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore dimensionality reduction methods that draw upon domain knowledge in order to achieve a better low dimensional embedding and visualization of documents. We consider the use of geometries specified manually by an expert, geometries derived automatically from corpus statistics, and geometries computed from linguistic resources.

* 13 pages, 15 figures
Predicting the popularity of online videos is important for video streaming content providers. This is a challenging problem for two reasons. First, the problem is both "wide" and "deep": it not only depends on a wide range of features, but is also highly non-linear and complex. Second, multiple competitors may be involved. In this paper, we propose a general prediction model combining a multi-task learning (MTL) module with a relation network (RN) module, where MTL reduces over-fitting and RN models the relations among multiple competitors. Experimental results show that our proposed approach with the RN and MTL modules significantly improves the accuracy of predicting the total view counts of TV series.
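
A speculative sketch of how the two named modules could be wired together (sizes and wiring are illustrative assumptions, not the paper's architecture): a shared encoder with per-task heads for multi-task learning, plus a relation module that aggregates pairwise interactions among competing series.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(32, 64), nn.ReLU())    # shared bottom (MTL)
heads = nn.ModuleList([nn.Linear(64 + 16, 1) for _ in range(3)])  # 3 tasks
rel = nn.Sequential(nn.Linear(128, 16), nn.ReLU())   # relation module (RN)

feats = torch.randn(5, 32)                           # 5 competing series
h = enc(feats)
# All ordered pairs of competitor representations, scored by the RN.
pairs = torch.cat([h.unsqueeze(0).expand(5, 5, 64),
                   h.unsqueeze(1).expand(5, 5, 64)], dim=-1)
context = rel(pairs.reshape(-1, 128)).reshape(5, 5, 16).mean(dim=1)
preds = [head(torch.cat([h, context], dim=-1)) for head in heads]
print(preds[0].shape)                                # torch.Size([5, 1])
```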

In this paper, we propose a low-rank representation with symmetric constraint (LRRSC) method for robust subspace clustering. Given a collection of data points approximately drawn from multiple subspaces, the proposed technique can simultaneously recover the dimension and members of each subspace. LRRSC extends the original low-rank representation algorithm by integrating a symmetric constraint into the low-rankness property of high-dimensional data representation. The symmetric low-rank representation, which preserves the subspace structures of high-dimensional data, guarantees weight consistency for each pair of data points so that highly correlated data points of subspaces are represented together. Moreover, it can be efficiently calculated by solving a convex optimization problem. We provide a rigorous proof for minimizing the nuclear-norm regularized least squares problem with a symmetric constraint. The affinity matrix for spectral clustering can be obtained by further exploiting the angular information of the principal directions of the symmetric low-rank representation. This is a critical step towards evaluating the memberships between data points. Experimental results on benchmark databases demonstrate the effectiveness and robustness of LRRSC compared with several state-of-the-art subspace clustering algorithms.
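
A sketch of the affinity-construction step described above, assuming a symmetric low-rank representation Z has already been computed: take the SVD of Z, row-normalize the weighted principal directions, and use powered correlations as the affinity for spectral clustering. The function name and the tuning exponent `alpha` are illustrative.

```python
import numpy as np

def angular_affinity(Z, rank, alpha=2):
    U, s, _ = np.linalg.svd((Z + Z.T) / 2.0)    # enforce symmetry
    M = U[:, :rank] * s[:rank]                  # weighted principal directions
    M /= np.linalg.norm(M, axis=1, keepdims=True) + 1e-12
    return np.abs(M @ M.T) ** alpha             # angular affinity

Z = np.random.rand(6, 6)
W = angular_affinity(Z, rank=3)
print(W.shape, np.allclose(W, W.T))             # (6, 6) True
```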

* 12 pages
High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.

* Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)
We propose a symmetric low-rank representation (SLRR) method for subspace clustering, which assumes that a data set is approximately drawn from the union of multiple subspaces. The proposed technique can reveal the membership of multiple subspaces through the self-expressiveness property of the data. In particular, the SLRR method considers a collaborative representation combined with low-rank matrix recovery techniques to learn a symmetric low-rank representation, which preserves the subspace structures of high-dimensional data. In contrast to the iterative singular value decomposition performed in some existing low-rank representation based algorithms, the symmetric low-rank representation in the SLRR method can be obtained in closed form by solving the symmetric low-rank optimization problem. By making use of the angular information of the principal directions of the symmetric low-rank representation, an affinity graph matrix is constructed for spectral clustering. Extensive experimental results show that SLRR outperforms state-of-the-art subspace clustering algorithms.
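
A hedged sketch of computing a symmetric representation in closed form, in the spirit of the collaborative-representation step described above (the paper's exact closed-form solution may differ): solve a ridge-style self-expression problem, symmetrize, and truncate to low rank.

```python
import numpy as np

def symmetric_lowrank_rep(X, lam=0.1, rank=5):
    # Self-expressive coefficients: argmin ||X - XZ||_F^2 + lam ||Z||_F^2
    G = X.T @ X
    Z = np.linalg.solve(G + lam * np.eye(G.shape[0]), G)
    Z = (Z + Z.T) / 2.0                          # symmetric constraint
    U, s, Vt = np.linalg.svd(Z)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]  # low-rank truncation

X = np.random.randn(20, 50)                      # 50 points in 20-D
Z = symmetric_lowrank_rep(X)
print(np.allclose(Z, Z.T))                       # True
```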

* 13 pages
Web applications suffer from cross-site scripting (XSS) attacks resulting from incomplete or incorrect input sanitization. Learning the structure of attack vectors can enrich the variety of manifestations in generated XSS attacks. In this study, we focus on generating more threatening XSS attacks for state-of-the-art detection approaches that can find potential XSS vulnerabilities in Web applications, and propose a mechanism for structural learning of attack vectors that generates mutated XSS attacks in a fully automatic way. Mutated XSS attack generation depends on the analysis of attack vectors and the structural learning mechanism. At the kernel of the learning mechanism, we use a Hidden Markov Model (HMM) as the structure of the attack vector model to capture the implicit manner of the attack vector; this benefits from the syntactic meanings labeled by the proposed tokenizing mechanism. Bayes' theorem is used to determine the number of hidden states in the model for generalizing the structure model. The contributions of this paper are as follows: (1) automatically learning the structure of attack vectors from practical data analysis to build a structural model of attack vectors; (2) mimicking the manners and elements of attack vectors to extend the ability of testing tools to identify XSS vulnerabilities; (3) helping to verify flaws in the blacklist sanitization procedures of Web applications. We evaluated the proposed mechanism with Burp Intruder on a dataset collected from public XSS archives. The results show that mutated XSS attack generation can identify potential vulnerabilities.
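
A toy sketch of the generation side of this approach: given HMM parameters learned from tokenized attack vectors, sample new token sequences as mutated attack-vector skeletons. The token inventory, state count, and probabilities below are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["<script>", "alert(", "PAYLOAD", ")", "</script>"]  # hypothetical
start = np.array([1.0, 0.0, 0.0])                # 3 hidden states
trans = np.array([[0.1, 0.8, 0.1],
                  [0.0, 0.3, 0.7],
                  [0.0, 0.0, 1.0]])
emit = np.array([[0.9, 0.1, 0.0, 0.0, 0.0],      # state -> token probs
                 [0.0, 0.4, 0.4, 0.2, 0.0],
                 [0.0, 0.0, 0.1, 0.3, 0.6]])

def sample_vector(length=6):
    """Walk the HMM, emitting one token per step."""
    state, out = rng.choice(3, p=start), []
    for _ in range(length):
        out.append(tokens[rng.choice(len(tokens), p=emit[state])])
        state = rng.choice(3, p=trans[state])
    return " ".join(out)

print(sample_vector())
```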

* EPTCS 35, 2010, pp. 15-26
* In Proceedings TAV-WEB 2010, arXiv:1009.3306
The operation and planning of large-scale power systems are becoming more challenging with the increasing penetration of stochastic renewable generation. To minimize decision risks in power systems with large amounts of renewable resources, there is a growing need to model short-term generation uncertainty. By producing a group of possible future realizations for a given set of renewable generation plants, the scenario approach has become a popular way to model renewables uncertainty. However, due to the complex spatial and temporal correlations underlying renewable generation, traditional model-based approaches for forecasting future scenarios often require extensive domain knowledge, and the fitted models are often hard to scale. To address this modeling burden, we propose a learning-based, data-driven scenario forecasting method based on generative adversarial networks (GANs), a class of deep generative algorithms for modeling unknown distributions. We first use an improved GAN with convergence guarantees to learn the intrinsic patterns and model the unknown distributions of (multiple-site) renewable generation time series. Then, by solving an optimization problem, we generate forecasted scenarios without restrictions on the number of scenarios or the forecasting horizon. Our method is model-free and can forecast scenarios under different levels of forecast uncertainty. Extensive numerical simulations using real-world data from the NREL wind and solar integration datasets validate the performance of the proposed method in forecasting both wind and solar power scenarios.
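
A minimal PyTorch sketch (illustrative sizes) of the generative side: a generator maps noise to a renewable-generation scenario of T steps and a discriminator scores it. The paper uses an improved GAN with convergence guarantees; this sketch shows only the basic adversarial structure, not that training scheme.

```python
import torch
import torch.nn as nn

T, NOISE = 24, 32                                # horizon, latent size
G = nn.Sequential(nn.Linear(NOISE, 128), nn.ReLU(), nn.Linear(128, T))
D = nn.Sequential(nn.Linear(T, 128), nn.ReLU(), nn.Linear(128, 1))

z = torch.randn(16, NOISE)
scenarios = G(z)                                 # 16 candidate scenarios
scores = D(scenarios)                            # critic scores
print(scenarios.shape, scores.shape)             # [16, 24] [16, 1]
```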

Recently, topic modeling has been widely used to discover the abstract topics in text corpora. Most existing topic models assume a three-layer hierarchical Bayesian structure: each document is modeled as a probability distribution over topics, and each topic is a probability distribution over words. However, this assumption is not optimal. Intuitively, it is more reasonable to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words, i.e., to add a latent concept layer between the topic layer and the word layer of the traditional three-layer structure. In this paper, we verify this assumption by incorporating it into two representative topic models, obtaining two novel topic models. Extensive experiments were conducted comparing the proposed models with the corresponding baselines, and the results show that the proposed models significantly outperform the baselines in terms of case studies and perplexity, indicating that the new assumption is more reasonable than the traditional one.
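
A toy generative sketch of the four-layer assumption described above: document -> topic -> concept -> word. All distributions below are random placeholders; a real model would learn them from a corpus.

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics, n_concepts, vocab = 3, 5, 20

doc_topics = rng.dirichlet(np.ones(n_topics))                  # per-document
topic_concepts = rng.dirichlet(np.ones(n_concepts), n_topics)  # the new layer
concept_words = rng.dirichlet(np.ones(vocab), n_concepts)

def sample_word():
    z = rng.choice(n_topics, p=doc_topics)           # draw a topic
    c = rng.choice(n_concepts, p=topic_concepts[z])  # draw a latent concept
    return rng.choice(vocab, p=concept_words[c])     # draw a word id

print([sample_word() for _ in range(10)])
```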

* 7 pages
Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from large variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce variance for policy gradient methods. Motivated by Stein's identity, our method extends the previous control variate methods used in REINFORCE and advantage actor-critic by introducing more general action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of state-of-the-art policy gradient approaches.
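
A numerical sketch of the Stein identity that licenses action-dependent baselines: for a Gaussian policy pi(a) = N(mu, 1), E[grad_a log pi(a) * phi(a) + grad_a phi(a)] = 0 for suitable phi, so subtracting phi(s, a) from the return and adding this correction keeps the gradient estimator unbiased while cutting variance. The baseline phi below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.7
a = rng.normal(mu, 1.0, size=1_000_000)       # actions from the policy

grad_logpi = -(a - mu)                        # d/da log N(a; mu, 1)
phi = a**2 + 3.0 * a                          # smooth action-dependent baseline
grad_phi = 2.0 * a + 3.0

# Monte Carlo check of the identity: mean ~ 0 up to sampling noise.
print(np.mean(grad_logpi * phi + grad_phi))
```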

* The first two authors contributed equally. Author ordering determined by coin flip over a Google Hangout. Accepted by ICLR 2018
While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real-world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although working well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than state-of-the-art multi-label classification models.
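
A rough sketch of the joint image-label embedding idea (dimensions and the use of an LSTM cell are illustrative assumptions): the image feature and the embedding of the previously predicted label feed a recurrent step, so each emitted label can depend on the ones before it.

```python
import torch
import torch.nn as nn

n_labels, img_dim, emb_dim, hid = 80, 512, 64, 128
label_emb = nn.Embedding(n_labels, emb_dim)
cell = nn.LSTMCell(emb_dim, hid)
img_proj = nn.Linear(img_dim, emb_dim)
out = nn.Linear(hid + emb_dim, n_labels)      # joint image-label space

img = torch.randn(1, img_dim)                 # CNN feature for one image
h, c = torch.zeros(1, hid), torch.zeros(1, hid)
prev = torch.zeros(1, emb_dim)                # start-token embedding
for _ in range(3):                            # emit 3 labels greedily
    h, c = cell(prev, (h, c))
    logits = out(torch.cat([h, img_proj(img)], dim=-1))
    label = logits.argmax(dim=-1)
    prev = label_emb(label)                   # condition on previous label
print(label.item())
```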

* CVPR 2016
The Ramsey number is of vital importance in Ramsey's theorem. This paper proposes a novel methodology for constructing Ramsey graphs related to R(3,10), which uses Artificial Bee Colony (ABC) optimization to raise the lower bound of the Ramsey number R(3,10). An R(3,10)-graph must avoid two structures: complete graphs of order 3 and independent sets of order 10. To handle these constraints, a special mathematical model converts the problem into a discrete optimization whose smaller minimizers correspond to larger lower bounds approximating R(3,10). To demonstrate the potential of the proposed method, simulations are run to minimize the number of these two types of subgraphs. For the first time, four (3,10,39)-graphs that best approximate R(3,10) are reported in simulations, supporting the current lower bound for R(3,10). The experimental results show that the proposed ABC-driven paradigm for computing Ramsey numbers is precise and robust.
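
A toy sketch of the kind of objective such a search minimizes for a candidate graph on 39 vertices: the number of triangles (counted exactly) plus an estimate of order-10 independent sets (checked here by random sampling, since exact enumeration is expensive). The ABC search loop itself is omitted, and the sampling estimate is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 39
A = rng.integers(0, 2, (n, n))
A = np.triu(A, 1); A = A + A.T                # random candidate graph

def energy(A, trials=2000):
    triangles = np.trace(A @ A @ A) // 6      # exact K3 count
    hits = 0
    for _ in range(trials):                   # sampled order-10 vertex sets
        S = rng.choice(n, 10, replace=False)
        hits += not A[np.ix_(S, S)].any()     # independent set found
    return triangles + hits                   # lower energy = fewer violations

print(energy(A))
```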

* 25 pages, 6 figures
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.

We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that typically there are multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our sequence-to-action models with non-deterministic oracles. We evaluate our models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over the models trained with traditional static oracles assuming a single correct target SQL query. When further combined with the execution-guided decoding strategy, our model sets a new state-of-the-art performance at an execution accuracy of 87.1%.
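
A minimal sketch of the non-deterministic oracle idea: instead of forcing the single action that continues one gold query, the oracle accepts any action that keeps the partial action sequence a prefix of some semantically correct query. The action strings here are hypothetical stand-ins for the paper's action inventory.

```python
# Two equivalent gold action sequences (conjuncts in either order).
GOLD = [("SELECT", "col=a", "AND", "col=b"),
        ("SELECT", "col=b", "AND", "col=a")]

def oracle(prefix):
    """Return the set of actions allowed after `prefix`."""
    k = len(prefix)
    return {g[k] for g in GOLD
            if g[:k] == tuple(prefix) and k < len(g)}

print(oracle(["SELECT"]))              # {'col=a', 'col=b'}: both are correct
print(oracle(["SELECT", "col=b"]))     # {'AND'}
```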

Deep Neural Networks (DNNs) have demonstrated a superior ability to extract high-level embedding vectors from low-level features. Despite this success, serving time remains a bottleneck due to the expensive run-time computation of multiple layers of dense matrices. GPGPU-, FPGA-, or ASIC-based serving systems require additional hardware that is not in the mainstream design of most commercial applications. In contrast, tree- or forest-based models are widely adopted because of their low serving cost, but they depend heavily on carefully engineered features. This work proposes a Deep Embedding Forest model that benefits from the best of both worlds. The model consists of a number of embedding layers and a forest/tree layer. The former maps high-dimensional (hundreds of thousands to millions) and heterogeneous low-level features to lower-dimensional (thousands) vectors, and the latter ensures fast serving. Built on top of a representative DNN model called Deep Crossing and two forest/tree-based models, XGBoost and LightGBM, a two-step Deep Embedding Forest algorithm is demonstrated to achieve on-par or slightly better performance than its DNN counterpart, with only a fraction of the serving time on conventional hardware. After comparison with a joint optimization algorithm called partial fuzzification, also proposed in this paper, it is concluded that the two-step Deep Embedding Forest achieves near-optimal performance. Experiments on large-scale data sets (up to 1 billion samples) from a major sponsored search engine prove the efficacy of the proposed model.
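
A hedged sketch of the two-step recipe described above: (1) train a small network end-to-end, (2) freeze it and feed its penultimate-layer embeddings to a gradient-boosted forest for cheap serving. The toy data, sizes, and the use of scikit-learn's booster stand in for the paper's Deep Crossing model and XGBoost/LightGBM.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

X = torch.randn(500, 100)                     # toy high-dimensional input
y = (X[:, 0] > 0).long()

embed = nn.Sequential(nn.Linear(100, 16), nn.ReLU())  # embedding layers
head = nn.Linear(16, 2)
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()),
                       lr=1e-2)
for _ in range(50):                           # step 1: train end-to-end
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(embed(X)), y)
    loss.backward()
    opt.step()

with torch.no_grad():                         # step 2: forest on embeddings
    E = embed(X).numpy()
forest = GradientBoostingClassifier().fit(E, y.numpy())
print("train acc:", forest.score(E, y.numpy()))
```

At serving time only the shallow embedding layers and the tree ensemble run, which is the source of the speedup the abstract claims.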

* 14 pages, 3 figures, 5 tables
We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty programs during the decoding procedure by conditioning on the execution of the partially generated program. The mechanism can be used with any autoregressive generative model, which we demonstrate on four state-of-the-art recurrent or template-based semantic parsing models. We demonstrate that execution guidance universally improves model performance on various text-to-SQL datasets with different scales and query complexity: WikiSQL, ATIS, and GeoQuery. As a result, we achieve a new state-of-the-art execution accuracy of 83.8% on WikiSQL.
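
A minimal sketch of the execution-guidance idea: while decoding, try to execute each candidate query and drop candidates that raise errors (or, as one possible policy, return empty results). The hard-coded candidate list stands in for the decoder's beam.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, score REAL)")
conn.execute("INSERT INTO t VALUES ('a', 1.0), ('b', 2.0)")

candidates = [
    "SELECT name FROM t WHERE score > 1.5",   # valid, non-empty
    "SELECT name FROM t WHERE scre > 1.5",    # faulty: bad column name
    "SELECT name FROM t WHERE score > 9.0",   # valid but empty result
]

def executable(sql):
    """Keep a candidate only if it runs and returns rows."""
    try:
        return len(conn.execute(sql).fetchall()) > 0
    except sqlite3.Error:
        return False

print([sql for sql in candidates if executable(sql)])
```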

Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we have developed a time-series anomaly detection service which helps customers monitor time series continuously and alerts them to potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules: data ingestion, an experimentation platform, and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from the visual saliency detection domain for time-series anomaly detection. Moreover, we innovatively combine SR and CNN to improve the performance of the SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data.
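
A compact sketch of the Spectral Residual step described above, applied to a 1-D series (window size is illustrative): the residual of the log-amplitude spectrum is transformed back to obtain a saliency map whose peaks suggest anomalies. The CNN the paper adds on top of the SR output is omitted here.

```python
import numpy as np

def spectral_residual(x, avg_window=3):
    spec = np.fft.fft(x)
    amp, phase = np.abs(spec), np.angle(spec)
    log_amp = np.log(amp + 1e-8)
    kernel = np.ones(avg_window) / avg_window
    residual = log_amp - np.convolve(log_amp, kernel, mode="same")
    # Back to the time domain: the saliency map of the series.
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))

x = np.sin(np.linspace(0, 20, 200))
x[120] += 5.0                        # inject a spike
sal = spectral_residual(x)
print(int(np.argmax(sal)))           # ~120: the anomaly stands out
```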

* KDD 2019
The spatial distributions of different types of cells can reveal a cancer cell growth pattern, its relationships with the tumor microenvironment, and the immune response of the body, all of which represent key hallmarks of cancer. However, manually recognizing and localizing all the cells in pathology slides is almost impossible. In this study, we developed an automated cell-type classification pipeline, ConvPath, which includes nuclei segmentation, convolutional neural network-based tumor, stromal and lymphocyte classification, and extraction of tumor-microenvironment-related features for lung cancer pathology images. The overall classification accuracy is 92.9% and 90.1% in the training and independent testing datasets, respectively. By identifying cells and classifying cell types, this pipeline converts a pathology image into a spatial map of tumor, stromal and lymphocyte cells. From this spatial map, we extracted features that characterize the tumor microenvironment. Based on these features, we developed an image-feature-based prognostic model and validated it in two independent cohorts. The predicted risk group serves as an independent prognostic factor after adjusting for clinical variables, including age, gender, smoking status, and stage.
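
A simplified sketch of turning per-cell predictions into a spatial map and one microenvironment feature, loosely following the pipeline described above. The random coordinates and class labels are stand-ins for the segmentation and CNN stages, and the neighborhood feature is an illustrative example, not one of the paper's features.

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.uniform(0, 1000, (500, 2))                  # cell centroids
cls = rng.choice(["tumor", "stroma", "lymphocyte"], 500)

# Example feature: fraction of lymphocytes among neighbors of tumor cells.
tumor = xy[cls == "tumor"]
d = np.linalg.norm(tumor[:, None, :] - xy[None, :, :], axis=-1)
near = d < 50.0                                      # neighborhood radius
frac = (near & (cls == "lymphocyte")[None, :]).sum() / near.sum()
print(f"lymphocyte fraction near tumor cells: {frac:.3f}")
```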
