Models, code, and papers for "Fan Yang":

Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

Jan 02, 2015
Jianqing Fan, Yang Feng, Jiancheng Jiang, Xin Tong

We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical studies, FANS is compared with competing methods to provide guidance on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented with an extremely fast algorithm that exploits parallel computing.

* 30 pages, 2 figures 
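
The feature transform described in the FANS abstract can be illustrated with a minimal sketch, assuming one-dimensional Gaussian kernel density estimates with a fixed, hand-picked bandwidth for each class-conditional marginal density. The helper names `gaussian_kde` and `fans_transform`, the bandwidth, and the `eps` guard are hypothetical choices, not taken from the paper. Each feature value x_j is replaced by an estimate of log f1_j(x_j) - log f0_j(x_j); the transformed vectors would then go to an L1-penalized logistic regression solver, a step not shown here.

```python
import math

# Sketch only: hypothetical helpers illustrating marginal density-ratio features.

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def fans_transform(X, y, bandwidth=0.5, eps=1e-12):
    """Fit per-feature class-conditional KDEs on (X, y), labels in {0, 1}, and
    return a function mapping a row to its estimated log marginal density ratios."""
    densities = []
    for j in range(len(X[0])):
        f1 = gaussian_kde([row[j] for row, label in zip(X, y) if label == 1], bandwidth)
        f0 = gaussian_kde([row[j] for row, label in zip(X, y) if label == 0], bandwidth)
        densities.append((f1, f0))

    def transform(row):
        # eps keeps the logarithm finite where an estimated density underflows to 0
        return [math.log(f1(v) + eps) - math.log(f0(v) + eps)
                for v, (f1, f0) in zip(row, densities)]

    return transform
```

This sketch estimates the densities and applies the transform on the same data; a more careful version might use separate samples for the two stages so the augmented features are not fit to the very points they describe.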

Learning to Teach

May 09, 2018
Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

Teaching plays a very important role in our society by spreading human knowledge and educating the next generation. A good teacher selects appropriate teaching materials, adopts suitable methodologies, and sets up targeted examinations according to the learning behaviors of the students. In the field of artificial intelligence, however, the role of teaching has not been fully explored, and most attention has been paid to machine learning. In this paper, we argue that equal, if not more, attention should be paid to teaching, and furthermore, that an optimization framework (instead of heuristics) should be used to obtain good teaching strategies. We call this approach 'learning to teach'. In this approach, two intelligent agents interact with each other: a student model (which corresponds to the learner in traditional machine learning algorithms) and a teacher model (which determines the appropriate data, loss function, and hypothesis space to facilitate the training of the student model). The teacher model leverages feedback from the student model to optimize its own teaching strategies by means of reinforcement learning, so as to achieve teacher-student co-evolution. To demonstrate the practical value of our approach, we take the training of deep neural networks (DNNs) as an example and show that, by using learning-to-teach techniques, we can use much less training data and fewer iterations to achieve almost the same accuracy for different kinds of DNN models (e.g., multi-layer perceptrons, convolutional neural networks, and recurrent neural networks) under various machine learning tasks (e.g., image classification and text understanding).

* ICLR 2018 
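
The abstract states that the teacher optimizes its strategy with reinforcement learning driven by student feedback. A minimal sketch of one such update, assuming the teacher is a logistic (Bernoulli) policy that decides whether to keep each training example and is trained with the REINFORCE policy gradient, is shown below. The state features, reward, and baseline are hypothetical placeholders, and the paper's actual policy and reward design may differ.

```python
import math

# Sketch only: a hypothetical data-selection teacher trained with REINFORCE.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def keep_probability(theta, state):
    """Teacher policy: probability of keeping an example described by `state`."""
    return sigmoid(sum(t * s for t, s in zip(theta, state)))

def reinforce_update(theta, states, actions, reward, baseline, lr=0.1):
    """One REINFORCE step for the Bernoulli keep/drop policy.

    states:   hypothetical feature vectors for candidate examples
              (e.g. the student's current loss on each example)
    actions:  1 if the teacher kept the example, 0 if it dropped it
    reward:   student feedback after training on kept examples (e.g. dev accuracy)
    baseline: running estimate of the reward, used to reduce variance
    """
    advantage = reward - baseline
    new_theta = list(theta)
    for state, action in zip(states, actions):
        p = keep_probability(theta, state)
        # gradient of log pi(a | s) for a logistic Bernoulli policy is (a - p) * s
        for k, s in enumerate(state):
            new_theta[k] += lr * advantage * (action - p) * s
    return new_theta
```

When the student does better than the baseline, the teacher becomes more likely to repeat the selections it just made; when it does worse, those selections become less likely.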

High-dimensional variable selection for Cox's proportional hazards model

May 19, 2010
Jianqing Fan, Yang Feng, Yichao Wu

Variable selection in high-dimensional space poses a challenge for many contemporary statistical problems across many frontiers of scientific disciplines. Recent technological advances have made it possible to collect a huge amount of covariate information, such as microarray, proteomic, and SNP data via bioimaging technology, while observing survival information on patients in clinical studies. Thus, the same challenge applies to survival analysis when seeking to understand the association between genomic information and clinical information about survival time. In this work, we extend the sure screening procedure of Fan and Lv (2008) to Cox's proportional hazards model, with an iterative version available. Numerical simulation studies have shown encouraging performance of the proposed method in comparison with other techniques such as LASSO. This demonstrates the utility and versatility of the iterative sure independence screening scheme.

* 17 pages, 5 figures 
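
Sure independence screening ranks covariates by a marginal utility and keeps the top ones. As a minimal sketch for censored survival data, the code below uses the one-covariate Cox score-test statistic at beta = 0 as that utility. This is a swapped-in, closed-form utility chosen for brevity; the paper's screening (and its iterative version, not shown) may instead use the maximized marginal partial likelihood. Function names are hypothetical.

```python
# Sketch only: marginal screening for Cox's model with a score-test utility.

def cox_score_statistic(x, time, event):
    """Score-test statistic U(0)^2 / I(0) for a one-covariate Cox model.

    At beta = 0 every subject in a risk set has equal weight, so U is the sum over
    events of (x_i - mean of x over the risk set) and I is the sum of the
    risk-set variances. Ties use the Breslow convention (risk set: t_j >= t_i).
    """
    score, info = 0.0, 0.0
    for i in range(len(x)):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [x[j] for j in range(len(x)) if time[j] >= time[i]]
        mean = sum(risk) / len(risk)
        score += x[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    return score * score / info if info > 0 else 0.0

def marginal_screen(X, time, event, keep):
    """Rank covariates by their marginal score statistic; return the top `keep` indices."""
    stats = [cox_score_statistic([row[j] for row in X], time, event)
             for j in range(len(X[0]))]
    return sorted(range(len(stats)), key=lambda j: stats[j], reverse=True)[:keep]
```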

Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models

Jan 18, 2011
Jianqing Fan, Yang Feng, Rui Song

A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high-dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend correlation learning to marginal nonparametric learning. Our nonparametric independence screening, called NIS, is a specific member of the sure independence screening family. Several closely related variable screening procedures are proposed. Under nonparametric additive models, it is shown that, under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite-sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample sizes and large dimensions and performs better than competing methods.

* 48 pages 
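
Nonparametric independence screening replaces marginal correlation with a marginal nonparametric fit. The sketch below assumes a crude piecewise-constant fit (equal-count bins) as a stand-in for a spline fit, and ranks each feature by the mean squared fitted value of its marginal regression on the centered response. The function names and the bin count are hypothetical.

```python
# Sketch only: binned marginal regression as a stand-in for a spline fit.

def binned_fit(x, y, n_bins):
    """Piecewise-constant regression of y on x using equal-count bins."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted = [0.0] * len(x)
    size = len(x) / n_bins
    for b in range(n_bins):
        idx = order[round(b * size):round((b + 1) * size)]
        if idx:
            mean = sum(y[i] for i in idx) / len(idx)
            for i in idx:
                fitted[i] = mean
    return fitted

def nis_rank(X, y, n_bins=4):
    """Rank features by the empirical norm ||f_j||^2 of their marginal fit."""
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]
    utilities = []
    for j in range(len(X[0])):
        fitted = binned_fit([row[j] for row in X], yc, n_bins)
        utilities.append(sum(f * f for f in fitted) / len(fitted))
    return sorted(range(len(utilities)), key=lambda j: utilities[j], reverse=True)
```

With y = x1^2 on a grid symmetric about zero, x1 has zero linear correlation with y, yet `nis_rank` places x1 first, which is the situation the abstract describes when it notes that marginal regression can be highly nonlinear.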

Detecting Unknown Behaviors by Pre-defined Behaviours: An Bayesian Non-parametric Approach

Nov 25, 2019
Jin Watanabe, Fan Yang

An automatic mouse behavior recognition system can considerably reduce the workload of experimenters and facilitate the analysis process. Typically, supervised, unsupervised, and semi-supervised approaches are applied to behavior recognition in a setting where all behaviors are predefined. In real situations, however, mice show various types of behaviors: besides the predefined behaviors that we want to analyze, many undefined behaviors also occur. Neither supervised nor conventional semi-supervised approaches can identify these undefined behaviors. Although unsupervised approaches can detect them, post-hoc labeling is needed. In this paper, we propose a semi-supervised infinite Gaussian mixture model (SsIGMM) that incorporates both labeled and unlabeled information in the learning process while accounting for undefined behaviors. It also generates the distributions of the predefined and undefined behaviors as Gaussian mixtures, which can be used for further analysis. In our experiments, we confirmed the superiority of SsIGMM for segmenting and labeling mouse-behavior videos.
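
An infinite Gaussian mixture can open a new component when a point fits no existing one, which is how undefined behaviors can surface. The sketch below shows one collapsed-Gibbs assignment step for a Dirichlet-process mixture of 1-D Gaussians, assuming a known within-cluster variance and a conjugate normal prior on cluster means. It is a generic illustration, not the SsIGMM inference procedure; the behavior names and prior values are hypothetical. Labeled frames would stay in their predefined clusters, while unlabeled frames are resampled with these probabilities.

```python
import math

# Sketch only: generic DP-mixture assignment step, not the paper's SsIGMM sampler.

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def assignment_probs(x, clusters, alpha, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Probabilities of assigning 1-D point x to each cluster or to a new one.

    clusters: dict name -> points currently in that cluster
    alpha:    DP concentration; larger values make new clusters more likely
    sigma2:   known within-cluster variance; N(mu0, tau2) is the prior on means
    """
    weights = {}
    for name, pts in clusters.items():
        n = len(pts)
        post_var = 1.0 / (1.0 / tau2 + n / sigma2)
        post_mean = post_var * (mu0 / tau2 + sum(pts) / sigma2)
        # existing cluster: size times posterior predictive density
        weights[name] = n * normal_pdf(x, post_mean, post_var + sigma2)
    # new cluster ("undefined behavior"): alpha times prior predictive density
    weights["new"] = alpha * normal_pdf(x, mu0, tau2 + sigma2)
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```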

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing

Nov 24, 2019
Hehe Fan, Yi Yang

In this paper, we introduce a Point Recurrent Neural Network (PointRNN) for moving point cloud processing. At each time step, PointRNN takes point coordinates $\boldsymbol{P} \in \mathbb{R}^{n \times 3}$ and point features $\boldsymbol{X} \in \mathbb{R}^{n \times d}$ as input ($n$ and $d$ denote the number of points and the number of feature channels, respectively). The state of PointRNN is composed of point coordinates $\boldsymbol{P}$ and point states $\boldsymbol{S} \in \mathbb{R}^{n \times d'}$ ($d'$ denotes the number of state channels). Similarly, the output of PointRNN is composed of $\boldsymbol{P}$ and new point features $\boldsymbol{Y} \in \mathbb{R}^{n \times d''}$ ($d''$ denotes the number of new feature channels). Since point clouds are orderless, point features and states from two time steps cannot be combined directly. Therefore, a point-based spatiotemporally-local correlation is adopted to aggregate point features and states according to point coordinates. We further propose two variants of PointRNN, i.e., Point Gated Recurrent Unit (PointGRU) and Point Long Short-Term Memory (PointLSTM). We apply PointRNN, PointGRU, and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their past movements. Experimental results show that PointRNN, PointGRU, and PointLSTM are able to produce correct predictions on both synthetic and real-world datasets, demonstrating their ability to model point cloud sequences. The code has been released at \url{}.

* technical report 
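
The point-based spatiotemporally-local correlation can be pictured as follows: each current point looks up nearby points in the previous frame by coordinates, combines its own feature with their states and displacements, and pools the result. This is a minimal sketch assuming k-nearest-neighbor grouping, a single shared linear map `W` in place of learned layers, and max pooling; the paper's exact neighborhood definition and layers may differ, and all names here are hypothetical.

```python
import math

# Sketch only: simplified PointRNN-style update; kNN, linear W, and max pooling are assumptions.

def knn(query, points, k):
    """Indices of the k points closest (Euclidean) to `query`."""
    return sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))[:k]

def point_rnn_step(P_t, X_t, P_prev, S_prev, W, k=2):
    """Compute new point states S_t (n x d') for the current frame.

    P_t, X_t:      current coordinates (n x 3) and features (n x d)
    P_prev, S_prev: previous coordinates and states (m x 3, m x d')
    W:             d' rows, each of length d + d' + 3 (shared linear map)
    """
    S_t = []
    for p, x in zip(P_t, X_t):
        candidates = []
        for j in knn(p, P_prev, k):
            disp = [a - b for a, b in zip(P_prev[j], p)]  # neighbour offset
            z = list(x) + list(S_prev[j]) + disp
            candidates.append([sum(w * v for w, v in zip(row, z)) for row in W])
        # max-pool each state channel over the neighbours
        S_t.append([max(c[ch] for c in candidates) for ch in range(len(W))])
    return S_t
```

Because neighbors are found by coordinates rather than by index, the update does not depend on the order in which points are stored, which is the problem the abstract says the correlation step addresses.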

Using Randomness to Improve Robustness of Machine-Learning Models Against Evasion Attacks

Aug 10, 2018
Fan Yang, Zhiyuan Chen

Machine learning models have been widely used in security applications such as intrusion detection, spam filtering, and virus or malware detection. However, it is well known that adversaries constantly adapt their attacks to evade detection. For example, an email spammer may guess which features spam detection models use and modify or remove those features to avoid detection. There has been some work on making machine learning models more robust to such attacks. However, one simple but promising approach, randomization, is underexplored. This paper proposes a novel randomization-based approach to improve the robustness of machine learning models against evasion attacks. The proposed approach incorporates randomization into both model training time and model application time (i.e., when the model is used to detect attacks). We also apply this approach to random forest, an existing ML method that already has some degree of randomness. Experiments on intrusion detection and spam filtering data show that our approach further improves the robustness of the random-forest method. We also discuss how this approach can be applied to other ML models.
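
The idea of adding randomness at both training and application time can be sketched as follows, assuming a hypothetical nearest-centroid base learner trained on random feature subsets. The paper itself works with random forests, which are not implemented here, and all function names are illustrative.

```python
import random

# Sketch only: hypothetical nearest-centroid learners, not the paper's random forest.

def train_centroids(X, y, features):
    """Nearest-centroid classifier restricted to the given feature indices."""
    cents = {}
    for label in set(y):
        rows = [r for r, l in zip(X, y) if l == label]
        cents[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(row):
        return min(cents, key=lambda lab: sum((row[f] - c) ** 2
                                              for f, c in zip(features, cents[lab])))
    return predict

def train_randomized(X, y, n_models, subset_size, rng):
    """Training-time randomness: each model sees a random subset of features."""
    d = len(X[0])
    return [train_centroids(X, y, rng.sample(range(d), subset_size))
            for _ in range(n_models)]

def predict_randomized(models, row, rng):
    """Application-time randomness: a randomly chosen model answers each query."""
    return rng.choice(models)(row)
```

Since the model that answers a given query is drawn at random, an attacker who removes or alters a few features cannot know in advance whether those features enter the decision.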

Data Techniques For Online End-to-end Speech Recognition

Jan 24, 2020
Yang Chen, Weiran Wang, I-Fan Chen, Chao Wang

Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data. While recently developed end-to-end methods largely simplify the modeling pipelines, they still suffer from the data sparsity issue. In this work, we explore a few simple-to-implement techniques for building online ASR systems in an end-to-end fashion, with a small amount of transcribed data in the target domain. These techniques include data augmentation in the target domain, domain adaptation using models previously trained on a large source domain, and knowledge distillation on non-transcribed target domain data; they are applicable in real scenarios with different types of resources. Our experiments demonstrate that each technique is independently useful in the low-resource setting, and combining them yields significant improvement of the online ASR performance in the target domain.

* 5 pages, 1 figure 
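
The abstract does not spell out the distillation objective. A common choice, assumed here, is the KL divergence between temperature-softened teacher and student output distributions, computed on untranscribed target-domain audio where the teacher's soft outputs stand in for missing transcripts. The sketch handles a single output distribution (for example one frame or token position); how outputs are aligned for a specific end-to-end ASR model is outside its scope, and the temperature value is a hypothetical default.

```python
import math

# Sketch only: generic soft-label distillation loss for one output distribution.

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
```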

GLA-Net: An Attention Network with Guided Loss for Mismatch Removal

Sep 28, 2019
Zhi Chen, Fan Yang, Wenbing Tao

Mismatch removal is a critical prerequisite in many feature-based tasks. Recent attempts cast mismatch removal as a binary classification problem and solve it with deep-learning-based methods. In these methods, the imbalance between positive and negative classes is important because it affects network performance as measured by the Fn-score. To establish the link between the Fn-score and the loss, we propose to guide the loss with the Fn-score directly. We theoretically demonstrate the direct link between our Guided Loss and the Fn-score during training. Moreover, we discover that outliers often impair the global context in mismatch removal networks. To address this issue, we introduce the attention mechanism to the mismatch removal task and propose a novel Inlier Attention Block (IA Block). To evaluate the effectiveness of our loss and IA Block, we design an end-to-end network for mismatch removal, called GLA-Net (our code will be made available on GitHub later). Experiments show that our network achieves state-of-the-art performance on benchmark datasets.
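
Guiding a loss with an F-score requires a version of the score that varies smoothly with the network's predictions. The sketch below shows one common surrogate, a soft F-score that replaces hard counts with sums of predicted inlier probabilities. It assumes the Fn-score follows the usual F-beta form with beta = n, and it is not the paper's Guided Loss, whose exact construction the abstract does not give.

```python
# Sketch only: soft F-score surrogate, assuming Fn = F-beta with beta = n.

def soft_f_score(probs, labels, n=1.0, eps=1e-12):
    """F-score computed from predicted inlier probabilities instead of hard counts."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    n2 = n * n
    return (1 + n2) * tp / ((1 + n2) * tp + n2 * fn + fp + eps)

def soft_f_loss(probs, labels, n=1.0):
    """Loss that decreases as the soft F-score increases."""
    return 1.0 - soft_f_score(probs, labels, n)
```

In an imbalanced set with one inlier among ten matches, a classifier that outputs 0.5 everywhere gets a soft F1 of only 1/6, even though a plain per-sample loss would treat its errors on the many negatives and the single positive alike.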

Distributed Machine Learning on Mobile Devices: A Survey

Sep 18, 2019
Renjie Gu, Shuo Yang, Fan Wu

In recent years, mobile devices have developed rapidly, gaining stronger computation capability and larger storage. Some computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users' privacy, the idea of mobile distributed machine learning has been proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and uploads only computation results instead of original data to contribute to the optimization of the global model. This architecture not only relieves the computation and storage burden on servers but also protects users' sensitive information. Another benefit is bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey of recent studies on mobile distributed machine learning. We survey a number of widely used mobile distributed machine learning methods. We also present an in-depth discussion of the challenges and future directions in this area. We believe that this survey provides a clear overview of mobile distributed machine learning and offers guidelines on applying it to real applications.
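
The pattern of computing locally and uploading only results can be sketched with federated averaging (FedAvg), one well-known instance of it. The scalar least-squares model, learning rate, and step counts below are hypothetical choices for illustration. Only the updated weight leaves each device, never the raw (x, y) pairs.

```python
# Sketch only: FedAvg on a hypothetical scalar model y ≈ w * x.

def local_update(w, data, lr=0.1, steps=5):
    """On-device training: a few gradient steps of least squares on local data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, devices):
    """Server step: average the uploaded weights, weighted by local data size."""
    total = sum(len(d) for d in devices)
    return sum(len(d) * local_update(w_global, d) for d in devices) / total
```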

Evaluating Explanation Without Ground Truth in Interpretable Machine Learning

Aug 15, 2019
Fan Yang, Mengnan Du, Xia Hu

Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous cars and medical diagnosis, where explanations are strongly preferred to help people better understand how machine learning systems work and to enhance their trust in these systems. However, due to the diversity of scenarios and the subjective nature of explanations, we rarely have ground truth for benchmarking the quality of generated explanations in IML. Having a sense of explanation quality not only matters for assessing system boundaries but also helps to realize the true benefits to human users in practical settings. To benchmark evaluation in IML, in this article we rigorously define the problem of evaluating explanations and systematically review existing state-of-the-art efforts. Specifically, we summarize three general aspects of explanation (i.e., generalizability, fidelity, and persuasibility) with formal definitions, and review the representative methodologies for each of them under different tasks. Further, a unified evaluation framework is designed according to the hierarchical needs of developers and end-users, which could be easily adopted for different scenarios in practice. Finally, open problems are discussed, and several limitations of current evaluation techniques are raised for future exploration.

Video Instance Segmentation

Jun 02, 2019
Linjie Yang, Yuchen Fan, Ning Xu

In this paper, we present a new computer vision task, named video instance segmentation. The goal of this new task is simultaneous detection, segmentation, and tracking of instances in videos. In other words, this is the first time that the image instance segmentation problem has been extended to the video domain. To facilitate research on this new task, we propose a large-scale benchmark called YouTube-VIS, which consists of 2883 high-resolution YouTube videos, a 40-category label set, and 131k high-quality instance masks. In addition, we propose a novel algorithm called MaskTrack R-CNN for this task. Our method adds a new tracking branch to Mask R-CNN to perform detection, segmentation, and tracking jointly. Finally, we evaluate the proposed method and several strong baselines on our new dataset. Experimental results clearly demonstrate the advantages of the proposed algorithm and reveal insights for future improvement. We believe the video instance segmentation task will motivate the community to pursue this line of research on video understanding.

* Tech report introducing the video instance segmentation task 
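
A tracking branch must decide, for each detection in a new frame, whether it continues a known instance or starts a new one. A minimal sketch of that decision, assuming a greedy matcher over cosine similarity of per-instance embeddings with a fixed threshold, is shown below. It is a simplified stand-in, not MaskTrack R-CNN's tracking branch, and the embeddings and threshold are hypothetical.

```python
import math

# Sketch only: greedy cosine-similarity matcher standing in for a learned tracking branch.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def assign_ids(memory, detections, threshold=0.5):
    """Give each detection in the new frame an instance ID.

    memory: dict id -> latest embedding of each instance seen so far (updated in place)
    A detection takes the most similar unused ID whose similarity exceeds
    `threshold`; otherwise it starts a new instance with the next free ID.
    """
    used, ids = set(), []
    for emb in detections:
        best, best_sim = None, threshold
        for inst_id, mem in memory.items():
            sim = cosine(emb, mem)
            if inst_id not in used and sim > best_sim:
                best, best_sim = inst_id, sim
        if best is None:
            best = max(memory, default=-1) + 1
        used.add(best)
        memory[best] = emb
        ids.append(best)
    return ids
```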

Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process

May 07, 2019
Wen Dong, Bo Liu, Fan Yang

Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such control is difficult to achieve because of high-dimensional, nonlinear system dynamics and the exploding state and action spaces faced by the decision maker. Prior methods fall into two categories: simulation-based and analytical approaches. Existing simulation approaches suffer from high variance in Monte Carlo integration, and analytical approaches suffer from modeling inaccuracy. We adopt simulation modeling to specify the dynamics of a complex system, and develop analytical solutions for searching for optimal strategies in a complex network with a high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision-making problem as a discrete event decision process. To address the curse of dimensionality and search in high-dimensional state-action spaces, we reduce control of a complex system to variational inference and parameter learning, introduce a Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm achieves higher expected system rewards, faster convergence, and lower variance of the value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.

Cubic LSTMs for Video Prediction

Apr 20, 2019
Hehe Fan, Linchao Zhu, Yi Yang

Predicting future frames in videos has become a promising direction of research for both the computer vision and robot learning communities. The core of this problem involves moving object capture and future motion prediction. While object capture specifies which objects are moving in videos, motion prediction describes their future dynamics. Motivated by this analysis, we propose a Cubic Long Short-Term Memory (CubicLSTM) unit for video prediction. CubicLSTM consists of three branches: a spatial branch for capturing moving objects, a temporal branch for processing motions, and an output branch for combining the first two branches to generate predicted frames. Stacking multiple CubicLSTM units along the spatial branch and output branch, and then evolving along the temporal branch, forms a cubic recurrent neural network (CubicRNN). Experiments show that CubicRNN produces more accurate video predictions than prior methods on both synthetic and real-world datasets.

* Accepted to AAAI-2019 

Tensor Methods for Additive Index Models under Discordance and Heterogeneity

Jul 17, 2018
Krishnakumar Balasubramanian, Jianqing Fan, Zhuoran Yang

Motivated by the sampling problems and heterogeneity issues common in high-dimensional big datasets, we consider a class of discordant additive index models. We propose method-of-moments-based procedures for estimating the indices of such discordant additive index models in both low- and high-dimensional settings. Our estimators are based on factorizing certain moment tensors and are also applicable in the overcomplete setting, where the number of indices exceeds the dimensionality of the datasets. Furthermore, we provide rates of convergence of our estimators in both high- and low-dimensional settings. Establishing such results requires deriving tensor operator norm concentration inequalities that might be of independent interest. Finally, we provide simulation results supporting our theory. Our contributions extend the applicability of tensor methods to novel models, in addition to making progress on understanding the theoretical properties of such tensor methods.
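
The abstract's estimators factorize moment tensors. As a generic illustration of tensor factorization (not the paper's estimator, and not covering the overcomplete case), the sketch below runs the standard tensor power method on a symmetric, orthogonally decomposable third-order tensor T = sum_r w_r a_r ⊗ a_r ⊗ a_r with positive weights. The components, weights, and starting vector used with it are hypothetical.

```python
import math

# Sketch only: standard tensor power method on an orthogonally decomposable tensor.

def build_tensor(weights, components):
    """T[i][j][k] = sum_r w_r * a_r[i] * a_r[j] * a_r[k]."""
    d = len(components[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, a in zip(weights, components):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * a[i] * a[j] * a[k]
    return T

def tensor_power_iteration(T, u, iters=50):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)||.

    For orthonormal a_r, T(I, u, u) = sum_r w_r (a_r . u)^2 a_r, so the component
    with the largest w_r (a_r . u) grows fastest and u converges to it.
    """
    d = len(u)
    for _ in range(iters):
        v = [sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
             for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    return u
```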

Privacy-Protective-GAN for Face De-identification

Jun 23, 2018
Yifan Wu, Fan Yang, Haibin Ling

Face de-identification has become increasingly important as image sources grow explosively and become easily accessible. Advances in face recognition techniques also raise people's concerns about privacy leakage. The mainstream pipelines of face de-identification are mostly based on the k-same framework, which has been criticized for low effectiveness and poor visual quality. In this paper, we propose a new framework called Privacy-Protective-GAN (PP-GAN) that adapts GAN with novel verificator and regulator modules specially designed for the face de-identification problem, ensuring that the de-identified output retains structural similarity to a single input image. We evaluate the proposed approach in terms of privacy protection, utility preservation, and structure similarity. Our approach not only outperforms existing face de-identification techniques but also provides a practical framework for adapting GAN with domain-knowledge priors.

Differentiable Learning of Logical Rules for Knowledge Base Reasoning

Nov 27, 2017
Fan Yang, Zhilin Yang, William W. Cohen

We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.

* Accepted at NIPS 2017 
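
TensorLog-style inference compiles a rule into differentiable matrix operations: a one-hot vector for the query entity is multiplied by relation adjacency matrices, one per hop, and Neural LP learns soft attention over which relation to apply at each hop. The minimal sketch below, with hypothetical entities and relations, assumes a fixed two-hop rule body and given attention weights; it omits the neural controller that would produce those weights.

```python
# Sketch only: soft two-hop rule evaluation with fixed, hypothetical attention weights.

def matvec_rows(v, M):
    """Row vector times matrix: (v^T M)_j = sum_i v_i * M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def soft_rule_scores(query_entity, relations, attention, n_entities):
    """Score every answer entity z for a rule body R1(x, y) AND R2(y, z).

    relations: dict name -> adjacency matrix (M[i][j] = 1 if R(i, j) holds)
    attention: one dict per hop giving soft weights over relation names;
               mixing matrices this way is what makes the rule structure learnable
    """
    v = [1.0 if i == query_entity else 0.0 for i in range(n_entities)]
    for hop in attention:
        nxt = [0.0] * n_entities
        for name, weight in hop.items():
            step = matvec_rows(v, relations[name])
            nxt = [a + weight * b for a, b in zip(nxt, step)]
        v = nxt
    return v
```

With entities 0, 1, 2 and parent_of(0, 1), parent_of(1, 2), putting all attention on parent_of at both hops recovers the grandparent relation: the query for entity 0 scores entity 2 highest.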
