Models, code, and papers for "Yan Wang":
The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. By extending the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions, we propose the convolutional LSTM (ConvLSTM) and use it to build an end-to-end trainable model for the precipitation nowcasting problem. Experiments show that our ConvLSTM network captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm for precipitation nowcasting.
With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art.
Sparse coding has been popularly used as an effective data representation method in various applications, such as computer vision, medical imaging and bioinformatics, etc. However, the conventional sparse coding algorithms and its manifold regularized variants (graph sparse coding and Laplacian sparse coding), learn the codebook and codes in a unsupervised manner and neglect the class information available in the training set. To address this problem, in this paper we propose a novel discriminative sparse coding method based on multi-manifold, by learning discriminative class-conditional codebooks and sparse codes from both data feature space and class labels. First, the entire training set is partitioned into multiple manifolds according to the class labels. Then, we formulate the sparse coding as a manifold-manifold matching problem and learn class-conditional codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a data point-manifold matching error based strategy to classify the unlabeled data point. Experimental results on somatic mutations identification and breast tumors classification in ultrasonic images tasks demonstrate the efficacy of the proposed data representation-classification approach.
In this paper, we propose the problem of optimizing multivariate performance measures from multi-view data, and an effective method to solve it. This problem has two features: the data points are presented by multiple views, and the target of learning is to optimize complex multivariate performance measures. We propose to learn a linear discriminant functions for each view, and combine them to construct a overall multivariate mapping function for mult-view data. To learn the parameters of the linear dis- criminant functions of different views to optimize multivariate performance measures, we formulate a optimization problem. In this problem, we propose to minimize the complexity of the linear discriminant functions of each view, encourage the consistences of the responses of different views over the same data points, and minimize the upper boundary of a given multivariate performance measure. To optimize this problem, we employ the cutting-plane method in an iterative algorithm. In each iteration, we update a set of constrains, and optimize the mapping function parameter of each view one by one.
In this paper, we propose the problem of domain transfer structured output learn- ing and the first solution to solve it. The problem is defined on two different data domains sharing the same input and output spaces, named as source domain and target domain. The outputs are structured, and for the data samples of the source domain, the corresponding outputs are available, while for most data samples of the target domain, the corresponding outputs are missing. The input distributions of the two domains are significantly different. The problem is to learn a predictor for the target domain to predict the structured outputs from the input. Due to the limited number of outputs available for the samples form the target domain, it is difficult to directly learn the predictor from the target domain, thus it is necessary to use the output information available in source domain. We propose to learn the target domain predictor by adapting a auxiliary predictor trained by using source domain data to the target domain. The adaptation is implemented by adding a delta function on the basis of the auxiliary predictor. An algorithm is developed to learn the parameter of the delta function to minimize loss functions associat- ed with the predicted outputs against the true outputs of the data samples with available outputs of the target domain.
Nonnegative Matrix Factorization (NMF) has been a popular representation method for pattern classification problem. It tries to decompose a nonnegative matrix of data samples as the product of a nonnegative basic matrix and a nonnegative coefficient matrix, and the coefficient matrix is used as the new representation. However, traditional NMF methods ignore the class labels of the data samples. In this paper, we proposed a supervised novel NMF algorithm to improve the discriminative ability of the new representation. Using the class labels, we separate all the data sample pairs into within-class pairs and between-class pairs. To improve the discriminate ability of the new NMF representations, we hope that the maximum distance of the within-class pairs in the new NMF space could be minimized, while the minimum distance of the between-class pairs pairs could be maximized. With this criterion, we construct an objective function and optimize it with regard to basic and coefficient matrices and slack variables alternatively, resulting in a iterative algorithm.
Sparse coding has shown its power as an effective data representation method. However, up to now, all the sparse coding approaches are limited within the single domain learning problem. In this paper, we extend the sparse coding to cross domain learning problem, which tries to learn from a source domain to a target domain with significant different distribution. We impose the Maximum Mean Discrepancy (MMD) criterion to reduce the cross-domain distribution difference of sparse codes, and also regularize the sparse codes by the class labels of the samples from both domains to increase the discriminative ability. The encouraging experiment results of the proposed cross-domain sparse coding algorithm on two challenging tasks --- image classification of photograph and oil painting domains, and multiple user spam detection --- show the advantage of the proposed method over other cross-domain data representation methods.
We propose a state of the art method for intelligent object recognition and video surveillance based on human visual attention. Bottom up and top down attention are applied respectively in the process of acquiring interested object(saliency map) and object recognition. The revision of 4 channel PFT method is proposed for bottom up attention and enhances the speed and accuracy. Inhibit of return (IOR) is applied in judging the sequence of saliency object pop out. Euclidean distance of color distribution, object center coordinates and speed are considered in judging whether the target is match and suspicious. The extensive tests on videos and images show that our method in video analysis has high accuracy and fast speed compared with traditional method. The method can be applied into many fields such as video surveillance and security.
In this paper, we consider the clustering problem on images where each image contains patches in people and location domains. We exploit the correlation between people and location domains, and proposed a semi-supervised co-clustering algorithm to cluster images. Our algorithm updates the correlation links at the runtime, and produces clustering in both domains simultaneously. We conduct experiments in a manually collected dataset and a Flickr dataset. The result shows that the such correlation improves the clustering performance.
Session-based Recurrent Neural Networks (RNNs) are gaining increasing popularity for recommendation task, due to the high autocorrelation of user's behavior on the latest session and the effectiveness of RNN to capture the sequence order information. However, most existing session-based RNN recommender systems still solely focus on the short-term interactions within a single session and completely discard all the other long-term data across different sessions. While traditional Collaborative Filtering (CF) methods have many advanced research works on exploring long-term dependency, which show great value to be explored and exploited in deep learning models. Therefore, in this paper, we propose ASARS, a novel framework that effectively imports the temporal dynamics methodology in CF into session-based RNN system in DL, such that the temporal info can act as scalable weights by a parallel attentional network. Specifically, we first conduct an extensive data analysis to show the distribution and importance of such temporal interactions data both within sessions and across sessions. And then, our ASARS framework promotes two novel models: (1) an inter-session temporal dynamic model that captures the long-term user interaction for RNN recommender system. We integrate the time changes in session RNN and add user preferences as model drifting; and (2) a novel triangle parallel attention network that enhances the original RNN model by incorporating time information. Such triangle parallel network is also specially designed for realizing data argumentation in sequence-to-scalar RNN architecture, and thus it can be trained very efficiently. Our extensive experiments on four real datasets from different domains demonstrate the effectiveness and large improvement of ASARS for personalized recommendation.
We aim at developing and improving the imbalanced business risk modeling via jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross validation. Two undersampling strategies including random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods including random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT) are implemented. Two ensembling techniques, including Bagging and Boosting, are applied on the DT classifier for further model improvement. The results show that, Boosting on DT by using the oversampled data containing 50% positives via SMOTE is the optimal model and it can achieve AUC, recall, and F1 score valued 0.8633, 0.9260, and 0.8907, respectively.
With the fast development of peer to peer (P2P) lending, financial institutions have a substantial challenge from benefit loss due to the delinquent behaviors of the money borrowers. Therefore, having a comprehensive understanding of the changing trend of the default rate in the P2P domain is crucial. In this paper, we comprehensively study the changing trend of the default rate of P2P USA market at the aggregative level from August 2007 to January 2016. From the data visualization perspective, we found that three features, including delinq_2yrs, recoveries, and collection_recovery_fee, could potentially increase the default rate. The long short-term memory (LSTM) approach shows its great potential in modeling the P2P transaction data. Furthermore, incorporating the macroeconomic feature unemp_rate can improve the LSTM performance by decreasing RMSE on both training and testing datasets. Our study can broaden the applications of LSTM approach in the P2P market.
This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effect of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank Test. The performance of XGBoost is compared to the traditionally utilized logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from the 10-fold cross validation. Results show that hierarchical clustering is the optimal FS method for LR while weight by Chi-square achieves the best performance in XG-Boost. Both TPE and RS optimization in XGBoost outperform LR significantly. TPE optimization shows a superiority over RS since it results in a significantly higher accuracy and a marginally higher AUC, recall and F1 score. Furthermore, XGBoost with TPE tuning shows a lower variability than the RS method. Finally, the ranking of feature importance based on XGBoost enhances the model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization serves as an operative while powerful approach for business risk modeling.
In this paper, we study the problem of transfer learning with the attribute data. In the transfer learning problem, we want to leverage the data of the auxiliary and the target domains to build an effective model for the classification problem in the target domain. Meanwhile, the attributes are naturally stable cross different domains. This strongly motives us to learn effective domain transfer attribute representations. To this end, we proposed to embed the attributes of the data to a common space by using the powerful convolutional neural network (CNN) model. The convolutional representations of the data points are mapped to the corresponding attributes so that they can be effective embedding of the attributes. We also represent the data of different domains by a domain-independent CNN, ant a domain-specific CNN, and combine their outputs with the attribute embedding to build the classification model. An joint learning framework is constructed to minimize the classification errors, the attribute mapping error, the mismatching of the domain-independent representations cross different domains, and to encourage the the neighborhood smoothness of representations in the target domain. The minimization problem is solved by an iterative algorithm based on gradient descent. Experiments over benchmark data sets of person re-identification, bankruptcy prediction, and spam email detection, show the effectiveness of the proposed method.
Knowing the reflection of game theory and ethics, we develop a mathematical representation to bridge the gap between the concepts in moral philosophy (e.g., Kantian and Utilitarian) and AI ethics industry technology standard (e.g., IEEE P7000 standard series for Ethical AI). As an application, we demonstrate how human value can be obtained from the experimental game theory (e.g., trust game experiment) so as to build an ethical AI. Moreover, an approach to test the ethics (rightness or wrongness) of a given AI algorithm by using an iterated Prisoner's Dilemma Game experiment is discussed as an example. Compared with existing mathematical frameworks and testing method on AI ethics technology, the advantages of the proposed approach are analyzed.
Effective image and sentence matching depends on how to well measure their global visual-semantic similarity. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of image (objects) and sentence (words), we propose a selective multimodal Long Short-Term Memory network (sm-LSTM) for instance-aware image and sentence matching. The sm-LSTM includes a multimodal context-modulated attention scheme at each timestep that can selectively attend to a pair of instances of image and sentence, by predicting pairwise instance-aware saliency maps for image and sentence. For selected pairwise instances, their representations are obtained based on the predicted saliency maps, and then compared to measure their local similarity. By similarly measuring multiple local similarities within a few timesteps, the sm-LSTM sequentially aggregates them with hidden states to obtain a final matching score as the desired global similarity. Extensive experiments show that our model can well match image and sentence with complex content, and achieve the state-of-the-art results on two public benchmark datasets.
While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This paper proposes a general framework for Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this paper, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks.
To what extent is the success of deep visualization due to the training? Could we do deep visualization using untrained, random weight networks? To address this issue, we explore new and powerful generative models for three popular deep visualization tasks using untrained, random weight convolutional neural networks. First we invert representations in feature spaces and reconstruct images from white noise inputs. The reconstruction quality is statistically higher than that of the same method applied on well trained networks with the same architecture. Next we synthesize textures using scaled correlations of representations in multiple layers and our results are almost indistinguishable with the original natural texture and the synthesized textures based on the trained network. Third, by recasting the content of an image in the style of various artworks, we create artistic images with high perceptual quality, highly competitive to the prior work of Gatys et al. on pretrained networks. To our knowledge this is the first demonstration of image representations using untrained deep neural networks. Our work provides a new and fascinating tool to study the representation of deep network architecture and sheds light on new understandings on deep visualization.
While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This survey provides a general introduction to Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this survey, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks.
Popular Hough Transform-based object detection approaches usually construct an appearance codebook by clustering local image features. However, how to choose appropriate values for the parameters used in the clustering step remains an open problem. Moreover, some popular histogram features extracted from overlapping image blocks may cause a high degree of redundancy and multicollinearity. In this paper, we propose a novel Hough Transform-based object detection approach. First, to address the above issues, we exploit a Bridge Partial Least Squares (BPLS) technique to establish context-encoded Hough Regression Models (HRMs), which are linear regression models that cast probabilistic Hough votes to predict object locations. BPLS is an efficient variant of Partial Least Squares (PLS). PLS-based regression techniques (including BPLS) can reduce the redundancy and eliminate the multicollinearity of a feature set. And the appropriate value of the only parameter used in PLS (i.e., the number of latent components) can be determined by using a cross-validation procedure. Second, to efficiently handle object scale changes, we propose a novel multi-scale voting scheme. In this scheme, multiple Hough images corresponding to multiple object scales can be obtained simultaneously. Third, an object in a test image may correspond to multiple true and false positive hypotheses at different scales. Based on the proposed multi-scale voting scheme, a principled strategy is proposed to fuse hypotheses to reduce false positives by evaluating normalized pointwise mutual information between hypotheses. In the experiments, we also compare the proposed HRM approach with its several variants to evaluate the influences of its components on its performance. Experimental results show that the proposed HRM approach has achieved desirable performances on popular benchmark datasets.