Models, code, and papers for "Wang Zhu":
Penalized estimation can conduct variable selection and parameter estimation simultaneously. The general framework is to minimize a loss function subject to a penalty designed to generate sparse variable selection. Much of the previous work have focused on convex loss functions including generalized linear models. When data are contaminated with noise, robust loss functions are typically introduced. Recent literature has witnessed a growing impact of nonconvex loss-based methods, which can generate robust estimation for data contaminated with outliers. This article investigates robust variable selection based on penalized nonconvex loss functions. We investigate properties of the local and global minimizers of the original penalized loss function and the surrogate penalized loss function induced by the majorization-minimization (MM) algorithm for numerical computation. We establish convergence theory of the proposed MM algorithm for penalized convex and nonconvex loss functions. Performance of the proposed algorithms for regression and classification problems are evaluated on simulated and real data including healthcare costs and cancer clinical status. Efficient implementations of the algorithms are available in the R package mpath in CRAN.
Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression is a method to predict ordinal labels that finds a wide range of applications in data-rich science domains, such as medical, social and economic sciences. Most existing approaches work well for a single ordinal regression task. However, they ignore the task relatedness when there are multiple related tasks. Multi-task learning (MTL) provides a framework to encode task relatedness, to bridge data from all tasks, and to simultaneously learn multiple related tasks to improve the generalization performance. Even though MTL methods have been extensively studied, there is barely existing work investigating MTL for data with ordinal labels. We tackle multiple ordinal regression problems via sparse and deep multi-task approaches, i.e., two regularized multi-task ordinal regression (RMTOR) models for small datasets and two deep neural networks based multi-task ordinal regression (DMTOR) models for large-scale datasets. The performance of the proposed multi-task ordinal regression models (MTOR) is demonstrated on three real-world medical datasets for multi-stage disease diagnosis. Our experimental results indicate that our proposed MTOR models markedly improve the prediction performance comparing with single-task learning (STL) ordinal regression models.
In this paper, we propose an end-to-end image clustering auto-encoder algorithm: ICAE. The algorithm uses PEDCC (Predefined Evenly-Distributed Class Centroids) as the clustering centers of the images, which ensures the inter-class distance of latent features is maximal, and adds data distribution constraint, data augmentation constraint, auto-encoder reconstruction loss constraint and latent features plus noise constraint to improve clustering performance. Specifically, we perform one-to-one data augmentation such as rotation, shear, and shift before data is input to the encoder to learn the more effective features. The data and the enhanced data are simultaneously input into the auto-encoder to obtain latent features and augmented latent features whose similarity are constrained by an augmentation loss. Then, making use of the MMD distance, we combine the latent features and augmented latent features to make their distribution close to the PEDCC distribution (uniform distribution between classes, Dirac distribution within the class) to further learn the features used for clustering. At the same time, the MSE of the original input image and reconstructed image is used as reconstruction constraint, and the noise is added to the latent features to build generalization constraint to improve the generalization ability. Finally, extensive experiments on three common datasets MNIST, Fashion-MNIST, COIL20 are conducted. The experimental results show that the algorithm has achieved the best clustering results so far, and also has good generalization ability. In addition, we can use the pre-defined PEDCC class centers, and the decoding module of the auto-encoder to clearly generate the samples of each class. The code can be downloaded at xxx!
To compress deep convolutional neural networks (CNNs) with large memory footprint and long inference time, this paper proposes a novel pruning criterion using layer-wised Ln-norm of feature maps. Different from existing pruning criteria, which are mainly based on L1-norm of convolution kernels, the proposed method utilizes Ln-norm of output feature maps after non-linear activations, where n is a variable, increasing from 1 at the first convolution layer to inf at the last convolution layer. With the ability of accurately identifying unimportant convolution kernels, the proposed method achieves a good balance between model size and inference accuracy. The experiments on ImageNet and the successful application in railway surveillance system show that the proposed method outperforms existing kernel-norm-based methods and is generally applicable to any deep neural network with convolution operations.
Consumers' purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam fictitious reviews that have been deliberately written to sound authentic, to deceive the readers. Existing approaches mainly focus on developing automatic supervised learning based methods to help users identify deceptive opinion spams. This work, we used the LSI and Sprinkled LSI technique to reduce the dimension for deception detection. We make our contribution to demonstrate what LSI is capturing in latent semantic space and reveal how deceptive opinions can be recognized automatically from truthful opinions. Finally, we proposed a voting scheme which integrates different approaches to further improve the classification performance.
Rough sets are efficient for data pre-processing in data mining. As a generalization of the linear independence in vector spaces, matroids provide well-established platforms for greedy algorithms. In this paper, we apply rough sets to matroids and study the contraction of the dual of the corresponding matroid. First, for an equivalence relation on a universe, a matroidal structure of the rough set is established through the lower approximation operator. Second, the dual of the matroid and its properties such as independent sets, bases and rank function are investigated. Finally, the relationships between the contraction of the dual matroid to the complement of a single point set and the contraction of the dual matroid to the complement of the equivalence class of this point are studied.
This study proposes a framework for human-like autonomous car-following planning based on deep reinforcement learning (deep RL). Historical driving data are fed into a simulation environment where an RL agent learns from trial and error interactions based on a reward function that signals how much the agent deviates from the empirical data. Through these interactions, an optimal policy, or car-following model that maps in a human-like way from speed, relative speed between a lead and following vehicle, and inter-vehicle spacing to acceleration of a following vehicle is finally obtained. The model can be continuously updated when more data are fed in. Two thousand car-following periods extracted from the 2015 Shanghai Naturalistic Driving Study were used to train the model and compare its performance with that of traditional and recent data-driven car-following models. As shown by this study results, a deep deterministic policy gradient car-following model that uses disparity between simulated and observed speed as the reward function and considers a reaction delay of 1s, denoted as DDPGvRT, can reproduce human-like car-following behavior with higher accuracy than traditional and recent data-driven car-following models. Specifically, the DDPGvRT model has a spacing validation error of 18% and speed validation error of 5%, which are less than those of other models, including the intelligent driver model, models based on locally weighted regression, and conventional neural network-based models. Moreover, the DDPGvRT demonstrates good capability of generalization to various driving situations and can adapt to different drivers by continuously learning. This study demonstrates that reinforcement learning methodology can offer insight into driver behavior and can contribute to the development of human-like autonomous driving algorithms and traffic-flow models.
In the generalized zero-shot learning, synthesizing unseen data with generative models has been the most popular method to address the imbalance of training data between seen and unseen classes. However, this method requires that the unseen semantic information is available during the training stage, and training generative models is not trivial. Given that the generator of these models can only be trained with seen classes, we argue that synthesizing unseen data may not be an ideal approach for addressing the domain shift caused by the imbalance of the training data. In this paper, we propose to realize the generalized zero-shot recognition in different domains. Thus, unseen (seen) classes can avoid the effect of the seen (unseen) classes. In practice, we propose a threshold and probabilistic distribution joint method to segment the testing instances into seen, unseen and uncertain domains. Moreover, the uncertain domain is further adjusted to alleviate the domain shift. Extensive experiments on five benchmark datasets show that the proposed method exhibits competitive performance compared with that based on generative models.
Mining graph data has become a popular research topic in computer science and has been widely studied in both academia and industry given the increasing amount of network data in the recent years. However, the huge amount of network data has posed great challenges for efficient analysis. This motivates the advent of graph representation which maps the graph into a low-dimension vector space, keeping original graph structure and supporting graph inference. The investigation on efficient representation of a graph has profound theoretical significance and important realistic meaning, we therefore introduce some basic ideas in graph representation/network embedding as well as some representative models in this chapter.
Zero-shot detection, namely, localizing both seen and unseen objects, increasingly gains importance for large-scale applications, with large number of object classes, since, collecting sufficient annotated data with ground truth bounding boxes is simply not scalable. While vanilla deep neural networks deliver high performance for objects available during training, unseen object detection degrades significantly. At a fundamental level, while vanilla detectors are capable of proposing bounding boxes, which include unseen objects, they are often incapable of assigning high-confidence to unseen objects, due to the inherent precision/recall tradeoffs that requires rejecting background objects. We propose a novel detection algorithm Dont Even Look Once (DELO), that synthesizes visual features for unseen objects and augments existing training algorithms to incorporate unseen object detection. Our proposed scheme is evaluated on Pascal VOC and MSCOCO, and we demonstrate significant improvements in test accuracy over vanilla and other state-of-art zero-shot detectors
With the rapid development of Internet and multimedia services in the past decade, a huge amount of user-generated and service provider-generated multimedia data become available. These data are heterogeneous and multi-modal in nature, imposing great challenges for processing and analyzing them. Multi-modal data consist of a mixture of various types of data from different modalities such as texts, images, videos, audios etc. In this article, we present a deep and comprehensive overview for multi-modal analysis in multimedia. We introduce two scientific research problems, data-driven correlational representation and knowledge-guided fusion for multimedia analysis. To address the two scientific problems, we investigate them from the following aspects: 1) multi-modal correlational representation: multi-modal fusion of data across different modalities, and 2) multi-modal data and knowledge fusion: multi-modal fusion of data with domain knowledge. More specifically, on data-driven correlational representation, we highlight three important categories of methods, such as multi-modal deep representation, multi-modal transfer learning, and multi-modal hashing. On knowledge-guided fusion, we discuss the approaches for fusing knowledge with data and four exemplar applications that require various kinds of domain knowledge, including multi-modal visual question answering, multi-modal video summarization, multi-modal visual pattern mining and multi-modal recommendation. Finally, we bring forward our insights and future research directions.
In this work we develop a distributed least squares approximation (DLSA) method, which is able to solve a large family of regression problems (e.g., linear regression, logistic regression, Cox's model) on a distributed system. By approximating the local objective function using a local quadratic form, we are able to obtain a combined estimator by taking a weighted average of local estimators. The resulting estimator is proved to be statistically as efficient as the global estimator. In the meanwhile it requires only one round of communication. We further conduct the shrinkage estimation based on the DLSA estimation by using an adaptive Lasso approach. The solution can be easily obtained by using the LARS algorithm on the master node. It is theoretically shown that the resulting estimator enjoys the oracle property and is selection consistent by using a newly designed distributed Bayesian Information Criterion (DBIC). The finite sample performance as well as the computational efficiency are further illustrated by extensive numerical study and an airline dataset. The airline dataset is 52GB in memory size. The entire methodology has been implemented by Python for a de-facto standard Spark system. By using the proposed DLSA algorithm on the Spark system, it takes 26 minutes to obtain a logistic regression estimator whereas a full likelihood algorithm takes 15 hours to reaches an inferior result.
In recent years, perceptual-quality driven super-resolution methods show satisfactory results. However, super-resolved images have uncertain texture details and unpleasant artifact. We build a novel perceptual loss function composed of morphological components adversarial loss and color adversarial loss and salient content loss to ameliorate these problems. The adversarial loss is applied to constrain color and morphological components distribution of super-resolved images and the salient content loss highlights the perceptual similarity of feature-rich regions. Experiments show that proposed method achieves significant improvements in terms of perceptual index and visual quality compared with the state-of-the-art methods.
Intent classification and slot filling are two essential tasks for natural language understanding. They often suffer from small-scale human-labeled training data, resulting in poor generalization capability, especially for rare words. Recently a new language representation model, BERT (Bidirectional Encoder Representations from Transformers), facilitates pre-training deep bidirectional representations on large-scale unlabeled corpora, and has created state-of-the-art models for a wide variety of natural language processing tasks after simple fine-tuning. However, there has not been much effort on exploring BERT for natural language understanding. In this work, we propose a joint intent classification and slot filling model based on BERT. Experimental results demonstrate that our proposed model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on several public benchmark datasets, compared to the attention-based recurrent neural network models and slot-gated models.
In computer vision applications, such as domain adaptation (DA), few shot learning (FSL) and zero-shot learning (ZSL), we encounter new objects and environments, for which insufficient examples exist to allow for training "models from scratch," and methods that adapt existing models, trained on the presented training environment, to the new scenario are required. We propose a novel visual attribute encoding method that encodes each image as a low-dimensional probability vector composed of prototypical part-type probabilities. The prototypes are learnt to be representative of all training data. At test-time we utilize this encoding as an input to a classifier. At test-time we freeze the encoder and only learn/adapt the classifier component to limited annotated labels in FSL; new semantic attributes in ZSL. We conduct extensive experiments on benchmark datasets. Our method outperforms state-of-art methods trained for the specific contexts (ZSL, FSL, DA).
The popularity of mobile devices results in the availability of enormous data and computational resources at the network edge. To leverage the data and resources, a new machine learning paradigm, called edge learning, has emerged where learning algorithms are deployed at the edge for providing fast and intelligent services to mobile users. While computing speeds are advancing rapidly, the communication latency is becoming the bottleneck of fast edge learning. To address this issue, this work is focused on designing a low latency multi-access scheme for edge learning. We consider a popular framework, federated edge learning (FEEL), where edge-server and on-device learning are synchronized to train a model without violating user-data privacy. It is proposed that model updates simultaneously transmitted by devices over broadband channels should be analog aggregated "over-the-air" by exploiting the superposition property of a multi-access channel. Thereby, "interference" is harnessed to provide fast implementation of the model aggregation. This results in dramatical latency reduction compared with the traditional orthogonal access (i.e., OFDMA). In this work, the performance of FEEL is characterized targeting a single-cell random network. First, due to power alignment between devices as required for aggregation, a fundamental tradeoff is shown to exist between the update-reliability and the expected update-truncation ratio. This motivates the design of an opportunistic scheduling scheme for FEEL that selects devices within a distance threshold. This scheme is shown using real datasets to yield satisfactory learning performance in the presence of high mobility. Second, both the multi-access latency of the proposed analog aggregation and the OFDMA scheme are analyzed. Their ratio, which quantifies the latency reduction of the former, is proved to scale almost linearly with device population.
We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. Prior works in this context propose to map high-dimensional visual features to the semantic domain, we believe contributes to the semantic gap. To bridge the gap, we propose a novel low-dimensional embedding of visual instances that is "visually semantic." Analogous to semantic data that quantifies the existence of an attribute in the presented instance, components of our visual embedding quantifies existence of a prototypical part-type in the presented instance. In parallel, as a thought experiment, we quantify the impact of noisy semantic data by utilizing a novel visual oracle to visually supervise a learner. These factors, namely semantic noise, visual-semantic gap and label noise lead us to propose a new graphical model for inference with pairwise interactions between label, semantic data, and inputs. We tabulate results on a number of benchmark datasets demonstrating significant improvement in accuracy over state-of-the-art under both semantic and visual supervision.
Deep transfer learning has acquired significant research interest. It makes use of pre-trained models that are learned from a source domain, and utilizes these models for the tasks in a target domain. Model-based deep transfer learning is arguably the most frequently used method. However, very little work has been devoted to enhancing deep transfer learning by focusing on the influence of data. In this work, we propose an instance-based approach to improve deep transfer learning in target domain. Specifically, we choose a pre-trained model which is learned from a source domain, and utilize this model to estimate the influence of each training sample in a target domain. Then we optimize training data of the target domain by removing the training samples that will lower the performance of the pre-trained model. We then fine-tune the pre-trained model with the optimized training data in the target domain, or build a new model which can be initialized partially based on the pre-trained model, and fine-tune it with the optimized training data in the target domain. Using this approach, transfer learning can help deep learning models to learn more useful features. Extensive experiments demonstrate the effectiveness of our approach on further boosting deep learning models for typical high-level computer vision tasks, such as image classification.
In this paper, we propose a new pipeline of training a monocular UAV to fly a collision-free trajectory along the dense forest trail. As gathering high-precision images in the real world is expensive and the off-the-shelf dataset has some deficiencies, we collect a new dense forest trail dataset in a variety of simulated environment in Unreal Engine. Then we formulate visual perception of forests as a classification problem. A ResNet-18 model is trained to decide the moving direction frame by frame. To transfer the learned strategy to the real world, we construct a ResNet-18 adaptation model via multi-kernel maximum mean discrepancies to leverage the relevant labelled data and alleviate the discrepancy between simulated and real environment. Simulation and real-world flight with a variety of appearance and environment changes are both tested. The ResNet-18 adaptation and its variant model achieve the best result of 84.08% accuracy in reality.
A multitude of publicly-available driving datasets and data platforms have been raised for autonomous vehicles (AV). However, the heterogeneities of databases in size, structure and driving context make existing datasets practically ineffective due to a lack of uniform frameworks and searchable indexes. In order to overcome these limitations on existing public datasets, this paper proposes a data unification framework based on traffic primitives with ability to automatically unify and label heterogeneous traffic data. This is achieved by two steps: 1) Carefully arrange raw multidimensional time series driving data into a relational database and then 2) automatically extract labeled and indexed traffic primitives from traffic data through a Bayesian nonparametric learning method. Finally, we evaluate the effectiveness of our developed framework using the collected real vehicle data.