Models, code, and papers for "Rui Wang":
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
Cost-sensitive learning relies on the availability of a known and fixed cost matrix. However, in some scenarios, the cost matrix is uncertain during training, and re-train a classifier after the cost matrix is specified would not be an option. For binary classification, this issue can be successfully addressed by methods maximizing the Area Under the ROC Curve (AUC) metric. Since the AUC can measure performance of base classifiers independent of cost during training, and a larger AUC is more likely to lead to a smaller total cost in testing using the threshold moving method. As an extension of AUC to multi-class problems, MAUC has attracted lots of attentions and been widely used. Although MAUC also measures performance of base classifiers independent of cost, it is unclear whether a larger MAUC of classifiers is more likely to lead to a smaller total cost. In fact, it is also unclear what kinds of post-processing methods should be used in multi-class problems to convert base classifiers into discrete classifiers such that the total cost is as small as possible. In the paper, we empirically explore the relationship between MAUC and the total cost of classifiers by applying two categories of post-processing methods. Our results suggest that a larger MAUC is also beneficial. Interestingly, simple calibration methods that convert the output matrix into posterior probabilities perform better than existing sophisticated post re-optimization methods.
The recent developments of basis pursuit and compressed sensing seek to extract information from as few samples as possible. In such applications, since the number of samples is restricted, one should deploy the sampling points wisely. We are motivated to study the optimal distribution of finite sampling points. Formulation under the framework of optimal reconstruction yields a minimization problem. In the discrete case, we estimate the distance between the optimal subspace resulting from a general Karhunen-Loeve transform and the kernel space to obtain another algorithm that is computationally favorable. Numerical experiments are then presented to illustrate the performance of the algorithms for the searching of optimal sampling points.
Many studies on the cost-sensitive learning assumed that a unique cost matrix is known for a problem. However, this assumption may not hold for many real-world problems. For example, a classifier might need to be applied in several circumstances, each of which associates with a different cost matrix. Or, different human experts have different opinions about the costs for a given problem. Motivated by these facts, this study aims to seek the minimax classifier over multiple cost matrices. In summary, we theoretically proved that, no matter how many cost matrices are involved, the minimax problem can be tackled by solving a number of standard cost-sensitive problems and sub-problems that involve only two cost matrices. As a result, a general framework for achieving minimax classifier over multiple cost matrices is suggested and justified by preliminary empirical studies.
Feature selection is an important pre-processing step for many pattern classification tasks. Traditionally, feature selection methods are designed to obtain a feature subset that can lead to high classification accuracy. However, classification accuracy has recently been shown to be an inappropriate performance metric of classification systems in many cases. Instead, the Area Under the receiver operating characteristic Curve (AUC) and its multi-class extension, MAUC, have been proved to be better alternatives. Hence, the target of classification system design is gradually shifting from seeking a system with the maximum classification accuracy to obtaining a system with the maximum AUC/MAUC. Previous investigations have shown that traditional feature selection methods need to be modified to cope with this new objective. These methods most often are restricted to binary classification problems only. In this study, a filter feature selection method, namely MAUC Decomposition based Feature Selection (MDFS), is proposed for multi-class classification problems. To the best of our knowledge, MDFS is the first method specifically designed to select features for building classification systems with maximum MAUC. Extensive empirical results demonstrate the advantage of MDFS over several compared feature selection methods.
To extract the structured representations of open-domain events, Bayesian graphical models have made some progress. However, these approaches typically assume that all words in a document are generated from a single event. While this may be true for short text such as tweets, such an assumption does not generally hold for long text such as news articles. Moreover, Bayesian graphical models often rely on Gibbs sampling for parameter inference which may take long time to converge. To address these limitations, we propose an event extraction model based on Generative Adversarial Nets, called Adversarial-neural Event Model (AEM). AEM models an event with a Dirichlet prior and uses a generator network to capture the patterns underlying latent events. A discriminator is used to distinguish documents reconstructed from the latent events and the original documents. A byproduct of the discriminator is that the features generated by the learned discriminator network allow the visualization of the extracted events. Our model has been evaluated on two Twitter datasets and a news article dataset. Experimental results show that our model outperforms the baseline approaches on all the datasets, with more significant improvements observed on the news article dataset where an increase of 15\% is observed in F-measure.
The success of neural summarization models stems from the meticulous encodings of source articles. To overcome the impediments of limited and sometimes noisy training data, one promising direction is to make better use of the available training data by applying filters during summarization. In this paper, we propose a novel Bi-directional Selective Encoding with Template (BiSET) model, which leverages template discovered from training data to softly select key information from each source article to guide its summarization process. Extensive experiments on a standard summarization dataset were conducted and the results show that the template-equipped BiSET model manages to improve the summarization performance significantly with a new state of the art.
Unsupervised domain adaptation seeks to learn an invariant and discriminative representation for an unlabeled target domain by leveraging the information of a labeled source dataset. We propose to improve the discriminative ability of the target domain representation by simultaneously learning tightly clustered target representations while encouraging that each cluster is assigned to a unique and different class from the source. This strategy alleviates the effects of negative transfer when combined with adversarial domain matching between source and target representations. Our approach is robust to differences in the source and target label distributions and thus applicable to both balanced and imbalanced domain adaptation tasks, and with a simple extension, it can also be used for partial domain adaptation. Experiments on several benchmark datasets for domain adaptation demonstrate that our approach can achieve state-of-the-art performance in all three scenarios, namely, balanced, imbalanced and partial domain adaptation.
Topic models are widely used for thematic structure discovery in text. But traditional topic models often require dedicated inference procedures for specific tasks at hand. Also, they are not designed to generate word-level semantic representations. To address these limitations, we propose a topic modeling approach based on Generative Adversarial Nets (GANs), called Adversarial-neural Topic Model (ATM). The proposed ATM models topics with Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. Meanwhile, the generator could also produce word-level semantic representations. To illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM for open domain event extraction. Our experimental results on the two public corpora show that ATM generates more coherence topics, outperforming a number of competitive baselines. Moreover, ATM is able to extract meaningful events from news articles.
Our research topic is Human segmentation with static camera. This topic can be divided into three sub-tasks, which are object detection, instance identification and segmentation. These sub-tasks are three closely related subjects. The development of each subject has great impact on the other two fields. In this literature review, we will first introduce the background of human segmentation and then talk about issues related to the above three fields as well as how they interact with each other.
As a fundamental NLP task, semantic role labeling (SRL) aims to discover the semantic roles for each predicate within one sentence. This paper investigates how to incorporate syntactic knowledge into the SRL task effectively. We present different approaches of encoding the syntactic information derived from dependency trees of different quality and representations; we propose a syntax-enhanced self-attention model and compare it with other two strong baseline methods; and we conduct experiments with newly published deep contextualized word representations as well. The experiment results demonstrate that with proper incorporation of the high quality syntactic information, our model achieves a new state-of-the-art performance for the Chinese SRL task on the CoNLL-2009 dataset.
The importance of wild video based image set recognition is becoming monotonically increasing. However, the contents of these collected videos are often complicated, and how to efficiently perform set modeling and feature extraction is a big challenge for set-based classification algorithms. In recent years, some proposed image set classification methods have made a considerable advance by modeling the original image set with covariance matrix, linear subspace, or Gaussian distribution. As a matter of fact, most of them just adopt a single geometric model to describe each given image set, which may lose some other useful information for classification. To tackle this problem, we propose a novel algorithm to model each image set from a multi-geometric perspective. Specifically, the covariance matrix, linear subspace, and Gaussian distribution are applied for set representation simultaneously. In order to fuse these multiple heterogeneous Riemannian manifoldvalued features, the well-equipped Riemannian kernel functions are first utilized to map them into high dimensional Hilbert spaces. Then, a multi-kernel metric learning framework is devised to embed the learned hybrid kernels into a lower dimensional common subspace for classification. We conduct experiments on four widely used datasets corresponding to four different classification tasks: video-based face recognition, set-based object categorization, video-based emotion recognition, and dynamic scene classification, to evaluate the classification performance of the proposed algorithm. Extensive experimental results justify its superiority over the state-of-the-art.
This study proposes an end-to-end framework for solving multi-objective optimization problems (MOPs) using Deep Reinforcement Learning (DRL), termed DRL-MOA. The idea of decomposition is adopted to decompose a MOP into a set of scalar optimization subproblems. The subproblems are then optimized cooperatively by a neighbourhood-based parameter transfer strategy which significantly accelerates the training procedure and makes the realization of DRL-MOA possible. The subproblems are modelled as neural networks and the RL method is used to optimize them. In specific, the multi-objective travelling salesman problem (MOTSP) is solved in this work using the DRL-MOA framework by modelling the subproblem as the Pointer Network. It is found that, once the trained model is available, it can scale to MOTSPs of any number of cities, e.g., 70-city, 100-city, even the 200-city MOTSP, without re-training the model. The Pareto Front can be directly obtained by a simple feed-forward of the network; thereby, no iteration is required and the MOP can be always solved in a reasonable time. Experimental results indicate a strong convergence ability of the DRL-MOA, especially for large-scale MOTSPs, e.g., 200-city MOTSP, for which evolutionary algorithms such as NSGA-II and MOEA/D are pretty hard to converge even implemented for a large number of iterations. The DRL-MOA can also obtain a much wider spread of the PF than the two competitors. Moreover, the DRL-MOA has a high level of modularity and can be easily generalized to other MOPs by replacing the modelling of the subproblem.
Traditional Neural machine translation (NMT) involves a fixed training procedure where each sentence is sampled once during each epoch. In reality, some sentences are well-learned during the initial few epochs; however, using this approach, the well-learned sentences would continue to be trained along with those sentences that were not well learned for 10-30 epochs, which results in a wastage of time. Here, we propose an efficient method to dynamically sample the sentences in order to accelerate the NMT training. In this approach, a weight is assigned to each sentence based on the measured difference between the training costs of two iterations. Further, in each epoch, a certain percentage of sentences are dynamically sampled according to their weights. Empirical results based on the NIST Chinese-to-English and the WMT English-to-German tasks depict that the proposed method can significantly accelerate the NMT training and improve the NMT performance.
Point set registration is a powerful method that enables robots to manipulate deformable objects. By mapping the point cloud of the current object to the pre-trained point cloud, a transformation function can be constructed. The manipulator's trajectory for pre-trained shapes can be warped with this transformation function, yielding a feasible trajectory for the new shape. However, usually this transformation function regards objects as discrete points, and dismisses the topological structures. Therefore, it risks over-stretching or over-compression during manipulation. To tackle this problem, this paper proposes a tangent space point set registration method. A tangent space representation of an object is constructed by defining an angle for each node on the object. Point set registration algorithm runs in this newly-constructed tangent space, yielding a tangent space trajectory. The trajectory is then converted back to Cartesian space and carried out by the robot. Compared to its counterpart in Cartesian space, tangent space point set registration is safer and more robust, succeeding in a series of experiments such as rope straightening, rope knotting, cloth folding and unfolding.
Recently, hidden Markov models (HMMs) have achieved promising results for offline handwritten Chinese text recognition. However, due to the large vocabulary of Chinese characters with each modeled by a uniform and fixed number of hidden states, a high demand of memory and computation is required. In this study, to address this issue, we present parsimonious HMMs via the state tying which can fully utilize the similarities among different Chinese characters. Two-step algorithm with the data-driven question-set is adopted to generate the tied-state pool using the likelihood measure. The proposed parsimonious HMMs with both Gaussian mixture models (GMMs) and deep neural networks (DNNs) as the emission distributions not only lead to a compact model but also improve the recognition accuracy via the data sharing for the tied states and the confusion decreasing among state classes. Tested on ICDAR-2013 competition database, in the best configured case, the new parsimonious DNN-HMM can yield a relative character error rate (CER) reduction of 6.2%, 25% reduction of model size and 60% reduction of decoding time over the conventional DNN-HMM. In the compact setting case of average 1-state HMM, our parsimonious DNN-HMM significantly outperforms the conventional DNN-HMM with a relative CER reduction of 35.5%.
We present multiresolution tree-structured networks to process point clouds for 3D shape understanding and generation tasks. Our network represents a 3D shape as a set of locality-preserving 1D ordered list of points at multiple resolutions. This allows efficient feed-forward processing through 1D convolutions, coarse-to-fine analysis through a multi-grid architecture, and it leads to faster convergence and small memory footprint during training. The proposed tree-structured encoders can be used to classify shapes and outperform existing point-based architectures on shape classification benchmarks, while tree-structured decoders can be used for generating point clouds directly and they outperform existing approaches for image-to-shape inference tasks learned using the ShapeNet dataset. Our model also allows unsupervised learning of point-cloud based shapes by using a variational autoencoder, leading to higher-quality generated shapes.
Recent direct visual odometry and SLAM algorithms have demonstrated impressive levels of precision. However, they require a photometric camera calibration in order to achieve competitive results. Hence, the respective algorithm cannot be directly applied to an off-the-shelf-camera or to a video sequence acquired with an unknown camera. In this work we propose a method for online photometric calibration which enables to process auto exposure videos with visual odometry precisions that are on par with those of photometrically calibrated videos. Our algorithm recovers the exposure times of consecutive frames, the camera response function, and the attenuation factors of the sensor irradiance due to vignetting. Gain robust KLT feature tracks are used to obtain scene point correspondences as input to a nonlinear optimization framework. We show that our approach can reliably calibrate arbitrary video sequences by evaluating it on datasets for which full photometric ground truth is available. We further show that our calibration can improve the performance of a state-of-the-art direct visual odometry method that works solely on pixel intensities, calibrating for photometric parameters in an online fashion in realtime.
We propose Stereo Direct Sparse Odometry (Stereo DSO) as a novel method for highly accurate real-time visual odometry estimation of large-scale environments from stereo cameras. It jointly optimizes for all the model parameters within the active window, including the intrinsic/extrinsic camera parameters of all keyframes and the depth values of all selected pixels. In particular, we propose a novel approach to integrate constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo. Real-time optimization is realized by sampling pixels uniformly from image regions with sufficient intensity gradient. Fixed-baseline stereo resolves scale drift. It also reduces the sensitivities to large optical flow and to rolling shutter effect which are known shortcomings of direct image alignment methods. Quantitative evaluation demonstrates that the proposed Stereo DSO outperforms existing state-of-the-art visual odometry methods both in terms of tracking accuracy and robustness. Moreover, our method delivers a more precise metric 3D reconstruction than previous dense/semi-dense direct approaches while providing a higher reconstruction density than feature-based methods.
We propose a method to generate 3D shapes using point clouds. Given a point-cloud representation of a 3D shape, our method builds a kd-tree to spatially partition the points. This orders them consistently across all shapes, resulting in reasonably good correspondences across all shapes. We then use PCA analysis to derive a linear shape basis across the spatially partitioned points, and optimize the point ordering by iteratively minimizing the PCA reconstruction error. Even with the spatial sorting, the point clouds are inherently noisy and the resulting distribution over the shape coefficients can be highly multi-modal. We propose to use the expressive power of neural networks to learn a distribution over the shape coefficients in a generative-adversarial framework. Compared to 3D shape generative models trained on voxel-representations, our point-based method is considerably more light-weight and scalable, with little loss of quality. It also outperforms simpler linear factor models such as Probabilistic PCA, both qualitatively and quantitatively, on a number of categories from the ShapeNet dataset. Furthermore, our method can easily incorporate other point attributes such as normal and color information, an additional advantage over voxel-based representations.