Models, code, and papers for "Wei Zhang":
We present a novel approach to improve the performance of distant supervision relation extraction with Positive and Unlabeled (PU) Learning. This approach first applies reinforcement learning to decide whether a sentence is positive to a given relation, and then positive and unlabeled bags are constructed. In contrast to most previous studies, which mainly use selected positive instances only, we make full use of unlabeled instances and propose two new representations for positive and unlabeled bags. These two representations are then combined in an appropriate way to make bag-level prediction. Experimental results on a widely used real-world dataset demonstrate that this new approach indeed achieves significant and consistent improvements as compared to several competitive baselines.
To deal with changing environments, a new performance measure---adaptive regret, defined as the maximum static regret over any interval, is proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality in the sense that they can only handle one type of convex functions and need apriori knowledge of parameters. By contrast, there exist universal algorithms, such as MetaGrad, that attain optimal static regret for multiple types of convex functions simultaneously. Along this line of research, this paper presents the first universal algorithm for minimizing the adaptive regret of convex functions. Specifically, we borrow the idea of maintaining multiple learning rates in MetaGrad to handle the uncertainty of functions, and utilize the technique of sleeping experts to capture changing environments. In this way, our algorithm automatically adapts to the property of functions (convex, exponentially concave, or strongly convex), as well as the nature of environments (stationary or changing). As a by product, it also allows the type of functions to switch between rounds.
In this paper we explore deep learning models with memory component or attention mechanism for question answering task. We combine and compare three models, Neural Machine Translation, Neural Turing Machine, and Memory Networks for a simulated QA data set. This paper is the first one that uses Neural Machine Translation and Neural Turing Machines for solving QA tasks. Our results suggest that the combination of attention and memory have potential to solve certain QA problem.
Reusable model design becomes desirable with the rapid expansion of machine learning applications. In this paper, we focus on the reusability of pre-trained deep convolutional models. Specifically, different from treating pre-trained models as feature extractors, we reveal more treasures beneath convolutional layers, i.e., the convolutional activations could act as a detector for the common object in the image co-localization problem. We propose a simple but effective method, named Deep Descriptor Transforming (DDT), for evaluating the correlations of descriptors and then obtaining the category-consistent regions, which can accurately locate the common object in a set of images. Empirical studies validate the effectiveness of the proposed DDT method. On benchmark image co-localization datasets, DDT consistently outperforms existing state-of-the-art methods by a large margin. Moreover, DDT also demonstrates good generalization ability for unseen categories and robustness for dealing with noisy data.
Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image to semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and demonstrates a faster learning curve than them. We also present a detailed analysis for a variety of variant configurations, and validate the transferability of our modular architecture.
In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2015a; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique from (Shalev-Shwartz and Zhang, 2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version and the new result improves previous ones (Yang, 2013; Ma et al., 2015a) whose runtimes grow linearly on the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves the previous state-of-the-art distributed dual coordinate optimization algorithms.
Learning to remember long sequences remains a challenging task for recurrent neural networks. Register memory and attention mechanisms were both proposed to resolve the issue with either high computational cost to retain memory differentiability, or by discounting the RNN representation learning towards encoding shorter local contexts than encouraging long sequence encoding. Associative memory, which studies the compression of multiple patterns in a fixed size memory, were rarely considered in recent years. Although some recent work tries to introduce associative memory in RNN and mimic the energy decay process in Hopfield nets, it inherits the shortcoming of rule-based memory updates, and the memory capacity is limited. This paper proposes a method to learn the memory update rule jointly with task objective to improve memory capacity for remembering long sequences. Also, we propose an architecture that uses multiple such associative memory for more complex input encoding. We observed some interesting facts when compared to other RNN architectures on some well-studied sequence learning tasks.
Point matching refers to the process of finding spatial transformation and correspondences between two sets of points. In this paper, we focus on the case that there is only partial overlap between two point sets. Following the approach of the robust point matching method, we model point matching as a mixed linear assignment-least square problem and show that after eliminating the transformation variable, the resulting problem of minimization with respect to point correspondence is a concave optimization problem. Furthermore, this problem has the property that the objective function can be converted into a form with few nonlinear terms via a linear transformation. Based on these properties, we employ the branch-and-bound (BnB) algorithm to optimize the resulting problem where the dimension of the search space is small. To further improve efficiency of the BnB algorithm where computation of the lower bound is the bottleneck, we propose a new lower bounding scheme which has a k-cardinality linear assignment formulation and can be efficiently solved. Experimental results show that the proposed algorithm outperforms state-of-the-art methods in terms of robustness to disturbances and point matching accuracy.
A metonym is a word with a figurative meaning, similar to a metaphor. Because metonyms are closely related to metaphors, we apply features that are used successfully for metaphor recognition to the task of detecting metonyms. On the ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system achieved 86.45% accuracy on the location metonyms. Our code can be found on GitHub.
Restricted Boltzman Machines (RBMs) have been successfully used in recommender systems. However, as with most of other collaborative filtering techniques, it cannot solve cold start problems for there is no rating for a new item. In this paper, we first apply conditional RBM (CRBM) which could take extra information into account and show that CRBM could solve cold start problem very well, especially for rating prediction task. CRBM naturally combine the content and collaborative data under a single framework which could be fitted effectively. Experiments show that CRBM can be compared favourably with matrix factorization models, while hidden features learned from the former models are more easy to be interpreted.
Deep reinforcement learning (DRL) has shown great potential in training control agents for map-less robot navigation. However, the trained agents are generally dependent on the employed robot in training or dimension-specific, which cannot be directly reused by robots with different dimensional configurations. To address this issue, a novel DRL-based robot navigation method is proposed in this paper. The proposed approach trains a meta-robot with DRL and then transfers the meta-skill to a robot with a different dimensional configuration (named dimension-scaled robot) using a method named dimension-variable skill transfer (DVST), referred to as DRL-DVST. During the training phase, the meta-agent learns to perform self-navigation with the meta-robot in a simulation environment. In the skill-transfer phase, the observations of the dimension-scaled robot are transferred to the meta-agent in a scaled manner, and the control policy generated by the meta-agent is scaled back to the dimension-scaled robot. Simulation and real-world experimental results indicate that robots with different sizes and angular velocity bounds can accomplish navigation tasks in unknown and dynamic environments without any retraining. This work greatly extends the application range of DRL-based navigation methods from the fixed dimensional configuration to varied dimensional configurations.
The trace norm is widely used in multi-task learning as it can discover low-rank structures among tasks in terms of model parameters. Nowadays, with the emerging of big datasets and the popularity of deep learning techniques, tensor trace norms have been used for deep multi-task models. However, existing tensor trace norms cannot discover all the low-rank structures and they require users to manually determine the importance of their components. To solve those two issues together, in this paper, we propose a Generalized Tensor Trace Norm (GTTN). The GTTN is defined as a convex combination of matrix trace norms of all possible tensor flattenings and hence it can discover all the possible low-rank structures. In the induced objective function, we will learn combination coefficients in the GTTN to automatically determine the importance. Experiments on real-world datasets demonstrate the effectiveness of the proposed GTTN.
Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method in a single machine. Moreover, big data are often distributedly collected and stored on different machines. Thus, such data generally bear strong heterogeneous noise. It is essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computing including 1) the accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) the statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieves superior or competing performance compared to other distributed methods.
Very short-term convective storm forecasting, termed nowcasting, has long been an important issue and has attracted substantial interest. Existing nowcasting methods rely principally on radar images and are limited in terms of nowcasting storm initiation and growth. Real-time re-analysis of meteorological data supplied by numerical models provides valuable information about three-dimensional (3D), atmospheric, boundary layer thermal dynamics, such as temperature and wind. To mine such data, we here develop a convolution-recurrent, hybrid deep-learning method with the following characteristics: (1) the use of cell-based oversampling to increase the number of training samples; this mitigates the class imbalance issue; (2) the use of both raw 3D radar data and 3D meteorological data re-analyzed via multi-source 3D convolution without any need for handcraft feature engineering; and (3) the stacking of convolutional neural networks on a long short-term memory encoder/decoder that learns the spatiotemporal patterns of convective processes. Experimental results demonstrated that our method performs better than other extrapolation methods. Qualitative analysis yielded encouraging nowcasting results.
Accurate and robust detection of multi-class objects in optical remote sensing images is essential to many real-world applications such as urban planning, traffic control, searching and rescuing, etc. However, state-of-the-art object detection techniques designed for images captured using ground-level sensors usually experience a sharp performance drop when directly applied to remote sensing images, largely due to the object appearance differences in remote sensing images in term of sparse texture, low contrast, arbitrary orientations, large scale variations, etc. This paper presents a novel object detection network (CAD-Net) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images. The proposed CAD-Net learns global and local contexts of objects by capturing their correlations with the global scene (at scene-level) and the local neighboring objects or features (at object-level), respectively. In addition, it designs a spatial-and-scale-aware attention module that guides the network to focus on more informative regions and features as well as more appropriate feature scales. Experiments over two publicly available object detection datasets for remote sensing images demonstrate that the proposed CAD-Net achieves superior detection performance. The implementation codes will be made publicly available for facilitating future researches.
Visual cognition of the indoor environment can benefit from the spatial layout estimation, which is to represent an indoor scene with a 2D box on a monocular image. In this paper, we propose to fully exploit the edge and semantic information of a room image for layout estimation. More specifically, we present an encoder-decoder network with shared encoder and two separate decoders, which are composed of multiple deconvolution (transposed convolution) layers, to jointly learn the edge maps and semantic labels of a room image. We combine these two network predictions in a scoring function to evaluate the quality of the layouts, which are generated by ray sampling and from a predefined layout pool. Guided by the scoring function, we apply a novel refinement strategy to further optimize the layout hypotheses. Experimental results show that the proposed network can yield accurate estimates of edge maps and semantic labels. By fully utilizing the two different types of labels, the proposed method achieves state-of-the-art layout estimation performance on benchmark datasets.
Self-navigation, referring to automatically reaching the goal while avoiding collision with obstacles, is a fundamental skill of mobile robots. Currently, Deep Reinforcement Learning (DRL) can enable the robot to navigate in a more complex environment with less computation power compared to conventional methods. However, it is time-consuming and hard to train the robot to learn goal-reaching and obstacle-avoidance skills simultaneously using DRL-based algorithms. In this paper, two Dueling Deep Q Networks (DQN) named Goal Network and Avoidance Network are used to learn the goal-reaching and obstacle-avoidance skills individually. A novel method named danger-aware advantage composition is proposed to fuse the two networks together without any redesigning and retraining. The composed Navigation Network can enable the robot to reach the goal right behind the wall and to navigate in unknown complexed environment safely and quickly.
This paper proposes a pedestrian detection and re-identification (re-id) integration net (I-Net) in an end-to-end learning framework. The I-Net is used in real-world video surveillance scenarios, where the target person needs to be searched in the whole scene videos, while the annotations of pedestrian bounding boxes are unavailable. By comparing to the OIM which is a work for joint detection and re-id, we have three distinct contributions. First, we introduce a Siamese architecture of I-Net instead of 1 stream, such that a verification task can be implemented. Second, we propose a novel on-line pairing loss (OLP) and hard example priority softmax loss (HEP), such that only the hard negatives are posed much attention in loss computation. Third, an on-line dictionary for negative samples storage is designed in I-Net without recording the positive samples. We show our result on person search datasets, the gap between detection and re-identification is narrowed. The superior performance can be achieved.
The research of personalized recommendation techniques today has mostly parted into two mainstream directions, i.e., the factorization-based approaches and topic models. Practically, they aim to benefit from the numerical ratings and textual reviews, correspondingly, which compose two major information sources in various real-world systems. However, although the two approaches are supposed to be correlated for their same goal of accurate recommendation, there still lacks a clear theoretical understanding of how their objective functions can be mathematically bridged to leverage the numerical ratings and textual reviews collectively, and why such a bridge is intuitively reasonable to match up their learning procedures for the rating prediction and top-N recommendation tasks, respectively. In this work, we exposit with mathematical analysis that, the vector-level randomization functions to coordinate the optimization objectives of factorizational and topic models unfortunately do not exist at all, although they are usually pre-assumed and intuitively designed in the literature. Fortunately, we also point out that one can avoid the seeking of such a randomization function by optimizing a Joint Factorizational Topic (JFT) model directly. We apply our JFT model to restaurant recommendation, and study its performance in both normal and cross-city recommendation scenarios, where the latter is an extremely difficult task for its inherent cold-start nature. Experimental results on real-world datasets verified the appealing performance of our approach against previous methods, on both rating prediction and top-N recommendation tasks.