Models, code, and papers for "Wei Zhang":
To deal with changing environments, a new performance measure---adaptive regret, defined as the maximum static regret over any interval, is proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality in the sense that they can only handle one type of convex functions and need apriori knowledge of parameters. By contrast, there exist universal algorithms, such as MetaGrad, that attain optimal static regret for multiple types of convex functions simultaneously. Along this line of research, this paper presents the first universal algorithm for minimizing the adaptive regret of convex functions. Specifically, we borrow the idea of maintaining multiple learning rates in MetaGrad to handle the uncertainty of functions, and utilize the technique of sleeping experts to capture changing environments. In this way, our algorithm automatically adapts to the property of functions (convex, exponentially concave, or strongly convex), as well as the nature of environments (stationary or changing). As a by product, it also allows the type of functions to switch between rounds.
In this paper we explore deep learning models with memory component or attention mechanism for question answering task. We combine and compare three models, Neural Machine Translation, Neural Turing Machine, and Memory Networks for a simulated QA data set. This paper is the first one that uses Neural Machine Translation and Neural Turing Machines for solving QA tasks. Our results suggest that the combination of attention and memory have potential to solve certain QA problem.
Reusable model design becomes desirable with the rapid expansion of machine learning applications. In this paper, we focus on the reusability of pre-trained deep convolutional models. Specifically, different from treating pre-trained models as feature extractors, we reveal more treasures beneath convolutional layers, i.e., the convolutional activations could act as a detector for the common object in the image co-localization problem. We propose a simple but effective method, named Deep Descriptor Transforming (DDT), for evaluating the correlations of descriptors and then obtaining the category-consistent regions, which can accurately locate the common object in a set of images. Empirical studies validate the effectiveness of the proposed DDT method. On benchmark image co-localization datasets, DDT consistently outperforms existing state-of-the-art methods by a large margin. Moreover, DDT also demonstrates good generalization ability for unseen categories and robustness for dealing with noisy data.
Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image to semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and demonstrates a faster learning curve than them. We also present a detailed analysis for a variety of variant configurations, and validate the transferability of our modular architecture.
In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without moving data across machines. In this paper, we introduce a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting. This formulation allows us to systematically derive dual coordinate optimization procedures, which we refer to as Distributed Alternating Dual Maximization (DADM). The framework extends earlier studies described in (Boyd et al., 2011; Ma et al., 2015a; Jaggi et al., 2014; Yang, 2013) and has rigorous theoretical analyses. Moreover with the help of the new formulation, we develop the accelerated version of DADM (Acc-DADM) by generalizing the acceleration technique from (Shalev-Shwartz and Zhang, 2014) to the distributed setting. We also provide theoretical results for the proposed accelerated version and the new result improves previous ones (Yang, 2013; Ma et al., 2015a) whose runtimes grow linearly on the condition number. Our empirical studies validate our theory and show that our accelerated approach significantly improves the previous state-of-the-art distributed dual coordinate optimization algorithms.
Learning to remember long sequences remains a challenging task for recurrent neural networks. Register memory and attention mechanisms were both proposed to resolve the issue with either high computational cost to retain memory differentiability, or by discounting the RNN representation learning towards encoding shorter local contexts than encouraging long sequence encoding. Associative memory, which studies the compression of multiple patterns in a fixed size memory, were rarely considered in recent years. Although some recent work tries to introduce associative memory in RNN and mimic the energy decay process in Hopfield nets, it inherits the shortcoming of rule-based memory updates, and the memory capacity is limited. This paper proposes a method to learn the memory update rule jointly with task objective to improve memory capacity for remembering long sequences. Also, we propose an architecture that uses multiple such associative memory for more complex input encoding. We observed some interesting facts when compared to other RNN architectures on some well-studied sequence learning tasks.
Point matching refers to the process of finding spatial transformation and correspondences between two sets of points. In this paper, we focus on the case that there is only partial overlap between two point sets. Following the approach of the robust point matching method, we model point matching as a mixed linear assignment-least square problem and show that after eliminating the transformation variable, the resulting problem of minimization with respect to point correspondence is a concave optimization problem. Furthermore, this problem has the property that the objective function can be converted into a form with few nonlinear terms via a linear transformation. Based on these properties, we employ the branch-and-bound (BnB) algorithm to optimize the resulting problem where the dimension of the search space is small. To further improve efficiency of the BnB algorithm where computation of the lower bound is the bottleneck, we propose a new lower bounding scheme which has a k-cardinality linear assignment formulation and can be efficiently solved. Experimental results show that the proposed algorithm outperforms state-of-the-art methods in terms of robustness to disturbances and point matching accuracy.
A metonym is a word with a figurative meaning, similar to a metaphor. Because metonyms are closely related to metaphors, we apply features that are used successfully for metaphor recognition to the task of detecting metonyms. On the ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system achieved 86.45% accuracy on the location metonyms. Our code can be found on GitHub.
Restricted Boltzman Machines (RBMs) have been successfully used in recommender systems. However, as with most of other collaborative filtering techniques, it cannot solve cold start problems for there is no rating for a new item. In this paper, we first apply conditional RBM (CRBM) which could take extra information into account and show that CRBM could solve cold start problem very well, especially for rating prediction task. CRBM naturally combine the content and collaborative data under a single framework which could be fitted effectively. Experiments show that CRBM can be compared favourably with matrix factorization models, while hidden features learned from the former models are more easy to be interpreted.
Very short-term convective storm forecasting, termed nowcasting, has long been an important issue and has attracted substantial interest. Existing nowcasting methods rely principally on radar images and are limited in terms of nowcasting storm initiation and growth. Real-time re-analysis of meteorological data supplied by numerical models provides valuable information about three-dimensional (3D), atmospheric, boundary layer thermal dynamics, such as temperature and wind. To mine such data, we here develop a convolution-recurrent, hybrid deep-learning method with the following characteristics: (1) the use of cell-based oversampling to increase the number of training samples; this mitigates the class imbalance issue; (2) the use of both raw 3D radar data and 3D meteorological data re-analyzed via multi-source 3D convolution without any need for handcraft feature engineering; and (3) the stacking of convolutional neural networks on a long short-term memory encoder/decoder that learns the spatiotemporal patterns of convective processes. Experimental results demonstrated that our method performs better than other extrapolation methods. Qualitative analysis yielded encouraging nowcasting results.
Accurate and robust detection of multi-class objects in optical remote sensing images is essential to many real-world applications such as urban planning, traffic control, searching and rescuing, etc. However, state-of-the-art object detection techniques designed for images captured using ground-level sensors usually experience a sharp performance drop when directly applied to remote sensing images, largely due to the object appearance differences in remote sensing images in term of sparse texture, low contrast, arbitrary orientations, large scale variations, etc. This paper presents a novel object detection network (CAD-Net) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images. The proposed CAD-Net learns global and local contexts of objects by capturing their correlations with the global scene (at scene-level) and the local neighboring objects or features (at object-level), respectively. In addition, it designs a spatial-and-scale-aware attention module that guides the network to focus on more informative regions and features as well as more appropriate feature scales. Experiments over two publicly available object detection datasets for remote sensing images demonstrate that the proposed CAD-Net achieves superior detection performance. The implementation codes will be made publicly available for facilitating future researches.
Visual cognition of the indoor environment can benefit from the spatial layout estimation, which is to represent an indoor scene with a 2D box on a monocular image. In this paper, we propose to fully exploit the edge and semantic information of a room image for layout estimation. More specifically, we present an encoder-decoder network with shared encoder and two separate decoders, which are composed of multiple deconvolution (transposed convolution) layers, to jointly learn the edge maps and semantic labels of a room image. We combine these two network predictions in a scoring function to evaluate the quality of the layouts, which are generated by ray sampling and from a predefined layout pool. Guided by the scoring function, we apply a novel refinement strategy to further optimize the layout hypotheses. Experimental results show that the proposed network can yield accurate estimates of edge maps and semantic labels. By fully utilizing the two different types of labels, the proposed method achieves state-of-the-art layout estimation performance on benchmark datasets.
Self-navigation, referring to automatically reaching the goal while avoiding collision with obstacles, is a fundamental skill of mobile robots. Currently, Deep Reinforcement Learning (DRL) can enable the robot to navigate in a more complex environment with less computation power compared to conventional methods. However, it is time-consuming and hard to train the robot to learn goal-reaching and obstacle-avoidance skills simultaneously using DRL-based algorithms. In this paper, two Dueling Deep Q Networks (DQN) named Goal Network and Avoidance Network are used to learn the goal-reaching and obstacle-avoidance skills individually. A novel method named danger-aware advantage composition is proposed to fuse the two networks together without any redesigning and retraining. The composed Navigation Network can enable the robot to reach the goal right behind the wall and to navigate in unknown complexed environment safely and quickly.
This paper proposes a pedestrian detection and re-identification (re-id) integration net (I-Net) in an end-to-end learning framework. The I-Net is used in real-world video surveillance scenarios, where the target person needs to be searched in the whole scene videos, while the annotations of pedestrian bounding boxes are unavailable. By comparing to the OIM which is a work for joint detection and re-id, we have three distinct contributions. First, we introduce a Siamese architecture of I-Net instead of 1 stream, such that a verification task can be implemented. Second, we propose a novel on-line pairing loss (OLP) and hard example priority softmax loss (HEP), such that only the hard negatives are posed much attention in loss computation. Third, an on-line dictionary for negative samples storage is designed in I-Net without recording the positive samples. We show our result on person search datasets, the gap between detection and re-identification is narrowed. The superior performance can be achieved.
The research of personalized recommendation techniques today has mostly parted into two mainstream directions, i.e., the factorization-based approaches and topic models. Practically, they aim to benefit from the numerical ratings and textual reviews, correspondingly, which compose two major information sources in various real-world systems. However, although the two approaches are supposed to be correlated for their same goal of accurate recommendation, there still lacks a clear theoretical understanding of how their objective functions can be mathematically bridged to leverage the numerical ratings and textual reviews collectively, and why such a bridge is intuitively reasonable to match up their learning procedures for the rating prediction and top-N recommendation tasks, respectively. In this work, we exposit with mathematical analysis that, the vector-level randomization functions to coordinate the optimization objectives of factorizational and topic models unfortunately do not exist at all, although they are usually pre-assumed and intuitively designed in the literature. Fortunately, we also point out that one can avoid the seeking of such a randomization function by optimizing a Joint Factorizational Topic (JFT) model directly. We apply our JFT model to restaurant recommendation, and study its performance in both normal and cross-city recommendation scenarios, where the latter is an extremely difficult task for its inherent cold-start nature. Experimental results on real-world datasets verified the appealing performance of our approach against previous methods, on both rating prediction and top-N recommendation tasks.
We propose a new algorithm called PLUTO for building logistic regression trees to binary response data. PLUTO can capture the nonlinear and interaction patterns in messy data by recursively partitioning the sample space. It fits a simple or a multiple linear logistic regression model in each partition. PLUTO employs the cyclical coordinate descent method for estimation of multiple linear logistic regression models with elastic net penalties, which allows it to deal with high-dimensional data efficiently. The tree structure comprises a graphical description of the data. Together with the logistic regression models, it provides an accurate classifier as well as a piecewise smooth estimate of the probability of "success". PLUTO controls selection bias by: (1) separating split variable selection from split point selection; (2) applying an adjusted chi-squared test to find the split variable instead of exhaustive search. A bootstrap calibration technique is employed to further correct selection bias. Comparison on real datasets shows that on average, the multiple linear PLUTO models predict more accurately than other algorithms.
Although Bayesian Optimization (BO) has been employed for accelerating materials design in computational materials engineering, existing works are restricted to problems with quantitative variables. However, real designs of materials systems involve both qualitative and quantitative design variables representing material compositions, microstructure morphology, and processing conditions. For mixed-variable problems, existing Bayesian Optimization (BO) approaches represent qualitative factors by dummy variables first and then fit a standard Gaussian process (GP) model with numerical variables as the surrogate model. This approach is restrictive theoretically and fails to capture complex correlations between qualitative levels. We present in this paper the integration of a novel latent-variable (LV) approach for mixed-variable GP modeling with the BO framework for materials design. LVGP is a fundamentally different approach that maps qualitative design variables to underlying numerical LV in GP, which has strong physical justification. It provides flexible parameterization and representation of qualitative factors and shows superior modeling accuracy compared to the existing methods. We demonstrate our approach through testing with numerical examples and materials design examples. It is found that in all test examples the mapped LVs provide intuitive visualization and substantial insight into the nature and effects of the qualitative factors. Though materials designs are used as examples, the method presented is generic and can be utilized for other mixed variable design optimization problems that involve expensive physics-based simulations.
Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g., system API calls) for malware detection. This technique has been proven to be effective against various code obfuscation techniques and newly released ("zero-day") malware. However, existing works typically only consider the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel and low-cost feature extraction approach, and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes a feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated-CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through LSTM (long-short term memory networks) to learn the sequential correlation among API calls. Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from ablation study.
Dependency trees convey rich structural information that is proven useful for extracting relations among entities in text. However, how to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenging research question. Existing approaches employing rule based hard-pruning strategies for selecting relevant partial dependency structures may not always yield optimal results. In this work, we propose Attention Guided Graph Convolutional Networks (AGGCNs), a novel model which directly takes full dependency trees as inputs. Our model can be understood as a soft-pruning approach that automatically learns how to selectively attend to the relevant sub-structures useful for the relation extraction task. Extensive results on various tasks including cross-sentence n-ary relation extraction and large-scale sentence-level relation extraction show that our model is able to better leverage the structural information of the full dependency trees, giving significantly better results than previous approaches.
In the field of aircraft detection, tremendous progress has been gained movitated by the development of deep convolution neural networks(DCNNs). At present, most state-of-art models based on DCNNs belong to top-down approaches which take a wide use of anchor mechanism. The obtaining of high accuracy in them relys on the enumeration of massive potentional locations of objects with the form of rectangular bounding box, which is wasteful and less elaborate. In this paper, we present a novel aircraft detection model in a bottom-up manner, which formulated the task as detection of two intersecting line segments inside each target and grouping of them, thus we name it as X-LineNet. As the result of learning more delicate visual grammars information of aircraft, detection results with more concrete details and higher accuracy can be gained by X-LineNet. Just for these advantages, we designed a novel form of detetction results--pentagonal mask which has less redundancy and can better represent airplanes than that of rectangular box in remote sensing images.