Models, code, and papers for "Qiang Hu":
The developments of deep neural networks (DNN) in recent years have ushered a brand new era of artificial intelligence. DNNs are proved to be excellent in solving very complex problems, e.g., visual recognition and text understanding, to the extent of competing with or even surpassing people. Despite inspiring and encouraging success of DNNs, thorough theoretical analyses still lack to unravel the mystery of their magics. The design of DNN structure is dominated by empirical results in terms of network depth, number of neurons and activations. A few of remarkable works published recently in an attempt to interpret DNNs have established the first glimpses of their internal mechanisms. Nevertheless, research on exploring how DNNs operate is still at the initial stage with plenty of room for refinement. In this paper, we extend precedent research on neural networks with piecewise linear activations (PLNN) concerning linear regions bounds. We present (i) the exact maximal number of linear regions for single layer PLNNs; (ii) a upper bound for multi-layer PLNNs; and (iii) a tighter upper bound for the maximal number of liner regions on rectifier networks. The derived bounds also indirectly explain why deep models are more powerful than shallow counterparts, and how non-linearity of activation functions impacts on expressiveness of networks.
Deep learning has been applied to camera relocalization, in particular, PoseNet and its extended work are the convolutional neural networks which regress the camera pose from a single image. However there are many problems, one of them is expensive parameter selection. In this paper, we directly explore the three Euler angles as the orientation representation in the camera pose regressor. There is no need to select the parameter, which is not tolerant in the previous works. Experimental results on the 7 Scenes datasets and the King's College dataset demonstrate that it has competitive performances.
Collaborative filtering (CF) is the key technique for recommender systems (RSs). CF exploits user-item behavior interactions (e.g., clicks) only and hence suffers from the data sparsity issue. One research thread is to integrate auxiliary information such as product reviews and news titles, leading to hybrid filtering methods. Another thread is to transfer knowledge from other source domains such as improving the movie recommendation with the knowledge from the book domain, leading to transfer learning methods. In real-world life, no single service can satisfy a user's all information needs. Thus it motivates us to exploit both auxiliary and source information for RSs in this paper. We propose a novel neural model to smoothly enable Transfer Meeting Hybrid (TMH) methods for cross-domain recommendation with unstructured text in an end-to-end manner. TMH attentively extracts useful content from unstructured text via a memory module and selectively transfers knowledge from a source domain via a transfer network. On two real-world datasets, TMH shows better performance in terms of three ranking metrics by comparing with various baselines. We conduct thorough analyses to understand how the text content and transferred knowledge help the proposed model.
The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively.
Inspired by the authorship controversy of Dream of the Red Chamber and the application of machine learning in the study of literary stylometry, we develop a rigorous new method for the mathematical analysis of authorship by testing for a so-called chrono-divide in writing styles. Our method incorporates some of the latest advances in the study of authorship attribution, particularly techniques from support vector machines. By introducing the notion of relative frequency as a feature ranking metric our method proves to be highly effective and robust. Applying our method to the Cheng-Gao version of Dream of the Red Chamber has led to convincing if not irrefutable evidence that the first $80$ chapters and the last $40$ chapters of the book were written by two different authors. Furthermore, our analysis has unexpectedly provided strong support to the hypothesis that Chapter 67 was not the work of Cao Xueqin either. We have also tested our method to the other three Great Classical Novels in Chinese. As expected no chrono-divides have been found. This provides further evidence of the robustness of our method.
In this paper we study the consistency of an empirical minimum error entropy (MEE) algorithm in a regression setting. We introduce two types of consistency. The error entropy consistency, which requires the error entropy of the learned function to approximate the minimum error entropy, is shown to be always true if the bandwidth parameter tends to 0 at an appropriate rate. The regression consistency, which requires the learned function to approximate the regression function, however, is a complicated issue. We prove that the error entropy consistency implies the regression consistency for homoskedastic models where the noise is independent of the input variable. But for heteroskedastic models, a counterexample is used to show that the two types of consistency do not coincide. A surprising result is that the regression consistency is always true, provided that the bandwidth parameter tends to infinity at an appropriate rate. Regression consistency of two classes of special models is shown to hold with fixed bandwidth parameter, which further illustrates the complexity of regression consistency of MEE. Fourier transform plays crucial roles in our analysis.
We consider the minimum error entropy (MEE) criterion and an empirical risk minimization learning algorithm in a regression setting. A learning theory approach is presented for this MEE algorithm and explicit error bounds are provided in terms of the approximation ability and capacity of the involved hypothesis space when the MEE scaling parameter is large. Novel asymptotic analysis is conducted for the generalization error associated with Renyi's entropy and a Parzen window function, to overcome technical difficulties arisen from the essential differences between the classical least squares problems and the MEE setting. A semi-norm and the involved symmetrized least squares error are introduced, which is related to some ranking algorithms.
Long-range dependencies modeling, widely used in capturing spatiotemporal correlation, has shown to be effective in CNN dominated computer vision tasks. Yet neither stacks of convolutional operations to enlarge receptive fields nor recent nonlocal modules is computationally efficient. In this paper, we present a generic family of lightweight global descriptors for modeling the interactions between positions across different dimensions (e.g., channels, frames). This descriptor enables subsequent convolutions to access the informative global features with negligible computational complexity and parameters. Benchmark experiments show that the proposed method can complete state-of-the-art long-range mechanisms with a significant reduction in extra computing cost. Code available at https://github.com/HolmesShuan/Compact-Global-Descriptor.
Discriminant Correlation Filters (DCF) based methods now become a kind of dominant approach to online object tracking. The features used in these methods, however, are either based on hand-crafted features like HoGs, or convolutional features trained independently from other tasks like image classification. In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously. Specifically, we treat DCF as a special correlation filter layer added in a Siamese network, and carefully derive the backpropagation through it by defining the network output as the probability heatmap of object location. Since the derivation is still carried out in Fourier frequency domain, the efficiency property of DCF is preserved. This enables our tracker to run at more than 60 FPS during test time, while achieving a significant accuracy gain compared with KCF using HoGs. Extensive evaluations on OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the proposed DCFNet tracker is competitive with several state-of-the-art trackers, while being more compact and much faster.
The outstanding pattern recognition performance of deep learning brings new vitality to the synthetic aperture radar (SAR) automatic target recognition (ATR). However, there is a limitation in current deep learning based ATR solution that each learning process only handle one SAR image, namely learning the static scattering information, while missing the space-varying information. It is obvious that multi-aspect joint recognition introduced space-varying scattering information should improve the classification accuracy and robustness. In this paper, a novel multi-aspect-aware method is proposed to achieve this idea through the bidirectional Long Short-Term Memory (LSTM) recurrent neural networks based space-varying scattering information learning. Specifically, we first select different aspect images to generate the multi-aspect space-varying image sequences. Then, the Gabor filter and three-patch local binary pattern (TPLBP) are progressively implemented to extract a comprehensive spatial features, followed by dimensionality reduction with the Multi-layer Perceptron (MLP) network. Finally, we design a bidirectional LSTM recurrent neural network to learn the multi-aspect features with further integrating the softmax classifier to achieve target recognition. Experimental results demonstrate that the proposed method can achieve 99.9% accuracy for 10-class recognition. Besides, its anti-noise and anti-confusion performance are also better than the conventional deep learning based methods.
Inter-process communication (IPC) is one of the core functions of modern robotics middleware. We propose an efficient IPC technique called TZC (Towards Zero-Copy). As a core component of TZC, we design a novel algorithm called partial serialization. Our formulation can generate messages that can be divided into two parts. During message transmission, one part is transmitted through a socket and the other part uses shared memory. The part within shared memory is never copied or serialized during its lifetime. We have integrated TZC with ROS and ROS2 and find that TZC can be easily combined with current open-source platforms. By using TZC, the overhead of IPC remains constant when the message size grows. In particular, when the message size is 4MB (less than the size of a full HD image), TZC can reduce the overhead of ROS IPC from tens of milliseconds to hundreds of microseconds and can reduce the overhead of ROS2 IPC from hundreds of milliseconds to less than 1 millisecond. We also demonstrate the benefits of TZC by integrating with TurtleBot2 that are used in autonomous driving scenarios. We show that by using TZC, the braking distance can be shortened by 16% than ROS.
In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on a single bounding box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 35 frames per second. Despite its simplicity, versatility and fast speed, our strategy allows us to establish a new state-of-the-art among real-time trackers on VOT-2018, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017. The project website is http://www.robots.ox.ac.uk/~qwang/SiamMask.
Recently, Siamese networks have drawn great attention in visual tracking community because of their balanced accuracy and speed. However, features used in most Siamese tracking approaches can only discriminate foreground from the non-semantic backgrounds. The semantic backgrounds are always considered as distractors, which hinders the robustness of Siamese trackers. In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. To this end, features used in traditional Siamese trackers are analyzed at first. We observe that the imbalanced distribution of training data makes the learned features less discriminative. During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors. During inference, a novel distractor-aware module is designed to perform incremental learning, which can effectively transfer the general embedding to the current video domain. In addition, we extend the proposed approach for long-term tracking by introducing a simple yet effective local-to-global search region strategy. Extensive experiments on benchmarks show that our approach significantly outperforms the state-of-the-arts, yielding 9.6% relative gain in VOT2016 dataset and 35.9% relative gain in UAV20L dataset. The proposed tracker can perform at 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks.
Modern robotic systems have become an alternative to humans to perform risky or exhausting tasks. In such application scenarios, communications between robots and the control center have become one of the major problems. Buffering is a commonly used solution to relieve temporary network disruption. But the assumption that newer messages are more valuable than older ones is not true for many application scenarios such as explorations, rescue operations, and surveillance. In this paper, we proposed a novel resilient buffer management policy named OptSample. It can uniformly sampling messages and dynamically adjust the sample rate based on run-time network situation. We define an evaluation function to estimate the profit of a message sequence. Based on the function, our analysis and simulation shows that the OptSample policy can effectively prevent losing long segment of continuous messages and improve the overall profit of the received messages. We implement the proposed policy in ROS. The implementation is transparent to user and no user code need to be changed. Experimental results on several application scenarios show that the OptSample policy can help robotic systems be more resilient against network disruption.
Deep learning (DL) has recently achieved tremendous success in a variety of cutting-edge applications, e.g., image recognition, speech and natural language processing, and autonomous driving. Besides the available big data and hardware evolution, DL frameworks and platforms play a key role to catalyze the research, development, and deployment of DL intelligent solutions. However, the difference in computation paradigm, architecture design and implementation of existing DL frameworks and platforms brings challenges for DL software development, deployment, maintenance, and migration. Up to the present, it still lacks a comprehensive study on how current diverse DL frameworks and platforms influence the DL software development process. In this paper, we initiate the first step towards the investigation on how existing state-of-the-art DL frameworks (i.e., TensorFlow, Theano, and Torch) and platforms (i.e., server/desktop, web, and mobile) support the DL software development activities. We perform an in-depth and comparative evaluation on metrics such as learning accuracy, DL model size, robustness, and performance, on state-of-the-art DL frameworks across platforms using two popular datasets MNIST and CIFAR-10. Our study reveals that existing DL frameworks still suffer from compatibility issues, which becomes even more severe when it comes to different platforms. We pinpoint the current challenges and opportunities towards developing high quality and compatible DL systems. To ignite further investigation along this direction to address urgent industrial demands of intelligent solutions, we make all of our assembled feasible toolchain and dataset publicly available.
Deep Learning (DL) has recently achieved tremendous success. A variety of DL frameworks and platforms play a key role to catalyze such progress. However, the differences in architecture designs and implementations of existing frameworks and platforms bring new challenges for DL software development and deployment. Till now, there is no study on how various mainstream frameworks and platforms influence both DL software development and deployment in practice. To fill this gap, we take the first step towards understanding how the most widely-used DL frameworks and platforms support the DL software development and deployment. We conduct a systematic study on these frameworks and platforms by using two types of DNN architectures and three popular datasets. (1) For development process, we investigate the prediction accuracy under the same runtime training configuration or same model weights/biases. We also study the adversarial robustness of trained models by leveraging the existing adversarial attack techniques. The experimental results show that the computing differences across frameworks could result in an obvious prediction accuracy decline, which should draw the attention of DL developers. (2) For deployment process, we investigate the prediction accuracy and performance (refers to time cost and memory consumption) when the trained models are migrated/quantized from PC to real mobile devices and web browsers. The DL platform study unveils that the migration and quantization still suffer from compatibility and reliability issues. Meanwhile, we find several DL software bugs by using the results as a benchmark. We further validate the results through bug confirmation from stakeholders and industrial positive feedback to highlight the implications of our study. Through our study, we summarize practical guidelines, identify challenges and pinpoint new research directions.