Research papers and code for "Hong Zhang":
We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to the agents trained by the state-of-the-art deep reinforcement learning algorithm including DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yclin.me/adversarial_attack_RL/

* To Appear at IJCAI 2017. Project website: http://yclin.me/adversarial_attack_RL/
Click to Read Paper and Get Code
In this paper, a multi-state diagnosis and prognosis (MDP) framework is proposed for tool condition monitoring via a deep belief network based multi-state approach (DBNMS). For fault diagnosis, a cost-sensitive deep belief network (namely ECS-DBN) is applied to deal with the imbalanced data problem for tool state estimation. An appropriate prognostic degradation model is then applied for tool wear estimation based on the different tool states. The proposed framework has the advantage of automatic feature representation learning and shows better performance in accuracy and robustness. The effectiveness of the proposed DBNMS is validated using a real-world dataset obtained from the gun drilling process. This dataset contains a large amount of measured signals involving different tool geometries under various operating conditions. The DBNMS is examined for both the tool state estimation and tool wear estimation tasks. In the experimental studies, the prediction results are evaluated and compared with popular machine learning approaches, which show the superior performance of the proposed DBNMS approach.

* 14 pages, 12 figures, 10 tables, submitted to IEEE Transactions on Cybernetics
Click to Read Paper and Get Code
Although low-rank and sparse decomposition based methods have been successfully applied to the problem of moving object detection using structured sparsity-inducing norms, they are still vulnerable to significant illumination changes that arise in certain applications. We are interested in moving object detection in applications involving time-lapse image sequences for which current methods mistakenly group moving objects and illumination changes into foreground. Our method relies on the multilinear (tensor) data low-rank and sparse decomposition framework to address the weaknesses of existing methods. The key to our proposed method is to create first a set of prior maps that can characterize the changes in the image sequence due to illumination. We show that they can be detected by a k-support norm. To deal with concurrent, two types of changes, we employ two regularization terms, one for detecting moving objects and the other for accounting for illumination changes, in the tensor low-rank and sparse decomposition formulation. Through comprehensive experiments using challenging datasets, we show that our method demonstrates a remarkable ability to detect moving objects under discontinuous change in illumination, and outperforms the state-of-the-art solutions to this challenging problem.

* 14 pages, 14 figures, including supplementary materials
Click to Read Paper and Get Code
In this paper, we study an important yet less explored aspect in video detection and tracking -- stability. Surprisingly, there is no prior work that tried to study it. As a result, we start our work by proposing a novel evaluation metric for video detection which considers both stability and accuracy. For accuracy, we extend the existing accuracy metric mean Average Precision (mAP). For stability, we decompose it into three terms: fragment error, center position error, scale and ratio error. Each error represents one aspect of stability. Furthermore, we demonstrate that the stability metric has low correlation with accuracy metric. Thus, it indeed captures a different perspective of quality. Lastly, based on this metric, we evaluate several existing methods for video detection and show how they affect accuracy and stability. We believe our work can provide guidance and solid baselines for future researches in the related areas.

Click to Read Paper and Get Code
Extracting moving objects from a video sequence and estimating the background of each individual image are fundamental issues in many practical applications such as visual surveillance, intelligent vehicle navigation, and traffic monitoring. Recently, some methods have been proposed to detect moving objects in a video via low-rank approximation and sparse outliers where the background is modeled with the computed low-rank component of the video and the foreground objects are detected as the sparse outliers in the low-rank approximation. All of these existing methods work in a batch manner, preventing them from being applied in real time and long duration tasks. In this paper, we present an online sequential framework, namely contiguous outliers representation via online low-rank approximation (COROLA), to detect moving objects and learn the background model at the same time. We also show that our model can detect moving objects with a moving camera. Our experimental evaluation uses simulated data and real public datasets and demonstrates the superior performance of COROLA in terms of both accuracy and execution time.

* 37 pages, 10 figures
Click to Read Paper and Get Code
We propose a technique to improve the search efficiency of the bag-of-words (BoW) method for image retrieval. We introduce a notion of difficulty for the image matching problems and propose methods that reduce the amount of computations required for the feature vector-quantization task in BoW by exploiting the fact that easier queries need less computational resources. Measuring the difficulty of a query and stopping the search accordingly is formulated as a stopping problem. We introduce stopping rules that terminate the image search depending on the difficulty of each query, thereby significantly reducing the computational cost. Our experimental results show the effectiveness of our approach when it is applied to appearance-based localization problem.

Click to Read Paper and Get Code
Vector-quantization can be a computationally expensive step in visual bag-of-words (BoW) search when the vocabulary is large. A BoW-based appearance SLAM needs to tackle this problem for an efficient real-time operation. We propose an effective method to speed up the vector-quantization process in BoW-based visual SLAM. We employ a graph-based nearest neighbor search (GNNS) algorithm to this aim, and experimentally show that it can outperform the state-of-the-art. The graph-based search structure used in GNNS can efficiently be integrated into the BoW model and the SLAM framework. The graph-based index, which is a k-NN graph, is built over the vocabulary words and can be extracted from the BoW's vocabulary construction procedure, by adding one iteration to the k-means clustering, which adds small extra cost. Moreover, exploiting the fact that images acquired for appearance-based SLAM are sequential, GNNS search can be initiated judiciously which helps increase the speedup of the quantization process considerably.

Click to Read Paper and Get Code
Optical neural network (ONN) is emerging as an attractive proposal for machine-learning applications, enabling high-speed computation with low-energy consumption. However, there are several challenges in applying ONN for industrial applications, including the realization of activation functions and maintaining stability. In particular, the stability of ONNs decrease with the circuit depth, limiting the scalability of the ONNs for practical uses. Here we demonstrate how to compress the circuit depth of ONN to scale only logarithmically, leading to an exponential gain in terms of noise robustness. Our low-depth (LD) ONN is based on an architecture, called Optical CompuTing Of dot-Product UnitS (OCTOPUS), which can also be applied individually as a linear perceptron for solving classification problems. Using the standard data set of Letter Recognition, we present numerical evidence showing that LD-ONN can exhibit a significant gain in noise robustness, compared with a previous ONN proposal based on singular-value decomposition [Nature Photonics 11, 441 (2017)].

* 16 pages, 6 figures
Click to Read Paper and Get Code
Convolutional neural networks (CNNs) have emerged as the state-of-the-art in multiple vision tasks including depth estimation. However, memory and computing power requirements remain as challenges to be tackled in these models. Monocular depth estimation has significant use in robotics and virtual reality that requires deployment on low-end devices. Training a small model from scratch results in a significant drop in accuracy and it does not benefit from pre-trained large models. Motivated by the literature of model pruning, we propose a lightweight monocular depth model obtained from a large trained model. This is achieved by removing the least important features with a novel joint end-to-end filter pruning. We propose to learn a binary mask for each filter to decide whether to drop the filter or not. These masks are trained jointly to exploit relations between filters at different layers as well as redundancy within the same layer. We show that we can achieve around 5x compression rate with small drop in accuracy on the KITTI driving dataset. We also show that masking can improve accuracy over the baseline with fewer parameters, even without enforcing compression loss.

Click to Read Paper and Get Code
Adversary scenarios in driving, where the other vehicles may make mistakes or have a competing or malicious intent, have to be studied not only for our safety but also for addressing the concerns from public in order to push the technology forward. Classical planning solutions for adversary driving do not exist so far, especially when the vehicles do not communicate their intent. Given recent success in solving hard problems in artificial intelligence (AI), it is worth studying the potential of reinforcement learning for safety driving in adversary settings. In most recent reinforcement learning applications, there is a deep neural networks that maps an input state to an optimal policy over primitive actions. However, learning a policy over primitive actions is very difficult and inefficient. On the other hand, the knowledge already learned in classical planning methods should be inherited and reused. In order to take advantage of reinforcement learning good at exploring the action space for safety and classical planning skill models good at handling most driving scenarios, we propose to learn a policy over an action space of primitive actions augmented with classical planning methods. We show two advantages by doing so. First, training this reinforcement learning agent is easier and faster than training the primitive-action agent. Second, our new agent outperforms the primitive-action reinforcement learning agent, human testers as well as the classical planning methods that our agent queries as skills.

Click to Read Paper and Get Code
In recent years, various shadow detection methods from a single image have been proposed and used in vision systems; however, most of them are not appropriate for the robotic applications due to the expensive time complexity. This paper introduces a fast shadow detection method using a deep learning framework, with a time cost that is appropriate for robotic applications. In our solution, we first obtain a shadow prior map with the help of multi-class support vector machine using statistical features. Then, we use a semantic- aware patch-level Convolutional Neural Network that efficiently trains on shadow examples by combining the original image and the shadow prior map. Experiments on benchmark datasets demonstrate the proposed method significantly decreases the time complexity of shadow detection, by one or two orders of magnitude compared with state-of-the-art methods, without losing accuracy.

* 6 pages, 5 figures, Submitted to IROS 2018
Click to Read Paper and Get Code
In this paper, we aim at learning simultaneously a discriminative dictionary and a robust projection matrix from noisy data. The joint learning, makes the learned projection and dictionary a better fit for each other, so a more accurate classification can be obtained. However, current prevailing joint dimensionality reduction and dictionary learning methods, would fail when the training samples are noisy or heavily corrupted. To address this issue, we propose a joint projection and dictionary learning using low-rank regularization and graph constraints (JPDL-LR). Specifically, the discrimination of the dictionary is achieved by imposing Fisher criterion on the coding coefficients. In addition, our method explicitly encodes the local structure of data by incorporating a graph regularization term, that further improves the discriminative ability of the projection matrix. Inspired by recent advances of low-rank representation for removing outliers and noise, we enforce a low-rank constraint on sub-dictionaries of all classes to make them more compact and robust to noise. Experimental results on several benchmark datasets verify the effectiveness and robustness of our method for both dimensionality reduction and image classification, especially when the data contains considerable noise or variations.

Click to Read Paper and Get Code
For an object classification system, the most critical obstacles towards real-world applications are often caused by large intra-class variability, arising from different lightings, occlusion and corruption, in limited sample sets. Most methods in the literature would fail when the training samples are heavily occluded, corrupted or have significant illumination or viewpoint variations. Besides, most of the existing methods and especially deep learning-based methods, need large training sets to achieve a satisfactory recognition performance. Although using the pre-trained network on a generic large-scale dataset and fine-tune it to the small-sized target dataset is a widely used technique, this would not help when the content of base and target datasets are very different. To address these issues, we propose a joint projection and low-rank dictionary learning method using dual graph constraints (JP-LRDL). The proposed joint learning method would enable us to learn the features on top of which dictionaries can be better learned, from the data with large intra-class variability. Specifically, a structured class-specific dictionary is learned and the discrimination is further improved by imposing a graph constraint on the coding coefficients, that maximizes the intra-class compactness and inter-class separability. We also enforce low-rank and structural incoherence constraints on sub-dictionaries to make them more compact and robust to variations and outliers and reduce the redundancy among them, respectively. To preserve the intrinsic structure of data and penalize unfavourable relationship among training samples simultaneously, we introduce a projection graph into the framework, which significantly enhances the discriminative ability of the projection matrix and makes the method robust to small-sized and high-dimensional datasets.

* arXiv admin note: text overlap with arXiv:1603.07697; text overlap with arXiv:1404.3606 by other authors
Click to Read Paper and Get Code
The $k$-means clustering algorithm is popular but has the following main drawbacks: 1) the number of clusters, $k$, needs to be provided by the user in advance, 2) it can easily reach local minima with randomly selected initial centers, 3) it is sensitive to outliers, and 4) it can only deal with well separated hyperspherical clusters. In this paper, we propose a Local Density Peaks Searching (LDPS) initialization framework to address these issues. The LDPS framework includes two basic components: one of them is the local density that characterizes the density distribution of a data set, and the other is the local distinctiveness index (LDI) which we introduce to characterize how distinctive a data point is compared with its neighbors. Based on these two components, we search for the local density peaks which are characterized with high local densities and high LDIs to deal with 1) and 2). Moreover, we detect outliers characterized with low local densities but high LDIs, and exclude them out before clustering begins. Finally, we apply the LDPS initialization framework to $k$-medoids, which is a variant of $k$-means and chooses data samples as centers, with diverse similarity measures other than the Euclidean distance to fix the last drawback of $k$-means. Combining the LDPS initialization framework with $k$-means and $k$-medoids, we obtain two novel clustering methods called LDPS-means and LDPS-medoids, respectively. Experiments on synthetic data sets verify the effectiveness of the proposed methods, especially when the ground truth of the cluster number $k$ is large. Further, experiments on several real world data sets, Handwritten Pendigits, Coil-20, Coil-100 and Olivetti Face Database, illustrate that our methods give a superior performance than the analogous approaches on both estimating $k$ and unsupervised object categorization.

* 16 pages, 9 figures, journal paper
Click to Read Paper and Get Code
Deep convolutional neural networks (CNN) have recently been shown in many computer vision and pattern recog- nition applications to outperform by a significant margin state- of-the-art solutions that use traditional hand-crafted features. However, this impressive performance is yet to be fully exploited in robotics. In this paper, we focus one specific problem that can benefit from the recent development of the CNN technology, i.e., we focus on using a pre-trained CNN model as a method of generating an image representation appropriate for visual loop closure detection in SLAM (simultaneous localization and mapping). We perform a comprehensive evaluation of the outputs at the intermediate layers of a CNN as image descriptors, in comparison with state-of-the-art image descriptors, in terms of their ability to match images for detecting loop closures. The main conclusions of our study include: (a) CNN-based image representations perform comparably to state-of-the-art hand- crafted competitors in environments without significant lighting change, (b) they outperform state-of-the-art competitors when lighting changes significantly, and (c) they are also significantly faster to extract than the state-of-the-art hand-crafted features even on a conventional CPU and are two orders of magnitude faster on an entry-level GPU.

* 8 pages, 4 figures
Click to Read Paper and Get Code
Isometric feature mapping (Isomap) is a promising manifold learning method. However, Isomap fails to work on data which distribute on clusters in a single manifold or manifolds. Many works have been done on extending Isomap to multi-manifolds learning. In this paper, we first proposed a new multi-manifolds learning algorithm (M-Isomap) with help of a general procedure. The new algorithm preserves intra-manifold geodesics and multiple inter-manifolds edges precisely. Compared with previous methods, this algorithm can isometrically learn data distributed on several manifolds. Secondly, the original multi-cluster manifold learning algorithm first proposed in \cite{DCIsomap} and called D-C Isomap has been revised so that the revised D-C Isomap can learn multi-manifolds data. Finally, the features and effectiveness of the proposed multi-manifolds learning algorithms are demonstrated and compared through experiments.

Click to Read Paper and Get Code
We consider a ranking and selection problem in the context of personalized decision making, where the best alternative is not universal but varies as a function of observable covariates. The goal of ranking and selection with covariates (R&S-C) is to use sampling to compute a decision rule that can specify the best alternative with certain statistical guarantee for each subsequent individual after observing his or her covariates. A linear model is proposed to capture the relationship between the mean performance of an alternative and the covariates. Under the indifference-zone formulation, we develop two-stage procedures for both homoscedastic and heteroscedastic sampling errors, respectively, and prove their statistical validity, which is defined in terms of probability of correct selection. We also generalize the well-known slippage configuration, and prove that the generalized slippage configuration is the least favorable configuration of our procedures. Extensive numerical experiments are conducted to investigate the performance of the proposed procedures. Finally, we demonstrate the usefulness of R&S-C via a case study of selecting the best treatment regimen in the prevention of esophageal cancer. We find that by leveraging disease-related personal information, R&S-C can improve substantially the expected quality-adjusted life years for some groups of patients through providing patient-specific treatment regimen.

Click to Read Paper and Get Code
A large number of experimental data shows that Support Vector Machine (SVM) algorithm has obvious advantages in text classification, handwriting recognition, image classification, bioinformatics, and some other fields. To some degree, the optimization of SVM depends on its kernel function and Slack variable, the determinant of which is its parameters $\delta$ and c in the classification function. That is to say,to optimize the SVM algorithm, the optimization of the two parameters play a huge role. Ant Colony Optimization (ACO) is optimization algorithm which simulate ants to find the optimal path.In the available literature, we mix the ACO algorithm and Parallel algorithm together to find a well parameters.

* 3 pages, 2 figures, 2 tables
Click to Read Paper and Get Code
Manifold learning is a hot research topic in the field of computer science and has many applications in the real world. A main drawback of manifold learning methods is, however, that there is no explicit mappings from the input data manifold to the output embedding. This prohibits the application of manifold learning methods in many practical problems such as classification and target detection. Previously, in order to provide explicit mappings for manifold learning methods, many methods have been proposed to get an approximate explicit representation mapping with the assumption that there exists a linear projection between the high-dimensional data samples and their low-dimensional embedding. However, this linearity assumption may be too restrictive. In this paper, an explicit nonlinear mapping is proposed for manifold learning, based on the assumption that there exists a polynomial mapping between the high-dimensional data samples and their low-dimensional representations. As far as we know, this is the first time that an explicit nonlinear mapping for manifold learning is given. In particular, we apply this to the method of Locally Linear Embedding (LLE) and derive an explicit nonlinear manifold learning algorithm, named Neighborhood Preserving Polynomial Embedding (NPPE). Experimental results on both synthetic and real-world data show that the proposed mapping is much more effective in preserving the local neighborhood information and the nonlinear geometry of the high-dimensional data samples than previous work.

Click to Read Paper and Get Code