Research papers and code for "Xiaolin Huang":
One-bit measurements widely exist in the real world, and they can be used to recover sparse signals. This task is known as the problem of learning halfspaces in learning theory and one-bit compressive sensing (1bit-CS) in signal processing. In this paper, we propose novel algorithms based on both convex and nonconvex sparsity-inducing penalties for robust 1bit-CS. We provide a sufficient condition to verify whether a solution is globally optimal or not. Then we show that the globally optimal solution for positive homogeneous penalties can be obtained in two steps: a proximal operator and a normalization step. For several nonconvex penalties, including minimax concave penalty (MCP), $\ell_0$ norm, and sorted $\ell_1$ penalty, we provide fast algorithms for finding the analytical solutions by solving the dual problem. Specifically, our algorithm is more than $200$ times faster than the existing algorithm for MCP. Its efficiency is comparable to the algorithm for the $\ell_1$ penalty in time, while its performance is much better. Among these penalties, the sorted $\ell_1$ penalty is most robust to noise in different settings.
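
The two-step structure mentioned above (a proximal operator followed by a normalization) can be sketched for the simplest case. The sketch below assumes the $\ell_1$ penalty, whose proximal operator is soft thresholding, together with a linear loss over the unit ball; it does not reproduce the paper's algorithms for MCP, $\ell_0$, or sorted $\ell_1$. The function names and the parameter `lam` are hypothetical.

```python
import math

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1, applied entry-wise."""
    return [math.copysign(max(abs(t) - lam, 0.0), t) for t in v]

def one_bit_recover(A, y, lam):
    """Two-step estimate: proximal step on A^T y / m, then normalization.

    A is an m x n list of rows, y holds the +1/-1 measurements.
    Assumption: l1 penalty with a linear loss (illustrative only).
    """
    m, n = len(A), len(A[0])
    # Correlate the sign measurements with the sensing matrix: b = A^T y / m.
    b = [sum(A[i][j] * y[i] for i in range(m)) / m for j in range(n)]
    # Step 1: proximal operator (soft thresholding for the l1 penalty).
    x = soft_threshold(b, lam)
    # Step 2: normalize to the unit sphere, unless everything was zeroed out.
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x] if norm > 0 else x
```

With signs generated by a signal supported on the first coordinate, the small correlations on the other coordinates fall below `lam` and are set to zero, so the normalized estimate keeps only the first coordinate.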

* X. Huang and M. Yan, Non-convex penalties with analytical solutions for one-bit compressive sensing, Signal Processing, 144 (2018), 341-351
Traditionally, kernel learning methods require positive definiteness of the kernel, which is too strict and excludes many sophisticated similarities that are indefinite in the multimedia area. To utilize those indefinite kernels, indefinite learning methods are of great interest. This paper aims to extend logistic regression from positive semi-definite kernels to indefinite kernels. The model, called indefinite kernel logistic regression (IKLR), keeps consistency with the regular KLR in formulation but essentially becomes non-convex. Thanks to the positive decomposition of an indefinite matrix, IKLR can be transformed into a difference of two convex models, which allows the use of the concave-convex procedure. Moreover, we employ an inexact solving scheme to speed up the sub-problem and develop a concave-inexact-convex procedure (CCICP) algorithm with theoretical convergence analysis. Systematic experiments on multi-modal datasets demonstrate the superiority of the proposed IKLR method over kernel logistic regression with positive definite kernels and other state-of-the-art indefinite learning based algorithms.

* Note that this is not the camera-ready version
Previous works on PCB defect detection based on image difference and image processing techniques have already achieved promising performance. However, they sometimes fall short because of unaccounted defect patterns or over-sensitivity to some hyper-parameters. In this work, we design a deep model that accurately detects PCB defects from an input pair of a defect-free template and a defective tested image. A novel group pyramid pooling module is proposed to efficiently extract features over a large range of resolutions, which are merged by group to predict PCB defects of corresponding scales. To train the deep model, a dataset is established, namely DeepPCB, which contains 1,500 image pairs with annotations including positions of 6 common types of PCB defects. Experimental results validate the effectiveness and efficiency of the proposed model by achieving $98.6\%$ mAP @ 62 FPS on the DeepPCB dataset. This dataset is now available at: https://github.com/tangsanli5201/DeepPCB.

* 4 pages, 4 figures
Sign information is the key to overcoming the inevitable saturation error in compressive sensing systems, which causes information loss and results in bias. For sparse signal recovery from saturation, we propose to use a linear loss to improve on the effectiveness of existing methods that utilize hard constraints or a hinge loss for sign consistency. Due to the use of the linear loss, an analytical solution in the update process is obtained, and some nonconvex penalties are applicable, e.g., the minimax concave penalty, the $\ell_0$ norm, and the sorted $\ell_1$ norm. Theoretical analysis reveals that the estimation error can still be bounded. Generally, with the linear loss and nonconvex penalties, the recovery performance is significantly improved and the computational time is largely reduced, as verified by the numerical experiments.

One-shot semantic segmentation poses a challenging task of recognizing the object regions from unseen categories with only one annotated example as supervision. In this paper, we propose a simple yet effective Similarity Guidance network to tackle the One-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with reference to one densely labeled support image. To obtain a robust representative feature of the support image, we first propose a masked average pooling strategy for producing the guidance features using only the pixels belonging to the support image. We then leverage the cosine similarity to build the relationship between the guidance features and the features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adopted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework which can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves an mIoU score of 46.3%, which outperforms the state-of-the-art.
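
The two operations named above, masked average pooling and cosine similarity, can be illustrated on plain nested lists. This is a minimal sketch, assuming features are stored as H x W x C lists and that the support mask is a binary H x W grid; the function names are hypothetical and the network itself is not reproduced.

```python
import math

def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask is nonzero.

    features: H x W x C nested lists, mask: H x W grid of 0/1 values.
    Returns a single C-dimensional guidance vector.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, keep in zip(feat_row, mask_row):
            if keep:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    # An empty mask yields the zero vector instead of dividing by zero.
    return [t / count for t in total] if count else total

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def similarity_map(query_features, guidance):
    """Per-pixel cosine similarity between query features and the guidance."""
    return [[cosine_similarity(vec, guidance) for vec in row]
            for row in query_features]
```

Query pixels whose features point in the same direction as the guidance vector receive a similarity close to 1, which is what lets the map act as a soft indicator of the object region.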

False positive and false negative rates are equally important for evaluating the performance of a classifier. Adversarial examples that increase the false negative rate have been studied in recent years. However, harming a classifier by increasing its false positive rate remains almost unexplored, since it is much more difficult to generate a new and meaningful positive than a negative. To generate false positives, a supervised generative framework is proposed in this paper. Experimental results show that our method is practical and effective at generating such adversarial examples on large-scale image datasets.

Traditional kernels or their combinations are often not sufficiently flexible to fit the data in complicated practical tasks. In this paper, we present a Data-Adaptive Nonparametric Kernel (DANK) learning framework by imposing an adaptive matrix on the kernel/Gram matrix in an entry-wise strategy. Since we do not specify the formulation of the adaptive matrix, each entry in it can be directly and flexibly learned from the data. Therefore, the solution space of the learned kernel is largely expanded, which makes DANK flexible to adapt to the data. Specifically, the proposed kernel learning framework can be seamlessly embedded to support vector machines (SVM) and support vector regression (SVR), which has the capability of enlarging the margin between classes and reducing the model generalization error. Theoretically, we demonstrate that the objective function of our devised model is gradient-Lipschitz continuous. Thereby, the training process for kernel and parameter learning in SVM/SVR can be efficiently optimized in a unified framework. Further, to address the scalability issue in DANK, a decomposition-based scalable approach is developed, of which the effectiveness is demonstrated by both empirical studies and theoretical guarantees. Experimentally, our method outperforms other representative kernel learning based algorithms on various classification and regression benchmark datasets.

The one-bit quantization is implemented by one single comparator that operates at low power and a high rate. Hence one-bit compressive sensing (1bit-CS) becomes attractive in signal processing. When measurements are corrupted by noise during signal acquisition and transmission, 1bit-CS is usually modeled as minimizing a loss function with a sparsity constraint. The one-sided $\ell_1$ loss and the linear loss are two popular loss functions for 1bit-CS. To improve the decoding performance on noisy data, we consider the pinball loss, which provides a bridge between the one-sided $\ell_1$ loss and the linear loss. Using the pinball loss, two convex models, an elastic-net pinball model and its modification with the $\ell_1$-norm constraint, are proposed. To efficiently solve them, the corresponding dual coordinate ascent algorithms are designed and their convergence is proved. The numerical experiments confirm the effectiveness of the proposed algorithms and the performance of the pinball loss minimization for 1bit-CS.
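
How the pinball loss bridges the two other losses can be seen from a direct implementation. The parameterization below is an assumption chosen so that the endpoints reproduce the two losses named above: with argument `t = -y * <a, x>` and `tau` in [-1, 0], `tau = 0` gives the one-sided $\ell_1$ loss and `tau = -1` gives the linear loss. The paper's own convention may differ, and the function names are hypothetical.

```python
def pinball_loss(t, tau):
    """Pinball loss: t for t >= 0, and -tau * t for t < 0.

    Assumed range: -1 <= tau <= 0.
    tau = 0  -> max(0, t), the one-sided l1 loss.
    tau = -1 -> t, the linear loss.
    """
    return t if t >= 0 else -tau * t

def measurement_loss(a, y, x, tau):
    """Loss of one sign measurement y in {+1, -1} for the estimate x."""
    margin = y * sum(ai * xi for ai, xi in zip(a, x))
    # A sign-consistent measurement has a positive margin, so t is negative.
    return pinball_loss(-margin, tau)
```

Intermediate values of `tau` reward sign-consistent measurements less strongly than the linear loss does, while still not ignoring them as the one-sided loss does.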

* 11 pages
Weakly supervised methods usually generate localization results based on attention maps produced by classification networks. However, the attention maps exhibit only the most discriminative parts of the object, which are small and sparse. We propose to generate Self-produced Guidance (SPG) masks which separate the foreground, the object of interest, from the background to provide the classification networks with spatial correlation information of pixels. A stagewise approach is proposed to incorporate high-confidence object regions to learn the SPG masks. The high-confidence regions within attention maps are utilized to progressively learn the SPG masks. The masks are then used as auxiliary pixel-level supervision to facilitate the training of classification networks. Extensive experiments on ILSVRC demonstrate that SPG is effective in producing high-quality object localization maps. In particular, the proposed SPG achieves a Top-1 localization error rate of 43.83% on the ILSVRC validation set, which is a new state-of-the-art error rate.

* ECCV 2018
In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallel-classifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such an adversarial learning, the two parallel-classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly two-fold: 1) it can be trained in an end-to-end manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top-1 localization error rate on the ILSVRC dataset is 45.14%, which is the new state-of-the-art.
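
The erasing step described above can be sketched on a single-channel map. This is a toy illustration, assuming the first classifier's localization map is thresholded and the corresponding feature entries are set to zero before the second classifier sees them; the function name and the `threshold` parameter are hypothetical, and the actual network operates on multi-channel feature maps.

```python
def erase_discovered_regions(feature_map, localization_map, threshold):
    """Zero out feature entries where the first branch responds strongly.

    feature_map and localization_map are H x W nested lists of floats.
    Entries with localization value above threshold are erased so that a
    second classifier has to rely on the remaining regions.
    """
    return [
        [0.0 if score > threshold else feat
         for feat, score in zip(feat_row, score_row)]
        for feat_row, score_row in zip(feature_map, localization_map)
    ]
```

Because the erased positions depend on the first branch's current response, the mask changes from one forward pass to the next, which corresponds to the dynamic erasing the abstract refers to.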

* CVPR 2018 Accepted
Robust PCA methods are typically batch algorithms which require loading all observations into memory before processing. This makes them inefficient for processing big data. In this paper, we develop an efficient online robust principal component method, namely online moving window robust principal component analysis (OMWRPCA). Unlike existing algorithms, OMWRPCA can successfully track not only slowly changing subspaces but also abruptly changed subspaces. By embedding hypothesis testing into the algorithm, OMWRPCA can detect change points of the underlying subspaces. Extensive simulation studies demonstrate the superior performance of OMWRPCA compared with other state-of-the-art approaches. We also apply the algorithm to real-time background subtraction of surveillance video.

Robustness of deep learning methods for limited angle tomography is challenged by two major factors: a) due to insufficient training data the network may not generalize well to unseen data; b) deep learning methods are sensitive to noise. Thus, generating reconstructed images directly from a neural network appears inadequate. We propose to constrain the reconstructed images to be consistent with the measured projection data, while the unmeasured information is complemented by learning based methods. For this purpose, a data consistent artifact reduction (DCAR) method is introduced: First, a prior image is generated from an initial limited angle reconstruction via deep learning as a substitute for missing information. Afterwards, a conventional iterative reconstruction algorithm is applied, integrating the data consistency in the measured angular range and the prior information in the missing angular range. This ensures data integrity in the measured area, while inaccuracies incorporated by the deep learning prior lie only in areas where no information is acquired. The proposed DCAR method achieves significant image quality improvement: for 120-degree cone-beam limited angle tomography more than 10% RMSE reduction in noise-free case and more than 24% RMSE reduction in noisy case compared with a state-of-the-art U-Net based method.

* Accepted by MICCAI MLMIR workshop
In this paper, we propose a novel matching based tracker by investigating the relationship between template matching and the recently popular correlation filter based trackers (CFTs). Compared to the correlation operation in CFTs, a sophisticated similarity metric termed "mutual buddies similarity" (MBS) is proposed to exploit the relationship of multiple reciprocal nearest neighbors for target matching. By doing so, our tracker obtains a powerful discriminative ability to distinguish target from background, as demonstrated by both empirical and theoretical analyses. Besides, instead of utilizing a single template with the improper updating scheme in CFTs, we design a novel online template updating strategy named "memory filtering" (MF), which aims to select a certain amount of representative and reliable tracking results in history to construct the current stable and expressive template set. This scheme helps the proposed tracker to comprehensively "understand" the target appearance variations and to "recall" some stable results. Both qualitative and quantitative evaluations on two benchmarks suggest that the proposed tracking method performs favorably against some recently developed CFTs and other competitive trackers.

* Published in IEEE TIP
Hyper-kernels endowed by hyper-Reproducing Kernel Hilbert Space (hyper-RKHS) formulate the kernel learning task as learning on the space of kernels itself, which provides significant model flexibility for kernel learning with outstanding performance in real-world applications. However, the convergence behavior of these learning algorithms in hyper-RKHS has not been investigated in learning theory. In this paper, we conduct approximation analysis of kernel ridge regression (KRR) and support vector regression (SVR) in this space. To the best of our knowledge, this is the first work to study the approximation performance of regression in hyper-RKHS. For applications, we propose a general kernel learning framework built on the two introduced regression models to deal with the out-of-sample extension problem, i.e., to learn an underlying general kernel from the pre-given kernel/similarity matrix in hyper-RKHS. Experimental results on several benchmark datasets suggest that our methods are able to learn a general kernel function from an arbitrary given kernel matrix.

* 20 pages, 2 figures
This paper addresses streak reduction in limited angle tomography. Although the iterative reweighted total variation (wTV) algorithm reduces small streaks well, it is rather inept at eliminating large ones since total variation (TV) regularization is scale-dependent and may regard these streaks as homogeneous areas. Hence, the main purpose of this paper is to reduce streak artifacts at various scales. We propose the scale-space anisotropic total variation (ssaTV) algorithm in two different implementations. The first implementation (ssaTV-1) utilizes an anisotropic gradient-like operator which uses 2s neighboring pixels along the streaks' normal direction at each scale s. The second implementation (ssaTV-2) makes use of anisotropic down-sampling and up-sampling operations, similarly oriented along the streaks' normal direction, to apply TV regularization at various scales. Experiments on numerical and clinical data demonstrate that both ssaTV algorithms reduce streak artifacts more effectively and efficiently than wTV, particularly when using multiple scales.

* 8 pages, 12 figures (48 subfigures in total)
Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can be different or even contradictory among different tasks. Nevertheless, the inherent smoothing nature of one smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, a non-convex non-smooth optimization framework is proposed to achieve diverse smoothing natures where even contradictory smoothing behaviors can be achieved. To this end, we first introduce the truncated Huber penalty function, which has seldom been used in image smoothing. A robust framework is then proposed. When combined with the strong flexibility of the truncated Huber penalty function, our framework is capable of a range of applications and can outperform the state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. The effectiveness and superior performance of our approach are validated through comprehensive experimental results in a range of applications.
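
For readers unfamiliar with the penalty named above, one common form of a truncated Huber function is sketched below. The exact definition and constants used in the paper are not given in the abstract, so the parameters `a` (where the quadratic part ends) and `b` (where truncation begins) are assumptions, with 0 < a <= b.

```python
def huber(x, a):
    """Huber function: quadratic for |x| < a, linear beyond (a > 0)."""
    ax = abs(x)
    return x * x / (2.0 * a) if ax < a else ax - a / 2.0

def truncated_huber(x, a, b):
    """Huber function capped at its value at |x| = b.

    Assumed form (0 < a <= b): equals huber(x, a) for |x| <= b and the
    constant b - a / 2 for |x| > b, so large differences stop being
    penalized further.
    """
    return huber(x, a) if abs(x) <= b else b - a / 2.0
```

The cap at `b` makes the penalty non-convex, and the function is continuous at `|x| = b` because `huber(b, a)` equals `b - a / 2` whenever `a <= b`.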

Deep convolutional neural networks (CNNs) have made impressive progress in many video recognition tasks such as video pose estimation and video object detection. However, CNN inference on video is computationally expensive due to processing dense frames individually. In this work, we propose a framework called Recurrent Residual Module (RRM) to accelerate CNN inference for video recognition tasks. This framework has a novel design that uses the similarity of the intermediate feature maps of two consecutive frames to largely reduce the redundant computation. One unique property of the proposed method compared to previous work is that the feature maps of each frame are precisely computed. The experiments show that, while maintaining similar recognition performance, our RRM yields on average 2x acceleration on commonly used CNNs such as AlexNet, ResNet, and the deep compression model (thus 8-12x faster than the original dense models using the efficient inference engine), and an impressive 9x acceleration on some binary networks such as XNOR-Nets (thus 500x faster than the original model). We further verify the effectiveness of the RRM in speeding up CNNs for video pose estimation and video object detection.
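
Why reusing the previous frame can still give exact feature maps can be illustrated on a single linear layer. Linear operators, convolution included, satisfy W x_t = W x_{t-1} + W (x_t - x_{t-1}), and the difference between consecutive frames is often sparse. The sketch below is a toy example under that assumption, not the RRM architecture: nonlinear layers are omitted and the function names are hypothetical.

```python
def linear_layer(weights, x):
    """Dense matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def incremental_linear_layer(weights, prev_output, prev_input, new_input):
    """Update prev_output using only the entries of the input that changed.

    Relies on linearity: W x_new = W x_prev + W (x_new - x_prev).
    Unchanged positions contribute nothing and are skipped entirely.
    """
    out = list(prev_output)
    for j, (old, new) in enumerate(zip(prev_input, new_input)):
        delta = new - old
        if delta != 0.0:
            for i, row in enumerate(weights):
                out[i] += row[j] * delta
    return out
```

When only a few input entries change between frames, the incremental update touches only the matching weight columns, yet its result agrees with a full recomputation up to floating-point rounding.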

* To appear in CVPR 2018
When a measurement falls outside the quantization or measurable range, it becomes saturated and cannot be used in classical reconstruction methods. For example, in C-arm angiography systems, which provide projection radiography, fluoroscopy, and digital subtraction angiography and are widely used for medical diagnoses and interventions, the limited dynamic range of C-arm flat detectors leads to overexposure in some projections during an acquisition, such as when imaging relatively thin body parts (e.g., the knee). Aiming at overexposure correction for computed tomography (CT) reconstruction, we propose in this paper a mixed one-bit compressive sensing (M1bit-CS) model to acquire information from both regular and saturated measurements. This method is inspired by the recent progress on one-bit compressive sensing, which deals with only sign observations. Its successful applications imply that information carried by saturated measurements is useful to improve recovery quality. For the proposed M1bit-CS model, an alternating direction method of multipliers is developed and an iterative saturation detection scheme is established. Then we evaluate M1bit-CS on one-dimensional signal recovery tasks. In some experiments, the performance of the proposed algorithms on mixed measurements is almost the same as recovery on unsaturated ones with the same amount of measurements. Finally, we apply the proposed method to overexposure correction for CT reconstruction on a phantom and a simulated clinical image. The results are promising, as the typical streaking artifacts and capping artifacts introduced by saturated projection data are effectively reduced, yielding significant error reduction compared with existing algorithms based on extrapolation.
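
The split between regular and saturated measurements described above can be sketched as follows. This is a minimal illustration, assuming a symmetric measurable range [-limit, limit] and treating any value at or beyond the limit as saturated; the function name and the `limit` parameter are hypothetical, and the reconstruction model itself is not shown.

```python
def split_measurements(values, limit):
    """Separate regular measurements from saturated ones.

    Returns (regular, saturated):
      regular   - list of (index, value) pairs inside the measurable range
      saturated - list of (index, sign) pairs; only the sign survives,
                  which is the one-bit information a mixed model can use.
    """
    regular, saturated = [], []
    for i, v in enumerate(values):
        if abs(v) >= limit:
            saturated.append((i, 1 if v > 0 else -1))
        else:
            regular.append((i, v))
    return regular, saturated
```

The saturated entries are not discarded: each still indicates on which side of the range the true value lies, which is the kind of sign observation one-bit compressive sensing works with.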

Chromosome classification is critical for karyotyping in abnormality diagnosis. To expedite the diagnosis process, we present a novel method named Varifocal-Net for simultaneous classification of a chromosome's type and polarity using deep convolutional networks. The approach consists of one global-scale network (G-Net) and one local-scale network (L-Net). It follows two stages. The first stage is to learn both global and local features. We extract global features and detect finer local regions via the G-Net. With the proposed varifocal mechanism, we zoom into local parts and extract local features via the L-Net. Residual learning and multi-task learning strategies are utilized to promote high-level feature extraction. The detection of discriminative local parts is fulfilled by a localization subnet of the G-Net, whose training process involves both supervised and weakly supervised learning. The second stage is to build two multi-layer perceptron classifiers that exploit features of both scales to boost classification performance. Evaluation results from 1909 karyotyping cases demonstrate that our Varifocal-Net achieved the highest accuracy of 0.9805, 0.9909 and average F1-score of 0.9771, 0.9909 for the type and polarity tasks, respectively. It outperformed state-of-the-art methods, demonstrating the effectiveness of our Varifocal mechanism and multi-scale feature ensemble.

* 10 pages, 11 figures, 10 tables