Models, code, and papers for "Xin Li":
This short article presents a class of projection-based solution algorithms to the problem considered in the pioneering work on compressed sensing - perfect reconstruction of a phantom image from 22 radial lines in the frequency domain. Under the framework of projection-based image reconstruction, we will show experimentally that several old and new tools of nonlinear filtering (including Perona-Malik diffusion, nonlinear diffusion, Translation-Invariant thresholding and SA-DCT thresholding) all lead to perfect reconstruction of the phantom image.
A deep convolutional fuzzy system (DCFS) on a high-dimensional input space is a multi-layer connection of many low-dimensional fuzzy systems, where the input variables to the low-dimensional fuzzy systems are selected through a moving window (a convolution operator) across the input spaces of the layers. To design the DCFS based on input-output data pairs, we propose a bottom-up layer-by-layer scheme. Specifically, by viewing each of the first-layer fuzzy systems as a weak estimator of the output based only on a very small portion of the input variables, we can design these fuzzy systems using the WM Method. After the first-layer fuzzy systems are designed, we pass the data through the first layer and replace the inputs in the original data set by the corresponding outputs of the first layer to form a new data set, then we design the second-layer fuzzy systems based on this new data set in the same way as designing the first-layer fuzzy systems. Repeating this process we design the whole DCFS. Since the WM Method requires only one-pass of the data, this training algorithm for the DCFS is very fast. We apply the DCFS model with the training algorithm to predict a synthetic chaotic plus random time-series and the real Hang Seng Index of the Hong Kong stock market.
Deep learning has greatly improved visual recognition in recent years. However, recent research has shown that there exist many adversarial examples that can negatively impact the performance of such an architecture. This paper focuses on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. Instead of directly training a deep neural network to detect adversarials, a much simpler approach was proposed based on statistics on outputs from convolutional layers. A cascade classifier was designed to efficiently detect adversarials. Furthermore, trained from one particular adversarial generating mechanism, the resulting classifier can successfully detect adversarials from a completely different mechanism as well. The resulting classifier is non-subdifferentiable, hence creates a difficulty for adversaries to attack by using the gradient of the classifier. After detecting adversarial examples, we show that many of them can be recovered by simply performing a small average filter on the image. Those findings should lead to more insights about the classification mechanisms in deep convolutional neural networks.
We propose a new heavy-tailed distribution --- Gaussian-Chain (GC) distribution, which is inspirited by the hierarchical structures prevailing in social organizations. We determine the mean, variance and kurtosis of the Gaussian-Chain distribution to show its heavy-tailed property, and compute the tail distribution table to give specific numbers showing how heavy is the heavy-tails. To filter out the heavy-tailed noise, we construct two filters --- 2nd and 3rd-order GC filters --- based on the maximum likelihood principle. Simulation results show that the GC filters perform much better than the benchmark least-squares algorithm when the noise is heavy-tail distributed. Using the GC filters, we propose a trading strategy, named Ride-the-Mood, to follow the mood of the market by detecting the actions of the big buyers and the big sellers in the market based on the noisy, heavy-tailed price data. Application of the Ride-the-Mood strategy to five blue-chip Hong Kong stocks over the recent two-year period from April 2, 2012 to March 31, 2014 shows that their returns are higher than the returns of the benchmark Buy-and-Hold strategy and the Hang Seng Index Fund.
We propose a novel Shapley value approach to help address neural networks' interpretability and "vanishing gradient" problems. Our method is based on an accurate analytical approximation to the Shapley value of a neuron with ReLU activation. This analytical approximation admits a linear propagation of relevance across neural network layers, resulting in a simple, fast and sensible interpretation of neural networks' decision making process. We then derived a globally continuous and non-vanishing Shapley gradient, which can replace the conventional gradient in training neural network layers with ReLU activation, and leading to better training performance. We further derived a Shapley Activation (SA) function, which is a close approximation to ReLU but features the Shapley gradient. The SA is easy to implement in existing machine learning frameworks. Numerical tests show that SA consistently outperforms ReLU in training convergence, accuracy and stability.
In this paper, we introduce the STN-Homography model to directly estimate the homography matrix between image pair. Different most CNN-based homography estimation methods which use an alternative 4-point homography parameterization, we use prove that, after coordinate normalization, the variance of elements of coordinate normalized $3\times3$ homography matrix is very small and suitable to be regressed well with CNN. Based on proposed STN-Homography, we use a hierarchical architecture which stacks several STN-Homography models and successively reduce the estimation error. Effectiveness of the proposed method is shown through experiments on MSCOCO dataset, in which it significantly outperforms the state-of-the-art. The average processing time of our hierarchical STN-Homography with 1 stage is only 4.87 ms on the GPU, and the processing time for hierarchical STN-Homography with 3 stages is 17.85 ms. The code will soon be open sourced.
We develop a new algorithm to perform facial reconstruction from a given skull. This technique has forensic application in helping the identification of skeletal remains when other information is unavailable. Unlike most existing strategies that directly reconstruct the face from the skull, we utilize a database of portrait photos to create many face candidates, then perform a superimposition to get a well matched face, and then revise it according to the superimposition. To support this pipeline, we build an effective autoencoder for image-based facial reconstruction, and a generative model for constrained face inpainting. Our experiments have demonstrated that the proposed pipeline is stable and accurate.
This paper proposes a novel algorithm to reassemble an arbitrarily shredded image to its original status. Existing reassembly pipelines commonly consist of a local matching stage and a global compositions stage. In the local stage, a key challenge in fragment reassembly is to reliably compute and identify correct pairwise matching, for which most existing algorithms use handcrafted features, and hence, cannot reliably handle complicated puzzles. We build a deep convolutional neural network to detect the compatibility of a pairwise stitching, and use it to prune computed pairwise matches. To improve the network efficiency and accuracy, we transfer the calculation of CNN to the stitching region and apply a boost training strategy. In the global composition stage, we modify the commonly adopted greedy edge selection strategies to two new loop closure based searching algorithms. Extensive experiments show that our algorithm significantly outperforms existing methods on solving various puzzles, especially those challenging ones with many fragment pieces.
Existing approaches towards single image dehazing including both model-based and learning-based heavily rely on the estimation of so-called transmission maps. Despite its conceptual simplicity, using transmission maps as an intermediate step often makes it more difficult to optimize the perceptual quality of reconstructed images. To overcome this weakness, we propose a direct deep learning approach toward image dehazing bypassing the step of transmission map estimation and facilitating end-to-end perceptual optimization. Our technical contributions are mainly three-fold. First, based on the analogy between dehazing and denoising, we propose to directly learn a nonlinear mapping from the space of degraded images to that of haze-free ones via recursive deep residual learning; Second, inspired by the success of generative adversarial networks (GAN), we propose to optimize the perceptual quality of dehazed images by introducing a discriminator and a loss function adaptive to hazy conditions; Third, we propose to remove notorious halo-like artifacts at large scene depth discontinuities by a novel application of guided filtering. Extensive experimental results have shown that the subjective qualities of dehazed images by the proposed perceptually optimized GAN (POGAN) are often more favorable than those by existing state-of-the-art approaches especially when hazy condition varies.
The quality of solution sets generated by decomposition-based evolutionary multiobjective optimisation (EMO) algorithms depends heavily on the consistency between a given problem's Pareto front shape and the specified weights' distribution. A set of weights distributed uniformly in a simplex often lead to a set of well-distributed solutions on a Pareto front with a simplex-like shape, but may fail on other Pareto front shapes. It is an open problem on how to specify a set of appropriate weights without the information of the problem's Pareto front beforehand. In this paper, we propose an approach to adapt the weights during the evolutionary process (called AdaW). AdaW progressively seeks a suitable distribution of weights for the given problem by elaborating five parts in the weight adaptation --- weight generation, weight addition, weight deletion, archive maintenance, and weight update frequency. Experimental results have shown the effectiveness of the proposed approach. AdaW works well for Pareto fronts with very different shapes: 1) the simplex-like, 2) the inverted simplex-like, 3) the highly nonlinear, 4) the disconnect, 5) the degenerated, 6) the badly-scaled, and 7) the high-dimensional.
Nowadays, it is still difficult to adapt Convolutional Neural Network (CNN) based models for deployment on embedded devices. The heavy computation and large memory footprint of CNN models become the main burden in real application. In this paper, we propose a "Sparse Shrink" algorithm to prune an existing CNN model. By analyzing the importance of each channel via sparse reconstruction, the algorithm is able to prune redundant feature maps accordingly. The resulting pruned model thus directly saves computational resource. We have evaluated our algorithm on CIFAR-100. As shown in our experiments, we can reduce 56.77% parameters and 73.84% multiplication in total with only minor decrease in accuracy. These results have demonstrated the effectiveness of our "Sparse Shrink" algorithm.
Classical principal component analysis (PCA) is not robust to the presence of sparse outliers in the data. The use of the $\ell_1$ norm in the Robust PCA (RPCA) method successfully eliminates the weakness of PCA in separating the sparse outliers. In this paper, by sticking a simple weight to the Frobenius norm, we propose a weighted low rank (WLR) method to avoid the often computationally expensive algorithms relying on the $\ell_1$ norm. As a proof of concept, a background estimation model has been presented and compared with two $\ell_1$ norm minimization algorithms. We illustrate that as long as a simple weight matrix is inferred from the data, one can use the weighted Frobenius norm and achieve the same or better performance.
One of the most common approaches for multiobjective optimization is to generate a solution set that well approximates the whole Pareto-optimal frontier to facilitate the later decision-making process. However, how to evaluate and compare the quality of different solution sets remains challenging. Existing measures typically require additional problem knowledge and information, such as a reference point or a substituted set of the Pareto-optimal frontier. In this paper, we propose a quality measure, called dominance move (DoM), to compare solution sets generated by multiobjective optimizers. Given two solution sets, DoM measures the minimum sum of move distances for one set to weakly Pareto dominate the other set. DoM can be seen as a natural reflection of the difference between two solutions, capturing all aspects of solution sets' quality, being compliant with Pareto dominance, and does not need any additional problem knowledge and parameters. We present an exact method to calculate the DoM in the biobjective case. We show the necessary condition of constructing the optimal partition for a solution set's minimum move, and accordingly propose an efficient algorithm to recursively calculate the DoM. Finally, DoM is evaluated on several groups of artificial and real test cases as well as by a comparison with two well-established quality measures.
Designing a scheme that can achieve a good performance in predicting single person activities and group activities is a challenging task. In this paper, we propose a novel robust and efficient human activity recognition scheme called ReHAR, which can be used to handle single person activities and group activities prediction. First, we generate an optical flow image for each video frame. Then, both video frames and their corresponding optical flow images are fed into a Single Frame Representation Model to generate representations. Finally, an LSTM is used to pre- dict the final activities based on the generated representations. The whole model is trained end-to-end to allow meaningful representations to be generated for the final activity recognition. We evaluate ReHAR using two well-known datasets: the NCAA Basketball Dataset and the UCFSports Action Dataset. The experimental results show that the pro- posed ReHAR achieves a higher activity recognition accuracy with an order of magnitude shorter computation time compared to the state-of-the-art methods.
We proposed Additive Powers-of-Two~(APoT) quantization, an efficient non-uniform quantization scheme that attends to the bell-shaped and long-tailed distribution of weights in neural networks. By constraining all quantization levels as a sum of several Powers-of-Two terms, APoT quantization enjoys overwhelming efficiency of computation and a good match with weights' distribution. A simple reparameterization on clipping function is applied to generate better-defined gradient for updating of optimal clipping threshold. Moreover, weight normalization is presented to refine the input distribution of weights to be more stable and consistent. Experimental results show that our proposed method outperforms state-of-the-art methods, and is even competitive with the full-precision models demonstrating the effectiveness of our proposed APoT quantization. For example, our 3-bit quantized ResNet-34 on ImageNet only drops 0.3% Top-1 and 0.2% Top-5 accuracy without bells and whistles, while the computation of our model is approximately 2x less than uniformly quantized neural networks.
We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\-x$ which can retain the statistical relationship between $\-x$ and the response variable $y$. We follow the idea of the sliced inverse regression (SIR) class of methods, which is to use the statistical information of the conditional distribution $\pi(\-x|y)$ to identify the dimension reduction (DR) space and in particular we focus on the task of computing this conditional distribution. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\-x|y)$ can then be obtained directly by assigning weights to the original data points. We then can perform DR by considering certain moment functions (e.g. the first moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.
Medical imaging contains the essential information for rendering diagnostic and treatment decisions. Inspecting (visual perception) and interpreting image to generate a report are tedious clinical routines for a radiologist where automation is expected to greatly reduce the workload. Despite rapid development of natural image captioning, computer-aided medical image visual perception and interpretation remain a challenging task, largely due to the lack of high-quality annotated image-report pairs and tailor-made generative models for sufficient extraction and exploitation of localized semantic features, particularly those associated with abnormalities. To tackle these challenges, we present Vispi, an automatic medical image interpretation system, which first annotates an image via classifying and localizing common thoracic diseases with visual support and then followed by report generation from an attentive LSTM model. Analyzing an open IU X-ray dataset, we demonstrate a superior performance of Vispi in disease classification, localization and report generation using automatic performance evaluation metrics ROUGE and CIDEr.
No-reference image quality assessment (NR-IQA) aims to measure the image quality without reference image. However, contrast distortion has been overlooked in the current research of NR-IQA. In this paper, we propose a very simple but effective metric for predicting quality of contrast-altered images based on the fact that a high-contrast image is often more similar to its contrast enhanced image. Specifically, we first generate an enhanced image through histogram equalization. We then calculate the similarity of the original image and the enhanced one by using structural-similarity index (SSIM) as the first feature. Further, we calculate the histogram based entropy and cross entropy between the original image and the enhanced one respectively, to gain a sum of 4 features. Finally, we learn a regression module to fuse the aforementioned 5 features for inferring the quality score. Experiments on four publicly available databases validate the superiority and efficiency of the proposed technique.
Despite the significant advances in iris segmentation, accomplishing accurate iris segmentation in non-cooperative environment remains a grand challenge. In this paper, we present a deep learning framework, referred to as Iris R-CNN, to offer superior accuracy for iris segmentation. The proposed framework is derived from Mask R-CNN, and several novel techniques are proposed to carefully explore the unique characteristics of iris. First, we propose two novel networks: (i) Double-Circle Region Proposal Network (DC-RPN), and (ii) Double-Circle Classification and Regression Network (DC-CRN) to take into account the iris and pupil circles to maximize the accuracy for iris segmentation. Second, we propose a novel normalization scheme for Regions of Interest (RoIs) to facilitate a radically new pooling operation over a double-circle region. Experimental results on two challenging iris databases, UBIRIS.v2 and MICHE, demonstrate the superior accuracy of the proposed approach over other state-of-the-art methods.
Rolling Horizon Evolutionary Algorithms (RHEA) are a class of online planning methods for real-time game playing; their performance is closely related to the planning horizon and the search time allowed. In this paper, we propose to learn a prior for RHEA in an offline manner by training a value network and a policy network. The value network is used to reduce the planning horizon by providing an estimation of future rewards, and the policy network is used to initialize the population, which helps to narrow down the search scope. The proposed algorithm, named prior-based RHEA (p-RHEA), trains policy and value networks by performing planning and learning iteratively. In the planning stage, the horizon-limited search assisted with the policy network and value network is performed to improve the policies and collect training samples. In the learning stage, the policy network and value network are trained with the collected samples to learn better prior knowledge. Experimental results on OpenAI Gym MuJoCo tasks show that the performance of the proposed p-RHEA is significantly improved compared to that of RHEA.