Models, code, and papers for "Xiaolin Wu":

Ultra High Fidelity Image Compression with $\ell_\infty$-constrained Encoding and Deep Decoding

Feb 10, 2020
Xi Zhang, Xiaolin Wu

In many professional fields, such as medicine, remote sensing and sciences, users often demand image compression methods to be mathematically lossless. But lossless image coding has a rather low compression ratio (around 2:1 for natural images). The only known technique to achieve significant compression while meeting the stringent fidelity requirements is the methodology of $\ell_\infty$-constrained coding that was developed and standardized in nineties. We make a major progress in $\ell_\infty$-constrained image coding after two decades, by developing a novel CNN-based soft $\ell_\infty$-constrained decoding method. The new method repairs compression defects by using a restoration CNN called the $\ell_\infty\mbox{-SDNet}$ to map a conventionally decoded image to the latent image. A unique strength of the $\ell_\infty\mbox{-SDNet}$ is its ability to enforce a tight error bound on a per pixel basis. As such, no small distinctive structures of the original image can be dropped or distorted, even if they are statistical outliers that are otherwise sacrificed by mainstream CNN restoration methods. More importantly, this research ushers in a new image compression system of $\ell_\infty$-constrained encoding and deep soft decoding ($\ell_\infty\mbox{-ED}^2$). The $\ell_\infty \mbox{-ED}^2$ approach beats the best of existing lossy image compression methods (e.g., BPG, WebP, etc.) not only in $\ell_\infty$ but also in $\ell_2$ error metric and perceptual quality, for bit rates near the threshold of perceptually transparent reconstruction. Operationally, the new compression system is practical, with a low-complexity real-time encoder and a cascade decoder consisting of a fast initial decoder and an optional CNN soft decoder.

  Click for Model/Code and Paper
Nonlinear Prediction of Multidimensional Signals via Deep Regression with Applications to Image Coding

Oct 30, 2018
Xi Zhang, Xiaolin Wu

Deep convolutional neural networks (DCNN) have enjoyed great successes in many signal processing applications because they can learn complex, non-linear causal relationships from input to output. In this light, DCNNs are well suited for the task of sequential prediction of multidimensional signals, such as images, and have the potential of improving the performance of traditional linear predictors. In this research we investigate how far DCNNs can push the envelop in terms of prediction precision. We propose, in a case study, a two-stage deep regression DCNN framework for nonlinear prediction of two-dimensional image signals. In the first-stage regression, the proposed deep prediction network (PredNet) takes the causal context as input and emits a prediction of the present pixel. Three PredNets are trained with the regression objectives of minimizing $\ell_1$, $\ell_2$ and $\ell_\infty$ norms of prediction residuals, respectively. The second-stage regression combines the outputs of the three PredNets to generate an even more precise and robust prediction. The proposed deep regression model is applied to lossless predictive image coding, and it outperforms the state-of-the-art linear predictors by appreciable margin.

  Click for Model/Code and Paper
Near-lossless L-infinity constrained Multi-rate Image Decompression via Deep Neural Network

Oct 10, 2018
Xi Zhang, Xiaolin Wu

Recently a number of CNN-based techniques were proposed to remove image compression artifacts. As in other restoration applications, these techniques all learn a mapping from decompressed patches to the original counterparts under the ubiquitous L2 metric. However, this approach is incapable of restoring distinctive image details which may be statistical outliers but have high semantic importance (e.g., tiny lesions in medical images). To overcome this weakness, we propose to incorporate an L-infinity fidelity criterion in the design of neural network so that no small, distinctive structures of the original image can be dropped or distorted. Moreover, our anti-artifacts neural network is designed to work on a range of compression bit rates, rather than a fixed one as in the past. Experimental results demonstrate that the proposed method outperforms the state-of-the-art methods in L-infinity error metric and perceptual quality, while being competitive in L2 error metric as well. It can restore subtle image details that are otherwise destroyed or missed by other algorithms. Our research suggests a new machine learning paradigm of ultra high fidelity image compression that is ideally suited for applications in medicine, space, and sciences.

  Click for Model/Code and Paper
Responses to Critiques on Machine Learning of Criminality Perceptions (Addendum of arXiv:1611.04135)

May 26, 2017
Xiaolin Wu, Xi Zhang

In November 2016 we submitted to arXiv our paper "Automated Inference on Criminality Using Face Images". It generated a great deal of discussions in the Internet and some media outlets. Our work is only intended for pure academic discussions; how it has become a media consumption is a total surprise to us. Although in agreement with our critics on the need and importance of policing AI research for the general good of the society, we are deeply baffled by the ways some of them mispresented our work, in particular the motive and objective of our research.

  Click for Model/Code and Paper
Quality Adaptive Low-Rank Based JPEG Decoding with Applications

Jan 06, 2016
Xiao Shu, Xiaolin Wu

Small compression noises, despite being transparent to human eyes, can adversely affect the results of many image restoration processes, if left unaccounted for. Especially, compression noises are highly detrimental to inverse operators of high-boosting (sharpening) nature, such as deblurring and superresolution against a convolution kernel. By incorporating the non-linear DCT quantization mechanism into the formulation for image restoration, we propose a new sparsity-based convex programming approach for joint compression noise removal and image restoration. Experimental results demonstrate significant performance gains of the new approach over existing image restoration methods.

  Click for Model/Code and Paper
Lossless Compression of Mosaic Images with Convolutional Neural Network Prediction

Jan 28, 2020
Seyed Mehdi Ayyoubzadeh, Xiaolin Wu

We present a CNN-based predictive lossless compression scheme for raw color mosaic images of digital cameras. This specialized application problem was previously understudied but it is now becoming increasingly important, because modern CNN methods for image restoration tasks (e.g., superresolution, low lighting enhancement, deblurring), must operate on original raw mosaic images to obtain the best possible results. The key innovation of this paper is a high-order nonlinear CNN predictor of spatial-spectral mosaic patterns. The deep learning prediction can model highly complex sample dependencies in spatial-spectral mosaic images more accurately and hence remove statistical redundancies more thoroughly than existing image predictors. Experiments show that the proposed CNN predictor achieves unprecedented lossless compression performance on camera raw images.

  Click for Model/Code and Paper
Adaptive Loss Function for Super Resolution Neural Networks Using Convex Optimization Techniques

Jan 21, 2020
Seyed Mehdi Ayyoubzadeh, Xiaolin Wu

Single Image Super-Resolution (SISR) task refers to learn a mapping from low-resolution images to the corresponding high-resolution ones. This task is known to be extremely difficult since it is an ill-posed problem. Recently, Convolutional Neural Networks (CNNs) have achieved state of the art performance on SISR. However, the images produced by CNNs do not contain fine details of the images. Generative Adversarial Networks (GANs) aim to solve this issue and recover sharp details. Nevertheless, GANs are notoriously difficult to train. Besides that, they generate artifacts in the high-resolution images. In this paper, we have proposed a method in which CNNs try to align images in different spaces rather than only the pixel space. Such a space is designed using convex optimization techniques. CNNs are encouraged to learn high-frequency components of the images as well as low-frequency components. We have shown that the proposed method can recover fine details of the images and it is stable in the training process.

  Click for Model/Code and Paper
Filter Bank Regularization of Convolutional Neural Networks

Jul 25, 2019
Seyed Mehdi Ayyoubzadeh, Xiaolin Wu

Regularization techniques are widely used to improve the generality, robustness, and efficiency of deep convolutional neural networks (DCNNs). In this paper, we propose a novel approach of regulating DCNN convolutional kernels by a structured filter bank. Comparing with the existing regularization methods, such as $\ell_1$ or $\ell_2$ minimization of DCNN kernel weights and the kernel orthogonality, which ignore sample correlations within a kernel, the use of filter bank in regularization of DCNNs can mold the DCNN kernels to common spatial structures and features (e.g., edges or textures of various orientations and frequencies) of natural images. On the other hand, unlike directly making DCNN kernels fixed filters, the filter bank regularization still allows the freedom of optimizing DCNN weights via deep learning. This new DCNN design strategy aims to combine the best of two worlds: the inclusion of structural image priors of traditional filter banks to improve the robustness and generality of DCNN solutions and the capability of modern deep learning to model complex non-linear functions hidden in training data. Experimental results on object recognition tasks show that the proposed regularization approach guides DCNNs to faster convergence and better generalization than existing regularization methods of weight decay and kernel orthogonality.

* 11 pages, 17 figures 

  Click for Model/Code and Paper
Challenge of Spatial Cognition for Deep Learning

Jul 30, 2019
Xiaolin Wu, Xi Zhang, Jun Du

Given the success of the deep convolutional neural networks (DCNNs) in applications of visual recognition and classification, it would be tantalizing to test if DCNNs can also learn spatial concepts, such as straightness, convexity, left/right, front/back, relative size, aspect ratio, polygons, etc., from varied visual examples of these concepts that are simple and yet vital for spatial reasoning. Much to our dismay, extensive experiments of the type of cognitive psychology demonstrate that the data-driven deep learning (DL) cannot see through superficial variations in visual representations and grasp the spatial concept in abstraction. The root cause of failure turns out to be the learning methodology, not the computational model of the neural network itself. By incorporating task-specific convolutional kernels, we are able to construct DCNNs for spatial cognition tasks that can generalize to input images not drawn from the same distribution of the training set. This work raises a precaution that without manually-incorporated priors or features DCCNs may fail spatial cognitive tasks at rudimentary level.

  Click for Model/Code and Paper
Deep Learning with Inaccurate Training Data for Image Restoration

Nov 18, 2018
Bolin Liu, Xiao Shu, Xiaolin Wu

In many applications of deep learning, particularly those in image restoration, it is either very difficult, prohibitively expensive, or outright impossible to obtain paired training data precisely as in the real world. In such cases, one is forced to use synthesized paired data to train the deep convolutional neural network (DCNN). However, due to the unavoidable generalization error in statistical learning, the synthetically trained DCNN often performs poorly on real world data. To overcome this problem, we propose a new general training method that can compensate for, to a large extent, the generalization errors of synthetically trained DCNNs.

  Click for Model/Code and Paper
On Numerosity of Deep Convolutional Neural Networks

Jul 11, 2018
Xiaolin Wu, Xi Zhang, Xiao Shu

Subitizing, or the sense of small natural numbers, is a cognitive construct so primary and critical to the survival and well-being of humans and primates that is considered and proven to be innate; it responds to visual stimuli prior to the development of any symbolic skills, language or arithmetic. Given highly acclaimed successes of deep convolutional neural networks (DCNN) in tasks of visual intelligence, one would expect that DCNNs can learn subitizing. But somewhat surprisingly, our carefully crafted extensive experiments, which are similar to those of cognitive psychology, demonstrate that DCNNs cannot, even with strong supervision, see through superficial variations in visual representations and distill the abstract notion of natural number, a task that children perform with high accuracy and confidence. The DCNN black box learners driven by very large training sets are apparently still confused by geometric variations and fail to grasp the topological essence in subitizing. In sharp contrast to the failures of the black box learning, by incorporating a mechanism of mathematical morphology into convolutional kernels, we are able to construct a recurrent convolutional neural network that can perform subitizing deterministically. Our findings in this study of cognitive computing, without and with prior of human knowledge, are discussed; they are, we believe, significant and thought-provoking in the interests of AI research, because visual-based numerosity is a benchmark of minimum sort for human cognition.

  Click for Model/Code and Paper
Demoiréing of Camera-Captured Screen Images Using Deep Convolutional Neural Network

Apr 11, 2018
Bolin Liu, Xiao Shu, Xiaolin Wu

Taking photos of optoelectronic displays is a direct and spontaneous way of transferring data and keeping records, which is widely practiced. However, due to the analog signal interference between the pixel grids of the display screen and camera sensor array, objectionable moir\'e (alias) patterns appear in captured screen images. As the moir\'e patterns are structured and highly variant, they are difficult to be completely removed without affecting the underneath latent image. In this paper, we propose an approach of deep convolutional neural network for demoir\'eing screen photos. The proposed DCNN consists of a coarse-scale network and a fine-scale network. In the coarse-scale network, the input image is first downsampled and then processed by stacked residual blocks to remove the moir\'e artifacts. After that, the fine-scale network upsamples the demoir\'ed low-resolution image back to the original resolution. Extensive experimental results have demonstrated that the proposed technique can efficiently remove the moir\'e patterns for camera acquired screen images; the new technique outperforms the existing ones.

  Click for Model/Code and Paper
Learning-Based Dequantization For Image Restoration Against Extremely Poor Illumination

Mar 20, 2018
Chang Liu, Xiaolin Wu, Xiao Shu

All existing image enhancement methods, such as HDR tone mapping, cannot recover A/D quantization losses due to insufficient or excessive lighting, (underflow and overflow problems). The loss of image details due to A/D quantization is complete and it cannot be recovered by traditional image processing methods, but the modern data-driven machine learning approach offers a much needed cure to the problem. In this work we propose a novel approach to restore and enhance images acquired in low and uneven lighting. First, the ill illumination is algorithmically compensated by emulating the effects of artificial supplementary lighting. Then a DCNN trained using only synthetic data recovers the missing detail caused by quantization.

  Click for Model/Code and Paper
Collaborative Autoencoder for Recommender Systems

Jan 30, 2018
Qibing Li, Xiaolin Zheng, Xinyue Wu

In recent years, deep neural networks have yielded state-of-the-art performance on several tasks. Although some recent works have focused on combining deep learning with recommendation, we highlight three issues of existing works. First, most works perform deep content feature learning and resort to matrix factorization, which cannot effectively model the highly complex user-item interaction function. Second, due to the difficulty on training deep neural networks, existing models utilize a shallow architecture, and thus limit the expressive potential of deep learning. Third, neural network models are easy to overfit on the implicit setting, because negative interactions are not taken into account. To tackle these issues, we present a generic recommender framework called Neural Collaborative Autoencoder (NCAE) to perform collaborative filtering, which works well for both explicit feedback and implicit feedback. NCAE can effectively capture the relationship between interactions via a non-linear matrix factorization process. To optimize the deep architecture of NCAE, we develop a three-stage pre-training mechanism that combines supervised and unsupervised feature learning. Moreover, to prevent overfitting on the implicit setting, we propose an error reweighting module and a sparsity-aware data-augmentation strategy. Extensive experiments on three real-world datasets demonstrate that NCAE can significantly advance the state-of-the-art.

  Click for Model/Code and Paper
Fast Screening Algorithm for Rotation and Scale Invariant Template Matching

Jul 19, 2017
Bolin Liu, Xiao Shu, Xiaolin Wu

This paper presents a generic pre-processor for expediting conventional template matching techniques. Instead of locating the best matched patch in the reference image to a query template via exhaustive search, the proposed algorithm rules out regions with no possible matches with minimum computational efforts. While working on simple patch features, such as mean, variance and gradient, the fast pre-screening is highly discriminative. Its computational efficiency is gained by using a novel octagonal-star-shaped template and the inclusion-exclusion principle to extract and compare patch features. Moreover, it can handle arbitrary rotation and scaling of reference images effectively. Extensive experiments demonstrate that the proposed algorithm greatly reduces the search space while never missing the best match.

  Click for Model/Code and Paper
Automated Inference on Sociopsychological Impressions of Attractive Female Faces

Dec 23, 2016
Xiaolin Wu, Xi Zhang, Chang Liu

This article is a sequel to our earlier work [25]. The main objective of our research is to explore the potential of supervised machine learning in face-induced social computing and cognition, riding on the momentum of much heralded successes of face processing, analysis and recognition on the tasks of biometric-based identification. We present a case study of automated statistical inference on sociopsychological perceptions of female faces controlled for race, attractiveness, age and nationality. Our empirical evidences point to the possibility of training machine learning algorithms, using example face images characterized by internet users, to predict perceptions of personality traits and demeanors.

  Click for Model/Code and Paper
Non-rigid Registration Method between 3D CT Liver Data and 2D Ultrasonic Images based on Demons Model

Dec 31, 2019
Shuo Huang, Ke wu, Xiaolin Meng, Cheng Li

The non-rigid registration between CT data and ultrasonic images of liver can facilitate the diagnosis and treatment, which has been widely studied in recent years. To improve the registration accuracy of the Demons model on the non-rigid registration between 3D CT liver data and 2D ultrasonic images, a novel boundary extraction and enhancement method based on radial directional local intuitionistic fuzzy entropy in the polar coordinates has been put forward, and a new registration workflow has been provided. Experiments show that our method can acquire high-accuracy registration results. Experiments also show that the accuracy of the results of our method is higher than that of the original Demons method and the Demons method using simulated ultrasonic image by Field II. The operation time of our registration workflow is about 30 seconds, and it can be used in the surgery.

  Click for Model/Code and Paper
DeepAtom: A Framework for Protein-Ligand Binding Affinity Prediction

Dec 01, 2019
Yanjun Li, Mohammad A. Rezaei, Chenglong Li, Xiaolin Li, Dapeng Wu

The cornerstone of computational drug design is the calculation of binding affinity between two biological counterparts, especially a chemical compound, i.e., a ligand, and a protein. Predicting the strength of protein-ligand binding with reasonable accuracy is critical for drug discovery. In this paper, we propose a data-driven framework named DeepAtom to accurately predict the protein-ligand binding affinity. With 3D Convolutional Neural Network (3D-CNN) architecture, DeepAtom could automatically extract binding related atomic interaction patterns from the voxelized complex structure. Compared with the other CNN based approaches, our light-weight model design effectively improves the model representational capacity, even with the limited available training data. With validation experiments on the PDBbind v.2016 benchmark and the independent Astex Diverse Set, we demonstrate that the less feature engineering dependent DeepAtom approach consistently outperforms the other state-of-the-art scoring methods. We also compile and propose a new benchmark dataset to further improve the model performances. With the new dataset as training input, DeepAtom achieves Pearson's R=0.83 and RMSE=1.23 pK units on the PDBbind v.2016 core set. The promising results demonstrate that DeepAtom models can be potentially adopted in computational drug development protocols such as molecular docking and virtual screening.

* IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2019) 
* Accepted IEEE BIBM 2019 paper with minor revisions 

  Click for Model/Code and Paper
Single Image Reflection Removal Using Deep Encoder-Decoder Network

Jan 31, 2018
Zhixiang Chi, Xiaolin Wu, Xiao Shu, Jinjin Gu

Image of a scene captured through a piece of transparent and reflective material, such as glass, is often spoiled by a superimposed layer of reflection image. While separating the reflection from a familiar object in an image is mentally not difficult for humans, it is a challenging, ill-posed problem in computer vision. In this paper, we propose a novel deep convolutional encoder-decoder method to remove the objectionable reflection by learning a map between image pairs with and without reflection. For training the neural network, we model the physical formation of reflections in images and synthesize a large number of photo-realistic reflection-tainted images from reflection-free images collected online. Extensive experimental results show that, although the neural network learns only from synthetic data, the proposed method is effective on real-world images, and it significantly outperforms the other tested state-of-the-art techniques.

  Click for Model/Code and Paper
Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images

Jul 07, 2016
Xianming Liu, Gene Cheung, Xiaolin Wu, Debin Zhao

Given the prevalence of JPEG compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed DCT coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors---Laplacian prior for DCT coefficients, sparsity prior and graph-signal smoothness prior for image patches---to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error (MMSE) initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the over-complete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.

  Click for Model/Code and Paper