Research papers and code for "Hao Liu":
Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device will face two critical problems: avoiding causing harmful interference to primary users (PUs), and conducting effective interference coordination with other secondary users. These two problems become even more challenging for a distributed DSA network where there is no centralized controllers for SUs. In this paper, we investigate communication strategies of a distributive DSA network under the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn "appropriate" spectrum access strategies in a distributed fashion assuming NO knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called the reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs could make spectrum access decisions distributedly relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method which assumes the knowledge of system statistics, and converges faster than the Q-learning method when the number of channels is large.

* This work is accepted in IEEE IoT Journal 2018
Click to Read Paper and Get Code
Learning with a primary objective, such as softmax cross entropy for classification and sequence generation, has been the norm for training deep neural networks for years. Although being a widely-adopted approach, using cross entropy as the primary objective exploits mostly the information from the ground-truth class for maximizing data likelihood, and largely ignores information from the complement (incorrect) classes. We argue that, in addition to the primary objective, training also using a complement objective that leverages information from the complement classes can be effective in improving model performance. This motivates us to study a new training paradigm that maximizes the likelihood of the groundtruth class while neutralizing the probabilities of the complement classes. We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. The experimental results confirm that, compared to the conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks. In addition to the accuracy improvement, we also show that models trained with both primary and complement objectives are more robust to single-step adversarial attacks.

* ICLR'19 Camera Ready
Click to Read Paper and Get Code
Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.

Click to Read Paper and Get Code
Measuring the performance of solar energy and heat transfer systems requires a lot of time, economic cost and manpower. Meanwhile, directly predicting their performance is challenging due to the complicated internal structures. Fortunately, a knowledge-based machine learning method can provide a promising prediction and optimization strategy for the performance of energy systems. In this Chapter, the authors will show how they utilize the machine learning models trained from a large experimental database to perform precise prediction and optimization on a solar water heater (SWH) system. A new energy system optimization strategy based on a high-throughput screening (HTS) process is proposed. This Chapter consists of: i) Comparative studies on varieties of machine learning models (artificial neural networks (ANNs), support vector machine (SVM) and extreme learning machine (ELM)) to predict the performances of SWHs; ii) Development of an ANN-based software to assist the quick prediction and iii) Introduction of a computational HTS method to design a high-performance SWH system.

* 20 pages
Click to Read Paper and Get Code
Variational inference with {\alpha}-divergences has been widely used in modern probabilistic machine learning. Compared to Kullback-Leibler (KL) divergence, a major advantage of using {\alpha}-divergences (with positive {\alpha} values) is their mass-covering property. However, estimating and optimizing {\alpha}-divergences require to use importance sampling, which could have extremely large or infinite variances due to heavy tails of importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning in which our method is applied to improve a recent soft actor-critic (SAC) algorithm. Our results show that our approach yields significant advantages compared with existing methods based on classical KL and {\alpha}-divergences.

* Conference on Neural Information Processing Systems (NIPS) 2018
Click to Read Paper and Get Code
Deep neural networks are vulnerable to adversarial examples. Prior defenses attempted to make deep networks more robust by either improving the network architecture or adding adversarial examples into the training set, with their respective limitations. We propose a new direction. Motivated by recent research that shows that outliers in the training set have a high negative influence on the trained model, our approach makes the model more robust by detecting and removing outliers in the training set without modifying the network architecture or requiring adversarial examples. We propose two methods for detecting outliers based on canonical examples and on training errors, respectively. After removing the outliers, we train the classifier with the remaining examples to obtain a sanitized model. Our evaluation shows that the sanitized model improves classification accuracy and forces the attacks to generate adversarial examples with higher distortions. Moreover, the Kullback-Leibler divergence from the output of the original model to that of the sanitized model allows us to distinguish between normal and adversarial examples reliably.

Click to Read Paper and Get Code
Reservoir computing (RC) is a special neural network which consists of a fixed high dimensional feature mapping and trained readout weights. In this paper, we consider a new RC structure for MIMO-OFDM symbol detection, namely windowed echo state network (WESN). It is introduced by adding buffers in input layers which brings an enhanced short-term memory (STM) of the underlying neural network through our theoretical proof. A unified training framework is developed for the WESN MIMO-OFDM symbol detector using both comb and scattered pilot patterns, where the utilized pilots are compatible with the structure adopted in 3GPP LTE/LTE-Advanced systems. Complexity analysis reveals the advantages of the WESN based symbol detector over the state-of-the-art symbol detectors such as the linear the minimum mean square error (LMMSE) detection and the sphere decoder when the system is employed with a large number of OFDM sub-carriers. Numerical evaluations corroborate that the improvement of the STM introduced by the WESN can significantly improve the symbol detection performance as well as effectively mitigate model mismatch effects as opposed to existing methods.

Click to Read Paper and Get Code
Surrogate-assisted evolutionary algorithms (SAEAs) are powerful optimisation tools for computationally expensive problems (CEPs). However, a randomly selected algorithm may fail in solving unknown problems due to no free lunch theorems, and it will cause more computational resource if we re-run the algorithm or try other algorithms to get a much solution, which is more serious in CEPs. In this paper, we consider an algorithm portfolio for SAEAs to reduce the risk of choosing an inappropriate algorithm for CEPs. We propose two portfolio frameworks for very expensive problems in which the maximal number of fitness evaluations is only 5 times of the problem's dimension. One framework named Par-IBSAEA runs all algorithm candidates in parallel and a more sophisticated framework named UCB-IBSAEA employs the Upper Confidence Bound (UCB) policy from reinforcement learning to help select the most appropriate algorithm at each iteration. An effective reward definition is proposed for the UCB policy. We consider three state-of-the-art individual-based SAEAs on different problems and compare them to the portfolios built from their instances on several benchmark problems given limited computation budgets. Our experimental studies demonstrate that our proposed portfolio frameworks significantly outperform any single algorithm on the set of benchmark problems.

Click to Read Paper and Get Code
Pulmonary nodule detection using low-dose Computed Tomography (CT) is often the first step in lung disease screening and diagnosis. Recently, algorithms based on deep convolutional neural nets have shown great promise for automated nodule detection. Most of the existing deep learning nodule detection systems are constructed in two steps: a) nodule candidates screening and b) false positive reduction, using two different models trained separately. Although it is commonly adopted, the two-step approach not only imposes significant resource overhead on training two independent deep learning models, but also is sub-optimal because it prevents cross-talk between the two. In this work, we present an end-to-end framework for nodule detection, integrating nodule candidate screening and false positive reduction into one model, trained jointly. We demonstrate that the end-to-end system improves the performance by 3.88\% over the two-step approach, while at the same time reducing model complexity by one third and cutting inference time by 3.6 fold. Code will be made publicly available.

* 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)
Click to Read Paper and Get Code
3D object classification and segmentation using deep neural networks has been extremely successful. As the problem of identifying 3D objects has many safety-critical applications, the neural networks have to be robust against adversarial changes to the input data set. There is a growing body of research on generating human-imperceptible adversarial attacks and defenses against them in the 2D image classification domain. However, 3D objects have various differences with 2D images, and this specific domain has not been rigorously studied so far. We present a preliminary evaluation of adversarial attacks on deep 3D point cloud classifiers, namely PointNet and PointNet++, by evaluating both white-box and black-box adversarial attacks that were proposed for 2D images and extending those attacks to reduce the perceptibility of the perturbations in 3D space. We also show the high effectiveness of simple defenses against those attacks by proposing new defenses that exploit the unique structure of 3D point clouds. Finally, we attempt to explain the effectiveness of the defenses through the intrinsic structures of both the point clouds and the neural network architectures. Overall, we find that networks that process 3D point cloud data are weak to adversarial attacks, but they are also more easily defensible compared to 2D image classifiers. Our investigation will provide the groundwork for future studies on improving the robustness of deep neural networks that handle 3D data.

* 8 pages, 3 figures, 5 tables
Click to Read Paper and Get Code
Research on automatic music generation has seen great progress due to the development of deep neural networks. However, the generation of multi-instrument music of arbitrary genres still remains a challenge. Existing research either works on lead sheets or multi-track piano-rolls found in MIDIs, but both musical notations have their limits. In this work, we propose a new task called lead sheet arrangement to avoid such limits. A new recurrent convolutional generative model for the task is proposed, along with three new symbolic-domain harmonic features to facilitate learning from unpaired lead sheets and MIDIs. Our model can generate lead sheets and their arrangements of eight-bar long. Audio samples of the generated result can be found at https://drive.google.com/open?id=1c0FfODTpudmLvuKBbc23VBCgQizY6-Rk

* 7 pages, 7 figures and 4 tables
Click to Read Paper and Get Code
A growing demand for natural-scene text detection has been witnessed by the computer vision community since text information plays a significant role in scene understanding and image indexing. Deep neural networks are being used due to their strong capabilities of pixel-wise classification or word localization, similar to being used in common vision problems. In this paper, we present a novel two-task network with integrating bottom and top cues. The first task aims to predict a pixel-by-pixel labeling and based on which, word proposals are generated with a canonical connected component analysis. The second task aims to output a bundle of character candidates used later to verify the word proposals. The two sub-networks share base convolutional features and moreover, we present a new loss to strengthen the interaction between them. We evaluate the proposed network on public benchmark datasets and show it can detect arbitrary-orientation scene text with a finer output boundary. In ICDAR 2013 text localization task, we achieve the state-of-the-art performance with an F-score of 0.919 and a much better recall of 0.915.

* 10 pages, 5 figures
Click to Read Paper and Get Code
In this paper, we describe work in progress towards a real-time vision-based traffic flow prediction (TFP) system. The proposed method consists of three elemental operators, that are dynamic texture model based motion segmentation, feature extraction and Gaussian process (GP) regression. The objective of motion segmentation is to recognize the target regions covering the moving vehicles in the sequence of visual processes. The feature extraction operator aims to extract useful features from the target regions. The extracted features are then mapped to the number of vehicles through the operator of GP regression. A training stage using historical visual data is required for determining the parameter values of the GP. Using a low-resolution visual data set, we performed preliminary evaluations on the performance of the proposed method. The results show that our method beats a benchmark solution based on Gaussian mixture model, and has the potential to be developed into qualified and practical solutions to real-time TFP.

* 8 pages, 4 figures, conference
Click to Read Paper and Get Code
In this paper, we investigate the power of online learning in stochastic network optimization with unknown system statistics {\it a priori}. We are interested in understanding how information and learning can be efficiently incorporated into system control techniques, and what are the fundamental benefits of doing so. We propose two \emph{Online Learning-Aided Control} techniques, $\mathtt{OLAC}$ and $\mathtt{OLAC2}$, that explicitly utilize the past system information in current system control via a learning procedure called \emph{dual learning}. We prove strong performance guarantees of the proposed algorithms: $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ achieve the near-optimal $[O(\epsilon), O([\log(1/\epsilon)]^2)]$ utility-delay tradeoff and $\mathtt{OLAC2}$ possesses an $O(\epsilon^{-2/3})$ convergence time. $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ are probably the first algorithms that simultaneously possess explicit near-optimal delay guarantee and sub-linear convergence time. Simulation results also confirm the superior performance of the proposed algorithms in practice. To the best of our knowledge, our attempt is the first to explicitly incorporate online learning into stochastic network optimization and to demonstrate its power in both theory and practice.

Click to Read Paper and Get Code
Rendering bridges the gap between 2D vision and 3D scenes by simulating the physical process of image formation. By inverting such renderer, one can think of a learning approach to infer 3D information from 2D images. However, standard graphics renderers involve a fundamental discretization step called rasterization, which prevents the rendering process to be differentiable, hence able to be learned. Unlike the state-of-the-art differentiable renderers, which only approximate the rendering gradient in the back propagation, we propose a truly differentiable rendering framework that is able to (1) directly render colorized mesh using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images. The key to our framework is a novel formulation that views rendering as an aggregation function that fuses the probabilistic contributions of all mesh triangles with respect to the rendered pixels. Such formulation enables our framework to flow gradients to the occluded and far-range vertices, which cannot be achieved by the previous state-of-the-arts. We show that by using the proposed renderer, one can achieve significant improvement in 3D unsupervised single-view reconstruction both qualitatively and quantitatively. Experiments also demonstrate that our approach is able to handle the challenging tasks in image-based shape fitting, which remain nontrivial to existing differentiable renderers.

* This is a substantially revised version of previous submission: arXiv:1901.05567
Click to Read Paper and Get Code
Rendering is the process of generating 2D images from 3D assets, simulated in a virtual environment, typically with a graphics pipeline. By inverting such renderer, one can think of a learning approach to predict a 3D shape from an input image. However, standard rendering pipelines involve a fundamental discretization step called rasterization, which prevents the rendering process to be differentiable, hence suitable for learning. We present the first non-parametric and truly differentiable rasterizer based on silhouettes. Our method enables unsupervised learning for high-quality 3D mesh reconstruction from a single image. We call our framework `soft rasterizer' as it provides an accurate soft approximation of the standard rasterizer. The key idea is to fuse the probabilistic contributions of all mesh triangles with respect to the rendered pixels. When combined with a mesh generator in a deep neural network, our soft rasterizer is able to generate an approximated silhouette of the generated polygon mesh in the forward pass. The rendering loss is back-propagated to supervise the mesh generation without the need of 3D training data. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art unsupervised techniques, both quantitatively and qualitatively. We also show that our soft rasterizer can achieve comparable results to the cutting-edge supervised learning method and in various cases even better ones, especially for real-world data.

Click to Read Paper and Get Code
We extend Convolutional Neural Networks (CNNs) on flat and regular domains (e.g. 2D images) to curved surfaces embedded in 3D Euclidean space that are discretized as irregular meshes and widely used to represent geometric data in Computer Vision and Graphics. We define surface convolution on tangent spaces of a surface domain, where the convolution has two desirable properties: 1) the distortion of surface domain signals is locally minimal when being projected to the tangent space, and 2) the translation equi-variance property holds locally, by aligning tangent spaces with the canonical parallel transport that preserves metric. For computation, we rely on a parallel N-direction frame field on the surface that minimizes field variation and therefore is as compatible as possible to and approximates the parallel transport. On the tangent spaces equipped with parallel frames, the computation of surface convolution becomes standard routine. The frames have rotational symmetry which we disambiguate by constructing the covering space of surface induced by the parallel frames and grouping the feature maps into N sets accordingly; convolution is computed on the N branches of the cover space with respective feature maps while the kernel weights are shared. To handle irregular points of a discrete mesh while sharing kernel weights, we make the convolution semi-discrete, i.e. the convolution kernels are polynomial functions, and their convolution with discrete surface points becomes sampling and weighted summation. Pooling and unpooling operations are computed along a mesh hierarchy built through simplification. The presented surface CNNs allow effective deep learning on meshes. We show that for tasks of classification, segmentation and non-rigid registration, surface CNNs using only raw input signals achieve superior performances than previous models using sophisticated input features.

* 10 pages, 11 figures
Click to Read Paper and Get Code
Unsupervised structure learning in high-dimensional time series data has attracted a lot of research interests. For example, segmenting and labelling high dimensional time series can be helpful in behavior understanding and medical diagnosis. Recent advances in generative sequential modeling have suggested to combine recurrent neural networks with state space models (e.g., Hidden Markov Models). This combination can model not only the long term dependency in sequential data, but also the uncertainty included in the hidden states. Inheriting these advantages of stochastic neural sequential models, we propose a structured and stochastic sequential neural network, which models both the long-term dependencies via recurrent neural networks and the uncertainty in the segmentation and labels via discrete random variables. For accurate and efficient inference, we present a bi-directional inference network by reparamterizing the categorical segmentation and labels with the recent proposed Gumbel-Softmax approximation and resort to the Stochastic Gradient Variational Bayes. We evaluate the proposed model in a number of tasks, including speech modeling, automatic segmentation and labeling in behavior understanding, and sequential multi-objects recognition. Experimental results have demonstrated that our proposed model can achieve significant improvement over the state-of-the-art methods.

Click to Read Paper and Get Code
Visualizing the details of different cellular structures is of great importance to elucidate cellular functions. However, it is challenging to obtain high quality images of different structures directly due to complex cellular environments. Fluorescence microscopy is a popular technique to label different structures but has several drawbacks. In particular, labeling is time consuming and may affect cell morphology, and simultaneous labels are inherently limited. This raises the need of building computational models to learn relationships between unlabeled and labeled fluorescence images, and to infer fluorescent labels of other unlabeled fluorescence images. We propose to develop a novel deep model for fluorescence image prediction. We first propose a novel network layer, known as the global transformer layer, that fuses global information from inputs effectively. The proposed global transformer layer can generate outputs with arbitrary dimensions, and can be employed for all the regular, down-sampling, and up-sampling operators. We then incorporate our proposed global transformer layers and dense blocks to build an U-Net like network. We believe such a design can promote feature reusing between layers. In addition, we propose a multi-scale input strategy to encourage networks to capture features at different scales. We conduct evaluations across various label-free prediction tasks to demonstrate the effectiveness of our approach. Both quantitative and qualitative results show that our method outperforms the state-of-the-art approach significantly. It is also shown that our proposed global transformer layer is useful to improve the fluorescence image prediction results.

* 8 pages, 3 figures, 4 tables
Click to Read Paper and Get Code