Research papers and code for "Hao Zhang":
Dynamic spectrum access (DSA) is regarded as an effective and efficient technology for sharing radio spectrum among different networks. As a secondary user (SU), a DSA device faces two critical problems: avoiding harmful interference to primary users (PUs), and coordinating interference effectively with other secondary users. These problems become even more challenging in a distributed DSA network, where there is no centralized controller for SUs. In this paper, we investigate communication strategies for a distributed DSA network in the presence of spectrum sensing errors. Specifically, we apply a powerful machine learning tool, deep reinforcement learning (DRL), so that SUs can learn "appropriate" spectrum access strategies in a distributed fashion, assuming no knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs can make spectrum access decisions in a distributed manner, relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy helps SUs significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method, which assumes knowledge of the system statistics, and converges faster than Q-learning when the number of channels is large.

* This work was accepted by the IEEE IoT Journal, 2018
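The key algorithmic combination here is a reservoir computing network used as the Q-function approximator: a fixed random recurrent reservoir summarizes past sensing outcomes, and only a linear readout is trained with Q-learning updates. A minimal sketch of that combination follows; the sizes, hyperparameters, and class interface are illustrative assumptions, not the paper's configuration.

```python
# Sketch only: an echo state network (reservoir computing) as the Q-function
# of a deep-Q-style learner; all names and sizes are illustrative assumptions.
import numpy as np

class ESNQAgent:
    def __init__(self, n_inputs, n_actions, n_res=200, rho=0.9, lr=0.01, gamma=0.95):
        rng = np.random.default_rng(0)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        self.W = W * (rho / np.abs(np.linalg.eigvals(W)).max())  # set spectral radius
        self.W_out = np.zeros((n_actions, n_res))                # only part trained
        self.state = np.zeros(n_res)
        self.lr, self.gamma = lr, gamma

    def q_values(self, obs):
        # The reservoir keeps a fading memory of past sensing outcomes.
        self.state = np.tanh(self.W_in @ obs + self.W @ self.state)
        return self.W_out @ self.state

    def update(self, q_old, state_old, action, reward, q_new):
        # Standard Q-learning TD update applied to the linear readout only.
        td_error = reward + self.gamma * q_new.max() - q_old[action]
        self.W_out[action] += self.lr * td_error * state_old
```
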
Back injuries are the most prevalent work-related musculoskeletal disorders and represent a major cause of disability. Although innovations in wearable robots aim to alleviate this hazard, the majority of existing exoskeletons are obtrusive because their rigid linkage design limits natural movement, thus causing ergonomic risk. Moreover, these existing systems are typically suitable for only one type of movement assistance, not ubiquitous across a wide variety of activities. To fill this gap, this paper presents a new wearable robot design approach: a continuum soft exoskeleton. This spine-inspired wearable robot is unobtrusive and assists both squatting and stooping while not impeding walking motion. To tackle the challenge posed by the unique anatomy of the spine, which cannot appropriately be simplified as a single-degree-of-freedom joint, our robot conforms to human anatomy and can reduce multiple types of forces along the human spine, such as the spinae muscle force and the shear and compression forces on the lumbar vertebrae. We derived kinematics and kinetics models of this mechanism and established an analytical biomechanics model of human-robot interaction. Quantitative analysis of disc compression force, disc shear force and muscle force was performed in simulation. We further developed a virtual impedance control strategy to deliver force control and to compensate for the hysteresis of the Bowden cable transmission. The feasibility of the prototype was experimentally tested on three healthy subjects. The root mean square error of force tracking is 6.63 N (3.3% of the 200 N peak force), and the system demonstrated that it can actively control stiffness to the desired value. This continuum soft exoskeleton represents a feasible solution with the potential to reduce back pain for multiple activities and multiple forces along the human spine.

* IROS 2019
* 8 pages, 13 figures
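A virtual impedance controller of the kind described renders a programmable stiffness and damping and closes a force loop around the cable tension. The sketch below illustrates the idea under stated assumptions; the gains, the Coulomb-style hysteresis compensation term, and all names are placeholders, not the paper's controller.

```python
# A minimal sketch of virtual impedance force control with a crude
# direction-dependent hysteresis compensation for a Bowden-cable actuator.
import math

def desired_force(k_virtual, b_virtual, x, x_ref, v, v_ref):
    """Virtual impedance law: render stiffness k and damping b."""
    return k_virtual * (x_ref - x) + b_virtual * (v_ref - v)

def motor_command(f_des, f_meas, cable_vel, kp=2.0, f_coulomb=4.0):
    """Closed-loop force tracking plus a Coulomb-like friction term that
    flips sign with cable velocity, approximating cable hysteresis."""
    compensation = f_coulomb * math.copysign(1.0, cable_vel) if cable_vel else 0.0
    return f_des + kp * (f_des - f_meas) + compensation
```
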
Current studies of motor imagery based rehabilitation training systems for stroke subjects lack an appropriate analytic method that can achieve considerable classification accuracy while also detecting gradual changes in imagery patterns during the rehabilitation process and uncovering potential mechanisms of motor function recovery. In this study, we propose an adaptive boosting algorithm based on cortex plasticity and spectral band shifts. This approach models the spatial-spectral configurations, usually predetermined in EEG studies, as variable preconditions, and introduces a new stochastic gradient boosting heuristic for training base learners under these preconditions. We compare our proposed algorithm with commonly used methods on datasets collected from two months of clinical experiments. The simulation results demonstrate the effectiveness of the method in detecting variations in stroke patients' EEG patterns. By chronologically reorganizing the weight parameters of the learned additive model, we verify the spatial compensatory mechanism of the impaired cortex and detect changes in the accentuated bands in the spectral domain, which may contribute important prior knowledge for rehabilitation practice.

* 10 pages, 3 figures
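For concreteness, the following is a minimal sketch, under stated assumptions, of boosting with variable spatial-spectral preconditions: each round samples a channel subset (spatial) and a frequency sub-band (spectral), extracts band-power features, and fits a shallow regressor to the logistic-loss pseudo-residuals. The feature extraction, band ranges, learner type, and all names are illustrative, not the paper's exact design.

```python
# Sketch of the idea only: stochastic gradient boosting over randomly
# sampled spatial-spectral preconditions. Assumes >= 8 EEG channels.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def band_power(X, band, fs=250):
    # X: (trials, channels, samples). Crude band power via FFT magnitude.
    spec = np.abs(np.fft.rfft(X, axis=-1))
    freqs = np.fft.rfftfreq(X.shape[-1], 1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spec[..., mask].mean(axis=-1)              # (trials, channels)

def fit_boosted(X, y, rounds=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    F = np.zeros(len(y), dtype=float)                 # additive model output
    ensemble = []
    for _ in range(rounds):
        chans = rng.choice(X.shape[1], size=8, replace=False)  # spatial precondition
        lo = rng.uniform(4, 26)                                # spectral precondition:
        band = (lo, lo + 4.0)                                  # a 4 Hz-wide sub-band
        feats = band_power(X[:, chans, :], band)
        residual = y - 1.0 / (1.0 + np.exp(-F))       # logistic-loss pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=2).fit(feats, residual)
        F += lr * tree.predict(feats)
        ensemble.append((chans, band, tree))
    return ensemble
```
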
Training recurrent neural networks (RNNs) with backpropagation through time (BPTT) has known drawbacks, such as difficulty capturing long-term dependencies in sequences. Successful alternatives to BPTT have not yet been discovered. Recently, backpropagation with synthetic gradients produced by a decoupled neural interface module has been proposed to replace BPTT for training RNNs. On the other hand, it has been shown that the representations learned with synthetic and real gradients are different even though the resulting models are functionally identical. In this project, we explore ways of combining synthetic and real gradients, with application to neural language modeling tasks. Empirically, we demonstrate the effectiveness of alternating training with synthetic and real gradients after periodic warm restarts on language modeling tasks.

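The alternating schedule is easy to state concretely. Below is a minimal sketch that flips the gradient source at each warm restart, assuming SGDR-style periods that double after every restart; the period lengths and the flip rule are illustrative assumptions.

```python
# A minimal scheduling sketch: alternate between real (BPTT) and synthetic
# (DNI) gradients, switching at each periodic warm restart.
def gradient_source(step, first_period=100, mult=2):
    """Return 'real' or 'synthetic' for this step, flipping at every
    warm restart; restart k has length first_period * mult**k."""
    period, start, k = first_period, 0, 0
    while step >= start + period:
        start += period
        period *= mult
        k += 1
    return 'real' if k % 2 == 0 else 'synthetic'
```
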
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value indicating whether the point is outside the shape or not. By replacing conventional decoders with our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.

* Code: https://github.com/czq142857/implicit-decoder Project page: https://www.sfu.ca/~zhiqinc/imgan/Readme.html
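The decoder itself is conceptually simple: a binary classifier over (point, shape code) pairs. Here is a minimal sketch in that spirit; the layer sizes and activations are illustrative assumptions rather than the paper's exact architecture.

```python
# A minimal sketch of an implicit field decoder: an MLP that maps a 3D
# point plus a shape feature code to an inside/outside probability.
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    def __init__(self, code_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.LeakyReLU(0.02),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.02),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # P(point is inside the shape)
        )

    def forward(self, points, code):
        # points: (B, N, 3); code: (B, code_dim) broadcast to every point.
        code = code.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, code], dim=-1)).squeeze(-1)
```

A shape is then extracted by evaluating the decoder on a dense grid of points and running marching cubes on the resulting iso-surface.
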
The developments of deep neural networks (DNNs) in recent years have ushered in a new era of artificial intelligence. DNNs have proven excellent at solving very complex problems, e.g., visual recognition and text understanding, to the extent of matching or even surpassing human performance. Despite the inspiring and encouraging success of DNNs, thorough theoretical analyses are still lacking to unravel the mystery of their magic. The design of DNN structure is dominated by empirical results in terms of network depth, number of neurons, and activations. A few remarkable works published recently in an attempt to interpret DNNs have offered first glimpses of their internal mechanisms. Nevertheless, research on how DNNs operate is still at an initial stage with plenty of room for refinement. In this paper, we extend prior research on neural networks with piecewise linear activations (PLNNs) concerning bounds on the number of linear regions. We present (i) the exact maximal number of linear regions for single-layer PLNNs; (ii) an upper bound for multi-layer PLNNs; and (iii) a tighter upper bound on the maximal number of linear regions of rectifier networks. The derived bounds also indirectly explain why deep models are more powerful than their shallow counterparts, and how the non-linearity of activation functions impacts the expressiveness of networks.

* Counting linear regions of neural networks
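As a point of reference for result (i), in the special case of ReLU activations each hidden unit contributes one breakpoint hyperplane, and the classical hyperplane-arrangement count (Zaslavsky's theorem) gives the maximal number of linear regions of a single hidden layer with $n_1$ units on input dimension $n_0$; this is the style of counting such bounds build on.

```latex
% Maximal number of linear regions carved out by n_1 hyperplanes
% (one per ReLU unit) in an n_0-dimensional input space:
R(n_1, n_0) = \sum_{j=0}^{n_0} \binom{n_1}{j},
% attained when the hyperplanes are in general position.
```
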
Semantic role theory is a widely used approach for event representation. Yet, there are multiple indications that the semantic role paradigm is necessary but not sufficient to cover all elements of event structure. We conducted an analysis of semantic role representation for events to provide empirical evidence of this insufficiency. The consequence is a hybrid role-scalar approach. The results are considered preliminary in the investigation of semantic role coverage for event representation.

* 5 pages, 1 table, 1 figure
In most convolutional neural networks (CNNs), hidden layers are downsampled to increase computational efficiency and the receptive field size. This operation is commonly called pooling. Maximization and averaging over sliding windows (max/average pooling) and plain downsampling in the form of strided convolution are popular pooling methods. Since pooling is a lossy procedure, a motivation of our work is to design a new pooling approach that loses less information during dimensionality reduction. Inspired by the Fourier spectral pooling (FSP) proposed by Rippel et al. [1], we present a Hartley-transform-based spectral pooling method for CNNs. Compared with FSP, the proposed spectral pooling avoids the use of complex arithmetic for the frequency representation and reduces computation. Spectral pooling preserves more structural features useful for the network's discriminability than max and average pooling do. We empirically show that Hartley spectral pooling speeds up the convergence of training CNNs on the MNIST and CIFAR-10 datasets.

* 5 pages, 6 figures, letter
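The transform is cheap to realize with standard FFT routines. A minimal sketch follows, assuming the usual definition of the discrete Hartley transform (DHT) via the real and imaginary parts of the DFT; the exact cropping convention in the paper may differ.

```python
# Sketch of Hartley spectral pooling: crop the low-frequency block of the
# Hartley spectrum, keeping all arithmetic real-valued.
import numpy as np

def dht2(x):
    """2D discrete Hartley transform via H(x) = Re(F(x)) - Im(F(x))."""
    F = np.fft.fft2(x)
    return F.real - F.imag

def hartley_spectral_pool(x, out_h, out_w):
    """Downsample an (H, W) feature map by keeping only the low-frequency
    part of its Hartley spectrum."""
    h, w = x.shape
    H = np.fft.fftshift(dht2(x))                 # center the low frequencies
    top, left = (h - out_h) // 2, (w - out_w) // 2
    H_crop = H[top:top + out_h, left:left + out_w]
    # The DHT is involutory (applying it twice returns N * x), so the
    # inverse is the forward transform divided by the element count.
    return dht2(np.fft.ifftshift(H_crop)) / (out_h * out_w)
```
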
We present Semantic WordRank (SWR), an unsupervised method for generating an extractive summary of a single document. Built on a weighted word graph with semantic and co-occurrence edges, SWR scores sentences using an article-structure-biased PageRank algorithm with a Softplus function adjustment, and promotes topic diversity using spectral subtopic clustering under the Word Mover's Distance metric. We evaluate SWR on the DUC-02 and SummBank datasets and show that SWR produces better summaries than state-of-the-art algorithms on DUC-02 under common ROUGE measures. We then show that, under the same measures on SummBank, SWR outperforms each of the three human annotators (judges) and compares favorably with the combined performance of all judges.

* 12 pages, accepted by IDEAL2018
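For concreteness, a minimal sketch of the scoring pipeline follows: biased PageRank over a weighted word graph, a Softplus adjustment of the stationary scores, and sentence scores aggregated from word scores. The bias construction, damping factor, and normalization are illustrative assumptions.

```python
# Sketch only: article-structure-biased PageRank with a Softplus adjustment.
import numpy as np

def biased_pagerank(W, bias, d=0.85, iters=100):
    # W: (n, n) nonnegative word-graph edge weights; bias: structural prior.
    P = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)   # column-stochastic
    b = bias / bias.sum()
    r = np.full(len(b), 1.0 / len(b))
    for _ in range(iters):
        r = (1 - d) * b + d * (P @ r)                          # biased teleport
    return np.log1p(np.exp(r))                                 # Softplus adjustment

def sentence_scores(sentences, word_index, word_rank):
    # Score each sentence by the average rank of its in-vocabulary words.
    return [sum(word_rank[word_index[w]] for w in s if w in word_index)
            / max(len(s), 1) for s in sentences]
```
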
In this paper, we derive a temporal arbitrage policy for storage via reinforcement learning. Real-time price arbitrage is an important source of revenue for storage units, but designing good strategies has proven difficult because of the highly uncertain nature of prices. Instead of the current model-predictive or dynamic-programming approaches, we use reinforcement learning to design an optimal arbitrage policy. The policy is learned through repeated charge and discharge actions performed by the storage unit, which update a value matrix. We design a reward function that not only reflects the instantaneous profit of charge/discharge decisions but also incorporates historical information. Simulation results demonstrate that our designed reward function leads to significant performance improvement compared with existing algorithms.

* 2018 IEEE PES General Meeting (GM)
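A tabular sketch of such a learner is shown below: the value matrix is indexed by a discretized state of charge and price bin, with charge/idle/discharge actions. For simplicity the reward shown is plain instantaneous profit; the paper's history-aware reward is richer.

```python
# Sketch only: Q-learning over a value matrix for storage arbitrage.
import numpy as np

N_SOC, N_PRICE = 10, 20
ACTIONS = (-1, 0, +1)                          # discharge, idle, charge
Q = np.zeros((N_SOC, N_PRICE, len(ACTIONS)))   # the value matrix

def step_update(soc, p_bin, a_idx, price, next_soc, next_p_bin,
                alpha=0.1, gamma=0.99, power=1.0):
    # Instantaneous profit: pay to charge, earn to discharge.
    reward = -ACTIONS[a_idx] * power * price
    td_target = reward + gamma * Q[next_soc, next_p_bin].max()
    Q[soc, p_bin, a_idx] += alpha * (td_target - Q[soc, p_bin, a_idx])
```
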
Task selection (picking an appropriate labeling task) and worker selection (assigning the labeling task to a suitable worker) are two major challenges in task assignment for crowdsourcing. Recently, worker selection has been successfully addressed by the bandit-based task assignment (BBTA) method, while task selection has not yet been thoroughly investigated. In this paper, we experimentally compare several task selection strategies borrowed from the active learning literature, and show that the least confidence strategy significantly improves the performance of task assignment in crowdsourcing.

* arXiv admin note: substantial text overlap with arXiv:1507.05800
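The least confidence rule itself is a one-liner: among unlabeled tasks, pick the one whose currently most likely label has the lowest estimated probability. A minimal sketch, where `posterior` is an assumed tasks-by-labels matrix of label probability estimates:

```python
# Least-confidence task selection: query the task the model is least sure of.
import numpy as np

def least_confidence_pick(posterior):
    confidence = posterior.max(axis=1)     # probability of the most likely label
    return int(np.argmin(confidence))      # least confident task goes next
```
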
Visual and audio modalities are two symbiotic modalities underlying videos, containing both common and complementary information. If these can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, due to environmental interference or sensor faults, sometimes only one modality is available while the other is abandoned or missing. By recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, substantial benefit can be gained for various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks (audio-to-visual, visual-to-audio, audio-to-audio, and visual-to-visual), organized in a cycle architecture. CMCGAN has several remarkable advantages. Firstly, CMCGAN unifies visual-audio mutual generation into a common framework through a joint corresponding adversarial loss. Secondly, by introducing a latent vector with a Gaussian distribution, CMCGAN can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Thirdly, CMCGAN can be trained end-to-end for greater convenience. Benefiting from CMCGAN, we develop a dynamic multimodal classification network to handle the missing-modality problem. Extensive experiments validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, we show that the generated modality achieves effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.

* Some problems remain to be addressed
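Schematically, the cycle structure pairs cross-modal and within-modality paths and penalizes their reconstruction errors. The sketch below treats the four subnetworks as black boxes; the loss weights and the handling of the Gaussian latent vector are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch only: cycle-style reconstruction losses over the four subnetworks.
import torch
import torch.nn.functional as F

def cycle_losses(G_v2a, G_a2v, G_v2v, G_a2a, v, a, z):
    # z: latent Gaussian vector easing the dimension/structure asymmetry.
    a_from_v = G_v2a(v, z)
    v_from_a = G_a2v(a, z)
    loss_cycle = (F.l1_loss(G_a2v(a_from_v, z), v) +    # v -> a -> v
                  F.l1_loss(G_v2a(v_from_a, z), a))     # a -> v -> a
    loss_same = (F.l1_loss(G_v2v(v, z), v) +            # within-modality paths
                 F.l1_loss(G_a2a(a, z), a))
    return loss_cycle + 0.5 * loss_same                 # weights are assumptions
```
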
LIDAR is one of the most important sensors for Unmanned Ground Vehicles (UGVs). Object detection and classification based on LIDAR point clouds are key technologies for UGVs. In object detection and classification, the mutual occlusion between neighboring objects is an important factor affecting accuracy. In this paper, we consider occlusion an intrinsic property of the point cloud data and propose a novel approach that models the occlusion explicitly. The occlusion property is then taken into account in the subsequent classification step. We perform experiments on the KITTI dataset. Experimental results indicate that by utilizing the occlusion property we modeled, the classifier achieves much better performance.

Traditionally, the danger cylinder is intimately related to solution stability in the P3P problem. In this work, we show that the danger cylinder is also closely related to the multiple-solution phenomenon. More specifically, we show that when the optical center lies on the danger cylinder, the P3P problem has 3 possible solutions, namely one double solution and two other solutions; the optical center of the double solution still lies on the danger cylinder, but the optical centers of the other two solutions no longer do. As the optical center moves along the danger cylinder, the optical centers of the two other solutions of the corresponding P3P problem trace out a new surface, characterized by a polynomial equation of degree 12 in the optical center coordinates, called the Companion Surface of the Danger Cylinder (CSDC). That means the danger cylinder always has a companion surface. Regarding the significance of the CSDC, we show that when the optical center passes through it, the number of solutions of the P3P problem must change by 2. That means the CSDC acts as a delimiting surface of the P3P solution space. These new findings shed new light on the P3P multi-solution phenomenon, an important issue in PnP study.

Pulmonary lobe segmentation is an important task for pulmonary-disease-related computer-aided diagnosis (CAD) systems. Classical methods for lobe segmentation rely on successful detection of fissures and other anatomical information such as the location of blood vessels and airways. With the success of deep learning in recent years, deep convolutional neural networks (DCNNs) have been widely applied to analyze medical images such as computed tomography (CT) and magnetic resonance imaging (MRI) scans, which, however, requires a large number of ground-truth annotations. In this work, we release 50 manually labeled CT scans, randomly chosen from the LUNA16 dataset, and explore the use of deep learning on this task. We propose pre-processing the CT images by cropping the region covered by the convex hull of the lungs in order to mitigate the influence of noise from outside the lungs. Moreover, we design a hybrid loss function combining a Dice loss, to tackle the extreme class imbalance, with a focal loss, to force the model to focus on voxels that are hard to discriminate. To validate the robustness and performance of our proposed framework trained with a small number of training examples, we further tested our model on CT scans from an independent dataset. Experimental results show the robustness of the proposed approach, which consistently improves performance across different datasets by up to $5.87\%$ compared with a baseline model.

* 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)
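A minimal sketch of such a hybrid loss is shown below, combining a soft Dice term with a focal term; the class weighting, focusing parameter, and reduction are illustrative assumptions.

```python
# Sketch only: Dice loss (class imbalance) plus focal loss (hard voxels).
import torch
import torch.nn.functional as F

def hybrid_loss(logits, target, gamma=2.0, w_focal=1.0, eps=1e-6):
    # logits: (B, C, D, H, W) raw scores; target: (B, D, H, W) class indices.
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 4, 1, 2, 3).float()
    # Soft Dice over each class, averaged.
    inter = (probs * onehot).sum(dim=(0, 2, 3, 4))
    denom = probs.sum(dim=(0, 2, 3, 4)) + onehot.sum(dim=(0, 2, 3, 4))
    dice_loss = 1.0 - (2 * inter / (denom + eps)).mean()
    # Focal loss: down-weight easy voxels by (1 - p_true)^gamma.
    logp = torch.log_softmax(logits, dim=1)
    p_true = (probs * onehot).sum(dim=1)
    focal = -((1 - p_true) ** gamma * (logp * onehot).sum(dim=1)).mean()
    return dice_loss + w_focal * focal
```
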
We propose a convolutional neural network (CNN) denoising based method for seismic data interpolation. It provides a simple and efficient way around the lack of geophysical training labels often required by deep learning methods. The new method consists of two steps: (1) train a set of CNN denoisers on natural-image clean-noisy pairs to learn denoising; (2) integrate the trained CNN denoisers into a projection-onto-convex-sets (POCS) framework to perform seismic data interpolation. The method alleviates the demand for large volumes of seismic data with similar features, which end-to-end deep learning applications to seismic data interpolation typically require. Additionally, the proposed method is flexible with respect to many trace-missing patterns, because the missing patterns are not involved in the training step; the method is thus plug-and-play in nature. These properties indicate the high generalizability of our approach and reduce the need for problem-specific training. Preliminary results on synthetic and field data show promising interpolation performance of the presented CNN-POCS method in terms of signal-to-noise ratio, de-aliasing and weak-feature reconstruction, in comparison with traditional $f$-$x$ prediction filtering and curvelet-transform-based POCS methods.

* 26 pages, 7 figures, 2 tables
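The resulting plug-and-play iteration alternates a CNN denoising step with re-insertion of the observed traces. A minimal sketch follows, where `denoisers` is an assumed list of pretrained CNN denoisers ordered from strong to weak noise level, as is typical in plug-and-play schemes:

```python
# Sketch only: plug-and-play CNN-POCS iteration for seismic interpolation.
import numpy as np

def cnn_pocs_interpolate(observed, mask, denoisers, iters_each=10):
    """observed: seismic section with zeros at missing traces;
    mask: 1 where a trace was acquired, 0 where missing."""
    x = observed.copy()
    for denoise in denoisers:               # coarse-to-fine denoising strength
        for _ in range(iters_each):
            x = denoise(x)                              # regularization step
            x = mask * observed + (1 - mask) * x        # data-consistency projection
    return x
```
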
Traditionally, the P3P problem is solved by first transforming its 3 quadratic equations into a quartic one, then locating the roots of the resulting quartic equation and verifying whether each root really corresponds to a true solution of the P3P problem. However, a root of the quartic equation does not always correspond to a solution of the P3P problem. In this work, we show that when the optical center is outside all 6 toroids defined by the control point triangle, each positive root of Grunert's quartic equation must correspond to a true solution of the P3P problem; moreover, the corresponding P3P problem cannot have a unique solution, it must have either 2 or 4 positive solutions. In addition, we show that when the optical center passes through any one of the 3 toroids among these 6 toroids (except possibly at two concentric circles), the number of solutions of the corresponding P3P problem always changes by 1, either increasing by 1 or decreasing by 1. Furthermore, we show that the solutions gained or lost in this way always lie in a small neighborhood of the control points; hence these 3 toroids are critical surfaces of the P3P problem, and the 3 control points are 3 singular points of its solutions. A notable example is that when the optical center passes through the outer surface of the union of the 6 toroids from outside to inside, the number of solutions must always decrease by 1. Our results are the first to give an explicit and geometrically intuitive relationship between the P3P solutions and the roots of its quartic equation. They could act as theoretical guidance for P3P practitioners to properly arrange their control points so as to avoid undesirable solutions.

It is well known that the P3P problem can have 1, 2, 3, or at most 4 positive solutions under different configurations of its 3 control points and the position of the optical center. Since, in real applications, knowledge of the exact number of possible solutions is a prerequisite for selecting the right one among them, the study of the multiple-solution phenomenon in the P3P problem has been an active topic. In this work, we provide some new geometric interpretations of the multi-solution phenomenon in the P3P problem. Our main results include: (1) the necessary and sufficient condition for the P3P problem to have a pair of side-sharing solutions is that the two optical centers of the solutions both lie on one of the 3 planes vertical to the base plane of the control points; (2) the necessary and sufficient condition for the P3P problem to have a pair of point-sharing solutions is that the two optical centers of the solutions both lie on one of the 3 so-called skewed danger cylinders; (3) if the P3P problem has other solutions in addition to a pair of side-sharing (point-sharing) solutions, these remaining solutions must be a point-sharing (side-sharing) pair. In a sense, the side-sharing pair and the point-sharing pair are companion pairs. In sum, our results provide new insights into the nature of the multi-solution phenomenon in the P3P problem; in addition to their academic value, they could also be used as theoretical guidance for practitioners in real applications to avoid the occurrence of multiple solutions by properly arranging the control points.

In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, we use an actor ensemble (i.e., multiple actors) to search for the global maxima of the critic. Besides the ensemble perspective, we also formulate ACE in the option framework by extending the option-critic architecture with deterministic intra-option policies, revealing a relationship between ensembles and options. Furthermore, we perform a look-ahead tree search with those actors and a learned value prediction model, resulting in a refined value estimation. We demonstrate a significant performance boost of ACE over DDPG and its variants in challenging physical robot simulators.

* AAAI 2019
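At action-selection time, the ensemble idea reduces to letting the critic arbitrate among the actors' proposals. A minimal sketch under stated assumptions (network classes and shapes are placeholders):

```python
# Sketch only: pick the action, among all actors' proposals, that the
# critic scores highest, approximating a search for the global maximum of Q.
import torch

def act(actors, critic, state):
    with torch.no_grad():
        actions = [actor(state) for actor in actors]          # one proposal each
        q_vals = torch.stack([critic(state, a) for a in actions])
        best = int(torch.argmax(q_vals))
    return actions[best]
```
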