Models, code, and papers for "Hao Zhang":
Dynamic spectrum access (DSA) is regarded as an effective and efficient technology for sharing radio spectrum among different networks. As a secondary user (SU), a DSA device faces two critical problems: avoiding harmful interference to primary users (PUs), and coordinating interference effectively with other secondary users. These two problems become even more challenging in a distributed DSA network, where there is no centralized controller for SUs. In this paper, we investigate communication strategies for a distributed DSA network in the presence of spectrum sensing errors. Specifically, we apply a powerful machine learning tool, deep reinforcement learning (DRL), so that SUs learn "appropriate" spectrum access strategies in a distributed fashion, assuming no knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called reservoir computing (RC), is used to realize DRL by exploiting the temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs can make spectrum access decisions in a distributed manner, relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help SUs significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method, which assumes knowledge of the system statistics, and converges faster than the Q-learning method when the number of channels is large.
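To make the reservoir computing idea concrete, here is a minimal sketch of an echo state network (a common form of RC) whose linear readout produces one Q-value per action. It assumes that the input is a vector of per-channel sensing outcomes (1 = busy, 0 = idle), that actions are hypothetical "access channel k / stay idle" choices, and that only the readout is trained with a temporal-difference update; the class name and all hyperparameters are illustrative, not the paper's architecture.

```python
import numpy as np


class EchoStateQNetwork:
    """Hypothetical echo state network (a reservoir computing RNN) whose
    linear readout produces one Q-value per spectrum access action."""

    def __init__(self, n_inputs, n_actions, n_reservoir=100,
                 spectral_radius=0.9, leak=0.3, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input and recurrent weights; only the readout is trained.
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = np.zeros((n_actions, n_reservoir))
        self.leak = leak
        self.state = np.zeros(n_reservoir)

    def step(self, sensing):
        """Feed one vector of per-channel sensing outcomes; return Q-values."""
        pre = self.W_in @ sensing + self.W @ self.state
        # Leaky-integrator update keeps a fading memory of past sensing outcomes.
        self.state = (1 - self.leak) * self.state + self.leak * np.tanh(pre)
        return self.W_out @ self.state

    def update_readout(self, state, action, td_target, lr=0.01):
        """Semi-gradient TD update of the readout row for the action taken."""
        td_error = td_target - self.W_out[action] @ state
        self.W_out[action] += lr * td_error * state
        return td_error
```

In a Q-learning loop, `td_target` would be the observed reward (e.g. success or collision) plus a discount factor times the maximum Q-value at the next step. Because the recurrent weights stay fixed, the temporal correlation of the sensing history is captured by the reservoir state at low training cost.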
Back injuries are the most prevalent work-related musculoskeletal disorders and a major cause of disability. Although innovations in wearable robots aim to alleviate this hazard, most existing exoskeletons are obtrusive because their rigid linkage design limits natural movement, thus causing ergonomic risk. Moreover, these systems are typically suitable for only one type of movement assistance rather than for a wide variety of activities. To fill this gap, this paper presents a new wearable robot design approach: a continuum soft exoskeleton. This spine-inspired wearable robot is unobtrusive and assists both squatting and stooping without impeding walking. To address the unique anatomy of the spine, which cannot be appropriately simplified as a single-degree-of-freedom joint, our robot conforms to human anatomy and can reduce multiple types of forces along the human spine, such as the erector spinae muscle force and the shear and compression forces on the lumbar vertebrae. We derived kinematic and kinetic models of this mechanism and established an analytical biomechanical model of human-robot interaction. Quantitative analysis of disc compression force, disc shear force, and muscle force was performed in simulation. We further developed a virtual impedance control strategy to deliver force control and compensate for the hysteresis of the Bowden cable transmission. The feasibility of the prototype was experimentally tested on three healthy subjects. The root mean square error of force tracking is 6.63 N (3.3% of the 200 N peak force), and the results demonstrate that the robot can actively control its stiffness to the desired value. This continuum soft exoskeleton represents a feasible solution with the potential to reduce back pain across multiple activities and multiple forces along the human spine.
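As a rough illustration of virtual impedance control, the sketch below computes a force command from a virtual spring-damper between desired and measured cable displacement, plus a simple direction-of-motion feedforward term as one possible way to offset Bowden cable friction. The function names, units, and the friction term are assumptions for illustration, not the controller reported in the paper; the `rmse` helper shows how a force-tracking error such as 6.63 N is computed.

```python
import math


def virtual_impedance_force(x, x_dot, x_des, x_dot_des,
                            stiffness, damping, friction_comp=0.0):
    """Force command from a virtual spring-damper between the desired and
    measured cable displacement (units assumed: m, m/s, N/m, N*s/m)."""
    force = stiffness * (x_des - x) + damping * (x_dot_des - x_dot)
    # Hypothetical Coulomb-style feedforward: push in the direction of motion
    # to offset Bowden cable friction (one simple way to reduce hysteresis).
    if x_dot != 0.0:
        force += math.copysign(friction_comp, x_dot)
    return force


def rmse(measured, desired):
    """Root mean square force-tracking error."""
    squared = [(m - d) ** 2 for m, d in zip(measured, desired)]
    return math.sqrt(sum(squared) / len(squared))
```

Changing `stiffness` at run time is what lets such a controller render a desired virtual stiffness; as a check on the reported figure, 6.63 N / 200 N ≈ 3.3%.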
Current studies of motor imagery-based rehabilitation training systems for stroke subjects lack an appropriate analytic method that can achieve considerable classification accuracy while also detecting gradual changes in imagery patterns during the rehabilitation process and uncovering potential mechanisms of motor function recovery. In this study, we propose an adaptive boosting algorithm based on cortical plasticity and spectral band shifts. This approach models the usually predetermined spatial-spectral configurations in EEG studies as variable preconditions, and introduces a new stochastic gradient boosting heuristic for training base learners under these preconditions. We compare our proposed algorithm with commonly used methods on datasets collected from 2 months of clinical experiments. The simulation results demonstrate the effectiveness of the method in detecting variations in stroke patients' EEG patterns. By chronologically reorganizing the weight parameters of the learned additive model, we verify the spatial compensatory mechanism on the impaired cortex and detect changes in the accentuated bands in the spectral domain, which may contribute important prior knowledge for rehabilitation practice.
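The general mechanism of stochastic gradient boosting over candidate preconditions can be sketched as follows. This is a generic squared-loss version with decision stumps, assuming each column of `X` is a hypothetical spatial-spectral feature (e.g. band power of one EEG channel in one band); it is not the paper's heuristic. The accumulated per-column `weights` play the role of the additive model's weight parameters that could be reorganized over time.

```python
import numpy as np


def stochastic_gradient_boost(X, y, n_rounds=50, shrinkage=0.1,
                              subsample=0.5, seed=0):
    """Squared-loss stochastic gradient boosting with decision stumps.

    Each column of X stands for one hypothetical spatial-spectral
    "precondition". Returns the stumps and an accumulated weight per column.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    F = np.zeros(n)
    stumps, weights = [], np.zeros(p)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        r = (y - F)[idx]  # negative gradient of 0.5 * (y - F)^2 on the subsample
        best = None
        for j in range(p):
            xj = X[idx, j]
            for t in np.quantile(xj, [0.25, 0.5, 0.75]):
                left = xj <= t
                if left.all() or not left.any():
                    continue
                lm, rm = r[left].mean(), r[~left].mean()
                err = np.sum((r - np.where(left, lm, rm)) ** 2)
                if best is None or err < best[0]:
                    best = (err, j, t, lm, rm)
        if best is None:
            continue
        _, j, t, lm, rm = best
        F += shrinkage * np.where(X[:, j] <= t, lm, rm)
        stumps.append((j, t, lm, rm))
        weights[j] += shrinkage * abs(lm - rm)  # contribution of this precondition
    return stumps, weights


def predict(stumps, X, shrinkage=0.1):
    """Sum the shrunken stump outputs of the additive model."""
    out = np.zeros(len(X))
    for j, t, lm, rm in stumps:
        out += shrinkage * np.where(X[:, j] <= t, lm, rm)
    return out
```

Refitting on data from successive weeks and comparing the resulting `weights` vectors is one simple way to watch which channels or bands gain importance over a rehabilitation period.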
Training recurrent neural networks (RNNs) with backpropagation through time (BPTT) has known drawbacks, such as difficulty in capturing long-term dependencies in sequences. Successful alternatives to BPTT have not yet been discovered. Recently, BP with synthetic gradients produced by a decoupled neural interface module has been proposed to replace BPTT for training RNNs. However, it has been shown that the representations learned with synthetic and real gradients differ, even though they are functionally identical. In this project, we explore ways of combining synthetic and real gradients, with application to neural language modeling tasks. Empirically, we demonstrate the effectiveness of alternating training with synthetic and real gradients after periodic warm restarts on language modeling tasks.
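One way to picture "alternating training after periodic warm restarts" is a schedule that tells the training loop both the learning rate and which gradient source to use at each step. The sketch below uses SGDR-style cosine annealing with warm restarts and a hypothetical rule that switches between synthetic and real gradients at every restart; the function name, the starting source, and the cycle lengths are assumptions.

```python
import math


def warm_restart_schedule(step, lr_max=1.0, lr_min=0.0, period=10, mult=2):
    """Cosine annealing with warm restarts (SGDR-style) plus a hypothetical
    rule that alternates the gradient source after every restart."""
    cycle, t, T = 0, step, period
    while t >= T:  # locate the current cycle and the position inside it
        t -= T
        T *= mult
        cycle += 1
    lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))
    source = "synthetic" if cycle % 2 == 0 else "real"
    return lr, source
```

For example, `warm_restart_schedule(10)` returns `(1.0, "real")`: the learning rate jumps back to its maximum at the first restart and the loop would switch from decoupled-interface gradients to full BPTT gradients.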
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value indicating whether the point is outside the shape. By replacing conventional decoders with our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.
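The decoder's interface (point coordinate plus shape code in, one inside/outside value out) can be sketched with a tiny randomly initialized MLP. Layer sizes, ReLU activations, and the function names are assumptions for illustration; see the linked repository for the actual network.

```python
import numpy as np


def init_decoder(code_dim, hidden=(64, 32), seed=0):
    """Random weights for a small MLP taking (x, y, z) plus a shape code."""
    rng = np.random.default_rng(seed)
    sizes = [3 + code_dim, *hidden, 1]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def implicit_decoder(points, shape_code, layers):
    """Map each 3D point, conditioned on one shape code, to a value in (0, 1)
    read as an inside/outside probability."""
    codes = np.broadcast_to(shape_code, (len(points), len(shape_code)))
    h = np.concatenate([points, codes], axis=1)
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers (an assumption)
    W, b = layers[-1]
    logits = (h @ W + b)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid output of a binary classifier
```

To extract a mesh, one would evaluate the decoder on a dense 3D grid and run an iso-surface extractor such as marching cubes at level 0.5; because the decoder is queried per point, the grid resolution is independent of training resolution.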
The development of deep neural networks (DNNs) in recent years has ushered in a new era of artificial intelligence. DNNs have proven excellent at solving very complex problems, e.g., visual recognition and text understanding, to the extent of competing with or even surpassing humans. Despite the inspiring and encouraging success of DNNs, thorough theoretical analyses that unravel the mystery of their effectiveness are still lacking. The design of DNN structures, in terms of network depth, number of neurons, and activations, is dominated by empirical results. A few remarkable works published recently in an attempt to interpret DNNs have provided the first glimpses of their internal mechanisms. Nevertheless, research on how DNNs operate is still at an early stage, with plenty of room for refinement. In this paper, we extend previous research on neural networks with piecewise linear activations (PLNNs) concerning bounds on the number of linear regions. We present (i) the exact maximal number of linear regions for single-layer PLNNs; (ii) an upper bound for multi-layer PLNNs; and (iii) a tighter upper bound on the maximal number of linear regions for rectifier networks. The derived bounds also indirectly explain why deep models are more powerful than their shallow counterparts, and how the non-linearity of activation functions impacts the expressiveness of networks.
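For intuition about what "maximal number of linear regions" means, the classical special case of a single ReLU layer can be computed directly: each hidden unit switches on or off across one hyperplane, and hyperplanes in general position partition the input space into a known number of regions (Zaslavsky's count). This sketch covers only that ReLU case and is not necessarily the formula derived in the paper for general piecewise linear activations.

```python
from math import comb


def max_regions_single_relu_layer(n_units, n_inputs):
    """Maximal number of linear regions of x -> ReLU(Wx + b) with n_units
    hidden units on R^n_inputs: each unit contributes one hyperplane, and
    hyperplanes in general position cut R^d into sum_{i<=d} C(n, i) pieces."""
    return sum(comb(n_units, i) for i in range(n_inputs + 1))
```

For example, three units on a 2D input give 1 + 3 + 3 = 7 regions, the familiar count for three lines in general position; once the input dimension reaches the number of units, the count saturates at 2^n.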
Semantic role theory is a widely used approach for event representation. Yet, there are multiple indications that the semantic role paradigm is necessary but not sufficient to cover all elements of event structure. We conducted an analysis of semantic role representation for events to provide empirical evidence of this insufficiency. This analysis leads to a hybrid role-scalar approach. The results should be considered preliminary in the investigation of semantic role coverage for event representation.
In most convolutional neural networks (CNNs), hidden layers are downsampled to increase computational efficiency and the receptive field size. Such an operation is commonly called pooling. Taking the maximum or the average over sliding windows (max/average pooling) and plain downsampling in the form of strided convolution are popular pooling methods. Since pooling is a lossy procedure, our work is motivated by the goal of designing a new pooling approach that is less lossy in the dimensionality reduction. Inspired by the Fourier spectral pooling (FSP) proposed by Rippel et al., we present a Hartley transform-based spectral pooling method for CNNs. Compared with FSP, the proposed spectral pooling avoids the use of complex arithmetic for frequency representation and reduces computation. Spectral pooling preserves more structural features for the network's discriminability than max and average pooling. We empirically show that Hartley spectral pooling aids the convergence of training CNNs on the MNIST and CIFAR-10 datasets.
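The core of Hartley spectral pooling can be sketched on a single 2D feature map: transform with the discrete Hartley transform (DHT), keep a centered block of low-frequency coefficients, and transform back. For brevity this sketch obtains the DHT from NumPy's FFT (Re(F) - Im(F)), so unlike a dedicated real-valued DHT it does not actually avoid complex arithmetic; the function names and the DC-preserving rescaling are assumptions.

```python
import numpy as np


def dht2(x):
    """2D discrete Hartley transform with the cas(2*pi*(u*m/M + v*n/N)) kernel.
    Computed from the FFT for brevity; a dedicated real-valued DHT routine
    would avoid complex arithmetic entirely."""
    F = np.fft.fft2(x)
    return F.real - F.imag


def idht2(H):
    """The DHT is its own inverse up to a 1/(M*N) factor."""
    return dht2(H) / H.size


def hartley_pool(x, out_shape):
    """Spectral pooling: keep the centred low-frequency Hartley coefficients."""
    M, N = x.shape
    h, w = out_shape
    H = np.fft.fftshift(dht2(x))  # move the zero frequency to the centre
    r0, c0 = M // 2 - h // 2, N // 2 - w // 2
    cropped = np.fft.ifftshift(H[r0:r0 + h, c0:c0 + w])
    # Rescale so that the mean (DC) value of the map is preserved.
    return idht2(cropped) * (h * w) / (M * N)
```

Because the Hartley spectrum of a real map is itself real, the cropped coefficients can be stored and inverted without any complex-valued tensors, which is the practical advantage over Fourier spectral pooling.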
We present Semantic WordRank (SWR), an unsupervised method for generating an extractive summary of a single document. Built on a weighted word graph with semantic and co-occurrence edges, SWR scores sentences using an article-structure-biased PageRank algorithm with a Softplus function adjustment, and promotes topic diversity using spectral subtopic clustering under the Word Mover's Distance metric. We evaluate SWR on the DUC-02 and SummBank datasets and show that SWR produces better summaries than the state-of-the-art algorithms on DUC-02 under common ROUGE measures. We then show that, under the same measures on SummBank, SWR outperforms each of the three human annotators (a.k.a. judges) and compares favorably with the combined performance of all judges.
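The "biased PageRank" step can be illustrated with a generic power iteration on a weighted word graph whose teleport vector is non-uniform. Here `bias` is a stand-in for a hypothetical article-structure prior (for instance, favoring words from early sentences); the Softplus adjustment and the subtopic clustering are not shown, and the function name is illustrative.

```python
import numpy as np


def biased_pagerank(W, bias, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank with a non-uniform teleport vector.

    W[i, j] is the (symmetric) edge weight between words i and j; every
    word is assumed to have at least one edge. `bias` stands in for a
    hypothetical article-structure prior, e.g. favouring early sentences."""
    P = W / W.sum(axis=0, keepdims=True)  # column-stochastic transitions
    b = np.asarray(bias, dtype=float)
    b = b / b.sum()
    r = np.full(len(W), 1.0 / len(W))
    for _ in range(max_iter):
        r_new = (1 - damping) * b + damping * P @ r
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

Sentence scores could then be built from the scores of the words they contain; shifting mass in `bias` toward certain words raises their scores and, through the graph, those of their neighbors.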
In this paper, we derive a temporal arbitrage policy for storage via reinforcement learning. Real-time price arbitrage is an important source of revenue for storage units, but designing good strategies has proven difficult because of the highly uncertain nature of prices. Instead of current model predictive control or dynamic programming approaches, we use reinforcement learning to design an optimal arbitrage policy. This policy is learned through repeated charge and discharge actions performed by the storage unit, which update a value matrix. We design a reward function that not only reflects the instant profit of charge/discharge decisions but also incorporates historical information. Simulation results demonstrate that our designed reward function leads to significant performance improvement compared with existing algorithms.
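A value matrix learned from repeated charge/discharge actions corresponds naturally to tabular Q-learning. The sketch below assumes a hypothetical setup: the state is (state-of-charge level, price quantile bin), the actions are discharge/idle/charge one unit, and the reward adds to the instant profit a history term comparing the current price with its recent moving average. The reward design here is an illustrative assumption, not the paper's reward function.

```python
import numpy as np


def train_arbitrage_q(prices, n_soc=5, n_bins=4, alpha=0.1, gamma=0.95,
                      eps=0.1, beta=0.5, window=24, episodes=20, seed=0):
    """Tabular Q-learning for a storage unit (hypothetical setup).

    State: (state-of-charge level, price quantile bin); actions:
    0 = discharge, 1 = idle, 2 = charge one energy unit.
    """
    rng = np.random.default_rng(seed)
    edges = np.quantile(prices, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(prices, edges)  # bin index in 0..n_bins-1
    Q = np.zeros((n_soc, n_bins, 3))  # the "value matrix"
    for _ in range(episodes):
        soc = n_soc // 2
        for t in range(len(prices) - 1):
            s = (soc, bins[t])
            explore = rng.random() < eps
            a = int(rng.integers(3)) if explore else int(np.argmax(Q[s]))
            delta = a - 1
            if not 0 <= soc + delta < n_soc:
                delta = 0  # infeasible action is treated as idle
            # Instant profit plus a history term comparing the current price
            # with its recent moving average (one possible design).
            ref = prices[max(0, t - window):t + 1].mean()
            reward = -prices[t] * delta + beta * (ref - prices[t]) * delta
            soc += delta
            s_next = (soc, bins[t + 1])
            Q[s + (a,)] += alpha * (reward + gamma * Q[s_next].max() - Q[s + (a,)])
    return Q, edges
```

After training, the greedy policy `np.argmax(Q[soc, price_bin])` gives the action for each state; the history term rewards charging below, and discharging above, the recent average price even when the instant profit alone is ambiguous.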
Task selection (picking an appropriate labeling task) and worker selection (assigning the labeling task to a suitable worker) are two major challenges in task assignment for crowdsourcing. Recently, worker selection has been successfully addressed by the bandit-based task assignment (BBTA) method, while task selection has not yet been thoroughly investigated. In this paper, we experimentally compare several task selection strategies borrowed from the active learning literature, and show that the least confidence strategy significantly improves the performance of task assignment in crowdsourcing.
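The least confidence strategy itself is simple: pick the task whose most probable label has the lowest estimated probability. In the sketch below the label posteriors are stand-ins computed from Laplace-smoothed vote fractions; a BBTA-style system would supply its own estimates, and both function names are illustrative.

```python
import numpy as np


def vote_posteriors(votes, n_labels):
    """Stand-in label posteriors: Laplace-smoothed vote fractions per task."""
    post = np.ones((len(votes), n_labels))
    for i, task_votes in enumerate(votes):
        for label in task_votes:
            post[i, label] += 1
    return post / post.sum(axis=1, keepdims=True)


def least_confident_task(posteriors):
    """Least confidence: pick the task whose most likely label is least probable."""
    return int(np.argmin(posteriors.max(axis=1)))
```

With votes `[[0, 0, 0], [0, 1], [1, 1, 0]]` on a binary task, the smoothed confidences are 0.8, 0.5 and 0.6, so task 1 (the tied one) is selected for the next label.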
Recommender systems have achieved great commercial success. Recommendation is widely used in areas such as e-commerce, online music FM services, and online news portals. However, several problems related to the structure of the input data pose serious challenges to recommender system performance. Two of these problems are the Matthew effect and the sparsity problem. The Matthew effect heavily skews recommender system output toward popular items. The data sparsity problem directly affects the coverage of recommendation results. Collaborative filtering is a simple benchmark ubiquitously adopted in industry as the baseline for recommender system design. Understanding the underlying mechanism of collaborative filtering is crucial for further optimization. In this paper, we perform a thorough quantitative analysis of the Matthew effect and the sparsity problem in the particular context of collaborative filtering. We compare the underlying mechanisms of user-based and item-based collaborative filtering and offer insights to industrial recommender system builders.
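To make the user-based versus item-based comparison concrete, here is a minimal cosine-similarity collaborative filtering sketch on a small rating matrix where 0 marks a missing rating. It illustrates the textbook neighborhood formulation, not the specific variants analyzed in the paper; note that user-based CF is the same computation on the transposed matrix.

```python
import numpy as np


def item_based_predict(R, user, item):
    """Predict R[user, item] as a cosine-similarity-weighted average of the
    user's ratings on other items (0 marks a missing rating)."""
    norms = np.linalg.norm(R, axis=0)
    sims = R.T @ R[:, item] / (norms * norms[item] + 1e-12)
    rated = R[user] > 0
    rated[item] = False
    w = sims[rated]
    return float(w @ R[user, rated] / w.sum()) if w.sum() > 0 else 0.0


def user_based_predict(R, user, item):
    """User-based CF is the same computation on the transposed matrix."""
    return item_based_predict(R.T, item, user)
```

The symmetry makes the structural difference visible: item-based prediction averages the target user's own ratings, while user-based prediction averages other users' ratings of the target item. This is why popular, densely rated items (Matthew effect) and sparse rows or columns affect the two variants differently.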
Visual and audio modalities are two symbiotic modalities underlying videos, which contain both common and complementary information. If this information can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, due to environmental interference or sensor faults, sometimes only one modality exists while the other is abandoned or missing. Recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, can greatly benefit various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks (audio-to-visual, visual-to-audio, audio-to-audio, and visual-to-visual), which are organized in a cycle architecture. CMCGAN has several notable advantages. First, CMCGAN unifies visual-audio mutual generation into a common framework through a joint corresponding adversarial loss. Second, by introducing a latent vector with a Gaussian distribution, CMCGAN can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Third, CMCGAN can conveniently be trained end-to-end. Building on CMCGAN, we develop a dynamic multimodal classification network to handle the missing-modality problem. Extensive experiments validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality is shown to achieve effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.
Conventional high-level sensing techniques require high-fidelity images as input to extract target features, and these images are produced by either complex imaging hardware or high-complexity reconstruction algorithms. In this letter, we propose single-pixel sensing (SPS), which performs high-level sensing directly from the coupled measurements of a single-pixel detector, without the conventional image acquisition and reconstruction process. The technique consists of three steps: binary light modulation that can be physically implemented at $\sim$22 kHz, single-pixel coupled detection with a wide working spectrum and a high signal-to-noise ratio, and end-to-end deep-learning-based sensing that reduces both hardware and software complexity. In addition, the binary modulation is trained and optimized together with the sensing network, which minimizes the number of required measurements and optimizes sensing accuracy. The effectiveness of SPS is demonstrated on the classification task of the handwritten MNIST dataset, where 96.68% classification accuracy is achieved at $\sim$1 kHz. The reported single-pixel sensing technique is a novel framework for highly efficient machine intelligence.
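The measurement model behind single-pixel sensing can be written in a few lines: each binary pattern modulates the scene, and the detector records one number, the total transmitted light. The sketch assumes 28×28 MNIST-sized scenes and binarizes real-valued pattern weights by sign; in joint training, gradients would typically pass through the binarization via a straight-through estimator, which is not shown. Function names are illustrative.

```python
import numpy as np


def binarize_patterns(weights):
    """Map real-valued pattern weights to on/off (1/0) modulator states.
    During joint training a straight-through estimator would typically pass
    gradients through this step; that part is not shown."""
    return (weights > 0).astype(float)


def single_pixel_measure(image, patterns):
    """Each measurement is the total light collected by one detector when the
    scene is modulated by one binary pattern: y_k = sum(P_k * image)."""
    return patterns.reshape(len(patterns), -1) @ image.ravel()
```

The resulting vector of `len(patterns)` measurements, rather than a reconstructed image, is what the sensing network would classify, so fewer patterns translate directly into a higher classification rate.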
Stochastic variance-reduced gradient (SVRG) is a classical optimization method. Although it is theoretically proven to have better convergence performance than stochastic gradient descent (SGD), the generalization performance of SVRG remains an open question. In this paper, we investigate the effects of two training techniques, mini-batching and learning rate decay, on the generalization performance of SVRG, and verify the generalization performance of Batch-SVRG (B-SVRG). Regarding the relationship between optimization and generalization, we believe that the average norm of the gradients on individual training samples, as well as the norm of the average gradient, indicates how flat the landscape is and how well the model generalizes. Based on empirical observations of these metrics, we perform a sign switch on B-SVRG and derive a practical algorithm, BatchPlus-SVRG (BP-SVRG), which is numerically shown to enjoy better generalization performance than B-SVRG, and even than SGD in some deep neural network scenarios.
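For reference, plain SVRG (not the B-SVRG or BP-SVRG variants discussed above) alternates between computing a full gradient at a snapshot and taking cheap stochastic steps whose noise is corrected by that snapshot. The sketch applies it to least squares; the step size, epoch count, and function name are illustrative assumptions.

```python
import numpy as np


def svrg_least_squares(A, b, lr=0.01, epochs=30, inner=None, seed=0):
    """Plain SVRG on f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner or 2 * n
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = A.T @ (A @ w_snap - b) / n  # full gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(n)
            g_i = A[i] * (A[i] @ w - b[i])
            g_snap = A[i] * (A[i] @ w_snap - b[i])
            w -= lr * (g_i - g_snap + mu)  # unbiased, variance-reduced step
    return w
```

The correction `g_i - g_snap + mu` has the same expectation as the true gradient, but its variance shrinks as `w` approaches the snapshot, which is what allows a constant step size; mini-batching replaces the single index `i` with a batch of indices.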
In recent years, data-driven methods have been used to learn dynamical systems and partial differential equations (PDEs). However, major challenges remain to be resolved, including learning PDEs from noisy and limited discrete data. To overcome these challenges, in this work, a deep-learning-based data-driven method, called DL-PDE, is developed to discover the governing PDEs of underlying physical processes. The DL-PDE method combines deep learning via neural networks with data-driven discovery of PDEs via sparse regressions, such as the least absolute shrinkage and selection operator (Lasso) and sequential threshold ridge regression (STRidge). In this method, derivatives are calculated by automatic differentiation of the deep neural network, and the equation form and coefficients are obtained with sparse regressions. DL-PDE is tested on physical processes governed by the groundwater flow equation, the contaminant transport equation, the Burgers equation, and the Korteweg-de Vries (KdV) equation, both as a proof of concept and for applications in real-world engineering settings. The proposed DL-PDE achieves satisfactory results when data are discrete and noisy.
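The sparse-regression step can be sketched with a basic sequential threshold ridge regression: solve a ridge problem, zero out coefficients below a tolerance, and refit on the surviving library terms. The sketch assumes the library matrix `Theta` (candidate terms such as u, u_x, u*u_x, u_xx) and the time derivative `y` have already been evaluated; in DL-PDE these would come from automatic differentiation of the trained network, which is not shown. Parameter values are illustrative.

```python
import numpy as np


def stridge(Theta, y, lam=1e-5, tol=0.05, iters=10):
    """Sequential threshold ridge regression for y ~ Theta @ xi with sparse xi."""
    p = Theta.shape[1]
    xi = np.linalg.solve(Theta.T @ Theta + lam * np.eye(p), Theta.T @ y)
    for _ in range(iters):
        small = np.abs(xi) < tol
        xi[small] = 0.0  # prune negligible library terms
        big = ~small
        if not big.any():
            break
        A = Theta[:, big]
        # Refit ridge regression on the remaining candidate terms only.
        xi[big] = np.linalg.solve(A.T @ A + lam * np.eye(big.sum()), A.T @ y)
    return xi
```

The nonzero entries of the returned `xi` select the terms of the discovered equation and give their coefficients; `tol` controls the trade-off between parsimony and fit, which matters most when the derivative estimates are noisy.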
Pulmonary nodule detection, false positive reduction, and segmentation are three of the most common tasks in the computer-aided analysis of chest CT images. Methods have been proposed for each task, with deep learning-based methods heavily favored recently. However, training deep learning models to solve each task separately may be suboptimal: it is resource-intensive and forgoes the benefit of feature sharing. Here, we propose a new end-to-end 3D deep convolutional neural network (DCNN), called NoduleNet, to solve nodule detection, false positive reduction, and nodule segmentation jointly in a multi-task fashion. To avoid friction between different tasks and encourage feature diversification, we incorporate two major design tricks: 1) decoupled feature maps for nodule detection and false positive reduction, and 2) a segmentation refinement subnet for increasing the precision of nodule segmentation. Extensive experiments on the large-scale LIDC dataset demonstrate that multi-task training is highly beneficial, improving nodule detection accuracy by 10.27% compared to the baseline model trained only to solve the nodule detection task. We also carry out systematic ablation studies to highlight the contribution of each added component. Code is available at https://github.com/uci-cbcl/NoduleNet.
LIDAR is one of the most important sensors for Unmanned Ground Vehicles (UGVs). Object detection and classification based on LIDAR point clouds is a key technology for UGVs. In object detection and classification, mutual occlusion between neighboring objects is an important factor affecting accuracy. In this paper, we consider occlusion as an intrinsic property of point cloud data. We propose a novel approach that explicitly models the occlusion. The occlusion property is then taken into account in the subsequent classification step. We perform experiments on the KITTI dataset. Experimental results indicate that by utilizing the modeled occlusion property, the classifier obtains much better performance.
Traditionally, the danger cylinder is known to be intimately related to solution stability in the P3P problem. In this work, we show that the danger cylinder is also closely related to the multiple-solution phenomenon. More specifically, we show that when the optical center lies on the danger cylinder, of the three possible P3P solutions (one double solution and two other solutions), the optical center of the double solution still lies on the danger cylinder, but the optical centers of the other two solutions no longer do. Moreover, as the optical center moves on the danger cylinder, the optical centers of the two other solutions of the corresponding P3P problem form a new surface, characterized by a polynomial equation of degree 12 in the optical center coordinates, called the Companion Surface of Danger Cylinder (CSDC). This means that the danger cylinder always has a companion surface. Regarding the significance of the CSDC, we show that when the optical center passes through the CSDC, the number of solutions of the P3P problem must change by 2. This means that the CSDC acts as a delimiting surface of the P3P solution space. These new findings shed new light on the P3P multi-solution phenomenon, an important issue in PnP studies.