Research papers and code for "Hao Song":
Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device will face two critical problems: avoiding causing harmful interference to primary users (PUs), and conducting effective interference coordination with other secondary users. These two problems become even more challenging for a distributed DSA network where there is no centralized controllers for SUs. In this paper, we investigate communication strategies of a distributive DSA network under the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn "appropriate" spectrum access strategies in a distributed fashion assuming NO knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called the reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs could make spectrum access decisions distributedly relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method which assumes the knowledge of system statistics, and converges faster than the Q-learning method when the number of channels is large.

* This work is accepted in IEEE IoT Journal 2018
Click to Read Paper and Get Code
The technology of image segmentation is widely used in medical image processing, face recognition pedestrian detection, etc. The current image segmentation techniques include region-based segmentation, edge detection segmentation, segmentation based on clustering, segmentation based on weakly-supervised learning in CNN, etc. This paper analyzes and summarizes these algorithms of image segmentation, and compares the advantages and disadvantages of different algorithms. Finally, we make a prediction of the development trend of image segmentation with the combination of these algorithms.

Click to Read Paper and Get Code
The task of calibration is to retrospectively adjust the outputs from a machine learning model to provide better probability estimates on the target variable. While calibration has been investigated thoroughly in classification, it has not yet been well-established for regression tasks. This paper considers the problem of calibrating a probabilistic regression model to improve the estimated probability densities over the real-valued targets. We propose to calibrate a regression model through the cumulative probability density, which can be derived from calibrating a multi-class classifier. We provide three non-parametric approaches to solve the problem, two of which provide empirical estimates and the third providing smooth density estimates. The proposed approaches are experimentally evaluated to show their ability to improve the performance of regression models on the predictive likelihood.

Click to Read Paper and Get Code
We are concerned with obtaining well-calibrated output distributions from regression models. Such distributions allow us to quantify the uncertainty that the model has regarding the predicted target value. We introduce the novel concept of distribution calibration, and demonstrate its advantages over the existing definition of quantile calibration. We further propose a post-hoc approach to improving the predictions from previously trained regression models, using multi-output Gaussian Processes with a novel Beta link function. The proposed method is experimentally verified on a set of common regression models and shows improvements for both distribution-level and quantile-level calibration.

* ICML 2019, 10 pages
Click to Read Paper and Get Code
Knowledge bases (KBs) have attracted increasing attention due to its great success in various areas, such as Web and mobile search.Existing KBs are restricted to objective factual knowledge, such as city population or fruit shape, whereas,subjective knowledge, such as big city, which is commonly mentioned in Web and mobile queries, has been neglected. Subjective knowledge differs from objective knowledge in that it has no documented or observed ground truth. Instead, the truth relies on people's dominant opinion. Thus, we can use the crowdsourcing technique to get opinion from the crowd. In our work, we propose a system, called crowdsourced subjective knowledge acquisition (CoSKA),for subjective knowledge acquisition powered by crowdsourcing and existing KBs. The acquired knowledge can be used to enrich existing KBs in the subjective dimension which bridges the gap between existing objective knowledge and subjective queries.The main challenge of CoSKA is the conflict between large scale knowledge facts and limited crowdsourcing resource. To address this challenge, in this work, we define knowledge inference rules and then select the seed knowledge judiciously for crowdsourcing to maximize the inference power under the resource constraint. Our experimental results on real knowledge base and crowdsourcing platform verify the effectiveness of CoSKA system.

Click to Read Paper and Get Code
This paper presents a cost-sensitive active Question-Answering (QA) framework for learning a nine-layer And-Or graph (AOG) from web images. The AOG explicitly represents object categories, poses/viewpoints, parts, and detailed structures within the parts in a compositional hierarchy. The QA framework is designed to minimize an overall risk, which trades off the loss and query costs. The loss is defined for nodes in all layers of the AOG, including the generative loss (measuring the likelihood of the images) and the discriminative loss (measuring the fitness to human answers). The cost comprises both the human labor of answering questions and the computational cost of model learning. The cost-sensitive QA framework iteratively selects different storylines of questions to update different nodes in the AOG. Experiments showed that our method required much less human supervision (e.g., labeling parts on 3--10 training objects for each category) and achieved better performance than baseline methods.

Click to Read Paper and Get Code
Region-based Convolutional Neural Networks (R-CNNs) have achieved great success in the field of object detection. The existing R-CNNs usually divide a Region-of-Interest (ROI) into grids, and then localize objects by utilizing the spatial information reflected by the relative position of each grid in the ROI. In this paper, we propose a novel feature-encoding approach, where spatial information is represented through the spatial distributions of visual patterns. In particular, we design a Mask Weight Network (MWN) to learn a set of masks and then apply channel-wise masking operations to ROI feature map, followed by a global pooling and a cheap fully-connected layer. We integrate the newly designed feature encoder into the Faster R-CNN architecture. The resulting new Faster R-CNNs can preserve the object-detection accuracy of the standard Faster R-CNNs by using substantially fewer parameters. Compared to R-FCNs using state-of-art PS ROI pooling and deformable PS ROI pooling, the new Faster R-CNNs can produce higher object-detection accuracy with good run-time efficiency. We also show that a specifically designed and learned MWN can capture global contextual information and further improve the object-detection accuracy. Validation experiments are conducted on both PASCAL VOC and MS COCO datasets.

* 8 pages
Click to Read Paper and Get Code
In this paper, we present an object detection method that tackles the stingray detection problem based on aerial images. In this problem, the images are aerially captured on a sea-surface area by using an Unmanned Aerial Vehicle (UAV), and the stingrays swimming under (but close to) the sea surface are the target we want to detect and locate. To this end, we use a deep object detection method, faster RCNN, to train a stingray detector based on a limited training set of images. To boost the performance, we develop a new generative approach, conditional GLO, to increase the training samples of stingray, which is an extension of the Generative Latent Optimization (GLO) approach. Unlike traditional data augmentation methods that generate new data only for image classification, our proposed method that mixes foreground and background together can generate new data for an object detection task, and thus improve the training efficacy of a CNN detector. Experimental results show that satisfiable performance can be obtained by using our approach on stingray detection in aerial images.

* to appear in CVPR 2018 Workshop (CVPR 2018 Workshop and Challenge: Automated Analysis of Marine Video for Environmental Monitoring)
Click to Read Paper and Get Code
We present a robust and precise localization system that achieves centimeter-level localization accuracy in disparate city scenes. Our system adaptively uses information from complementary sensors such as GNSS, LiDAR, and IMU to achieve high localization accuracy and resilience in challenging scenes, such as urban downtown, highways, and tunnels. Rather than relying only on LiDAR intensity or 3D geometry, we make innovative use of LiDAR intensity and altitude cues to significantly improve localization system accuracy and robustness. Our GNSS RTK module utilizes the help of the multi-sensor fusion framework and achieves a better ambiguity resolution success rate. An error-state Kalman filter is applied to fuse the localization measurements from different sources with novel uncertainty estimation. We validate, in detail, the effectiveness of our approaches, achieving 5-10cm RMS accuracy and outperforming previous state-of-the-art systems. Importantly, our system, while deployed in a large autonomous driving fleet, made our vehicles fully autonomous in crowded city streets despite road construction that occurred from time to time. A dataset including more than 60 km real traffic driving in various urban roads is used to comprehensively test our system.

* 8 pages, 6 figures, 2 tables
Click to Read Paper and Get Code
This paper studies recommender systems with knowledge graphs, which can effectively address the problems of data sparsity and cold start. Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations. Though these methods have been shown quite effective, they lack good explanations, which are critical to recommender systems. In this paper, we take a different path and propose generating recommendations by finding meaningful paths from users to items. Specifically, we formulate the problem as a sequential decision process, where the target user is defined as the initial state, and the walks on the graphs are defined as actions. We shape the rewards according to existing state-of-the-art methods and then train a policy function with policy gradient methods. Experimental results on three real-world datasets show that our proposed method not only provides effective recommendations but also offers good explanations.

Click to Read Paper and Get Code
Just like many other topics in computer vision, image classification has achieved significant progress recently by using deep-learning neural networks, especially the Convolutional Neural Networks (CNN). Most of the existing works are focused on classifying very clear natural images, evidenced by the widely used image databases such as Caltech-256, PASCAL VOCs and ImageNet. However, in many real applications, the acquired images may contain certain degradations that lead to various kinds of blurring, noise, and distortions. One important and interesting problem is the effect of such degradations to the performance of CNN-based image classification. More specifically, we wonder whether image-classification performance drops with each kind of degradation, whether this drop can be avoided by including degraded images into training, and whether existing computer vision algorithms that attempt to remove such degradations can help improve the image-classification performance. In this paper, we empirically study this problem for four kinds of degraded images -- hazy images, underwater images, motion-blurred images and fish-eye images. For this study, we synthesize a large number of such degraded images by applying respective physical models to the clear natural images and collect a new hazy image dataset from the Internet. We expect this work can draw more interests from the community to study the classification of degraded images.

Click to Read Paper and Get Code
Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but very costly to compute in practice. Inspired by the recent success of neural network approaches to several graph applications, such as node or graph classification, we propose a novel neural network based approach to address this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving a good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into an embedding vector, which provides a global summary of a graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model can be trained in an end-to-end fashion, achieves better generalization on unseen graphs, and in the worst case runs in quadratic time with respect to the number of nodes in two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves smaller error rate and great time reduction compared against a series of baselines, including several approximation algorithms on GED computation, and many existing graph neural network based models. Our study suggests SimGNN provides a new direction for future research on graph similarity computation and graph similarity search.

Click to Read Paper and Get Code
Satellite-based positioning system such as GPS often suffers from large amount of noise that degrades the positioning accuracy dramatically especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals for training. The Gaussian Process (GP) regression is chosen to model the vertical Total Electron Content (vTEC) distribution of the ionosphere of the Earth. Our experiments show that the noise in the real-time GPS signals often exceeds the breakdown point of the conventional robust regression methods resulting in sub-optimal system performance. We propose a three-step approach to address this challenge. In the first step we perform a set of signal validity tests to separate the signals into clean and dirty groups. In the second step, we train an initial model on the clean signals and then reweigting the dirty signals based on the residual error. A final model is retrained on both the clean signals and the reweighted dirty signals. In the theoretical analysis, we prove that the proposed three-step approach is able to tolerate much higher noise level than the vanilla robust regression methods if two reweighting rules are followed. We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods. Our method achieves centimeter positioning accuracy in the benchmark region with probability $78.4\%$ , outperforming the second best baseline method by a margin of $8.3\%$. The benchmark takes 6 hours on 20,000 CPU cores or 14 years on a single CPU.

* accepted by SIGKDD 2019
Click to Read Paper and Get Code
Semantic segmentation is critical to image content understanding and object localization. Recent development in fully-convolutional neural network (FCN) has enabled accurate pixel-level labeling. One issue in previous works is that the FCN based method does not exploit the object boundary information to delineate segmentation details since the object boundary label is ignored in the network training. To tackle this problem, we introduce a double branch fully convolutional neural network, which separates the learning of the desirable semantic class labeling with mask-level object proposals guided by relabeled boundaries. This network, called object boundary guided FCN (OBG-FCN), is able to integrate the distinct properties of object shape and class features elegantly in a fully convolutional way with a designed masking architecture. We conduct experiments on the PASCAL VOC segmentation benchmark, and show that the end-to-end trainable OBG-FCN system offers great improvement in optimizing the target semantic segmentation quality.

* The results in the first version of this paper are mistaken due to overlapping validation data and incorrect benchmark methods
Click to Read Paper and Get Code
Events are happening in real-world and real-time, which can be planned and organized occasions involving multiple people and objects. Social media platforms publish a lot of text messages containing public events with comprehensive topics. However, mining social events is challenging due to the heterogeneous event elements in texts and explicit and implicit social network structures. In this paper, we design an event meta-schema to characterize the semantic relatedness of social events and build an event-based heterogeneous information network (HIN) integrating information from external knowledge base, and propose a novel Pair-wise Popularity Graph Convolutional Network (PP-GCN) based fine-grained social event categorization model. We propose a Knowledgeable meta-paths Instances based social Event Similarity (KIES) between events and build a weighted adjacent matrix as input to the PP-GCN model. Comprehensive experiments on real data collections are conducted to compare various social event detection and clustering tasks. Experimental results demonstrate that our proposed framework outperforms other alternative social event categorization techniques.

* Accepted by IJCAI'19(International Joint Conference on Artificial Intelligence)
Click to Read Paper and Get Code
Chromosome classification is critical for karyotyping in abnormality diagnosis. To expedite diagnosis process, we present a novel method named Varifocal-Net for simultaneous classification of chromosome's type and polarity using deep convolutional networks. The approach consists of one global-scale network (G-Net) and one local-scale network (L-Net). It follows two stages. The first stage is to learn both global and local features. We extract global features and detect finer local regions via the G-Net. With the proposed varifocal mechanism, we zoom into local parts and extract local features via the L-Net. Residual learning and multi-task learning strategies are utilized to promote high-level feature extraction. The detection of discriminative local parts is fulfilled by a localization subnet of the G-Net, whose training process involves both supervised and weekly-supervised learning. The second stage is to build two multi-layer perceptron classifiers that exploit features of both two scales to boost classification performance. Evaluation results from 1909 karyotyping cases demonstrate that our Varifocal-Net achieved the highest accuracy of 0.9805, 0.9909 and average F1-score of 0.9771, 0.9909 for the type and polarity task, respectively. It outperformed state-of-the-art methods, demonstrating the effectiveness of our Varifocal mechanism and multi-scale feature ensemble.

* 10 pages, 11 figures, 10 tables
Click to Read Paper and Get Code
We study the task of image inpainting, which is to fill in the missing region of an incomplete image with plausible contents. To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. In order to overcome the difficulty to directly learn the distribution of high-dimensional image data, we divide the task into inference and translation as two separate steps and model each step with a deep neural network. We also use simple heuristics to guide the propagation of local textures from the boundary to the hole. We show that, by using such techniques, inpainting reduces to the problem of learning two image-feature translation functions in much smaller space and hence easier to train. We evaluate our method on several public datasets and show that we generate results of better visual quality than previous state-of-the-art methods.

* ECCV 2018 camera ready
Click to Read Paper and Get Code
This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments. HyperStream is a general purpose tool that is well-suited for the design, development, and deployment of Machine Learning algorithms and predictive models in a wide space of sequential predictive problems. Source code, installation instructions, examples, and documentation can be found at: https://github.com/IRC-SPHERE/HyperStream.

Click to Read Paper and Get Code
Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate 3D properties(e.g.translation, rotation and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community - partially owing to the lack of large scale and fully-annotated 3D car database suitable for autonomous driving research. In this paper, we contribute the first large-scale database suitable for 3D car instance understanding - ApolloCar3D. The dataset contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is above 20 times larger than PASCAL3D+ and KITTI, the current state-of-the-art. To enable efficient labelling in 3D, we build a pipeline by considering 2D-3D keypoint correspondences for a single instance and 3D relationship among multiple instances. Equipped with such dataset, we build various baseline algorithms with the state-of-the-art deep convolutional neural networks. Specifically, we first segment each car with a pre-trained Mask R-CNN, and then regress towards its 3D pose and shape based on a deformable 3D car model with or without using semantic keypoints. We show that using keypoints significantly improves fitting performance. Finally, we develop a new 3D metric jointly considering 3D pose and 3D shape, allowing for comprehensive evaluation and ablation study. By comparing with human performance we suggest several future directions for further improvements.

Click to Read Paper and Get Code
We study the problem of semantic matching in product search, that is, given a customer query, retrieve all semantically related products from the catalog. Pure lexical matching via an inverted index falls short in this respect due to several factors: a) lack of understanding of hypernyms, synonyms, and antonyms, b) fragility to morphological variants (e.g. "woman" vs. "women"), and c) sensitivity to spelling errors. To address these issues, we train a deep learning model for semantic matching using customer behavior data. Much of the recent work on large-scale semantic search using deep learning focuses on ranking for web search. In contrast, semantic matching for product search presents several novel challenges, which we elucidate in this paper. We address these challenges by a) developing a new loss function that has an inbuilt threshold to differentiate between random negative examples, impressed but not purchased examples, and positive examples (purchased items), b) using average pooling in conjunction with n-grams to capture short-range linguistic patterns, c) using hashing to handle out of vocabulary tokens, and d) using a model parallel training architecture to scale across 8 GPUs. We present compelling offline results that demonstrate at least 4.7% improvement in Recall@100 and 14.5% improvement in mean average precision (MAP) over baseline state-of-the-art semantic search methods using the same tokenization method. Moreover, we present results and discuss learnings from online A/B tests which demonstrate the efficacy of our method.

* 10 pages, 7 figures, KDD 2019 (Applied Data Science Track)
Click to Read Paper and Get Code