Research papers and code for "Xia Liang":
Most existing neural network models for music generation explore how to generate music bars and then directly splice the bars into a song. However, these methods do not model the relationship between bars, so the assembled song as a whole lacks musical form and a sense of musical direction. To address this issue, we propose a Multi-model Multi-task Hierarchical Conditional VAE-GAN (Variational Autoencoder - Generative Adversarial Network), named MIDI-Sandwich, which incorporates musical knowledge such as musical form, tonic, and melodic motion. MIDI-Sandwich has two submodels: a Hierarchical Conditional Variational Autoencoder (HCVAE) and a Hierarchical Conditional Generative Adversarial Network (HCGAN). The HCVAE uses a hierarchical structure: its lower layer, a Local Conditional Variational Autoencoder (L-CVAE), generates a music bar conditioned on pre-specified First and Last Notes (FLN); its upper layer, a Global Variational Autoencoder (G-VAE), analyzes the latent vector sequence produced by the L-CVAE encoder to capture the musical relationship between bars and assembles the song from the bars generated by the L-CVAE decoder, giving the song both musical structure and a sense of direction. At the same time, the HCVAE shares part of its network with the HCGAN to further improve the quality of the generated music. MIDI-Sandwich is validated on the Nottingham dataset and generates single-track melody sequences of 17x8 beats, longer than those produced by most existing generative models (8 to 32 beats). The quality of the generated music is also evaluated by following the experimental methods of several classical works in the literature. These experiments demonstrate the validity of the model.

* Submitted to KSEM 2019 on May 3, 2019 (weakly rejected)
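The L-CVAE described above generates each bar conditioned on pre-specified first and last notes. As a rough illustration of that conditioning idea only, here is a minimal conditional-VAE sketch in PyTorch; the flattened piano-roll bar encoding, the FLN one-hot condition, and all layer sizes are assumptions for illustration, not the authors' L-CVAE.

```python
# Minimal conditional VAE sketch (not the authors' L-CVAE): a bar is a flattened
# piano-roll vector and the condition is a first/last-note (FLN) one-hot pair.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, bar_dim=8 * 128, cond_dim=2 * 128, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(bar_dim + cond_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)
        self.logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, bar_dim), nn.Sigmoid(),
        )

    def forward(self, bar, fln):
        h = self.encoder(torch.cat([bar, fln], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decoder(torch.cat([z, fln], dim=-1))
        # ELBO = reconstruction term + KL divergence to the standard normal prior
        recon_loss = nn.functional.binary_cross_entropy(recon, bar, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, recon_loss + kl

model = ConditionalVAE()
bar = torch.rand(4, 8 * 128).round()                      # toy binary piano-roll bars
fln = torch.zeros(4, 2 * 128); fln[:, 0] = 1; fln[:, 128 + 5] = 1  # toy first/last notes
recon, loss = model(bar, fln)
```

In the hierarchical setup described in the abstract, a second, global VAE would then model the sequence of per-bar latent vectors z.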
Quantitative structure-activity relationship (QSAR) modelling is an effective 'bridge' for finding reliable relationships between bioactivity and molecular structure. A QSAR classification model typically contains a large number of redundant, noisy, and irrelevant descriptors. To address this problem, various methods have been proposed for descriptor selection. Generally, they can be grouped into three categories: filters, wrappers, and embedded methods. Regularization is an important embedded technique that performs continuous shrinkage and automatic descriptor selection. In recent years, researchers have shown growing interest in applying regularization techniques to descriptor selection, for example logistic regression (LR) with an $L_1$ penalty. In this paper, we propose a novel descriptor selection method based on self-paced learning (SPL) with Logsum-penalized LR for predicting the bioactivity of molecular structures. SPL, inspired by the learning process of humans and animals, gradually adds samples to training from easy (smaller loss) to hard (larger loss), while the Logsum regularization selects a small set of meaningful and significant molecular descriptors. Experimental results on simulated data and three public QSAR datasets show that the proposed SPL-Logsum method outperforms other commonly used sparse methods in terms of classification performance and model interpretability.

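The self-paced learning scheme above alternates between fitting the classifier on the currently "easy" samples and gradually admitting harder ones. The sketch below illustrates that loop, with an L1-penalized logistic regression from scikit-learn standing in for the paper's Logsum penalty; the loss threshold and its growth schedule are assumed values.

```python
# Self-paced learning loop sketch: an L1-penalized logistic regression stands in
# for the paper's Logsum-penalized model; the threshold schedule is assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)

lam = 0.7                                   # loss threshold (the "age" parameter)
selected = np.ones(len(y), dtype=bool)      # start by fitting on all samples
for _ in range(10):
    model.fit(X[selected], y[selected])
    proba = np.clip(model.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    loss = -(y * np.log(proba) + (1 - y) * np.log(1 - proba))   # per-sample logistic loss
    easy = loss < lam                       # keep only the currently "easy" samples
    if easy.sum() < 2 or len(np.unique(y[easy])) < 2:
        break                               # guard: need both classes to refit
    selected = easy
    lam *= 1.3                              # relax the threshold so harder samples enter

print("selected descriptors:", int(np.sum(model.coef_ != 0)))
```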
With the rapid progress of China's urbanization, research on the automatic detection of land-use patterns in Chinese cities is of substantial importance. Deep learning is an effective method for extracting image features. To take advantage of deep learning in detecting urban land-use patterns, we applied a transfer-learning-based remote-sensing image approach to extract and classify features, built with the Google TensorFlow convolutional neural network (CNN) framework. First, the transferred model was pre-trained on ImageNet, one of the largest object-image data sets, to fully develop its ability to generate feature vectors for standard remote-sensing land-cover data sets (UC Merced and WHU-SIRI). Then, a random-forest-based classifier was constructed and trained on these generated vectors to classify the actual urban land-use pattern at the scale of traffic analysis zones (TAZs). To avoid the multi-scale effect of remote-sensing imagery, a large random patch (LRP) method was used. The proposed method efficiently obtained acceptable accuracy (OA = 0.794, Kappa = 0.737) for the study area. In addition, the results show that the proposed method can effectively overcome the multi-scale effect that occurs in urban land-use classification at the irregular land-parcel level. The proposed method can help planners monitor dynamic urban land use and evaluate the impact of urban-planning schemes.

* 8 pages, 8 figures, 2 tables
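The pipeline above uses an ImageNet-pre-trained CNN purely as a feature extractor and trains a random forest on the resulting vectors. A minimal sketch of that two-stage idea follows, assuming Keras' ResNet50 as the backbone and image patches already loaded as arrays; it is not the authors' exact configuration or the LRP sampling step.

```python
# Two-stage sketch: ImageNet-pre-trained CNN as a fixed feature extractor,
# followed by a random-forest classifier on the pooled feature vectors.
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from sklearn.ensemble import RandomForestClassifier

backbone = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(patches):
    """patches: float array of shape (n, 224, 224, 3) with pixel values in [0, 255]."""
    return backbone.predict(preprocess_input(patches.copy()), verbose=0)

# placeholder patches and land-use labels, standing in for TAZ imagery
X_train = np.random.rand(8, 224, 224, 3) * 255
y_train = np.random.randint(0, 3, size=8)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(extract_features(X_train), y_train)
```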
The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban floods and managing their risk. A solution is to develop effective approaches for reconstructing high-resolution DEMs from their low-resolution equivalents, which are more widely available. However, current high-resolution DEM reconstruction approaches mainly focus on natural topography; few attempts have been made for urban topography, which is typically an integration of complex man-made and natural features. This study proposes a novel multi-scale mapping approach based on convolutional neural networks (CNNs) to deal with the complex characteristics of urban topography and reconstruct high-resolution urban DEMs. The proposed multi-scale CNN model is first trained using urban DEMs that contain topographic features at different resolutions, and then used to reconstruct an urban DEM at a specified (high) resolution from a low-resolution equivalent. A two-level accuracy assessment approach is also designed to evaluate the performance of the proposed urban DEM reconstruction method in terms of numerical accuracy and morphological accuracy. The proposed DEM reconstruction approach is applied to a 121 km² urbanized area in London, UK. Compared with other commonly used methods, the CNN-based approach produces superior results, providing a cost-effective and innovative way to acquire high-resolution DEMs in other data-scarce environments.

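A DEM-reconstruction CNN of the kind described above can be viewed as a single-channel super-resolution network that maps a coarse elevation grid to a finer one. The PyTorch model below is only a schematic stand-in: the layer counts, the 4x upscaling factor, and the L1 training loss are assumptions, not the paper's multi-scale architecture.

```python
# Schematic single-channel super-resolution CNN for elevation grids:
# feature extraction, sub-pixel (PixelShuffle) upsampling, and reconstruction.
import torch
import torch.nn as nn

class DemSRNet(nn.Module):
    def __init__(self, scale=4, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                 # rearranges channels into a finer grid
            nn.Conv2d(channels, 1, 3, padding=1),   # back to a single elevation channel
        )

    def forward(self, low_res_dem):
        return self.body(low_res_dem)

model = DemSRNet(scale=4)
coarse = torch.randn(1, 1, 64, 64)       # e.g. a 64x64 tile of a coarse DEM
fine = model(coarse)                     # -> (1, 1, 256, 256)
loss = nn.functional.l1_loss(fine, torch.randn_like(fine))  # train against the true fine DEM
```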
In this paper, we present a fast yet effective method for pixel-level scale-invariant image fusion in the spatial domain based on scale-space theory. Specifically, we propose a scale-invariant structure saliency selection scheme based on the difference-of-Gaussian (DoG) pyramid of the input images to build the weight (activity) map. Thanks to this scale-invariant saliency selection, our method preserves both the details of small objects and the integrity of large objects in the fused image. In addition, the method is very efficient, involves no complex operations, and is easy to implement, so it can be used for fast fusion of high-resolution images. Experimental results demonstrate that the proposed method yields competitive or even better results than state-of-the-art image fusion methods, in terms of both visual quality and objective evaluation metrics. Furthermore, the proposed method is fast enough to fuse high-resolution images in real time. Code is available at https://github.com/yiqingmy/Fusion.

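The core of the scheme above is using difference-of-Gaussian responses as a per-pixel activity (saliency) measure and fusing with the resulting weights. The NumPy/OpenCV sketch below illustrates that idea for two pre-registered grayscale images; the blur scales and the weight normalization are assumptions rather than the paper's exact formulation (see the released code at the URL above for the real implementation).

```python
# DoG-based activity weighting sketch for fusing two registered grayscale images.
import cv2
import numpy as np

def dog_saliency(img, sigmas=(1.0, 2.0, 4.0)):
    """Sum of absolute difference-of-Gaussian responses across a few scales."""
    img = img.astype(np.float32)
    blurred = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]
    dogs = [np.abs(blurred[i] - blurred[i + 1]) for i in range(len(blurred) - 1)]
    return np.sum(dogs, axis=0)

def fuse(img_a, img_b, eps=1e-6):
    sal_a, sal_b = dog_saliency(img_a), dog_saliency(img_b)
    w_a = (sal_a + eps) / (sal_a + sal_b + 2 * eps)   # per-pixel weight in [0, 1]
    return w_a * img_a.astype(np.float32) + (1.0 - w_a) * img_b.astype(np.float32)

# a = cv2.imread("source1.png", cv2.IMREAD_GRAYSCALE)
# b = cv2.imread("source2.png", cv2.IMREAD_GRAYSCALE)
# cv2.imwrite("fused.png", np.clip(fuse(a, b), 0, 255).astype(np.uint8))
```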
Parsing articulated objects, e.g. humans and animals, into semantic parts (e.g. body, head, and arms) from natural images is a challenging and fundamental problem in computer vision. A major difficulty is the large variability of scale and location of objects and their corresponding parts: even limited mistakes in estimating scale and location degrade the parsing output and cause errors in boundary details. To tackle these difficulties, we propose a "Hierarchical Auto-Zoom Net" (HAZN) for object part parsing that adapts to the local scales of objects and parts. HAZN is a sequence of two "Auto-Zoom Nets" (AZNs), each employing fully convolutional networks that perform two tasks: (1) predict the locations and scales of object instances (the first AZN) or their parts (the second AZN); (2) estimate the part scores for the predicted object instance or part regions. Our model can adaptively "zoom" (resize) predicted image regions to their proper scales to refine the parsing. We conduct extensive experiments on the PASCAL part datasets for humans, horses, and cows. For humans, our approach significantly outperforms the state of the art by 5% mIOU and is especially better at segmenting small instances and small parts. We obtain similar improvements over alternative methods for parsing cows and horses. In summary, our strategy of first zooming into objects and then zooming into parts is very effective. It also enables us to process different regions of the image at different scales adaptively, so that, for example, we do not waste computational resources scaling the entire image.

* A shortened version has been submitted to ECCV 2016
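The auto-zoom strategy amounts to: predict a region and its proper scale, crop and resize to that scale, and run the finer-grained predictor on the zoomed view. The sketch below schematizes that two-stage loop with placeholder callables for the two Auto-Zoom Nets and the part scorer; the interfaces are assumed, and this is not the HAZN code.

```python
# Auto-zoom sketch: zoom into predicted object regions at their proper scales,
# then zoom into part regions inside each object view. `object_azn`, `part_azn`
# and `part_scorer` are placeholder callables standing in for the trained nets.
import cv2

def zoom(image, box, scale):
    """Crop box = (x0, y0, x1, y1) and rescale the crop by `scale`."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    return cv2.resize(crop, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)

def hierarchical_parse(image, object_azn, part_azn, part_scorer):
    """Returns (object_box, part_box, part_score_map) triples for the image."""
    results = []
    for obj_box, obj_scale in object_azn(image):          # first AZN: objects + scales
        obj_view = zoom(image, obj_box, obj_scale)
        for part_box, part_scale in part_azn(obj_view):   # second AZN: parts + scales
            part_view = zoom(obj_view, part_box, part_scale)
            results.append((obj_box, part_box, part_scorer(part_view)))
    return results
```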
As we begin to consider modeling large, realistic 3D building scenes, it becomes necessary to adopt a representation more compact than the polygonal mesh model. Because the large amounts of annotated training data required are costly to obtain, we leverage synthetic data to train our system for the satellite-image domain. Using the synthetic data, we formulate building decomposition as an application of instance segmentation and primitive fitting that decomposes a building into a set of primitive shapes. Experimental results on a WorldView-3 satellite image dataset demonstrate the effectiveness of our 3D building modeling approach.

Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during interactions with users. We model the sequential interactions between users and the recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn optimal strategies by recommending items in a trial-and-error fashion and receiving reinforcements from users' feedback on these items. Users' feedback can be positive or negative, and both types have great potential to boost recommendations. However, the amount of negative feedback is much larger than that of positive feedback, so incorporating both simultaneously is challenging because the positive signal could be buried by the negative one. In this paper, we develop a novel approach to incorporate both into the proposed deep recommender system (DEERS) framework. Experimental results based on real-world e-commerce data demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to understand the importance of both positive and negative feedback in recommendations.

* arXiv admin note: substantial text overlap with arXiv:1801.00209
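The framework above casts recommendation as an MDP and learns from both positive and negative feedback. As a rough illustration of keeping the two signals separate, the sketch below builds a Q-network that consumes separately pooled embeddings of positively and negatively rated items; the embedding sizes, mean pooling, and network shape are assumptions, not the DEERS architecture.

```python
# Q-network sketch with separate encodings of positive and negative feedback:
# the state is (items the user liked, items the user skipped/disliked), and the
# network scores a candidate item under that state.
import torch
import torch.nn as nn

class FeedbackAwareQNet(nn.Module):
    def __init__(self, n_items=10_000, emb_dim=32):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.q_head = nn.Sequential(
            nn.Linear(3 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1),
        )

    def forward(self, pos_items, neg_items, candidate):
        # mean-pool the positive and negative item histories separately
        pos_state = self.item_emb(pos_items).mean(dim=1)
        neg_state = self.item_emb(neg_items).mean(dim=1)
        cand = self.item_emb(candidate)
        return self.q_head(torch.cat([pos_state, neg_state, cand], dim=-1)).squeeze(-1)

qnet = FeedbackAwareQNet()
q_value = qnet(
    pos_items=torch.randint(0, 10_000, (1, 5)),   # last 5 positively rated items
    neg_items=torch.randint(0, 10_000, (1, 5)),   # last 5 negatively rated items
    candidate=torch.randint(0, 10_000, (1,)),
)
```

In a full training loop, this Q-value would be regressed toward reward-plus-discounted-future-value targets derived from logged user interactions.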
Motivation: Post-database searching is a key procedure in peptide identification with tandem mass spectrometry (MS/MS) strategies, refining the peptide-spectrum matches (PSMs) generated by database search engines. Although many statistical and machine learning-based methods have been developed to improve the accuracy of peptide identification, challenges remain on large-scale datasets and on datasets with an extremely large proportion of false positives (hard datasets). A more efficient learning strategy is required to improve the performance of peptide identification on such challenging datasets. Results: In this work, we present an online learning method to overcome the challenges that remain for existing peptide identification algorithms. We propose a cost-sensitive learning model that uses different loss functions for decoy and target PSMs: a larger penalty is imposed for wrongly selecting decoy PSMs than for target PSMs, so the new model can reduce its false discovery rate on hard datasets. We also design an online learning algorithm, OLCS-Ranker, to solve the proposed learning model. Rather than taking in all training samples at once, OLCS-Ranker iteratively feeds one training sample into the learning model at each round. As a result, the memory requirement is significantly reduced for large-scale problems. Experimental studies show that OLCS-Ranker outperforms benchmark methods, such as CRanker and Batch-CS-Ranker, in terms of accuracy and stability. Furthermore, OLCS-Ranker is 15-85 times faster than CRanker on large datasets. Availability and implementation: OLCS-Ranker software is available at no charge for non-commercial use at https://github.com/Isaac-QiXing/CRanker.

* 16 pages, 3 figures
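The key ingredients above are an online, one-sample-at-a-time update and an asymmetric cost that penalizes mis-scoring decoy PSMs more heavily than target PSMs. The sketch below shows a cost-sensitive online logistic update with per-class penalties; the feature representation, cost values, and learning rate are illustrative assumptions, not the OLCS-Ranker formulation.

```python
# Online cost-sensitive logistic update: decoy PSMs (label 0) get a larger
# mis-classification cost than target PSMs (label 1).
import numpy as np

def online_cost_sensitive_fit(X, y, cost_decoy=5.0, cost_target=1.0, lr=0.05, epochs=3):
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):          # one PSM at a time
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))    # predicted probability of "target"
            cost = cost_target if y[i] == 1 else cost_decoy
            # gradient of the weighted logistic loss for this single sample
            w -= lr * cost * (p - y[i]) * X[i]
    return w

# toy usage: 200 PSMs with 10 features, roughly 30% targets
X = np.random.default_rng(1).normal(size=(200, 10))
y = (np.random.default_rng(2).random(200) < 0.3).astype(float)
weights = online_cost_sensitive_fit(X, y)
scores = X @ weights        # rank PSMs by score; higher means more target-like
```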
Chemical reaction practicality judgment is a core task in symbol-based chemical information processing; for example, it provides indispensable clues for automatic synthesis route inference. Considering that chemical reactions are represented in a language form, we propose a new solution to judge the practicality of organic reactions in general without relying on complex quantum-physical modeling or chemistry knowledge. When practicality judgment is tackled as a machine learning task over positive and negative (chemical reaction) samples, all existing studies have to carefully handle the serious shortage of negative samples. We propose an auto-construction method that effectively resolves this long-standing difficulty. Experimental results show that our model can effectively predict the practicality of chemical reactions, achieving a high accuracy of 99.76% on a real large-scale chemical-lab reaction practicality judgment task.

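Since the reactions are treated as text, practicality judgment reduces to binary sequence classification. A generic baseline along those lines is sketched below, using character n-gram features over reaction strings with a linear classifier; the toy reactions and labels are invented for illustration, and this is not the paper's model or its auto-constructed negative set.

```python
# Baseline sketch: classify reaction strings (e.g. "reactants>>product" SMILES)
# as practical (1) or not (0) with character n-gram features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reactions = [
    "CCO.CC(=O)O>>CC(=O)OCC",       # toy examples only; real data would be
    "c1ccccc1Br.CCN>>c1ccccc1NCC",  # lab-recorded positives plus auto-constructed
    "CCO.CCO>>C1CCCCC1",            # negatives as described in the abstract
    "CC(=O)Cl.N>>CC(=O)N",
]
labels = [1, 1, 0, 1]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(reactions, labels)
print(clf.predict(["CCBr.[Na+].[OH-]>>CCO"]))
```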
Segmentation of surgical instruments plays an important role in robot-assisted surgery: it contributes to capturing the accurate spatial information needed for tracking. In this paper, a novel network, the Refined Attention Segmentation Network, is proposed to simultaneously segment surgical instruments and identify their categories. We adopt the U-shaped architecture that is popular in segmentation. Different from previous work, an attention module is added to help the network focus on key regions, which improves segmentation accuracy. To solve the class imbalance problem, the weighted sum of the cross-entropy loss and the logarithm of the Jaccard index is used as the loss function. Furthermore, transfer learning is adopted: the encoder is pre-trained on ImageNet. The dataset from the MICCAI EndoVis Challenge 2017 is used to evaluate our network. On this dataset, our network achieves state-of-the-art performance: 94.65% mean Dice and 90.33% mean IoU.

* This paper has been accepted by the 2019 41st Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
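The loss described above combines weighted cross-entropy with the logarithm of the Jaccard index. A PyTorch sketch of one such combined loss is given below; the soft-Jaccard formulation, class weights, and mixing coefficient are assumptions rather than the exact values used in the paper.

```python
# Combined loss sketch: weighted cross-entropy minus the log of a soft Jaccard
# index, for multi-class segmentation logits of shape (N, C, H, W).
import torch
import torch.nn.functional as F

def ce_log_jaccard_loss(logits, target, class_weights=None, alpha=0.5, eps=1e-6):
    ce = F.cross_entropy(logits, target, weight=class_weights)
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    intersection = (probs * one_hot).sum(dim=(0, 2, 3))
    union = (probs + one_hot - probs * one_hot).sum(dim=(0, 2, 3))
    soft_jaccard = ((intersection + eps) / (union + eps)).mean()
    # maximizing the Jaccard index == minimizing its negative log
    return alpha * ce - (1 - alpha) * torch.log(soft_jaccard)

logits = torch.randn(2, 4, 64, 64, requires_grad=True)   # 4 instrument/background classes
target = torch.randint(0, 4, (2, 64, 64))
loss = ce_log_jaccard_loss(logits, target)
loss.backward()
```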
China announced the creation of the Xiongan New Area in Hebei on April 1, 2017; thus a new megacity about 110 km southwest of Beijing will emerge. The Xiongan New Area is of great practical and historical significance for transferring Beijing's non-capital functions. Simulating the urban dynamics of the Xiongan New Area can help planners decide where to build the new urban area and manage future urban growth. However, little research has focused on the future urban development of the Xiongan New Area, and previous models are unable to simulate its urban dynamics because there is no existing high-density urban land from which these models can learn transition rules. In this study, we propose a C-FLUS model to solve these problems. The framework is implemented by coupling a modified cellular automata (CA) model. An elaborately designed random planted-seed mechanism based on local maxima is introduced into the CA model to better simulate the emergence of new urban land. Through an analysis of the current driving forces, C-FLUS can detect the potential starting zone and simulate urban development under different scenarios in the Xiongan New Area. Our study shows that new urban growth is most likely to occur in the northwest of Xiongxian, and that it will rapidly extend to Rongcheng and Anxin until it covers almost the entire northern part of the Xiongan New Area. Moreover, the method can help planners evaluate the impact of urban expansion in the Xiongan New Area.

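The planted-seed mechanism above starts new urban patches at randomly chosen cells whose development suitability is a local maximum. A small NumPy/SciPy sketch of that seeding step follows; the neighborhood size, seed count, and suitability surface are assumptions made for illustration, not the C-FLUS implementation.

```python
# Seed-selection sketch for a CA urban-growth model: pick random seeds among
# cells whose development suitability is a local maximum of its neighborhood.
import numpy as np
from scipy.ndimage import maximum_filter

def plant_seeds(suitability, n_seeds=5, window=5, rng=None):
    """suitability: 2-D array of development probabilities in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    local_max = suitability == maximum_filter(suitability, size=window)
    candidates = np.argwhere(local_max & (suitability > suitability.mean()))
    chosen = candidates[rng.choice(len(candidates),
                                   size=min(n_seeds, len(candidates)), replace=False)]
    return [tuple(rc) for rc in chosen]   # (row, col) cells where new urban patches start

suitability = np.random.default_rng(0).random((200, 200))
seeds = plant_seeds(suitability, n_seeds=5)
```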
Consider the problem: given data pair $(\mathbf{x}, \mathbf{y})$ drawn from a population with $f_*(x) = \mathbf{E}[\mathbf{y} | \mathbf{x} = x]$, specify a neural network and run gradient flow on the weights over time until reaching any stationarity. How does $f_t$, the function computed by the neural network at time $t$, relate to $f_*$, in terms of approximation and representation? What are the provable benefits of the adaptive representation by neural networks compared to the pre-specified fixed basis representation in the classical nonparametric literature? We answer the above questions via a dynamic reproducing kernel Hilbert space (RKHS) approach indexed by the training process of neural networks. We show that when reaching any local stationarity, gradient flow learns an adaptive RKHS representation, and performs the global least squares projection onto the adaptive RKHS, simultaneously. In addition, we prove that as the RKHS is data-adaptive and task-specific, the residual for $f_*$ lies in a subspace that is smaller than the orthogonal complement of the RKHS, formalizing the representation and approximation benefits of neural networks.

* 24 pages, 5 figures
In person re-identification (re-ID), the challenges of the task are usually attributed to variations in visual factors such as viewpoint, pose, illumination, and background. Despite acknowledging these factors as influential, quantitative studies of how they affect a re-ID system are still lacking. To gain such insights, this paper makes an early attempt at studying one particular factor, viewpoint. We narrow the viewpoint problem down to the pedestrian rotation angle to obtain focused conclusions. In this regard, this paper makes two contributions to the community. First, we introduce a large-scale synthetic data engine, PersonX. Composed of hand-crafted 3D person models, the salient characteristic of this engine is that it is controllable: we are able to synthesize pedestrians by setting the visual variables to arbitrary values. Second, using the 3D data engine, we quantitatively analyze the influence of pedestrian rotation angle on re-ID accuracy. The person rotation angles are precisely customized from 0 to 360 degrees, allowing us to investigate their effect on the training, query, and gallery sets. Extensive experiments help us gain a deeper understanding of the fundamental problems in person re-ID. Our research also provides beneficial insights for dataset building and future practical usage, e.g., a side view of a person makes a better query.

* 9 pages, 7 figures
We develop a model of social learning from overabundant information: Short-lived agents sequentially choose from a large set of (flexibly correlated) information sources for prediction of an unknown state. Signal realizations are public. We demonstrate two starkly different long-run outcomes: (1) efficient information aggregation, where the community eventually learns as fast as possible; (2) "learning traps," where the community gets stuck observing suboptimal sources and learns inefficiently. Our main results identify a simple property of the signal correlation structure that separates these outcomes. In both regimes, we characterize which sources are observed in the long run and how often.

Deep learning techniques have achieved success in aspect-based sentiment analysis in recent years. However, two important issues remain to be further studied: 1) how to efficiently represent the target, especially when the target contains multiple words; 2) how to utilize the interaction between the target and the left/right contexts to capture the most important words in them. In this paper, we propose an approach, called left-center-right separated neural network with rotatory attention (LCR-Rot), to better address these two problems. Our approach has two characteristics: 1) it has three separated LSTMs, i.e., left, center, and right LSTMs, corresponding to the three parts of a review (left context, target phrase, and right context); 2) it has a rotatory attention mechanism that models the relation between the target and the left/right contexts. The target2context attention is used to capture the most indicative sentiment words in the left/right contexts. Subsequently, the context2target attention is used to capture the most important words in the target. This leads to a two-sided representation of the target: left-aware target and right-aware target. We compare our approach with ten recently proposed methods on three benchmark datasets. The results show that our approach significantly outperforms the state-of-the-art techniques.

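The rotatory attention above first attends from the pooled target to the left/right context states (target2context), then attends from the resulting context representations back to the target words (context2target). A compact PyTorch sketch of that step over pre-computed LSTM hidden states is shown below; the bilinear scoring and dimensions are assumptions, not the exact LCR-Rot parameterization.

```python
# Rotatory attention sketch over pre-computed LSTM states:
# target -> context attention, then context -> target attention.
import torch
import torch.nn as nn

class RotatoryAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.w_ctx = nn.Bilinear(dim, dim, 1)   # scores context words against the target
        self.w_tgt = nn.Bilinear(dim, dim, 1)   # scores target words against a context

    def attend(self, scorer, keys, query):
        # keys: (T, dim), query: (dim,) -> attention-weighted sum of keys, shape (dim,)
        scores = scorer(keys, query.expand_as(keys)).squeeze(-1)
        return (torch.softmax(scores, dim=0).unsqueeze(-1) * keys).sum(dim=0)

    def forward(self, left_h, target_h, right_h):
        target_pool = target_h.mean(dim=0)
        left_rep = self.attend(self.w_ctx, left_h, target_pool)     # target2context
        right_rep = self.attend(self.w_ctx, right_h, target_pool)
        left_aware_t = self.attend(self.w_tgt, target_h, left_rep)  # context2target
        right_aware_t = self.attend(self.w_tgt, target_h, right_rep)
        return torch.cat([left_rep, right_rep, left_aware_t, right_aware_t], dim=-1)

attn = RotatoryAttention(dim=64)
rep = attn(torch.randn(7, 64), torch.randn(2, 64), torch.randn(5, 64))  # -> (256,)
```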
This paper studies a novel discriminative part-based model that represents and recognizes object shapes with an "And-Or graph". We define the model as consisting of three layers: leaf-nodes with collaborative edges for localizing local parts, or-nodes specifying the switch among leaf-nodes, and a root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamic manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined while optimizing the multi-layer parameters during the iterations. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle large intra-class variance and background clutter well in object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms state-of-the-art approaches.

* Advances in Neural Information Processing Systems (pp. 242-250), 2014
* 9 pages, 4 figures, NIPS 2012
Bridges are an essential part of the transportation infrastructure and need to be monitored periodically. Visual inspections by dedicated teams have been one of the primary tools in structural health monitoring (SHM) of bridge structures. However, such conventional methods have certain shortcomings: manual inspections may be challenging in harsh environments and are commonly biased in nature. In the last decade, camera-equipped unmanned aerial vehicles (UAVs) have been widely used for visual inspections; however, automatically extracting useful information from the raw images is still challenging. In this paper, a deep learning semantic segmentation framework is proposed to automatically localize surface cracks. Because the crack and background classes in the images are highly imbalanced, different strategies are investigated to improve performance and reliability. The trained models are tested on real-world crack images and show impressive robustness in terms of precision- and recall-based metrics. These techniques can be used in SHM of bridges to extract useful information from unprocessed images taken by UAVs.

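Because crack pixels are rare, the abstract evaluates with precision- and recall-based metrics rather than plain pixel accuracy. The snippet below shows how pixel-level precision and recall would be computed for a predicted crack probability map against a ground-truth mask; the 0.5 threshold and the toy data are assumed for illustration.

```python
# Pixel-level precision/recall for a binary crack mask, where "crack" is the
# positive class and occupies only a small fraction of the image.
import numpy as np

def precision_recall(pred_prob, gt_mask, threshold=0.5, eps=1e-9):
    pred = pred_prob >= threshold
    tp = np.logical_and(pred, gt_mask).sum()
    precision = tp / (pred.sum() + eps)       # of predicted crack pixels, how many are real
    recall = tp / (gt_mask.sum() + eps)       # of real crack pixels, how many are found
    return precision, recall

rng = np.random.default_rng(0)
gt = rng.random((256, 256)) < 0.02            # ~2% crack pixels (toy ground truth)
scores = np.where(gt, 0.8, 0.1) + rng.normal(0, 0.1, gt.shape)  # toy network output
print(precision_recall(scores, gt))
```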