Models, code, and papers for "Hao Pan":
Learning with a primary objective, such as softmax cross entropy for classification and sequence generation, has been the norm for training deep neural networks for years. Although being a widely-adopted approach, using cross entropy as the primary objective exploits mostly the information from the ground-truth class for maximizing data likelihood, and largely ignores information from the complement (incorrect) classes. We argue that, in addition to the primary objective, training also using a complement objective that leverages information from the complement classes can be effective in improving model performance. This motivates us to study a new training paradigm that maximizes the likelihood of the groundtruth class while neutralizing the probabilities of the complement classes. We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. The experimental results confirm that, compared to the conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks. In addition to the accuracy improvement, we also show that models trained with both primary and complement objectives are more robust to single-step adversarial attacks.
Deep learning using multi-layer neural networks (NNs) architecture manifests superb power in modern machine learning systems. The trained Deep Neural Networks (DNNs) are typically large. The question we would like to address is whether it is possible to simplify the NN during training process to achieve a reasonable performance within an acceptable computational time. We presented a novel approach of optimising a deep neural network through regularisation of net- work architecture. We proposed regularisers which support a simple mechanism of dropping neurons during a network training process. The method supports the construction of a simpler deep neural networks with compatible performance with its simplified version. As a proof of concept, we evaluate the proposed method with examples including sparse linear regression, deep autoencoder and convolutional neural network. The valuations demonstrate excellent performance. The code for this work can be found in http://www.github.com/panweihit/DropNeuron
We extend Convolutional Neural Networks (CNNs) on flat and regular domains (e.g. 2D images) to curved surfaces embedded in 3D Euclidean space that are discretized as irregular meshes and widely used to represent geometric data in Computer Vision and Graphics. We define surface convolution on tangent spaces of a surface domain, where the convolution has two desirable properties: 1) the distortion of surface domain signals is locally minimal when being projected to the tangent space, and 2) the translation equi-variance property holds locally, by aligning tangent spaces with the canonical parallel transport that preserves metric. For computation, we rely on a parallel N-direction frame field on the surface that minimizes field variation and therefore is as compatible as possible to and approximates the parallel transport. On the tangent spaces equipped with parallel frames, the computation of surface convolution becomes standard routine. The frames have rotational symmetry which we disambiguate by constructing the covering space of surface induced by the parallel frames and grouping the feature maps into N sets accordingly; convolution is computed on the N branches of the cover space with respective feature maps while the kernel weights are shared. To handle irregular points of a discrete mesh while sharing kernel weights, we make the convolution semi-discrete, i.e. the convolution kernels are polynomial functions, and their convolution with discrete surface points becomes sampling and weighted summation. Pooling and unpooling operations are computed along a mesh hierarchy built through simplification. The presented surface CNNs allow effective deep learning on meshes. We show that for tasks of classification, segmentation and non-rigid registration, surface CNNs using only raw input signals achieve superior performances than previous models using sophisticated input features.
In this paper, we consider the problem of machine reading task when the questions are in the form of keywords, rather than natural language. In recent years, researchers have achieved significant success on machine reading comprehension tasks, such as SQuAD and TriviaQA. These datasets provide a natural language question sentence and a pre-selected passage, and the goal is to answer the question according to the passage. However, in the situation of interacting with machines by means of text, people are more likely to raise a query in form of several keywords rather than a complete sentence. The keyword-based query comprehension is a new challenge, because small variations to a question may completely change its semantical information, thus yield different answers. In this paper, we propose a novel neural network system that consists a Demand Optimization Model based on a passage-attention neural machine translation and a Reader Model that can find the answer given the optimized question. The Demand Optimization Model optimizes the original query and output multiple reconstructed questions, then the Reader Model takes the new questions as input and locate the answers from the passage. To make predictions robust, an evaluation mechanism will score the reconstructed questions so the final answer strike a good balance between the quality of both the Demand Optimization Model and the Reader Model. Experimental results on several datasets show that our framework significantly improves multiple strong baselines on this challenging task.
Finding feasible and collision-free paths for multiple nonlinear agents is challenging in the decentralized scenarios due to limited available information of other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only positions and velocities of nearby agents. To reduce online computation, we first decompose the multi-agent scenario and solve a two agents collision avoidance problem using reinforcement learning (RL). When extending the trained policy to a multi-agent problem, safety is ensured by introducing the optimal reciprocal collision avoidance (ORCA) as linear constraints and the overall collision avoidance action could be found through simple convex optimization. Most existing RL-based multi-agent collision avoidance algorithms rely on the direct control of agent velocities. In sharp contrasts, our approach is applicable to general nonlinear agents. Realistic simulations based on nonlinear bicycle agent models are performed with various challenging scenarios, indicating a competitive performance of the proposed method in avoiding collisions, congestion and deadlock with smooth trajectories.
This paper investigates a new task named Conversational Question Generation (CQG) which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is a crucial task for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards that end, we propose a new approach named Reinforced Dynamic Reasoning (ReDR) network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in a dynamic manner to better understand what has been asked and what to ask next about the passage. To encourage producing meaningful questions, we leverage a popular question answering (QA) model to provide feedback and fine-tune the question generator using a reinforcement learning mechanism. Empirical results on the recently released CoQA dataset demonstrate the effectiveness of our method in comparison with various baselines and model variants. Moreover, to show the applicability of our method, we also apply it to create multi-turn question-answering conversations for passages in SQuAD.
Bionic design refers to an approach of generative creativity in which a target object (e.g. a floor lamp) is designed to contain features of biological source objects (e.g. flowers), resulting in creative biologically-inspired design. In this work, we attempt to model the process of shape-oriented bionic design as follows: given an input image of a design target object, the model generates images that 1) maintain shape features of the input design target image, 2) contain shape features of images from the specified biological source domain, 3) are plausible and diverse. We propose DesignGAN, a novel unsupervised deep generative approach to realising bionic design. Specifically, we employ a conditional Generative Adversarial Networks architecture with several designated losses (an adversarial loss, a regression loss, a cycle loss and a latent loss) that respectively constrict our model to meet the corresponding aforementioned requirements of bionic design modelling. We perform qualitative and quantitative experiments to evaluate our method, and demonstrate that our proposed approach successfully generates creative images of bionic design.
Inspired by practical importance of social networks, economic networks, biological networks and so on, studies on large and complex networks have attracted a surge of attentions in the recent years. Link prediction is a fundamental issue to understand the mechanisms by which new links are added to the networks. We introduce the method of robust principal component analysis (robust PCA) into link prediction, and estimate the missing entries of the adjacency matrix. On one hand, our algorithm is based on the sparsity and low rank property of the matrix, on the other hand, it also performs very well when the network is dense. This is because a relatively dense real network is also sparse in comparison to the complete graph. According to extensive experiments on real networks from disparate fields, when the target network is connected and sufficiently dense, whatever it is weighted or unweighted, our method is demonstrated to be very effective and with prediction accuracy being considerably improved comparing with many state-of-the-art algorithms.
Machine comprehension(MC) style question answering is a representative problem in natural language processing. Previous methods rarely spend time on the improvement of encoding layer, especially the embedding of syntactic information and name entity of the words, which are very crucial to the quality of encoding. Moreover, existing attention methods represent each query word as a vector or use a single vector to represent the whole query sentence, neither of them can handle the proper weight of the key words in query sentence. In this paper, we introduce a novel neural network architecture called Multi-layer Embedding with Memory Network(MEMEN) for machine reading task. In the encoding layer, we employ classic skip-gram model to the syntactic and semantic information of the words to train a new kind of embedding layer. We also propose a memory network of full-orientation matching of the query and passage to catch more pivotal information. Experiments show that our model has competitive results both from the perspectives of precision and efficiency in Stanford Question Answering Dataset(SQuAD) among all published results and achieves the state-of-the-art results on TriviaQA dataset.
Developing a safe and efficient collision avoidance policy for multiple robots is challenging in the decentralized scenarios where each robot generate its paths without observing other robots' states and intents. While other distributed multi-robot collision avoidance systems exist, they often require extracting agent-level features to plan a local collision-free action, which can be computationally prohibitive and not robust. More importantly, in practice the performance of these methods are much lower than their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems, which directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to find an optimal policy which is trained over a large number of robots on rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time efficient, collision-free paths for a large-scale robot system. We also demonstrate that the learned policy can be well generalized to new scenarios that do not appear in the entire training period, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmaca
Machine Comprehension (MC) is one of the core problems in natural language processing, requiring both understanding of the natural language and knowledge about the world. Rapid progress has been made since the release of several benchmark datasets, and recently the state-of-the-art models even surpass human performance on the well-known SQuAD evaluation. In this paper, we transfer knowledge learned from machine comprehension to the sequence-to-sequence tasks to deepen the understanding of the text. We propose MacNet: a novel encoder-decoder supplementary architecture to the widely used attention-based sequence-to-sequence models. Experiments on neural machine translation (NMT) and abstractive text summarization show that our proposed framework can significantly improve the performance of the baseline models, and our method for the abstractive text summarization achieves the state-of-the-art results on the Gigaword dataset.
In critical care, intensivists are required to continuously monitor high dimensional vital signs and lab measurements to detect and diagnose acute patient conditions. This has always been a challenging task. In this study, we propose a novel self-correcting deep learning prediction approach to address this challenge. We focus on an example of the prediction of acute kidney injury (AKI). Compared with the existing models, our method has a number of distinct features: we utilized the accumulative data of patients in ICU; we developed a self-correcting mechanism that feeds errors from the previous predictions back into the network; we also proposed a regularization method that takes into account not only the model's prediction error on the label but also its estimation errors on the input data. This mechanism is applied in both regression and classification tasks. We compared the performance of our proposed method with the conventional deep learning models on two real-world clinical datasets and demonstrated that our proposed model constantly outperforms these baseline models. In particular, the proposed model achieved area under ROC curve at 0.893 on the MIMIC III dataset, and 0.871 on the Philips eICU dataset.
This paper proposes a practical approach to addressing limitations posed by use of single active electrodes in applications for sleep stage classification. Electroencephalography (EEG)-based characterizations of sleep stage progression contribute the diagnosis and monitoring of the many pathologies of sleep. Several prior reports have explored ways of automating the analysis of sleep EEG and of reducing the complexity of the data needed for reliable discrimination of sleep stages in order to make it possible to perform sleep studies at lower cost in the home (rather than only in specialized clinical facilities). However, these reports have involved recordings from electrodes placed on the cranial vertex or occiput, which can be uncomfortable or difficult for subjects to position. Those that have utilized single EEG channels which contain less sleep information, have showed poor classification performance. We have taken advantage of Rectifier Neural Network for feature detection and Long Short-Term Memory (LSTM) network for sequential data learning to optimize classification performance with single electrode recordings. After exploring alternative electrode placements, we found a comfortable configuration of a single-channel EEG on the forehead and have shown that it can be integrated with additional electrodes for simultaneous recording of the electroocuolgram (EOG). Evaluation of data from 62 people (with 494 hours sleep) demonstrated better performance of our analytical algorithm for automated sleep classification than existing approaches using vertex or occipital electrode placements. Use of this recording configuration with neural network deconvolution promises to make clinically indicated home sleep studies practical.
Combining deep neural networks with reinforcement learning has shown great potential in the next-generation intelligent control. However, there are challenges in terms of safety and cost in practical applications. In this paper, we propose the Intervention Aided Reinforcement Learning (IARL) framework, which utilizes human intervened robot-environment interaction to improve the policy. We used the Unmanned Aerial Vehicle (UAV) as the test platform. We built neural networks as our policy to map sensor readings to control signals on the UAV. Our experiment scenarios cover both simulation and reality. We show that our approach substantially reduces the human intervention and improves the performance in autonomous navigation, at the same time it ensures safety and keeps training cost acceptable.
Label hierarchies widely exist in many vision-related problems, ranging from explicit label hierarchies existed in image classification to latent label hierarchies existed in semantic segmentation. Nevertheless, state-of-the-art methods often deploy cross-entropy loss that implicitly assumes class labels to be exclusive and thus independence from each other. Motivated by the fact that classes from the same parental category usually share certain similarity, we design a new training diagram called Hierarchical Complement Objective Training (HCOT) that leverages the information from label hierarchy. HCOT maximizes the probability of the ground truth class, and at the same time, neutralizes the probabilities of rest of the classes in a hierarchical fashion, making the model take advantage of the label hierarchy explicitly. The proposed HCOT is evaluated on both image classification and semantic segmentation tasks. Experimental results confirm that HCOT outperforms state-of-the-art models in CIFAR-100, ImageNet-2012, and PASCAL-Context. The study further demonstrates that HCOT can be applied on tasks with latent label hierarchies, which is a common characteristic in many machine learning tasks.
Model robustness has been an important issue, since adding small adversarial perturbations to images is sufficient to drive the model accuracy down to nearly zero. In this paper, we propose a new training objective "Guided Complement Entropy" (GCE) that has dual desirable effects: (a) neutralizing the predicted probabilities of incorrect classes, and (b) maximizing the predicted probability of the ground-truth class, particularly when (a) is achieved. Training with GCE encourages models to learn latent representations where samples of different classes form distinct clusters, which we argue, improves the model robustness against adversarial perturbations. Furthermore, compared with the state-of-the-arts trained with cross-entropy, same models trained with GCE achieve significant improvements on the robustness against white-box adversarial attacks, both with and without adversarial training. When no attack is present, training with GCE also outperforms cross-entropy in terms of model accuracy.
Robots that autonomously manipulate objects within warehouses have the potential to shorten the package delivery time and improve the efficiency of the e-commerce industry. In this paper, we present a robotic system that is capable of both picking and placing general objects in warehouse scenarios. Given a target object, the robot autonomously detects it from a shelf or a table and estimates its full 6D pose. With this pose information, the robot picks the object using its gripper, and then places it into a container or at a specified location. We describe our pick-and-place system in detail while highlighting our design principles for the warehouse settings, including the perception method that leverages knowledge about its workspace, three grippers designed to handle a large variety of different objects in terms of shape, weight and material, and grasp planning in cluttered scenarios. We also present extensive experiments to evaluate the performance of our picking system and demonstrate that the robot is competent to accomplish various tasks in warehouse settings, such as picking a target item from a tight space, grasping different objects from the shelf, and performing pick-and-place tasks on the table.
Several prior works have proposed various methods for the task of automatic melody harmonization, in which a model aims to generate a sequence of chords to serve as the harmonic accompaniment of a given multiple-bar melody sequence. In this paper, we present a comparative study evaluating and comparing the performance of a set of canonical approaches to this task, including a template matching based model, a hidden Markov based model, a genetic algorithm based model, and two deep learning based models. The evaluation is conducted on a dataset of 9,226 melody/chord pairs we newly collect for this study, considering up to 48 triad chords, using a standardized training/test split. We report the result of an objective evaluation using six different metrics and a subjective study with 202 participants.
Recently, how to expand data transmission to reduce cell data and repeated cell transmission has received more and more research attention. In mobile social networks, content popularity prediction has always been an important part of traffic offloading and expanding data dissemination. However, current mainstream content popularity prediction methods only use the number of downloads and shares or the distribution of user interests, which do not consider important time and geographic location information in mobile social networks, and all of data is from OSN which is not same as MSN. In this work, we propose D2D Long Short-Term Memory (D2D-LSTM), a deep neural network based on LSTM, which is designed to predict a complete D2D diffusion path. Our work is the first attempt in the world to use real data of MSN to predict diffusion path with deep neural networks which conforms to the D2D structure. Compared to linear sequence networks, only learn users' social features without time distribution or GPS distribution and files' content features, our model can predict the propagation path more accurately (up to 85.858\%) and can reach convergence faster (less than 100 steps) because of the neural network that conforms to the D2D structure and combines user social features and files features. Moreover, we can simulate generating a D2D propagation tree. After experiment and comparison, it is found to be very similar to the ground-truth trees. Finally, we define a user prototype refinement that can more accurately describe the propagation sharing habits of a prototype user (including content preferences, time preferences, and geographic location preferences), and experimentally validate the predictions when the user prototype is added to 1000 classes, it is almost identical to the 50 categories.
Financial fraud detection is one of the core technological assets of Fintech companies. It saves tens of millions of money fro m Chinese Fintech companies since the bad loan rate is more than 10%. HC Financial Service Group is the 3rd largest company in the Chinese P2P financial market. In this paper we illustrate how we tackle the fraud detection problem at HC Financial. We utilize two powerful workhorses in the machine learning field - random forest and gradient boosting decision tree to detect fraudulent users . We demonstrate that by carefully select features and tune model parameters , we could effectively filter out fraudulent users in the P2P market.