Research papers and code for "Yu Zhou":
Cloud computing offers the potential to help scientists to process massive number of computing resources often required in machine learning application such as computer vision problems. This proposal would like to show that which benefits can be obtained from cloud in order to help medical image analysis users (including scientists, clinicians, and research institutes). As security and privacy of algorithms are important for most of algorithms inventors, these algorithms can be hidden in a cloud to allow the users to use the algorithms as a package without any access to see/change their inside. In another word, in the user part, users send their images to the cloud and configure the algorithm via an interface. In the cloud part, the algorithms are applied to this image and the results are returned back to the user. My proposal has two parts: (1) investigate the potential of cloud computing for computer vision problems and (2) study the components of a proposed cloud-based framework for medical image analysis application and develop them (depending on the length of the internship). The investigation part will involve a study on several aspects of the problem including security, usability (for medical end users of the service), appropriate programming abstractions for vision problems, scalability and resource requirements. In the second part of this proposal I am going to thoroughly study of the proposed framework components and their relations and develop them. The proposed cloud-based framework includes an integrated environment to enable scientists and clinicians to access to the previous and current medical image analysis algorithms using a handful user interface without any access to the algorithm codes and procedures.

* 3 pages
Click to Read Paper and Get Code
Multi-instance learning attempts to learn from a training set consisting of labeled bags each containing many unlabeled instances. Previous studies typically treat the instances in the bags as independently and identically distributed. However, the instances in a bag are rarely independent, and therefore a better performance can be expected if the instances are treated in an non-i.i.d. way that exploits the relations among instances. In this paper, we propose a simple yet effective multi-instance learning method, which regards each bag as a graph and uses a specific kernel to distinguish the graphs by considering the features of the nodes as well as the features of the edges that convey some relations among instances. The effectiveness of the proposed method is validated by experiments.

* ICML, 2009
Click to Read Paper and Get Code
Perception and reasoning are basic human abilities that are seamlessly connected as part of human intelligence. However, in current machine learning systems, the perception and reasoning modules are incompatible. Tasks requiring joint perception and reasoning ability are difficult to accomplish autonomously and still demand human intervention. Inspired by the way language experts decoded Mayan scripts by joining two abilities in an abductive manner, this paper proposes the abductive learning framework. The framework learns perception and reasoning simultaneously with the help of a trial-and-error abductive process. We present the Neural-Logical Machine as an implementation of this novel learning framework. We demonstrate that--using human-like abductive learning--the machine learns from a small set of simple hand-written equations and then generalizes well to complex equations, a feat that is beyond the capability of state-of-the-art neural network models. The abductive learning framework explores a new direction for approaching human-level learning ability.

* Corrected typos
Click to Read Paper and Get Code
Researchers often summarize their work in the form of scientific posters. Posters provide a coherent and efficient way to convey core ideas expressed in scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. In this paper, for the first time, we study the challenging problem of learning to generate posters from scientific papers. To this end, a data-driven framework, that utilizes graphical models, is proposed. Specifically, given content to display, the key elements of a good poster, including attributes of each panel and arrangements of graphical elements are learned and inferred from data. During the inference stage, an MAP inference framework is employed to incorporate some design principles. In order to bridge the gap between panel attributes and the composition within each panel, we also propose a recursive page splitting algorithm to generate the panel layout for a poster. To learn and validate our model, we collect and release a new benchmark dataset, called NJU-Fudan Paper-Poster dataset, which consists of scientific papers and corresponding posters with exhaustively labelled panels and attributes. Qualitative and quantitative results indicate the effectiveness of our approach.

* 10 pages, submission to IEEE TPAMI. arXiv admin note: text overlap with arXiv:1604.01219
Click to Read Paper and Get Code
StarCraft II poses a grand challenge for reinforcement learning. The main difficulties of it include huge state and action space and a long-time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-action automatically extracted from expert's trajectories, which reduces the action space in an order of magnitude yet remains effective. The other is a two-layer hierarchical architecture which is modular and easy to scale, enabling a curriculum transferring from simpler tasks to more complex tasks. The reinforcement training algorithm for this architecture is also investigated. On a 64x64 map and using restrictive units, we achieve a winning rate of more than 99\% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat model, we can achieve over 93\% winning rate of Protoss against the most difficult non-cheating built-in AI (level-7) of Terran, training within two days using a single machine with only 48 CPU cores and 8 K40 GPUs. It also shows strong generalization performance, when tested against never seen opponents including cheating levels built-in AI and all levels of Zerg and Protoss built-in AI. We hope this study could shed some light on the future research of large-scale reinforcement learning.

Click to Read Paper and Get Code
Detecting a change point is a crucial task in statistics that has been recently extended to the quantum realm. A source state generator that emits a series of single photons in a default state suffers an alteration at some point and starts to emit photons in a mutated state. The problem consists in identifying the point where the change took place. In this work, we consider a learning agent that applies Bayesian inference on experimental data to solve this problem. This learning machine adjusts the measurement over each photon according to the past experimental results finds the change position in an online fashion. Our results show that the local-detection success probability can be largely improved by using such a machine learning technique. This protocol provides a tool for improvement in many applications where a sequence of identical quantum states is required.

* Phys. Rev. A 98, 040301 (2018)
Click to Read Paper and Get Code
Domain adaptation is an essential task in dialog system building because there are so many new dialog tasks created for different needs every day. Collecting and annotating training data for these new tasks is costly since it involves real user interactions. We propose a domain adaptive dialog generation method based on meta-learning (DAML). DAML is an end-to-end trainable dialog system model that learns from multiple rich-resource tasks and then adapts to new domains with minimal training samples. We train a dialog system model using multiple rich-resource single-domain dialog data by applying the model-agnostic meta-learning algorithm to dialog domain. The model is capable of learning a competitive dialog system on a new domain with only a few training examples in an efficient manner. The two-step gradient updates in DAML enable the model to learn general features across multiple tasks. We evaluate our method on a simulated dialog dataset and achieve state-of-the-art performance, which is generalizable to new tasks.

* Accepted as a long paper in ACL 2019
Click to Read Paper and Get Code
Due to the increasing complexity of Integrated Circuits (ICs) and System-on-Chip (SoC), developing high-quality synthesis flows within a short market time becomes more challenging. We propose a general approach that precisely estimates the Quality-of-Result (QoR), such as delay and area, of unseen synthesis flows for specific designs. The main idea is training a Recurrent Neural Network (RNN) regressor, where the flows are inputs and QoRs are ground truth. The RNN regressor is constructed with Long Short-Term Memory (LSTM) and fully-connected layers. This approach is demonstrated with 1.2 million data points collected using 14nm, 7nm regular-voltage (RVT), and 7nm low-voltage (LVT) FinFET technologies with twelve IC designs. The accuracy of predicting the QoRs (delay and area) within one technology is $\boldsymbol{\geq}$\textbf{98.0}\% over $\sim$240,000 test points. To enable accurate predictions cross different technologies and different IC designs, we propose a transfer-learning approach that utilizes the model pre-trained with 14nm datasets. Our transfer learning approach obtains estimation accuracy $\geq$96.3\% over $\sim$960,000 test points, using only 100 data points for training.

Click to Read Paper and Get Code
End-to-end learning framework is useful for building dialog systems for its simplicity in training and efficiency in model updating. However, current end-to-end approaches only consider user semantic inputs in learning and under-utilize other user information. Therefore, we propose to include user sentiment obtained through multimodal information (acoustic, dialogic and textual), in the end-to-end learning framework to make systems more user-adaptive and effective. We incorporated user sentiment information in both supervised and reinforcement learning settings. In both settings, adding sentiment information reduced the dialog length and improved the task success rate on a bus information search task. This work is the first attempt to incorporate multimodal user information in the adaptive end-to-end dialog system training framework and attained state-of-the-art performance.

* ACL 2018
Click to Read Paper and Get Code
We propose to solve large scale Markowitz mean-variance (MV) portfolio allocation problem using reinforcement learning (RL). By adopting the recently developed continuous-time exploratory control framework, we formulate the exploratory MV problem in high dimensions. We further show the optimality of a multivariate Gaussian feedback policy, with time-decaying variance, in trading off exploration and exploitation. Based on a provable policy improvement theorem, we devise a scalable and data-efficient RL algorithm and conduct large scale empirical tests using data from the S&P 500 stocks. We found that our method consistently achieves over 10% annualized returns and it outperforms econometric methods and the deep RL method by large margins, for both long and medium terms of investment with monthly and daily trading.

* 15 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:1904.11392
Click to Read Paper and Get Code
We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive control based method and a deep neural networks based algorithm by a large margin in our simulations.

* 39 pages, 5 figures
Click to Read Paper and Get Code
Since Internet is so popular and prevailing in human life, countering cyber threats, especially attack detection, is a challenging area of research in the field of cyber security. Intrusion detection systems (IDSs) are essential entities in a network topology aiming to safeguard the integrity and availability of sensitive assets in the protected systems. Although many supervised and unsupervised learning approaches from the field of machine learning and pattern recognition have been used to increase the efficacy of IDSs, it is still a problem to deal with lots of redundant and irrelevant features in high-dimension datasets for network anomaly detection. To this end, we propose a novel methodology combining the benefits of correlation-based feature selection(CFS) and bat algorithm(BA) with an ensemble classifier based on C4.5, Random Forest(RF), and Forest by Penalizing Attributes(Forest PA), which can be able to classify both common and rare types of attacks with high accuracy and efficiency. The experimental results, using a novel intrusion detection dataset, namely CIC-IDS2017, reveal that our CFS-BA-Ensemble method is able to contribute more critical features and significantly outperforms individual approaches, achieving high accuracy and low false alarm rate. Moreover, compared with the majority of the existing state-of-the-art and legacy techniques, our approach exhibits better performance under several classification metrics in the context of classification accuracy, f-measure, attack detection rate, and false alarm rate.

Click to Read Paper and Get Code
The ability to select an appropriate story ending is the first step towards perfect narrative comprehension. Story ending prediction requires not only the explicit clues within the context, but also the implicit knowledge (such as commonsense) to construct a reasonable and consistent story. However, most previous approaches do not explicitly use background commonsense knowledge. We present a neural story ending selection model that integrates three types of information: narrative sequence, sentiment evolution and commonsense knowledge. Experiments show that our model outperforms state-of-the-art approaches on a public dataset, ROCStory Cloze Task , and the performance gain from adding the additional commonsense knowledge is significant.

Click to Read Paper and Get Code
In the empirical study of evolutionary algorithms, the solution quality is evaluated by either the fitness value or approximation error. The latter measures the fitness difference between an approximation solution and the optimal solution. Since the approximation error analysis is more convenient than the direct estimation of the fitness value, this paper focuses on approximation error analysis. However, it is straightforward to extend all related results from the approximation error to the fitness value. Although the evaluation of solution quality plays an essential role in practice, few rigorous analyses have been conducted on this topic. This paper aims at establishing a novel theoretical framework of approximation error analysis of evolutionary algorithms for discrete optimization. This framework is divided into two parts. The first part is about exact expressions of the approximation error. Two methods, Jordan form and Schur's triangularization, are presented to obtain an exact expression. The second part is about upper bounds on approximation error. Two methods, convergence rate and auxiliary matrix iteration, are proposed to estimate the upper bound. The applicability of this framework is demonstrated through several examples.

Click to Read Paper and Get Code
Clustering is a fundamental problem in statistics and machine learning. Lloyd's algorithm, proposed in 1957, is still possibly the most widely used clustering algorithm in practice due to its simplicity and empirical performance. However, there has been little theoretical investigation on the statistical and computational guarantees of Lloyd's algorithm. This paper is an attempt to bridge this gap between practice and theory. We investigate the performance of Lloyd's algorithm on clustering sub-Gaussian mixtures. Under an appropriate initialization for labels or centers, we show that Lloyd's algorithm converges to an exponentially small clustering error after an order of $\log n$ iterations, where $n$ is the sample size. The error rate is shown to be minimax optimal. For the two-mixture case, we only require the initializer to be slightly better than random guess. In addition, we extend the Lloyd's algorithm and its analysis to community detection and crowdsourcing, two problems that have received a lot of attention recently in statistics and machine learning. Two variants of Lloyd's algorithm are proposed respectively for community detection and crowdsourcing. On the theoretical side, we provide statistical and computational guarantees of the two algorithms, and the results improve upon some previous signal-to-noise ratio conditions in literature for both problems. Experimental results on simulated and real data sets demonstrate competitive performance of our algorithms to the state-of-the-art methods.

Click to Read Paper and Get Code
An updated version will be uploaded later.

* This paper has been withdrawn by the authors
Click to Read Paper and Get Code
Convolutional Sparse Coding (CSC) has been attracting more and more attention in recent years, for making full use of image global correlation to improve performance on various computer vision applications. However, very few studies focus on solving CSC based image Super-Resolution (SR) problem. As a consequence, there is no significant progress in this area over a period of time. In this paper, we exploit the natural connection between CSC and Convolutional Neural Networks (CNN) to address CSC based image SR. Specifically, Convolutional Iterative Soft Thresholding Algorithm (CISTA) is introduced to solve CSC problem and it can be implemented using CNN architectures. Then we develop a novel CSC based SR framework analogy to the traditional SC based SR methods. Two models inspired by this framework are proposed for pre-/post-upsampling SR, respectively. Compared with recent state-of-the-art SR methods, both of our proposed models show superior performance in terms of both quantitative and qualitative measurements.

* 10 pages, 12 figures, 3 tables
Click to Read Paper and Get Code
The recent years have witnessed great advances for semantic segmentation using deep convolutional neural networks (DCNNs). However, a large number of convolutional layers and feature channels lead to semantic segmentation as a computationally heavy task, which is disadvantage to the scenario with limited resources. In this paper, we design an efficient symmetric network, called (ESNet), to address this problem. The whole network has nearly symmetric architecture, which is mainly composed of a series of factorized convolution unit (FCU) and its parallel counterparts (PFCU). On one hand, the FCU adopts a widely-used 1D factorized convolution in residual layers. On the other hand, the parallel version employs a transform-split-transform-merge strategy in the designment of residual module, where the split branch adopts dilated convolutions with different rate to enlarge receptive field. Our model has nearly 1.6M parameters, and is able to be performed over 62 FPS on a single GTX 1080Ti GPU. The experiments demonstrate that our approach achieves state-of-the-art results in terms of speed and accuracy trade-off for real-time semantic segmentation on CityScapes dataset.

* 12 pages, 3 figures, 4 tables, submitted. Code can be found at https://github.com/xiaoyufenfei/ESNet
Click to Read Paper and Get Code
Learning a shared dialog structure from a set of task-oriented dialogs is an important challenge in computational linguistics. The learned dialog structure can shed light on how to analyze human dialogs, and more importantly contribute to the design and evaluation of dialog systems. We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. Different from existing HMM-based models, our model is based on variational-autoencoder (VAE). Such model is able to capture more dynamics in dialogs beyond the surface forms of the language. We find that qualitatively, our method extracts meaningful dialog structure, and quantitatively, outperforms previous models on the ability to predict unseen data. We further evaluate the model's effectiveness in a downstream task, the dialog system building task. Experiments show that, by integrating the learned dialog structure into the reward function design, the model converges faster and to a better outcome in a reinforcement learning setting.

* Long paper accepted by NAACL 2019
Click to Read Paper and Get Code
We propose a simple yet effective model for Single Image Super-Resolution (SISR), by combining the merits of Residual Learning and Convolutional Sparse Coding (RL-CSC). Our model is inspired by the Learned Iterative Shrinkage-Threshold Algorithm (LISTA). We extend LISTA to its convolutional version and build the main part of our model by strictly following the convolutional form, which improves the network's interpretability. Specifically, the convolutional sparse codings of input feature maps are learned in a recursive manner, and high-frequency information can be recovered from these CSCs. More importantly, residual learning is applied to alleviate the training difficulty when the network goes deeper. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method. RL-CSC (30 layers) outperforms several recent state-of-the-arts, e.g., DRRN (52 layers) and MemNet (80 layers) in both accuracy and visual qualities. Codes and more results are available at https://github.com/axzml/RL-CSC.

* 10 pages, 8 figures, 4 tables
Click to Read Paper and Get Code