Research papers and code for "Qian Yang":
1 bit deep neural networks (DNNs), of which both the activations and weights are binarized , are attracting more and more attention due to their high computational efficiency and low memory requirement . However, the drawback of large accuracy dropping also restrict s its application. In this paper, we propose a novel Targeted Acceleration and Compression (TAC) framework to improve the performance of 1 bit deep neural networks W e consider that the acceleration and compression effects of binarizing fully connected layer s are not sufficient to compensate for the accuracy loss caused by it In the proposed framework, t he convolutional and fully connected layer are separated and optimized i ndividually . F or the convolutional layer s , both the activations and weights are binarized. For the fully connected layer s, the binarization operation is re placed by network pruning and low bit quantization. The proposed framework is implemented on the CIFAR 10, CIFAR 100 and ImageNet ( ILSVRC 12 ) datasets , and experimental results show that the proposed TAC can significantly improve the accuracy of 1 bit deep neural networks and outperforms the state of the art by more than 6 percentage points .

* 10 pages
Click to Read Paper and Get Code
Evolutionary algorithms (EAs), a large class of general purpose optimization algorithms inspired from the natural phenomena, are widely used in various industrial optimizations and often show excellent performance. This paper presents an attempt towards revealing their general power from a statistical view of EAs. By summarizing a large range of EAs into the sampling-and-learning framework, we show that the framework directly admits a general analysis on the probable-absolute-approximate (PAA) query complexity. We particularly focus on the framework with the learning subroutine being restricted as a binary classification, which results in the sampling-and-classification (SAC) algorithms. With the help of the learning theory, we obtain a general upper bound on the PAA query complexity of SAC algorithms. We further compare SAC algorithms with the uniform search in different situations. Under the error-target independence condition, we show that SAC algorithms can achieve polynomial speedup to the uniform search, but not super-polynomial speedup. Under the one-side-error condition, we show that super-polynomial speedup can be achieved. This work only touches the surface of the framework. Its power under other conditions is still open.

Click to Read Paper and Get Code
In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms (EAs) have been widely applied to tackle noisy optimization, where reducing the negative effect of noise is a crucial issue. One popular strategy to cope with noise is sampling, which evaluates the fitness multiple times and uses the sample average to approximate the true fitness. In this paper, we introduce median sampling as a noise handling strategy into EAs, which uses the median of the multiple evaluations to approximate the true fitness instead of the mean. We theoretically show that median sampling can reduce the expected running time of EAs from exponential to polynomial by considering the (1+1)-EA on OneMax under the commonly used one-bit noise. We also compare mean sampling with median sampling by considering two specific noise models, suggesting that when the 2-quantile of the noisy fitness increases with the true fitness, median sampling can be a better choice. The results provide us with some guidance to employ median sampling efficiently in practice.

* 19 pages. arXiv admin note: text overlap with arXiv:1810.05045, arXiv:1711.00956
Click to Read Paper and Get Code
Clinical decision support tools (DST) promise improved healthcare outcomes by offering data-driven insights. While effective in lab settings, almost all DSTs have failed in practice. Empirical research diagnosed poor contextual fit as the cause. This paper describes the design and field evaluation of a radically new form of DST. It automatically generates slides for clinicians' decision meetings with subtly embedded machine prognostics. This design took inspiration from the notion of "Unremarkable Computing", that by augmenting the users' routines technology/AI can have significant importance for the users yet remain unobtrusive. Our field evaluation suggests clinicians are more likely to encounter and embrace such a DST. Drawing on their responses, we discuss the importance and intricacies of finding the right level of unremarkableness in DST design, and share lessons learned in prototyping critical AI systems as a situated experience.

* CHI Conference on Human Factors in Computing Systems Proceedings 2019 (CHI'19)
Click to Read Paper and Get Code
Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has not only yielded a vast increase in information accessibility but has also accelerated the spread of fake news. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem required by all online content providers. This paper presents a survey on fake news detection. Our survey introduces the challenges of automatic fake news detection. We systematically review the datasets and NLP solutions that have been developed for this task. We also discuss the limits of these datasets and problem formulations, our insights, and recommended solutions.

* 11 pages, no figure
Click to Read Paper and Get Code
Evolutionary algorithms (EAs) are population-based general-purpose optimization algorithms, and have been successfully applied in various real-world optimization tasks. However, previous theoretical studies often employ EAs with only a parent or offspring population and focus on specific problems. Furthermore, they often only show upper bounds on the running time, while lower bounds are also necessary to get a complete understanding of an algorithm. In this paper, we analyze the running time of the ($\mu$+$\lambda$)-EA (a general population-based EA with mutation only) on the class of pseudo-Boolean functions with a unique global optimum. By applying the recently proposed switch analysis approach, we prove the lower bound $\Omega(n \ln n+ \mu + \lambda n\ln\ln n/ \ln n)$ for the first time. Particularly on the two widely-studied problems, OneMax and LeadingOnes, the derived lower bound discloses that the ($\mu$+$\lambda$)-EA will be strictly slower than the (1+1)-EA when the population size $\mu$ or $\lambda$ is above a moderate order. Our results imply that the increase of population size, while usually desired in practice, bears the risk of increasing the lower bound of the running time and thus should be carefully considered.

Click to Read Paper and Get Code
Many optimization tasks have to be handled in noisy environments, where we cannot obtain the exact evaluation of a solution but only a noisy one. For noisy optimization tasks, evolutionary algorithms (EAs), a kind of stochastic metaheuristic search algorithm, have been widely and successfully applied. Previous work mainly focuses on empirical studying and designing EAs for noisy optimization, while, the theoretical counterpart has been little investigated. In this paper, we investigate a largely ignored question, i.e., whether an optimization problem will always become harder for EAs in a noisy environment. We prove that the answer is negative, with respect to the measurement of the expected running time. The result implies that, for optimization tasks that have already been quite hard to solve, the noise may not have a negative effect, and the easier a task the more negatively affected by the noise. On a representative problem where the noise has a strong negative effect, we examine two commonly employed mechanisms in EAs dealing with noise, the re-evaluation and the threshold selection strategies. The analysis discloses that the two strategies, however, both are not effective, i.e., they do not make the EA more noise tolerant. We then find that a small modification of the threshold selection allows it to be proven as an effective strategy for dealing with the noise in the problem.

Click to Read Paper and Get Code
Evolutionary algorithms (EAs), simulating the evolution process of natural species, are used to solve optimization problems. Crossover (also called recombination), originated from simulating the chromosome exchange phenomena in zoogamy reproduction, is widely employed in EAs to generate offspring solutions, of which the effectiveness has been examined empirically in applications. However, due to the irregularity of crossover operators and the complicated interactions to mutation, crossover operators are hard to analyze and thus have few theoretical results. Therefore, analyzing crossover not only helps in understanding EAs, but also helps in developing novel techniques for analyzing sophisticated metaheuristic algorithms. In this paper, we derive the General Markov Chain Switching Theorem (GMCST) to facilitate theoretical studies of crossover-enabled EAs. The theorem allows us to analyze the running time of a sophisticated EA from an easy-to-analyze EA. Using this tool, we analyze EAs with several crossover operators on the LeadingOnes and OneMax problems, which are noticeably two well studied problems for mutation-only EAs but with few results for crossover-enabled EAs. We first derive the bounds of running time of the (2+2)-EA with crossover operators; then we study the running time gap between the mutation-only (2:2)-EA and the (2:2)-EA with crossover operators; finally, we develop strategies that apply crossover operators only when necessary, which improve from the mutation-only as well as the crossover-all-the-time (2:2)-EA. The theoretical results are verified by experiments.

Click to Read Paper and Get Code
A deep learning model is proposed for predicting block-level parking occupancy in real time. The model leverages Graph-Convolutional Neural Networks (GCNN) to extract the spatial relations of traffic flow in large-scale networks, and utilizes Recurrent Neural Networks (RNN) with Long-Short Term Memory (LSTM) to capture the temporal features. In addition, the model is capable of taking multiple heterogeneously structured traffic data sources as input, such as parking meter transactions, traffic speed, and weather conditions. The model performance is evaluated through a case study in Pittsburgh downtown area. The proposed model outperforms other baseline methods including multi-layer LSTM and Lasso with an average testing MAPE of 12.0\% when predicting block-level parking occupancies 30 minutes in advance. The case study also shows that, in generally, the prediction model works better for business areas than for recreational locations. We found that incorporating traffic speed and weather information can significantly improve the prediction performance. Weather data is particularly useful for improving predicting accuracy in recreational areas.

Click to Read Paper and Get Code
This paper focuses on a new task, i.e., transplanting a category-and-task-specific neural network to a generic, modular network without strong supervision. We design a functionally interpretable structure for the generic network. Like building LEGO blocks, we teach the generic network a new category by directly transplanting the module corresponding to the category from a pre-trained network with a few or even without sample annotations. Our method incrementally adds new categories to the generic network but does not affect representations of existing categories. In this way, our method breaks the typical bottleneck of learning a net for massive tasks and categories, i.e., the requirement of collecting samples for all tasks and categories at the same time before the learning begins. Thus, we use a new distillation algorithm, namely back-distillation, to overcome specific challenges of network transplanting. Our method without training samples even outperformed the baseline with 100 training samples.

* In AAAI-19 Workshop on Network Interpretability for Deep Learning. arXiv admin note: substantial text overlap with arXiv:1804.10272
Click to Read Paper and Get Code
In the computer research area, facial expression recognition is a hot research problem. Recent years, the research has moved from the lab environment to in-the-wild circumstances. It is challenging, especially under extreme poses. But current expression detection systems are trying to avoid the pose effects and gain the general applicable ability. In this work, we solve the problem in the opposite approach. We consider the head poses and detect the expressions within special head poses. Our work includes two parts: detect the head pose and group it into one pre-defined head pose class; do facial expression recognize within each pose class. Our experiments show that the recognition results with pose class grouping are much better than that of direct recognition without considering poses. We combine the hand-crafted features, SIFT, LBP and geometric feature, with deep learning feature as the representation of the expressions. The handcrafted features are added into the deep learning framework along with the high level deep learning features. As a comparison, we implement SVM and random forest to as the prediction models. To train and test our methodology, we labeled the face dataset with 6 basic expressions.

* Published on ICGIP2017
Click to Read Paper and Get Code
Ancient Chinese brings the wisdom and spirit culture of the Chinese nation. Automatically translation from ancient Chinese to modern Chinese helps to inherit and carry forward the quintessence of the ancients. In this paper, we propose an Ancient-Modern Chinese clause alignment approach and apply it to create a large scale Ancient-Modern Chinese parallel corpus which contains about 1.24M bilingual pairs. To our best knowledge, this is the first large high-quality Ancient-Modern Chinese dataset. Furthermore, we train the SMT and various NMT based models on this dataset and provide a strong baseline for this task

* 8 pages, 2 figures
Click to Read Paper and Get Code
Backpropagation algorithm is indispensable for the training of feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. The backward locking in backpropagation algorithm constrains us from updating network layers in parallel and fully leveraging the computing resources. Recently, several algorithms have been proposed for breaking the backward locking. However, their performances degrade seriously when networks are deep. In this paper, we propose decoupled parallel backpropagation algorithm for deep learning optimization with convergence guarantee. Firstly, we decouple the backpropagation algorithm using delayed gradients, and show that the backward locking is removed when we split the networks into multiple modules. Then, we utilize decoupled parallel backpropagation in two stochastic methods and prove that our method guarantees convergence to critical points for the non-convex problem. Finally, we perform experiments for training deep convolutional neural networks on benchmark datasets. The experimental results not only confirm our theoretical analysis, but also demonstrate that the proposed method can achieve significant speedup without loss of accuracy.

* ICML 2018
Click to Read Paper and Get Code
Motion blur, out of focus, insufficient spatial resolution, lossy compression and many other factors can all cause an image to have poor quality. However, image quality is a largely ignored issue in traditional pattern recognition literature. In this paper, we use face detection and recognition as case studies to show that image quality is an essential factor which will affect the performances of traditional algorithms. We demonstrated that it is not the image quality itself that is the most important, but rather the quality of the images in the training set should have similar quality as those in the testing set. To handle real-world application scenarios where images with different kinds and severities of degradation can be presented to the system, we have developed a quality classified image analysis framework to deal with images of mixed qualities adaptively. We use deep neural networks first to classify images based on their quality classes and then design a separate face detector and recognizer for images in each quality class. We will present experimental results to show that our quality classified framework can accurately classify images based on the type and severity of image degradations and can significantly boost the performances of state-of-the-art face detector and recognizer in dealing with image datasets containing mixed quality images.

* 6 pages
Click to Read Paper and Get Code
Computational chemistry develops fast in recent years due to the rapid growth and breakthroughs in AI. Thanks for the progress in natural language processing, researchers can extract more fine-grained knowledge in publications to stimulate the development in computational chemistry. While the works and corpora in chemical entity extraction have been restricted in the biomedicine or life science field instead of the chemistry field, we build a new corpus in chemical bond field annotated for 7 types of entities: compound, solvent, method, bond, reaction, pKa and pKa value. This paper presents a novel BERT-CRF model to build scientific chemical data chains by extracting 7 chemical entities and relations from publications. And we propose a joint model to extract the entities and relations simultaneously. Experimental results on our Chemical Special Corpus demonstrate that we achieve state-of-art and competitive NER performance.

Click to Read Paper and Get Code
Existing computational models to understand hate speech typically frame the problem as a simple classification task, bypassing the understanding of hate symbols (e.g., 14 words, kigy) and their secret connotations. In this paper, we propose a novel task of deciphering hate symbols. To do this, we leverage the Urban Dictionary and collected a new, symbol-rich Twitter corpus of hate speech. We investigate neural network latent context models for deciphering hate symbols. More specifically, we study Sequence-to-Sequence models and show how they are able to crack the ciphers based on context. Furthermore, we propose a novel Variational Decipher and show how it can generalize better to unseen hate symbols in a more challenging testing setting.

Click to Read Paper and Get Code
In noisy evolutionary optimization, sampling is a common strategy to deal with noise. By the sampling strategy, the fitness of a solution is evaluated multiple times (called \emph{sample size}) independently, and its true fitness is then approximated by the average of these evaluations. Previous studies on sampling are mainly empirical. In this paper, we first investigate the effect of sample size from a theoretical perspective. By analyzing the (1+1)-EA on the noisy LeadingOnes problem, we show that as the sample size increases, the running time can reduce from exponential to polynomial, but then return to exponential. This suggests that a proper sample size is crucial in practice. Then, we investigate what strategies can work when sampling with any fixed sample size fails. By two illustrative examples, we prove that using parent or offspring populations can be better. Finally, we construct an artificial noisy example to show that when using neither sampling nor populations is effective, adaptive sampling (i.e., sampling with an adaptive sample size) can work. This, for the first time, provides a theoretical support for the use of adaptive sampling.

Click to Read Paper and Get Code
Existing work on automated hate speech detection typically focuses on binary classification or on differentiating among a small set of categories. In this paper, we propose a novel method on a fine-grained hate speech classification task, which focuses on differentiating among 40 hate groups of 13 different hate group categories. We first explore the Conditional Variational Autoencoder (CVAE) as a discriminative model and then extend it to a hierarchical architecture to utilize the additional hate category information for more accurate prediction. Experimentally, we show that incorporating the hate category information for training can significantly improve the classification performance and our proposed model outperforms commonly-used discriminative models.

Click to Read Paper and Get Code
Radio-frequency (RF) tomographic imaging is a promising technique for inferring multi-dimensional physical space by processing RF signals traversed across a region of interest. However, conventional RF tomography schemes are generally based on vector compressed sensing, which ignores the geometric structures of the target spaces and leads to low recovery precision. The recently proposed transform-based tensor model is more appropriate for sensory data processing, as it helps exploit the geometric structures of the three-dimensional target and improve the recovery precision. In this paper, we propose a novel tensor sensing approach that achieves highly accurate estimation for real-world three-dimensional spaces. First, we use the transform-based tensor model to formulate a tensor sensing problem, and propose a fast alternating minimization algorithm called Alt-Min. Secondly, we drive an algorithm which is optimized to reduce memory and computation requirements. Finally, we present evaluation of our Alt-Min approach using IKEA 3D data and demonstrate significant improvement in recovery error and convergence speed compared to prior tensor-based compressed sensing.

* 6 pages, 4 figures
Click to Read Paper and Get Code