Models, code, and papers for "Qian Yang":
Throughout my Ph.D., I have been designing the user experience (UX) of various machine learning (ML) systems. In this workshop, I share two projects as case studies in which people engage with ML in much more complicated and nuanced ways than technical HCML work might assume. The first case study describes how cardiology teams in three hospitals used a clinical decision-support system that helps them decide whether and when to implant an artificial heart in a heart failure patient. I demonstrate that physicians cannot draw on their decision-making experience by seeing only patient data on paper. They are also confused by some fundamental premises upon which ML operates. For example, physicians asked: Are ML predictions made based on clinicians' best efforts? Is it ethical to make decisions based on previous patients' collective outcomes? In the second case study, my collaborators and I designed an intelligent text editor, with the goal of improving authors' writing experience with natural language processing (NLP) technologies. We prototyped a number of generative functionalities where the system provides phrase- or sentence-level writing suggestions upon user request. When writing with the prototype, however, authors shared that they need to "see where the sentence is going two paragraphs later" in order to decide whether the suggestion aligns with their writing; some even considered adopting machine suggestions to be plagiarism, which therefore "is simply wrong". By sharing these unexpected and intriguing responses from real-world ML users, I hope to start a discussion about such previously unknown complexities and nuances of, as the workshop proposal states, "putting ML at the service of people in a way that is accessible, useful, and trustworthy to all".
1-bit deep neural networks (DNNs), in which both the activations and weights are binarized, are attracting more and more attention due to their high computational efficiency and low memory requirements. However, their large accuracy drop also restricts their application. In this paper, we propose a novel Targeted Acceleration and Compression (TAC) framework to improve the performance of 1-bit deep neural networks. We consider that the acceleration and compression effects of binarizing fully connected layers are not sufficient to compensate for the accuracy loss it causes. In the proposed framework, the convolutional and fully connected layers are separated and optimized individually. For the convolutional layers, both the activations and weights are binarized. For the fully connected layers, the binarization operation is replaced by network pruning and low-bit quantization. The proposed framework is implemented on the CIFAR-10, CIFAR-100 and ImageNet (ILSVRC-12) datasets, and experimental results show that the proposed TAC can significantly improve the accuracy of 1-bit deep neural networks and outperforms the state of the art by more than 6 percentage points.
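The two treatments described in the abstract above (binarization for convolutional weights, pruning plus low-bit quantization for fully connected weights) can be sketched on a flat list of weights. This is a minimal illustration, not the TAC implementation: the scaling factor `alpha = mean(|w|)`, the magnitude-based pruning rule, the uniform symmetric quantizer, and the names `binarize` and `prune_and_quantize` are all assumptions chosen for the example.

```python
def binarize(weights):
    """Approximate weights by alpha * sign(w); alpha = mean(|w|) is an assumed scaling choice."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs


def prune_and_quantize(weights, keep_ratio=0.5, bits=4):
    """Zero the smallest-magnitude weights, then snap the survivors to a uniform signed grid."""
    n_keep = max(1, round(keep_ratio * len(weights)))
    # Indices of the n_keep largest-magnitude weights (assumed pruning criterion).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    max_abs = max(abs(weights[i]) for i in keep)
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    step = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / step) * step if i in keep else 0.0 for i, w in enumerate(weights)]


w = [0.8, -0.1, 0.05, -0.6]
alpha, signs = binarize(w)                            # "convolutional" treatment
fc = prune_and_quantize(w, keep_ratio=0.5, bits=4)    # "fully connected" treatment
```

Binarization stores one sign bit per weight plus a single scale, while the pruned and quantized version keeps a few bits for each surviving weight, which is the kind of trade-off the abstract argues suits fully connected layers better.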
Evolutionary algorithms (EAs), a large class of general-purpose optimization algorithms inspired by natural phenomena, are widely used in various industrial optimization tasks and often show excellent performance. This paper presents an attempt towards revealing their general power from a statistical view of EAs. By summarizing a large range of EAs into the sampling-and-learning framework, we show that the framework directly admits a general analysis of the probable-absolute-approximate (PAA) query complexity. We particularly focus on the framework with the learning subroutine restricted to binary classification, which results in the sampling-and-classification (SAC) algorithms. With the help of learning theory, we obtain a general upper bound on the PAA query complexity of SAC algorithms. We further compare SAC algorithms with uniform search in different situations. Under the error-target independence condition, we show that SAC algorithms can achieve polynomial speedup over uniform search, but not super-polynomial speedup. Under the one-side-error condition, we show that super-polynomial speedup can be achieved. This work only touches the surface of the framework. Its power under other conditions is still open.
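The sampling-and-classification loop described above can be sketched as follows. This is only one possible instance under stated assumptions, not the algorithm analyzed in the paper: the search space is taken to be the unit cube, the "classifier" is an axis-aligned box around the best samples, and the function name `sac_minimize` and its parameters (`keep`, `explore`) are hypothetical.

```python
import random


def sac_minimize(f, dim, rounds=20, samples_per_round=30, keep=5, explore=0.1, seed=0):
    """Sampling-and-classification sketch for minimizing f on [0, 1]^dim."""
    rng = random.Random(seed)

    def uniform():
        return [rng.random() for _ in range(dim)]

    # Round 0: uniform samples, each costing one query to f.
    evaluated = [(f(x), x) for x in (uniform() for _ in range(samples_per_round))]
    for _ in range(rounds):
        evaluated.sort(key=lambda pair: pair[0])
        # Label the `keep` best samples as positive, the rest as negative.
        positives = [x for _, x in evaluated[:keep]]
        # Assumed classifier: the bounding box of the positive samples.
        lo = [min(x[j] for x in positives) for j in range(dim)]
        hi = [max(x[j] for x in positives) for j in range(dim)]
        batch = []
        for _ in range(samples_per_round):
            if rng.random() < explore:
                x = uniform()  # occasional uniform sample from the whole space
            else:
                x = [rng.uniform(lo[j], hi[j]) for j in range(dim)]  # sample the positive region
            batch.append((f(x), x))
        evaluated = evaluated[:keep] + batch
    return min(evaluated, key=lambda pair: pair[0])


def sphere(x):
    return sum((v - 0.5) ** 2 for v in x)


best_value, best_x = sac_minimize(sphere, dim=3)
```

Setting `explore=1.0` turns the loop into plain uniform search, which is the baseline the abstract compares against.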
In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms (EAs) have been widely applied to tackle noisy optimization, where reducing the negative effect of noise is a crucial issue. One popular strategy to cope with noise is sampling, which evaluates the fitness multiple times and uses the sample average to approximate the true fitness. In this paper, we introduce median sampling as a noise handling strategy into EAs, which uses the median of the multiple evaluations to approximate the true fitness instead of the mean. We theoretically show that median sampling can reduce the expected running time of EAs from exponential to polynomial by considering the (1+1)-EA on OneMax under the commonly used one-bit noise. We also compare mean sampling with median sampling by considering two specific noise models, suggesting that when the 2-quantile of the noisy fitness increases with the true fitness, median sampling can be a better choice. The results provide us with some guidance to employ median sampling efficiently in practice.
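To make the sampling strategies above concrete, here is a minimal sketch of the (1+1)-EA on OneMax with sampled fitness under one-bit noise. It assumes the usual form of one-bit noise (with probability `p`, one uniformly chosen bit is flipped before evaluation) and re-evaluates both parent and offspring in every iteration; the function names and the stopping check on the true fitness are illustrative choices, not taken from the paper.

```python
import random
import statistics


def noisy_onemax(x, p, rng):
    """OneMax under one-bit noise: with probability p, one random bit is flipped before counting."""
    value = sum(x)
    if rng.random() < p:
        i = rng.randrange(len(x))
        value += 1 - 2 * x[i]  # flipping a 0 adds one, flipping a 1 removes one
    return value


def sampled_fitness(x, k, p, rng, aggregate=statistics.median):
    """Evaluate x k times and aggregate; pass statistics.mean for mean sampling."""
    return aggregate([noisy_onemax(x, p, rng) for _ in range(k)])


def one_plus_one_ea(n, k, p, max_iters, seed=0, aggregate=statistics.median):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        if sum(x) == n:  # demo-only stopping check on the true fitness
            break
        # Bit-wise mutation with rate 1/n.
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        if sampled_fitness(y, k, p, rng, aggregate) >= sampled_fitness(x, k, p, rng, aggregate):
            x = y
    return x
```

Under one-bit noise each noisy value differs from the true fitness by exactly one when noise occurs, so with an odd `k` the median returns the true fitness whenever more than half of the `k` evaluations are noise-free, while the mean is shifted by every noisy evaluation.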
Accurate pedestrian orientation estimation in autonomous driving helps the ego vehicle infer the intentions of pedestrians in the surrounding environment, which is the basis of safety measures such as collision avoidance and prewarning. However, because pedestrians are relatively small and highly deformable, common pedestrian orientation estimation models fail to extract sufficient and comprehensive information from them, which restricts their performance. This is especially true of monocular models, which cannot obtain depth information about objects and the surrounding environment. In this paper, a novel monocular pedestrian orientation estimation model, called FFNet, is proposed. Apart from camera captures, the model adds the 2D and 3D dimensions of pedestrians as two other inputs, according to the logical relationship between them and orientation. The 2D and 3D dimensions of pedestrians are determined from the camera captures and further utilized through two feedforward links connected to the orientation estimator. The feedforward links strengthen the logicality and interpretability of the network structure of the proposed model. Experiments show that the proposed model achieves an AOS at least 1.72% higher than most state-of-the-art models after identical training processes. The model also has competitive results in orientation estimation evaluation on the KITTI dataset.
Clinical decision support tools (DST) promise improved healthcare outcomes by offering data-driven insights. While effective in lab settings, almost all DSTs have failed in practice. Empirical research diagnosed poor contextual fit as the cause. This paper describes the design and field evaluation of a radically new form of DST. It automatically generates slides for clinicians' decision meetings with subtly embedded machine prognostics. This design took inspiration from the notion of "Unremarkable Computing", that by augmenting the users' routines technology/AI can have significant importance for the users yet remain unobtrusive. Our field evaluation suggests clinicians are more likely to encounter and embrace such a DST. Drawing on their responses, we discuss the importance and intricacies of finding the right level of unremarkableness in DST design, and share lessons learned in prototyping critical AI systems as a situated experience.
Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has not only yielded a vast increase in information accessibility but has also accelerated the spread of fake news. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem required by all online content providers. This paper presents a survey on fake news detection. Our survey introduces the challenges of automatic fake news detection. We systematically review the datasets and NLP solutions that have been developed for this task. We also discuss the limits of these datasets and problem formulations, our insights, and recommended solutions.
Evolutionary algorithms (EAs) are population-based general-purpose optimization algorithms, and have been successfully applied in various real-world optimization tasks. However, previous theoretical studies often employ EAs with only a parent or offspring population and focus on specific problems. Furthermore, they often only show upper bounds on the running time, while lower bounds are also necessary to get a complete understanding of an algorithm. In this paper, we analyze the running time of the ($\mu$+$\lambda$)-EA (a general population-based EA with mutation only) on the class of pseudo-Boolean functions with a unique global optimum. By applying the recently proposed switch analysis approach, we prove the lower bound $\Omega(n \ln n+ \mu + \lambda n\ln\ln n/ \ln n)$ for the first time. Particularly on the two widely-studied problems, OneMax and LeadingOnes, the derived lower bound discloses that the ($\mu$+$\lambda$)-EA will be strictly slower than the (1+1)-EA when the population size $\mu$ or $\lambda$ is above a moderate order. Our results imply that the increase of population size, while usually desired in practice, bears the risk of increasing the lower bound of the running time and thus should be carefully considered.
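For readers unfamiliar with the algorithm analyzed above, the following is a minimal sketch of a mutation-only ($\mu$+$\lambda$)-EA on OneMax. Uniform parent selection, bit-wise mutation with rate 1/n, and the tie-breaking that follows from Python's stable sort are assumptions of this sketch; the paper's exact algorithm definition may differ in such details.

```python
import random


def mu_plus_lambda_ea(n, mu, lam, max_evals=100_000, seed=0):
    """Run a (mu+lambda)-EA on OneMax; return the best solution and the evaluation count."""
    rng = random.Random(seed)
    fitness = sum  # OneMax: number of one-bits
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    evals = mu  # one notional evaluation per initial solution
    while evals < max_evals and max(map(fitness, population)) < n:
        offspring = []
        for _ in range(lam):
            parent = rng.choice(population)  # assumed: uniform parent selection
            child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
            offspring.append(child)
        evals += lam  # one notional evaluation per offspring
        # Plus selection: keep the best mu of parents and offspring together.
        population = sorted(population + offspring, key=fitness, reverse=True)[:mu]
    return max(population, key=fitness), evals


best, evals = mu_plus_lambda_ea(n=20, mu=3, lam=3, seed=1)
```

The evaluation counter grows by $\lambda$ per generation regardless of how much progress the generation makes, which gives some intuition for why the lower bound in the abstract contains terms that grow with $\mu$ and $\lambda$.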
Many optimization tasks have to be handled in noisy environments, where we cannot obtain the exact evaluation of a solution but only a noisy one. For noisy optimization tasks, evolutionary algorithms (EAs), a kind of stochastic metaheuristic search algorithm, have been widely and successfully applied. Previous work mainly focuses on empirically studying and designing EAs for noisy optimization, while the theoretical counterpart has been little investigated. In this paper, we investigate a largely ignored question, i.e., whether an optimization problem will always become harder for EAs in a noisy environment. We prove that the answer is negative, with respect to the measurement of the expected running time. The result implies that, for optimization tasks that are already quite hard to solve, the noise may not have a negative effect, and that the easier a task is, the more negatively it is affected by the noise. On a representative problem where the noise has a strong negative effect, we examine two mechanisms commonly employed in EAs to deal with noise, the re-evaluation and the threshold selection strategies. The analysis discloses, however, that neither strategy is effective, i.e., they do not make the EA more noise tolerant. We then find that a small modification of the threshold selection allows it to be proven as an effective strategy for dealing with the noise in the problem.
Evolutionary algorithms (EAs), which simulate the evolution process of natural species, are used to solve optimization problems. Crossover (also called recombination), which originated from simulating the chromosome exchange phenomena in zoogamy reproduction, is widely employed in EAs to generate offspring solutions, and its effectiveness has been examined empirically in applications. However, due to the irregularity of crossover operators and their complicated interactions with mutation, crossover operators are hard to analyze and thus have few theoretical results. Therefore, analyzing crossover not only helps in understanding EAs, but also helps in developing novel techniques for analyzing sophisticated metaheuristic algorithms. In this paper, we derive the General Markov Chain Switching Theorem (GMCST) to facilitate theoretical studies of crossover-enabled EAs. The theorem allows us to analyze the running time of a sophisticated EA from an easy-to-analyze EA. Using this tool, we analyze EAs with several crossover operators on the LeadingOnes and OneMax problems, which are notably two well-studied problems for mutation-only EAs but with few results for crossover-enabled EAs. We first derive the bounds of running time of the (2+2)-EA with crossover operators; then we study the running time gap between the mutation-only (2:2)-EA and the (2:2)-EA with crossover operators; finally, we develop strategies that apply crossover operators only when necessary, which improve on the mutation-only as well as the crossover-all-the-time (2:2)-EA. The theoretical results are verified by experiments.
Source number detection is a critical problem in array signal processing. Conventional model-driven methods, e.g., Akaike's information criterion (AIC) and minimum description length (MDL), suffer from severe performance degradation when the number of snapshots is small or the signal-to-noise ratio (SNR) is low. In this paper, we exploit a model-aided deep neural network (DNN) to estimate the source number. Specifically, we first propose the eigenvalue-based regression network (ERNet) and classification network (ECNet) to estimate the number of non-coherent sources, where the eigenvalues of the received signal covariance matrix and the source number are used as the input and the supervision label of the networks, respectively. Then, we extend ERNet and ECNet to estimate the number of coherent sources, where the forward-backward spatial smoothing (FBSS) scheme is adopted to improve their performance. Numerical results demonstrate the outstanding performance of ERNet and ECNet over the conventional AIC and MDL methods as well as their excellent generalization capability, which also shows their great potential for practical applications.
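The conventional AIC and MDL baselines mentioned above both operate on the eigenvalues of the sample covariance matrix, the same input that ERNet and ECNet take. The sketch below follows the standard eigenvalue-based formulation of the two criteria and is not code from the paper; the function name `estimate_source_number` and the example eigenvalues are made up for illustration.

```python
import math


def estimate_source_number(eigenvalues, num_snapshots):
    """Return (AIC estimate, MDL estimate) of the source number from covariance eigenvalues.

    All eigenvalues must be strictly positive.
    """
    lam = sorted(eigenvalues, reverse=True)
    m = len(lam)
    aic, mdl = [], []
    for k in range(m):  # candidate source numbers 0 .. m-1
        noise = lam[k:]  # the m-k smallest eigenvalues
        r = m - k
        log_geo = sum(math.log(v) for v in noise) / r   # log of the geometric mean
        log_arith = math.log(sum(noise) / r)            # log of the arithmetic mean
        log_lik = num_snapshots * r * (log_geo - log_arith)  # at most 0 by the AM-GM inequality
        free_params = k * (2 * m - k)
        aic.append(-2.0 * log_lik + 2.0 * free_params)
        mdl.append(-log_lik + 0.5 * free_params * math.log(num_snapshots))
    return aic.index(min(aic)), mdl.index(min(mdl))


# Two dominant eigenvalues above a flat noise floor: both criteria select 2 here.
print(estimate_source_number([10.0, 8.0, 1.0, 1.0, 1.0, 1.0], num_snapshots=100))
```

With few snapshots or low SNR the sample eigenvalues of the noise subspace spread out and the signal eigenvalues sink toward them, which is the regime where the abstract reports these criteria degrading.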
There are two major paradigms of white-box adversarial attacks that attempt to impose input perturbations. The first paradigm, called the fix-perturbation attack, crafts adversarial samples within a given perturbation level. The second paradigm, called the zero-confidence attack, finds the smallest perturbation needed to cause misclassification, also known as the margin of an input feature. While the former paradigm is well-resolved, the latter is not. Existing zero-confidence attacks either introduce significant approximation errors, or are too time-consuming. We therefore propose MARGINATTACK, a zero-confidence attack framework that is able to compute the margin with improved accuracy and efficiency. Our experiments show that MARGINATTACK is able to compute a smaller margin than the state-of-the-art zero-confidence attacks, and matches the state-of-the-art fix-perturbation attacks. In addition, it runs significantly faster than the Carlini-Wagner attack, currently the most accurate zero-confidence attack algorithm.
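The notion of margin used above has a closed form in one special case, which may help fix ideas: for a linear binary classifier with decision function $w^\top x + b$, the smallest L2 perturbation that reaches the decision boundary has norm $|w^\top x + b| / \|w\|_2$. The sketch below covers only this linear case and is not MARGINATTACK, which targets general nonlinear classifiers; the function name `linear_margin` is hypothetical.

```python
import math


def linear_margin(w, b, x):
    """L2 margin of x for the classifier sign(w.x + b), plus its projection onto the boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    margin = abs(score) / math.sqrt(norm_sq)
    # Move x along -w (or +w) just far enough to make the score zero.
    x_boundary = [xi - score / norm_sq * wi for wi, xi in zip(w, x)]
    return margin, x_boundary


margin, x_boundary = linear_margin(w=[3.0, 4.0], b=-5.0, x=[3.0, 4.0])
```

For a nonlinear classifier no such formula exists, so a zero-confidence attack has to search for the nearest boundary point iteratively.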
A deep learning model is proposed for predicting block-level parking occupancy in real time. The model leverages Graph-Convolutional Neural Networks (GCNN) to extract the spatial relations of traffic flow in large-scale networks, and utilizes Recurrent Neural Networks (RNN) with Long-Short Term Memory (LSTM) to capture the temporal features. In addition, the model is capable of taking multiple heterogeneously structured traffic data sources as input, such as parking meter transactions, traffic speed, and weather conditions. The model performance is evaluated through a case study in the downtown Pittsburgh area. The proposed model outperforms other baseline methods including multi-layer LSTM and Lasso, with an average testing MAPE of 12.0\% when predicting block-level parking occupancies 30 minutes in advance. The case study also shows that, in general, the prediction model works better for business areas than for recreational locations. We found that incorporating traffic speed and weather information can significantly improve the prediction performance. Weather data is particularly useful for improving prediction accuracy in recreational areas.
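The MAPE figure quoted above is the mean absolute percentage error. A minimal sketch of the usual definition follows; the paper may aggregate over blocks and time steps differently, and the sample numbers are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Two hypothetical occupancy readings, each off by 10 percent of the true value.
error = mape(actual=[100.0, 200.0], predicted=[90.0, 220.0])
```

Because each error is divided by the true value, MAPE weights a miss on a nearly empty block much more heavily than the same absolute miss on a busy one.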
This paper focuses on a new task, i.e., transplanting a category-and-task-specific neural network to a generic, modular network without strong supervision. We design a functionally interpretable structure for the generic network. Like building with LEGO blocks, we teach the generic network a new category by directly transplanting the module corresponding to the category from a pre-trained network, with few or even no sample annotations. Our method incrementally adds new categories to the generic network but does not affect representations of existing categories. In this way, our method breaks the typical bottleneck of learning a net for massive tasks and categories, i.e., the requirement of collecting samples for all tasks and categories at the same time before the learning begins. Thus, we use a new distillation algorithm, namely back-distillation, to overcome specific challenges of network transplanting. Our method without training samples even outperformed the baseline with 100 training samples.
In computing research, facial expression recognition is a hot research problem. In recent years, the research has moved from the lab environment to in-the-wild circumstances. The problem is challenging, especially under extreme poses. Current expression detection systems try to avoid pose effects in order to remain generally applicable. In this work, we approach the problem from the opposite direction: we take head poses into account and detect expressions within specific head poses. Our work includes two parts: detecting the head pose and grouping it into one of several pre-defined head pose classes, and performing facial expression recognition within each pose class. Our experiments show that the recognition results with pose class grouping are much better than those of direct recognition without considering poses. We combine hand-crafted features (SIFT, LBP and geometric features) with deep learning features as the representation of the expressions. The hand-crafted features are added into the deep learning framework along with the high-level deep learning features. As a comparison, we implement SVM and random forest as the prediction models. To train and test our methodology, we labeled the face dataset with 6 basic expressions.
Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to inherit and carry forward the quintessence of the ancients. In this paper, we propose an Ancient-Modern Chinese clause alignment approach and apply it to create a large-scale Ancient-Modern Chinese parallel corpus which contains about 1.24M bilingual pairs. To the best of our knowledge, this is the first large high-quality Ancient-Modern Chinese dataset. Furthermore, we train SMT and various NMT-based models on this dataset and provide a strong baseline for this task.
The backpropagation algorithm is indispensable for the training of feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. The backward locking in the backpropagation algorithm prevents us from updating network layers in parallel and fully leveraging computing resources. Recently, several algorithms have been proposed for breaking the backward locking. However, their performance degrades seriously when networks are deep. In this paper, we propose a decoupled parallel backpropagation algorithm for deep learning optimization with a convergence guarantee. First, we decouple the backpropagation algorithm using delayed gradients, and show that the backward locking is removed when we split the networks into multiple modules. Then, we utilize decoupled parallel backpropagation in two stochastic methods and prove that our method guarantees convergence to critical points for the non-convex problem. Finally, we perform experiments for training deep convolutional neural networks on benchmark datasets. The experimental results not only confirm our theoretical analysis, but also demonstrate that the proposed method can achieve significant speedup without loss of accuracy.
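The key ingredient named above is the delayed gradient: a module is updated with a gradient computed a few iterations earlier, so it does not have to wait for the current backward pass. The toy sketch below illustrates only that ingredient, on a one-dimensional quadratic, and is not the decoupled parallel backpropagation algorithm itself; the function name, the objective, and the way the first `delay` steps are seeded are assumptions made for the example.

```python
from collections import deque


def delayed_gradient_descent(w0=1.0, lr=0.1, delay=2, steps=200):
    """Minimize f(w) = 0.5 * w**2 using, at each step, the gradient from `delay` steps earlier."""
    w = w0
    # Gradients waiting to be applied; seeded with the initial gradient (an assumption).
    pending = deque([w0] * delay)
    for _ in range(steps):
        pending.append(w)          # gradient of 0.5 * w**2 at the current iterate is w
        stale = pending.popleft()  # gradient that was computed `delay` steps ago
        w -= lr * stale
    return w


w_stale = delayed_gradient_descent(delay=2)   # stale gradients
w_fresh = delayed_gradient_descent(delay=0)   # ordinary gradient descent
```

With this small step size both variants converge to the minimizer at zero, the stale one somewhat more slowly; with a large step size the delayed variant can diverge where ordinary gradient descent would not, which is one reason a convergence guarantee for the delayed setting is not automatic.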
Motion blur, out-of-focus blur, insufficient spatial resolution, lossy compression and many other factors can all cause an image to have poor quality. However, image quality is a largely ignored issue in the traditional pattern recognition literature. In this paper, we use face detection and recognition as case studies to show that image quality is an essential factor which affects the performance of traditional algorithms. We demonstrate that what matters most is not the image quality itself, but rather that the images in the training set have similar quality to those in the testing set. To handle real-world application scenarios where images with different kinds and severities of degradation can be presented to the system, we have developed a quality classified image analysis framework to deal with images of mixed qualities adaptively. We first use deep neural networks to classify images based on their quality classes and then design a separate face detector and recognizer for images in each quality class. We present experimental results showing that our quality classified framework can accurately classify images based on the type and severity of image degradations and can significantly boost the performance of state-of-the-art face detectors and recognizers in dealing with image datasets containing mixed-quality images.