Models, code, and papers for "Wei Hua":

##### Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions

Jun 26, 2019
Lijun Zhang, Guanghui Wang, Wei-Wei Tu, Zhi-Hua Zhou

To deal with changing environments, a new performance measure, adaptive regret, defined as the maximum static regret over any interval, has been proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality in the sense that they can only handle one type of convex function and need a priori knowledge of parameters. By contrast, there exist universal algorithms, such as MetaGrad, that attain optimal static regret for multiple types of convex functions simultaneously. Along this line of research, this paper presents the first universal algorithm for minimizing the adaptive regret of convex functions. Specifically, we borrow the idea of maintaining multiple learning rates in MetaGrad to handle the uncertainty of functions, and utilize the technique of sleeping experts to capture changing environments. In this way, our algorithm automatically adapts to the property of functions (convex, exponentially concave, or strongly convex), as well as the nature of environments (stationary or changing). As a byproduct, it also allows the type of functions to switch between rounds.
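
For reference, the verbal definition in the first sentence (the maximum static regret over any interval) can be written out as below. The notation is an assumption made here and does not come from the abstract: $f_t$ is the loss revealed in round $t$, $\mathbf{w}_t$ is the learner's decision, $\Omega$ is the feasible domain, and $T$ is the number of rounds.

$$
\text{A-Regret}(T) \;=\; \max_{1 \le s \le e \le T} \left( \sum_{t=s}^{e} f_t(\mathbf{w}_t) \;-\; \min_{\mathbf{w} \in \Omega} \sum_{t=s}^{e} f_t(\mathbf{w}) \right)
$$

Taking $s = 1$ and $e = T$ recovers the ordinary static regret, so a small adaptive regret implies a small static regret, but not the other way around.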

##### Deep Descriptor Transforming for Image Co-Localization

Reusable model design becomes desirable with the rapid expansion of machine learning applications. In this paper, we focus on the reusability of pre-trained deep convolutional models. Specifically, different from treating pre-trained models as feature extractors, we reveal more treasures beneath convolutional layers, i.e., the convolutional activations could act as a detector for the common object in the image co-localization problem. We propose a simple but effective method, named Deep Descriptor Transforming (DDT), for evaluating the correlations of descriptors and then obtaining the category-consistent regions, which can accurately locate the common object in a set of images. Empirical studies validate the effectiveness of the proposed DDT method. On benchmark image co-localization datasets, DDT consistently outperforms existing state-of-the-art methods by a large margin. Moreover, DDT also demonstrates good generalization ability for unseen categories and robustness for dealing with noisy data.

* Accepted by IJCAI 2017
##### Learning to Simulate Human Movement

Mar 03, 2020
Hua Wei, Zhenhui Li

Modeling how humans move through space is useful for policy-making in transportation, public safety, and public health. Human movement can be viewed as a dynamic process in which humans transition between states (e.g., locations) over time. In the human world, where intelligent agents such as humans or human-driven vehicles play an important role, the states of agents mostly describe human activities, and the state transition is influenced by both human decisions and physical constraints from the real-world system (e.g., agents need to spend time to move over a certain distance). Therefore, the modeling of state transition should include the modeling of the agent's decision process and the physical system dynamics. In this paper, we propose to model state transition in human movement by learning a decision model and integrating system dynamics. In experiments on real-world datasets, we demonstrate that the proposed method achieves superior performance over state-of-the-art methods in predicting the next state and generating long-term future states.

* 9 pages, 6 figures
##### Theoretical Foundation of Co-Training and Disagreement-Based Algorithms

Aug 15, 2017
Wei Wang, Zhi-Hua Zhou

Disagreement-based approaches generate multiple classifiers and exploit the disagreement among them with unlabeled data to improve learning performance. Co-training is a representative paradigm, which trains two classifiers separately on two sufficient and redundant views. For applications where there is only one view, several successful variants of co-training have been proposed that use two different classifiers on single-view data instead of two views. For these disagreement-based approaches, several important issues remain unsolved. In this article we present theoretical analyses to address these issues, which provide a theoretical foundation for co-training and disagreement-based approaches.

##### On the Consistency of AUC Pairwise Optimization

Jul 02, 2014
Wei Gao, Zhi-Hua Zhou

AUC (area under the ROC curve) is an important evaluation criterion, which has been widely used in many learning tasks such as class-imbalance learning, cost-sensitive learning, learning to rank, etc. Many learning approaches try to optimize AUC, but owing to the non-convexity and discontinuity of AUC, almost all approaches work with surrogate loss functions. Thus, the consistency of AUC is crucial; however, it has been almost untouched before. In this paper, we provide a sufficient condition for the asymptotic consistency of learning approaches based on surrogate loss functions. Based on this result, we prove that exponential loss and logistic loss are consistent with AUC, but hinge loss is inconsistent. Then, we derive the $q$-norm hinge loss and general hinge loss that are consistent with AUC. We also derive the consistent bounds for exponential loss and logistic loss, and obtain the consistent bounds for many surrogate loss functions under the non-noise setting. Further, we disclose an equivalence between the exponential surrogate loss of AUC and the exponential surrogate loss of accuracy, and one straightforward consequence of this finding is that AdaBoost and RankBoost are equivalent.
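
To make the objects in this abstract concrete, the sketch below computes the empirical AUC as the fraction of correctly ordered (positive, negative) score pairs, alongside the pairwise risk under the three surrogate losses the abstract names. The function names, the half-credit convention for ties, and the toy scores are assumptions chosen for illustration; this is not code from the paper.

```python
import math

def auc(scores_pos, scores_neg):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties receive half credit (a common convention, assumed here).
    """
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

# Surrogate losses applied to the pairwise score difference t = f(x+) - f(x-).
SURROGATES = {
    "exponential": lambda t: math.exp(-t),
    "logistic": lambda t: math.log(1.0 + math.exp(-t)),
    "hinge": lambda t: max(0.0, 1.0 - t),
}

def pairwise_surrogate_risk(scores_pos, scores_neg, name):
    """Average surrogate loss over all (positive, negative) pairs."""
    phi = SURROGATES[name]
    total = sum(phi(sp - sn) for sp in scores_pos for sn in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))

# Toy scores (illustrative only).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

print("AUC:", auc(pos, neg))
for name in SURROGATES:
    print(name, pairwise_surrogate_risk(pos, neg, name))
```

The AUC itself is a step function of the scores, which is why it cannot be optimized directly with gradient methods. The surrogates replace the 0/1 comparison with a convex function of the score difference, and the consistency question is whether minimizing such a surrogate also maximizes AUC in the limit.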

##### Dropout Rademacher Complexity of Deep Neural Networks

Jul 02, 2014
Wei Gao, Zhi-Hua Zhou

Great successes of deep neural networks have been witnessed in various real applications. Many algorithmic and implementation techniques have been developed; however, theoretical understanding of many aspects of deep neural networks is far from clear. A particularly interesting issue is the usefulness of dropout, which was motivated by the intuition of preventing complex co-adaptation of feature detectors. In this paper, we study the Rademacher complexity of different types of dropout, and our theoretical results disclose that for shallow neural networks (with one or no hidden layer) dropout is able to reduce the Rademacher complexity polynomially, whereas for deep neural networks it can amazingly lead to an exponential reduction of the Rademacher complexity.

* 20 pages
##### On the Doubt about Margin Explanation of Boosting

Aug 28, 2013
Wei Gao, Zhi-Hua Zhou

Margin theory provides one of the most popular explanations for the success of AdaBoost, where the central point lies in the recognition that *margin* is the key for characterizing the performance of AdaBoost. This theory has been very influential, e.g., it has been used to argue that AdaBoost usually does not overfit since it tends to enlarge the margin even after the training error reaches zero. Previously the *minimum margin bound* was established for AdaBoost; however, Breiman (1999) pointed out that maximizing the minimum margin does not necessarily lead to better generalization. Later, Reyzin and Schapire (2006) emphasized that the margin distribution rather than the minimum margin is crucial to the performance of AdaBoost. In this paper, we first present the *$k$th margin bound* and further study its relationship to previous work such as the minimum margin bound and Emargin bound. Then, we improve the previous empirical Bernstein bounds (Maurer and Pontil, 2009; Audibert, Munos, and Szepesvari, 2009), and based on such findings, we defend the margin-based explanation against Breiman's doubts by proving a new generalization error bound that considers exactly the same factors as Schapire, Freund, Bartlett, and Lee (1998) but is sharper than Breiman's (1999) minimum margin bound. By incorporating factors such as average margin and variance, we present a generalization error bound that is heavily related to the whole margin distribution. We also provide margin distribution bounds for the generalization error of voting classifiers in finite VC-dimension space.

* Artificial Intelligence 203:1-18 2013
* 35 pages
##### Multi-View Active Learning in the Non-Realizable Case

Oct 29, 2010
Wei Wang, Zhi-Hua Zhou

The sample complexity of active learning under the realizability assumption has been well studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under the multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be $\widetilde{O}(\log\frac{1}{\epsilon})$, in contrast to the single-view setting, where a polynomial improvement is the best possible achievement. We also prove that in the general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is $\widetilde{O}(\frac{1}{\epsilon})$, where the order of $1/\epsilon$ is independent of the parameter in Tsybakov noise, in contrast to previous polynomial bounds, where the order of $1/\epsilon$ is related to the parameter in Tsybakov noise.

* 22 pages, 1 figure
##### 3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising

Mar 10, 2020
Kaixuan Wei, Ying Fu, Hua Huang

In this paper, we propose an alternating directional 3D quasi-recurrent neural network for hyperspectral image (HSI) denoising, which can effectively embed the domain knowledge: structural spatio-spectral correlation and global correlation along the spectrum. Specifically, 3D convolution is utilized to extract structural spatio-spectral correlation in an HSI, while a quasi-recurrent pooling function is employed to capture the global correlation along the spectrum. Moreover, an alternating directional structure is introduced to eliminate the causal dependency with no additional computation cost. The proposed model is capable of modeling spatio-spectral dependency while preserving the flexibility towards HSIs with an arbitrary number of bands. Extensive experiments on HSI denoising demonstrate significant improvement over state-of-the-art methods under various noise settings, in terms of both restoration accuracy and computation time. Our code is available at https://github.com/Vandermode/QRNN3D.

* Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2020
##### Global-Local Metamodel Assisted Two-Stage Optimization via Simulation

Oct 13, 2019
Wei Xie, Yuan Yi, Hua Zheng

To integrate strategic, tactical and operational decisions, the two-stage optimization has been widely used to guide dynamic decision making. In this paper, we study the two-stage stochastic programming for complex systems with unknown response estimated by simulation. We introduce the global-local metamodel assisted two-stage optimization via simulation that can efficiently employ the simulation resource to iteratively solve for the optimal first- and second-stage decisions. Specifically, at each visited first-stage decision, we develop a local metamodel to simultaneously solve a set of scenario-based second-stage optimization problems, which also allows us to estimate the optimality gap. Then, we construct a global metamodel accounting for the errors induced by: (1) using a finite number of scenarios to approximate the expected future cost occurring in the planning horizon, (2) second-stage optimality gap, and (3) finite visited first-stage decisions. Assisted by the global-local metamodel, we propose a new simulation optimization approach that can efficiently and iteratively search for the optimal first- and second-stage decisions. Our framework can guarantee the convergence of optimal solution for the discrete two-stage optimization with unknown objective, and the empirical study indicates that it achieves substantial efficiency and accuracy.

##### BERT-based Ranking for Biomedical Entity Normalization

Aug 09, 2019
Zongcheng Ji, Qiang Wei, Hua Xu

Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT) and BERT for Clinical Text Mining (ClinicalBERT) were recently introduced to pre-train contextualized word representation models using bidirectional Transformers, advancing the state-of-the-art for many natural language processing tasks. In this study, we proposed an entity normalization architecture by fine-tuning the pre-trained BERT / BioBERT / ClinicalBERT models and conducted extensive experiments to evaluate the effectiveness of the pre-trained models for biomedical entity normalization using three different types of datasets. Our experimental results show that the best fine-tuned models consistently outperformed previous methods and advanced the state-of-the-art for biomedical entity normalization, with up to 1.17% increase in accuracy.

* 9 pages, 1 figure, 4 tables
##### Optimal Control of a Differentially Flat 2D Spring-Loaded Inverted Pendulum Model

Nov 17, 2019
Hua Chen, Patrick M. Wensing, Wei Zhang

This paper considers the optimal control problem of an extended spring-loaded inverted pendulum (SLIP) model with two additional actuators for active leg length and hip torque modulation. These additional features arise naturally in practice, allowing for consideration of swing leg kinematics during flight and active control over stance dynamics. On the other hand, nonlinearity and the hybrid nature of the overall SLIP dynamics introduce challenges in the analysis and control of the model. In this paper, we first show that the stance dynamics of the considered SLIP model are differentially flat, which has a strong implication regarding controllability of the stance dynamics. Leveraging this powerful property, a tractable optimal control strategy is developed. This strategy enables online solution while also treating the hybrid nature of the SLIP dynamics. Together with the optimal control strategy, the extended SLIP model grants active disturbance rejection capability at any point during the gait. Performance of the proposed control strategy is demonstrated via numerical tests and shows significant advantage over existing methods.

##### A Continuously Growing Dataset of Sentential Paraphrases

Aug 01, 2017
Wuwei Lan, Siyu Qiu, Hua He, Wei Xu

A major challenge in paraphrase research is the lack of parallel corpora. In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs. The main advantage of our method is its simplicity, as it gets rid of the classifier or human in the loop needed to select data before annotation and subsequent application of paraphrase identification algorithms in the previous work. We present the largest human-labeled paraphrase corpus to date of 51,524 sentence pairs and the first cross-domain benchmarking for automatic paraphrase identification. In addition, we show that more than 30,000 new sentential paraphrases can be easily and continuously captured every month at ~70% precision, and demonstrate their utility for downstream NLP tasks through phrasal paraphrase extraction. We make our code and data freely available.
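
The collection idea described above (tweets that link to the same URL are treated as candidate paraphrases) can be sketched in a few lines. This is a minimal illustration under assumed inputs, not the authors' pipeline: the `(text, url)` tuple format, the function name `candidate_pairs`, and the sample tweets are all hypothetical, and the resulting pairs are only candidates that would still need labeling.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(tweets):
    """Group tweet texts by shared URL and emit every pair within a group.

    `tweets` is an iterable of (text, url) tuples (an assumed input format).
    Exact duplicate texts under the same URL are dropped, since identical
    strings are not interesting paraphrase candidates.
    """
    by_url = defaultdict(list)
    for text, url in tweets:
        if text not in by_url[url]:
            by_url[url].append(text)

    pairs = []
    for url, texts in by_url.items():
        for a, b in combinations(texts, 2):
            pairs.append((url, a, b))
    return pairs

# Hypothetical sample data.
sample = [
    ("Storm knocks out power across the city", "http://example.com/a"),
    ("City loses power after big storm", "http://example.com/a"),
    ("Thousands without electricity following storm", "http://example.com/a"),
    ("Storm knocks out power across the city", "http://example.com/a"),
    ("New phone released today", "http://example.com/b"),
]

for url, a, b in candidate_pairs(sample):
    print(url, "|", a, "<->", b)
```

A URL shared by n distinct tweets yields n(n-1)/2 candidate pairs, so popular links contribute most of the volume. No classifier is needed at this stage, which matches the simplicity the abstract emphasizes.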

* 11 pages, accepted to EMNLP 2017
##### A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising

Mar 28, 2020
Kaixuan Wei, Ying Fu, Jiaolong Yang, Hua Huang

Lacking rich and realistic data, learned single image denoising algorithms generalize poorly to real raw images that do not resemble the data used for training. Although the problem can be alleviated by the heteroscedastic Gaussian model for noise synthesis, the noise sources caused by digital camera electronics are still largely overlooked, despite their significant effect on raw measurement, especially under extremely low-light conditions. To address this issue, we present a highly accurate noise formation model based on the characteristics of CMOS photosensors, thereby enabling us to synthesize realistic samples that better match the physics of the image formation process. Given the proposed noise model, we additionally propose a method to calibrate the noise parameters for available modern digital cameras, which is simple and reproducible for any new device. We systematically study the generalizability of a neural network trained with existing schemes, by introducing a new low-light denoising dataset that covers many modern digital cameras from diverse brands. Extensive empirical results collectively show that by utilizing our proposed noise formation model, a network can perform as if it had been trained with rich real data, which demonstrates the effectiveness of our noise formation model.

* Accepted to CVPR 2020 (oral); code is available at https://github.com/Vandermode/NoiseModel
##### A Probabilistic Simulator of Spatial Demand for Product Allocation

Jan 09, 2020
Porter Jenkins, Hua Wei, J. Stockton Jenkins, Zhenhui Li

Connecting consumers with relevant products is a very important problem in both online and offline commerce. In physical retail, product placement is an effective way to connect consumers with products. However, selecting product locations within a store can be a tedious process. Moreover, learning important spatial patterns in offline retail is challenging due to the scarcity of data and the high cost of exploration and experimentation in the physical world. To address these challenges, we propose a stochastic model of spatial demand in physical retail. We show that the proposed model is more predictive of demand than existing baselines. We also perform a preliminary study into different automation techniques and show that an optimal product allocation policy can be learned through Deep Q-Learning.

* 8 pages, The AAAI-20 Workshop on Intelligent Process Automation
##### A Survey on Traffic Signal Control Methods

Apr 17, 2019
Hua Wei, Guanjie Zheng, Vikash Gayah, Zhenhui Li

Traffic signal control is an important and challenging real-world problem, which aims to minimize the travel time of vehicles by coordinating their movements at road intersections. Current traffic signal control systems in use still rely heavily on oversimplified information and rule-based methods, although we now have richer data, more computing power, and advanced methods to drive the development of intelligent transportation. With the growing interest in intelligent transportation using machine learning methods like reinforcement learning, this survey covers the widely acknowledged transportation approaches and a comprehensive list of recent literature on reinforcement learning for traffic signal control. We hope this survey can foster interdisciplinary research on this important topic.

* 30 pages
##### A Compositional Textual Model for Recognition of Imperfect Word Images

Nov 27, 2018
Wei Tang, John Corring, Ying Wu, Gang Hua

Printed text recognition is an important problem for industrial OCR systems. Printed text is constructed in a standard procedural fashion in most settings. We develop a mathematical model for this process that can be applied to the backward inference problem of text recognition from an image. Through ablation experiments we show that this model is realistic and that a multi-task objective setting can help to stabilize estimation of its free parameters, enabling use of conventional deep learning methods. Furthermore, by directly modeling the geometric perturbations of text synthesis we show that our model can help recover missing characters from incomplete text regions, the bane of multicomponent OCR systems, enabling recognition even when the detection returns incomplete information.

##### Matrix Linear Discriminant Analysis

Sep 24, 2018
Wei Hu, Weining Shen, Hua Zhou, Dehan Kong

We propose a novel linear discriminant analysis approach for the classification of high-dimensional matrix-valued data that commonly arises from imaging studies. Motivated by the equivalence of the conventional linear discriminant analysis and the ordinary least squares, we consider an efficient nuclear norm penalized regression that encourages a low-rank structure. Theoretical properties including a non-asymptotic risk bound and a rank consistency result are established. Simulation studies and an application to electroencephalography data show the superior performance of the proposed method over the existing approaches.

##### On the Resistance of Nearest Neighbor to Random Noisy Labels

Sep 13, 2018
Wei Gao, Bin-Bin Yang, Zhi-Hua Zhou

Nearest neighbor has always been one of the most appealing non-parametric approaches in machine learning, pattern recognition, computer vision, etc. Previous empirical studies partly show that nearest neighbor is resistant to noise, yet there is a lack of deep analysis. This work presents finite-sample and distribution-dependent bounds on the consistency of nearest neighbor in the random noise setting. The theoretical results show that, for asymmetric noises, k-nearest neighbor is robust enough to classify most data correctly, except for a handful of examples whose labels are totally misled by random noises. For symmetric noises, however, k-nearest neighbor achieves the same consistent rate as that of the noise-free setting, which verifies the resistance of k-nearest neighbor to random noisy labels. Motivated by the theoretical analysis, we propose the Robust k-Nearest Neighbor (RkNN) approach to deal with noisy labels. The basic idea is to make unilateral corrections to examples whose labels are totally misled by random noises, and classify the others directly by utilizing the robustness of k-nearest neighbor. We verify the effectiveness of the proposed algorithm both theoretically and empirically.
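
The intuition behind the resistance claim can be seen in a toy example: with a single neighbor, one flipped label decides the prediction, whereas a majority vote over several neighbors outvotes it. The sketch below implements only plain k-nearest-neighbor voting on hand-made 1-D data. It does not implement the RkNN correction step, and the data, function name, and choices of k are assumptions for illustration.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Plain k-NN: majority vote among the k training points nearest to query.

    `train` is a list of (x, label) pairs with scalar x.
    """
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two well-separated clusters (true class 0 near 0.2, true class 1 near 1.2).
# One label in each cluster has been flipped to simulate random label noise.
train = [
    (0.0, 0), (0.1, 0), (0.2, 1), (0.3, 0), (0.4, 0),   # (0.2, 1) is noisy
    (1.0, 1), (1.1, 1), (1.2, 0), (1.3, 1), (1.4, 1),   # (1.2, 0) is noisy
]

for k in (1, 5):
    print("k =", k,
          "| query 0.2 ->", knn_predict(train, 0.2, k),
          "| query 1.2 ->", knn_predict(train, 1.2, k))
```

With k = 1 both queries inherit the flipped label of their nearest point, while with k = 5 the four clean neighbors outvote it and the true class is recovered.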

* 35 pages
##### Segmentation of ultrasound images of thyroid nodule for assisting fine needle aspiration cytology

Nov 03, 2012
Jie Zhao, Wei Zheng, Li Zhang, Hua Tian

The incidence of thyroid nodules is very high and generally increases with age. A thyroid nodule may presage the emergence of thyroid cancer. A thyroid nodule can be completely cured if detected early. Fine needle aspiration cytology is a recognized method for the early diagnosis of thyroid nodules. It still has some limitations, however, and ultrasound diagnosis has become the first choice for auxiliary examination of thyroid nodular disease. If medical imaging technology and fine needle aspiration cytology could be combined, the diagnostic rate of thyroid nodules would improve significantly. The properties of ultrasound degrade image quality, which makes it difficult for physicians to recognize edges. Image segmentation based on graph theory has become a research hotspot. Normalized cut (Ncut) is a representative technique that is suitable for segmenting feature parts of medical images. However, solving the normalized cut is itself a problem: it requires large memory capacity and heavy computation of the weight matrix, and it tends to produce over-segmentation or under-segmentation, which leads to inaccurate results. The speckle noise in B-mode ultrasound images of thyroid tumors further deteriorates image quality. In light of this characteristic, we combine the anisotropic diffusion model with the normalized cut in this paper. Enhancement with the anisotropic diffusion model removes the noise in the B-mode ultrasound image while preserving the important edges and local details. This reduces the amount of computation in constructing the weight matrix of the improved normalized cut and improves the accuracy of the final segmentation results. The feasibility of the method is demonstrated by the experimental results.

* 15 pages, 13 figures