Models, code, and papers for "Jun Zhao":

Adaptive Statistical Learning with Bayesian Differential Privacy

Nov 02, 2019
Jun Zhao

In statistical learning, a dataset is often partitioned into two parts: the training set and the holdout (i.e., testing) set. For instance, the training set is used to learn a predictor, and then the holdout set is used for estimating the accuracy of the predictor on the true distribution. However, often in practice, the holdout dataset is reused and the estimates tested on the holdout dataset are chosen adaptively based on the results of prior estimates, leading to that the predictor may become dependent of the holdout set. Hence, overfitting may occur, and the learned models may not generalize well to the unseen datasets. Prior studies have established connections between the stability of a learning algorithm and its ability to generalize, but the traditional generalization is not robust to adaptive composition. Recently, Dwork et al. in NIPS, STOC, and Science 2015 show that the holdout dataset from i.i.d. data samples can be reused in adaptive statistical learning, if the estimates are perturbed and coordinated using techniques developed for differential privacy, which is a widely used notion to quantify privacy. Yet, the results of Dwork et al. are applicable to only the case of i.i.d. samples. In contrast, correlations between data samples exist because of various behavioral, social, and genetic relationships between users. Our results in adaptive statistical learning generalize the results of Dwork et al. for i.i.d. data samples to arbitrarily correlated data. Specifically, we show that the holdout dataset from correlated samples can be reused in adaptive statistical learning, if the estimates are perturbed and coordinated using techniques developed for Bayesian differential privacy, which is a privacy notion recently introduced by Yang et al. in SIGMOD 2015 to broaden the application scenarios of differential privacy when data records are correlated.

* WPES '17 Proceedings of the 2017 on Workshop on Privacy in the Electronic Society, held in conjunction with ACM SIGSAC 25th Annual Conference on Computer and Communications Security (CCS), Dallas, Texas, US, October 2017 

  Click for Model/Code and Paper
Cost-Effective Training of Deep CNNs with Active Model Adaptation

Jun 05, 2018
Sheng-Jun Huang, Jia-Wei Zhao, Zhao-Yang Liu

Deep convolutional neural networks have achieved great success in various applications. However, training an effective DNN model for a specific task is rather challenging because it requires a prior knowledge or experience to design the network architecture, repeated trial-and-error process to tune the parameters, and a large set of labeled data to train the model. In this paper, we propose to overcome these challenges by actively adapting a pre-trained model to a new task with less labeled examples. Specifically, the pre-trained model is iteratively fine tuned based on the most useful examples. The examples are actively selected based on a novel criterion, which jointly estimates the potential contribution of an instance on optimizing the feature representation as well as improving the classification model for the target task. On one hand, the pre-trained model brings plentiful information from its original task, avoiding redesign of the network architecture or training from scratch; and on the other hand, the labeling cost can be significantly reduced by active label querying. Experiments on multiple datasets and different pre-trained models demonstrate that the proposed approach can achieve cost-effective training of DNNs.

* 9 pages 

  Click for Model/Code and Paper
Safe and Efficient Screening For Sparse Support Vector Machine

Oct 30, 2013
Zheng Zhao, Jun Liu

Screening is an effective technique for speeding up the training process of a sparse learning model by removing the features that are guaranteed to be inactive the process. In this paper, we present a efficient screening technique for sparse support vector machine based on variational inequality. The technique is both efficient and safe.

  Click for Model/Code and Paper
Quantum transport senses community structure in networks

Jan 12, 2018
Chenchao Zhao, Jun S. Song

Quantum time evolution exhibits rich physics, attributable to the interplay between the density and phase of a wave function. However, unlike classical heat diffusion, the wave nature of quantum mechanics has not yet been extensively explored in modern data analysis. We propose that the Laplace transform of quantum transport (QT) can be used to construct an ensemble of maps from a given complex network to a circle $S^1$, such that closely-related nodes on the network are grouped into sharply concentrated clusters on $S^1$. The resulting QT clustering (QTC) algorithm is as powerful as the state-of-the-art spectral clustering in discerning complex geometric patterns and more robust when clusters show strong density variations or heterogeneity in size. The observed phenomenon of QTC can be interpreted as a collective behavior of the microscopic nodes that evolve as macroscopic cluster orbitals in an effective tight-binding model recapitulating the network. Python source code implementing the algorithm and examples are available at

* Phys. Rev. E 98, 022301 (2018) 

  Click for Model/Code and Paper
Exact heat kernel on a hypersphere and its applications in kernel SVM

Nov 20, 2017
Chenchao Zhao, Jun S. Song

Many contemporary statistical learning methods assume a Euclidean feature space. This paper presents a method for defining similarity based on hyperspherical geometry and shows that it often improves the performance of support vector machine compared to other competing similarity measures. Specifically, the idea of using heat diffusion on a hypersphere to measure similarity has been previously proposed, demonstrating promising results based on a heuristic heat kernel obtained from the zeroth order parametrix expansion; however, how well this heuristic kernel agrees with the exact hyperspherical heat kernel remains unknown. This paper presents a higher order parametrix expansion of the heat kernel on a unit hypersphere and discusses several problems associated with this expansion method. We then compare the heuristic kernel with an exact form of the heat kernel expressed in terms of a uniformly and absolutely convergent series in high-dimensional angular momentum eigenmodes. Being a natural measure of similarity between sample points dwelling on a hypersphere, the exact kernel often shows superior performance in kernel SVM classifications applied to text mining, tumor somatic mutation imputation, and stock market analysis.

  Click for Model/Code and Paper
A Survey of Simultaneous Localization and Mapping

Aug 24, 2019
Baichuan Huang, Jun Zhao, Jingbin Liu

Simultaneous Localization and Mapping (SLAM) achieves the purpose of simultaneous positioning and map construction based on self-perception. The paper makes an overview in SLAM including Lidar SLAM, visual SLAM, and their fusion. For Lidar or visual SLAM, the survey illustrates the basic type and product of sensors, open source system in sort and history, deep learning embedded, the challenge and future. Additionally, visual inertial odometry is supplemented. For Lidar and visual fused SLAM, the paper highlights the multi-sensors calibration, the fusion in hardware, data, task layer. The open question and forward thinking end the paper. The contributions of this paper can be summarized as follows: the paper provides a high quality and full-scale overview in SLAM. It's very friendly for new researchers to hold the development of SLAM and learn it very obviously. Also, the paper can be considered as dictionary for experienced researchers to search and find new interested orientation.

  Click for Model/Code and Paper
Successive Ray Refinement and Its Application to Coordinate Descent for LASSO

Dec 17, 2015
Jun Liu, Zheng Zhao, Ruiwen Zhang

Coordinate descent is one of the most popular approaches for solving Lasso and its extensions due to its simplicity and efficiency. When applying coordinate descent to solving Lasso, we update one coordinate at a time while fixing the remaining coordinates. Such an update, which is usually easy to compute, greedily decreases the objective function value. In this paper, we aim to improve its computational efficiency by reducing the number of coordinate descent iterations. To this end, we propose a novel technique called Successive Ray Refinement (SRR). SRR makes use of the following ray continuation property on the successive iterations: for a particular coordinate, the value obtained in the next iteration almost always lies on a ray that starts at its previous iteration and passes through the current iteration. Motivated by this ray-continuation property, we propose that coordinate descent be performed not directly on the previous iteration but on a refined search point that has the following properties: on one hand, it lies on a ray that starts at a history solution and passes through the previous iteration, and on the other hand, it achieves the minimum objective function value among all the points on the ray. We propose two schemes for defining the search point and show that the refined search point can be efficiently obtained. Empirical results for real and synthetic data sets show that the proposed SRR can significantly reduce the number of coordinate descent iterations, especially for small Lasso regularization parameters.

* 26 pages, 6 figures, 6 tables 

  Click for Model/Code and Paper
Fast Asynchronous Parallel Stochastic Gradient Decent

Aug 24, 2015
Shen-Yi Zhao, Wu-Jun Li

Stochastic gradient descent~(SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a fast asynchronous parallel SGD method, called AsySVRG, by designing an asynchronous strategy to parallelize the recently proposed SGD variant called stochastic variance reduced gradient~(SVRG). Both theoretical and empirical results show that AsySVRG can outperform existing state-of-the-art parallel SGD methods like Hogwild! in terms of convergence rate and computation cost.

  Click for Model/Code and Paper
Copy-Enhanced Heterogeneous Information Learning for Dialogue State Tracking

Aug 21, 2019
Qingbin Liu, Shizhu He, Kang Liu, Shengping Liu, Jun Zhao

Dialogue state tracking (DST) is an essential component in task-oriented dialogue systems, which estimates user goals at every dialogue turn. However, most previous approaches usually suffer from the following problems. Many discriminative models, especially end-to-end (E2E) models, are difficult to extract unknown values that are not in the candidate ontology; previous generative models, which can extract unknown values from utterances, degrade the performance due to ignoring the semantic information of pre-defined ontology. Besides, previous generative models usually need a hand-crafted list to normalize the generated values. How to integrate the semantic information of pre-defined ontology and dialogue text (heterogeneous texts) to generate unknown values and improve performance becomes a severe challenge. In this paper, we propose a Copy-Enhanced Heterogeneous Information Learning model with multiple encoder-decoder for DST (CEDST), which can effectively generate all possible values including unknown values by copying values from heterogeneous texts. Meanwhile, CEDST can effectively decompose the large state space into several small state spaces through multi-encoder, and employ multi-decoder to make full use of the reduced spaces to generate values. Multi-encoder-decoder architecture can significantly improve performance. Experiments show that CEDST can achieve state-of-the-art results on two datasets and our constructed datasets with many unknown values.

* 12 pages, 4 figures 

  Click for Model/Code and Paper
A Boosting Framework of Factorization Machine

Apr 17, 2018
Longfei Li, Peilin Zhao, Jun Zhou, Xiaolong Li

Recently, Factorization Machines (FM) has become more and more popular for recommendation systems, due to its effectiveness in finding informative interactions between features. Usually, the weights for the interactions is learnt as a low rank weight matrix, which is formulated as an inner product of two low rank matrices. This low rank can help improve the generalization ability of Factorization Machines. However, to choose the rank properly, it usually needs to run the algorithm for many times using different ranks, which clearly is inefficient for some large-scale datasets. To alleviate this issue, we propose an Adaptive Boosting framework of Factorization Machines (AdaFM), which can adaptively search for proper ranks for different datasets without re-training. Instead of using a fixed rank for FM, the proposed algorithm will adaptively gradually increases its rank according to its performance until the performance does not grow, using boosting strategy. To verify the performance of our proposed framework, we conduct an extensive set of experiments on many real-world datasets. Encouraging empirical results shows that the proposed algorithms are generally more effective than state-of-the-art other Factorization Machines.

  Click for Model/Code and Paper
Generating Questions for Knowledge Bases via Incorporating Diversified Contexts and Answer-Aware Loss

Oct 29, 2019
Cao Liu, Kang Liu, Shizhu He, Zaiqing Nie, Jun Zhao

We tackle the task of question generation over knowledge bases. Conventional methods for this task neglect two crucial research issues: 1) the given predicate needs to be expressed; 2) the answer to the generated question needs to be definitive. In this paper, we strive toward the above two issues via incorporating diversified contexts and answer-aware loss. Specifically, we propose a neural encoder-decoder model with multi-level copy mechanisms to generate such questions. Furthermore, the answer aware loss is introduced to make generated questions corresponding to more definitive answers. Experiments demonstrate that our model achieves state-of-the-art performance. Meanwhile, such generated question can express the given predicate and correspond to a definitive answer.

* Accepted to EMNLP 2019 

  Click for Model/Code and Paper
Incorporating Interlocutor-Aware Context into Response Generation on Multi-Party Chatbots

Oct 29, 2019
Cao Liu, Kang Liu, Shizhu He, Zaiqing Nie, Jun Zhao

Conventional chatbots focus on two-party response generation, which simplifies the real dialogue scene. In this paper, we strive toward a novel task of Response Generation on Multi-Party Chatbot (RGMPC), where the generated responses heavily rely on the interlocutors' roles (e.g., speaker and addressee) and their utterances. Unfortunately, complex interactions among the interlocutors' roles make it challenging to precisely capture conversational contexts and interlocutors' information. Facing this challenge, we present a response generation model which incorporates Interlocutor-aware Contexts into Recurrent Encoder-Decoder frameworks (ICRED) for RGMPC. Specifically, we employ interactive representations to capture dialogue contexts for different interlocutors. Moreover, we leverage an addressee memory to enhance contextual interlocutor information for the target addressee. Finally, we construct a corpus for RGMPC based on an existing open-access dataset. Automatic and manual evaluations demonstrate that the ICRED remarkably outperforms strong baselines.

* Accepted to CoNLL 2019 

  Click for Model/Code and Paper
ADASS: Adaptive Sample Selection for Training Acceleration

Jun 11, 2019
Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Stochastic gradient decent~(SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in all existing SGD and its variants, the sample size in each iteration~(epoch) of training is the same as the size of the full training set. In this paper, we propose a new method, called \underline{ada}ptive \underline{s}ample \underline{s}election~(ADASS), for training acceleration. During different epoches of training, ADASS only need to visit different training subsets which are adaptively selected from the full training set according to the Lipschitz constants of the loss functions on samples. It means that in ADASS the sample size in each epoch of training can be smaller than the size of the full training set, by discarding some samples. ADASS can be seamlessly integrated with existing optimization methods, such as SGD and momentum SGD, for training acceleration. Theoretical results show that the learning accuracy of ADASS is comparable to that of counterparts with full training set. Furthermore, empirical results on both shallow models and deep models also show that ADASS can accelerate the training process of existing methods without sacrificing accuracy.

  Click for Model/Code and Paper
Clustered Reinforcement Learning

Jun 06, 2019
Xiao Ma, Shen-Yi Zhao, Wu-Jun Li

Exploration strategy design is one of the challenging problems in reinforcement learning~(RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover novel areas or high reward~(quality) areas. In most existing methods, the novelty and quality in the neighboring area of the current state are not well utilized to guide the exploration of the agent. To tackle this problem, we propose a novel RL framework, called \underline{c}lustered \underline{r}einforcement \underline{l}earning~(CRL), for efficient exploration in RL. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area~(cluster) of the current state is given to the agent. Experiments on a continuous control task and several \emph{Atari 2600} games show that CRL can outperform other state-of-the-art methods to achieve the best performance in most cases.

* 16pages, 3 figures 

  Click for Model/Code and Paper
On the Convergence of Memory-Based Distributed SGD

May 30, 2019
Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Distributed stochastic gradient descent~(DSGD) has been widely used for optimizing large-scale machine learning models, including both convex and non-convex models. With the rapid growth of model size, huge communication cost has been the bottleneck of traditional DSGD. Recently, many communication compression methods have been proposed. Memory-based distributed stochastic gradient descent~(M-DSGD) is one of the efficient methods since each worker communicates a sparse vector in each iteration so that the communication cost is small. Recent works propose the convergence rate of M-DSGD when it adopts vanilla SGD. However, there is still a lack of convergence theory for M-DSGD when it adopts momentum SGD. In this paper, we propose a universal convergence analysis for M-DSGD by introducing \emph{transformation equation}. The transformation equation describes the relation between traditional DSGD and M-DSGD so that we can transform M-DSGD to its corresponding DSGD. Hence we get the convergence rate of M-DSGD with momentum for both convex and non-convex problems. Furthermore, we combine M-DSGD and stagewise learning that the learning rate of M-DSGD in each stage is a constant and is decreased by stage, instead of iteration. Using the transformation equation, we propose the convergence rate of stagewise M-DSGD which bridges the gap between theory and practice.

  Click for Model/Code and Paper
Quantized Epoch-SGD for Communication-Efficient Distributed Learning

Jan 10, 2019
Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Due to its efficiency and ease to implement, stochastic gradient descent (SGD) has been widely used in machine learning. In particular, SGD is one of the most popular optimization methods for distributed learning. Recently, quantized SGD (QSGD), which adopts quantization to reduce the communication cost in SGD-based distributed learning, has attracted much attention. Although several QSGD methods have been proposed, some of them are heuristic without theoretical guarantee, and others have high quantization variance which makes the convergence become slow. In this paper, we propose a new method, called Quantized Epoch-SGD (QESGD), for communication-efficient distributed learning. QESGD compresses (quantizes) the parameter with variance reduction, so that it can get almost the same performance as that of SGD with less communication cost. QESGD is implemented on the Parameter Server framework, and empirical results on distributed deep learning show that QESGD can outperform other state-of-the-art quantization methods to achieve the best performance.

  Click for Model/Code and Paper
Kid-Net: Convolution Networks for Kidney Vessels Segmentation from CT-Volumes

Jun 18, 2018
Ahmed Taha, Pechin Lo, Junning Li, Tao Zhao

Semantic image segmentation plays an important role in modeling patient-specific anatomy. We propose a convolution neural network, called Kid-Net, along with a training schema to segment kidney vessels: artery, vein and collecting system. Such segmentation is vital during the surgical planning phase in which medical decisions are made before surgical incision. Our main contribution is developing a training schema that handles unbalanced data, reduces false positives and enables high-resolution segmentation with a limited memory budget. These objectives are attained using dynamic weighting, random sampling and 3D patch segmentation. Manual medical image annotation is both time-consuming and expensive. Kid-Net reduces kidney vessels segmentation time from matter of hours to minutes. It is trained end-to-end using 3D patches from volumetric CT-images. A complete segmentation for a 512x512x512 CT-volume is obtained within a few minutes (1-2 mins) by stitching the output 3D patches together. Feature down-sampling and up-sampling are utilized to achieve higher classification and localization accuracies. Quantitative and qualitative evaluation results on a challenging testing dataset show Kid-Net competence.

  Click for Model/Code and Paper
Normalized Cut with Adaptive Similarity and Spatial Regularization

Jun 06, 2018
Faqiang Wang, Cuicui Zhao, Jun Liu, Haiyang Huang

In this paper, we propose a normalized cut segmentation algorithm with spatial regularization priority and adaptive similarity matrix. We integrate the well-known expectation-maximum(EM) method in statistics and the regularization technique in partial differential equation (PDE) method into normalized cut (Ncut). The introduced EM technique makes our method can adaptively update the similarity matrix, which can help us to get a better classification criterion than the classical Ncut method. While the regularization priority can guarantee the proposed algorithm has a robust performance under noise. To unify the three totally different methods including EM, spatial regularization, and spectral graph clustering, we built a variational framework to combine them and get a general normalized cut segmentation algorithm. The well-defined theory of the proposed model is also given in the paper. Compared with some existing spectral clustering methods such as the traditional Ncut algorithm and the variational based Chan-Vese model, the numerical experiments show that our methods can achieve promising segmentation performance.

  Click for Model/Code and Paper
Deep Reinforcement Learning for Sponsored Search Real-time Bidding

Mar 01, 2018
Jun Zhao, Guang Qiu, Ziyu Guan, Wei Zhao, Xiaofei He

Bidding optimization is one of the most critical problems in online advertising. Sponsored search (SS) auction, due to the randomness of user query behavior and platform nature, usually adopts keyword-level bidding strategies. In contrast, the display advertising (DA), as a relatively simpler scenario for auction, has taken advantage of real-time bidding (RTB) to boost the performance for advertisers. In this paper, we consider the RTB problem in sponsored search auction, named SS-RTB. SS-RTB has a much more complex dynamic environment, due to stochastic user query behavior and more complex bidding policies based on multiple keywords of an ad. Most previous methods for DA cannot be applied. We propose a reinforcement learning (RL) solution for handling the complex dynamic environment. Although some RL methods have been proposed for online advertising, they all fail to address the "environment changing" problem: the state transition probabilities vary between two days. Motivated by the observation that auction sequences of two days share similar transition patterns at a proper aggregation level, we formulate a robust MDP model at hour-aggregation level of the auction data and propose a control-by-model framework for SS-RTB. Rather than generating bid prices directly, we decide a bidding model for impressions of each hour and perform real-time bidding accordingly. We also extend the method to handle the multi-agent problem. We deployed the SS-RTB system in the e-commerce search auction platform of Alibaba. Empirical experiments of offline evaluation and online A/B test demonstrate the effectiveness of our method.

  Click for Model/Code and Paper
How to Generate a Good Word Embedding?

Jul 20, 2015
Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao

We analyze three critical components of word embedding training: the model, the corpus, and the training parameters. We systematize existing neural-network-based word embedding algorithms and compare them using the same corpus. We evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks and using it to initialize neural networks. We also provide several simple guidelines for training word embeddings. First, we discover that corpus domain is more important than corpus size. We recommend choosing a corpus in a suitable domain for the desired task, after that, using a larger corpus yields better results. Second, we find that faster models provide sufficient performance in most cases, and more complex models can be used if the training corpus is sufficiently large. Third, the early stopping metric for iterating should rely on the development set of the desired task rather than the validation loss of training embedding.

  Click for Model/Code and Paper