Models, code, and papers for "Kun Han":

##### Efficient Superimposition Recovering Algorithm

Nov 19, 2012
Han Li, Kun Gai, Pinghua Gong, Changshui Zhang

In this article, we address the issue of recovering latent transparent layers from superimposition images. Here, we assume we have the estimated transformations and extracted gradients of latent layers. To rapidly recover high-quality image layers, we propose an Efficient Superimposition Recovering Algorithm (ESRA) by extending the framework of the accelerated gradient method. In addition, a key building block in each iteration of our proposed method is computing the proximal operator. Here we propose to employ a dual approach and present our Parallel Algorithm with Constrained Total Variation (PACTV) method. Our recovering method not only reconstructs high-quality layers without the color-bias problem, but also theoretically guarantees good convergence performance.

##### HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation

Oct 26, 2019
Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu

Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state - Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We leverage the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, the quantitative comparisons show a significant performance improvement over the best-of-grade method (by 20% on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly-annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images.

* 10 pages, 6 figures, to be presented at ICCV 2019
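
The integral operation mentioned in the abstract above can be illustrated as a soft-argmax: the joint location is the expectation of the voxel coordinates under a softmax over the volumetric heatmap, which keeps the step differentiable. This is a minimal sketch, not the authors' code; the function name `soft_argmax_3d` and the `volume[z][y][x]` layout are assumptions.

```python
import math

def soft_argmax_3d(volume):
    """Expected (x, y, z) index under a softmax over a volumetric heatmap.

    Assumption of this sketch: `volume[z][y][x]` holds raw (unnormalized) scores.
    """
    flat = [v for plane in volume for row in plane for v in row]
    peak = max(flat)  # subtract the max before exponentiating for stability
    total = sum(math.exp(v - peak) for v in flat)
    ex = ey = ez = 0.0
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, v in enumerate(row):
                p = math.exp(v - peak) / total  # softmax probability of this voxel
                ex += p * x
                ey += p * y
                ez += p * z
    return ex, ey, ez
```

Unlike a hard argmax, the result is a smooth function of the scores, so a loss on the joint coordinates can propagate gradients back into the heatmap.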
##### Multiple instance dense connected convolution neural network for aerial image scene classification

Aug 22, 2019
Qi Bi, Kun Qin, Zhili Li, Han Zhang, Kai Xu

With the development of deep learning, many state-of-the-art natural image scene classification methods have demonstrated impressive performance. While the current convolution neural network tends to extract global features and global semantic information in a scene, the geo-spatial objects can be located anywhere in an aerial image scene and their spatial arrangement tends to be more complicated. One possible solution is to preserve more local semantic information and enhance feature propagation. In this paper, an end-to-end multiple instance dense connected convolution neural network (MIDCCNN) is proposed for aerial image scene classification. First, a 23-layer dense connected convolution neural network (DCCNN) is built to serve as a backbone to extract convolution features. It is capable of preserving middle and low level convolution features. Then, an attention based multiple instance pooling is proposed to highlight the local semantics in an aerial image scene. Finally, we minimize the loss between the bag-level predictions and the ground truth labels so that the whole framework can be trained directly. Experiments on three aerial image datasets demonstrate that our proposed methods can outperform current baselines by a large margin.

* 5 pages, 3 figures, a conference paper accepted by IEEE ICIP 2019
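
Attention-based multiple instance pooling, as described in the abstract above, can be sketched as a softmax-weighted sum of instance features that yields one bag-level feature. The scoring rule below (a dot product with a learned vector `w`) is an assumption chosen for brevity; the abstract does not specify the paper's attention function.

```python
import math

def attention_mil_pool(instances, w):
    """Pool instance feature vectors into one bag vector with softmax attention.

    `instances` is a list of equal-length feature vectors; `w` is a
    hypothetical learned scoring vector of the same length.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in instances]
    peak = max(scores)  # stabilise the softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # attention weights, sum to 1
    dim = len(instances[0])
    bag = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return bag, weights
```

Instances with high scores dominate the bag feature, which is how local regions of a scene can be highlighted while training uses only the bag-level label.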
##### Learning Tree-based Deep Model for Recommender Systems

Nov 01, 2018
Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, Kun Gai

Model-based methods for recommender systems have been studied extensively in recent years. In systems with large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full corpus retrieval extremely difficult. To overcome the calculation barriers, models such as matrix factorization resort to inner product form (i.e., model user-item preference as the inner product of user, item latent factors) and indexes to facilitate efficient approximate k-nearest neighbor searches. However, it still remains challenging to incorporate more expressive interaction forms between user and item features, e.g., interactions through deep neural networks, because of the calculation cost. In this paper, we focus on the problem of introducing arbitrary advanced models to recommender systems with large corpus. We propose a novel tree-based method which can provide logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks. Our main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making decisions for each user-node pair. We also show that the tree structure can be jointly learnt towards better compatibility with users' interest distribution and hence facilitate both training and prediction. Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods. Online A/B test results in Taobao display advertising platform also demonstrate the effectiveness of the proposed method in production environments.

* Accepted by KDD 2018
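
The coarse-to-fine, top-down traversal described in the abstract above can be sketched as a level-wise beam search: at each level only the best-scoring nodes are expanded, so the number of scored nodes grows with tree depth rather than corpus size. The tree encoding (`children` dict) and the `score` callable standing in for the user-node preference model are assumptions of this sketch.

```python
def beam_search_retrieve(children, root, score, beam_size):
    """Return up to `beam_size` leaf items found by level-wise top-k traversal.

    `children` maps a node id to its child ids (leaves have no entry);
    `score(node)` is a hypothetical stand-in for the user-node model.
    """
    frontier = [root]
    leaves = []
    while frontier:
        candidates = []
        for node in frontier:
            kids = children.get(node, [])
            if kids:
                candidates.extend(kids)  # expand an internal node
            else:
                leaves.append(node)      # a leaf is a retrievable item
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_size]  # keep only the top-k per level
    leaves.sort(key=score, reverse=True)
    return leaves[:beam_size]
```

With a beam of size k and a balanced binary tree over N items, roughly 2k nodes are scored at each of about log2(N) levels, instead of all N items.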
##### Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

Apr 18, 2017
Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, Zhe Wang

CTR prediction in real-world business is a difficult machine learning problem with large scale nonlinear sparse data. In this paper, we introduce an industrial strength solution with a model named Large Scale Piece-wise Linear Model (LS-PLM). We formulate the learning problem with $L_1$ and $L_{2,1}$ regularizers, leading to a non-convex and non-smooth optimization problem. Then, we propose a novel algorithm to solve it efficiently, based on directional derivatives and a quasi-Newton method. In addition, we design a distributed system which can run on hundreds of machines in parallel and provides us with the industrial scalability. The LS-PLM model can capture nonlinear patterns from massive sparse data, saving us from heavy feature engineering jobs. Since 2012, LS-PLM has become the main CTR prediction model in Alibaba's online display advertising system, serving hundreds of millions of users every day.
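
One way to picture a piece-wise linear CTR model is as a mixture: a softmax gate softly assigns the input to regions, and each region has its own logistic predictor. The abstract above does not give the model formula, so the form below, along with the names `gate_weights` and `fit_weights`, is an assumption made for illustration.

```python
import math

def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def _sigmoid(z):
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ls_plm_predict(x, gate_weights, fit_weights):
    """p(y=1|x) = sum_i softmax_i(u_i . x) * sigmoid(w_i . x).

    `gate_weights` holds one vector u_i per region, `fit_weights` one w_i per
    region; both names are hypothetical.
    """
    gate_scores = [_dot(u, x) for u in gate_weights]
    peak = max(gate_scores)
    exps = [math.exp(s - peak) for s in gate_scores]
    total = sum(exps)
    return sum((e / total) * _sigmoid(_dot(w, x)) for e, w in zip(exps, fit_weights))
```

With a single region the model reduces to plain logistic regression; adding regions lets it fit nonlinear patterns while each piece stays linear in the features.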

##### Building change detection based on multi-scale filtering and grid partition

Aug 22, 2019
Qi Bi, Kun Qin, Han Zhang, Wenjun Han, Zhili Li, Kai Xu

Building change detection is of great significance in high resolution remote sensing applications. Multi-index learning, one of the state-of-the-art building change detection methods, still has drawbacks such as the inability to find change types directly and the heavy computational cost of MBI. In this paper, a two-stage building change detection method is proposed to address these problems. In the first stage, a multi-scale filtering building index (MFBI) is calculated to detect building areas in each temporal with fast speed and moderate accuracy. In the second stage, images and the corresponding building maps are partitioned into grids. In each grid, the ratio of building areas in time T2 and time T1 is calculated. Each grid is classified into one of the three change patterns, i.e., significant increase, significant decrease, and approximately unchanged. Exhaustive experiments indicate that the proposed method can detect building change types directly and outperform the current multi-index learning method.

* 10th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS), 2018, 1-6
* 8 pages, 6 figures, conference paper
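
The second stage described in the abstract above (partition into grids, compute the T2/T1 building-area ratio, assign a change pattern) can be sketched as follows. The binary-map representation, the thresholds `up` and `down`, and the handling of grids with no buildings at T1 are all assumptions; the abstract does not state them.

```python
def classify_grid_changes(map_t1, map_t2, cell, up=1.5, down=0.67):
    """Label each `cell` x `cell` grid as 'increase', 'decrease' or 'unchanged'.

    `map_t1` and `map_t2` are same-sized 2D lists of 0/1 building masks.
    `up` and `down` are hypothetical ratio thresholds.
    """
    rows, cols = len(map_t1), len(map_t1[0])
    labels = []
    for r0 in range(0, rows, cell):
        row_labels = []
        for c0 in range(0, cols, cell):
            rs = range(r0, min(r0 + cell, rows))
            cs = range(c0, min(c0 + cell, cols))
            a1 = sum(map_t1[r][c] for r in rs for c in cs)  # building area at T1
            a2 = sum(map_t2[r][c] for r in rs for c in cs)  # building area at T2
            if a1 == 0:
                # Ratio undefined: any new building counts as an increase.
                label = "increase" if a2 > 0 else "unchanged"
            else:
                ratio = a2 / a1
                if ratio >= up:
                    label = "increase"
                elif ratio <= down:
                    label = "decrease"
                else:
                    label = "unchanged"
            row_labels.append(label)
        labels.append(row_labels)
    return labels
```

Because the label comes straight from the ratio, the change type is read off directly per grid rather than inferred afterwards from a generic change mask.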
##### On Learning Invariant Representation for Domain Adaptation

Jan 27, 2019
Han Zhao, Remi Tachet des Combes, Kun Zhang, Geoffrey J. Gordon

Due to the ability of deep neural nets to learn rich representations, recent advances in unsupervised domain adaptation have focused on learning domain-invariant features that achieve a small error on the source domain. The hope is that the learnt representation, together with the hypothesis learnt from the source domain, can generalize to the target domain. In this paper, we first construct a simple counterexample showing that, contrary to common belief, the above conditions are not sufficient to guarantee successful domain adaptation. In particular, the counterexample (Fig. 1) exhibits *conditional shift*: the class-conditional distributions of input features change between source and target domains. To give a sufficient condition for domain adaptation, we propose a natural and interpretable generalization upper bound that explicitly takes into account the aforementioned shift. Moreover, we shed new light on the problem by proving an information-theoretic lower bound on the joint error of *any* domain adaptation method that attempts to learn invariant representations. Our result characterizes a fundamental tradeoff between learning invariant representations and achieving small joint error on both domains when the marginal label distributions differ from source to target. Finally, we conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights are helpful in guiding the future design of domain adaptation and representation learning algorithms.

##### Scaling Gaussian Process Regression with Derivatives

Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at $n$ points in $d$ dimensions requires linear solves and log determinants with an ${n(d+1) \times n(d+1)}$ positive definite matrix -- leading to prohibitive $\mathcal{O}(n^3d^3)$ computations for standard direct methods. We propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, enable Bayesian optimization with derivatives to scale to high-dimensional problems and large evaluation budgets.

* Advances in Neural Information Processing Systems 32 (NIPS), 2018
##### Learning Alignment for Multimodal Emotion Recognition from Speech

Sep 06, 2019
Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Speech emotion recognition is a challenging problem because humans convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion related features from audio signals or employ speech recognition techniques to generate text from speech and then apply natural language processing to analyze the sentiment. Further, although emotion recognition benefits from using audio-textual multimodal information, it is not trivial to build a system to learn from multimodality. One can build models for two input sources separately and combine them at the decision level, but this method ignores the interaction between speech and text in the temporal domain. In this paper, we propose to use an attention mechanism to learn the alignment between speech frames and text words, aiming to produce more accurate multimodal feature representations. The aligned multimodal features are fed into a sequential model for emotion recognition. We evaluate the approach on the IEMOCAP dataset and the experimental results show the proposed approach achieves the state-of-the-art performance on the dataset.

* InterSpeech 2019
##### Snore-GANs: Improving Automatic Snore Sound Classification with Synthesized Data

Mar 29, 2019
Zixing Zhang, Jing Han, Kun Qian, Christoph Janott, Yanan Guo, Bjoern Schuller

One of the frontier issues that severely hampers the development of automatic snore sound classification (ASSC) is the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional Generative Adversarial Networks (scGANs), which aims to automatically learn a mapping strategy from a random noise space to original data distribution. The proposed approach has the capability of well synthesizing 'realistic' high-dimensional data, while requiring no additional annotation process. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. The systematic experiments conducted on a widely used Munich-Passau snore sound corpus demonstrate that the scGANs-based systems can remarkably outperform other classic data augmentation systems, and are also competitive with other recently reported systems for ASSC.

* accepted by IEEE JBHI
##### Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising

Sep 11, 2018
Junqi Jin, Chengru Song, Han Li, Kun Gai, Jun Wang, Weinan Zhang

Real-time advertising allows advertisers to bid for each impression for a visiting user. To optimize specific goals such as maximizing revenue and return on investment (ROI) led by ad placements, advertisers not only need to estimate the relevance between the ads and user's interests, but most importantly require a strategic response with respect to other advertisers bidding in the market. In this paper, we formulate bidding optimization with multi-agent reinforcement learning. To deal with a large number of advertisers, we propose a clustering method and assign each cluster with a strategic bidding agent. A practical Distributed Coordinated Multi-Agent Bidding (DCMAB) has been proposed and implemented to balance the tradeoff between the competition and cooperation among advertisers. The empirical study on our industry-scaled real-world data has demonstrated the effectiveness of our methods. Our results show cluster-based bidding would largely outperform single-agent and bandit approaches, and the coordinated bidding achieves better overall objectives than purely self-interested bidding agents.

* CIKM 2018, Turin, Italy
##### FBI-Pose: Towards Bridging the Gap between 2D Images and 3D Human Poses using Forward-or-Backward Information

Jun 25, 2018
Yulong Shi, Xiaoguang Han, Nianjuan Jiang, Kun Zhou, Kui Jia, Jiangbo Lu

Although significant advances have been made in the area of human pose estimation from images using deep Convolutional Neural Network (ConvNet), it remains a big challenge to perform 3D pose inference in-the-wild. This is due to the difficulty of obtaining 3D pose groundtruth for outdoor environments. In this paper, we propose a novel framework to tackle this problem by exploiting the information of each bone indicating if it is forward or backward with respect to the view of the camera (we term it Forward-or-Backward Information, abbreviated as FBI). Our method firstly trains a ConvNet with two branches which maps an image of a human to both the 2D joint locations and the FBI of bones. This information is further fed into a deep regression network to predict the 3D positions of joints. To support the training, we also develop an annotation user interface and labeled such FBI for around 12K in-the-wild images which are randomly selected from MPII (a public dataset of 2D pose annotation). Our experimental results on the standard benchmarks demonstrate that our approach outperforms state-of-the-art methods both qualitatively and quantitatively.

* 9 pages, 5 figures
##### Optimized Cost per Click in Taobao Display Advertising

Nov 01, 2018
Han Zhu, Junqi Jin, Chang Tan, Fei Pan, Yifan Zeng, Han Li, Kun Gai

* Accepted by KDD 2017
##### Joint Optimization of Tree-based Index and Deep Model for Recommender Systems

Feb 19, 2019
Han Zhu, Daqing Chang, Ziru Xu, Pengye Zhang, Xiang Li, Jie He, Han Li, Jian Xu, Kun Gai

Large-scale industrial recommender systems are usually confronted with computational problems due to the enormous corpus size. To retrieve and recommend the most relevant items to users under response time limits, resorting to an efficient index structure is an effective and practical solution. Tree-based Deep Model (TDM) for recommendation \cite{zhu2018learning} greatly improves recommendation accuracy using tree index. By indexing items in a tree hierarchy and training a user-node preference prediction model satisfying a max-heap like property in the tree, TDM provides logarithmic computational complexity w.r.t. the corpus size, enabling the use of arbitrary advanced models in candidate retrieval and recommendation. In tree-based recommendation methods, the quality of both the tree index and the trained user preference prediction model determines the recommendation accuracy for the most part. We argue that the learning of tree index and user preference model has interdependence. Our purpose, in this paper, is to develop a method to jointly learn the index structure and user preference prediction model. In our proposed joint optimization framework, the learning of index and user preference prediction model are carried out under a unified performance measure. Besides, we come up with a novel hierarchical user preference representation utilizing the tree index hierarchy. Experimental evaluations with two large-scale real-world datasets show that the proposed method improves recommendation accuracy significantly. Online A/B test results at Taobao display advertising also demonstrate the effectiveness of the proposed method in production environments.

##### CaricatureShop: Personalized and Photorealistic Caricature Sketching

Jul 24, 2018
Xiaoguang Han, Kangcheng Hou, Dong Du, Yuda Qiu, Yizhou Yu, Kun Zhou, Shuguang Cui

In this paper, we propose the first sketching system for interactively personalized and photorealistic face caricaturing. Given an image of a human face, users can create caricature photos by manipulating its facial feature curves. Our system firstly performs exaggeration on the recovered 3D face model according to the edited sketches, which is conducted by assigning the Laplacian of each vertex a scaling factor. To construct the mapping between 2D sketches and a vertex-wise scaling field, a novel deep learning architecture is developed. With the obtained 3D caricature model, two images are generated, one obtained by applying 2D warping guided by the underlying 3D mesh deformation and the other obtained by re-rendering the deformed 3D textured model. These two images are then seamlessly integrated to produce our final output. Due to the severe stretching of meshes, the rendered texture appears blurry. A deep learning approach is exploited to infer the missing details for enhancing these blurry regions. Moreover, a relighting operation is invented to further improve the photorealism of the result. Both quantitative and qualitative experiment results validated the efficiency of our sketching system and the superiority of our proposed techniques against existing methods.

* 12 pages, 16 figures, submitted to IEEE TVCG
##### Efficient training and design of photonic neural network through neuroevolution

Aug 04, 2019
Tian Zhang, Jia Wang, Yihang Dan, Yuxiang Lanqiu, Jian Dai, Xu Han, Xiaojuan Sun, Kun Xu

Recently, optical neural networks (ONNs) integrated in photonic chips have received extensive attention because they are expected to implement the same pattern recognition tasks in the electronic platforms with high efficiency and low power consumption. However, the current lack of various learning algorithms to train the ONNs obstructs their further development. In this article, we propose a novel learning strategy based on neuroevolution to design and train the ONNs. Two typical neuroevolution algorithms are used to determine the hyper-parameters of the ONNs and to optimize the weights (phase shifters) in the connections. In order to demonstrate the effectiveness of the training algorithms, the trained ONNs are applied in the classification tasks for iris plants dataset, wine recognition dataset and modulation formats recognition. The calculated results show that the training algorithms based on neuroevolution are competitive with other traditional learning algorithms on both accuracy and stability. Compared with previous works, we introduce an efficient training method for the ONNs and demonstrate their broad application prospects in pattern recognition, reinforcement learning and so on.

* 11 pages, 4 figures
##### Adversarial 3D Human Pose Estimation via Multimodal Depth Supervision

In this paper, a novel deep-learning based framework is proposed to infer 3D human poses from a single image. Specifically, a two-phase approach is developed. We firstly utilize a generator with two branches for the extraction of explicit and implicit depth information respectively. During the training process, an adversarial scheme is also employed to further improve the performance. The implicit and explicit depth information with the estimated 2D joints generated by a widely used estimator, in the second step, are together fed into a deep 3D pose regressor for the final pose generation. Our method achieves MPJPE of 58.68mm on the ECCV2018 3D Human Pose Estimation Challenge.

##### Multi-Stage Temporal Difference Learning for 2048-like Games

Szubert and Jaskowski successfully used temporal difference (TD) learning together with n-tuple networks for playing the game 2048. However, we observed a phenomenon that the programs based on TD learning still hardly reach large tiles. In this paper, we propose multi-stage TD (MS-TD) learning, a kind of hierarchical reinforcement learning method, to effectively improve the performance for the rates of reaching large tiles, which are good metrics to analyze the strength of 2048 programs. Our experiments showed significant improvements over the one without using MS-TD learning. Namely, using 3-ply expectimax search, the program with MS-TD learning reached 32768-tiles with a rate of 18.31%, while the one with TD learning did not reach any. After further tuning, our 2048 program reached 32768-tiles with a rate of 31.75% in 10,000 games, and one among these games even reached a 65536-tile, which is the first ever reaching a 65536-tile to our knowledge. In addition, MS-TD learning method can be easily applied to other 2048-like games, such as Threes. Based on MS-TD learning, our experiments for Threes also demonstrated similar performance improvement, where the program with MS-TD learning reached 6144-tiles with a rate of 7.83%, while the one with TD learning only reached 0.45%.

* The version has been accepted by TCIAIG (The first version was sent on 23, October, 2015)
##### Deep Interest Network for Click-Through Rate Prediction

Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently deep learning based models have been proposed, which follow a similar Embedding&MLP paradigm. In these methods large scale sparse input features are first mapped into low dimensional embedding vectors, then transformed into fixed-length vectors in a group-wise manner, and finally concatenated together to be fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, regardless of what the candidate ads are. The use of fixed-length vector will be a bottleneck, which brings difficulty for Embedding&MLP methods to capture user's diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model: Deep Interest Network (DIN) which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, improving the expressive ability of the model greatly. Besides, we develop two techniques: mini-batch aware regularization and data adaptive activation function which can help training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN now has been successfully deployed in the online display advertising system in Alibaba, serving the main traffic.

* Accepted by KDD 2018
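
The idea of a user representation that varies with the candidate ad, as described in the abstract above, can be sketched as a weighted sum of behavior embeddings where each weight depends on both the behavior and the ad. The `activation` callable stands in for the learned local activation unit; the dot-product version below and the choice to leave weights unnormalized are assumptions of this sketch.

```python
def dot_activation(behavior, ad):
    """Hypothetical stand-in for a learned activation unit: a plain dot product."""
    return sum(b * a for b, a in zip(behavior, ad))

def din_user_vector(behaviors, ad, activation=dot_activation):
    """Weighted-sum pooling of behavior embeddings, conditioned on the ad.

    `behaviors` is a list of embedding vectors with the same length as `ad`.
    """
    dim = len(ad)
    user = [0.0] * dim
    for b in behaviors:
        w = activation(b, ad)  # relevance of this behavior to the candidate ad
        for j in range(dim):
            user[j] += w * b[j]
    return user
```

For example, with behaviors `[[1.0, 0.0], [0.0, 1.0]]`, the ad `[1.0, 0.0]` yields a user vector dominated by the first behavior, while the ad `[0.0, 2.0]` yields one dominated by the second, in contrast to a fixed pooled vector that ignores the ad.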
##### Learning to Advertise for Organic Traffic Maximization in E-Commerce Product Feeds

Aug 19, 2019
Dagui Chen, Junqi Jin, Weinan Zhang, Fei Pan, Lvyin Niu, Chuan Yu, Jun Wang, Han Li, Jian Xu, Kun Gai

Most e-commerce product feeds provide blended results of advertised products and recommended products to consumers. The underlying advertising and recommendation platforms share similar if not exactly the same set of candidate products. Consumers' behaviors on the advertised results constitute part of the recommendation model's training data and therefore can influence the recommended results. We refer to this process as Leverage. Considering this mechanism, we propose a novel perspective that advertisers can strategically bid through the advertising platform to optimize their recommended organic traffic. By analyzing the real-world data, we first explain the principles of Leverage mechanism, i.e., the dynamic models of Leverage. Then we introduce a novel Leverage optimization problem and formulate it with a Markov Decision Process. To deal with the sample complexity challenge in model-free reinforcement learning, we propose a novel Hybrid Training Leverage Bidding (HTLB) algorithm which combines the real-world samples and the emulator-generated samples to boost the learning speed and stability. Our offline experiments as well as the results from the online deployment demonstrate the superior performance of our approach.

* accepted by CIKM 2019