Models, code, and papers for "Kun Han":

Adversarial Multi-Binary Neural Network for Multi-class Classification

Mar 25, 2020
Haiyang Xu, Junwen Chen, Kun Han, Xiangang Li

Multi-class text classification is one of the key problems in machine learning and natural language processing. Emerging neural networks deal with the problem using a multi-output softmax layer and achieve substantial progress, but they do not explicitly learn the correlation among classes. In this paper, we use a multi-task framework to address multi-class classification, where a multi-class classifier and multiple binary classifiers are trained together. Moreover, we employ adversarial training to distinguish the class-specific features and the class-agnostic features. The model benefits from better feature representation. We conduct experiments on two large-scale multi-class text classification tasks and demonstrate that the proposed architecture outperforms baseline approaches.


  Access Model/Code and Paper
Efficient Superimposition Recovering Algorithm

Nov 19, 2012
Han Li, Kun Gai, Pinghua Gong, Changshui Zhang

In this article, we address the issue of recovering latent transparent layers from superimposition images. Here, we assume we have the estimated transformations and extracted gradients of latent layers. To rapidly recover high-quality image layers, we propose an Efficient Superimposition Recovering Algorithm (ESRA) by extending the framework of accelerated gradient method. In addition, a key building block (in each iteration) in our proposed method is the proximal operator calculating. Here we propose to employ a dual approach and present our Parallel Algorithm with Constrained Total Variation (PACTV) method. Our recovering method not only reconstructs high-quality layers without color-bias problem, but also theoretically guarantees good convergence performance.


  Access Model/Code and Paper
Learning Syntactic and Dynamic Selective Encoding for Document Summarization

Mar 25, 2020
Haiyang Xu, Yahao He, Kun Han, Junwen Chen, Xiangang Li

Text summarization aims to generate a headline or a short summary consisting of the major information of the source text. Recent studies employ the sequence-to-sequence framework to encode the input with a neural network and generate abstractive summary. However, most studies feed the encoder with the semantic word embedding but ignore the syntactic information of the text. Further, although previous studies proposed the selective gate to control the information flow from the encoder to the decoder, it is static during the decoding and cannot differentiate the information based on the decoder states. In this paper, we propose a novel neural architecture for document summarization. Our approach has the following contributions: first, we incorporate syntactic information such as constituency parsing trees into the encoding sequence to learn both the semantic and syntactic information from the document, resulting in more accurate summary; second, we propose a dynamic gate network to select the salient information based on the context of the decoder state, which is essential to document summarization. The proposed model has been evaluated on CNN/Daily Mail summarization datasets and the experimental results show that the proposed approach outperforms baseline approaches.

* IJCNN 2019 

  Access Model/Code and Paper
HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation

Mar 10, 2020
Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu

Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state-Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We leverage on the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, the quantitative comparisons show a significant performance improvement over the best-of-grade methods (e.g. $20\%$ on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly-annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images. Leveraging the strength of the HEMlets pose estimation, we further design and append a shallow yet effective network module to regress the SMPL parameters of the body pose and shape. We term the entire HEMlets-based human pose and shape recovery pipeline HEMlets PoSh. Extensive quantitative and qualitative experiments on the existing human body recovery benchmarks justify the state-of-the-art results obtained with our HEMlets PoSh approach.

* 14 pages, 14figures, Externed version of HEMlets Pose [ICCV2019]. arXiv admin note: substantial text overlap with arXiv:1910.12032 

  Access Model/Code and Paper
HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation

Oct 26, 2019
Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu

Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state - Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlests from the input image, followed by a volumetric joint-heatmap regression. We leverage on the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, the quantitative comparisons show a significant performance improvement over the best-of-grade method (by 20% on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly-annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images.

* 10 pages, 6 figures, to be presented at ICCV 2019 

  Access Model/Code and Paper
Multiple instance dense connected convolution neural network for aerial image scene classification

Aug 22, 2019
Qi Bi, Kun Qin, Zhili Li, Han Zhang, Kai Xu

With the development of deep learning, many state-of-the-art natural image scene classification methods have demonstrated impressive performance. While the current convolution neural network tends to extract global features and global semantic information in a scene, the geo-spatial objects can be located at anywhere in an aerial image scene and their spatial arrangement tends to be more complicated. One possible solution is to preserve more local semantic information and enhance feature propagation. In this paper, an end to end multiple instance dense connected convolution neural network (MIDCCNN) is proposed for aerial image scene classification. First, a 23 layer dense connected convolution neural network (DCCNN) is built and served as a backbone to extract convolution features. It is capable of preserving middle and low level convolution features. Then, an attention based multiple instance pooling is proposed to highlight the local semantics in an aerial image scene. Finally, we minimize the loss between the bag-level predictions and the ground truth labels so that the whole framework can be trained directly. Experiments on three aerial image datasets demonstrate that our proposed methods can outperform current baselines by a large margin.

* 5 pages,3 figures, a conference paper accepted by IEEE ICIP 2019 

  Access Model/Code and Paper
Learning Tree-based Deep Model for Recommender Systems

Nov 01, 2018
Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, Kun Gai

Model-based methods for recommender systems have been studied extensively in recent years. In systems with large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full corpus retrieval extremely difficult. To overcome the calculation barriers, models such as matrix factorization resort to inner product form (i.e., model user-item preference as the inner product of user, item latent factors) and indexes to facilitate efficient approximate k-nearest neighbor searches. However, it still remains challenging to incorporate more expressive interaction forms between user and item features, e.g., interactions through deep neural networks, because of the calculation cost. In this paper, we focus on the problem of introducing arbitrary advanced models to recommender systems with large corpus. We propose a novel tree-based method which can provide logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks. Our main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making decisions for each user-node pair. We also show that the tree structure can be jointly learnt towards better compatibility with users' interest distribution and hence facilitate both training and prediction. Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods. Online A/B test results in Taobao display advertising platform also demonstrate the effectiveness of the proposed method in production environments.

* Accepted by KDD 2018 

  Access Model/Code and Paper
Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

Apr 18, 2017
Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, Zhe Wang

CTR prediction in real-world business is a difficult machine learning problem with large scale nonlinear sparse data. In this paper, we introduce an industrial strength solution with model named Large Scale Piece-wise Linear Model (LS-PLM). We formulate the learning problem with $L_1$ and $L_{2,1}$ regularizers, leading to a non-convex and non-smooth optimization problem. Then, we propose a novel algorithm to solve it efficiently, based on directional derivatives and quasi-Newton method. In addition, we design a distributed system which can run on hundreds of machines parallel and provides us with the industrial scalability. LS-PLM model can capture nonlinear patterns from massive sparse data, saving us from heavy feature engineering jobs. Since 2012, LS-PLM has become the main CTR prediction model in Alibaba's online display advertising system, serving hundreds of millions users every day.


  Access Model/Code and Paper
Building change detection based on multi-scale filtering and grid partition

Aug 22, 2019
Qi Bi, Kun Qin, Han Zhang, Wenjun Han, Zhili Li, Kai Xu

Building change detection is of great significance in high resolution remote sensing applications. Multi-index learning, one of the state-of-the-art building change detection methods, still has drawbacks like incapability to find change types directly and heavy computation consumption of MBI. In this paper, a two-stage building change detection method is proposed to address these problems. In the first stage, a multi-scale filtering building index (MFBI) is calculated to detect building areas in each temporal with fast speed and moderate accuracy. In the second stage, images and the corresponding building maps are partitioned into grids. In each grid, the ratio of building areas in time T2 and time T1 is calculated. Each grid is classified into one of the three change patterns, i.e., significantly increase, significantly decrease and approximately unchanged. Exhaustive experiments indicate that the proposed method can detect building change types directly and outperform the current multi-index learning method.

* 10th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS),2018,1-6 
* 8 pages, 6 figures, conference paper 

  Access Model/Code and Paper
On Learning Invariant Representation for Domain Adaptation

Jan 27, 2019
Han Zhao, Remi Tachet des Combes, Kun Zhang, Geoffrey J. Gordon

Due to the ability of deep neural nets to learn rich representations, recent advances in unsupervised domain adaptation have focused on learning domain-invariant features that achieve a small error on the source domain. The hope is that the learnt representation, together with the hypothesis learnt from the source domain, can generalize to the target domain. In this paper, we first construct a simple counterexample showing that, contrary to common belief, the above conditions are not sufficient to guarantee successful domain adaptation. In particular, the counterexample (Fig. 1) exhibits \emph{conditional shift}: the class-conditional distributions of input features change between source and target domains. To give a sufficient condition for domain adaptation, we propose a natural and interpretable generalization upper bound that explicitly takes into account the aforementioned shift. Moreover, we shed new light on the problem by proving an information-theoretic lower bound on the joint error of \emph{any} domain adaptation method that attempts to learn invariant representations. Our result characterizes a fundamental tradeoff between learning invariant representations and achieving small joint error on both domains when the marginal label distributions differ from source to target. Finally, we conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights are helpful in guiding the future design of domain adaptation and representation learning algorithms.


  Access Model/Code and Paper
Scaling Gaussian Process Regression with Derivatives

Oct 29, 2018
David Eriksson, Kun Dong, Eric Hans Lee, David Bindel, Andrew Gordon Wilson

Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at $n$ points in $d$ dimensions requires linear solves and log determinants with an ${n(d+1) \times n(d+1)}$ positive definite matrix -- leading to prohibitive $\mathcal{O}(n^3d^3)$ computations for standard direct methods. We propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, enables Bayesian optimization with derivatives to scale to high-dimensional problems and large evaluation budgets.

* Advances in Neural Information Processing Systems 32 (NIPS), 2018 
* Appears at Advances in Neural Information Processing Systems 32 (NIPS), 2018 

  Access Model/Code and Paper
Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization

Mar 18, 2020
Haiyang Xu, Yun Wang, Kun Han, Baochang Ma, Junwen Chen, Xiangang Li

Abstractive text summarization is a challenging task, and one need to design a mechanism to effectively extract salient information from the source text and then generate a summary. A parsing process of the source text contains critical syntactic or semantic structures, which is useful to generate more accurate summary. However, modeling a parsing tree for text summarization is not trivial due to its non-linear structure and it is harder to deal with a document that includes multiple sentences and their parsing trees. In this paper, we propose to use a graph to connect the parsing trees from the sentences in a document and utilize the stacked graph convolutional networks (GCNs) to learn the syntactic representation for a document. The selective attention mechanism is used to extract salient information in semantic and structural aspect and generate an abstractive summary. We evaluate our approach on the CNN/Daily Mail text summarization dataset. The experimental results show that the proposed GCNs based selective attention approach outperforms the baselines and achieves the state-of-the-art performance on the dataset.

* ICASSP 2020 

  Access Model/Code and Paper
Learning Alignment for Multimodal Emotion Recognition from Speech

Sep 06, 2019
Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Speech emotion recognition is a challenging problem because human convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion related features from audio signals or employ speech recognition techniques to generate text from speech and then apply natural language processing to analyze the sentiment. Further, emotion recognition will be beneficial from using audio-textual multimodal information, it is not trivial to build a system to learn from multimodality. One can build models for two input sources separately and combine them in a decision level, but this method ignores the interaction between speech and text in the temporal domain. In this paper, we propose to use an attention mechanism to learn the alignment between speech frames and text words, aiming to produce more accurate multimodal feature representations. The aligned multimodal features are fed into a sequential model for emotion recognition. We evaluate the approach on the IEMOCAP dataset and the experimental results show the proposed approach achieves the state-of-the-art performance on the dataset.

* InterSpeech 2019 

  Access Model/Code and Paper
Snore-GANs: Improving Automatic Snore Sound Classification with Synthesized Data

Mar 29, 2019
Zixing Zhang, Jing Han, Kun Qian, Christoph Janott, Yanan Guo, Bjoern Schuller

One of the frontier issues that severely hamper the development of automatic snore sound classification (ASSC) associates to the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional Generative Adversarial Networks (scGANs), which aims to automatically learn a mapping strategy from a random noise space to original data distribution. The proposed approach has the capability of well synthesizing 'realistic' high-dimensional data, while requiring no additional annotation process. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. The systematic experiments conducted on a widely used Munich-Passau snore sound corpus demonstrate that the scGANs-based systems can remarkably outperform other classic data augmentation systems, and are also competitive to other recently reported systems for ASSC.

* accepted by IEEE JBHI 

  Access Model/Code and Paper
Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising

Sep 11, 2018
Junqi Jin, Chengru Song, Han Li, Kun Gai, Jun Wang, Weinan Zhang

Real-time advertising allows advertisers to bid for each impression for a visiting user. To optimize specific goals such as maximizing revenue and return on investment (ROI) led by ad placements, advertisers not only need to estimate the relevance between the ads and user's interests, but most importantly require a strategic response with respect to other advertisers bidding in the market. In this paper, we formulate bidding optimization with multi-agent reinforcement learning. To deal with a large number of advertisers, we propose a clustering method and assign each cluster with a strategic bidding agent. A practical Distributed Coordinated Multi-Agent Bidding (DCMAB) has been proposed and implemented to balance the tradeoff between the competition and cooperation among advertisers. The empirical study on our industry-scaled real-world data has demonstrated the effectiveness of our methods. Our results show cluster-based bidding would largely outperform single-agent and bandit approaches, and the coordinated bidding achieves better overall objectives than purely self-interested bidding agents.

* CIKM 2018, Turin, Italy 

  Access Model/Code and Paper
FBI-Pose: Towards Bridging the Gap between 2D Images and 3D Human Poses using Forward-or-Backward Information

Jun 25, 2018
Yulong Shi, Xiaoguang Han, Nianjuan Jiang, Kun Zhou, Kui Jia, Jiangbo Lu

Although significant advances have been made in the area of human poses estimation from images using deep Convolutional Neural Network (ConvNet), it remains a big challenge to perform 3D pose inference in-the-wild. This is due to the difficulty to obtain 3D pose groundtruth for outdoor environments. In this paper, we propose a novel framework to tackle this problem by exploiting the information of each bone indicating if it is forward or backward with respect to the view of the camera(we term it Forwardor-Backward Information abbreviated as FBI). Our method firstly trains a ConvNet with two branches which maps an image of a human to both the 2D joint locations and the FBI of bones. These information is further fed into a deep regression network to predict the 3D positions of joints. To support the training, we also develop an annotation user interface and labeled such FBI for around 12K in-the-wild images which are randomly selected from MPII (a public dataset of 2D pose annotation). Our experimental results on the standard benchmarks demonstrate that our approach outperforms state-of-the-art methods both qualitatively and quantitatively.

* 9 pages, 5 figures 

  Access Model/Code and Paper
Optimized Cost per Click in Taobao Display Advertising

Nov 01, 2018
Han Zhu, Junqi Jin, Chang Tan, Fei Pan, Yifan Zeng, Han Li, Kun Gai

Taobao, as the largest online retail platform in the world, provides billions of online display advertising impressions for millions of advertisers every day. For commercial purposes, the advertisers bid for specific spots and target crowds to compete for business traffic. The platform chooses the most suitable ads to display in tens of milliseconds. Common pricing methods include cost per mille (CPM) and cost per click (CPC). Traditional advertising systems target certain traits of users and ad placements with fixed bids, essentially regarded as coarse-grained matching of bid and traffic quality. However, the fixed bids set by the advertisers competing for different quality requests cannot fully optimize the advertisers' key requirements. Moreover, the platform has to be responsible for the business revenue and user experience. Thus, we proposed a bid optimizing strategy called optimized cost per click (OCPC) which automatically adjusts the bid to achieve finer matching of bid and traffic quality of page view (PV) request granularity. Our approach optimizes advertisers' demands, platform business revenue and user experience and as a whole improves traffic allocation efficiency. We have validated our approach in Taobao display advertising system in production. The online A/B test shows our algorithm yields substantially better results than previous fixed bid manner.

* Accepted by KDD 2017 

  Access Model/Code and Paper
Joint Optimization of Tree-based Index and Deep Model for Recommender Systems

Feb 19, 2019
Han Zhu, Daqing Chang, Ziru Xu, Pengye Zhang, Xiang Li, Jie He, Han Li, Jian Xu, Kun Gai

Large-scale industrial recommender systems are usually confronted with computational problems due to the enormous corpus size. To retrieve and recommend the most relevant items to users under response time limits, resorting to an efficient index structure is an effective and practical solution. Tree-based Deep Model (TDM) for recommendation \cite{zhu2018learning} greatly improves recommendation accuracy using tree index. By indexing items in a tree hierarchy and training a user-node preference prediction model satisfying a max-heap like property in the tree, TDM provides logarithmic computational complexity w.r.t. the corpus size, enabling the use of arbitrary advanced models in candidate retrieval and recommendation. In tree-based recommendation methods, the quality of both the tree index and the trained user preference prediction model determines the recommendation accuracy for the most part. We argue that the learning of tree index and user preference model has interdependence. Our purpose, in this paper, is to develop a method to jointly learn the index structure and user preference prediction model. In our proposed joint optimization framework, the learning of index and user preference prediction model are carried out under a unified performance measure. Besides, we come up with a novel hierarchical user preference representation utilizing the tree index hierarchy. Experimental evaluations with two large-scale real-world datasets show that the proposed method improves recommendation accuracy significantly. Online A/B test results at Taobao display advertising also demonstrate the effectiveness of the proposed method in production environments.


  Access Model/Code and Paper
CaricatureShop: Personalized and Photorealistic Caricature Sketching

Jul 24, 2018
Xiaoguang Han, Kangcheng Hou, Dong Du, Yuda Qiu, Yizhou Yu, Kun Zhou, Shuguang Cui

In this paper, we propose the first sketching system for interactively personalized and photorealistic face caricaturing. Input an image of a human face, the users can create caricature photos by manipulating its facial feature curves. Our system firstly performs exaggeration on the recovered 3D face model according to the edited sketches, which is conducted by assigning the laplacian of each vertex a scaling factor. To construct the mapping between 2D sketches and a vertex-wise scaling field, a novel deep learning architecture is developed. With the obtained 3D caricature model, two images are generated, one obtained by applying 2D warping guided by the underlying 3D mesh deformation and the other obtained by re-rendering the deformed 3D textured model. These two images are then seamlessly integrated to produce our final output. Due to the severely stretching of meshes, the rendered texture is of blurry appearances. A deep learning approach is exploited to infer the missing details for enhancing these blurry regions. Moreover, a relighting operation is invented to further improve the photorealism of the result. Both quantitative and qualitative experiment results validated the efficiency of our sketching system and the superiority of our proposed techniques against existing methods.

* 12 pages,16 figures,submitted to IEEE TVCG 

  Access Model/Code and Paper
Efficient training and design of photonic neural network through neuroevolution

Aug 04, 2019
Tian Zhang, Jia Wang, Yihang Dan, Yuxiang Lanqiu, Jian Dai, Xu Han, Xiaojuan Sun, Kun Xu

Recently, optical neural networks (ONNs) integrated in photonic chips has received extensive attention because they are expected to implement the same pattern recognition tasks in the electronic platforms with high efficiency and low power consumption. However, the current lack of various learning algorithms to train the ONNs obstructs their further development. In this article, we propose a novel learning strategy based on neuroevolution to design and train the ONNs. Two typical neuroevolution algorithms are used to determine the hyper-parameters of the ONNs and to optimize the weights (phase shifters) in the connections. In order to demonstrate the effectiveness of the training algorithms, the trained ONNs are applied in the classification tasks for iris plants dataset, wine recognition dataset and modulation formats recognition. The calculated results exhibit that the training algorithms based on neuroevolution are competitive with other traditional learning algorithms on both accuracy and stability. Compared with previous works, we introduce an efficient training method for the ONNs and demonstrate their broad application prospects in pattern recognition, reinforcement learning and so on.

* 11 pages, 4 figures 

  Access Model/Code and Paper