Models, code, and papers for "Yang Zhang":

Language in Our Time: An Empirical Analysis of Hashtags

May 11, 2019
Yang Zhang

Hashtags in online social networks have gained tremendous popularity during the past five years. The resulting large quantity of data has provided a new lens into modern society. Previously, researchers have mainly relied on data collected from Twitter to study either a certain type of hashtag or a certain property of hashtags. In this paper, we perform the first large-scale empirical analysis of hashtags shared on Instagram, the major platform for hashtag sharing. We study hashtags along three dimensions: the temporal-spatial dimension, the semantic dimension, and the social dimension. Extensive experiments performed on three large-scale datasets with more than 7 million hashtags in total yield a series of interesting observations. First, we show that the temporal patterns of hashtags fall into four clusters, and that people tend to share fewer hashtags at certain places and more at others. Second, we observe that a non-negligible proportion of hashtags exhibit large semantic displacement. We demonstrate that hashtags which are more uniformly shared among users, as quantified by our proposed hashtag entropy, are less prone to semantic displacement. Finally, we propose a bipartite graph embedding model to summarize users' hashtag profiles, and rely on these profiles to perform friendship prediction. Evaluation results show that our approach achieves effective prediction with an AUC (area under the ROC curve) above 0.8, which demonstrates the strong social signal carried by hashtags.
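
The abstract does not define its hashtag entropy precisely; a natural reading is the Shannon entropy of a hashtag's share distribution over users. A minimal sketch under that assumption (function and variable names are ours, not the paper's):

```python
import math
from collections import Counter

def hashtag_entropy(user_ids):
    """Shannon entropy of a hashtag's share distribution over users.

    user_ids: one entry per share event. A hashtag shared uniformly
    across many users gets high entropy; one dominated by a few heavy
    users gets low entropy.
    """
    counts = Counter(user_ids)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)

# Example: a hashtag spread evenly vs. one dominated by a single user.
print(hashtag_entropy(["u1", "u2", "u3", "u4"]))  # 2.0 bits
print(hashtag_entropy(["u1", "u1", "u1", "u2"]))  # ~0.81 bits
```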

* WWW 2019 

A Causal Perspective to Unbiased Conversion Rate Estimation on Data Missing Not at Random

Oct 16, 2019
Wenhao Zhang, Wentian Bao, Xiao-Yang Liu, Keping Yang, Quan Lin, Hong Wen, Ramin Ramezani

In modern e-commerce and advertising recommender systems, ongoing research attempts to optimize conversion rate (CVR) estimation and increase gross merchandise volume. Even though state-of-the-art CVR estimators adopt deep learning methods, their performance is still subject to sample selection bias and data sparsity issues. Conversion labels of exposed items in the training dataset are typically missing not at random (MNAR) due to selection bias. Empirically, the data sparsity issue degrades the performance of models with large parameter spaces. In this paper, we propose two causal estimators combined with multi-task learning, aiming to solve the sample selection bias (SSB) and data sparsity (DS) issues in conversion rate estimation. The proposed estimators adjust for the MNAR mechanism as if they were trained on a "do dataset" where users are forced to click on all exposed items. We evaluate the causal estimators on billions of data samples. Experimental results demonstrate that the proposed CVR estimators outperform other state-of-the-art CVR estimators. In addition, an empirical study shows that our methods are cost-effective on large-scale datasets.
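
The abstract does not spell out the estimators themselves; inverse-propensity weighting is the standard causal correction for MNAR conversion labels, and the sketch below illustrates that general idea on a toy batch (all names and numbers are illustrative, not the paper's method):

```python
import numpy as np

def ipw_cvr_loss(cvr_pred, conversion, click_prob, clicked):
    """Inverse-propensity-weighted log loss for CVR estimation.

    Conversions are only observed on clicked items (selection bias /
    MNAR); weighting each clicked sample by 1 / P(click) approximates
    training on a "do dataset" where every exposed item is clicked.
    """
    eps = 1e-8
    w = clicked / np.clip(click_prob, eps, 1.0)  # weight 0 for unclicked items
    ll = conversion * np.log(cvr_pred + eps) + (1 - conversion) * np.log(1 - cvr_pred + eps)
    return -np.sum(w * ll) / max(np.sum(clicked), 1)

# Toy batch: 4 exposed items, 3 clicked, 1 converted.
cvr_pred   = np.array([0.30, 0.10, 0.50, 0.20])
conversion = np.array([1.0,  0.0,  0.0,  0.0])
click_prob = np.array([0.60, 0.20, 0.40, 0.50])  # estimated click propensities
clicked    = np.array([1.0,  1.0,  1.0,  0.0])
print(ipw_cvr_loss(cvr_pred, conversion, click_prob, clicked))
```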

* 16 pages, 6 figures, 4 tables 

Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Oct 28, 2018
Zhang-Wei Hong, Chen Yu-Ming, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Hsuan-Kung Yang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Yueh-Chuan Chang, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Chun-Yi Lee

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image into a semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and exhibits a faster learning curve. We also present a detailed analysis of a variety of variant configurations, and validate the transferability of our modular architecture.
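
A minimal sketch of the modular composition described above, with a stub perception module emitting a segmentation that a stub policy consumes (the interfaces are our invention, not the paper's code):

```python
import numpy as np

class PerceptionModule:
    """Stub for the RGB -> semantic segmentation translator."""
    def segment(self, rgb):                       # rgb: (H, W, 3)
        h, w, _ = rgb.shape
        return np.zeros((h, w), dtype=np.int64)   # per-pixel class ids

class ControlPolicy:
    """Stub for the RL agent that acts on the segmentation only."""
    def act(self, seg):
        return np.random.choice(["forward", "left", "right"])

def step(perception, policy, rgb):
    # Semantic segmentation is the meta representation linking the two
    # modules, so either side can be retrained or swapped independently.
    return policy.act(perception.segment(rgb))

print(step(PerceptionModule(), ControlPolicy(), np.zeros((64, 64, 3))))
```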

* 7 pages, accepted by IJCAI-18 

FD-FCN: 3D Fully Dense and Fully Convolutional Network for Semantic Segmentation of Brain Anatomy

Jul 22, 2019
Binbin Yang, Weiwei Zhang

In this paper, a 3D patch-based fully dense and fully convolutional network (FD-FCN) is proposed for fast and accurate segmentation of subcortical structures in T1-weighted magnetic resonance images. Developed from the seminal FCN with an end-to-end learning-based approach and constructed from newly designed dense blocks, including a dense fully-connected layer, the proposed FD-FCN differs from other FCN-based methods and outperforms them in terms of both efficiency and accuracy. Compared with the U-shaped architecture, FD-FCN discards the upsampling path to improve model fitness. To alleviate the problem of parameter explosion, the inputs of dense blocks are no longer passed directly to all subsequent layers. This architecture greatly reduces both memory and time consumption during training. Although FD-FCN is slimmed down, it achieves a better capability for dense inference than other conventional networks, a benefit that stems from the construction of the network architecture and the incorporation of the redesigned dense blocks. The multi-scale FD-FCN models both local and global context by embedding intermediate-layer outputs in the final prediction, which encourages consistency between features extracted at different scales and embeds fine-grained information directly in the segmentation process. In addition, dense blocks are rebuilt to enlarge the receptive fields without significantly increasing parameters, and spectral coordinates are exploited to provide spatial context for the original input patch. Experiments were performed on the IBSR dataset, where FD-FCN produced an accurate segmentation with an overall Dice overlap of 89.81% for 11 brain structures in 53 seconds, an absolute Dice improvement of at least 3.66% over state-of-the-art 3D FCN-based methods.
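
The paper's dense blocks are redesigned in ways the abstract only summarizes; as background, a plain DenseNet-style 3D dense block in PyTorch looks like this (hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    """Plain 3D dense block: each layer sees the concatenation of all
    previous feature maps, growing channels by `growth` per layer."""
    def __init__(self, in_ch, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
                nn.Conv3d(ch, growth, kernel_size=3, padding=1)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

block = DenseBlock3D(in_ch=8)
print(block(torch.randn(1, 8, 16, 16, 16)).shape)  # (1, 72, 16, 16, 16)
```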


Policy Optimization with Stochastic Mirror Descent

Jun 25, 2019
Long Yang, Yu Zhang

Stochastic mirror descent (SMD) retains the advantages of simple implementation, low memory requirements, and low computational complexity. However, the non-convexity of the objective function, together with the non-stationary sampling process, is the main bottleneck in applying SMD to reinforcement learning. To address this problem, we propose mirror policy optimization (MPO), which estimates the policy gradient using a dynamic batch size of gradient information. Compared with REINFORCE or VPG, the proposed MPO improves the convergence rate from $\mathcal{O}(1/\sqrt{N})$ to $\mathcal{O}(\ln N/N)$. We also propose VRMPO, a variance-reduced implementation of MPO. We prove the convergence of VRMPO and analyze its computational complexity. We evaluate VRMPO on the MuJoCo continuous control tasks; the results show that it outperforms or matches several state-of-the-art algorithms: DDPG, TRPO, PPO, and TD3.
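
As background on the mirror-descent geometry (not the paper's MPO/VRMPO algorithms): with the negative-entropy mirror map over a probability simplex, a mirror ascent step reduces to a multiplicative-weights update. A toy single-state sketch with names of our choosing:

```python
import numpy as np

def mirror_ascent_step(pi, grad, lr=0.1):
    """One stochastic mirror ascent step on the probability simplex.

    With the negative-entropy mirror map, the update is multiplicative:
    pi <- pi * exp(lr * grad), renormalized. This is the geometry SMD
    exploits instead of a plain Euclidean gradient step.
    """
    new = pi * np.exp(lr * grad)
    return new / new.sum()

# Toy bandit: push probability toward the action with a higher gradient signal.
pi = np.full(3, 1 / 3)
grad_estimate = np.array([1.0, 0.2, -0.5])  # e.g. a REINFORCE-style estimate
for _ in range(50):
    pi = mirror_ascent_step(pi, grad_estimate)
print(pi)  # mass concentrates on action 0
```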


Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Jun 25, 2019
Long Yang, Yu Zhang

Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which causes off-policy learning with function approximation to fall into uncontrolled instability. In this paper, to reduce this variance, we introduce the control variate technique into Expected Sarsa($\lambda$) and propose a tabular ES($\lambda$)-CV algorithm. We prove that, given a proper estimator of the value function, the proposed ES($\lambda$)-CV enjoys a lower variance than Expected Sarsa($\lambda$). Furthermore, to extend ES($\lambda$)-CV to a convergent algorithm with linear function approximation, we propose the GES($\lambda$) algorithm under the convex-concave saddle-point formulation. We prove that the convergence rate of GES($\lambda$) achieves $\mathcal{O}(1/T)$, which matches or outperforms several state-of-the-art gradient-based algorithms while using a more relaxed step size. Numerical experiments show that the proposed algorithm is stable and converges faster, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: GQ($\lambda$), GTB($\lambda$), and ABQ($\zeta$).
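
The specific control variate in ES($\lambda$)-CV is not given in this abstract; the generic mechanism is to subtract a correlated quantity with known expectation, leaving the estimator's mean unchanged while shrinking its variance. A self-contained illustration on a plain Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[f(U)] for f(u) = exp(u), U ~ Uniform(0, 1); true value is e - 1.
u = rng.uniform(size=10_000)
f = np.exp(u)

# Control variate g(u) = u with known mean E[g] = 0.5, correlated with f.
g = u
beta = np.cov(f, g)[0, 1] / np.var(g)   # variance-minimizing coefficient
cv_estimate = f - beta * (g - 0.5)      # same mean as f, lower variance

print("plain MC:", f.mean(), "variance", f.var())
print("with CV :", cv_estimate.mean(), "variance", cv_estimate.var())
```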


Based on Graph-VAE Model to Predict Student's Score

Mar 08, 2019
Yang Zhang, Mingming Lu

The OECD has pointed out that the best way to keep students on track in school is to intervene as early as possible [1]. Using educational big data and deep learning to predict students' scores provides new resources and perspectives for early intervention. Previous forecasting schemes often require manual feature filtering and a large amount of prior and expert knowledge, whereas deep learning can automatically extract features without manual intervention to achieve better predictive performance. In this paper, we present Graph-VAE, a graph neural network matrix completion model based on deep learning that automatically extracts features without requiring extensive prior knowledge. Experiments show that our model outperforms the traditional solutions on the student score dataset and better captures the correlations and differences between students and courses; moreover, when the encoded vectors are reduced in dimensionality and visualized, their clustering is consistent with the clustering of the real data distribution. In addition, we use gradient-based attribution methods to analyze the key factors that influence performance prediction.
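
For orientation only, the simplest analogue of the matrix completion task Graph-VAE addresses is plain matrix factorization over the student-course score matrix; the sketch below is that baseline, not the paper's model, and all names and numbers are illustrative:

```python
import numpy as np

# Student x course score matrix with missing entries (NaN).
rng = np.random.default_rng(0)
R = np.array([[90., 85., np.nan],
              [60., np.nan, 70.],
              [np.nan, 80., 75.]])
mask = ~np.isnan(R)
k, lr, reg = 2, 0.01, 0.1
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # student embeddings
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # course embeddings
mu = np.nanmean(R)

for _ in range(2000):
    E = np.where(mask, R - (mu + P @ Q.T), 0.0)  # error on observed cells only
    P += lr * (E @ Q - reg * P)
    Q += lr * (E.T @ P - reg * Q)

print(np.round(mu + P @ Q.T, 1))  # predictions fill the NaN cells
```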


Artificial Intelligence in Intelligent Tutoring Robots: A Systematic Review and Design Guidelines

Feb 26, 2019
Jinyu Yang, Bo Zhang

This study provides a systematic review of recent advances in designing intelligent tutoring robots (ITRs), and summarises the status quo of applying artificial intelligence (AI) techniques. We first analyse the environment of the ITR and propose a relationship model describing its interactions with the students, the social milieu, and the curriculum. Then, we transform the relationship model into a perception-planning-action model to explore which AI techniques are suitable for application in ITRs. This article provides insights into promoting the human-robot teaching-learning process and AI-assisted educational techniques, and illustrates design guidelines and future research perspectives for intelligent tutoring robots.


Depth Image Upsampling based on Guided Filter with Low Gradient Minimization

Nov 12, 2018
Hang Yang, Zhongbo Zhang

In this paper, we present a novel upsampling framework to enhance the spatial resolution of depth images. In our framework, the upscaling of a low-resolution depth image is guided by a corresponding intensity image, and we formulate it as a cost aggregation problem with the guided filter. However, the guided filter does not make full use of the properties of the depth image. Since depth images have quite sparse gradients, this inspires us to regularize the gradients to improve the depth upscaling results. Statistics reveal a special property of depth images: a non-negligible fraction of pixels have horizontal or vertical derivatives equal to $\pm 1$. Considering this property, we propose a low gradient regularization method that reduces the penalty for horizontal or vertical derivatives of $\pm 1$. The proposed low gradient regularization is integrated with the guided filter into the depth image upsampling method. Experimental results demonstrate the effectiveness of our proposed approach both qualitatively and quantitatively compared with state-of-the-art methods.
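
A sketch of the shape of such a penalty, with zero cost for flat regions, a reduced cost for derivatives of exactly $\pm 1$, and full cost elsewhere (the paper's exact functional form may differ; names and weights are ours):

```python
import numpy as np

def low_gradient_penalty(depth, alpha=0.1):
    """Illustrative gradient penalty for depth maps: flat regions cost
    nothing, derivatives of exactly +/-1 get the reduced weight alpha
    (they dominate real depth images, per the statistics above), and
    larger jumps pay full cost."""
    dx = np.diff(depth, axis=1)
    dy = np.diff(depth, axis=0)
    cost = lambda d: np.sum(np.abs(d) > 1) + alpha * np.sum(np.abs(d) == 1)
    return cost(dx) + cost(dy)

depth = np.array([[10, 11, 12, 12],
                  [10, 11, 12, 12],
                  [10, 11, 20, 20]])  # one sharp edge, many +/-1 steps
print(low_gradient_penalty(depth))
```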

* 28 pages, 7 figures 

G-SMOTE: A GMM-based synthetic minority oversampling technique for imbalanced learning

Oct 24, 2018
Tianlun Zhang, Xi Yang

Imbalanced learning is an important problem for classification models, which have enjoyed much popularity in many applications. Typically, imbalanced learning algorithms can be partitioned into two types, i.e., data-level approaches and algorithm-level approaches. In this paper, the focus is to develop a robust synthetic minority oversampling technique, which falls under the umbrella of data-level approaches. On one hand, we propose a method to generate synthetic samples in a high-dimensional feature space instead of a linear sampling space. On the other hand, in the proposed imbalanced learning framework, a Gaussian Mixture Model is employed to distinguish outliers from minority class instances and to filter out synthetic majority class instances. Last and most importantly, an adaptive optimization method is proposed to optimize these parameters in the sampling process. By doing so, an effective and efficient imbalanced learning framework is developed.
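
A minimal sketch of the core sampling idea using scikit-learn's GaussianMixture (the paper's outlier filtering and adaptive parameter optimization are omitted; names are ours):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_oversample(X_min, n_new, n_components=2, seed=0):
    """Fit a GMM to the minority class and draw synthetic samples from
    it, so new points follow the estimated minority density instead of
    lying on line segments between neighbors as in classic SMOTE."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(X_min)
    X_new, _ = gmm.sample(n_new)
    return X_new

rng = np.random.default_rng(0)
X_min = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(40, 2))  # minority class
print(gmm_oversample(X_min, n_new=5).round(2))
```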


A Survey on Multi-Task Learning

Jul 27, 2018
Yu Zhang, Qiang Yang

Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL. First, we classify different MTL algorithms into several categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. To further improve the performance of learning tasks, MTL can be combined with other learning paradigms, including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning, and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models have difficulty handling this situation, so online, parallel, and distributed MTL models, as well as dimensionality reduction and feature hashing, are reviewed to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance, and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.
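
As one concrete instance of the feature learning approach mentioned above, hard parameter sharing puts a shared encoder under per-task heads; a minimal PyTorch sketch (dimensions are illustrative):

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Simplest feature-learning MTL: a shared encoder learns one
    representation, and per-task heads branch off it, so related
    tasks regularize each other."""
    def __init__(self, in_dim, hidden, task_dims):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        z = self.shared(x)
        return [head(z) for head in self.heads]

model = HardSharingMTL(in_dim=10, hidden=32, task_dims=[3, 1])
outs = model(torch.randn(4, 10))
print([o.shape for o in outs])  # [(4, 3), (4, 1)]
```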


Chinese NER Using Lattice LSTM

Jul 05, 2018
Yue Zhang, Jie Yang

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
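
For illustration, the lattice itself is just the set of lexicon words matching character spans of the input; a minimal construction sketch (the gated recurrent cells that merge lattice paths are not shown, and the lexicon here is a toy):

```python
def build_lattice(chars, lexicon, max_len=4):
    """Enumerate the word lattice a lattice LSTM consumes: every lexicon
    word matching a character span becomes an extra path alongside the
    character chain."""
    spans = []
    for i in range(len(chars)):
        for j in range(i + 2, min(i + max_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                spans.append((i, j, word))
    return spans

lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
print(build_lattice(list("南京市长江大桥"), lexicon))
# [(0, 2, '南京'), (0, 3, '南京市'), (2, 4, '市长'), (3, 5, '长江'),
#  (3, 7, '长江大桥'), (5, 7, '大桥')]
```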

* Accepted at ACL 2018 as a long paper 

NCRF++: An Open-source Neural Sequence Labeling Toolkit

Jun 17, 2018
Jie Yang, Yue Zhang

This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for the quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building custom model structures through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with GPU acceleration. It also includes implementations of most state-of-the-art neural sequence labeling models, such as LSTM-CRF, facilitating the reproduction of and refinement on those methods.
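
As background on the CRF inference layer such toolkits provide (this is a generic sketch, not NCRF++'s actual API or configuration format), Viterbi decoding finds the label sequence maximizing emission plus transition scores:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Generic Viterbi decoding for a linear-chain CRF.

    emissions: (T, L) per-position label scores (log-space).
    transitions: (L, L) score of moving from label i to label j.
    """
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

emissions = np.log(np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]))
transitions = np.log(np.array([[0.8, 0.2], [0.3, 0.7]]))
print(viterbi(emissions, transitions))  # [0, 0, 0] for these scores
```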

* ACL 2018, demonstration paper 

Word Embedding Perturbation for Sentence Classification

Apr 22, 2018
Dongxu Zhang, Zhichao Yang

In this technical report, we aim to mitigate the overfitting problem in natural language processing by applying data augmentation methods. Specifically, we apply several types of noise to perturb the input word embeddings, such as Gaussian noise, Bernoulli noise, and adversarial noise. We also apply several constraints to the different types of noise. By implementing these proposed data augmentation methods, the baseline models gain improvements on several sentence classification tasks.
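
A minimal sketch of two of the perturbations mentioned, applied to a batch of word embeddings during training (the report's exact noise forms and constraints may differ; names are ours):

```python
import torch

def perturb_embeddings(emb, noise="gaussian", sigma=0.1, p=0.1):
    """Data augmentation by perturbing input word embeddings: additive
    Gaussian noise or multiplicative Bernoulli (dropout-style) masking.
    Applied only during training."""
    if noise == "gaussian":
        return emb + sigma * torch.randn_like(emb)
    if noise == "bernoulli":
        mask = (torch.rand_like(emb) > p).float()
        return emb * mask / (1 - p)  # rescale to keep the expectation
    raise ValueError(noise)

emb = torch.randn(2, 5, 300)  # (batch, seq_len, emb_dim)
print(perturb_embeddings(emb).shape)
print(perturb_embeddings(emb, noise="bernoulli").shape)
```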


Robust PCA by Manifold Optimization

Sep 01, 2017
Teng Zhang, Yi Yang

Robust PCA is a widely used statistical procedure for recovering an underlying low-rank matrix from grossly corrupted observations. This work considers the problem of robust PCA as a nonconvex optimization problem on the manifold of low-rank matrices, and proposes two algorithms (for two versions of retractions) based on manifold optimization. It is shown that, with a properly designed initialization, the proposed algorithms are guaranteed to converge to the underlying low-rank matrix linearly. Compared with previous work based on the Burer-Monteiro decomposition of low-rank matrices, the proposed algorithms theoretically reduce the dependence on the condition number of the underlying low-rank matrix. Simulations and real data examples confirm the competitive performance of our method.
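
For orientation, the decomposition being solved is M ≈ L (low-rank) + S (sparse); the sketch below is a simple alternating baseline for that decomposition, not the paper's manifold algorithm, and all parameters are illustrative:

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def robust_pca_baseline(M, rank, lam=0.1, iters=100):
    """Alternate a truncated SVD for the low-rank part L with
    soft-thresholding for the sparse part S."""
    S = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r fit
        S = soft(M - L, lam)                      # sparse residual
    return L, S

rng = np.random.default_rng(0)
L0 = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 50))  # rank-3 ground truth
S0 = np.zeros((50, 50))
S0[rng.random((50, 50)) < 0.05] = 10.0                    # gross corruptions
L, S = robust_pca_baseline(L0 + S0, rank=3)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))        # relative recovery error
```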


Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks

Oct 24, 2019
Ziye Yang, Xiao-Lei Zhang

Recently, deep clustering (DPCL) based speaker-independent speech separation has drawn much attention, since it needs little speaker prior information. However, it still has much room for improvement, particularly in reverberant environments. If the training and test environments mismatch, which is a common case, the embedding vectors produced by DPCL may contain much noise and many small variations. To deal with this problem, we propose a variant of DPCL, named DPCL++, which applies a recent unsupervised deep learning method, multilayer bootstrap networks (MBN), to further reduce the noise and small variations of the embedding vectors in an unsupervised way at the test stage, which facilitates k-means in producing a good result. MBN builds a gradually narrowed network from the bottom up via a stack of k-centroids clustering ensembles, where the k-centroids clusterings are trained independently by random sampling and one-nearest-neighbor optimization. To further improve the robustness of DPCL++ in reverberant environments, we take spatial features as part of its input. Experimental results demonstrate the effectiveness of the proposed method.
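
A rough sketch of one MBN layer under our reading of the description above: an ensemble of k-centroids clusterings built from randomly sampled points, with one-hot nearest-centroid encodings concatenated across the ensemble (the one-nearest-neighbor optimization step is omitted; names are ours):

```python
import numpy as np

def mbn_layer(X, k, n_clusterings=10, seed=0):
    """One layer of a multilayer bootstrap network. Stacking layers
    with shrinking k gives the gradually narrowed network described
    above."""
    rng = np.random.default_rng(seed)
    codes = []
    for _ in range(n_clusterings):
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        codes.append(np.eye(k)[d.argmin(axis=1)])  # one-hot nearest centroid
    return np.concatenate(codes, axis=1)

X = np.random.default_rng(1).normal(size=(100, 8))
H = mbn_layer(X, k=20)           # first layer
H = mbn_layer(H, k=5, seed=2)    # narrower second layer
print(H.shape)                   # (100, 50)
```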


Parameter Transfer Unit for Deep Neural Networks

Apr 23, 2018
Yinghua Zhang, Yu Zhang, Qiang Yang

Parameters in deep neural networks trained on large-scale databases can generalize across multiple domains, a property referred to as "transferability". Unfortunately, transferability is usually defined over discrete states, and it varies with domains and network architectures. Existing works usually apply parameter sharing or fine-tuning heuristically, and there is no principled approach for learning a parameter transfer strategy. To address this gap, a parameter transfer unit (PTU) is proposed in this paper. The PTU learns a fine-grained nonlinear combination of activations from both the source and the target domain networks, and subsumes hand-crafted discrete transfer states. In the PTU, transferability is controlled by two gates, which are artificial neurons that can be learned from data. The PTU is a general and flexible module that can be used in both CNNs and RNNs. Experiments are conducted with various network architectures and multiple transfer domain pairs. The results demonstrate the effectiveness of the PTU, as it outperforms heuristic parameter sharing and fine-tuning in most settings.
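
A sketch of the gating idea as we read it, with two learned sigmoid gates mixing source- and target-network activations (the paper's exact gating form may differ; dimensions are illustrative):

```python
import torch
import torch.nn as nn

class PTU(nn.Module):
    """Illustrative parameter transfer unit: two learned gates decide,
    element-wise, how much of the source activation flows into the
    target network, subsuming hand-picked freeze/fine-tune choices."""
    def __init__(self, dim):
        super().__init__()
        self.gate_s = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.gate_t = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_src, h_tgt):
        both = torch.cat([h_src, h_tgt], dim=-1)
        return self.gate_s(both) * h_src + self.gate_t(both) * h_tgt

ptu = PTU(dim=64)
print(ptu(torch.randn(8, 64), torch.randn(8, 64)).shape)  # (8, 64)
```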


A Binary Regression Adaptive Goodness-of-fit Test (BAGofT)

Nov 08, 2019
Jiawei Zhang, Jie Ding, Yuhong Yang

Pearson's $\chi^2$ test and the residual deviance test are two classical goodness-of-fit tests for binary regression models such as logistic regression. These tests cannot be applied when the data contain one or more continuous covariates, a quite common situation in practice. In that case, the most widely used approach is the Hosmer-Lemeshow test, which partitions the covariate space into groups according to quantiles of the fitted probabilities from all the observations. However, its grouping scheme is not flexible enough to explore how to adversarially partition the data space in order to enhance the power. In this work, we propose a new methodology, named the binary regression adaptive grouping goodness-of-fit test (BAGofT), to address this concern. It is a two-stage solution: the first stage adaptively selects candidate partitions using "training" data, and the second stage performs $\chi^2$ tests with the necessary corrections based on "test" data. A proper data splitting ensures that the test has desirable size and power properties. In our experimental results, BAGofT performs much better than the Hosmer-Lemeshow test in many situations.
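
A rough sketch of the two-stage shape on synthetic data, with a shallow tree standing in for the adaptive partition search and without the paper's exact statistic or corrections (everything here is illustrative):

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
# True model has an interaction, so a main-effects logistic fit is misspecified.
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 0] * X[:, 1]))))

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]

# Stage 1: on a training split, learn a partition targeting regions
# where residuals are large (here via a shallow tree on the residuals).
idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.5, random_state=0)
tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0)
tree.fit(X[idx_tr], y[idx_tr] - p[idx_tr])

# Stage 2: a chi-square-type statistic over those groups on held-out data.
groups = tree.apply(X[idx_te])
stat = 0.0
for g in np.unique(groups):
    m = groups == g
    o, e = y[idx_te][m].sum(), p[idx_te][m].sum()
    v = (p[idx_te][m] * (1 - p[idx_te][m])).sum()
    stat += (o - e) ** 2 / max(v, 1e-12)
df = len(np.unique(groups))
print("statistic:", stat, "approx p-value:", 1 - chi2.cdf(stat, df=df))
```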


Exploring Hierarchical Interaction Between Review and Summary for Better Sentiment Analysis

Nov 07, 2019
Sen Yang, Leyang Cui, Yue Zhang

Sentiment analysis provides a useful overview of customer review contents. Many review websites allow a user to enter a summary in addition to a full review. It has been shown that jointly predicting the review summary and the sentiment rating benefits both tasks. However, existing methods consider the integration of review and summary information in an implicit manner, which limits their performance to some extent. In this paper, we propose a hierarchically-refined attention network that better exploits the multi-level interaction between a review and its summary for sentiment analysis. In particular, the representation of a review is refined layer by layer through attention over the summary representation. Empirical results show that our model makes better use of user-written summaries for review sentiment analysis, and is also more effective than existing methods when the user summary is replaced with a summary generated by an automatic summarization system.
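
A sketch of one such refinement layer, with review token states re-weighted by dot-product attention over summary states (the dimensions and the residual form are our choices; the paper stacks several such layers):

```python
import torch
import torch.nn.functional as F

def refine_review_with_summary(review_h, summary_h):
    """One refinement layer: review token states attend over the
    summary states, so summary content guides which review tokens
    matter for the sentiment prediction."""
    scores = review_h @ summary_h.transpose(1, 2)  # (B, Lr, Ls)
    attn = F.softmax(scores, dim=-1)
    summary_ctx = attn @ summary_h                 # (B, Lr, d)
    return review_h + summary_ctx                  # residual refinement

review_h = torch.randn(2, 50, 128)   # (batch, review length, hidden)
summary_h = torch.randn(2, 10, 128)  # (batch, summary length, hidden)
print(refine_review_with_summary(review_h, summary_h).shape)  # (2, 50, 128)
```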


Neural Embedding Propagation on Heterogeneous Networks

Sep 29, 2019
Carl Yang, Jieyu Zhang, Jiawei Han

Classification is one of the most important problems in machine learning. To address label scarcity, semi-supervised learning (SSL) has been intensively studied over the past two decades, mainly by leveraging data affinity modeled by networks. Label propagation (LP), the most popular SSL technique, mostly works only on homogeneous networks with single-typed, simple interactions. In this work, we focus on the more general and powerful heterogeneous networks, which accommodate multi-typed objects and links and thus involve multi-typed, complex interactions. Specifically, we propose neural embedding propagation (NEP), which leverages distributed embeddings to represent objects and dynamically composed modular networks to model their complex interactions. While generalizing LP as a simple instance, NEP is far more powerful in its natural awareness of different types of objects and links, and in its ability to automatically capture their important interaction patterns. Further, we develop a series of efficient training strategies for NEP, enabling its easy deployment on real-world heterogeneous networks with millions of objects. With extensive experiments on three datasets, we comprehensively demonstrate the effectiveness, efficiency, and robustness of NEP compared with state-of-the-art network embedding and SSL algorithms.
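
A heavily simplified, illustrative propagation step in the spirit of NEP, with a fixed transform per edge type standing in for LP's single untyped averaging (NEP itself composes modular networks and trains them end to end; all names here are ours):

```python
import numpy as np

def nep_step(emb, edges, type_mats):
    """One type-aware propagation step: each object's embedding is
    updated from its neighbors, applying a per-edge-type transform to
    each incoming message before averaging."""
    agg = {v: [] for v in emb}
    for u, v, etype in edges:
        agg[v].append(type_mats[etype] @ emb[u])
    return {v: np.mean(msgs, axis=0) if msgs else emb[v]
            for v, msgs in agg.items()}

rng = np.random.default_rng(0)
d = 4
emb = {n: rng.normal(size=d) for n in ["paper1", "author1", "venue1"]}
type_mats = {t: rng.normal(size=(d, d)) / np.sqrt(d)
             for t in ["writes", "published_in"]}
edges = [("author1", "paper1", "writes"),
         ("venue1", "paper1", "published_in")]
print(nep_step(emb, edges, type_mats)["paper1"].round(2))
```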

* 11 pages, published at ICDM 2019 
