Research papers and code for "Yang Zhang":
Hashtags in online social networks have gained tremendous popularity during the past five years. The resulting large quantity of data has provided a new lens into modern society. Previously, researchers have mainly relied on data collected from Twitter to study either a certain type of hashtag or a certain property of hashtags. In this paper, we perform the first large-scale empirical analysis of hashtags shared on Instagram, the major platform for hashtag sharing. We study hashtags along three dimensions: the temporal-spatial dimension, the semantic dimension, and the social dimension. Extensive experiments performed on three large-scale datasets with more than 7 million hashtags in total yield a series of interesting observations. First, we show that the temporal patterns of hashtags can be categorized into four different clusters, and that people tend to share fewer hashtags at certain places and more at others. Second, we observe that a non-negligible proportion of hashtags exhibit large semantic displacement. We demonstrate that hashtags which are more uniformly shared among users, as quantified by our proposed hashtag entropy, are less prone to semantic displacement. Finally, we propose a bipartite graph embedding model to summarize users' hashtag profiles and rely on these profiles to perform friendship prediction. Evaluation results show that our approach achieves effective prediction, with an AUC (area under the ROC curve) above 0.8, demonstrating the strong social signals carried by hashtags.

* WWW 2019
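
To illustrate the entropy idea, the sketch below computes the Shannon entropy of a hashtag's share distribution across users; the function name and the exact normalization are our own assumptions, not the paper's definition:

```python
import math
from collections import Counter

def hashtag_entropy(user_ids):
    """Shannon entropy of one hashtag's share distribution across users.

    `user_ids` lists the user behind each share of the hashtag; a uniform
    spread over many users yields high entropy, while a hashtag shared
    mostly by one user yields entropy near zero.
    """
    counts = Counter(user_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Shared once each by four users vs. dominated by one user.
print(hashtag_entropy(["u1", "u2", "u3", "u4"]))  # 2.0 bits
print(hashtag_entropy(["u1", "u1", "u1", "u2"]))  # ~0.81 bits
```
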
Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, so recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation relating the two. The perception module translates the perceived RGB image into a semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent that acts on the translated segmentation. Our architecture is evaluated on an obstacle avoidance task and a target following task. Experimental results show that it significantly outperforms all baseline methods in both virtual and real environments and exhibits a faster learning curve than they do. We also present a detailed analysis of a variety of configurations and validate the transferability of our modular architecture.

* 7 pages, accepted by IJCAI-18
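
To make the modular decomposition concrete, here is a minimal sketch of the two-module interface, with toy stand-ins for the trained segmentation network and the RL policy (all names and thresholds below are hypothetical, not the paper's implementation):

```python
import numpy as np

class ModularAgent:
    """Sketch of the perception / control decomposition described above."""

    def __init__(self, perception, policy):
        self.perception = perception  # RGB image -> per-pixel class map
        self.policy = policy          # class map  -> action

    def act(self, rgb_image):
        seg = self.perception(rgb_image)  # segmentation is the shared meta
        return self.policy(seg)           # representation across domains

# Toy stand-ins: a threshold "perception" and a turn-away "policy".
perception = lambda img: (img.mean(axis=-1) > 0.5).astype(int)  # obstacle mask
policy = lambda seg: ("turn_left"
                      if seg[:, seg.shape[1] // 2:].mean() > 0.2
                      else "forward")
agent = ModularAgent(perception, policy)
print(agent.act(np.random.rand(64, 64, 3)))
```
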
In this paper, a 3D patch-based fully dense and fully convolutional network (FD-FCN) is proposed for fast and accurate segmentation of subcortical structures in T1-weighted magnetic resonance images. Developed from the seminal FCN with an end-to-end learning-based approach and constructed from newly designed dense blocks, including a dense fully-connected layer, the proposed FD-FCN differs from other FCN-based methods and outperforms them in both efficiency and accuracy. Compared with U-shaped architectures, FD-FCN discards the upsampling path for better model fitness. To alleviate parameter explosion, the inputs of dense blocks are no longer passed directly to subsequent layers. This architecture greatly reduces both memory and time consumption during training. Although FD-FCN is slimmed down, it attains stronger dense-inference capability than conventional networks, benefiting from the construction of the network architecture and the incorporation of redesigned dense blocks. The multi-scale FD-FCN models both local and global context by embedding intermediate-layer outputs in the final prediction, which encourages consistency between features extracted at different scales and embeds fine-grained information directly in the segmentation process. In addition, dense blocks are rebuilt to enlarge the receptive fields without significantly increasing parameters, and spectral coordinates are exploited to provide spatial context for the original input patch. Experiments were performed on the IBSR dataset, where FD-FCN produced an accurate segmentation with an overall Dice overlap of 89.81% for 11 brain structures in 53 seconds, an absolute Dice improvement of at least 3.66% over state-of-the-art 3D FCN-based methods.

Stochastic mirror descent (SMD) retains the advantages of simple implementation, low memory requirements, and low computational complexity. However, the non-convexity of the objective function, together with its non-stationary sampling process, is the main bottleneck in applying SMD to reinforcement learning. To address this problem, we propose mirror policy optimization (MPO), which estimates the policy gradient with a dynamic batch size of gradient information. Compared with REINFORCE or VPG, the proposed MPO improves the convergence rate from $\mathcal{O}({{1}/{\sqrt{N}}})$ to $\mathcal{O}({\ln N}/{N})$. We also propose the VRMPO algorithm, a variance-reduced implementation of MPO. We prove the convergence of VRMPO and analyze its computational complexity. We evaluate VRMPO on the MuJoCo continuous control tasks; the results show that VRMPO matches or outperforms several state-of-the-art algorithms, including DDPG, TRPO, PPO, and TD3.
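
The primal-dual update behind SMD can be illustrated on a tabular policy with the negative-entropy mirror map, which yields the classic exponentiated-gradient rule. This is a generic sketch under those assumptions, not the paper's MPO/VRMPO algorithm; the dynamic batch size is only hinted at in a comment:

```python
import numpy as np

def mirror_ascent_step(policy_probs, grad, lr=0.1):
    """One stochastic mirror ascent step on a tabular policy.

    With the negative-entropy mirror map the update is exponentiated
    gradient, which keeps `policy_probs` on the probability simplex.
    `grad` is any stochastic policy-gradient estimate per action (e.g.,
    averaged over a batch whose size may grow across iterations, as in
    the MPO idea above).
    """
    logits = np.log(policy_probs) + lr * grad  # gradient step in dual space
    new_probs = np.exp(logits - logits.max())  # map back, numerically stable
    return new_probs / new_probs.sum()

probs = np.ones(4) / 4                        # uniform initial policy
grad_estimate = np.array([1.0, 0.0, -1.0, 0.0])
print(mirror_ascent_step(probs, grad_estimate))
```
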

Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which can cause off-policy learning with function approximation to fall into uncontrolled instability. In this paper, to reduce this variance, we introduce the control variate technique into Expected Sarsa($\lambda$) and propose a tabular ES($\lambda$)-CV algorithm. We prove that, given a proper estimator of the value function, the proposed ES($\lambda$)-CV enjoys lower variance than Expected Sarsa($\lambda$). Furthermore, to extend ES($\lambda$)-CV to a convergent algorithm with linear function approximation, we propose the GES($\lambda$) algorithm under the convex-concave saddle-point formulation. We prove that GES($\lambda$) achieves a convergence rate of $\mathcal{O}(1/T)$, matching or surpassing several state-of-the-art gradient-based algorithms while using a more relaxed step size. Numerical experiments show that the proposed algorithm is stable and converges faster, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: GQ($\lambda$), GTB($\lambda$), and ABQ($\zeta$).
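
The control variate principle the algorithm builds on can be shown in isolation: correcting an estimator with a correlated quantity of known mean preserves the expectation while shrinking the variance. The sketch below is this generic principle, not the ES($\lambda$)-CV update itself:

```python
import numpy as np

def control_variate_estimate(x, y, y_mean):
    """Estimate E[X] with lower variance via a correlated Y of known mean.

    The corrected samples x - c*(y - E[Y]) keep the same expectation but
    have smaller variance when X and Y are correlated; ES(lambda)-CV
    applies the same principle inside the expected-Sarsa return.
    """
    c = np.cov(x, y)[0, 1] / np.var(y)  # variance-minimizing coefficient
    return np.mean(x - c * (y - y_mean))

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, 10_000)             # control variate, E[Y] = 1
x = 2.0 * y + rng.normal(0.0, 0.5, 10_000)   # target quantity, E[X] = 2
print(np.mean(x), control_variate_estimate(x, y, 1.0))
```
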

The OECD pointed out that the best way to keep students in school is to intervene as early as possible [1]. Using educational big data and deep learning to predict students' scores provides new resources and perspectives for early intervention. Previous forecasting schemes often require manual feature selection together with a large amount of prior and expert knowledge, whereas deep learning can automatically extract features without manual intervention to achieve better predictive performance. In this paper, we present Graph-VAE, a deep learning-based graph neural network model for matrix completion that automatically extracts features without requiring extensive prior knowledge. Experiments show that our model outperforms traditional solutions on the student score dataset and better captures the correlations and differences between students and courses. Moreover, when the encoded vectors are reduced in dimensionality and visualized, their clustering is consistent with the clustering of the real data distribution. In addition, we use gradient-based attribution methods to analyze the key factors that influence performance prediction.

This study provides a systematic review of recent advances in designing intelligent tutoring robots (ITRs) and summarises the status quo of applying artificial intelligence (AI) techniques. We first analyse the environment of the ITR and propose a relationship model describing the ITR's interactions with students, the social milieu, and the curriculum. Then, we transform the relationship model into a perception-planning-action model to explore which AI techniques are suitable for the ITR. This article provides insights into promoting the human-robot teaching-learning process and AI-assisted educational techniques, illustrating design guidelines and future research perspectives for intelligent tutoring robots.

In this paper, we present a novel upsampling framework to enhance the spatial resolution of depth images. In our framework, the upscaling of a low-resolution depth image is guided by a corresponding intensity image, and we formulate it as a cost aggregation problem with the guided filter. However, the guided filter does not make full use of the properties of the depth image. Since depth images have quite sparse gradients, this inspires us to regularize the gradients to improve depth upscaling results. Statistics show a special property of depth images: a non-negligible proportion of pixels have horizontal or vertical derivatives equal to $\pm 1$. Considering this property, we propose a low gradient regularization method which reduces the penalty for horizontal or vertical derivatives of $\pm 1$. The proposed low gradient regularization is integrated with the guided filter into the depth image upsampling method. Experimental results demonstrate the effectiveness of our proposed approach both qualitatively and quantitatively compared with state-of-the-art methods.

* 28 pages, 7 figures
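
A minimal sketch of the low gradient regularization idea: a penalty on depth derivatives that charges less for the very common $\pm 1$ steps than for other non-zero gradients. The weighting constant is a hypothetical knob, and the paper's exact penalty form may differ:

```python
import numpy as np

def low_gradient_penalty(grad, small_weight=0.1):
    """L0-like penalty on depth derivatives with a reduced charge at +/-1.

    Flat regions cost nothing, the ubiquitous +/-1 steps cost
    `small_weight`, and all other non-zero derivatives cost 1.
    """
    g = np.asarray(grad)
    penalty = np.where(g == 0, 0.0,                      # flat: no cost
               np.where(np.abs(g) == 1, small_weight,    # +/-1: reduced cost
                        1.0))                            # others: full cost
    return penalty.sum()

dx = np.array([0, 1, -1, 3, 0, 1])   # horizontal depth derivatives
print(low_gradient_penalty(dx))      # 0.1 + 0.1 + 1.0 + 0.1 = 1.3
```
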
Imbalanced learning is important for classification models, which enjoy much popularity across many applications. Typically, imbalanced learning algorithms can be partitioned into two types, i.e., data-level approaches and algorithm-level approaches. In this paper, the focus is to develop a robust synthetic minority oversampling technique, which falls under the umbrella of data-level approaches. On one hand, we propose a method to generate synthetic samples in a high-dimensional feature space instead of a linear sampling space. On the other hand, in the proposed imbalanced learning framework, a Gaussian Mixture Model is employed to distinguish outliers from minority class instances and to filter out synthetic majority class instances. Last and most importantly, an adaptive optimization method is proposed to optimize the parameters of the sampling process. In doing so, an effective and efficient imbalanced learning framework is developed.
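
A rough sketch of the mixture-based filtering step, assuming scikit-learn's GaussianMixture: fit a mixture to the minority class, drop low-density points as likely outliers, then interpolate synthetic samples. Plain SMOTE-style linear interpolation stands in here for the paper's high-dimensional feature-space sampling, and the thresholds and component count are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def filtered_oversample(X_min, n_new, density_cut=0.05, seed=0):
    """Filter likely outliers with a GMM, then interpolate new samples."""
    rng = np.random.default_rng(seed)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(X_min)
    scores = gmm.score_samples(X_min)                 # log-density per point
    keep = X_min[scores > np.quantile(scores, density_cut)]
    i = rng.integers(0, len(keep), n_new)             # random endpoint pairs
    j = rng.integers(0, len(keep), n_new)
    t = rng.random((n_new, 1))
    return keep[i] + t * (keep[j] - keep[i])          # linear interpolation

X_min = np.random.default_rng(1).normal(size=(50, 2))
print(filtered_oversample(X_min, n_new=10).shape)     # (10, 2)
```
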

Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL. First, we classify different MTL algorithms into several categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. To further improve the performance of learning tasks, MTL can be combined with other learning paradigms, including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning, and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models have difficulty handling this situation, and online, parallel, and distributed MTL models, as well as dimensionality reduction and feature hashing, are reviewed to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance, and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.

* Accepted at ACL 2018 as Long paper
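
The word lattice the model consumes can be illustrated by enumerating all lexicon words latent in a character sequence (a trie would be used at scale; this brute-force scan just shows which spans feed the lattice, using the Nanjing Yangtze River Bridge example):

```python
def lexicon_matches(sentence, lexicon):
    """Enumerate all (start, end, word) lexicon spans in a character string."""
    matches = []
    for i in range(len(sentence)):
        for j in range(i + 1, len(sentence) + 1):
            if sentence[i:j] in lexicon:
                matches.append((i, j, sentence[i:j]))
    return matches

lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
print(lexicon_matches("南京市长江大桥", lexicon))
# [(0, 2, '南京'), (0, 3, '南京市'), (2, 4, '市长'), (3, 5, '长江'), ...]
```
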
This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building custom model structures through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, its core operations are computed in batch, making the toolkit efficient under GPU acceleration. It also includes implementations of most state-of-the-art neural sequence labeling models, such as LSTM-CRF, facilitating the reproduction of and refinement on those methods.

* ACL 2018, demonstration paper
In this technical report, we aim to mitigate the overfitting problem in natural language processing by applying data augmentation methods. Specifically, we apply several types of noise to perturb the input word embeddings, such as Gaussian noise, Bernoulli noise, and adversarial noise, and we also apply several constraints to the different types of noise. By implementing these proposed data augmentation methods, the baseline models gain improvements on several sentence classification tasks.
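
A minimal sketch of this input-level augmentation in PyTorch, covering the Gaussian and Bernoulli variants (the adversarial variant would instead step along the gradient of the loss with respect to the embeddings); the function name and scales are our own assumptions:

```python
import torch

def perturb_embeddings(emb, noise="gaussian", scale=0.01):
    """Perturb word embeddings with noise before they reach the classifier."""
    if noise == "gaussian":
        return emb + scale * torch.randn_like(emb)       # additive noise
    if noise == "bernoulli":
        keep = torch.bernoulli(torch.full_like(emb, 1 - scale))
        return emb * keep                                # random zeroing
    raise ValueError(noise)

emb = torch.randn(2, 5, 300)           # (batch, tokens, embedding dim)
print(perturb_embeddings(emb).shape)   # torch.Size([2, 5, 300])
```
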

Robust PCA is a widely used statistical procedure for recovering an underlying low-rank matrix from grossly corrupted observations. This work considers robust PCA as a nonconvex optimization problem on the manifold of low-rank matrices and proposes two algorithms (one for each of two versions of retraction) based on manifold optimization. It is shown that, with a properly designed initialization, the proposed algorithms are guaranteed to converge to the underlying low-rank matrix linearly. Compared with previous work based on the Burer-Monteiro decomposition of low-rank matrices, the proposed algorithms theoretically reduce the dependence on the condition number of the underlying low-rank matrix. Simulations and real data examples confirm the competitive performance of our method.
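
For reference, the robust PCA model being solved decomposes the observation into low-rank plus sparse parts; a common convex surrogate is principal component pursuit, whereas the work above optimizes a nonconvex formulation directly over the rank-$r$ manifold:

```latex
% Observations M = L + S: low-rank L plus sparse gross corruption S.
% Convex surrogate (principal component pursuit):
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = M.
% The manifold approach instead minimizes a robust loss over
% \{L : \operatorname{rank}(L) = r\} using retractions.
```
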

Parameters in deep neural networks that are trained on large-scale databases can generalize across multiple domains, a property referred to as "transferability". Unfortunately, transferability is usually defined in terms of discrete states, and it varies with domains and network architectures. Existing works usually apply parameter sharing or fine-tuning heuristically, and there is no principled approach to learning a parameter transfer strategy. To address this gap, a parameter transfer unit (PTU) is proposed in this paper. The PTU learns a fine-grained nonlinear combination of activations from both the source and the target domain networks, and subsumes hand-crafted discrete transfer states. In the PTU, transferability is controlled by two gates, which are artificial neurons that can be learned from data. The PTU is a general and flexible module that can be used in both CNNs and RNNs. Experiments are conducted with various network architectures and multiple transfer domain pairs. The results demonstrate the effectiveness of the PTU, as it outperforms heuristic parameter sharing and fine-tuning in most settings.
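
A minimal sketch of the gating idea, assuming a single learned gate implemented as a linear layer plus sigmoid (the paper's two-gate design is richer than this):

```python
import torch

def parameter_transfer_unit(a_src, a_tgt, gate_fc):
    """Gated mixing of source- and target-network activations.

    A learned gate decides per-unit how much of the source activation to
    blend into the target network, subsuming the discrete choice between
    parameter sharing and fine-tuning.
    """
    g = torch.sigmoid(gate_fc(torch.cat([a_src, a_tgt], dim=-1)))
    return g * a_src + (1 - g) * a_tgt   # g=1: share, g=0: fine-tune

a_src = torch.randn(8, 128)              # source-domain activations
a_tgt = torch.randn(8, 128)              # target-domain activations
gate_fc = torch.nn.Linear(256, 128)      # learned jointly with the networks
print(parameter_transfer_unit(a_src, a_tgt, gate_fc).shape)
```
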

In Cassini ISS (Imaging Science Subsystem) images, contour detection is often performed on disk-resolved objects to accurately locate their centers, making contour detection a key problem. Traditional edge detection methods, such as Canny and Roberts, often extract contours with too much interior detail and noise. Although deep convolutional neural networks have been applied successfully to many image tasks, such as classification and object detection, they require considerable time and computing resources. In this paper, a contour detection algorithm based on H-ELM (Hierarchical Extreme Learning Machine) and DenseCRF (Dense Conditional Random Field) is proposed for Cassini ISS images. The experimental results show that this algorithm outperforms traditional machine learning methods such as SVM and ELM, and even deep convolutional neural networks, and that the extracted contour is closer to the actual contour. Moreover, it can be trained and tested quickly on an ordinary PC configuration, so it can be readily applied to contour detection for Cassini ISS images.

We extend the state-of-the-art Cascade R-CNN with a simple feature sharing mechanism. Our approach addresses a key problem this detector suffers from: its performance increases at high IoU thresholds but decreases at low ones. Feature sharing is extremely helpful: our results show that, with this mechanism embedded into all stages, we can easily narrow the gap between the last stage and the preceding stages at low IoU thresholds using the network itself, without resorting to the commonly used testing ensemble. We also observe clear improvements at all IoU thresholds from feature sharing, and the resulting cascade structure can easily match or exceed its counterparts with only negligible extra parameters introduced. To push the envelope, we demonstrate 43.2 AP on COCO object detection without any bells and whistles, including testing ensembles, surpassing the previous Cascade R-CNN by a large margin. Our framework is easy to implement, and we hope it can serve as a general and strong baseline for future research.

* BMVC 2019 Camera Ready
In this paper, we consider the problem of super-resolution reconstruction. This is a hot topic because super-resolution reconstruction has a wide range of applications in the medical field, remote sensing monitoring, and criminal investigation. Compared with traditional algorithms, current deep learning-based super-resolution algorithms greatly improve the clarity of reconstructed images. Existing work such as Super-Resolution Using a Generative Adversarial Network (SRGAN) can effectively restore the texture details of an image. However, we experimentally verified that the texture details recovered by SRGAN are not robust. To obtain super-resolution reconstructions with richer high-frequency details, we improve the network structure and propose a super-resolution reconstruction algorithm combining the wavelet transform and a generative adversarial network. The proposed algorithm can efficiently reconstruct high-resolution images with rich global information and local texture details. We trained our model with the PyTorch framework on the VOC2012 dataset and tested it on the Set5, Set14, BSD100, and Urban100 test datasets.

Previous models for blind image quality assessment (BIQA) can only be trained (or fine-tuned) on one subject-rated database due to the difficulty of combining multiple databases with different perceptual scales. As a result, models trained in a well-controlled laboratory environment with synthetic distortions fail to generalize to realistic distortions, whose data distribution is different. Similarly, models optimized for images captured in the wild do not account for images simulated in the laboratory. Here we describe a simple technique for training BIQA models on multiple databases simultaneously without additional subjective testing for scale realignment. Specifically, we first create and combine image pairs within individual databases, whose ground-truth binary labels are computed from the corresponding mean opinion scores, indicating which of the two images is of higher quality. We then train a deep neural network for BIQA by learning to rank massive numbers of such image pairs. Extensive experiments on six databases demonstrate that our BIQA method based on the proposed learning technique works well for both synthetic and realistic distortions, outperforming existing BIQA models with a single set of model parameters. The generalizability of our method is further verified by group maximum differentiation (gMAD) competition.
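
The pair-based training objective can be sketched as a pairwise ranking loss on predicted quality scores; a simple cross-entropy variant is shown below, and the paper's exact pairwise loss may differ:

```python
import torch

def pairwise_rank_loss(score_a, score_b, a_better):
    """Learning-to-rank loss for within-database image pairs.

    The model only needs to predict which image of a pair is better, so
    mean opinion scores never have to be realigned across databases.
    `score_a`/`score_b` are predicted quality scores and `a_better`
    holds the binary labels derived from MOS comparisons.
    """
    logits = score_a - score_b   # higher => image A judged better
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits, a_better.float())

score_a = torch.randn(16, requires_grad=True)
score_b = torch.randn(16, requires_grad=True)
labels = torch.rand(16) > 0.5
print(pairwise_rank_loss(score_a, score_b, labels))
```
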

Exploiting resolution-invariant representations is critical for person Re-Identification (ReID) in real applications, where the resolutions of captured person images may vary dramatically. This paper learns person representations robust to resolution variance by jointly training a Foreground-Focus Super-Resolution (FFSR) module and a Resolution-Invariant Feature Extractor (RIFE) through end-to-end CNN learning. FFSR upscales the person foreground using a fully convolutional auto-encoder with skip connections, learned with a foreground focus training loss. RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low- and high-resolution images, respectively. These two complementary modules are jointly trained, leading to a strong resolution-invariant representation. We evaluate our methods on five datasets containing person images across a wide range of resolutions, where they show substantial superiority over existing solutions. For instance, we achieve Rank-1 accuracies of 36.4% and 73.3% on CAVIAR and MLR-CUHK03, outperforming the state-of-the-art by 2.9% and 2.6%, respectively.

* IJCAI 2019