Models, code, and papers for "Jingjing Xu":

Shallow Discourse Parsing with Maximum Entropy Model

Oct 31, 2017
Jingjing Xu

In recent years, more research has been devoted to studying the subtask of the complete shallow discourse parsing, such as indentifying discourse connective and arguments of connective. There is a need to design a full discourse parser to pull these subtasks together. So we develop a discourse parser turning the free text into discourse relations. The parser includes connective identifier, arguments identifier, sense classifier and non-explicit identifier, which connects with each other in pipeline. Each component applies the maximum entropy model with abundant lexical and syntax features extracted from the Penn Discourse Tree-bank. The head-based representation of the PDTB is adopted in the arguments identifier, which turns the problem of indentifying the arguments of discourse connective into finding the head and end of the arguments. In the non-explicit identifier, the contextual type features like words which have high frequency and can reflect the discourse relation are introduced to improve the performance of non-explicit identifier. Compared with other methods, experimental results achieve the considerable performance.

  Click for Model/Code and Paper
Improving Social Media Text Summarization by Learning Sentence Weight Distribution

Oct 31, 2017
Jingjing Xu

Recently, encoder-decoder models are widely used in social media text summarization. However, these models sometimes select noise words in irrelevant sentences as part of a summary by error, thus declining the performance. In order to inhibit irrelevant sentences and focus on key information, we propose an effective approach by learning sentence weight distribution. In our model, we build a multi-layer perceptron to predict sentence weights. During training, we use the ROUGE score as an alternative to the estimated sentence weight, and try to minimize the gap between estimated weights and predicted weights. In this way, we encourage our model to focus on the key sentences, which have high relevance with the summary. Experimental results show that our approach outperforms baselines on a large-scale social media corpus.

  Click for Model/Code and Paper
Horizontal and Vertical Ensemble with Deep Representation for Classification

Jun 12, 2013
Jingjing Xie, Bing Xu, Zhang Chuang

Representation learning, especially which by using deep learning, has been widely applied in classification. However, how to use limited size of labeled data to achieve good classification performance with deep neural network, and how can the learned features further improve classification remain indefinite. In this paper, we propose Horizontal Voting Vertical Voting and Horizontal Stacked Ensemble methods to improve the classification performance of deep neural networks. In the ICML 2013 Black Box Challenge, via using these methods independently, Bing Xu achieved 3rd in public leaderboard, and 7th in private leaderboard; Jingjing Xie achieved 4th in public leaderboard, and 5th in private leaderboard.

  Click for Model/Code and Paper
Transfer Deep Learning for Low-Resource Chinese Word Segmentation with a Novel Neural Network

Sep 14, 2017
Jingjing Xu, Xu Sun

Recent studies have shown effectiveness in using neural networks for Chinese word segmentation. However, these models rely on large-scale data and are less effective for low-resource datasets because of insufficient training data. We propose a transfer learning method to improve low-resource word segmentation by leveraging high-resource corpora. First, we train a teacher model on high-resource corpora and then use the learned knowledge to initialize a student model. Second, a weighted data similarity method is proposed to train the student model on low-resource data. Experiment results show that our work significantly improves the performance on low-resource datasets: 2.3% and 1.5% F-score on PKU and CTB datasets. Furthermore, this paper achieves state-of-the-art results: 96.1%, and 96.2% F-score on PKU and CTB datasets.

  Click for Model/Code and Paper
Evaluating Semantic Rationality of a Sentence: A Sememe-Word-Matching Neural Network based on HowNet

Sep 11, 2018
Shu Liu, Jingjing Xu, Xuancheng Ren, Xu Sun

Automatic evaluation of semantic rationality is an important yet challenging task, and current automatic techniques cannot well identify whether a sentence is semantically rational. The methods based on the language model do not measure the sentence by rationality but by commonness. The methods based on the similarity with human written sentences will fail if human-written references are not available. In this paper, we propose a novel model called Sememe-Word-Matching Neural Network (SWM-NN) to tackle semantic rationality evaluation by taking advantage of sememe knowledge base HowNet. The advantage is that our model can utilize a proper combination of sememes to represent the fine-grained semantic meanings of a word within the specific contexts. We use the fine-grained semantic representation to help the model learn the semantic dependency among words. To evaluate the effectiveness of the proposed model, we build a large-scale rationality evaluation dataset. Experimental results on this dataset show that the proposed model outperforms the competitive baselines with a 5.4\% improvement in accuracy.

  Click for Model/Code and Paper
Learning Sentiment Memories for Sentiment Modification without Parallel Data

Aug 22, 2018
Yi Zhang, Jingjing Xu, Pengcheng Yang, Xu Sun

The task of sentiment modification requires reversing the sentiment of the input and preserving the sentiment-independent content. However, aligned sentences with the same content but different sentiments are usually unavailable. Due to the lack of such parallel data, it is hard to extract sentiment independent content and reverse the sentiment in an unsupervised way. Previous work usually can not reconcile sentiment transformation and content preservation. In this paper, motivated by the fact the non-emotional context (e.g., "staff") provides strong cues for the occurrence of emotional words (e.g., "friendly"), we propose a novel method that automatically extracts appropriate sentiment information from learned sentiment memories according to specific context. Experiments show that our method substantially improves the content preservation degree and achieves the state-of-the-art performance.

* Accepted by EMNLP 2018 

  Click for Model/Code and Paper
DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text

Aug 21, 2018
Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu Sun

Existing text generation methods tend to produce repeated and "boring" expressions. To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward for repeatedly generated text and high reward for "novel" and fluent text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model based discriminator, which can better distinguish novel text from repeated text without the saturation problem compared with existing classifier-based discriminators. The experimental results on review generation and dialogue generation tasks demonstrate that our model can generate substantially more diverse and informative text than existing baselines. The code is available at

* Accepted by EMNLP 2018 

  Click for Model/Code and Paper
A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text

Nov 23, 2017
Jingjing Xu, Ji Wen, Xu Sun, Qi Su

Named Entity Recognition and Relation Extraction for Chinese literature text is regarded as the highly difficult problem, partially because of the lack of tagging sets. In this paper, we build a discourse-level dataset from hundreds of Chinese literature articles for improving this task. To build a high quality dataset, we propose two tagging methods to solve the problem of data inconsistency, including a heuristic tagging method and a machine auxiliary tagging method. Based on this corpus, we also introduce several widely used models to conduct experiments. Experimental results not only show the usefulness of the proposed dataset, but also provide baselines for further research. The dataset is available at

  Click for Model/Code and Paper
Minimal Effort Back Propagation for Convolutional Neural Networks

Sep 18, 2017
Bingzhen Wei, Xu Sun, Xuancheng Ren, Jingjing Xu

As traditional neural network consumes a significant amount of computing resources during back propagation, \citet{Sun2017mePropSB} propose a simple yet effective technique to alleviate this problem. In this technique, only a small subset of the full gradients are computed to update the model parameters. In this paper we extend this technique into the Convolutional Neural Network(CNN) to reduce calculation in back propagation, and the surprising results verify its validity in CNN: only 5\% of the gradients are passed back but the model still achieves the same effect as the traditional CNN, or even better. We also show that the top-$k$ selection of gradients leads to a sparse calculation in back propagation, which may bring significant computational benefits for high computational complexity of convolution operation in CNN.

  Click for Model/Code and Paper
Discourse-Aware Neural Extractive Model for Text Summarization

Oct 30, 2019
Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu

Recently BERT has been adopted in state-of-the-art text summarization models for document encoding. However, such BERT-based extractive models use the sentence as the minimal selection unit, which often results in redundant or uninformative phrases in the generated summaries. As BERT is pre-trained on sentence pairs, not documents, the long-range dependencies between sentences are not well captured. To address these issues, we present a graph-based discourse-aware neural summarization model - DiscoBert. By utilizing discourse segmentation to extract discourse units (instead of sentences) as candidates, DiscoBert provides a fine-grained granularity for extractive selection, which helps reduce redundancy in extracted summaries. Based on this, two discourse graphs are further proposed: ($i$) RST Graph based on RST discourse trees; and ($ii$) Coreference Graph based on coreference mentions in the document. DiscoBert first encodes the extracted discourse units with BERT, and then uses a graph convolutional network to capture the long-range dependencies among discourse units through the constructed graphs. Experimental results on two popular summarization datasets demonstrate that DiscoBert outperforms state-of-the-art methods by a significant margin.

  Click for Model/Code and Paper
Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations

Aug 23, 2018
Guangxiang Zhao, Jingjing Xu, Qi Zeng, Xuancheng Ren

This paper explores a new natural language processing task, review-driven multi-label music style classification. This task requires the system to identify multiple styles of music based on its reviews on websites. The biggest challenge lies in the complicated relations of music styles. It has brought failure to many multi-label classification methods. To tackle this problem, we propose a novel deep learning approach to automatically learn and exploit style correlations. The proposed method consists of two parts: a label-graph based neural network, and a soft training mechanism with correlation-based continuous label representation. Experimental results show that our approach achieves large improvements over the baselines on the proposed dataset. Especially, the micro F1 is improved from 53.9 to 64.5, and the one-error is reduced from 30.5 to 22.6. Furthermore, the visualized analysis shows that our approach performs well in capturing style correlations.

  Click for Model/Code and Paper
Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning

Apr 16, 2018
Hongyu Xu, Jingjing Zheng, Azadeh Alavi, Rama Chellappa

In real-world visual recognition problems, the assumption that the training data (source domain) and test data (target domain) are sampled from the same distribution is often violated. This is known as the domain adaptation problem. In this work, we propose a novel domain-adaptive dictionary learning framework for cross-domain visual recognition. Our method generates a set of intermediate domains. These intermediate domains form a smooth path and bridge the gap between the source and target domains. Specifically, we not only learn a common dictionary to encode the domain-shared features, but also learn a set of domain-specific dictionaries to model the domain shift. The separation of the common and domain-specific dictionaries enables us to learn more compact and reconstructive dictionaries for domain adaptation. These dictionaries are learned by alternating between domain-adaptive sparse coding and dictionary updating steps. Meanwhile, our approach gradually recovers the feature representations of both source and target data along the domain path. By aligning all the recovered domain data, we derive the final domain-adaptive features for cross-domain visual recognition. Extensive experiments on three public datasets demonstrates that our approach outperforms most state-of-the-art methods.

* Submitted to IEEE TIP Journal 

  Click for Model/Code and Paper
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning

Nov 17, 2019
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo

In sequence to sequence learning, the self-attention mechanism proves to be highly effective, and achieves significant improvements in many tasks. However, the self-attention mechanism is not without its own flaws. Although self-attention can model extremely long dependencies, the attention in deep layers tends to overconcentrate on a single token, leading to insufficient use of local information and difficultly in representing long sequences. In this work, we explore parallel multi-scale representation learning on sequence data, striving to capture both long-range and short-range language structures. To this end, we propose the Parallel MUlti-Scale attEntion (MUSE) and MUSE-simple. MUSE-simple contains the basic idea of parallel multi-scale sequence representation learning, and it encodes the sequence in parallel, in terms of different scales with the help from self-attention, and pointwise transformation. MUSE builds on MUSE-simple and explores combining convolution and self-attention for learning sequence representations from more different scales. We focus on machine translation and the proposed approach achieves substantial performance improvements over Transformer, especially on long sequences. More importantly, we find that although conceptually simple, its success in practice requires intricate considerations, and the multi-scale attention must build on unified semantic space. Under common setting, the proposed model achieves substantial performance and outperforms all previous models on three main machine translation tasks. In addition, MUSE has potential for accelerating inference due to its parallelism. Code will be available at

* Achieve state-of-the-art BLEU scores on WMT14 En-De, En-Fr, and IWSLT De-En 

  Click for Model/Code and Paper
Understanding and Improving Layer Normalization

Nov 16, 2019
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin

Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where the effectiveness stems from. In this paper, our main contribution is to take a step further in understanding LayerNorm. Many of previous studies believe that the success of LayerNorm comes from forward normalization. Unlike them, we find that the derivatives of the mean and variance are more important than forward normalization by re-centering and re-scaling backward gradients. Furthermore, we find that the parameters of LayerNorm, including the bias and gain, increase the risk of over-fitting and do not work in most cases. Experiments show that a simple version of LayerNorm (LayerNorm-simple) without the bias and gain outperforms LayerNorm on four datasets. It obtains the state-of-the-art performance on En-Vi machine translation. To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), by replacing the bias and gain with a new transformation function. Experiments show that AdaNorm demonstrates better results than LayerNorm on seven out of eight datasets.

* Accepted by NeurIPS 2019 

  Click for Model/Code and Paper
PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

Jun 28, 2019
Ruixuan Luo, Jingjing Xu, Yi Zhang, Xuancheng Ren, Xu Sun

Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing. In this paper, we build a new toolkit, named PKUSEG, for multi-domain word segmentation. Unlike existing single-model toolkits, PKUSEG targets at multi-domain word segmentation and provides separate models for different domains, such as web, medicine, and tourism. The new toolkit also supports POS tagging and model training to adapt to various application scenarios. Experiments show that PKUSEG achieves high performance on multiple domains. The toolkit is now freely and publicly available for the usage of research and industry.

  Click for Model/Code and Paper
Multi-Task Learning for Machine Reading Comprehension

Sep 18, 2018
Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao

We propose a multi-task learning framework to jointly train a Machine Reading Comprehension (MRC) model on multiple datasets across different domains. Key to the proposed method is to learn robust and general contextual representations with the help of out-domain data in a multi-task framework. Empirical study shows that the proposed approach is orthogonal to the existing pre-trained representation models, such as word embedding and language models. Experiments on the Stanford Question Answering Dataset (SQuAD), the Microsoft MAchine Reading COmprehension Dataset (MS MARCO), NewsQA and other datasets show that our multi-task learning approach achieves significant improvement over state-of-the-art models in most MRC tasks.

* 9 pages, 2 figures, 7 tables 

  Click for Model/Code and Paper
An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation

Aug 27, 2018
Liangchen Luo, Jingjing Xu, Junyang Lin, Qi Zeng, Xu Sun

Generating semantically coherent responses is still a major challenge in dialogue generation. Different from conventional text generation tasks, the mapping between inputs and responses in conversations is more complicated, which highly demands the understanding of utterance-level semantic dependency, a relation between the whole meanings of inputs and outputs. To address this problem, we propose an Auto-Encoder Matching (AEM) model to learn such dependency. The model contains two auto-encoders and one mapping module. The auto-encoders learn the semantic representations of inputs and responses, and the mapping module learns to connect the utterance-level representations. Experimental results from automatic and human evaluations demonstrate that our model is capable of generating responses of high coherence and fluency compared to baseline models. The code is available at

* Accepted by EMNLP 2018 

  Click for Model/Code and Paper
Dynamic Fusion Networks for Machine Reading Comprehension

Feb 26, 2018
Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu

This paper presents a novel neural model - Dynamic Fusion Network (DFN), for machine reading comprehension (MRC). DFNs differ from most state-of-the-art models in their use of a dynamic multi-strategy attention process, in which passages, questions and answer candidates are jointly fused into attention vectors, along with a dynamic multi-step reasoning module for generating answers. With the use of reinforcement learning, for each input sample that consists of a question, a passage and a list of candidate answers, an instance of DFN with a sample-specific network architecture can be dynamically constructed by determining what attention strategy to apply and how many reasoning steps to take. Experiments show that DFNs achieve the best result reported on RACE, a challenging MRC dataset that contains real human reading questions in a wide variety of types. A detailed empirical analysis also demonstrates that DFNs can produce attention vectors that summarize information from questions, passages and answer candidates more effectively than other popular MRC models.

* 13 pages, 5 figures, 5 tables 

  Click for Model/Code and Paper
Deep Stacking Networks for Low-Resource Chinese Word Segmentation with Transfer Learning

Nov 04, 2017
Jingjing Xu, Xu Sun, Sujian Li, Xiaoyan Cai, Bingzhen Wei

In recent years, neural networks have proven to be effective in Chinese word segmentation. However, this promising performance relies on large-scale training data. Neural networks with conventional architectures cannot achieve the desired results in low-resource datasets due to the lack of labelled training data. In this paper, we propose a deep stacking framework to improve the performance on word segmentation tasks with insufficient data by integrating datasets from diverse domains. Our framework consists of two parts, domain-based models and deep stacking networks. The domain-based models are used to learn knowledge from different datasets. The deep stacking networks are designed to integrate domain-based models. To reduce model conflicts, we innovatively add communication paths among models and design various structures of deep stacking networks, including Gaussian-based Stacking Networks, Concatenate-based Stacking Networks, Sequence-based Stacking Networks and Tree-based Stacking Networks. We conduct experiments on six low-resource datasets from various domains. Our proposed framework shows significant performance improvements on all datasets compared with several strong baselines.

  Click for Model/Code and Paper
Coherent Comment Generation for Chinese Articles with a Graph-to-Sequence Model

Jun 04, 2019
Wei Li, Jingjing Xu, Yancheng He, Shengli Yan, Yunfang Wu, Xu sun

Automatic article commenting is helpful in encouraging user engagement and interaction on online news platforms. However, the news documents are usually too long for traditional encoder-decoder based models, which often results in general and irrelevant comments. In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. By organizing the article into graph structure, our model can better understand the internal structure of the article and the connection between topics, which makes it better able to understand the story. We collect and release a large scale news-comment corpus from a popular Chinese online news platform Tencent Kuaibao. Extensive experiment results show that our model can generate much more coherent and informative comments compared with several strong baseline models.

* Accepted by ACL 2019 

  Click for Model/Code and Paper