Models, code, and papers for "Zhuosheng Zhang":

One-shot Learning for Question-Answering in Gaokao History Challenge

Jun 24, 2018
Zhuosheng Zhang, Hai Zhao

Answering questions from university admission exams (Gaokao in Chinese) is a challenging AI task since it requires effective representation to capture complicated semantic relations between questions and answers. In this work, we propose a hybrid neural model for a deep question-answering task on history examinations. Our model employs a cooperative gated neural network to retrieve answers with the assistance of extra labels given by a neural Turing machine labeler. Empirical study shows that the labeler works well with only a small training dataset and that the gated mechanism is good at fetching the semantic representation of lengthy answers. Experiments on question answering demonstrate that the proposed model obtains substantial performance gains over various neural model baselines in terms of multiple evaluation metrics.

* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) 

Retrospective Reader for Machine Reading Comprehension

Jan 27, 2020
Zhuosheng Zhang, Junjie Yang, Hai Zhao

Machine reading comprehension (MRC) is an AI challenge that requires machines to determine the correct answers to questions based on a given passage. MRC systems must not only answer questions when necessary but also recognize when no answer is available in the given passage and then tactfully abstain from answering. When unanswerable questions are involved in the MRC task, an essential verification module called a verifier is required in addition to the encoder, though the latest practice in MRC modeling still benefits most from adopting well pre-trained language models as the encoder block, focusing only on the "reading". This paper explores better verifier design for the MRC task with unanswerable questions. Inspired by how humans solve reading comprehension questions, we propose a retrospective reader (Retro-Reader) that integrates two stages of reading and verification strategies: 1) sketchy reading, which briefly investigates the overall interactions of passage and question and yields an initial judgment; 2) intensive reading, which verifies the answer and gives the final prediction. The proposed reader is evaluated on two benchmark MRC challenge datasets, SQuAD2.0 and NewsQA, achieving new state-of-the-art results. Significance tests show that our model is significantly better than the strong ALBERT baseline. A series of analyses is also conducted to interpret the effectiveness of the proposed reader.
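
The two-stage decision described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the score names, the equal weights, and the zero threshold below are all assumptions chosen for illustration. A no-answer score from sketchy reading is combined with the gap between the no-answer score and the best-span score from intensive reading, and the reader abstains when the combined evidence exceeds the threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReaderScores:
    # Hypothetical inputs; the field names are assumptions for illustration.
    sketchy_no_answer: float    # sketchy reading: evidence that no answer exists
    intensive_no_answer: float  # intensive reading: score of the "no answer" option
    intensive_best_span: float  # intensive reading: score of the best answer span
    best_span_text: str         # text of the best answer span


def final_prediction(
    scores: ReaderScores,
    weight_sketchy: float = 0.5,
    weight_intensive: float = 0.5,
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the answer span, or None to abstain from answering."""
    # How strongly the intensive reader prefers "no answer" over its best span.
    intensive_gap = scores.intensive_no_answer - scores.intensive_best_span
    combined = (
        weight_sketchy * scores.sketchy_no_answer
        + weight_intensive * intensive_gap
    )
    # Abstain when the combined no-answer evidence exceeds the threshold.
    return None if combined > threshold else scores.best_span_text
```

In a setup like this, the weights and the threshold would typically be tuned on a development set rather than fixed in advance.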

LIMIT-BERT : Linguistic Informed Multi-Task BERT

Oct 31, 2019
Junru Zhou, Zhuosheng Zhang, Hai Zhao

In this paper, we present a Linguistic Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistic tasks by Multi-Task Learning (MTL). LIMIT-BERT includes five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tagging, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). In addition, LIMIT-BERT adopts a linguistic masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic/semantic phrase. Different from the recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), our LIMIT-BERT is linguistically motivated and trained in a semi-supervised manner that provides linguistic-task data on the same scale as the BERT learning corpus. As a result, LIMIT-BERT not only improves performance on linguistic tasks but also benefits from a regularization effect and from linguistic information that leads to more general representations, which helps it adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on Propbank benchmarks and on both dependency and constituent syntactic parsing on Penn Treebank.
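
The phrase masking idea in this abstract, masking every token of a selected phrase rather than isolated tokens, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the `[MASK]` string, the uniform random choice of phrases, and the hand-written spans are all hypothetical, and in practice the spans would come from a parser or an SRL system.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # placeholder mask symbol, assumed for illustration


def mask_phrases(
    tokens: Sequence[str],
    phrase_spans: Sequence[Tuple[int, int]],
    num_phrases: int,
    seed: int = 0,
) -> List[str]:
    """Mask every token of `num_phrases` randomly chosen phrases.

    `phrase_spans` holds (start, end) token indices with `end` exclusive.
    """
    rng = random.Random(seed)
    count = min(num_phrases, len(phrase_spans))
    chosen = rng.sample(list(phrase_spans), k=count)
    masked = list(tokens)
    for start, end in chosen:
        # Mask the whole phrase, never just part of it.
        for i in range(start, end):
            masked[i] = MASK
    return masked


tokens = "the quick fox jumped over the lazy dog".split()
spans = [(0, 3), (4, 8)]  # "the quick fox", "over the lazy dog"
print(mask_phrases(tokens, spans, num_phrases=1))
```

Because whole spans are replaced, the model cannot recover a masked token from the other words of the same phrase, which is the motivation usually given for span-level masking.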

A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen

Sep 11, 2019
Zhuosheng Zhang, Zhen Meng, Hai Zhao

This paper presents a smart sliding Chinese pinyin Input Method Editor (IME) for touchscreen devices, which allows the user to slide a finger from one key to another on the touchscreen instead of tapping keys one by one, while the target Chinese character sequence is predicted during the sliding process to help the user input Chinese characters efficiently. Moreover, the layout of the virtual keyboard of our IME adapts to user sliding for more efficient input. The layout adaptation process is carried out with Recurrent Neural Networks (RNN) and deep reinforcement learning. The pinyin-to-character converter is implemented with a sequence-to-sequence (Seq2Seq) model to predict the target Chinese sequence. A sliding simulator is built to automatically produce sliding samples for model training and virtual keyboard testing. The key advantage of our proposed IME is that nearly all its built-in tactics can be optimized automatically with deep learning algorithms by only following user behavior. Empirical studies verify the effectiveness of the proposed model and show better user input efficiency.

* There are some insufficient explanations that may confuse readers. We will continue the research, but it will take a lot of time. After discussing with co-authors, we have decided to withdraw this version from arXiv rather than replace it. We may upload a new version of this work in the future.

Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary

Nov 11, 2018
Yafang Huang, Zhuosheng Zhang, Hai Zhao

Pinyin-to-character (P2C) conversion is the core component of a pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguities of Chinese characters corresponding to pinyin as well as by predefined fixed vocabularies. To alleviate such inconveniences, we propose a neural P2C conversion model augmented by a large online updating vocabulary with a target vocabulary sampling mechanism. Our experiments show that the proposed approach reduces the decoding time on CPUs by up to 50% on P2C tasks with the same or only negligibly changed conversion accuracy, and that the online updated vocabulary indeed helps our IME effectively follow user inputting behavior.

* 8 pages, 6 figures 

Subword-augmented Embedding for Cloze Reading Comprehension

Jun 24, 2018
Zhuosheng Zhang, Yafang Huang, Hai Zhao

Representation learning is the foundation of machine reading comprehension. In state-of-the-art models, deep learning methods broadly use word and character level representations. However, the character is not naturally the minimal linguistic unit. In addition, with a simple concatenation of character and word embeddings, previous models actually give a suboptimal solution. In this paper, we propose to use subwords rather than characters for word embedding enhancement. We also empirically explore different augmentation strategies on subword-augmented embedding to enhance the cloze-style reading comprehension reader. In detail, we present a reader that uses subword-level representation to augment word embedding with a short list to handle rare words effectively. A thorough examination is conducted to evaluate the comprehensive performance and generalization ability of the proposed reader. Experimental results show that the proposed approach helps the reader significantly outperform the state-of-the-art baselines on various public datasets.

* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) 

Attentive Semantic Role Labeling with Boundary Indicator

Sep 08, 2018
Zhuosheng Zhang, Shexia He, Zuchao Li, Hai Zhao

The goal of semantic role labeling (SRL) is to discover the predicate-argument structure of a sentence, which plays a critical role in deep processing of natural language. This paper introduces simple yet effective auxiliary tags for dependency-based SRL to enhance a syntax-agnostic model with multi-hop self-attention. Our syntax-agnostic model achieves competitive performance with state-of-the-art models on the CoNLL-2009 benchmarks both for English and Chinese.

Modeling Named Entity Embedding Distribution into Hypersphere

Sep 03, 2019
Zhuosheng Zhang, Bingjie Tang, Zuchao Li, Hai Zhao

This work models named entity distribution by visualizing the topological structure of the embedding space, which leads us to assume that most, if not all, named entities (NEs) for a language tend to aggregate together and can be accommodated by a specific hypersphere in the embedding space. Thus we present a novel open definition for NE which alleviates the obvious drawback of the previous closed NE definition with a limited NE dictionary. Then, we show two applications of the proposed named entity hypersphere model. First, we use a generative adversarial neural network to learn a transformation matrix between two embedding spaces, which results in a convenient determination of named entity distribution in the target language, indicating the potential of fast named entity discovery using only the isomorphic relation between embedding spaces. Second, the named entity hypersphere model is directly integrated with various named entity recognition models over sentences to achieve state-of-the-art results. Assuming only that embeddings are available, we show a prior-knowledge-free approach to effective named entity distribution depiction.

Effective Character-augmented Word Embedding for Machine Reading Comprehension

Aug 07, 2018
Zhuosheng Zhang, Yafang Huang, Pengfei Zhu, Hai Zhao

Machine reading comprehension is a task to model the relationship between passage and query. In terms of deep learning frameworks, most state-of-the-art models simply concatenate word and character level representations, which has been shown to be suboptimal for the concerned task. In this paper, we empirically explore different integration strategies of word and character embeddings and propose a character-augmented reader which attends to character-level representation to augment word embedding with a short list to improve word representations, especially for rare words. Experimental results show that the proposed approach helps the baseline model significantly outperform state-of-the-art baselines on various public benchmarks.

* Accepted by NLPCC 2018. arXiv admin note: text overlap with arXiv:1806.09103 

SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings

May 26, 2018
Zhuosheng Zhang, Jiangtong Li, Hai Zhao, Bingjie Tang

This paper describes a hypernym discovery system for our participation in SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for the concerned task and empirically study various neural network models to build the representations in latent space for words and phrases. The evaluated models include a convolutional neural network, a long short-term memory network, a gated recurrent unit and a recurrent convolutional neural network. We also explore different embedding methods, including word embedding and sense embedding, for better performance.

* SemEval-2018, Workshop of NAACL-HLT 2018 

DCMN+: Dual Co-Matching Network for Multi-choice Reading Comprehension

Aug 30, 2019
Shuailiang Zhang, Hai Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Multi-choice reading comprehension is a challenging task of selecting an answer from a set of candidate options given a passage and a question. Previous approaches usually only calculate a question-aware passage representation and ignore the passage-aware question representation when modeling the relationship between passage and question, which cannot make the best use of the information shared between passage and question. In this work, we propose a dual co-matching network (DCMN) which models the relationship among passage, question and answer options bidirectionally. In addition, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection, which finds the most salient supporting sentences to answer the question, and (ii) answer option interaction, which encodes the comparison information between answer options. DCMN integrated with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, and MCTest.
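
The bidirectional matching idea in this abstract can be illustrated with plain dot-product attention. This is a simplified sketch, not the DCMN architecture: the function names, the toy vectors, and the use of unparameterized dot-product attention are assumptions made for illustration. It computes both a question-aware passage representation and a passage-aware question representation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix) -> Matrix:
    """For each query vector, return a softmax-weighted average of the keys."""
    dim = len(keys[0])
    out = []
    for q in queries:
        # Dot-product similarity between this query and every key.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out


def co_match(passage: Matrix, question: Matrix) -> Tuple[Matrix, Matrix]:
    question_aware_passage = attend(passage, question)  # one vector per passage token
    passage_aware_question = attend(question, passage)  # one vector per question token
    return question_aware_passage, passage_aware_question


passage = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token vectors
question = [[1.0, 0.0], [0.5, 0.5]]
p_rep, q_rep = co_match(passage, question)
print(len(p_rep), len(q_rep))
```

Computing only `question_aware_passage` corresponds to the one-directional setting the abstract criticizes; the second call to `attend` adds the reverse direction.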

SG-Net: Syntax-Guided Machine Reading Comprehension

Aug 14, 2019
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao

For machine reading comprehension, effectively modeling the linguistic knowledge in detail-riddled and lengthy passages and getting rid of noise is essential to improving performance. In this work, we propose using syntax to guide the text modeling of both passages and questions by incorporating syntactic clues into the multi-head attention mechanism to fully fuse information from both global and attended representations. Accordingly, we present a novel syntax-guided network (SG-Net) for challenging reading comprehension tasks. Extensive experiments on popular benchmarks including SQuAD 2.0 and RACE validate the effectiveness of the proposed method with substantial improvements over fine-tuned BERT. This work empirically discloses the effectiveness of syntactic structural information for text modeling. The proposed attention mechanism also verifies the practicability of using linguistic information to guide attention learning and can be easily adapted to other tree-structured annotations.
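
One way to feed syntactic clues into attention is to build a mask from a dependency tree and apply it to the attention scores. The sketch below builds such a mask under an assumed rule, that each token may attend only to itself and to its ancestors in the tree; the abstract does not state the rule, and the function name and the example parse are hypothetical.

```python
from typing import List


def ancestor_mask(heads: List[int]) -> List[List[int]]:
    """Build an n x n 0/1 mask from dependency heads.

    heads[i] is the index of token i's head, or -1 for the root.
    mask[i][j] == 1 means token i may attend to token j.
    """
    n = len(heads)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        mask[i][i] = 1  # a token always attends to itself
        j = heads[i]
        steps = 0
        # Walk up to the root; the step cap guards against malformed cycles.
        while j != -1 and steps < n:
            mask[i][j] = 1
            j = heads[j]
            steps += 1
    return mask


# "She reads long books": She -> reads, reads = root, long -> books, books -> reads
heads = [1, -1, 3, 1]
for row in ancestor_mask(heads):
    print(row)
```

In an attention layer, positions where the mask is 0 would typically have their scores set to a large negative value before the softmax, so the masked head attends only along the syntactic path.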

Dual Co-Matching Network for Multi-choice Reading Comprehension

Jan 27, 2019
Shuailiang Zhang, Hai Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Multi-choice reading comprehension is a challenging task that requires a complex reasoning procedure. Given a passage and a question, a correct answer needs to be selected from a set of candidate answers. In this paper, we propose the Dual Co-Matching Network (DCMN), which models the relationship among passage, question and answer bidirectionally. Different from existing approaches which only calculate a question-aware or option-aware passage representation, we calculate the passage-aware question representation and the passage-aware answer representation at the same time. To demonstrate the effectiveness of our model, we evaluate it on a large-scale multiple choice machine reading comprehension dataset (i.e., RACE). Experimental results show that our proposed model achieves new state-of-the-art results.

* arXiv admin note: text overlap with arXiv:1806.04068 by other authors 

Modeling Multi-turn Conversation with Deep Utterance Aggregation

Nov 06, 2018
Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao, Gongshen Liu

Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances for context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance, and the final matching score is obtained after attentive turns aggregation. Experimental results show that our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

* COLING 2018, pages 3740-3752 
* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) 

Effective Subword Segmentation for Text Comprehension

Nov 06, 2018
Zhuosheng Zhang, Hai Zhao, Kangwei Ling, Jiangtong Li, Zuchao Li, Shexia He

Character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, the character itself is not a natural minimal linguistic unit for representation or word embedding composition, because it ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmented strategies for text understanding, showing that subword-augmented embedding significantly improves our baselines in multiple text understanding tasks in both English and Chinese.
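
As an illustration of unsupervised subword segmentation, the sketch below implements a minimal version of byte pair encoding (BPE), a widely used method of this kind. The abstract does not name the methods it surveys, so the choice of BPE, the function names, and the toy word counts are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def apply_merge(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace each adjacent occurrence of `pair` with the merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules from a word-frequency table."""
    # Each word starts as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties are broken alphabetically for determinism.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged = apply_merge(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def segment(word: str, merges: List[Pair]) -> List[str]:
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy counts
merges = learn_bpe(word_freqs, num_merges=3)
print(segment("lowest", merges))
```

With these toy counts the three learned merges are `e+s`, `es+t`, and `l+o`, so the unseen word "lowest" is segmented into `lo`, `w`, `est`. The resulting subwords could then be embedded and combined with word embeddings, which is the kind of augmentation the abstract describes.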

Lingke: A Fine-grained Multi-turn Chatbot for Customer Service

Aug 10, 2018
Pengfei Zhu, Zhuosheng Zhang, Jiangtong Li, Yafang Huang, Hai Zhao

Traditional chatbots usually need a large amount of human dialogue data, especially when using supervised machine learning methods. Though they can easily deal with single-turn question answering, their performance on multi-turn conversation is usually unsatisfactory. In this paper, we present Lingke, an information retrieval augmented chatbot which is able to answer questions based on a given product introduction document and deal with multi-turn conversations. We introduce a fine-grained processing pipeline to distill responses from unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

* Accepted by COLING 2018 demonstration paper 

Dependency or Span, End-to-End Uniform Semantic Role Labeling

Jan 16, 2019
Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most such models focus on either the span-based or the dependency-based semantic representation form and only show optimization specific to that form. Meanwhile, handling these two SRL tasks uniformly has been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation to deal with two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, including the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.

Probing Contextualized Sentence Representations with Visual Awareness

Nov 07, 2019
Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao

We present a universal framework to model contextualized sentence representations with visual awareness, motivated by the need to overcome the shortcomings of multimodal parallel data with manual annotations. For each sentence, we first retrieve a diversity of images from a shared cross-modal embedding space, which is pre-trained on large-scale text-image pairs. Then, the texts and images are encoded by a Transformer encoder and a convolutional neural network, respectively. The two sequences of representations are further fused by a simple and effective attention layer. The architecture can be easily applied to text-only natural language processing tasks without manually annotating multimodal parallel corpora. We apply the proposed method to three tasks, including neural machine translation, natural language inference and sequence labeling, and experimental results verify its effectiveness.

Semantics-aware BERT for Language Understanding

Sep 05, 2019
Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou

The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, the existing language representation models including ELMo, GPT and BERT only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor in a light fine-tuning way without substantial task-specific modifications. Compared with BERT, semantics-aware BERT is as simple in concept but more powerful. It obtains new state-of-the-art or substantially improved results on ten reading comprehension and language inference tasks.
