Models, code, and papers for "Zhuosheng Zhang":

One-shot Learning for Question-Answering in Gaokao History Challenge

Jun 24, 2018
Zhuosheng Zhang, Hai Zhao

Answering questions from university admission exams (Gaokao in Chinese) is a challenging AI task, since it requires effective representations to capture the complicated semantic relations between questions and answers. In this work, we propose a hybrid neural model for a deep question-answering task on history examinations. Our model employs a cooperative gated neural network to retrieve answers with the assistance of extra labels given by a neural Turing machine labeler. Empirical study shows that the labeler works well with only a small training dataset, and that the gated mechanism is good at fetching the semantic representation of lengthy answers. Experiments on question answering demonstrate that the proposed model obtains substantial performance gains over various neural baselines in terms of multiple evaluation metrics.

* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) 
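
The "cooperative gated" fusion is only named, not specified, in the abstract. As a rough illustration, here is a minimal PyTorch sketch of one plausible gating mechanism that mixes the labeler's extra-label signal into an answer representation; the function name, the linear gate, and the hidden size are all hypothetical:

    import torch

    def gated_fusion(answer_vec, label_vec, gate_layer):
        # A learned sigmoid gate decides, per dimension, how much of the
        # labeler's signal to mix into the answer representation.
        gate = torch.sigmoid(gate_layer(torch.cat([answer_vec, label_vec], dim=-1)))
        return gate * answer_vec + (1 - gate) * label_vec

    gate_layer = torch.nn.Linear(2 * 64, 64)   # hypothetical hidden size
    fused = gated_fusion(torch.randn(3, 64), torch.randn(3, 64), gate_layer)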

A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen

Sep 11, 2019
Zhuosheng Zhang, Zhen Meng, Hai Zhao

This paper presents a smart sliding Chinese pinyin Input Method Editor (IME) for touchscreen devices, which lets the user slide a finger from one key to another on the touchscreen instead of tapping keys one by one, while the target Chinese character sequence is predicted during the sliding process to help users input Chinese characters efficiently. Moreover, the layout of our IME's virtual keyboard adapts to user sliding for more efficient input. The layout adaptation is driven by recurrent neural networks (RNNs) and deep reinforcement learning. The pinyin-to-character converter is implemented with a sequence-to-sequence (Seq2Seq) model to predict the target Chinese sequence. A sliding simulator is built to automatically produce sliding samples for model training and virtual keyboard testing. The key advantage of our proposed IME is that nearly all of its built-in tactics can be optimized automatically with deep learning algorithms, based solely on user behavior. Empirical studies verify the effectiveness of the proposed model and show improved user input efficiency.

* There are some insufficient explanations that may confuse readers. We will continue the research, but it will take a lot of time. After discussing with co-authors, we decided to withdraw this version from arXiv rather than replace it. We may re-upload a new version of this work in the future 
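
The abstract names a Seq2Seq pinyin-to-character converter without further detail. Below is a minimal GRU encoder-decoder sketch of such a converter; the class name, toy vocabulary sizes, and single-layer architecture are assumptions, not the paper's actual model:

    import torch

    class Seq2SeqP2C(torch.nn.Module):
        # Toy pinyin-to-character converter: encode pinyin ids with a GRU,
        # decode character ids conditioned on the final encoder state.
        def __init__(self, n_pinyin, n_chars, hidden=128):
            super().__init__()
            self.src_emb = torch.nn.Embedding(n_pinyin, hidden)
            self.tgt_emb = torch.nn.Embedding(n_chars, hidden)
            self.encoder = torch.nn.GRU(hidden, hidden, batch_first=True)
            self.decoder = torch.nn.GRU(hidden, hidden, batch_first=True)
            self.out = torch.nn.Linear(hidden, n_chars)

        def forward(self, pinyin_ids, char_ids):
            _, state = self.encoder(self.src_emb(pinyin_ids))
            dec, _ = self.decoder(self.tgt_emb(char_ids), state)
            return self.out(dec)               # logits over characters

    model = Seq2SeqP2C(n_pinyin=400, n_chars=5000)
    logits = model(torch.randint(0, 400, (2, 6)), torch.randint(0, 5000, (2, 6)))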

Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary

Nov 11, 2018
Yafang Huang, Zhuosheng Zhang, Hai Zhao

Pinyin-to-character (P2C) conversion is the core component of a pinyin-based Chinese input method engine (IME). However, the conversion is seriously compromised by the ambiguity of Chinese characters corresponding to the same pinyin, as well as by predefined fixed vocabularies. To alleviate these inconveniences, we propose a neural P2C conversion model augmented by a large online-updating vocabulary with a target vocabulary sampling mechanism. Our experiments show that the proposed approach reduces decoding time on CPUs by up to 50% on P2C tasks with the same or only a negligible change in conversion accuracy, and that the online-updated vocabulary indeed helps our IME effectively follow user input behavior.

* 8 pages, 6 figures 
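
To make the target vocabulary sampling idea concrete, here is a minimal PyTorch sketch, assuming a neural decoder: at each step only a sampled candidate subset (plus the user's online vocabulary) is scored instead of the full character inventory, which is where the decoding speedup would come from. All sizes and names are hypothetical:

    import torch

    V, H, K = 20000, 256, 2000           # full vocab, hidden size, sample size
    output_proj = torch.nn.Linear(H, V)  # full output projection

    def sampled_logits(hidden, user_vocab_ids):
        # Score only sampled candidates plus the online user vocabulary,
        # slicing the projection instead of computing a full V-way softmax.
        sampled = torch.randint(0, V, (K,))
        candidates = torch.unique(torch.cat([sampled, user_vocab_ids]))
        w = output_proj.weight[candidates]            # (C, H)
        b = output_proj.bias[candidates]              # (C,)
        return hidden @ w.t() + b, candidates

    hidden = torch.randn(1, H)                        # decoder state, one step
    logits, candidates = sampled_logits(hidden, torch.tensor([5, 42, 9981]))
    predicted_char_id = candidates[logits.argmax(dim=-1)]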

Subword-augmented Embedding for Cloze Reading Comprehension

Jun 24, 2018
Zhuosheng Zhang, Yafang Huang, Hai Zhao

Representation learning is the foundation of machine reading comprehension. In state-of-the-art models, deep learning methods broadly use word- and character-level representations. However, the character is not naturally the minimal linguistic unit, and with a simple concatenation of character and word embeddings, previous models actually give a suboptimal solution. In this paper, we propose to use subwords rather than characters for word embedding enhancement. We also empirically explore different augmentation strategies for subword-augmented embeddings to enhance a cloze-style reading comprehension model. In detail, we present a reader that uses subword-level representations to augment word embeddings, with a short list to handle rare words effectively. A thorough examination is conducted to evaluate the comprehensive performance and generalization ability of the proposed reader. Experimental results show that the proposed approach helps the reader significantly outperform state-of-the-art baselines on various public datasets.

* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) 
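
The interaction between the short list and subword composition is only sketched in the abstract. One plausible reading, as a PyTorch sketch with hypothetical sizes, an additive word/subword combination, and the assumption that word ids are ordered by frequency:

    import torch

    class SubwordAugmentedEmbedding(torch.nn.Module):
        # Frequent (short-list) words keep their word embedding augmented by
        # subwords; rare words fall back to the composed subword embedding.
        def __init__(self, n_words, n_subwords, dim, shortlist_size):
            super().__init__()
            self.word_emb = torch.nn.Embedding(n_words, dim)
            self.sub_emb = torch.nn.Embedding(n_subwords, dim)
            self.shortlist_size = shortlist_size   # ids below this are frequent

        def forward(self, word_ids, subword_ids):
            # subword_ids: (batch, seq, n_subs) subword pieces of each word
            sub = self.sub_emb(subword_ids).mean(dim=2)
            word = self.word_emb(word_ids)
            frequent = (word_ids < self.shortlist_size).unsqueeze(-1)
            return torch.where(frequent, word + sub, sub)

    emb = SubwordAugmentedEmbedding(10000, 2000, 64, shortlist_size=5000)
    out = emb(torch.randint(0, 10000, (2, 7)), torch.randint(0, 2000, (2, 7, 4)))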

Attentive Semantic Role Labeling with Boundary Indicator

Sep 08, 2018
Zhuosheng Zhang, Shexia He, Zuchao Li, Hai Zhao

The goal of semantic role labeling (SRL) is to discover the predicate-argument structure of a sentence, which plays a critical role in the deep processing of natural language. This paper introduces simple yet effective auxiliary tags for dependency-based SRL to enhance a syntax-agnostic model with multi-hop self-attention. Our syntax-agnostic model achieves competitive performance with state-of-the-art models on the CoNLL-2009 benchmarks for both English and Chinese.
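
As a rough illustration of the multi-hop self-attention named above, here is a minimal PyTorch sketch in which each hop re-attends over the representations refined by the previous hop; the residual form and hop count are assumptions, not the paper's exact design:

    import torch
    import torch.nn.functional as F

    def multi_hop_self_attention(h, hops=3):
        # Each hop computes scaled dot-product self-attention and adds the
        # attended summary back into the token representations.
        for _ in range(hops):
            scores = h @ h.transpose(-2, -1) / h.size(-1) ** 0.5
            h = h + F.softmax(scores, dim=-1) @ h
        return h

    out = multi_hop_self_attention(torch.randn(2, 10, 64))   # (batch, seq, dim)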


Modeling Named Entity Embedding Distribution into Hypersphere

Sep 03, 2019
Zhuosheng Zhang, Bingjie Tang, Zuchao Li, Hai Zhao

This work models the distribution of named entities (NEs) by visualizing the topological structure of the embedding space, leading to the assumption that most, if not all, NEs in a language tend to aggregate within a specific hypersphere of the embedding space. We thus present a novel open definition of NEs that alleviates the obvious drawback of previous closed definitions based on a limited NE dictionary. We then show two applications of the proposed named entity hypersphere model. First, a generative adversarial network learns a transformation matrix between two embedding spaces, which yields a convenient determination of the NE distribution in the target language and indicates the potential for fast NE discovery using only the isomorphism between embedding spaces. Second, the NE hypersphere model is directly integrated with various named entity recognition models over sentences to achieve state-of-the-art results. Assuming only that embeddings are available, we thereby provide a prior-knowledge-free approach to effectively depicting NE distributions.
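
A minimal numpy sketch of the hypersphere assumption: fit a center and radius from known NE embeddings, then treat any embedding inside the sphere as a candidate NE. The centroid fit and the coverage quantile are illustrative choices, not the paper's exact procedure:

    import numpy as np

    def fit_hypersphere(ne_vectors, coverage=0.95):
        # Center at the centroid; pick a radius covering most known NEs,
        # ignoring the farthest outliers.
        center = ne_vectors.mean(axis=0)
        dists = np.linalg.norm(ne_vectors - center, axis=1)
        return center, np.quantile(dists, coverage)

    def is_named_entity(vec, center, radius):
        return np.linalg.norm(vec - center) <= radius

    rng = np.random.default_rng(0)
    nes = rng.normal(loc=1.0, scale=0.3, size=(500, 50))   # toy NE embeddings
    center, radius = fit_hypersphere(nes)
    print(is_named_entity(rng.normal(1.0, 0.3, size=50), center, radius))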


Effective Character-augmented Word Embedding for Machine Reading Comprehension

Aug 07, 2018
Zhuosheng Zhang, Yafang Huang, Pengfei Zhu, Hai Zhao

Machine reading comprehension is a task that models the relationship between a passage and a query. In terms of deep learning frameworks, most state-of-the-art models simply concatenate word- and character-level representations, which has been shown to be suboptimal for the task at hand. In this paper, we empirically explore different integration strategies for word and character embeddings and propose a character-augmented reader which attends over character-level representations to augment word embeddings, with a short list to improve representations of rare words in particular. Experimental results show that the proposed approach helps the baseline model significantly outperform state-of-the-art baselines on various public benchmarks.

* Accepted by NLPCC 2018. arXiv admin note: text overlap with arXiv:1806.09103 

SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings

May 26, 2018
Zhuosheng Zhang, Jiangtong Li, Hai Zhao, Bingjie Tang

This paper describes a hypernym discovery system for our participation in SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for the task and empirically study various neural network models for building latent-space representations of words and phrases. The evaluated models include the convolutional neural network, long short-term memory network, gated recurrent unit, and recurrent convolutional neural network. We also explore different embedding methods, including word embeddings and sense embeddings, for better performance.

* SemEval-2018, Workshop of NAACL-HLT 2018 
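
A minimal sketch of term-embedding-based hypernym ranking: project the query term and score every vocabulary word by cosine similarity. The projection matrix stands in for whatever the trained neural models learn; all data here is toy:

    import numpy as np

    def rank_hypernyms(term_vec, vocab_vecs, vocab_words, w, top_k=5):
        # Score candidates in the predefined search space by cosine
        # similarity to the projected query term.
        q = term_vec @ w
        q = q / np.linalg.norm(q)
        v = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
        scores = v @ q
        order = np.argsort(-scores)[:top_k]
        return [(vocab_words[i], float(scores[i])) for i in order]

    rng = np.random.default_rng(0)
    vocab_words = ["animal", "dog", "vehicle"]
    vocab_vecs = rng.normal(size=(3, 8))
    w = np.eye(8)                      # placeholder for a learned projection
    print(rank_hypernyms(rng.normal(size=8), vocab_vecs, vocab_words, w, 2))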

DCMN+: Dual Co-Matching Network for Multi-choice Reading Comprehension

Aug 30, 2019
Shuailiang Zhang, Hai Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Multi-choice reading comprehension is a challenging task in which an answer must be selected from a set of candidate options given a passage and a question. Previous approaches usually calculate only a question-aware passage representation and ignore the passage-aware question representation when modeling the relationship between passage and question, which cannot make the best use of the information between the two. In this work, we propose the dual co-matching network (DCMN), which models the relationship among passage, question, and answer options bidirectionally. Besides, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection, which finds the most salient supporting sentences to answer the question, and (ii) answer option interaction, which encodes the comparison information between answer options. DCMN integrated with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, and MCTest.
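
The bidirectional matching idea can be illustrated in a few lines of PyTorch: one attention matrix yields both a question-aware passage representation and a passage-aware question representation, which are pooled and fused. This is a sketch of the direction-symmetric matching only, not the full DCMN+ model:

    import torch
    import torch.nn.functional as F

    def co_match(p, q):
        # p: (batch, Lp, H) passage encoding; q: (batch, Lq, H) question
        # encoding. Normalize one attention matrix along both axes to get
        # matching in both directions, then max-pool and concatenate.
        att = torch.bmm(p, q.transpose(1, 2))                   # (batch, Lp, Lq)
        q_aware_p = torch.bmm(F.softmax(att, dim=2), q)
        p_aware_q = torch.bmm(F.softmax(att, dim=1).transpose(1, 2), p)
        return torch.cat([q_aware_p.max(dim=1).values,
                          p_aware_q.max(dim=1).values], dim=-1)

    print(co_match(torch.randn(2, 30, 64), torch.randn(2, 10, 64)).shape)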


SG-Net: Syntax-Guided Machine Reading Comprehension

Aug 14, 2019
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao

For machine reading comprehension, effectively modeling the linguistic knowledge in detail-riddled and lengthy passages while getting rid of the noise is essential for improving performance. In this work, we propose using syntax to guide the text modeling of both passages and questions by incorporating syntactic clues into the multi-head attention mechanism to fully fuse information from both global and attended representations. Accordingly, we present a novel syntax-guided network (SG-Net) for challenging reading comprehension tasks. Extensive experiments on popular benchmarks including SQuAD 2.0 and RACE validate the effectiveness of the proposed method, with substantial improvements over fine-tuned BERT. This work empirically discloses the effectiveness of syntactic structural information for text modeling. The proposed attention mechanism also verifies the practicability of using linguistic information to guide attention learning and can easily be adapted to other tree-structured annotations.
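
As a toy version of syntax-guided attention, the sketch below masks a self-attention layer so that each token attends only to itself and its dependency head; SG-Net's actual guidance (which syntactic relations are kept, and how guided and global attention are fused) is richer than this:

    import torch
    import torch.nn.functional as F

    def syntax_guided_attention(h, heads):
        # h: (seq, H); heads[i] is the dependency-head index of token i.
        # Disallow attention everywhere except self and syntactic head.
        seq = h.size(0)
        mask = torch.full((seq, seq), float("-inf"))
        idx = torch.arange(seq)
        mask[idx, idx] = 0.0
        mask[idx, heads] = 0.0
        scores = h @ h.t() / h.size(1) ** 0.5 + mask
        return F.softmax(scores, dim=-1) @ h

    h = torch.randn(5, 16)
    heads = torch.tensor([1, 1, 1, 2, 2])      # toy dependency tree
    out = syntax_guided_attention(h, heads)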


Dual Co-Matching Network for Multi-choice Reading Comprehension

Jan 27, 2019
Shuailiang Zhang, Hai Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Multi-choice reading comprehension is a challenging task that requires a complex reasoning procedure: given a passage and a question, a correct answer must be selected from a set of candidate answers. In this paper, we propose the Dual Co-Matching Network (DCMN), which models the relationship among passage, question, and answer bidirectionally. Unlike existing approaches that calculate only a question-aware or option-aware passage representation, we calculate a passage-aware question representation and a passage-aware answer representation at the same time. To demonstrate the effectiveness of our model, we evaluate it on a large-scale multiple-choice machine reading comprehension dataset (i.e., RACE). Experimental results show that our proposed model achieves new state-of-the-art results.

* arXiv admin note: text overlap with arXiv:1806.04068 by other authors 

Modeling Multi-turn Conversation with Deep Utterance Aggregation

Nov 06, 2018
Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao, Gongshen Liu

Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances during context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. The model then matches a response with each refined utterance, and the final matching score is obtained after attentive turn aggregation. Experimental results show our model outperforms state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

* COLING 2018, pages 3740-3752 
* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) 
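
To make the turn-level aggregation concrete, here is a heavily simplified PyTorch sketch: the response is matched against each utterance separately, and the per-turn matching vectors are then combined with attention rather than by flat concatenation. The pooling choices and the attention query are assumptions:

    import torch
    import torch.nn.functional as F

    def aggregate_turns(utterances, response):
        # utterances: (turns, L, H); response: (L, H)
        turn_match = []
        for u in utterances:
            att = F.softmax(u @ response.t(), dim=-1)   # match one turn
            turn_match.append((att @ response).mean(dim=0))
        m = torch.stack(turn_match)                     # (turns, H)
        weights = F.softmax(m @ m.mean(dim=0), dim=0)   # attentive aggregation
        return weights @ m                              # final matching vector

    out = aggregate_turns(torch.randn(4, 12, 32), torch.randn(12, 32))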

Effective Subword Segmentation for Text Comprehension

Nov 06, 2018
Zhuosheng Zhang, Hai Zhao, Kangwei Ling, Jiangtong Li, Zuchao Li, Shexia He

Character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, the character itself is not a natural minimal linguistic unit for representation or word embedding composition, since this ignores the linguistic coherence of consecutive characters within a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmentation strategies for text understanding, showing that subword-augmented embeddings significantly improve our baselines on multiple text understanding tasks in both English and Chinese.
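
Byte-pair encoding is one scheme in the family of unsupervised segmentation methods; as a self-contained illustration (not necessarily one of the segmenters surveyed in the paper), here is a toy BPE merge learner:

    from collections import Counter

    def learn_bpe(words, n_merges):
        # Repeatedly merge the most frequent adjacent symbol pair, growing
        # data-driven subwords out of characters.
        vocab = Counter(tuple(w) for w in words)
        merges = []
        for _ in range(n_merges):
            pairs = Counter()
            for sym, freq in vocab.items():
                for a, b in zip(sym, sym[1:]):
                    pairs[(a, b)] += freq
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            new_vocab = Counter()
            for sym, freq in vocab.items():
                out, i = [], 0
                while i < len(sym):
                    if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                        out.append(sym[i] + sym[i + 1]); i += 2
                    else:
                        out.append(sym[i]); i += 1
                new_vocab[tuple(out)] += freq
            vocab = new_vocab
        return merges

    print(learn_bpe(["lower", "lowest", "newer", "wider"], 3))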


Lingke: A Fine-grained Multi-turn Chatbot for Customer Service

Aug 10, 2018
Pengfei Zhu, Zhuosheng Zhang, Jiangtong Li, Yafang Huang, Hai Zhao

Traditional chatbots usually need a mass of human dialogue data, especially when using supervised machine learning methods. Though they can easily handle single-turn question answering, their performance on multi-turn conversations is usually unsatisfactory. In this paper, we present Lingke, an information-retrieval-augmented chatbot which is able to answer questions based on a given product introduction document and to handle multi-turn conversations. We introduce a fine-grained pipeline that distills responses from unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

* Accepted by COLING 2018 demonstration paper 
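
The retrieval stage of such a pipeline can be sketched with plain TF-IDF scoring; this is a generic stand-in, not Lingke's actual distillation logic:

    import math
    from collections import Counter

    def retrieve(question, sentences):
        # Rank document sentences by TF-IDF overlap with the question and
        # return the best one for downstream response distillation.
        docs = [s.lower().split() for s in sentences]
        df = Counter(w for d in docs for w in set(d))
        n = len(docs)
        def score(d):
            tf = Counter(d)
            return sum(tf[w] * (math.log((1 + n) / (1 + df[w])) + 1)
                       for w in question.lower().split() if w in tf)
        return max(sentences, key=lambda s: score(s.lower().split()))

    sentences = ["The battery lasts ten hours.", "The device weighs 200 grams."]
    print(retrieve("how long does the battery last", sentences))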

Dependency or Span, End-to-End Uniform Semantic Role Labeling

Jan 16, 2019
Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most such work focuses on either the span-based or the dependency-based semantic representation form and shows model optimizations specific to each; handling the two SRL tasks uniformly has been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation to deal with the two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, including the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.


Semantics-aware BERT for Language Understanding

Sep 05, 2019
Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou

The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT, and BERT, exploit only plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor with light fine-tuning and no substantial task-specific modifications. Compared with BERT, SemBERT is as simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves on existing results across ten reading comprehension and language inference tasks.
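
One way to picture "absorbing contextual semantics over a BERT backbone" is a fusion layer that embeds each token's semantic role label and combines it with the encoder output. The sketch below is an assumed simplification; SemBERT's actual fusion, which must also handle subword alignment and multiple predicate-role sequences, is more involved:

    import torch

    class SemanticFusion(torch.nn.Module):
        # Embed SRL tag ids and fuse them with contextual token vectors
        # before the task-specific head.
        def __init__(self, hidden, n_labels, label_dim):
            super().__init__()
            self.label_emb = torch.nn.Embedding(n_labels, label_dim)
            self.fuse = torch.nn.Linear(hidden + label_dim, hidden)

        def forward(self, token_vecs, srl_labels):
            # token_vecs: (batch, seq, hidden) from a pre-trained encoder
            # srl_labels: (batch, seq) label ids from a pre-trained SRL tagger
            lab = self.label_emb(srl_labels)
            return self.fuse(torch.cat([token_vecs, lab], dim=-1))

    fusion = SemanticFusion(hidden=768, n_labels=20, label_dim=16)
    out = fusion(torch.randn(2, 8, 768), torch.randint(0, 20, (2, 8)))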


I Know What You Want: Semantic Learning for Text Comprehension

Sep 08, 2018
Zhuosheng Zhang, Yuwei Wu, Zuchao Li, Shexia He, Hai Zhao, Xi Zhou, Xiang Zhou

Who did what to whom is a major focus in natural language understanding, and it is precisely the aim of semantic role labeling (SRL). Although SRL is naturally essential to text comprehension tasks, it has been surprisingly ignored in previous work. This paper thus makes the first attempt to let SRL enhance text comprehension and inference by specifying verbal arguments and their corresponding semantic roles. In terms of deep learning models, our embeddings are enhanced with semantic role labels for more fine-grained semantics. We show that the salient labels can be conveniently added to existing models and significantly improve deep learning models on challenging text comprehension tasks. Extensive experiments on benchmark machine reading comprehension and inference datasets verify that the proposed semantic learning helps our system reach a new state of the art.


Judging Chemical Reaction Practicality From Positive Sample only Learning

Apr 22, 2019
Shu Jiang, Zhuosheng Zhang, Hai Zhao, Jiangtong Li, Yang Yang, Bao-Liang Lu, Ning Xia

Chemical reaction practicality is the core task in all symbol-intelligence-based chemical information processing; for example, it provides an indispensable clue for further automatic synthesis route inference. Considering that chemical reactions have been represented in a language form, we propose a new solution for generally judging the practicality of organic reactions without complex quantum-physical modeling or chemistry knowledge. When tackling practicality judgment as a machine learning task from positive and negative (chemical reaction) samples, all existing studies have to carefully handle the serious insufficiency of negative samples. We propose an auto-construction method that effectively solves this widespread, long-standing difficulty. Experimental results show our model can effectively predict the practicality of chemical reactions, achieving a high accuracy of 99.76% on real large-scale chemical lab reaction practicality judgment.
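
The abstract does not spell out the auto-construction rules. Purely as an illustration of the shape of the idea, the sketch below builds presumed-impractical negatives by pairing each reaction's reactants with a product drawn from a different reaction; the actual construction method in the paper may differ substantially:

    import random

    def auto_negatives(reactions, seed=0):
        # reactions: list of (reactants, product) strings. Mismatched
        # reactant/product pairs serve as synthetic negative samples.
        rng = random.Random(seed)
        products = [p for _, p in reactions]
        negatives = []
        for reactants, product in reactions:
            wrong = rng.choice([p for p in products if p != product])
            negatives.append((reactants, wrong))
        return negatives

    pos = [("CCO + [O]", "CC=O"), ("C=C + HBr", "CCBr")]
    print(auto_negatives(pos))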

