Models, code, and papers for "Zhuosheng Zhang":

##### One-shot Learning for Question-Answering in Gaokao History Challenge

Jun 24, 2018
Zhuosheng Zhang, Hai Zhao

Answering questions from university admission exams (Gaokao in Chinese) is a challenging AI task, since it requires effective representations to capture complicated semantic relations between questions and answers. In this work, we propose a hybrid neural model for the deep question-answering task on history examinations. Our model employs a cooperative gated neural network to retrieve answers with the assistance of extra labels given by a neural Turing machine labeler. An empirical study shows that the labeler works well with only a small training dataset and that the gated mechanism is good at capturing the semantic representation of lengthy answers. Experiments on question answering demonstrate that the proposed model obtains substantial performance gains over various neural model baselines in terms of multiple evaluation metrics.

* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)

Jan 27, 2020
Zhuosheng Zhang, Junjie Yang, Hai Zhao

##### LIMIT-BERT : Linguistic Informed Multi-Task BERT

Oct 31, 2019
Junru Zhou, Zhuosheng Zhang, Hai Zhao

##### A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen

Sep 11, 2019
Zhuosheng Zhang, Zhen Meng, Hai Zhao

This paper presents a smart sliding Chinese pinyin Input Method Editor (IME) for touchscreen devices that allows users to slide a finger from one key to another on the touchscreen instead of tapping keys one by one, while the target Chinese character sequence is predicted during the sliding process to help users input Chinese characters efficiently. Moreover, the layout of the virtual keyboard adapts to user sliding for more efficient input. The layout adaptation is implemented with Recurrent Neural Networks (RNN) and deep reinforcement learning. The pinyin-to-character converter is implemented with a sequence-to-sequence (Seq2Seq) model to predict the target Chinese sequence. A sliding simulator is built to automatically produce sliding samples for model training and virtual keyboard testing. The key advantage of the proposed IME is that nearly all of its built-in tactics can be optimized automatically with deep learning algorithms by following user behavior alone. Empirical studies verify the effectiveness of the proposed model and show improved user input efficiency.

* Some explanations are insufficient and may confuse readers. We will continue the research, but it will take a lot of time. After discussing with co-authors, we decided to withdraw this version from arXiv instead of replacing it. We may re-upload a new version of this work in the future.
##### Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary

Nov 11, 2018
Yafang Huang, Zhuosheng Zhang, Hai Zhao

Pinyin-to-character (P2C) conversion is the core component of pinyin-based Chinese input method engines (IMEs). However, the conversion is seriously compromised by the ambiguity of mapping pinyin to Chinese characters as well as by predefined fixed vocabularies. To alleviate these problems, we propose a neural P2C conversion model augmented by a large, online-updated vocabulary with a target vocabulary sampling mechanism. Our experiments show that the proposed approach reduces decoding time on CPUs by up to 50% on P2C tasks with the same or only negligibly different conversion accuracy, and that the online-updated vocabulary indeed helps our IME effectively follow user input behavior.

* 8 pages, 6 figures
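
The abstract credits the CPU speed-up to a target vocabulary sampling mechanism. As a minimal sketch of why restricting the output vocabulary helps, assume a hypothetical lexicon `PINYIN_LEXICON` listing the candidate characters for each pinyin syllable and a hypothetical `output_embeddings` table of output-layer rows: scoring only the candidates replaces a softmax over the full vocabulary. This illustrates the general idea under those assumptions; it is not the authors' sampling procedure.

```python
import math

# Hypothetical lexicon: pinyin syllable -> candidate characters.
# It stands in for the (online-updatable) vocabulary described above.
PINYIN_LEXICON = {
    "ma": ["妈", "马", "吗", "骂"],
    "shi": ["是", "十", "时", "事"],
}


def candidate_logits(hidden, output_embeddings, candidates):
    """Score only the candidate characters.

    Only the output-embedding rows of `candidates` are read, so the cost is
    O(|candidates| * d) rather than O(|V| * d) for a full vocabulary V.
    """
    return {
        c: sum(h * w for h, w in zip(hidden, output_embeddings[c]))
        for c in candidates
    }


def restricted_softmax(logits):
    """Softmax over the candidate subset; subtracting the max keeps exp() stable."""
    top = max(logits.values())
    exps = {c: math.exp(s - top) for c, s in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def decode_syllable(hidden, output_embeddings, syllable):
    """Return the most probable character for one pinyin syllable."""
    candidates = PINYIN_LEXICON[syllable]
    probs = restricted_softmax(candidate_logits(hidden, output_embeddings, candidates))
    return max(probs, key=probs.get)


# Toy 2-d decoder state and output rows (hypothetical values).
emb = {"妈": [0.1, 0.0], "马": [2.0, 0.0], "吗": [0.5, 0.0], "骂": [0.0, 1.0]}
best = decode_syllable([1.0, 0.0], emb, "ma")  # -> "马"
```

The saving comes entirely from never touching output rows outside the candidate set; a larger or user-extended lexicon only changes which rows are read.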
##### Subword-augmented Embedding for Cloze Reading Comprehension

Jun 24, 2018
Zhuosheng Zhang, Yafang Huang, Hai Zhao

Representation learning is the foundation of machine reading comprehension. In state-of-the-art models, deep learning methods broadly use word- and character-level representations. However, the character is not naturally the minimal linguistic unit. In addition, by simply concatenating character and word embeddings, previous models actually give suboptimal solutions. In this paper, we propose to use subwords rather than characters for word embedding enhancement. We also empirically explore different augmentation strategies on subword-augmented embedding to enhance the cloze-style reading comprehension model (the reader). In detail, we present a reader that uses subword-level representations to augment word embeddings with a short list to handle rare words effectively. A thorough examination is conducted to evaluate the overall performance and generalization ability of the proposed reader. Experimental results show that the proposed approach helps the reader significantly outperform state-of-the-art baselines on various public datasets.

* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)
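
One plausible reading of "augment word embedding with a short list" is sketched below, assuming that words in a frequency short list keep their own embeddings while all other words are represented by the mean of their subword embeddings. The greedy longest-match `segment` helper, the averaging, and all names are hypothetical stand-ins; the paper compares several augmentation strategies, and its actual composition may differ.

```python
def segment(word, inventory):
    """Greedy longest-match segmentation against a subword inventory.

    Characters not covered by any inventory entry become single-character pieces.
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in inventory:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no inventory entry starts at position i
            pieces.append(word[i])
            i += 1
    return pieces


def embed(word, word_emb, shortlist, subword_emb, dim):
    """Short-list words keep their word embedding; rare words are composed
    from the mean of their known subword embeddings (zero vector if none)."""
    if word in shortlist:
        return word_emb[word]
    known = [p for p in segment(word, subword_emb) if p in subword_emb]
    if not known:
        return [0.0] * dim
    return [sum(subword_emb[p][k] for p in known) / len(known) for k in range(dim)]


word_emb = {"the": [0.3, 0.7]}
subword_emb = {"un": [1.0, 0.0], "break": [0.0, 1.0], "able": [0.5, 0.5]}
shortlist = {"the"}
print(segment("unbreakable", subword_emb))                         # ['un', 'break', 'able']
print(embed("unbreakable", word_emb, shortlist, subword_emb, 2))  # [0.5, 0.5]
```

Averaging is the simplest composition that keeps dimensions aligned with the word embeddings; a learned combination would replace the mean in a trained reader.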
##### Attentive Semantic Role Labeling with Boundary Indicator

Sep 08, 2018
Zhuosheng Zhang, Shexia He, Zuchao Li, Hai Zhao

The goal of semantic role labeling (SRL) is to discover the predicate-argument structure of a sentence, which plays a critical role in deep processing of natural language. This paper introduces simple yet effective auxiliary tags for dependency-based SRL to enhance a syntax-agnostic model with multi-hop self-attention. Our syntax-agnostic model achieves performance competitive with state-of-the-art models on the CoNLL-2009 benchmarks for both English and Chinese.

##### Modeling Named Entity Embedding Distribution into Hypersphere

Sep 03, 2019
Zhuosheng Zhang, Bingjie Tang, Zuchao Li, Hai Zhao

This work models the distribution of named entities by visualizing the topological structure of the embedding space, which leads us to assume that most, if not all, named entities (NEs) of a language tend to aggregate together and can be accommodated by a specific hypersphere in the embedding space. We thus present a novel open definition of NEs that alleviates the obvious drawback of previous closed NE definitions based on a limited NE dictionary. We then show two applications of the proposed named entity hypersphere model. First, a generative adversarial network is used to learn a transformation matrix between two embedding spaces, which makes it convenient to determine the named entity distribution in the target language and indicates the potential of fast named entity discovery using only the isomorphic relation between embedding spaces. Second, the named entity hypersphere model is directly integrated with various named entity recognition models over sentences to achieve state-of-the-art results. Assuming only that embeddings are available, we show a prior-knowledge-free approach to effectively depicting named entity distribution.
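
The hypersphere assumption can be illustrated with a minimal sketch: fit a center and radius to a handful of known NE embeddings, then treat any embedding that falls inside the sphere as a candidate NE. Using the centroid as the center and the farthest seed as the radius is an assumption made here for simplicity; the paper's fitting procedure and its GAN-based cross-lingual mapping are not reproduced.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_hypersphere(seed_vectors):
    """Center = mean of the seed NE embeddings; radius = distance to the farthest seed."""
    center = centroid(seed_vectors)
    radius = max(distance(v, center) for v in seed_vectors)
    return center, radius


def is_named_entity(vector, center, radius):
    """Open NE definition: anything inside the hypersphere counts as an NE."""
    return distance(vector, center) <= radius


seeds = [[1.0, 1.0], [1.0, 3.0], [3.0, 1.0], [3.0, 3.0]]  # toy NE embeddings
center, radius = fit_hypersphere(seeds)
print(center)                                       # [2.0, 2.0]
print(is_named_entity([2.0, 2.5], center, radius))  # True
print(is_named_entity([5.0, 5.0], center, radius))  # False
```

The membership test needs no NE dictionary at lookup time, which is the practical sense in which the definition is "open".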

##### Effective Character-augmented Word Embedding for Machine Reading Comprehension

Aug 07, 2018
Zhuosheng Zhang, Yafang Huang, Pengfei Zhu, Hai Zhao

Machine reading comprehension is a task that models the relationship between a passage and a query. Within deep learning frameworks, most state-of-the-art models simply concatenate word- and character-level representations, which has been shown to be suboptimal for this task. In this paper, we empirically explore different integration strategies for word and character embeddings and propose a character-augmented reader that attends to character-level representations to augment word embeddings with a short list, improving word representations, especially for rare words. Experimental results show that the proposed approach helps the baseline model significantly outperform state-of-the-art baselines on various public benchmarks.

* Accepted by NLPCC 2018. arXiv admin note: text overlap with arXiv:1806.09103
##### SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings

May 26, 2018
Zhuosheng Zhang, Jiangtong Li, Hai Zhao, Bingjie Tang

This paper describes a hypernym discovery system for our participation in SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a predefined vocabulary. We introduce a neural network architecture for this task and empirically study various neural network models for building latent-space representations of words and phrases. The evaluated models include convolutional neural networks, long short-term memory networks, gated recurrent units, and recurrent convolutional neural networks. We also explore different embedding methods, including word embeddings and sense embeddings, for better performance.

* SemEval-2018, Workshop of NAACL-HLT 2018
##### DCMN+: Dual Co-Matching Network for Multi-choice Reading Comprehension

Aug 30, 2019
Shuailiang Zhang, Hai Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Multi-choice reading comprehension is a challenging task that requires selecting an answer from a set of candidate options given a passage and a question. Previous approaches usually compute only a question-aware passage representation and ignore the passage-aware question representation when modeling the relationship between passage and question, which fails to make full use of the information in both. In this work, we propose the dual co-matching network (DCMN), which models the relationship among passage, question, and answer options bidirectionally. In addition, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection, which finds the most salient supporting sentences for answering the question, and (ii) answer option interaction, which encodes comparison information between answer options. DCMN integrated with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, and MCTest.
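
The bidirectional idea can be shown with plain dot-product attention: a single score matrix between passage and question vectors is normalized once over question positions (giving a question-aware passage representation) and once over passage positions (giving a passage-aware question representation). This is a sketch under simplifying assumptions: there are no learned weights, and the fusion, sentence selection, and option interaction components of DCMN+ are omitted. The same routine would apply to passage-option pairs.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    """For each query vector, return the softmax-weighted sum of `keys`,
    using dot-product scores."""
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return out


def co_match(passage, question):
    """Bidirectional matching over the same scores dot(p_i, q_j).

    Returns (question-aware passage, passage-aware question), with one
    vector per passage token and per question token respectively.
    """
    return attend(passage, question), attend(question, passage)


passage = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [[1.0, 0.0]]
q_aware_p, p_aware_q = co_match(passage, question)
print(len(q_aware_p), len(p_aware_q))  # 3 1
```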

##### SG-Net: Syntax-Guided Machine Reading Comprehension

Aug 14, 2019
Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao

For machine reading comprehension, effectively modeling linguistic knowledge from detail-riddled, lengthy passages and getting rid of noise is essential to improving performance. In this work, we propose using syntax to guide the text modeling of both passages and questions by incorporating syntactic clues into the multi-head attention mechanism to fully fuse information from both global and attended representations. Accordingly, we present a novel syntax-guided network (SG-Net) for challenging reading comprehension tasks. Extensive experiments on popular benchmarks including SQuAD 2.0 and RACE validate the effectiveness of the proposed method, with substantial improvements over fine-tuned BERT. This work empirically demonstrates the effectiveness of syntactic structural information for text modeling. The proposed attention mechanism also verifies the practicability of using linguistic information to guide attention learning and can easily be adapted to other tree-structured annotations.
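
As we understand the SG-Net design, syntactic clues enter self-attention as a mask derived from a dependency parse: each token may attend only to itself and its ancestor head words (its syntactic dependency of interest, SDOI). The sketch below builds such a mask from head indices and applies it to one row of attention scores; the head-index encoding (-1 for the root) and the function names are assumptions made for illustration.

```python
import math


def sdoi_mask(heads):
    """Build an n x n 0/1 mask from dependency heads.

    heads[i] is the index of token i's head, or -1 for the root; the heads
    are assumed to form a tree (no cycles). mask[i][j] == 1 iff token j is
    token i itself or one of its ancestors.
    """
    n = len(heads)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk up the tree to the root
            mask[i][j] = 1
            j = heads[j]
    return mask


def masked_softmax(scores, mask_row):
    """Softmax restricted to positions where mask_row is 1; others get weight 0.

    Every SDOI row allows at least the diagonal, so the allowed set is never empty.
    """
    top = max(s for s, m in zip(scores, mask_row) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# "The cat sat": The -> cat, cat -> sat, sat is the root.
mask = sdoi_mask([1, 2, -1])
print(mask)                                      # [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
print(masked_softmax([0.0, 0.0, 0.0], mask[1]))  # [0.0, 0.5, 0.5]
```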

##### Dual Co-Matching Network for Multi-choice Reading Comprehension

Jan 27, 2019
Shuailiang Zhang, Hai Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Multi-choice reading comprehension is a challenging task that requires a complex reasoning procedure. Given a passage and a question, a correct answer needs to be selected from a set of candidate answers. In this paper, we propose the Dual Co-Matching Network (DCMN), which models the relationship among passage, question, and answer bidirectionally. Unlike existing approaches that only calculate question-aware or option-aware passage representations, we calculate passage-aware question representations and passage-aware answer representations at the same time. To demonstrate the effectiveness of our model, we evaluate it on a large-scale multiple-choice machine reading comprehension dataset (i.e., RACE). Experimental results show that our proposed model achieves new state-of-the-art results.

* arXiv admin note: text overlap with arXiv:1806.04068 by other authors
##### Modeling Multi-turn Conversation with Deep Utterance Aggregation

Nov 06, 2018
Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao, Gongshen Liu

Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances for context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance, and the final matching score is obtained after attentive turn aggregation. Experimental results show that our model outperforms state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

* COLING 2018, pages 3740-3752
* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)
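
The final step, attentive turn aggregation, can be sketched as follows, assuming each turn has already been reduced to a matching vector against the candidate response. The hypothetical `query` vector plays the role of learned attention parameters and `output_weights` that of a learned output layer; the self-matching attention and any recurrent layer over turns that the full model may use are omitted.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_turns(turn_vectors, query):
    """Attention over turns: weight each turn's matching vector by the
    softmax of its dot product with `query`, then sum."""
    weights = softmax([sum(a * b for a, b in zip(v, query)) for v in turn_vectors])
    dim = len(turn_vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, turn_vectors)) for k in range(dim)]
    return pooled, weights


def matching_score(pooled, output_weights, bias=0.0):
    """Logistic output layer: probability that the response matches the context."""
    z = sum(a * b for a, b in zip(pooled, output_weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


turns = [[1.0, 0.0], [0.0, 1.0]]  # one matching vector per previous utterance
pooled, weights = aggregate_turns(turns, query=[0.0, 0.0])
print(weights, pooled)                     # [0.5, 0.5] [0.5, 0.5]
print(matching_score(pooled, [0.0, 0.0]))  # 0.5
```

Attention lets turns that matter for the response dominate the pooled vector, instead of treating the concatenated context uniformly.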
##### Effective Subword Segmentation for Text Comprehension

Nov 06, 2018
Zhuosheng Zhang, Hai Zhao, Kangwei Ling, Jiangtong Li, Zuchao Li, Shexia He

Character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, the character itself is not a natural minimal linguistic unit for representation or word embedding composition, because it ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and different subword-augmentation strategies for text understanding, showing that subword-augmented embedding significantly improves our baselines in multiple text understanding tasks in both English and Chinese.
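
Byte pair encoding (BPE) is one widely used unsupervised segmentation method of the kind surveyed here. A minimal sketch of learning BPE merges from word frequencies follows; it omits the end-of-word marker that common implementations add, and it makes no claim about the exact configuration used in the paper.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols`."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn merge operations from a {word: frequency} dictionary.

    Each word starts as a tuple of characters; each step merges the most
    frequent adjacent symbol pair (ties go to the pair counted first).
    """
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(s, best): c for s, c in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe({"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}, 3)
print(merges)                     # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
print(apply_bpe("hugs", merges))  # ['hug', 's']
```

The resulting pieces are computationally derived rather than linguistically annotated, which is what lets such a framework run on both English and Chinese text.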

##### Lingke: A Fine-grained Multi-turn Chatbot for Customer Service

Aug 10, 2018
Pengfei Zhu, Zhuosheng Zhang, Jiangtong Li, Yafang Huang, Hai Zhao

Traditional chatbots usually need a large amount of human dialogue data, especially when using supervised machine learning methods. Although they can easily handle single-turn question answering, their performance on multi-turn conversation is usually unsatisfactory. In this paper, we present Lingke, an information-retrieval-augmented chatbot that is able to answer questions based on a given product introduction document and to handle multi-turn conversations. We introduce a fine-grained processing pipeline to distill responses from unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

* Accepted as a COLING 2018 demonstration paper
##### Dependency or Span, End-to-End Uniform Semantic Role Labeling

Jan 16, 2019
Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, Xiang Zhou

Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most existing work focuses on either span-based or dependency-based semantic representations and only shows model optimizations specific to each. Meanwhile, attempts to handle the two SRL tasks uniformly have been less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation that deals with the two types of argument annotation in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, including the long-ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.

##### Probing Contextualized Sentence Representations with Visual Awareness

We present a universal framework to model contextualized sentence representations with visual awareness, motivated by the need to overcome the shortcomings of manually annotated multimodal parallel data. For each sentence, we first retrieve a diverse set of images from a shared cross-modal embedding space that is pre-trained on large-scale text-image pairs. Then, the texts and images are encoded by a transformer encoder and a convolutional neural network, respectively. The two sequences of representations are further fused by a simple and effective attention layer. The architecture can easily be applied to text-only natural language processing tasks without manually annotating multimodal parallel corpora. We apply the proposed method to three tasks, namely neural machine translation, natural language inference, and sequence labeling, and experimental results verify its effectiveness.
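
The retrieval step can be sketched as nearest-neighbour search in the shared embedding space. The sketch assumes that sentence and image vectors are already available from a hypothetical pre-trained cross-modal encoder and uses plain top-k cosine similarity; it does not implement whatever diversity criterion the framework applies, nor the transformer/CNN encoders and the attention fusion layer.

```python
import math


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve_images(sentence_vec, image_index, k):
    """Return the ids of the k images most cosine-similar to the sentence.

    `image_index` maps an image id to its vector in the shared text-image space.
    """
    ranked = sorted(
        image_index,
        key=lambda image_id: cosine(sentence_vec, image_index[image_id]),
        reverse=True,
    )
    return ranked[:k]


index = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
print(retrieve_images([1.0, 0.1], index, k=2))  # ['img_a', 'img_c']
```

Because retrieval needs only a sentence vector, no manually paired images are required at training or test time, which matches the motivation stated above.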

##### Semantics-aware BERT for Language Understanding

Recent work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT, and BERT, only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT retains the convenient usability of its BERT precursor through light fine-tuning without substantial task-specific modifications. Compared with BERT, SemBERT is equally simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves results on ten reading comprehension and language inference tasks.
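
The fusion idea can be illustrated with a minimal sketch, assuming BERT has already produced one vector per subword and an SRL tagger has produced one semantic role tag per word. Subword vectors are max-pooled into word-level vectors and concatenated with an embedding of each word's tag. The pooling choice, the single tag sequence, and the toy `tag_embeddings` table are simplifying assumptions: the full model uses learned layers for these steps, and a sentence with several predicates yields several tag sequences, of which only one is shown.

```python
def pool_subwords(subword_vecs, word_spans):
    """Max-pool subword vectors into one vector per word.

    word_spans[i] = (start, end) covers word i's subword positions [start, end).
    """
    dim = len(subword_vecs[0])
    return [
        [max(subword_vecs[t][k] for t in range(start, end)) for k in range(dim)]
        for start, end in word_spans
    ]


def fuse(word_vecs, srl_tags, tag_embeddings):
    """Concatenate each word vector with the embedding of its semantic role tag."""
    return [vec + tag_embeddings[tag] for vec, tag in zip(word_vecs, srl_tags)]


# Subwords "play", "##ing", "ball" -> words "playing", "ball".
subword_vecs = [[1.0, 0.0], [3.0, -1.0], [2.0, 2.0]]
words = pool_subwords(subword_vecs, [(0, 2), (2, 3)])
tag_embeddings = {"B-V": [1.0], "B-ARG1": [0.0]}  # toy 1-d tag embeddings
print(fuse(words, ["B-V", "B-ARG1"], tag_embeddings))
# [[3.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
```

Pooling to word level first is what lets word-aligned SRL tags be attached to a subword-tokenized BERT backbone.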