Short Text Conversation Based on Deep Neural Network and Analysis on Evaluation Measures

Jul 06, 2019
Hsiang-En Cherng, Chia-Hui Chang

With the development of natural language processing (NLP), automatic question-answering systems such as Watson, Siri, and Alexa have become some of the most important NLP applications. Enterprises now build automatic customer-service chatbots to save human resources and provide 24-hour customer service. However, chatbot evaluation currently relies heavily on human annotation, which is time-consuming. Thus, has initiated new Short Text Conversation subtasks, Dialogue Quality (DQ) and Nugget Detection (ND), which aim to evaluate chatbot-generated dialogues automatically. In this paper, we address the DQ and ND subtasks with deep neural networks. We propose two models for the DQ and ND subtasks, both built on a hierarchical structure of an embedding layer, an utterance layer, a context layer, and a memory layer, which learns dialogue representations hierarchically from the word level through the sentence and context levels to the long-range context level. Furthermore, we apply gating and attention mechanisms at the utterance and context layers to improve performance. We also tried replacing the embedding and utterance layers with BERT to obtain sentence representations. The results show that BERT produces better utterance representations than a multi-stack CNN for both the DQ and ND subtasks, and that the BERT-based model outperforms models proposed by other researchers. The evaluation measures proposed by , namely NMD and RSNOD for DQ and JSD and RNSS for ND, are not traditional evaluation measures such as accuracy, precision, recall, and F1-score. Thus, we also conduct a series of experiments using traditional evaluation measures and analyze the performance and errors.
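
The abstract names the four official measures without defining them. As a rough aid, the sketch below implements our understanding of the standard definitions for comparing two probability distributions over label bins: JSD and RNSS treat the bins as nominal (as for nugget types in ND), while NMD and RSNOD take the order of the bins into account (as for quality scores in DQ). All function names are hypothetical, inputs are assumed to be valid distributions of equal length with at least two bins, and the official task evaluation scripts should be treated as authoritative.

```python
import math
from typing import Sequence

# Hypothetical helpers (not the official scripts): each takes a predicted
# distribution `p` and a reference distribution `q` over the same L bins,
# assumed non-negative and summing to 1.


def _check(p: Sequence[float], q: Sequence[float]) -> None:
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q need the same number of bins (at least 2)")


# --- Nominal measures (bins have no order), as used for ND ---

def rnss(p: Sequence[float], q: Sequence[float]) -> float:
    """Root Normalised Sum of Squares: sqrt(sum_i (p_i - q_i)^2 / 2)."""
    _check(p, q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / 2)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence with base-2 logs, so the result is in [0, 1]."""
    _check(p, q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kld(x: Sequence[float]) -> float:
        # KL(x || m); bins with x_i == 0 contribute 0, and m_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, m) if a > 0)

    return (kld(p) + kld(q)) / 2


# --- Ordinal measures (bins are ordered), as used for DQ ---

def nmd(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalised Match Distance: sum of |cumulative differences| / (L - 1)."""
    _check(p, q)
    cum_p = cum_q = total = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        total += abs(cum_p - cum_q)
    return total / (len(p) - 1)


def _od(p: Sequence[float], q: Sequence[float]) -> float:
    """Order-aware divergence OD(p || q): mean distance-weighted error over bins with q_i > 0."""
    n = len(p)
    support = [i for i in range(n) if q[i] > 0]
    dw = [sum(abs(i - j) * (p[j] - q[j]) ** 2 for j in range(n)) for i in support]
    return sum(dw) / len(support)


def rsnod(p: Sequence[float], q: Sequence[float]) -> float:
    """Root Symmetric Normalised Order-aware Divergence: sqrt(((OD(p||q) + OD(q||p)) / 2) / (L - 1))."""
    _check(p, q)
    sod = (_od(p, q) + _od(q, p)) / 2
    return math.sqrt(sod / (len(p) - 1))


if __name__ == "__main__":
    # Illustrative, made-up distributions over a 5-point ordinal scale.
    gold = [0.0, 0.1, 0.5, 0.3, 0.1]
    pred = [0.05, 0.15, 0.4, 0.3, 0.1]
    for name, fn in [("NMD", nmd), ("RSNOD", rsnod), ("JSD", jsd), ("RNSS", rnss)]:
        print(f"{name}: {fn(pred, gold):.4f}")
```

Identical distributions score 0 under all four measures, and lower is better. Because NMD and RSNOD weight errors by distance on the ordinal scale, predicting a bin adjacent to the reference costs less than predicting a bin two steps away, whereas JSD and RNSS score both cases the same; this is presumably why the ordinal measures are paired with DQ scores and the nominal ones with ND nugget types.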

* 8 pages, 5 figures