Research papers and code for "Lei Shu":
One key task of fine-grained sentiment analysis on reviews is to extract aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using a modified CNN called controlled CNN (Ctrl). The modified CNN has two types of control modules. Through asynchronous parameter updating, it prevents over-fitting and boosts CNN's performance significantly. This model achieves state-of-the-art results on standard aspect extraction datasets. To the best of our knowledge, this is the first paper to apply control modules to aspect extraction.

This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising.
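A minimal sketch of how such a pairwise same/different-class sub-model could drive clustering of the rejected examples, assuming a scikit-learn-style classifier (`pair_model`, the concatenated-pair representation, and the cluster count are illustrative assumptions, not the paper's exact design):

```python
# Sketch: use a pairwise "same class?" classifier as a clustering distance.
# `pair_model` is assumed to output P(same class) for a concatenated pair of
# feature vectors; any binary classifier trained on pairs from the seen classes would do.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def pairwise_distance_matrix(pair_model, rejected_X):
    """Distance = 1 - P(same class) for every pair of rejected examples."""
    n = len(rejected_X)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            pair = np.concatenate([rejected_X[i], rejected_X[j]])[None, :]
            p_same = pair_model.predict_proba(pair)[0, 1]
            dist[i, j] = dist[j, i] = 1.0 - p_same
    return dist

def discover_hidden_classes(pair_model, rejected_X, n_clusters):
    dist = pairwise_distance_matrix(pair_model, rejected_X)
    # Older scikit-learn versions name this parameter `affinity` instead of `metric`.
    clusterer = AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average")
    return clusterer.fit_predict(dist)
```

In the paper the pairwise sub-model is trained jointly with the open classifier; here an off-the-shelf clusterer simply consumes its scores.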

Traditional supervised learning makes the closed-world assumption that the classes that appear in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in dynamic open environments, where some new/test documents may not belong to any of the training classes, identifying these novel documents during classification becomes an important problem. This problem is called open-world classification or open classification. This paper proposes a novel deep learning based approach that outperforms existing state-of-the-art techniques dramatically.
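The abstract does not spell out the architecture; one common way to obtain reject-capable ("open") classification is one-vs-rest sigmoid outputs with a per-example rejection threshold. A hedged sketch, not the paper's exact model:

```python
# Sketch: one-vs-rest open text classification with rejection (illustrative only;
# the text encoder and the fixed threshold are assumptions, not the paper's method).
import torch
import torch.nn as nn

class OpenTextClassifier(nn.Module):
    def __init__(self, encoder, hidden_dim, num_seen_classes):
        super().__init__()
        self.encoder = encoder                    # any document encoder -> hidden_dim
        self.heads = nn.Linear(hidden_dim, num_seen_classes)

    def forward(self, docs):
        h = self.encoder(docs)
        return torch.sigmoid(self.heads(h))       # independent per-class probabilities

def predict_open(model, docs, threshold=0.5, reject_label=-1):
    probs = model(docs)                           # (batch, num_seen_classes)
    best_prob, best_class = probs.max(dim=1)
    # Reject a document when no seen class is confident enough.
    return torch.where(best_prob >= threshold, best_class,
                       torch.full_like(best_class, reject_label))
```

Training each head with binary cross-entropy gives every class its own decision boundary, which is what makes rejection of unseen classes possible.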

* accepted at EMNLP 2017
This paper makes a focused contribution to supervised aspect extraction. It shows that if the system has performed aspect extraction in many past domains and retained the results as knowledge, Conditional Random Fields (CRF) can leverage this knowledge in a lifelong learning manner to extract aspects in a new domain markedly better than a traditional CRF that does not use this prior knowledge. The key innovation is that, even after CRF training, the model can still improve its extraction using the experience gained as it is applied.
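A minimal sketch of how knowledge retained from past domains could enter a CRF as extra token features (the `sklearn_crfsuite` dependency, the feature names, and the toy aspect set are illustrative assumptions):

```python
# Sketch: a CRF token feature that flags words already extracted as aspects in past domains.
import sklearn_crfsuite

# Hypothetical knowledge base accumulated from previously processed domains.
past_domain_aspects = {"screen", "battery", "keyboard", "lens"}

def token_features(sent, i):
    word = sent[i].lower()
    feats = {
        "word.lower": word,
        "word.isdigit": word.isdigit(),
        "seen_as_aspect_in_past_domains": word in past_domain_aspects,  # lifelong knowledge
    }
    if i > 0:
        feats["prev_word.lower"] = sent[i - 1].lower()
    return feats

def sent_to_features(sent):
    return [token_features(sent, i) for i in range(len(sent))]

# X_train: tokenized sentences; y_train: BIO aspect labels per token.
# crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
# crf.fit([sent_to_features(s) for s in X_train], y_train)
```

The paper's key point is that the retained knowledge keeps growing as the trained CRF is applied to new domains, so later extractions benefit from earlier ones.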

* Accepted at ACL 2017. arXiv admin note: text overlap with arXiv:1612.07940
Extracting opinion targets is an important task in sentiment analysis on product reviews, and complementary entities (products) that may work together with the reviewed product are one important type of opinion target. In this paper, we address the problem of Complementary Entity Recognition (CER) as supervised sequence labeling with the capability of expanding domain knowledge, represented as key-value pairs mined from unlabeled reviews, by automatically learning and enhancing knowledge-based features. We use Conditional Random Field (CRF) as the base learner and augment CRF with knowledge-based features (called the Knowledge-based CRF, or KCRF for short). We conduct experiments to show that KCRF effectively improves the performance of the supervised CER task.

One of the key tasks of sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. In this work, we focus on supervised sequence labeling as the base approach to performing the task. Although several extraction methods based on sequence labeling models such as Conditional Random Fields (CRF) and Hidden Markov Models (HMM) have been proposed, we show that this supervised approach can be significantly improved by exploiting the idea of concept sharing across multiple domains. For example, "screen" is an aspect of the iPhone, but not only the iPhone has a screen; many other electronic devices do too. When "screen" appears in a review of a new domain (or product), it is likely to be an aspect there as well. Knowing this enables much better extraction in the new domain. This paper proposes a novel extraction method exploiting this idea in the context of supervised sequence labeling. Experimental results show that it produces markedly better results than the same approach without the past information.

* 10 pages
Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions. We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective. The datasets and code are available at https://www.cs.uic.edu/~hxu/.
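A hedged sketch of review post-training as continued masked-language-model training with Hugging Face Transformers; the corpus path and hyperparameters are placeholders, and the paper's additional MRC-oriented supervision is omitted:

```python
# Sketch: continue BERT pre-training (MLM) on an unlabeled review corpus
# before fine-tuning it for Review Reading Comprehension.
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

reviews = load_dataset("text", data_files={"train": "reviews.txt"})   # placeholder corpus
tokenized = reviews["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-review-posttrained",
                         per_device_train_batch_size=16, num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
# The post-trained encoder weights would then be loaded into a span-prediction
# (QA) head and fine-tuned on the ReviewRC data.
```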

* accepted by NAACL 2019
Seeking information about products and services is an important activity of online consumers before making a purchase decision. Inspired by recent research on conversational reading comprehension (CRC) on formal documents, this paper studies the task of leveraging knowledge from a huge amount of reviews to answer multi-turn questions from consumers or users. Questions spanning multiple turns in a dialogue enable users to ask more specific questions that are hard to ask within a single question, as in traditional machine reading comprehension (MRC). In this paper, we first build a dataset and then propose a novel task-adaptation approach to encoding the formulation of the CRC task into a pre-trained language model. This task-adaptation approach is unsupervised and can greatly enhance the performance of the end CRC task, which has only limited supervision. Experimental results show that the proposed approach is highly effective and achieves performance competitive with a supervised approach. We plan to release the datasets and the code in May 2019.

Classic supervised learning makes the closed-world assumption, meaning that classes seen in testing must have been seen in training. However, in the dynamic world, new or unseen class examples may appear constantly. A model working in such an environment must be able to reject unseen classes (not seen or used in training). If enough data is collected for the unseen classes, the system should incrementally learn to accept/classify them. This learning paradigm is called open-world learning (OWL). Existing OWL methods all need some form of re-training to accept or include the new classes in the overall model. In this paper, we propose a meta-learning approach to the problem. Its key novelty is that it only needs to train a meta-classifier, which can then continually accept new classes when they have enough labeled data for the meta-classifier to use, and also detect/reject future unseen classes. No re-training of the meta-classifier or a new overall classifier covering all old and new classes is needed. In testing, the method only uses the examples of the seen classes (including the newly added classes) on-the-fly for classification and rejection. Experimental results demonstrate the effectiveness of the new approach.
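A minimal sketch of the on-the-fly idea: compare a test example against stored examples of each seen class and reject when nothing is similar enough. Plain cosine similarity stands in for the learned meta-classifier here, and the threshold is an assumption:

```python
# Sketch: on-the-fly open-world classification against per-class example sets.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def classify_or_reject(x, class_examples, threshold=0.7, reject_label="UNSEEN"):
    """class_examples: dict mapping class name -> list of labeled feature vectors.
    Accepting a new class only requires adding an entry; nothing is re-trained."""
    best_label, best_score = reject_label, -1.0
    for label, examples in class_examples.items():
        # In the paper a meta-learned comparator scores the example pairs; cosine is a stand-in.
        score = float(np.mean([cosine(x, e) for e in examples]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else reject_label
```

The point is that nothing is re-trained: the comparator is meta-learned once, and incorporating a new class only means adding its labeled examples to `class_examples`.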

Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when training the embedding for a new domain, the system has already seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarity of the contexts of the same word across many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced by such a process improve the performance of downstream tasks.
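A hedged sketch of the corpus-expansion step, with a simple TF-IDF similarity standing in for the learned meta-learner: retrieve past-domain sentences whose contexts resemble the new domain, then train embeddings on the expanded corpus.

```python
# Sketch: expand a small in-domain corpus with similar sentences from past domains,
# then train domain word embeddings on the expanded corpus.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models import Word2Vec

def expand_corpus(new_domain_sents, past_domain_sents, keep_top=1000):
    vec = TfidfVectorizer().fit(new_domain_sents + past_domain_sents)
    new_centroid = np.asarray(vec.transform(new_domain_sents).mean(axis=0))
    sims = cosine_similarity(vec.transform(past_domain_sents), new_centroid).ravel()
    keep = [past_domain_sents[i] for i in np.argsort(-sims)[:keep_top]]
    return new_domain_sents + keep

# expanded = expand_corpus(new_sents, past_sents)
# model = Word2Vec([s.split() for s in expanded], vector_size=100, window=5, min_count=2)
```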

* IJCAI 2018
* 7 pages
One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, the proposed model is a novel yet simple CNN that employs two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. Without using any additional supervision, this model achieves surprisingly good results, outperforming sophisticated existing state-of-the-art methods. To our knowledge, this paper is the first to report such a double-embedding-based CNN model for aspect extraction and to achieve very good results.
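A hedged PyTorch sketch of the double-embedding idea: concatenate frozen general-purpose and domain-specific embeddings per token and feed them to stacked 1D convolutions with a per-token classifier over BIO aspect tags (layer sizes and depth are illustrative, not the paper's exact configuration):

```python
# Sketch: double-embedding CNN for aspect extraction as BIO sequence labeling.
import torch
import torch.nn as nn

class DoubleEmbeddingCNN(nn.Module):
    def __init__(self, general_emb, domain_emb, num_tags=3, channels=128):
        super().__init__()
        # Two frozen pre-trained lookup tables: general-purpose (e.g. GloVe) and in-domain.
        self.general = nn.Embedding.from_pretrained(general_emb, freeze=True)
        self.domain = nn.Embedding.from_pretrained(domain_emb, freeze=True)
        in_dim = general_emb.size(1) + domain_emb.size(1)
        self.convs = nn.Sequential(
            nn.Conv1d(in_dim, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.tagger = nn.Linear(channels, num_tags)   # B / I / O per token

    def forward(self, token_ids):
        x = torch.cat([self.general(token_ids), self.domain(token_ids)], dim=-1)
        x = self.convs(x.transpose(1, 2)).transpose(1, 2)   # (batch, seq_len, channels)
        return self.tagger(x)                                # tag logits per token
```

Keeping both embedding tables frozen in this sketch keeps the number of trainable parameters small, which suits the modest size of typical aspect extraction datasets.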

* ACL 2018
Functionality is of utmost importance to customers when they purchase products. However, it is often unclear to customers whether a product can really satisfy their needs in terms of functions. Further, missing functions may be intentionally hidden by the manufacturers or the sellers. As a result, a customer needs to spend a fair amount of time researching before purchasing, or just purchase the product at his/her own risk. In this paper, we first identify a novel QA corpus that is dense in product functionality information (the annotated corpus can be found at https://www.cs.uic.edu/~hxu/). We then design a neural network called Semi-supervised Attention Network (SAN) to discover product functions from questions. This model leverages unlabeled data as contextual information to perform semi-supervised sequence labeling. We conduct experiments to show that the extracted functions have both high coverage and accuracy, compared with a wide spectrum of baselines.

Product compatibility and functionality are of utmost importance to customers when they purchase products, and to sellers and manufacturers when they sell products. Due to the huge number of products available online, it is infeasible to enumerate and test the compatibility and functionality of every product. In this paper, we address two closely related problems: product compatibility analysis and function satisfiability analysis, where the second problem is a generalization of the first (e.g., whether a product works with another product can be considered a special function). We first identify a novel question-and-answer corpus that is up-to-date regarding product compatibility and functionality information. To allow automatic discovery of product compatibility and functionality, we then propose a deep learning model called Dual Attention Network (DAN). Given a QA pair for a to-be-purchased product, DAN learns to 1) discover complementary products (or functions), and 2) accurately predict the actual compatibility (or satisfiability) of the discovered products (or functions). The challenges addressed by the model include the briefness of QAs, the linguistic patterns indicating compatibility, and the appropriate fusion of questions and answers. We conduct experiments to quantitatively and qualitatively show that the identified products and functions have both high coverage and accuracy, compared with a wide spectrum of baselines.

Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product's compatibility issues with other products are frequently discussed in PCQA, and such issues are discussed even more frequently for accessories, e.g., via a yes/no question such as "Does this mouse work with Windows 10?". In this paper, we address the problem of extracting compatible and incompatible products from yes/no questions in PCQA. This problem naturally fits a two-stage framework: first, we perform Complementary Entity (product) Recognition (CER) on yes/no questions; second, we identify the polarities of the yes/no answers to assign each complementary entity a compatibility label (compatible, incompatible or unknown). We leverage an existing unsupervised method for the first stage and, for the second stage, a 3-class classifier that combines a distant PU-learning method (learning from positive and unlabeled examples) with a binary classifier. The benefit of using distant PU-learning is that it helps identify more implicit yes/no answers without using any human-annotated data. We conduct experiments on 4 products to show that the proposed method is effective.

* 9 pages, 1 figure
Product reviews contain a lot of useful information about product features and customer opinions. One important type of product feature is the complementary entity (product) that may potentially work together with the reviewed product. Knowing the complementary entities of the reviewed product is very important because customers want to buy compatible products and avoid incompatible ones. In this paper, we address the problem of Complementary Entity Recognition (CER). Since no existing method can solve this problem, we first propose a novel unsupervised method that utilizes syntactic dependency paths to recognize complementary entities. Then we expand category-level domain knowledge about complementary entities using only a few general seed verbs over a large amount of unlabeled reviews. The domain knowledge helps the unsupervised method adapt to different products and greatly improves the precision of the CER task. The advantage of the proposed method is that it does not require any labeled data for training. We conduct experiments on 7 popular products with about 1,200 reviews in total to demonstrate that the proposed approach is effective.
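A hedged spaCy sketch of the dependency-path intuition: noun phrases reached from a seed compatibility verb through a prepositional path are proposed as complementary-entity candidates. The seed verbs and the single path pattern are simplifications of the paper's method:

```python
# Sketch: propose complementary-entity candidates from dependency parses of review sentences.
import spacy

nlp = spacy.load("en_core_web_sm")
SEED_VERBS = {"work", "fit", "connect", "attach"}          # a few general seed verbs

def complementary_candidates(sentence):
    """Return noun phrases governed by a seed verb via a prepositional path,
    e.g. 'works with my MacBook Pro' -> 'my MacBook Pro'."""
    doc = nlp(sentence)
    candidates = []
    for token in doc:
        if token.pos_ == "VERB" and token.lemma_ in SEED_VERBS:
            for prep in (c for c in token.children if c.dep_ == "prep"):
                for obj in (c for c in prep.children if c.dep_ == "pobj"):
                    span = doc[obj.left_edge.i : obj.right_edge.i + 1]
                    candidates.append(span.text)
    return candidates

# complementary_candidates("This mouse works perfectly with my MacBook Pro.")
# -> ['my MacBook Pro']
```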

* 10 pages, 2 figures, IEEE BigData 2016
Deep learning models have a large number of free parameters that need to be learned by effectively training the models on a great deal of training data to improve their generalization performance. However, obtaining and labeling data is expensive in practice. Data augmentation is one of the methods to alleviate this problem. In this paper, we conduct a preliminary study on how three variables (augmentation method, augmentation rate, and size of the basic dataset per label) affect the accuracy of deep learning for image classification. The study provides some guidelines: (1) it is better to use transformations that alter the geometry of the images rather than those that only change lighting and color; (2) a 2-3x augmentation rate is good enough for training; (3) the smaller the amount of original data, the more obvious the contribution of augmentation.
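These guidelines translate naturally into a torchvision pipeline; a hedged sketch that favors geometric transforms and roughly triples the training data (the dataset and transform choices are illustrative, not those of the study):

```python
# Sketch: geometry-heavy augmentation with a roughly 3x effective training set.
import torchvision.transforms as T
from torch.utils.data import ConcatDataset
from torchvision.datasets import CIFAR10

geometric = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(degrees=15),
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),
    T.ToTensor(),
])
plain = T.ToTensor()

base = CIFAR10(root="data", train=True, download=True, transform=plain)
aug1 = CIFAR10(root="data", train=True, transform=geometric)
aug2 = CIFAR10(root="data", train=True, transform=geometric)

# Original data plus two augmented copies: a 3x augmentation rate.
train_set = ConcatDataset([base, aug1, aug2])
```

Because the transforms are sampled randomly at access time, the two extra copies yield different augmented views in every epoch.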

* 4 pages, 4 figures
We study the problem of stock-related question answering (StockQA): automatically generating answers to stock-related questions, just as professional stock analysts provide action recommendations on stocks upon users' requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values), and due to the dynamic nature of StockQA, it is scarcely possible to get reasonable answers in an extractive way from the training data; and (2) StockQA requires properly analyzing the relationship between the keywords in a QA pair and the numerical features of a stock. We propose to address the problem with a memory-augmented encoder-decoder architecture, and integrate different mechanisms of number understanding and generation, which is a critical component of StockQA. We build a large-scale dataset containing over 180K StockQA instances, based on which various technique combinations are extensively studied and compared. Experimental results show that a hybrid word-character model with separate character components for number processing achieves the best performance. By analyzing the results, we found that 44.8% of the answers generated by our best model still suffer from the generic answer problem, which can be alleviated by a straightforward hybrid retrieval-generation model.

* data: http://ai.tencent.com/ailab/nlp/data/stockQA.tar.gz
Automated facial expression analysis has a variety of applications in human-computer interaction. Traditional methods mainly analyze prototypical facial expressions of no more than eight discrete emotions as a classification task. However, in practice, spontaneous facial expressions in naturalistic environments can represent not only a wide range of emotions, but also different intensities within an emotion family. In such situations, these methods are not reliable or adequate. In this paper, we propose to train deep convolutional neural networks (CNNs) to analyze facial expressions that are explainable in a dimensional emotion model. The proposed method accommodates not only a set of basic emotion expressions, but also a full range of other emotions and subtle emotion intensities that we both feel in ourselves and perceive in others in our daily life. Specifically, we first mapped facial expressions onto dimensional measures, thereby transforming facial expression analysis from a classification problem into a regression one. We then tested our CNN-based methods for facial expression regression, and these methods demonstrated promising performance. Moreover, we improved our method with bilinear pooling, which encodes second-order statistics of features. We showed that such bilinear-CNN models significantly outperformed their respective baselines.
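A hedged PyTorch sketch of the bilinear pooling step: take the outer product of CNN feature maps over spatial locations, apply signed square-root and L2 normalization, and regress dimensional emotion values (the backbone and the two-dimensional valence/arousal output are assumptions, not the paper's exact setup):

```python
# Sketch: bilinear pooling of CNN features followed by a dimensional-emotion regression head.
import torch
import torch.nn as nn
import torchvision.models as models

class BilinearEmotionRegressor(nn.Module):
    def __init__(self, out_dims=2):                      # e.g. valence and arousal
        super().__init__()
        backbone = models.resnet18(weights=None)         # older torchvision: pretrained=False
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H, W)
        self.regressor = nn.Linear(512 * 512, out_dims)

    def forward(self, images):
        f = self.features(images)                         # (B, C, H, W)
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        bilinear = torch.bmm(f, f.transpose(1, 2)) / (h * w)   # second-order statistics
        bilinear = bilinear.view(b, -1)
        bilinear = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-10)
        bilinear = nn.functional.normalize(bilinear)       # L2 normalization
        return self.regressor(bilinear)                    # continuous emotion dimensions
```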

* code: http://www.ics.uci.edu/~skong2/DimensionalEmotionModel.html