Song Feng

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Feb 20, 2024
Liyan Tang, Igor Shalyminov, Amy Wing-mei Wong, Jon Burnsky, Jake W. Vincent, Yu'an Yang, Siffi Singh, Song Feng, Hwanjun Song, Hang Su, Lijia Sun, Yi Zhang, Saab Mansour, Kathleen McKeown

Single-document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. We provide binary sentence-level human annotations of the factual consistency of these summaries along with detailed explanations of factually inconsistent sentences. Our analysis shows that existing LLMs hallucinate a significant number of factual errors in the dialogue domain, regardless of the model's size. On the other hand, when LLMs, including GPT-4, serve as binary factual evaluators, they perform poorly and can be outperformed by prevailing state-of-the-art specialized factuality evaluation metrics. Finally, we conduct an analysis of hallucination types with a curated error taxonomy. We find that there are diverse errors and error distributions in model-generated summaries and that non-LLM-based metrics can capture all error types better than LLM-based evaluators.

* Linguistic annotations available at https://github.com/amazon-science/tofueval
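
To make the binary sentence-level setup concrete, the following is a minimal sketch of how an automatic evaluator's predictions could be scored against human labels. The function names, the toy labels, the rule that a summary is consistent only if all of its sentences are, and the choice of balanced accuracy as the score are all illustrative assumptions; none of them is stated in the abstract.

```python
from typing import List


def summary_is_consistent(sentence_labels: List[bool]) -> bool:
    """Assumed aggregation rule: a summary is consistent only if every sentence is."""
    return all(sentence_labels)


def balanced_accuracy(gold: List[bool], pred: List[bool]) -> float:
    """Mean of the recall on consistent and on inconsistent sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pos = [p for g, p in zip(gold, pred) if g]       # predictions on consistent sentences
    neg = [p for g, p in zip(gold, pred) if not g]   # predictions on inconsistent sentences
    if not pos or not neg:
        raise ValueError("both classes must appear in the gold labels")
    recall_pos = sum(pos) / len(pos)                 # consistent sentences judged consistent
    recall_neg = sum(not p for p in neg) / len(neg)  # inconsistent sentences that were caught
    return (recall_pos + recall_neg) / 2


# Toy data, made up for illustration: True means "factually consistent".
human_labels = [True, True, False, True, False, False]
evaluator_labels = [True, False, False, True, True, False]

score = balanced_accuracy(human_labels, evaluator_labels)
```

Averaging the two recalls keeps an evaluator that always answers "consistent" from looking strong when most sentences are in fact consistent.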

DFEE: Interactive DataFlow Execution and Evaluation Kit

Dec 04, 2022
Han He, Song Feng, Daniele Bonadiman, Yi Zhang, Saab Mansour

DataFlow has been emerging as a new paradigm for building task-oriented chatbots due to its expressive semantic representations of dialogue tasks. Despite the availability of a large dataset, SMCalFlow, and a simplified syntax, the development and evaluation of DataFlow-based chatbots remain challenging due to the system complexity and the lack of downstream toolchains. In this demonstration, we present DFEE, an interactive DataFlow Execution and Evaluation toolkit that supports execution, visualization and benchmarking of semantic parsers given dialogue input and a backend database. We demonstrate the system via a complex dialogue task: event scheduling that involves temporal reasoning. It also supports diagnosing the parsing results via a friendly interface that allows developers to examine dynamic DataFlow and the corresponding execution results. To illustrate how to benchmark SoTA models, we propose a novel benchmark that covers more sophisticated event scheduling scenarios and a new metric on task success evaluation. The code of DFEE has been released at https://github.com/amazonscience/dataflow-evaluation-toolkit.

* Accepted to AAAI-23: the Thirty-Seventh AAAI Conference on Artificial Intelligence

DG2: Data Augmentation Through Document Grounded Dialogue Generation

Dec 15, 2021
Qingyang Wu, Song Feng, Derek Chen, Sachindra Joshi, Luis A. Lastras, Zhou Yu

Collecting data for training dialogue systems can be extremely expensive due to the involvement of human participants and the need for extensive annotation. Especially in document-grounded dialogue systems, human experts need to carefully read the unstructured documents to answer the users' questions. As a result, existing document-grounded dialogue datasets are relatively small-scale and obstruct the effective training of dialogue systems. In this paper, we propose an automatic data augmentation technique grounded on documents through a generative dialogue model. The dialogue model consists of a user bot and an agent bot that can synthesize diverse dialogues given an input document, which are then used to train a downstream model. When supplementing the original dataset, our method achieves significant improvement over traditional data augmentation methods. We also achieve great performance in the low-resource setting.

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

Sep 26, 2021
Song Feng, Siva Sankalp Patel, Hui Wan, Sachindra Joshi

We propose MultiDoc2Dial, a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous works treat document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. In this work, we aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics, and hence is grounded on different documents. To facilitate such a task, we introduce a new dataset that contains dialogues grounded in multiple documents from four different domains. We also explore modeling the dialogue-based and document-based context in the dataset. We present strong baseline approaches and various experimental results, aiming to support further research efforts on such a task.

Explaining Neural Network Predictions on Sentence Pairs via Learning Word-Group Masks

Apr 13, 2021
Hanjie Chen, Song Feng, Jatin Ganhotra, Hui Wan, Chulaka Gunasekara, Sachindra Joshi, Yangfeng Ji

Explaining neural network models is important for increasing their trustworthiness in real-world applications. Most existing methods generate post-hoc explanations for neural network models by identifying individual feature attributions or detecting interactions between adjacent features. However, for models with text pairs as inputs (e.g., paraphrase identification), existing methods are not sufficient to capture feature interactions between two texts and their simple extension of computing all word-pair interactions between two texts is computationally inefficient. In this work, we propose the Group Mask (GMASK) method to implicitly detect word correlations by grouping correlated words from the input text pair together and measure their contribution to the corresponding NLP tasks as a whole. The proposed method is evaluated with two different model architectures (decomposable attention model and BERT) across four datasets, including natural language inference and paraphrase identification tasks. Experiments show the effectiveness of GMASK in providing faithful explanations to these models.

* NAACL-HLT 2021

doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Nov 18, 2020
Song Feng, Hui Wan, Chulaka Gunasekara, Siva Sankalp Patel, Sachindra Joshi, Luis A. Lastras

We introduce doc2dial, a new dataset of goal-oriented dialogues that are grounded in the associated documents. Inspired by how the authors compose documents for guiding end users, we first construct dialogue flows based on the content elements that correspond to higher-level relations across text sections as well as lower-level relations between discourse units within a section. Then we present these dialogue flows to crowd contributors to create conversational utterances. The dataset includes about 4800 annotated conversations with an average of 14 turns that are grounded in over 480 documents from four domains. Compared to the prior document-grounded dialogue datasets, this dataset covers a variety of dialogue scenes in information-seeking conversations. For evaluating the versatility of the dataset, we introduce multiple dialogue modeling tasks and present baseline approaches.

* EMNLP 2020

Learning Lane Graph Representations for Motion Forecasting

Jul 27, 2020
Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, Raquel Urtasun

We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions. Instead of encoding vectorized maps as raster images, we construct a lane graph from raw map data to explicitly preserve the map structure. To capture the complex topology and long-range dependencies of the lane graph, we propose LaneGCN, which extends graph convolutions with multiple adjacency matrices and along-lane dilation. To capture the complex interactions between actors and maps, we exploit a fusion network consisting of four types of interactions: actor-to-lane, lane-to-lane, lane-to-actor and actor-to-actor. Powered by LaneGCN and actor-map interactions, our model is able to predict accurate and realistic multi-modal trajectories. Our approach significantly outperforms the state-of-the-art on the large-scale Argoverse motion forecasting benchmark.

* ECCV 2020 Oral
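
As a rough illustration of what "graph convolutions with multiple adjacency matrices and along-lane dilation" can look like, the sketch below sums one linear term per relation and, for the along-lane relations, one term per dilation step using powers of the adjacency matrix. The relation names (`"pre"`, `"suc"`), the helper functions, the toy lane graph, and the weights are assumptions made for this example; this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matpow(a: Matrix, k: int) -> Matrix:
    """k-th power of a square matrix; entry (i, j) links i to nodes k hops away."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, a)
    return result


def lane_conv(
    x: Matrix,
    w_self: Matrix,
    adjacency: Dict[str, Matrix],
    weights: Dict[Tuple[str, int], Matrix],
    dilations: List[int],
) -> Matrix:
    """out = x @ w_self + sum over relations r and hops k of (A_r ** k) @ x @ W[r, k].

    Only the along-lane relations ("pre", "suc") are dilated; others use k = 1.
    """
    out = matmul(x, w_self)
    for relation, adj in adjacency.items():
        hops = dilations if relation in ("pre", "suc") else [1]
        for k in hops:
            neighbours = matmul(matpow(adj, k), x)  # features gathered k hops away
            out = matadd(out, matmul(neighbours, weights[(relation, k)]))
    return out


# Toy lane graph (hypothetical): three nodes in a chain 0 -> 1 -> 2.
a_suc = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
features = [[1.0], [2.0], [4.0]]
identity_weight = [[1.0]]

out = lane_conv(
    features,
    identity_weight,
    {"suc": a_suc},
    {("suc", 1): identity_weight, ("suc", 2): identity_weight},
    dilations=[1, 2],
)
```

With identity weights, node 0 ends up with its own feature plus those of its 1-hop and 2-hop successors, which is how dilation lets information travel farther along a lane in a single layer.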

MultiXNet: Multiclass Multistage Multimodal Motion Prediction

Jun 10, 2020
Nemanja Djuric, Henggang Cui, Zhaoen Su, Shangxuan Wu, Huahua Wang, Fang-Chieh Chou, Luisa San Martin, Song Feng, Rui Hu, Yang Xu, Alyssa Dayan, Sidney Zhang, Brian C. Becker, Gregory P. Meyer, Carlos Vallespi-Gonzalez, Carl K. Wellington

One of the critical pieces of the self-driving puzzle is understanding the surroundings of the self-driving vehicle (SDV) and predicting how these surroundings will change in the near future. To address this task we propose MultiXNet, an end-to-end approach for detection and motion prediction based directly on lidar sensor data. This approach builds on prior work by handling multiple classes of traffic actors, adding a jointly trained second-stage trajectory refinement step, and producing a multimodal probability distribution over future actor motion that includes both multiple discrete traffic behaviors and calibrated continuous uncertainties. The method was evaluated on a large-scale, real-world data set collected by a fleet of SDVs in several cities, with the results indicating that it outperforms existing state-of-the-art approaches.

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

Nov 10, 2019
Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez

This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective for finding arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task. We compare our method with existing sequential encoding and embedding networks, demonstrating superior performance on two proposed benchmarks: automatic image retrieval on a simulated scenario that uses region captions as queries, and interactive image retrieval using real queries from human evaluators.

* 14 pages, 9 figures, NeurIPS 2019

Discrete Residual Flow for Probabilistic Pedestrian Behavior Prediction

Oct 17, 2019
Ajay Jain, Sergio Casas, Renjie Liao, Yuwen Xiong, Song Feng, Sean Segal, Raquel Urtasun

Self-driving vehicles plan around both static and dynamic objects, applying predictive models of behavior to estimate future locations of the objects in the environment. However, future behavior is inherently uncertain, and models of motion that produce deterministic outputs are limited to short timescales. Particularly difficult is the prediction of human behavior. In this work, we propose the discrete residual flow network (DRF-Net), a convolutional neural network for human motion prediction that captures the uncertainty inherent in long-range motion forecasting. In particular, our learned network effectively captures multimodal posteriors over future human motion by predicting and updating a discretized distribution over spatial locations. We compare our model against several strong competitors and show that our model outperforms all baselines.

* CoRL 2019
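
One plausible reading of "predicting and updating a discretized distribution over spatial locations" is a residual update in log space: each step adds a predicted correction to the previous step's log-probabilities and renormalizes. The sketch below shows only that update on a flattened four-cell grid. The function names and the hand-written residual values are placeholders assumed for illustration; in the paper the corrections come from a convolutional network, which is not modeled here.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Normalize unnormalized log-scores into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def residual_update(log_probs: List[float], residual: List[float]) -> List[float]:
    """Add a per-cell residual to the previous log-probabilities, then renormalize."""
    return log_softmax([lp + r for lp, r in zip(log_probs, residual)])


def rollout(initial: List[float], residuals: List[List[float]]) -> List[List[float]]:
    """Apply one residual update per future time step; return every step's distribution."""
    steps = []
    current = initial
    for residual in residuals:
        current = residual_update(current, residual)
        steps.append(current)
    return steps


# Toy example: uniform belief over four grid cells, two future time steps.
uniform = log_softmax([0.0, 0.0, 0.0, 0.0])
placeholder_residuals = [
    [0.0, 2.0, 0.0, -1.0],  # step 1: shift mass toward cell 1
    [0.0, 0.0, 0.0, 0.0],   # step 2: zero residual keeps the distribution as is
]
steps = rollout(uniform, placeholder_residuals)
probs_step1 = [math.exp(v) for v in steps[0]]
```

Because each step only has to predict a correction, a zero residual carries the previous distribution forward unchanged, and several cells can hold high probability at once, which is what allows multimodal predictions.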
