
Diego Frassinelli

Generalizable Sarcasm Detection Is Just Around The Corner, Of Course!

Apr 10, 2024
Hyewon Jang, Diego Frassinelli

We tested the robustness of sarcasm detection models by examining their behavior when fine-tuned on four sarcasm datasets that differ in their characteristics of sarcasm: label source (authors vs. third-party), domain (social media/online vs. offline conversations/dialogues), and style (aggressive vs. humorous mocking). We tested their prediction performance on the same dataset (intra-dataset) and across different datasets (cross-dataset). For intra-dataset predictions, models consistently performed better when fine-tuned with third-party labels than with author labels. For cross-dataset predictions, most models failed to generalize well to the other datasets, implying that no single type of dataset can represent all kinds of sarcasm across styles and domains. Compared to the existing datasets, models fine-tuned on the new dataset we release in this work showed the highest generalizability to other datasets. Based on a manual inspection of the datasets and a post-hoc analysis, we attributed the difficulty in generalization to the fact that sarcasm actually comes in different domains and styles. We argue that future sarcasm research should take this broad scope of sarcasm into account.
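The intra- vs. cross-dataset protocol described above amounts to a train-on-one, test-on-all grid. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the paper fine-tunes pretrained models, whereas here `fit` is any hypothetical training function, `fit_majority` is an invented majority-class stand-in, the datasets `A` and `B` are toy data, and accuracy stands in for whatever metric the paper reports.

```python
from typing import Callable, Dict, List, Tuple

# One labeled example: (text, label), where 1 = sarcastic and 0 = not sarcastic.
Example = Tuple[str, int]
# Hypothetical training function: takes a train split, returns a predictor.
Fit = Callable[[List[Example]], Callable[[str], int]]


def evaluation_grid(
    datasets: Dict[str, Tuple[List[Example], List[Example]]], fit: Fit
) -> Dict[Tuple[str, str], float]:
    """Train on each dataset and score accuracy on every dataset's test split.

    Cells where train == test are intra-dataset scores;
    all other cells are cross-dataset scores.
    """
    grid: Dict[Tuple[str, str], float] = {}
    for train_name, (train_split, _) in datasets.items():
        predict = fit(train_split)
        for test_name, (_, test_split) in datasets.items():
            correct = sum(predict(text) == label for text, label in test_split)
            grid[(train_name, test_name)] = correct / len(test_split)
    return grid


def fit_majority(train_split: List[Example]) -> Callable[[str], int]:
    """Hypothetical stand-in for fine-tuning: always predict the majority label."""
    positives = sum(label for _, label in train_split)
    majority = int(2 * positives >= len(train_split))
    return lambda text: majority


# Invented toy data: {name: (train split, test split)}.
toy = {
    "A": (
        [("oh great, another meeting", 1), ("what a lovely traffic jam", 1), ("the bus is late", 0)],
        [("wow, so helpful", 1), ("it is raining", 0)],
    ),
    "B": (
        [("the talk starts at 9", 0), ("lunch is ready", 0), ("yeah, right", 1)],
        [("the shop is closed", 0), ("nice job breaking it", 1), ("the door is open", 0)],
    ),
}

grid = evaluation_grid(toy, fit_majority)
for (train_name, test_name), acc in sorted(grid.items()):
    kind = "intra" if train_name == test_name else "cross"
    print(f"train={train_name} test={test_name} ({kind}): {acc:.2f}")
```

Keeping `fit` as a parameter separates the evaluation protocol from the model, so a real fine-tuning routine could be swapped in without changing the grid logic.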

Investigating the Nature of Disagreements on Mid-Scale Ratings: A Case Study on the Abstractness-Concreteness Continuum

Nov 08, 2023
Urban Knupleš, Diego Frassinelli, Sabine Schulte im Walde

Humans tend to agree strongly on scale ratings for extreme cases (e.g., a CAT is judged as very concrete), but judgements on mid-scale words exhibit more disagreement. Yet collected rating norms are heavily exploited across disciplines. Our study focuses on concreteness ratings and (i) uses correlations and supervised classification to identify salient multi-modal characteristics of mid-scale words, and (ii) applies hard clustering to identify patterns of systematic disagreement across raters. Our results suggest that mid-scale target words should either be fine-tuned or filtered before being utilised.

* 17 pages, 13 figures, accepted to CoNLL 2023
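The contrast between strong agreement at the extremes and disagreement mid-scale can be made concrete by computing, per word, the mean rating and the standard deviation across raters. This is only an illustrative sketch: the 1-5 scale, the mid-scale cut-offs `low` and `high`, the function `summarize_ratings`, and the ratings themselves are all hypothetical, and the paper's correlation, classification, and clustering analyses are not reproduced here.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_ratings(
    ratings: Dict[str, List[float]], low: float = 2.0, high: float = 4.0
) -> Dict[str, Tuple[float, float, bool]]:
    """Map each word to (mean rating, standard deviation across raters, is_mid_scale).

    Assumes a 1-5 scale; `low` and `high` are hypothetical cut-offs that
    mark a word as mid-scale when its mean falls strictly between them.
    """
    summary = {}
    for word, scores in ratings.items():
        m = mean(scores)
        summary[word] = (m, stdev(scores), low < m < high)
    return summary


# Invented ratings from five hypothetical raters (1 = abstract, 5 = concrete).
toy_ratings = {
    "cat": [5, 5, 5, 5, 4],
    "belief": [1, 1, 2, 1, 1],
    "market": [1, 5, 2, 4, 3],
}

summary = summarize_ratings(toy_ratings)
for word, (m, sd, mid) in summary.items():
    print(f"{word:>7}: mean={m:.2f} sd={sd:.2f} mid-scale={mid}")
```

In this toy data the two extreme words have a small spread, while the invented mid-scale word has raters spread across the whole scale, which is the pattern the abstract describes.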

Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension

Oct 27, 2020
Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu

While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to what extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce MQA-RC, a novel eye-tracking dataset collected from 23 participants who read movie plots and answered pre-defined questions. We compare state-of-the-art networks based on long short-term memory (LSTM), convolutional neural network (CNN), and XLNet Transformer architectures. We find that, for the LSTM and CNN models, higher similarity to human attention correlates significantly with higher performance. However, we show that this relationship does not hold for the XLNet models, even though XLNet performs best on this challenging task. Our results suggest that different architectures learn rather different neural attention strategies, and that similarity of neural to human attention does not guarantee the best performance.

* CoNLL 2020
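One way to quantify how closely neural attention resembles human visual attention is to correlate, token by token, human fixation times with model attention weights. The sketch below uses Spearman's rank correlation as one plausible similarity measure; this is an assumption, and the paper's own metric may differ. The token alignment, the fixation times, and the attention weights are invented for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must be aligned token by token")
    return pearson(average_ranks(x), average_ranks(y))


# Invented example for one five-token passage.
human_fixation_ms = [120, 0, 310, 80, 250]  # total fixation time per token
model_attention = [0.10, 0.02, 0.40, 0.08, 0.40]  # attention weight per token

rho = spearman(human_fixation_ms, model_attention)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based measure is convenient here because fixation times (milliseconds) and attention weights (probabilities) live on different scales; only their ordering over tokens is compared.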
