Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Indira Sen

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

Nov 02, 2023
Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil van der Aalst, Claudia Wagne

NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence in this work, we assess if this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually-generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.

* Preprint of EMNLP'23 paper

Via

Access Paper or Ask Questions

Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

May 09, 2022
Indira Sen, Mattia Samory, Claudia Wagner, Isabelle Augenstein

Figure 1 for Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

Figure 2 for Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

Figure 3 for Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

Figure 4 for Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection

Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is credited with promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet, over-relying on core features may lead to unintended model bias. Especially, construct-driven CAD -- perturbations of core features -- may induce models to ignore the context in which core features are used. Here, we test models for sexism and hate speech detection on challenging data: non-hateful and non-sexist usage of identity and gendered terms. In these hard cases, models trained on CAD, especially construct-driven CAD, show higher false-positive rates than models trained on the original, unperturbed data. Using a diverse set of CAD -- construct-driven and construct-agnostic -- reduces such unintended bias.

* Accepted to NAACL'22 as a short paper

Via

Access Paper or Ask Questions

"Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

Apr 27, 2020
Mattia Samory, Indira Sen, Julian Kohne, Fabian Floeck, Claudia Wagner

Figure 1 for "Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

Figure 2 for "Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

Figure 3 for "Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

Figure 4 for "Unsex me here": Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

To effectively tackle sexism online, research has focused on automated methods for detecting sexism. In this paper, we use items from psychological scales and adversarial sample generation to 1) provide a codebook for different types of sexism in theory-driven scales and in social media text; 2) test the performance of different sexism detection methods across multiple data sets; 3) provide an overview of strategies employed by humans to remove sexism through minimal changes. Results highlight that current methods seem inadequate in detecting all but the most blatant forms of sexism and do not generalize well to out-of-domain examples. By providing a scale-based codebook for sexism and insights into what makes a statement sexist, we hope to contribute to the development of better and broader models for sexism detection, including reflections on theory-driven approaches to data collection.

* Indira Sen and Julian Kohne contributed equally to this work

Via

Access Paper or Ask Questions