Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kenny Tsu Wei Choo

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

May 03, 2024
Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. \textsf{SGHateCheck} reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts.

Via

Access Paper or Ask Questions

Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

May 28, 2023
Han Wang, Ming Shan Hee, Md Rabiul Awal, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Figure 1 for Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Figure 2 for Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Figure 3 for Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Figure 4 for Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, these generated explanations' effectiveness and potential limitations remain poorly understood. A key concern is that these explanations, generated by LLMs, may lead to erroneous judgments about the nature of flagged content by both users and content moderators. For instance, an LLM-generated explanation might inaccurately convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conducted an extensive survey on evaluating such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content, and a survey was conducted with 2,400 unique respondents to evaluate the generated explanations. Our findings reveal that (1) human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness, (2) the persuasive nature of these explanations, however, varied depending on the prompting strategy employed, and (3) this persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution in applying LLM-generated explanations for content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.

* 9 pages, 2 figures, Accepted by International Joint Conference on Artificial Intelligence(IJCAI)

Via

Access Paper or Ask Questions