Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kartikeya Upasani

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Dec 07, 2023
Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa

We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to as response classification. For the purpose of both prompt and response classification, we have meticulously gathered a dataset of high quality. Llama Guard, a Llama2-7b model that is instruction-tuned on our collected dataset, albeit low in volume, demonstrates strong performance on existing benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat, where its performance matches or exceeds that of currently available content moderation tools. Llama Guard functions as a language model, carrying out multi-class classification and generating binary decision scores. Furthermore, the instruction fine-tuning of Llama Guard allows for the customization of tasks and the adaptation of output formats. This feature enhances the model's capabilities, such as enabling the adjustment of taxonomy categories to align with specific use cases, and facilitating zero-shot or few-shot prompting with diverse taxonomies at the input. We are making Llama Guard model weights available and we encourage researchers to further develop and adapt them to meet the evolving needs of the community for AI safety.

Via

Access Paper or Ask Questions

Interpreting Verbal Irony: Linguistic Strategies and the Connection to the Type of Semantic Incongruity

Nov 05, 2019
Debanjan Ghosh, Elena Musi, Kartikeya Upasani, Smaranda Muresan

Figure 1 for Interpreting Verbal Irony: Linguistic Strategies and the Connection to the Type of Semantic Incongruity

Figure 2 for Interpreting Verbal Irony: Linguistic Strategies and the Connection to the Type of Semantic Incongruity

Figure 3 for Interpreting Verbal Irony: Linguistic Strategies and the Connection to the Type of Semantic Incongruity

Figure 4 for Interpreting Verbal Irony: Linguistic Strategies and the Connection to the Type of Semantic Incongruity

Human communication often involves the use of verbal irony or sarcasm, where the speakers usually mean the opposite of what they say. To better understand how verbal irony is expressed by the speaker and interpreted by the hearer we conduct a crowdsourcing task: given an utterance expressing verbal irony, users are asked to verbalize their interpretation of the speaker's ironic message. We propose a typology of linguistic strategies for verbal irony interpretation and link it to various theoretical linguistic frameworks. We design computational models to capture these strategies and present empirical studies aimed to answer three questions: (1) what is the distribution of linguistic strategies used by hearers to interpret ironic messages?; (2) do hearers adopt similar strategies for interpreting the speaker's ironic intent?; and (3) does the type of semantic incongruity in the ironic message (explicit vs. implicit) influence the choice of interpretation strategies by the hearers?

* Accepted at Society for Computation in Linguistics (SCiL), 2020 Conference

Via

Access Paper or Ask Questions

Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue

Jun 17, 2019
Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, Rajen Subba

Figure 1 for Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue

Figure 2 for Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue

Figure 3 for Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue

Figure 4 for Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue

Generating fluent natural language responses from structured semantic representations is a critical step in task-oriented conversational systems. Avenues like the E2E NLG Challenge have encouraged the development of neural approaches, particularly sequence-to-sequence (Seq2Seq) models for this problem. The semantic representations used, however, are often underspecified, which places a higher burden on the generation model for sentence planning, and also limits the extent to which generated responses can be controlled in a live system. In this paper, we (1) propose using tree-structured semantic representations, like those used in traditional rule-based NLG systems, for better discourse-level structuring and sentence-level planning; (2) introduce a challenging dataset using this representation for the weather domain; (3) introduce a constrained decoding approach for Seq2Seq models that leverages this representation to improve semantic correctness; and (4) demonstrate promising results on our dataset and the E2E dataset.

* To appear in the Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019)

Via

Access Paper or Ask Questions

Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems

Apr 09, 2019
Ashwini Challa, Kartikeya Upasani, Anusha Balakrishnan, Rajen Subba

Figure 1 for Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems

Figure 2 for Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems

Figure 3 for Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems

Figure 4 for Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems

Neural approaches to Natural Language Generation (NLG) have been promising for goal-oriented dialogue. One of the challenges of productionizing these approaches, however, is the ability to control response quality, and ensure that generated responses are acceptable. We propose the use of a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. While acceptability includes grammatical correctness and semantic correctness, we focus only on grammaticality classification in this paper, and show that existing datasets for grammatical error correction don't correctly capture the distribution of errors that data-driven generators are likely to make. We release a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems. We then explore two supervised learning approaches (CNNs and GBDTs) for classifying grammaticality. Our experiments show that grammaticality classification is very sensitive to the distribution of errors in the data, and that these distributions vary significantly with both the source of the response as well as the domain. We show that it's possible to achieve high precision with reasonable recall on our dataset.

Via

Access Paper or Ask Questions