Paul Röttger

From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Apr 27, 2024
Manuel Tonneau, Diyi Liu, Samuel Fraiberger, Ralph Schroeder, Scott A. Hale, Paul Röttger

Near to Mid-term Risks and Opportunities of Open Source Generative AI
Apr 25, 2024
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H. S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Apr 24, 2024
Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

Introducing v0.5 of the AI Safety Benchmark from MLCommons
Apr 18, 2024
Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Apr 12, 2024
Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Apr 08, 2024
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Mar 28, 2024
Janis Goldzycher, Paul Röttger, Gerold Schneider

Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
Mar 06, 2024
Carolin Holtermann, Paul Röttger, Timm Dill, Anne Lauscher

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Feb 26, 2024
Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

Feb 22, 2024
Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank
