Daniel Paleka

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramèr, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

ARB: Advanced Reasoning Benchmark for Large Language Models

Jul 28, 2023
Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki

Evaluating Superhuman Models with Consistency Checks

Jun 19, 2023
Lukas Fluri, Daniel Paleka, Florian Tramèr

Poisoning Web-Scale Training Datasets is Practical

Feb 20, 2023
Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr

Red-Teaming the Stable Diffusion Safety Filter

Oct 11, 2022
Javier Rando, Daniel Paleka, David Lindner, Lennard Heim, Florian Tramèr

A law of adversarial risk, interpolation, and label noise

Jul 08, 2022
Daniel Paleka, Amartya Sanyal
