Erik Jenner

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

Via arXiv

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Mar 03, 2024
Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Via arXiv

STARC: A General Framework For Quantifying Differences Between Reward Functions

Sep 26, 2023
Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

Via arXiv

imitation: Clean Imitation Learning Implementations

Nov 22, 2022
Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

Via arXiv

Calculus on MDPs: Potential Shaping as a Gradient

Aug 20, 2022
Erik Jenner, Herke van Hoof, Adam Gleave

Via arXiv

Preprocessing Reward Functions for Interpretability

Mar 25, 2022
Erik Jenner, Adam Gleave

Via arXiv

Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice

Oct 05, 2021
Erik Jenner, Enrique Fita Sanmartín, Fred A. Hamprecht

Via arXiv

Steerable Partial Differential Operators for Equivariant Neural Networks

Jun 18, 2021
Erik Jenner, Maurice Weiler

Via arXiv