
Eric J. Michaud


Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Via arXiv

Opening the AI black box: program synthesis via mechanistic interpretability

Feb 07, 2024
Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

Via arXiv

The Quantization Model of Neural Scaling

Mar 23, 2023
Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark

Via arXiv

Precision Machine Learning

Oct 24, 2022
Eric J. Michaud, Ziming Liu, Max Tegmark

Via arXiv

Omnigrok: Grokking Beyond Algorithmic Data

Oct 03, 2022
Ziming Liu, Eric J. Michaud, Max Tegmark

Via arXiv

Towards Understanding Grokking: An Effective Theory of Representation Learning

May 20, 2022
Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams

Via arXiv

Understanding Learned Reward Functions

Dec 10, 2020
Eric J. Michaud, Adam Gleave, Stuart Russell

Via arXiv

Examining the causal structures of deep neural networks using information theory

Oct 26, 2020
Simon Mattsson, Eric J. Michaud, Erik Hoel

Via arXiv