Rafael Rafailov

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024
Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

Apr 22, 2024
Jan-Philipp Fränken, Eric Zelikman, Rafael Rafailov, Kanishk Gandhi, Tobias Gerstenberg, Noah D. Goodman

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Apr 18, 2024
Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Apr 01, 2024
Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Disentangling Length from Quality in Direct Preference Optimization

Mar 28, 2024
Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

Feb 18, 2024
Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning

Jan 06, 2024
Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn

Diffusion Model Alignment Using Direct Preference Optimization

Nov 21, 2023
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

Contrastive Preference Learning: Learning from Human Feedback without RL

Oct 24, 2023
Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

An Emulator for Fine-Tuning Large Language Models using Small Language Models

Oct 19, 2023
Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning