Josef Dai

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

Feb 20, 2024
Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang

Via arXiv

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Oct 19, 2023
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

Via arXiv