Alert button

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles

Dec 30, 2023
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: