
Apr 15, 2024

Dataset Reset Policy Optimization for RLHF

Posted in category: policy

From Cornell, Princeton, & Microsoft.

Dataset Reset Policy Optimization for RLHF https://huggingface.co/papers/2404.

Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and…


