
Oct 11, 2024

Over-optimization in RL is well known, but it occurs even when KL(policy || base model) is constrained fairly tightly

Posted in category: policy

