The True Story of How GPT-2 Became Maximally Lewd

In this video, we recount an incident that occurred at OpenAI while researchers were trying to fine-tune GPT-2 to be as helpful and ethical as possible. By inadvertently flipping a single minus sign, they led GPT-2 to become the embodiment of a well-known cardinal sin.

#ai #aisafety #alignment

▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

OpenAI blog post: https://openai.com/research/fine-tuni…
OpenAI paper behind the blog post: https://arxiv.org/pdf/1909.08593.pdf
RLHF explainer on Hugging Face: https://huggingface.co/blog/rlhf
RLHF explainer on aisafety.info: https://aisafety.info/?state=88FN_904…
Concrete Problems in AI Safety, by @RobertMilesAI: • Concrete Problems in AI Safety.

▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🟠 Patreon: / rationalanimations.

Blog