‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.
Two-faced AI language models learn to hide deception
Posted in robotics/AI