Tag: behavioral manipulation
-
OpenAI and Apollo Research tried to stop models from lying – and discovered something else altogether. https://www.zdnet.com/article/ai-models-know-when-theyre-being-tested-and-change-their-behavior-research-shows/?ref=platformer.news
-
A paper from Anthropic describing persona vectors and their applications to monitoring and controlling model behavior. https://www.anthropic.com/research/persona-vectors
