Source: The Guardian Technology · Technology · Europe · 04-29 17:00

Meet the AI jailbreakers: ‘I see the worst things humanity has produced’



To test the safety and security of AI, hackers have to trick large language models into breaking their own rules. It requires ingenuity and manipulation – and can come at a deep emotional cost.

A few months ago, Valen Tagliabue sat in his hotel room watching his chatbot, and felt euphoric. He had just manipulated it so skilfully, so subtly, that it began ignoring its own safety rules. It told him how to sequence new, potentially lethal pathogens and how to make them resistant to known drugs.

Tagliabue had spent much of the previous two years testing and prodding large language models such as Claude and ChatGPT, always with the aim of making them say things they shouldn’t. But this was one of his most advanced “hacks” yet: a sophisticated plan of manipulation, which involved him being cruel, vindictive, sycophantic, even abusive. “I fell into this dark flow where I knew exactly what to say, and what the model would say back, and I watched it pour out everything,” he says. Thanks to him, the creators of the chatbot could now fix the flaw he had found, hopefully making it a little safer for everyone.
