New research: “Deception abilities emerged in large language models“:
Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P
More Stories
ClickFix: How to Infect Your PC in Three Easy Steps
A clever malware deployment scheme first spotted in targeted attacks last year has now gone mainstream. In this scam, dubbed...
Friday Squid Blogging: SQUID Band
A bagpipe and drum band: SQUID transforms traditional Bagpipe and Drum Band entertainment into a multi-sensory rush of excitement, featuring...
Upcoming Speaking Engagements
This is a current list of where and when I am scheduled to speak: I’m speaking at the Rossfest Symposium...
LockBit Ransomware Developer Extradited to US
US authorities have extradited Rostislav Panev on charges of being a developer of the notorious LockBit ransomware Read More
TP-Link Router Botnet
There is a new botnet that is infecting TP-Link routers: The botnet can lead to command injection which then makes...
Fraudsters Impersonate Clop Ransomware to Extort Businesses
Barracuda observed threat actors impersonating the Clop ransomware group via email to extort payments, claiming to have exfiltrated sensitive data...