New research: “Deception abilities emerged in large language models“:
Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P
More Stories
Crooks Bypassed Google’s Email Verification to Create Workspace Accounts, Access 3rd-Party Services
Google says it recently fixed an authentication weakness that allowed crooks to circumvent the email verification required to create a Google...
Friday Squid Blogging: Sunscreen from Squid Pigments
They’re better for the environment. Blog moderation policy. Read More
Compromising the Secure Boot Process
This isn’t good: On Thursday, researchers from security firm Binarly revealed that Secure Boot is completely compromised on more than...
Synnovis Restores Systems After Cyber-Attack, But Blood Shortages Remain
Synnovis has rebuilt “substantial parts” of its systems following the Qilin ransomware attack on June 3, enabling the restoration of...
Hacktivists Claim Leak of CrowdStrike Threat Intelligence
CrowdStrike has acknowledged the claims by the USDoD hacktivist group, which has provided a link to download the alleged threat...
CrowdStrike Falcon Outage Exploited for Social Engineering
Cyber threat actors are exploiting the CrowdStrike Falcon outage to conduct social engineering attacks. Here's what the CIS CTI team...