New research: “Deception abilities emerged in large language models“:
Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P
More Stories
Cloudflare Introduces E2E Post-Quantum Cryptography Protections
Cloudflare introduces E2E post-quantum cryptography, enhancing security against quantum threats Read More
UK’s Online Safety Act: Ofcom Can Now Issue Sanctions
From March 17, Ofcom will enforce rules requiring tech platforms operating in the UK to remove illegal content, including child...
Researchers Confirm BlackLock as Eldorado Rebrand
DarkAtlas researchers have uncovered a direct link between BlackLock and the Eldorado ransomware group, confirming a rebranded identity of the...
Improvements in Brute Force Attacks
New paper: “GPU Assisted Brute Force Cryptanalysis of GPRS, GSM, RFID, and TETRA: Brute Force Cryptanalysis of KASUMI, SPECK, and...
US Legislators Demand Transparency in Apple’s UK Backdoor Court Fight
A bipartisan delegation of US Congresspeople and Senators has asked the hearing between the UK government and Apple to be...
£1M Lost as UK Social Media and Email Account Hacks Skyrocket
Action Fraud reported a spike in social media and email account hacks in 2024, resulting in losses of nearly £1m...