Indirect Instruction Injection in Multi-Modal LLMs

Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”:

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
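
To make the idea of a perturbation that "steers" a model concrete, here is a minimal toy sketch. It is not the authors' method and does not touch LLaVa or PandaGPT. It assumes a made-up linear "model" that scores four output tokens from 64 pixel values, and it uses bounded signed-gradient steps, a common technique from the adversarial-examples literature, to nudge the image toward a chosen token. Every name here (`token_probs`, `perturb_toward`, `epsilon`, the random weights) is hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_probs(weights, pixels):
    """Toy stand-in for a model: one linear score per token, then softmax."""
    return softmax([sum(w * x for w, x in zip(row, pixels)) for row in weights])


def perturb_toward(weights, pixels, target, epsilon=0.1, step=0.01, iters=100):
    """Nudge pixels so the toy model favors `target`, changing each pixel
    by at most `epsilon` (an assumed budget, not a value from the paper)."""
    adv = list(pixels)
    for _ in range(iters):
        probs = token_probs(weights, adv)
        for j in range(len(adv)):
            # Gradient of the cross-entropy loss for `target` w.r.t. pixel j.
            grad = sum(
                (probs[k] - (1.0 if k == target else 0.0)) * weights[k][j]
                for k in range(len(weights))
            )
            sign = (grad > 0) - (grad < 0)
            moved = adv[j] - step * sign
            # Keep the change within the budget and within valid pixel range.
            moved = min(max(moved, pixels[j] - epsilon), pixels[j] + epsilon)
            adv[j] = min(max(moved, 0.0), 1.0)
    return adv


# Hypothetical data: random weights and a random "image".
rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(4)]
image = [rng.uniform(0.2, 0.8) for _ in range(64)]

start = token_probs(weights, image)
target = start.index(min(start))  # the token the clean image favors least
adv_image = perturb_toward(weights, image, target)

before = start[target]
after = token_probs(weights, adv_image)[target]
print(f"target token probability: {before:.3f} -> {after:.3f}")
```

The point of the sketch is only that a small, bounded change to the input can shift what a model outputs. The attack in the paper works against real multi-modal models and makes the subsequent dialog follow an instruction, which this toy does not attempt.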
