This is really interesting research from a few months ago:
Abstract: Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning has clear benefits, and at the same time raises serious concerns of trust. This work studies possible abuses of power by untrusted learners. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate “backdoor key,” the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees.
First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given query access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Moreover, even if the distinguisher can request backdoored inputs of its choice, it cannot produce a backdoored input of its own, a property we call non-replicability.
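To make the idea concrete, here is a minimal toy sketch of the signature-based mechanism: the backdoored classifier behaves identically to the clean one unless a few input coordinates encode a valid tag on the rest of the input. All names are hypothetical, and an HMAC stands in for a true public-key signature scheme so the example is self-contained (this makes the key symmetric, which is weaker than the paper's construction and loses the non-replicability guarantee).

```python
import hmac
import hashlib
import numpy as np

# Hypothetical illustration only, not the paper's actual construction.
# An HMAC stands in for a digital signature scheme.

BACKDOOR_KEY = b"attacker-held secret key"  # the "backdoor key"
SIG_BITS = 32  # trailing input coordinates that carry the tag

def clean_model(x):
    """Stand-in for the honestly trained classifier (any model works here)."""
    return int(np.sum(x[:-SIG_BITS]) > 0)

def tag_of(x):
    """MAC over the 'content' part of the input, truncated to SIG_BITS bits."""
    digest = hmac.new(BACKDOOR_KEY, x[:-SIG_BITS].tobytes(), hashlib.sha256).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:SIG_BITS]

def backdoored_model(x):
    """Identical to clean_model unless the embedded tag verifies."""
    if np.array_equal((x[-SIG_BITS:] > 0.5).astype(np.uint8), tag_of(x)):
        return 1 - clean_model(x)  # attacker flips the label at will
    return clean_model(x)

def activate_backdoor(x):
    """Perturb only the SIG_BITS tag coordinates so the tag verifies."""
    x2 = x.copy()
    x2[-SIG_BITS:] = tag_of(x).astype(float)
    return x2

rng = np.random.default_rng(0)
x = rng.normal(size=128)
# On ordinary inputs the two models agree; a random input carries a
# valid tag only with probability 2**-SIG_BITS.
assert backdoored_model(x) == clean_model(x)
# With the key, a slight perturbation flips the classification.
assert backdoored_model(activate_backdoor(x)) != clean_model(x)
```

Without the key, forging an input that triggers the flip requires forging a valid tag, which is what makes black-box detection computationally infeasible.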
Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm (Rahimi, Recht; NeurIPS 2007). In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is “clean” or contains a backdoor. The backdooring algorithm executes the RFF algorithm faithfully on the given training data, tampering only with its random coins. We prove this strong guarantee under the hardness of the Continuous Learning With Errors problem (Bruna, Regev, Song, Tang; STOC 2021). We show a similar white-box undetectable backdoor for random ReLU networks based on the hardness of Sparse PCA (Berthet, Rigollet; COLT 2013).
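For context, the Random Fourier Features pipeline the second construction targets is simple: sample random weights, map inputs through cosines, and fit a linear model on the features. The paper's backdoor executes exactly this algorithm and tampers only with the random sampling step, substituting CLWE-structured coins for true randomness. The sketch below shows only the honest RFF feature map (Rahimi and Recht's approximation of the Gaussian kernel); the backdoored sampling itself is not reproduced here.

```python
import numpy as np

# Honest Random Fourier Features (Rahimi & Recht, 2007): phi(x) is designed
# so that phi(x) . phi(y) approximates the Gaussian kernel exp(-||x-y||^2 / 2).
# The paper's malicious learner runs this same algorithm but replaces the
# random draws of W and b with cryptographically structured "coins."

def rff_features(X, W, b):
    # phi(x) = sqrt(2/D) * cos(W x + b)
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
d, D = 5, 500                             # input dim, number of random features
W = rng.normal(size=(D, d))               # an honest learner samples these
b = rng.uniform(0, 2 * np.pi, size=D)     # coins; a malicious one does not

X = rng.normal(size=(20, d))
Phi = rff_features(X, W, b)

# The feature inner products track the true Gaussian kernel.
K_approx = Phi @ Phi.T
K_true = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

Because the backdooring algorithm runs RFF faithfully on the honest training data, the resulting model is statistically well-formed, which is why undetectability can hold even against white-box distinguishers who see the full network and training set.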
Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, by constructing an undetectable backdoor for an “adversarially-robust” learning algorithm, we can produce a classifier that is indistinguishable from a robust classifier, but where every input has an adversarial example! In this way, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness.
Turns out that securing ML systems is really hard.