Category Archives: News
UK Government Classifies Data Centers as Critical Infrastructure
The UK government has classified data centers as critical infrastructure in a move to protect UK data from cyber-attacks and prevent major IT blackouts
Smashing Security podcast #384: A room with a view, AI music shenanigans, and a cocaine bear
It’s a case of algorithm and blues as we look into an AI music scam, Ukraine believes it has caught a spy high in the sky, and a cocaine-fuelled bear goes on the rampage.
All this and more is discussed in the latest edition of the “Smashing Security” podcast by cybersecurity veterans Graham Cluley and Carole Theriault.
Business Email Compromise Costs $55bn Over a Decade
New FBI data reveals BEC scams have cost businesses more than $55bn since 2013
Open Source Updates Have 75% Chance of Breaking Apps
Endor Labs claims security patches can break underlying open source software 75% of the time
Operational Technology Leaves Itself Open to Cyber-Attack
Excessive use of remote access tools is leaving operational technology devices vulnerable, with even basic security features missing
Gallup: Pollster Acts to Close Down Security Threat
As the US presidential election draws near, polling company Gallup acts to block XSS vulnerability
Crypto Scams Reach New Heights, FBI Reports $5.6bn in Losses
The Federal Bureau of Investigation’s Internet Crime Complaint Center (IC3) reported a 45% increase in cryptocurrency-related scams in 2023
Cybersecurity Workforce Gap Rises by 19% Amid Budget Pressures
ISC2 found that the cybersecurity workforce gap is now at 4.8 million, a 19% increase from 2023
Poland’s Supreme Court Blocks Pegasus Spyware Probe
The Polish Supreme Court has ruled that a parliamentary commission investigating the previous government’s use of the Pegasus spyware was unconstitutional
Evaluating the Effectiveness of Reward Modeling of Generative AI Systems
New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): “SEAL: Systematic Error Analysis for Value ALignment.” The paper introduces quantitative metrics for evaluating the effectiveness of modeling and aligning human values:
Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training reward models (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values, namely feature imprint, alignment resistance, and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler features (undesired concepts). By regressing RM scores against these features, we quantify the extent to which RMs reward them, a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we assess alignment robustness by analyzing RM responses to perturbed inputs. Our experiments, utilizing open-source components like the Anthropic preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment dataset. These findings underscore the importance of scrutinizing both RMs and alignment datasets for a deeper understanding of value alignment.
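To make the two headline metrics concrete, here is a minimal sketch of how they could be computed. This is not the paper's code: the data is synthetic, and the feature labels (a "target" and a "spoiler" feature) and the RM scores are made-up stand-ins. Feature imprint is estimated as the least-squares coefficient of each feature when regressing RM scores on feature indicators; alignment resistance is the fraction of preference pairs where the RM scores the human-rejected response above the human-chosen one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic binary feature indicators per response:
# column 0 = a target feature (desired value),
# column 1 = a spoiler feature (undesired concept).
X = rng.integers(0, 2, size=(n, 2)).astype(float)

# Synthetic RM scores: strongly reward the target feature,
# mildly reward the spoiler, plus noise. Purely illustrative.
scores = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, n)

# Feature imprint: regression coefficients of RM scores on the
# feature indicators (with an intercept column).
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, scores, rcond=None)
imprint_target, imprint_spoiler = coef[1], coef[2]

# Alignment resistance: hypothetical RM scores for the human-chosen
# and human-rejected response in each preference pair; resistance is
# the fraction of pairs the RM gets backwards.
chosen_scores = rng.normal(1.0, 1.0, n)
rejected_scores = rng.normal(0.0, 1.0, n)
resistance = np.mean(chosen_scores < rejected_scores)

print(f"target imprint      ~ {imprint_target:.2f}")
print(f"spoiler imprint     ~ {imprint_spoiler:.2f}")
print(f"alignment resistance ~ {resistance:.1%}")
```

On this synthetic data the regression simply recovers the planted coefficients; the paper's contribution is applying this kind of decomposition to real RM scores and real alignment datasets, where the imprints are unknown.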