#Enigma2022: Contextual Security Should Supplement Machine Learning for Malware Detection
Malware remains one of the most effective attack vectors in use today, and organizations often counter it with machine learning-powered security tools such as intrusion detection and prevention systems.
According to Nidhi Rastogi, Assistant Professor at the Rochester Institute of Technology, machine learning security tools are not nearly as effective as they could be, because several limitations hinder them. Rastogi presented her views on these limitations, and on a potential solution known as contextual security, at a session on February 2 at the Enigma 2022 Conference.
A key challenge for contemporary machine learning security is false alerts. Rastogi explained that their impact is twofold: false positives waste organizations' time, and false negatives leave security gaps that can expose an organization to unnecessary risk.
“It is very difficult to get rid of false positives and false negatives,” Rastogi said.
Why Machine Learning Models Generate False Alerts
One of the primary reasons machine learning models generate false alerts is a lack of sufficiently representative training data.
Machine learning, by definition, is an approach in which a machine learns to perform a task, typically by training on a data set. If the training data set is incomplete or unrepresentative, the resulting model cannot identify all malware accurately.
Rastogi said that one possible way to improve machine learning security models is continuous learning: as new attack vectors and vulnerabilities are discovered, the new data is continuously fed back to train the system.
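As a rough sketch of continuous learning, the example below folds each newly labeled sample into an online classifier instead of retraining from scratch. The perceptron model, the feature names, and the sample stream are all hypothetical choices made for brevity; the talk did not specify a model.

```python
from dataclasses import dataclass, field


@dataclass
class OnlinePerceptron:
    """Hypothetical online malware classifier that learns one sample at a time."""
    learning_rate: float = 1.0
    weights: dict[str, float] = field(default_factory=dict)
    bias: float = 0.0

    def score(self, features: dict[str, float]) -> float:
        # Linear score: bias plus the sum of weight * feature value.
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in features.items())

    def predict(self, features: dict[str, float]) -> int:
        # 1 = malicious, 0 = benign.
        return 1 if self.score(features) > 0 else 0

    def update(self, features: dict[str, float], label: int) -> None:
        # Perceptron rule: change weights only when the prediction is wrong.
        error = label - self.predict(features)
        if error == 0:
            return
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.learning_rate * error * v
        self.bias += self.learning_rate * error


# Hypothetical stream of labeled samples arriving over time, for example as
# new attack reports are confirmed. Feature names are illustrative only.
stream = [
    ({"writes_to_startup": 1.0, "encrypts_files": 1.0}, 1),
    ({"signed_binary": 1.0}, 0),
    ({"encrypts_files": 1.0, "deletes_shadow_copies": 1.0}, 1),
    ({"signed_binary": 1.0, "writes_to_startup": 1.0}, 0),
]

model = OnlinePerceptron()
for features, label in stream:
    model.update(features, label)

print(model.predict({"encrypts_files": 1.0}))  # 1
print(model.predict({"signed_binary": 1.0}))   # 0
```

The design choice is incremental: `update` adjusts only the weights of features present in the new sample, so weights for features absent from that sample are left untouched.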
Adding Context to Boost Malware Detection Efficacy
However, getting the right data to train a model is often easier said than done. Rastogi suggests that adding context is an opportunity to improve both malware detection and the machine learning models behind it.
This context can come from third-party threat intelligence and open source intelligence (OSINT) sources, which publish threat reports and analysis of new and often novel attacks. The challenge is that OSINT is usually unstructured: blog posts, reports, and other formats that are poorly suited to training a machine learning model.
“These reports are written in human-understandable language and provide context which otherwise wouldn’t be possible to capture in code,” Rastogi said.
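Turning that prose into something a model can consume requires an extraction step. As one minimal sketch, assuming plain regular expressions (a common starting point, not the method Rastogi's team described), the example below pulls CVE identifiers, IPv4 addresses, and SHA-256 hashes out of an invented report. It captures only literal indicators, not the surrounding narrative context the quote refers to.

```python
import re

# Hypothetical, deliberately simple patterns. The IPv4 pattern does not check
# that each octet is <= 255, and real reports often "defang" indicators
# (e.g. 203.0.113[.]7), which these patterns would miss.
INDICATOR_PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def extract_indicators(report: str) -> dict[str, list[str]]:
    """Return unique matches per indicator type, in order of first appearance."""
    found: dict[str, list[str]] = {}
    for name, pattern in INDICATOR_PATTERNS.items():
        # dict.fromkeys de-duplicates while preserving order.
        found[name] = list(dict.fromkeys(pattern.findall(report)))
    return found


# Invented report text; 203.0.113.0/24 is a documentation-only address range.
report = (
    "The loader exploits CVE-2021-44228 and beacons to 203.0.113.7. "
    "Its second stage also contacts 203.0.113.7 and drops a payload with SHA-256 "
    + "ab" * 32
    + "."
)
print(extract_indicators(report))
```

The result is a dictionary of lists per indicator type, which could then become graph nodes or model features.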
Using Knowledge Graphs for Contextual Security
So how can unstructured data help to inform machine learning and improve malware detection? Rastogi and her team are attempting to use an approach known as a knowledge graph.
A knowledge graph is built on a graph database, which maps the relationships between different data points. According to Rastogi, the biggest advantage of knowledge graphs is that they make it possible to capture, and better understand, unstructured information written in human language.
“All of this combined data on a knowledge graph can help to identify or infer attack patterns when a malware threat is evolving,” she said. “That’s the advantage of using knowledge graphs, and that’s what our research is pursuing.”
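A knowledge graph is often modeled as a set of subject-predicate-object triples. The minimal in-memory sketch below stands in for a real graph database and is not the schema Rastogi's team uses; the family names, the `uses` relation, and the technique labels (styled after MITRE ATT&CK IDs) are hypothetical. It infers which known malware families a newly observed sample most resembles by counting shared techniques, a very simple form of the pattern inference described in the quote.

```python
from collections import Counter, defaultdict


class KnowledgeGraph:
    """Tiny in-memory triple store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # Reverse index: (predicate, object) -> subjects, for neighbor lookups.
        self.by_predicate_object: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))
        self.by_predicate_object[(predicate, obj)].add(subject)

    def objects(self, subject: str, predicate: str) -> set[str]:
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def similar_subjects(self, subject: str, predicate: str) -> list[tuple[str, int]]:
        """Rank other subjects by how many objects they share with `subject`."""
        shared = Counter()
        for obj in self.objects(subject, predicate):
            for other in self.by_predicate_object[(predicate, obj)]:
                if other != subject:
                    shared[other] += 1
        return shared.most_common()


kg = KnowledgeGraph()
# Hypothetical facts, as might be extracted from threat reports.
kg.add("FamilyA", "uses", "T1486_data_encryption")
kg.add("FamilyA", "uses", "T1490_inhibit_recovery")
kg.add("FamilyB", "uses", "T1566_phishing")
kg.add("FamilyB", "uses", "T1486_data_encryption")
# A newly observed, unlabeled sample.
kg.add("sample_42", "uses", "T1486_data_encryption")
kg.add("sample_42", "uses", "T1490_inhibit_recovery")

print(kg.similar_subjects("sample_42", "uses"))  # [('FamilyA', 2), ('FamilyB', 1)]
```

Here `sample_42` shares two techniques with `FamilyA` and one with `FamilyB`, so `FamilyA` ranks first. A production system would use a graph database and far richer relations; inferring links through shared neighbors is the core idea this sketch illustrates.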
Rastogi said that adding context, along with data lineage that tracks where each piece of data came from and how trustworthy it is, could improve the overall accuracy of malware detection.
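Lineage can be sketched by recording a source alongside every fact and weighting each source by how far it is trusted. In the hypothetical example below, the source names, trust scores, and the noisy-OR combination rule (which assumes sources are independent) are illustrative assumptions, not details from the talk. A link corroborated by a trusted vendor report and a community blog ends up with higher confidence than one backed only by an anonymous paste.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # lineage: where this claim came from


# Hypothetical trust scores per source type, each in [0, 1].
SOURCE_TRUST = {
    "vendor_report": 0.9,
    "community_blog": 0.5,
    "anonymous_paste": 0.2,
}


def confidence(facts: list[Fact], subject: str, predicate: str, obj: str) -> float:
    """Noisy-OR: chance that at least one supporting source is right, assuming
    sources are independent. Unknown sources contribute zero trust."""
    # A set ensures repeated reports from the same source count only once.
    sources = {
        f.source
        for f in facts
        if (f.subject, f.predicate, f.obj) == (subject, predicate, obj)
    }
    return 1 - prod(1 - SOURCE_TRUST.get(s, 0.0) for s in sources)


facts = [
    Fact("FamilyA", "uses", "T1486_data_encryption", "vendor_report"),
    Fact("FamilyA", "uses", "T1486_data_encryption", "community_blog"),
    Fact("FamilyC", "uses", "T1566_phishing", "anonymous_paste"),
]
print(round(confidence(facts, "FamilyA", "uses", "T1486_data_encryption"), 2))  # 0.95
print(round(confidence(facts, "FamilyC", "uses", "T1566_phishing"), 2))  # 0.2
```

A fact with no supporting source gets confidence 0, because the product over an empty set is 1.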
“We need to go beyond measuring the performance of machine learning models using accuracy and precision scores,” Rastogi said. “We want to be able to help analysts by inference with confidence and context.”
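The point about looking past headline scores is easier to see with numbers. Using invented counts for a hypothetical detector evaluated on mostly benign samples, the sketch below shows how accuracy can look excellent while the detector still misses malware and raises many false alerts for analysts to triage. The metric definitions are standard; the counts are not from the talk.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics; assumes no denominator is zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),            # share of alerts that are real malware
        "recall": tp / (tp + fn),               # share of malware that was caught
        "false_positive_rate": fp / (fp + tn),  # benign samples wrongly flagged
        "false_negative_rate": fn / (fn + tp),  # malware that slipped through
    }


# Invented counts: 10,000 samples, of which only 100 are malicious.
metrics = detection_metrics(tp=80, fp=150, tn=9750, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.983, yet precision is only about 0.35 (roughly two of every three alerts are false) and one in five malicious samples goes undetected.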