#Enigma2022: Contextual Security Should Supplement Machine Learning for Malware Detection
Malware continues to be one of the most effective attack vectors in use today, and it is often combatted with machine learning-powered security tools for intrusion detection and prevention systems.
According to Nidhi Rastogi, Assistant Professor at the Rochester Institute of Technology, machine learning security tools are not nearly as effective as they could be, as several different limitations often hinder them. Rastogi presented her views on the limitations of machine learning for security and a potential solution known as contextual security at a session on February 2 at the Engima 2022 Conference.
A key challenge for contemporary machine learning security comes from false alerts. Rastogi explained the impact of false alerts is both wasted time by organizations and security gaps that could potentially expose an organization to unnecessary risk.
“It is very difficult to get rid of false positives and false negatives,” Rastogi said.
Why Machine Learning Models Generate False Alerts
Among the primary reasons machine learning models tend to generate false alerts is a lack of sufficient representative data.
Machine learning, by definition, is an approach where a machine learns how to do something that is often enabled by some form of training on a data set. If the training data set doesn’t have all the correct data, it cannot identify all malware accurately.
Rastogi said that one possible way to improve machine learning security models is to integrate a continuous learning model. In that approach, as new attack vectors and vulnerabilities are discovered, the new data is continuously being used to train the machine learning system.
Adding Context to Boost Malware Detection Efficacy
However, getting the right data to train a model is often easier said than done. Rastogi suggests providing additional context as an opportunity to improve malware detection and machine learning models.
The additional context can be derived from third-party and open source threat intelligence (OSINT) sources. Those sources provide threat reports and analysis on new and often novel attacks. The challenge with OSINT is that it is usually in the form of unstructured data, blog posts and other formats that don’t work particularly well to train a machine learning model.
“These reports are written in human-understandable language and provide context which otherwise wouldn’t be possible to capture in code,” Rastogi said.
Using Knowledge Graphs for Contextual Security
So how can unstructured data help to inform machine learning and improve malware detection? Rastogi and her team are attempting to use an approach known as a knowledge graph.
A knowledge graph uses what is known as a graph database, which maps the relationship between different data points. According to Rastogi, the biggest advantage of using knowledge graphs is that it enables an approach to capture and better understand unstructured information written in a language understood by humans.
“All of this combined data on a knowledge graph can help to identify or infer attack patterns when a malware threat is evolving,” she said. “That’s the advantage of using knowledge graphs, and that’s what our research is pursuing.”
By adding context and data lineage that help track the source of the data and its trustworthiness, Rastogi said that the overall accuracy of malware detection could be improved.
“We need to go beyond measuring the performance of machine learning models using accuracy and precision scores,” Rastogi said. “We want to be able to help analysts by inference with confidence and context.”
More Stories
Friday Squid Blogging: Protecting Cephalopods in Medical Research
From Nature: Cephalopods such as octopuses and squid could soon receive the same legal protection as mice and monkeys do...
Russian Company Offers $20M For Non-NATO Mobile Exploits
Operation Zero will pay $20m for exploits like RCE, LPE and SBX, integral to a full-chain attack Read More
Microsoft’s Bing AI Faces Malware Threat From Deceptive Ads
Malwarebytes said the goal of these tactics is to lure victims into downloading malicious software Read More
Phishing, Smishing Surge Targets US Postal Service
The surge in these attacks has prompted DomainTools to delve into their origins and implications Read More
Three men found guilty of laundering $2.5 million in Target gift card tech support scam
Three Californian residents have been convicted of laundering millions of dollars tricked out of older adults who had fallen victim...
ZeroFont trick makes users think that message has been scanned for threats
Attackers are using the "ZeroFont" technique to manipulate the preview of a message to suggest it had already been scanned...