#Enigma2022: Contextual Security Should Supplement Machine Learning for Malware Detection
Malware continues to be one of the most effective attack vectors in use today, and it is often combatted with machine learning-powered security tools for intrusion detection and prevention systems.
According to Nidhi Rastogi, Assistant Professor at the Rochester Institute of Technology, machine learning security tools are not nearly as effective as they could be because several limitations hinder them. Rastogi presented her views on the limitations of machine learning for security and a potential solution known as contextual security at a session on February 2 at the Enigma 2022 Conference.
A key challenge for contemporary machine learning security comes from false alerts. Rastogi explained that the impact of false alerts is twofold: time wasted by organizations and security gaps that could expose an organization to unnecessary risk.
“It is very difficult to get rid of false positives and false negatives,” Rastogi said.
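To make the two kinds of false alert concrete, the sketch below counts them for a hypothetical detector. The function name `alert_metrics` and the sample labels are illustrative assumptions, not data from Rastogi's talk. Here 1 means malware and 0 means benign.

```python
def alert_metrics(labels, predictions):
    """Count true/false alerts and derive common rates (1 = malware, 0 = benign)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # malware correctly flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # benign wrongly flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # malware missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # benign correctly ignored

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent from the sample.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Hypothetical sample: 5 malicious files and 95 benign ones.
labels = [1] * 5 + [0] * 95
# The imagined detector catches 2 of the 5 malicious files and raises 1 false alarm.
predictions = [1, 1, 0, 0, 0] + [1] + [0] * 94

metrics = alert_metrics(labels, predictions)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample the detector is right on 96 of 100 files, yet it misses three of the five malicious ones. The false positive is the wasted analyst time, and the false negatives are the security gap.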
Why Machine Learning Models Generate False Alerts
Among the primary reasons machine learning models tend to generate false alerts is a lack of sufficient representative data.
Machine learning, by definition, is an approach in which a machine learns to perform a task, usually by training on a data set. If the training data set is not representative of the malware the model will encounter, the model cannot identify all malware accurately.
Rastogi said that one possible way to improve machine learning security models is to integrate a continuous learning model. In that approach, as new attack vectors and vulnerabilities are discovered, the new data is continuously being used to train the machine learning system.
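The continuous learning idea can be sketched with a small online classifier that is updated whenever new labeled samples arrive. This is a minimal illustration under stated assumptions, not the model Rastogi's team uses: the class `OnlineLogisticDetector`, the three binary features and all sample data are hypothetical.

```python
import math


class OnlineLogisticDetector:
    """Tiny logistic regression trained incrementally with gradient steps."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Return the estimated probability that the sample is malware."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, samples, epochs=200):
        """Update the existing weights with a batch of (features, label) pairs."""
        for _ in range(epochs):
            for features, label in samples:
                error = self.predict_proba(features) - label
                self.weights = [
                    w - self.learning_rate * error * x
                    for w, x in zip(self.weights, features)
                ]
                self.bias -= self.learning_rate * error


# Hypothetical features: [uses_packer, writes_to_startup, uses_new_technique]
initial_samples = [
    ([1, 1, 0], 1),  # known malware
    ([0, 0, 0], 0),  # benign
    ([1, 0, 0], 0),  # benign but packed
]
detector = OnlineLogisticDetector(n_features=3)
detector.partial_fit(initial_samples)

# A newly reported attack uses only the technique the model has never seen.
new_attack = [0, 0, 1]
score_before = detector.predict_proba(new_attack)

# Continuous learning step: train on the new sample, replaying earlier data
# so the model does not forget what it already knew.
detector.partial_fit([(new_attack, 1)] + initial_samples)
score_after = detector.predict_proba(new_attack)

print(f"before update: {score_before:.3f}, after update: {score_after:.3f}")
```

Before the update the model has no signal for the third feature, so the new attack scores below the 0.5 threshold. After the incremental update it scores above it. Replaying the earlier samples alongside the new one is a design choice made here to limit forgetting; other update strategies are possible.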
Adding Context to Boost Malware Detection Efficacy
However, getting the right data to train a model is often easier said than done. Rastogi suggests providing additional context as an opportunity to improve malware detection and machine learning models.
The additional context can be derived from third-party and open source threat intelligence (OSINT) sources. Those sources provide threat reports and analysis on new and often novel attacks. The challenge with OSINT is that it is usually in the form of unstructured data, blog posts and other formats that don’t work particularly well to train a machine learning model.
“These reports are written in human-understandable language and provide context which otherwise wouldn’t be possible to capture in code,” Rastogi said.
Using Knowledge Graphs for Contextual Security
So how can unstructured data help to inform machine learning and improve malware detection? Rastogi and her team are attempting to use an approach known as a knowledge graph.
A knowledge graph uses what is known as a graph database, which maps the relationships between different data points. According to Rastogi, the biggest advantage of knowledge graphs is that they make it possible to capture and better understand unstructured information written in human language.
“All of this combined data on a knowledge graph can help to identify or infer attack patterns when a malware threat is evolving,” she said. “That’s the advantage of using knowledge graphs, and that’s what our research is pursuing.”
By adding context and data lineage that help track the source of the data and its trustworthiness, Rastogi said that the overall accuracy of malware detection could be improved.
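As a rough illustration of how a knowledge graph with lineage might support inference, the sketch below stores facts as subject, relation and object triples, each tagged with its source and a trust score. Everything in it is an assumption made for illustration: the class names, the entity names, the sources, the trust values and the rule of multiplying trust along a path are not taken from Rastogi's research.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    source: str   # lineage: where the fact was extracted from
    trust: float  # assumed trustworthiness of that source, between 0 and 1


class ThreatKnowledgeGraph:
    def __init__(self):
        self._outgoing = defaultdict(list)

    def add(self, fact):
        self._outgoing[fact.subject].append(fact)

    def related(self, start, max_hops=2):
        """Map each entity reachable from `start` to (confidence, sources).

        Confidence is the product of trust scores along the best path found,
        which is a simplifying assumption of this sketch.
        """
        best = {}
        frontier = [(start, 1.0, [])]
        for _ in range(max_hops):
            next_frontier = []
            for node, confidence, lineage in frontier:
                for fact in self._outgoing.get(node, []):
                    score = confidence * fact.trust
                    known_score = best.get(fact.obj, (0.0, []))[0]
                    if fact.obj == start or score <= known_score:
                        continue
                    sources = lineage + [fact.source]
                    best[fact.obj] = (score, sources)
                    next_frontier.append((fact.obj, score, sources))
            frontier = next_frontier
        return best


# Hypothetical facts, as if extracted from two threat reports.
graph = ThreatKnowledgeGraph()
graph.add(Fact("ExampleLoader", "drops", "ExampleStealer", "report-A", 0.9))
graph.add(Fact("ExampleStealer", "uses_technique", "credential dumping", "blog-B", 0.5))

inferred = graph.related("ExampleLoader")
for entity, (confidence, sources) in inferred.items():
    print(f"{entity}: confidence {confidence:.2f}, via {sources}")
```

Neither report says that the loader leads to credential dumping, but combining them in the graph lets that link be inferred. The attached sources and confidence score show an analyst where the inference came from and how far to trust it.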
“We need to go beyond measuring the performance of machine learning models using accuracy and precision scores,” Rastogi said. “We want to be able to help analysts by inference with confidence and context.”