Security researchers from Ohio State University, Leidos and FireEye wanted to know whether the opinions users express online about the severity of a newly discovered threat line up with the expert judgments that eventually emerge about that threat.
In a paper, "Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media," they looked at whether or not natural language processing techniques could be used to analyze users' opinions about the severity of software vulnerabilities that were reported online.
Additionally, they wanted to know if those opinions matched later expert evaluations of how severe the problem was.
As new vulnerabilities are discovered and verified, they are assigned CVE numbers (which are unique identifiers) and entered into the National Vulnerability Database (NVD). However, one report found a median delay of seven days between the time a vulnerability is first reported online and the time it is published in the NVD. The vulnerabilities are also, at a later date, given severity scores using the Common Vulnerability Scoring System (CVSS).
So, the researchers wondered if those first online descriptions could be used as a predictor of severity.
The short answer is yes, they can. The study took 6,000 tweets that were annotated with opinions toward threat severity, and empirically demonstrated that this dataset would support automatic classification.
The study had a lot of human intervention associated with it. For example, the researchers paid crowd workers on Amazon's Mechanical Turk to annotate their dataset. First, the workers determined whether or not a tweet described a cybersecurity threat toward a target entity. Second, they determined whether the author of the tweet believed the threat was severe.
Two baseline detectors were used to detect reports of cyberthreats as well as analyze opinions about their severity: logistic regression using bag-of-ngram features and 1D convolutional neural networks.
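To make the first of those baselines concrete, here is a minimal sketch of logistic regression over bag-of-ngram features, written in plain Python. It is an illustration only, not the researchers' code: the whitespace tokenizer, the choice of unigrams plus bigrams, the plain gradient-descent training loop, and the toy tweets and labels are all assumptions made up for this example.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return a bag (Counter) of word 1..n_max-grams for one tweet."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, bag):
    """Probability that a tweet (given as an n-gram bag) is labeled 1."""
    z = bias + sum(weights.get(feat, 0.0) * count for feat, count in bag.items())
    return sigmoid(z)


def train(examples, epochs=200, lr=0.1):
    """Fit logistic regression with simple stochastic gradient descent."""
    weights, bias = {}, 0.0
    bags = [(ngrams(text), label) for text, label in examples]
    for _ in range(epochs):
        for bag, label in bags:
            error = predict_proba(weights, bias, bag) - label
            bias -= lr * error
            for feat, count in bag.items():
                weights[feat] = weights.get(feat, 0.0) - lr * error * count
    return weights, bias


# Invented toy data, NOT taken from the study's 6,000 annotated tweets.
# Label 1 means "the author appears to consider the threat severe".
TOY_TWEETS = [
    ("critical remote code execution flaw patch now", 1),
    ("severe bug lets attackers take over servers", 1),
    ("minor ui glitch fixed in latest update", 0),
    ("low risk issue needs local access", 0),
]

weights, bias = train(TOY_TWEETS)
p_severe = predict_proba(weights, bias, ngrams("critical flaw lets attackers take over"))
p_minor = predict_proba(weights, bias, ngrams("minor issue fixed in update"))
print(f"severe-sounding tweet: {p_severe:.2f}, minor-sounding tweet: {p_minor:.2f}")
```

On a dataset this small the model simply memorizes which words go with which label; the point is only to show how n-gram counts feed a linear classifier.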
They found that the logistic regression baseline had good performance at identifying threats.
But for evaluating the level of the threat, they found that "the convolutional neural network consistently achieves higher precision at the same level of recall as compared to logistic regression."
They wanted to predict threat severity so that they could come up with a sorted list of CVEs with those that are indicated to be severe threats at the top.
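A sorted list of that kind can be sketched in a few lines. The example below assumes each tweet has already been linked to a CVE and given a severity score between 0 and 1, and it averages the scores per CVE before sorting. The averaging step, the function name `rank_cves` and the placeholder identifiers are assumptions for illustration; the article does not say how the researchers combined scores from multiple tweets.

```python
from collections import defaultdict


def rank_cves(scored_tweets):
    """Rank CVE ids by mean predicted severity, most severe first.

    scored_tweets: iterable of (cve_id, severity_score) pairs,
    one pair per tweet that mentions the CVE.
    """
    scores_by_cve = defaultdict(list)
    for cve_id, score in scored_tweets:
        scores_by_cve[cve_id].append(score)
    mean_score = {cve: sum(s) / len(s) for cve, s in scores_by_cve.items()}
    # Sort by descending mean score; break ties by id for a stable order.
    return sorted(mean_score, key=lambda cve: (-mean_score[cve], cve))


# Placeholder ids and made-up scores, not real CVEs or real predictions.
SCORED = [
    ("CVE-A", 0.9),
    ("CVE-B", 0.2),
    ("CVE-A", 0.7),
    ("CVE-C", 0.6),
    ("CVE-B", 0.4),
]

ranking = rank_cves(SCORED)
print(ranking)
```

Other aggregation choices, such as taking the maximum score per CVE, would fit the same structure by changing the one line that computes `mean_score`.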
Analysis of the reliability of individual Twitter accounts was also performed.
They summarized the outcome by saying that they had a method for "linking software vulnerabilities reported in tweets to Common Vulnerabilities and Exposures (CVEs) in the National Vulnerability Database (NVD). Using our predicted severity scores, we show that it is possible to achieve a Precision@50 of 0.86 when forecasting high severity vulnerabilities, significantly outperforming a baseline that is based on tweet volume. Finally we showed how reports of severe vulnerabilities online are predictive of real-world exploits."
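The score quoted above appears to be a precision-at-k figure, meaning the fraction of the top k items in a ranked list that turn out to be truly severe. A minimal sketch of that calculation follows, using the standard definition; the function name, the placeholder ids and the toy ground truth are assumptions for illustration, not the researchers' evaluation code.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked ids that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


# Toy ranking and ground truth, invented for this example.
ranked = ["a", "b", "c", "d", "e"]   # predicted most severe first
truly_severe = {"a", "c", "e"}       # e.g. items later scored as high severity

score = precision_at_k(ranked, truly_severe, k=4)
print(score)
```

Read this way, a value of 0.86 at k = 50 would mean that 43 of the 50 top-ranked vulnerabilities were in fact high severity.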
So, it worked. They found that if someone tweets about a problem they call severe, the vulnerability is likely to end up being rated as severe once it is scored in the NVD.
This is something the security community already knew, but it's always good to have confirmation.
— Larry Loeb has written for many of the last century's major "dead tree" computer magazines, having been, among other things, a consulting editor for BYTE magazine and senior editor for the launch of WebWeek.