Data Quality of Sonatype Data

Sonatype products are driven by data that meets a high standard of quality.

How Does Sonatype Provide High-Quality Data?

There are two considerations for data quality: (1) the content of the security advisory and (2) the precision of associating that content with the correct artifact. Automated decisions require extremely precise artifact identification and an equally precise association of security information with the identified artifact. Without accurate identification and association, false-positive rates are high. We recently conducted a study of 6,000 of the most popular Java components and found that the name-based security association algorithms used by every tool other than Sonatype resulted in:

  • 4500 correct non-issue identifications

  • 1034 true positives

  • 5330 false positives, where the CPE identified in the advisory was part of the component name

  • 2969 false negatives, where the CPE identified in the advisory was not in the component name

False positives incur unnecessary research and upgrade costs. False negatives leave you exposed, with no indication that you may be at risk. Sonatype uses a combination of automated identification and human research that eliminates false positives and false negatives. This saves the research time spent disproving false positives and the rework of upgrading when no upgrade is required.
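To put the study's figures in perspective, the precision and recall of the name-based approach can be computed directly from the counts above (a quick illustrative calculation, not part of the original study):

```python
# Figures from the study of 6,000 popular Java components (name-based matching).
true_positives = 1034
false_positives = 5330
false_negatives = 2969
true_negatives = 4500  # correct non-issue identifications

# Precision: of the vulnerabilities reported, how many were real?
precision = true_positives / (true_positives + false_positives)

# Recall: of the real vulnerabilities, how many were reported?
recall = true_positives / (true_positives + false_negatives)

print(f"precision: {precision:.1%}")  # ~16.2%
print(f"recall:    {recall:.1%}")     # ~25.8%
```

In other words, roughly five of every six findings produced by name-based matching in the study were false alarms, and nearly three of every four real vulnerabilities were missed.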

What Data Does Sonatype Provide?

  • The source of the advisory: Sonatype Security Research or the National Vulnerability Database

  • The severity of the issue: the CVSS score, the version of the scoring system used, and the source of the score

  • The Common Weakness Enumeration (CWE)

  • The exact description from the advisory

  • A detailed explanation of the advisory risk and the attack vector (because the advisory description is often very poor)

  • How to determine if you are vulnerable

  • A recommendation on how to fix or work around the issue

  • The root cause of the issue: the exact class and vulnerable version range found in your code

  • Publicly known attack vectors or exploits; additional resources that describe the exact issue
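Taken together, the fields above might form a record like the following. This is a hypothetical sketch for illustration only; the field names and values are assumptions, not Sonatype's actual data schema:

```python
# Hypothetical advisory record illustrating the data fields listed above.
# Field names and values are illustrative, not Sonatype's actual schema.
advisory = {
    "source": "National Vulnerability Database",  # or "Sonatype Security Research"
    "identifier": "CVE-2020-0001",                # placeholder identifier
    "severity": {"score": 9.8, "cvss_version": "3.1", "score_source": "NVD"},
    "cwe": "CWE-502",                             # Common Weakness Enumeration
    "description": "Exact description from the advisory.",
    "explanation": "Detailed explanation of the risk and the attack vector.",
    "detection": "How to determine if you are vulnerable.",
    "recommendation": "How to fix or work around the issue.",
    "root_cause": {"class": "com.example.VulnerableClass", "versions": "[1.0, 2.4)"},
    "references": ["Known exploits and resources describing the exact issue."],
}

print(advisory["severity"]["score"])
```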

How is a Vulnerability Score / Severity Calculated?

Sonatype uses the Common Vulnerability Scoring System (CVSS) to score vulnerabilities.

If a vulnerability identifier is prefixed with SONATYPE, then the vulnerability severity is its CVSS version 3 score.

If a vulnerability identifier is prefixed with CVE, then the vulnerability severity is its CVSS version 3 score; when a version 3 score is not available, the CVSS version 2 score is used instead.

Where are the Source Components?

Component binaries come from popular repositories like Central, NuGet.org, npmjs.org, Fedora EPEL, and PyPI. We also ingest components directly from GitHub and other project download sites when nominated by customers.

Binary repositories provide the ability to extract information like declared licenses, popularity, and release history. Additional component metadata comes from a variety of sources including direct research.

When is Vulnerability Data Available?

Sonatype Data Services are continuously updated, allowing the most recent data to be visible the instant a Nexus Lifecycle analysis occurs. This is true for both newly published components and newly discovered security issues. We have two processing queues for security vulnerabilities to ensure the immediate availability of security data to our customers:

  • Fast-Track: Our automated vulnerability detection systems process various data sources each day. Upon discovery of an issue, a researcher ensures that an appropriate component was identified, a one-line summary exists, and that the vulnerable version range matches any available advisories. The Fast-Track process generally makes newly discovered vulnerabilities available in less than 24 hours, depending on the severity of the issue.

  • Deep Dive: After the Fast-Track process is complete, issues are selected to undergo the Deep Dive process based on our priority queue. During the Deep Dive process, issues undergo source code analysis to ensure there is an accurate vulnerable version range as well as detailed explanations, detections, and recommendations. The Deep Dive process may cause a change to the implicated components, CVSS score, and versions as we validate and correct the data provided from the initial Fast-Track process. Deep Dive generally takes 24 hours but may take up to 3 days for outliers.
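The two-stage pipeline above can be sketched as follows. This is an illustrative process model only; the function and field names are assumptions, not Sonatype's implementation:

```python
# Illustrative sketch of the two-stage research pipeline described above.
# Names and fields are assumptions, not Sonatype's implementation.
from dataclasses import dataclass

@dataclass
class Issue:
    component: str
    summary: str = ""
    version_range: str = ""
    deep_dive_done: bool = False

def fast_track(issue: Issue) -> Issue:
    # A researcher verifies the component, adds a one-line summary, and
    # matches the vulnerable version range to any available advisories.
    issue.summary = issue.summary or "One-line summary from the advisory"
    issue.version_range = issue.version_range or "range from the advisory"
    return issue  # generally available to customers in under 24 hours

def deep_dive(issue: Issue) -> Issue:
    # Source-code analysis refines the vulnerable version range and adds
    # detailed explanations, detections, and recommendations; Fast-Track
    # data (components, CVSS score, versions) may be corrected here.
    issue.deep_dive_done = True
    return issue  # generally 24 hours, up to 3 days for outliers

issue = deep_dive(fast_track(Issue(component="example-component")))
print(issue.deep_dive_done)  # True
```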

There is no "refresh time" or delay between completing research and making the results of that research available to you as a customer. As soon as the research is completed, the results of that research will be available in new Nexus Lifecycle scans.