Data Quality of Sonatype Data

Sonatype products are driven by data that meets a high standard of quality.

How Does Sonatype Provide High-Quality Data?

There are two considerations for data quality: (1) the content of the security advisory and (2) the precision of associating that content with the correct artifact. Automated decisions require extremely precise artifact identification and an equally precise association of security information with the identified artifact. Without accurate identification and association, false-positive rates are high. We recently conducted a study of 6,000 of the most popular Java components and found that the name-based security association algorithms used by every tool other than Sonatype resulted in:

  • 4500 correct non-issue identifications

  • 1034 true positives

  • 5330 false positives, where the CPE identified in the advisory was part of the component name

  • 2969 false negatives, where the CPE identified in the advisory was not in the component name

False positives incur unnecessary research and upgrade costs. False negatives leave you exposed, with no indication that you may be at risk. Sonatype uses a combination of automated identification and human research that eliminates false positives and false negatives. This saves the research time spent disproving false positives and the rework of upgrading when no upgrade is required.
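To put the study's figures in perspective, the precision and recall of the name-based approach can be computed directly from the counts above (a quick illustrative calculation, not part of the original study):

```python
# Figures from the study of 6,000 popular Java components (name-based matching).
true_positives = 1034
false_positives = 5330
false_negatives = 2969
true_negatives = 4500  # correct non-issue identifications

# Precision: of the vulnerabilities reported, how many were real?
precision = true_positives / (true_positives + false_positives)

# Recall: of the real vulnerabilities, how many were reported?
recall = true_positives / (true_positives + false_negatives)

print(f"precision: {precision:.1%}")  # ~16.2%
print(f"recall:    {recall:.1%}")     # ~25.8%
```

In other words, roughly five of every six findings produced by name-based matching in the study were false alarms, and nearly three of every four real vulnerabilities were missed.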

What Data Does Sonatype Provide?

  • The source of the advisory: Sonatype Security Research or the National Vulnerability Database

  • The severity of the issue: the CVSS score, the version of the scoring system used, and the source of the score

  • The Common Weakness Enumeration (CWE)

  • The exact description from the advisory

  • A detailed explanation of the advisory risk and the attack vector (because the advisory description is often very poor)

  • How to determine if you are vulnerable

  • A recommendation on how to fix or work around the issue

  • The root cause of the issue: the exact class and vulnerable version range found in your code

  • Publicly known attack vectors or exploits; additional resources that describe the exact issue
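Taken together, the fields above might form a record like the following. This is a hypothetical sketch for illustration only; the field names and values are assumptions, not Sonatype's actual data schema:

```python
# Hypothetical advisory record illustrating the data fields listed above.
# Field names and values are illustrative, not Sonatype's actual schema.
advisory = {
    "source": "National Vulnerability Database",  # or "Sonatype Security Research"
    "identifier": "CVE-2020-0001",                # placeholder identifier
    "severity": {"score": 9.8, "cvss_version": "3.1", "score_source": "NVD"},
    "cwe": "CWE-502",                             # Common Weakness Enumeration
    "description": "Exact description from the advisory.",
    "explanation": "Detailed explanation of the risk and the attack vector.",
    "detection": "How to determine if you are vulnerable.",
    "recommendation": "How to fix or work around the issue.",
    "root_cause": {"class": "com.example.VulnerableClass", "versions": "[1.0, 2.4)"},
    "references": ["Known exploits and resources describing the exact issue."],
}

print(advisory["severity"]["score"])
```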

How is a Vulnerability Score / Severity Calculated?

Sonatype uses the Common Vulnerability Scoring System (CVSS) to score vulnerabilities.

If a vulnerability identifier is prefixed with SONATYPE, then the vulnerability severity is its CVSS version 3 score.

If a vulnerability identifier is prefixed with CVE, then the vulnerability severity is its CVSS version 3 score; when a version 3 score is not available, the CVSS version 2 score is used instead.

Where are the Source Components?

Component binaries come from popular repositories like Central, NuGet.org, npmjs.org, Fedora EPEL, and PyPI. We also ingest components directly from GitHub and other project download sites when nominated by customers.

Binary repositories provide the ability to extract information like declared licenses, popularity, and release history. Additional component metadata comes from a variety of sources including direct research.

When is Vulnerability Data Available?

Sonatype Data Services are continuously updated, allowing the most recent data to be visible the instant a Nexus Lifecycle analysis occurs. This is true for both newly published components and newly discovered security issues. We have two processing queues for security vulnerabilities to ensure the immediate availability of security data to our customers:

  • Fast-Track: Our automated vulnerability detection systems process various data sources each day. Upon discovery of an issue, a researcher ensures that an appropriate component was identified, a one-line summary exists, and that the vulnerable version range matches any available advisories. The Fast-Track process generally makes newly discovered vulnerabilities available in less than 24 hours, depending on the severity of the issue.

  • Deep Dive: After the Fast-Track process is complete, issues are selected to undergo the Deep Dive process based on our priority queue. During the Deep Dive process, issues undergo source code analysis to ensure there is an accurate vulnerable version range as well as detailed explanations, detections, and recommendations. The Deep Dive process may cause a change to the implicated components, CVSS score, and versions as we validate and correct the data provided from the initial Fast-Track process. Deep Dive generally takes 24 hours but may take up to 3 days for outliers.
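The two-stage pipeline above can be sketched as follows. This is an illustrative process model only; the function and field names are assumptions, not Sonatype's implementation:

```python
# Illustrative sketch of the two-stage research pipeline described above.
# Names and fields are assumptions, not Sonatype's implementation.
from dataclasses import dataclass

@dataclass
class Issue:
    component: str
    summary: str = ""
    version_range: str = ""
    deep_dive_done: bool = False

def fast_track(issue: Issue) -> Issue:
    # A researcher verifies the component, adds a one-line summary, and
    # matches the vulnerable version range to any available advisories.
    issue.summary = issue.summary or "One-line summary from the advisory"
    issue.version_range = issue.version_range or "range from the advisory"
    return issue  # generally available to customers in under 24 hours

def deep_dive(issue: Issue) -> Issue:
    # Source-code analysis refines the vulnerable version range and adds
    # detailed explanations, detections, and recommendations; Fast-Track
    # data (components, CVSS score, versions) may be corrected here.
    issue.deep_dive_done = True
    return issue  # generally 24 hours, up to 3 days for outliers

issue = deep_dive(fast_track(Issue(component="example-component")))
print(issue.deep_dive_done)  # True
```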

There is no "refresh time" or delay between completing research and making the results of that research available to you as a customer. As soon as the research is completed, the results of that research will be available in new Nexus Lifecycle scans.