Analysis

Whether you are researching new components or tracking open source in production, you can scan your applications throughout the software development lifecycle (SDLC). To get the most value from those scans, start with a solid understanding of how Sonatype evaluations work and make sure you are using the best analysis method for your environment.

Evaluations start by identifying components during the application scan using the following methods:

  • Recursively scan all target files using a process called Advanced Binary Fingerprinting (ABF)
  • Observe the dependencies declared in the package-lock and manifest files

Advanced Binary Fingerprinting (ABF)

With ABF scanning, we examine binary fingerprints (similar to a truncated SHA-1 hash) of all of the files, not just the file names and manifests. ABF is highly accurate because it examines everything included in the application after the build, including any embedded dependencies. Sonatype data is tied to the fingerprints of the files where a vulnerability was discovered, so a vulnerability is reported only when the matching component fingerprint is present in your application. This is why an ABF scan will not return false positives in its report. Depending on the environment, there are a few points to keep in mind when interpreting ABF results:

  1. The build may include development dependencies and clutter, such as testing frameworks, left in the source control repository. Correct this by rescoping the scan to only the artifacts that are deployed to production.
  2. Embedded dependencies may be renamed within the application. Name matching and manifest scanning cannot detect them, but ABF will.
  3. Vulnerable files may be reused in other open source components with completely different names. This secondary expansion of discovered vulnerabilities is unique to Sonatype evaluations.
  4. ABF can be used to track inner-source components and apply policy to them.
  5. For Java applications, the ABF scanner uses an additional, unique “partial matching” technology that can identify when a component is “similar” but not “identical” to the cataloged version. Development teams sometimes modify open source to “fix” vulnerabilities within the project. When those changes are not documented, they introduce technical debt and potential risk to the application. Modifying open source components can also create license risk for components with weak copyleft licenses.
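
To make the fingerprinting idea concrete, here is a minimal Python sketch. It is not Sonatype's ABF algorithm: it assumes a fingerprint is simply a truncated SHA-1 digest of a file's contents, and the function names (fingerprint, scan_tree, demo), the 16-character length, and the file names are all hypothetical. It illustrates why a renamed embedded dependency still matches: the fingerprint depends on the bytes in the file, not on its name.

```python
import hashlib
import os
import tempfile


def fingerprint(path, length=16):
    """Return a truncated SHA-1 hex digest of the file's contents.

    Assumption: a stand-in for a binary fingerprint, not the real ABF format.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def scan_tree(root):
    """Walk root recursively and map each relative file path to its fingerprint."""
    results = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(directory, name)
            results[os.path.relpath(full, root)] = fingerprint(full)
    return results


def demo():
    """Write the same bytes under two names and fingerprint both files."""
    with tempfile.TemporaryDirectory() as root:
        lib = os.path.join(root, "lib")
        os.makedirs(lib)
        payload = b"example component bytes"
        # Hypothetical file names: an original component and a renamed copy.
        for name in ("commons-example-1.0.jar", "renamed.jar"):
            with open(os.path.join(lib, name), "wb") as handle:
                handle.write(payload)
        return scan_tree(root)


if __name__ == "__main__":
    for path, value in sorted(demo().items()):
        print(path, value)
```

Both files print the same fingerprint value, which is the property that name matching and manifest scanning cannot provide.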

Manifest Evaluation

Scans performed before the application has been built do not have open source dependencies to fingerprint and must therefore rely on lock files (such as package-lock) and manifest files. This happens when directly analyzing source control repositories or a software bill of materials earlier in the development lifecycle. The Lifecycle scanners use the lock or manifest files to get an idea of what should be in the final application. The results include transitive dependencies if they are listed in the lock file. Otherwise, the scan reports only on what is requested in the manifest.
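
The difference between the two file types can be sketched in a few lines of Python. This is an illustration only, not how the Lifecycle scanners parse files: the MANIFEST and LOCK values are made-up, simplified npm-style documents, and the helper names are hypothetical. The manifest yields only the requested dependency and its version range, while the lock file yields resolved versions, including a transitive dependency.

```python
import json

# Hypothetical, simplified npm-style manifest: only what the project requests.
MANIFEST = json.loads('{"dependencies": {"express": "^4.18.0"}}')

# Hypothetical, simplified npm-style lock file: resolved versions, including
# a transitive dependency (body-parser) that the manifest never mentions.
LOCK = json.loads("""
{
  "packages": {
    "": {"name": "demo-app"},
    "node_modules/express": {"version": "4.18.2"},
    "node_modules/body-parser": {"version": "1.20.1"}
  }
}
""")


def components_from_manifest(manifest):
    """Return (name, requested version range) pairs declared in the manifest."""
    return sorted(manifest.get("dependencies", {}).items())


def components_from_lock(lock):
    """Return (name, resolved version) pairs for every locked package."""
    components = []
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # the empty key describes the root project itself
        name = path.split("node_modules/")[-1]
        components.append((name, meta["version"]))
    return sorted(components)


if __name__ == "__main__":
    print("manifest:", components_from_manifest(MANIFEST))
    print("lock:    ", components_from_lock(LOCK))
```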

The Sonatype scanners default to ABF scans because they are accurate to what is in your application. They then look for lock files, followed by manifest files. Earlier in the development and design process, before anything has been built, manifest scanning is the only option, and it provides the earliest possible feedback during component selection. As a best practice, we recommend that development teams include lock files and use fixed versions of components in manifest files. This ensures repeatable builds and the most accurate feedback for your production application.
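
The order of preference described above, and the fixed-version recommendation, can be sketched as follows. This is a hedged illustration rather than the scanners' actual logic: the file-name sets are small hypothetical samples, choose_method and is_fixed_version are invented names, and the version check assumes npm-style three-part version numbers.

```python
import os
import re

# Hypothetical samples for illustration; real scanners recognize many more formats.
BINARY_EXTENSIONS = {".jar", ".war", ".ear", ".dll"}
LOCK_FILES = {"package-lock.json", "yarn.lock", "Gemfile.lock"}
MANIFEST_FILES = {"package.json", "pom.xml", "requirements.txt"}


def choose_method(file_names):
    """Pick the most accurate method available: ABF, then lock file, then manifest."""
    names = [os.path.basename(name) for name in file_names]
    if any(os.path.splitext(name)[1].lower() in BINARY_EXTENSIONS for name in names):
        return "abf"
    if any(name in LOCK_FILES for name in names):
        return "lock"
    if any(name in MANIFEST_FILES for name in names):
        return "manifest"
    return "none"


def is_fixed_version(spec):
    """Return True for an exact three-part version such as 4.18.2.

    Assumption: npm-style versions, where ranges like ^4.18.0 or 1.2 are not fixed.
    """
    return re.fullmatch(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?", spec.strip()) is not None


if __name__ == "__main__":
    print(choose_method(["target/app.war", "package.json"]))
    print(choose_method(["package.json", "package-lock.json"]))
    print(is_fixed_version("^4.18.0"))
```

A check like is_fixed_version could run in a pre-commit hook to flag version ranges in a manifest before they reach a scan.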

IQ Server Data Analysis

The Nexus IQ Server uses data derived from our automated vulnerability detection system. The system funnels many sources (NVD, GitHub commits, OSS Index, Sonatype research, etc.) through automated techniques such as data filtering, aggregation, and machine learning algorithms.

Some supported ecosystems have security, license, and identity data, while others have security-only data. License data includes OSS licenses identified in the package manifest and, for Java, any licenses observed within the package itself. Identity refers to component details such as recommendations, version graphs, or cataloged data pulled from the package manager repository. We categorize these ecosystems as having either Premium or Standard data capabilities.

Premium Capabilities

For ecosystems with security, license, and identity data, Sonatype researchers triage incoming data and determine whether there is a vulnerability, creating a research ticket for further investigation when necessary. Tickets are prioritized and then entered into our human-curated research process. When research is complete, the results go into our data mart, which feeds Sonatype Data Services. Data from Sonatype Data Services is what you’ll then see in the IQ Server Dashboard and Application Composition Report after an application scan.

Standard Capabilities

For ecosystems with security-only data, we use an analysis that identifies only those components that have a security vulnerability. This analysis does not include in-depth research or license and identity data. Although you will not find license and identity information, you will still gain visibility, Lifecycle Dashboard access, and security policy information via the Application Composition Report.

Language                      Data                        ABF/Manifest
Java (Maven)                  Security/License/Identity   Both
Javascript (NPM)              Security/License/Identity   Both
.Net (Nuget)                  Security/License/Identity   Both
Ruby (RubyGems)               Security/License/Identity   Both
Go (Go Modules)               Security/License/Identity   Manifest
Python                        Security/License/Identity   Both
RPM (Yum, Fedora EPEL Repo)   Security/License/Identity   Both
C++ (Conan)                   Security/Identity           Manifest
PHP (Composer)                Security                    Manifest
Objective-C (CocoaPods)       Security/Identity           Manifest
Conda                         Security                    Manifest
R (CRAN)                      Security                    Manifest
Rust (Cargo)                  Security                    Manifest
Drupal                        Security                    Manifest
Debian                        Security                    Manifest
Alpine                        Security                    Manifest
Swift                         Security/Identity           Manifest

For more information, please see our guide on Understanding Sonatype Vulnerability Data.

The examples throughout this section use the IQ Server CLI for scanning. For additional scanning methods and more detailed information, please see our Comprehensive Guide to Lifecycle Scanning.