Sonatype Malware Data

The ever-increasing number of malicious packages continues to target developers who use open source software, exploiting the low entry barrier and shortcomings of popular ecosystems such as npm and PyPI. Vulnerability management and scanning tools often overlook the threat posed by malicious packages, or malware, because these packages exist in the developer’s space, before the DevOps pipeline begins.

Sonatype's malware detection strategy is capable of identifying hundreds of thousands of malicious packages that infiltrate the open source ecosystem. Organizations can block these malicious packages before they enter the development pipelines.

How Sonatype Identifies Malware

Sonatype malware detection uses proprietary machine learning algorithms that are trained by our security research team to identify malware. We use a human-in-the-loop (HITL) approach to eliminate false positives: researchers verify the accuracy of the algorithm’s predictions and adjust the algorithm when errors occur.

Methodology for Data Collection

We examine a broad set of data sources, including the following:

  • Open source package consumption data and proprietary data, including shadow downloads, which are components downloaded directly through package managers, bypassing repository manager protections.

  • Dependency update patterns for more than 1.5 trillion requests from Maven Central and thousands of open source projects.

  • Assessment of hundreds of thousands of enterprise applications.

  • Malicious packages observed in the Java (Maven Central), JavaScript (npm), Python (PyPI), and .NET (NuGet) ecosystems.

  • Malicious packages blocked by Sonatype Nexus Repository Firewall.

Quarantine Malware

Learn how Sonatype Nexus Repository Firewall can quarantine malicious components and protect your development pipelines from malware.

Malware Detection Process

Our anomaly detection model is an unsupervised machine learning algorithm trained to detect new patterns and outliers in large volumes of open source code. Because the model scales pattern recognition across large data sets and learns continuously, our malware threat intelligence stays up to date with the latest malware signatures and provides proactive protection against emerging threats.

As part of the model training process, we collect multiple signals for each package that can manifest as anomalies. These signals are treated as features for the model and correlated with historical high-impact attacks to generate malware predictions.

Steps:

  1. When an open source component or version is published, our machine learning algorithm analyzes the release and flags it if unusual behavior is detected.

  2. An Integrity Rating of “suspicious” is assigned and the component is temporarily quarantined.

  3. The Sonatype Research Team conducts an in-depth manual analysis of the “suspicious” component to detect malicious behavior.

  4. If a true positive is confirmed, the Integrity Rating is changed to “malicious”.

  5. If it is a false positive, the Integrity Rating is updated to “normal”.
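
The steps above can be read as a small state machine over the Integrity Rating. The following is a minimal illustrative sketch in Python, not Sonatype's implementation: the names Component, flag_release, and apply_review are hypothetical, and two behaviors are assumptions that the steps do not state explicitly, namely that a component has no rating before it is analyzed and that a false positive is released from quarantine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IntegrityRating(Enum):
    # The three rating values named in the steps above.
    NORMAL = "normal"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"


@dataclass
class Component:
    # Hypothetical record for a published component version.
    name: str
    version: str
    rating: Optional[IntegrityRating] = None  # assumption: unrated until analyzed
    quarantined: bool = False


def flag_release(component: Component, anomaly_detected: bool) -> Component:
    """Steps 1-2: mark an anomalous release as suspicious and quarantine it."""
    if anomaly_detected:
        component.rating = IntegrityRating.SUSPICIOUS
        component.quarantined = True
    return component


def apply_review(component: Component, confirmed_malicious: bool) -> Component:
    """Steps 3-5: record the outcome of the manual review."""
    if component.rating is not IntegrityRating.SUSPICIOUS:
        raise ValueError("only suspicious components go through manual review")
    if confirmed_malicious:
        # True positive: the component remains quarantined.
        component.rating = IntegrityRating.MALICIOUS
    else:
        # False positive. Releasing the quarantine here is an assumption.
        component.rating = IntegrityRating.NORMAL
        component.quarantined = False
    return component
```

For example, calling flag_release(Component("example-pkg", "1.0.1"), anomaly_detected=True) yields a quarantined component rated “suspicious”, and a later apply_review(..., confirmed_malicious=False) moves it to “normal”.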

Integrity Rating

Learn more about Sonatype’s Integrity Rating policy.

Anomalies or Signals Indicating Malicious Behavior

Sonatype’s malware detection process looks for behavioral anomalies in a package or project that can indicate malicious behavior. These anomalies are deviations from normal patterns, such as:

  • Sudden changes in the package version numbers

  • Deviation in release cadences

  • Unusual file counts in the package

  • Unidentified code authors

  • Unexplained script/code changes
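
To make one of these signals concrete, the sketch below scores a deviation in release cadence with a plain z-score. This is a deliberately simple stand-in for illustration only: Sonatype's model is proprietary and is not described at this level of detail, so the function names, the choice of a z-score, and the threshold of 3.0 are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def cadence_deviation(past_gaps_days: Sequence[float], new_gap_days: float) -> float:
    """Return how many standard deviations the newest release gap
    lies from the mean of the historical gaps (absolute value)."""
    if len(past_gaps_days) < 2:
        raise ValueError("need at least two historical gaps")
    mu = mean(past_gaps_days)
    sigma = stdev(past_gaps_days)
    if sigma == 0:
        # Perfectly regular history: any change at all is an extreme outlier.
        return 0.0 if new_gap_days == mu else float("inf")
    return abs(new_gap_days - mu) / sigma


def is_cadence_anomaly(past_gaps_days: Sequence[float],
                       new_gap_days: float,
                       threshold: float = 3.0) -> bool:
    """Flag the release when the deviation exceeds the (assumed) threshold."""
    return cadence_deviation(past_gaps_days, new_gap_days) > threshold
```

A project that normally releases about once a month, with gaps of 30, 28, 31, 29, and 32 days, would be flagged if a new version appeared a few hours after the previous one, but not if it appeared after 29 days. A real detector would combine many such features rather than rely on a single one.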

Association with High Impact Attacks

We analyze past high-impact attacks to review the attack vectors (for example, Trojan, Brandjack, and Hijack) associated with each malware attack type. Common malware attack types that our team has reviewed include crypto miner, host information exfiltration, secrets exfiltration, dropper, data corruption, repository abuser, backdoor, obfuscated code, and potentially unwanted application (PUA).

Remediate Malware Threats to Prevent Attacks

Sonatype offers robust malware protection solutions for your open source software supply chain.

Sonatype Repository Firewall: Acts as the first line of defense by blocking malicious components from entering your development pipeline.

Sonatype Lifecycle: Helps detect potentially malicious components early by integrating with your development environment. Continuous monitoring and policy enforcement prevent malicious components from moving further down the development pipeline.

Sonatype Nexus Repository Pro: Offers a Malware Risk Dashboard that provides details of the malware risk in your repositories and steps to protect your organization. It includes information on the number of proxy repositories that are protected by Sonatype Repository Firewall.