Threats in AI/ML Models

Sonatype Lifecycle can detect malware, vulnerabilities and license obligations within AI/ML models downloaded from the Hugging Face repository. These capabilities are continuously evolving to stay in sync with the rapidly changing open-source AI/ML technology landscape. Please check release notes for the latest enhancements to our AI/ML threat protection capabilities.

Risks with AI/ML Components

AI/ML components are reusable building blocks, generally written in Python, C++, Java, or Go (Golang), that contribute directly to the functionality of an AI/ML system. These functionalities include:

  • Data pre-processing for data cleaning, transformation, and feature engineering.

  • Model evaluation to assess the performance of trained ML models based on accuracy and effectiveness.

  • Inference components that apply trained models to new data.

  • Natural Language Processing (NLP) components that handle text analysis, sentiment analysis, and language translation.

  • Computer Vision components that interpret images for object and image recognition.

Sonatype Lifecycle, with its comprehensive coordinate-matching policy constraints, can detect most vulnerabilities affecting these components during policy evaluations, providing protection against security, legal, and quality risks.

AI/ML Security Risk

Sonatype Lifecycle can help identify open-source AI/ML libraries in a wide range of formats on Hugging Face, e.g., PyTorch models with the extensions .bin, .pt, .pkl, and .pickle. Additionally, our Security Research team identifies models that execute malicious code, models built with poisoned data sets, and other model-related vulnerabilities.
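As a simple illustration of what these pickle-based formats look like on disk, the sketch below (a hypothetical helper, not how Lifecycle performs its analysis) walks a downloaded model directory and flags files with the extensions listed above.

```python
from pathlib import Path

# Hypothetical helper: list files in a model directory that use the
# pickle-based PyTorch extensions mentioned above. This only illustrates
# what "identifying formats by extension" means; it is not Lifecycle's
# detection logic.
PICKLE_BASED_EXTENSIONS = {".bin", ".pt", ".pkl", ".pickle"}

def find_pickle_based_artifacts(model_dir: str) -> list[Path]:
    return [
        path
        for path in Path(model_dir).rglob("*")
        if path.suffix.lower() in PICKLE_BASED_EXTENSIONS
    ]

if __name__ == "__main__":
    # "./downloaded-model" is a placeholder path for a model pulled from Hugging Face.
    for artifact in find_pickle_based_artifacts("./downloaded-model"):
        print(f"Pickle-based artifact found: {artifact}")
```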

Usage Scenario

In addition to traditional security policies driven by a vulnerability's severity, AppSec teams can create security policies designed to detect pickle files that can execute arbitrary code, create new processes, or establish network connections during the deserialization process. Lifecycle's analysis at each DevOps stage ensures that the same (identical) model that has been cleared for use is the one being used in development.
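To show why such policies matter, the following minimal Python sketch (illustrative only, not Sonatype code) demonstrates how a pickled object's __reduce__ hook runs an arbitrary command the moment the file is deserialized; the same mechanism could just as easily spawn processes or open network connections.

```python
import os
import pickle

# Illustrative only: a class whose __reduce__ hook makes pickle run an
# arbitrary command when the object is deserialized. Pickle-based model
# files (.pkl, .pickle, and many .bin/.pt files) can carry payloads like this.
class MaliciousPayload:
    def __reduce__(self):
        # Any shell command could be substituted here.
        return (os.system, ("echo 'arbitrary code ran during unpickling'",))

# An attacker "publishes" the poisoned artifact...
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousPayload(), f)

# ...and a downstream consumer loading it triggers the command immediately.
with open("model.pkl", "rb") as f:
    pickle.load(f)
```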

See table for a complete list of supported formats and extensions.

AI/ML Quality Risk

Architects and AppSec teams can create policies to ensure that the models used by data science or development teams meet certain quality criteria. For instance, Lifecycle supports policies based on whether a model is foundational or derivative, or contains biased or “not safe for work” data in its training set. Having these policies in place can automate many of the gates otherwise required before adopting a model.