Threats in AI/ML Models

Sonatype Lifecycle can detect malware, vulnerabilities and license obligations within AI/ML models downloaded from the Hugging Face repository. These capabilities are continuously evolving to stay in sync with the rapidly changing open-source AI/ML technology landscape. Please check release notes for the latest enhancements to our AI/ML threat protection capabilities.

Risks with AI/ML Components

AI/ML components are reusable building blocks, generally written in Python, C++, Java, or Go (Golang), that contribute directly to the functionality of an AI/ML system. These functionalities include:

  • Data pre-processing for data cleaning, transformation, and feature engineering.

  • Model evaluation to assess the performance of trained ML models based on accuracy and effectiveness.

  • Inference components that apply trained models to new data.

  • Natural Language Processing (NLP) components that handle text analysis, sentiment analysis, and language translation.

  • Computer Vision components that interpret images for object detection or image recognition.

Sonatype Lifecycle, with its comprehensive coordinate-matching policy constraints, can detect most vulnerabilities affecting these components during policy evaluations, providing protection against security, legal, and quality risks.

AI/ML Security Risk

Sonatype Lifecycle can help identify open-source AI/ML libraries in a wide range of formats on Hugging Face, e.g. PyTorch models with the extensions .bin, .pt, .pkl, and .pickle. Additionally, our Security Research team identifies models that execute malicious code, models built with poisoned data sets, and other model-related vulnerabilities.
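
As an illustration of the file formats involved (this is not how Lifecycle performs its analysis), the following minimal Python sketch flags pickle-based model artifacts in a locally downloaded Hugging Face snapshot; the snapshot path and the extension list are assumptions made for this example.

```python
from pathlib import Path

# Serialization formats commonly used for PyTorch models on Hugging Face.
# Pickle-based formats (.bin, .pt, .pkl, .pickle) can run arbitrary code on load.
PICKLE_BASED_EXTENSIONS = {".bin", ".pt", ".pkl", ".pickle"}

def find_pickle_based_artifacts(snapshot_dir: str) -> list[Path]:
    """Return files in a local model snapshot that use pickle-based formats."""
    return [
        path
        for path in Path(snapshot_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in PICKLE_BASED_EXTENSIONS
    ]

if __name__ == "__main__":
    # Hypothetical local snapshot path; replace with a real download location.
    for artifact in find_pickle_based_artifacts("./models/bert-base-uncased"):
        print(f"Pickle-based artifact (review before loading): {artifact}")
```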

Usage Scenario

In addition to traditional security policies driven by a vulnerability's severity, AppSec teams can create security policies designed to detect pickle files that can execute arbitrary code, create new processes, or establish network connections during the de-serialization process. Lifecycle's analysis at each DevOps stage ensures that the same (identical) model that has been cleared for use is the one being used for development.
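
To illustrate why pickle files warrant such policies, the sketch below builds a pickle whose de-serialization triggers a harmless print call through the __reduce__ hook; a malicious model file could abuse the same mechanism to run arbitrary commands. This is illustrative only and is not how Lifecycle detects the behavior.

```python
import pickle

class HarmlessPayload:
    """Demonstrates the pickle reduce hook that attackers abuse in model files."""
    def __reduce__(self):
        # At unpickling time, pickle will call print(...) with this argument.
        # A malicious payload could return os.system or subprocess.call instead.
        return (print, ("code executed during de-serialization",))

malicious_bytes = pickle.dumps(HarmlessPayload())

# Simply loading the bytes runs the embedded callable -- no model code is needed.
pickle.loads(malicious_bytes)
```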

See the table for a complete list of supported formats and extensions.

AI/ML Quality Risk

Architects and AppSec teams can create policies to ensure that the models used by data science or development teams meet certain quality criteria. For instance, Lifecycle supports policies on whether a model is a Foundation Model or a derivative, or whether it contains biased or "not safe for work" data in its training set. Having these policies in place can automate many of the gates otherwise required before adopting a model.
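
As a rough illustration of the metadata such quality policies reason about, the sketch below uses the open-source huggingface_hub client to read a model's card data and tags. The repository id is a placeholder, and treating the base_model field and the not-for-all-audiences tag as signals of a derivative or "not safe for work" model is an assumption about how a given model is annotated on the Hub, not a description of Lifecycle's internals.

```python
from huggingface_hub import model_info  # pip install huggingface_hub

def summarize_model_metadata(repo_id: str) -> dict:
    """Collect card metadata that quality policies might key on (illustrative only)."""
    info = model_info(repo_id)
    card = info.card_data.to_dict() if info.card_data else {}
    return {
        "repo_id": repo_id,
        # Present when the author declares the model as fine-tuned from another model.
        "base_model": card.get("base_model"),
        # Hub tag commonly used to mark "not safe for work" content.
        "not_for_all_audiences": "not-for-all-audiences" in (info.tags or []),
        "license": card.get("license"),
    }

if __name__ == "__main__":
    # Placeholder repository id; substitute a real model repository.
    print(summarize_model_metadata("org/some-fine-tuned-model"))
```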

AI/ML Derivative Model Detection

Data scientists and data engineers often need to retrain an existing open-source model to enhance its accuracy or correct "drift" in its performance. They can tweak a Foundation Model by retraining or fine-tuning it on different data sets, or by adjusting its weights, features, and hyper-parameters.

An AI/ML derivative model is a tweaked version derived from a Foundation Model.

Such derivative AI/ML models are also hosted in Hugging Face repositories. However, derivative models may not carry the exact same license restrictions and obligations as the Foundation Model, and they may not be proven or vetted for the organization's business case.

The "Derivative AI Model" policy condition allows users to set a policy to check the lineage of the AI model being used in the application. The policy evaluation will report a violation if a derivative model is used. The report will include the name of corresponding Foundation Model.

The "Derivative AI Model" condition can also be used to establish provenance by ensuring that the exact same AI model that has been approved for use is actually being used in the application. This will ensure that other derived AI models which may be compromised or do not satisfy the accuracy and relevance requirements of the organization's needs, do not enter the development pipelines.