Threats in AI/ML Models
Sonatype Lifecycle can detect malware, vulnerabilities, and license obligations within AI/ML models downloaded from the Hugging Face repository. These capabilities are continuously evolving to stay in sync with the rapidly changing open-source AI/ML technology landscape. Please check the release notes for the latest enhancements to our AI/ML threat protection capabilities.
Risks with AI/ML Components
AI/ML components are reusable building blocks, generally written in Python, C++, Java, or Go (Golang), that contribute directly to the functionality of an AI/ML system. These functionalities include:
Data pre-processing for data cleaning, transformation, and feature engineering.
Model evaluation to assess the performance of trained ML models based on accuracy and effectiveness.
Inference components that apply trained models to new data.
Natural Language Processing (NLP) components that handle text analysis, sentiment analysis, and language translation (see the sketch after this list).
Computer Vision components that interpret images for object or image recognition.
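For illustration, a minimal NLP inference component might look like the sketch below. It assumes the open-source Hugging Face transformers library is installed; the model name is just an example of the kind of component Lifecycle evaluates, not a recommendation.

```python
# Minimal sketch of an NLP inference component
# (assumes `pip install transformers torch`).
from transformers import pipeline

# Pulling a model from Hugging Face like this is exactly the kind of
# open-source component intake that Lifecycle policies can evaluate.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example model
)

print(classifier("This release passed every policy gate."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```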
Sonatype Lifecycle, with its comprehensive coordinate-matching policy constraints, can detect most vulnerabilities affecting these components during policy evaluations, providing protection against security, legal, and quality risks.
AI/ML Security Risk
Sonatype Lifecycle can help identify open-source AI/ML libraries in a wide range of formats on Hugging Face, e.g., PyTorch models with the extensions .bin, .pt, .pkl, and .pickle. Additionally, our Security Research team identifies models that execute malicious code, models built with poisoned data sets, and other model-related vulnerabilities.
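To see why these pickle-based formats are singled out, consider the sketch below. It is a generic illustration of the well-known pickle deserialization hazard, not Sonatype's detection logic; the payload is a harmless `echo` standing in for arbitrary attacker code.

```python
import os
import pickle

class MaliciousPayload:
    # __reduce__ lets any pickled object tell the unpickler to call an
    # arbitrary function at load time; here it is a harmless echo, but it
    # could be any shell command, new process, or network connection.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code ran during unpickling",))

blob = pickle.dumps(MaliciousPayload())

# Merely *loading* the "model" runs the attacker's code; no method on the
# resulting object ever needs to be called.
pickle.loads(blob)
```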
Usage Scenario
In addition to having traditional security policies driven by a vulnerability's severity, AppSec teams can create security policies designed to detect pickle files that can execute arbitrary code, create new processes, or establish networking capabilities during the de-serialization process, as the sketch below illustrates. Lifecycle's analysis at each DevOps stage ensures that the same (identical) model that was cleared for use is the one used in development.
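As a rough sketch of both ideas (not Lifecycle's actual implementation), the Python below statically flags pickle opcodes that can import or invoke callables during de-serialization, and fingerprints the file so the cleared artifact can later be matched byte-for-byte. The opcode list and the model.pkl path are illustrative assumptions.

```python
import hashlib
import pickletools

# Opcodes that can import or invoke callables while unpickling;
# an illustrative subset, not an exhaustive rule set.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_pickle(path):
    """Walk the pickle opcode stream WITHOUT deserializing anything."""
    with open(path, "rb") as f:
        return {op.name for op, _arg, _pos in pickletools.genops(f)
                if op.name in SUSPICIOUS_OPCODES}

def fingerprint(path):
    """SHA-256 digest used to confirm the identical cleared model is in use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

flagged = scan_pickle("model.pkl")   # hypothetical model file
if flagged:
    print("flagged opcodes:", sorted(flagged))
print("sha256:", fingerprint("model.pkl"))
```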
See the table for a complete list of supported formats and extensions.
AI/ML Legal Risk
Sonatype Lifecycle and Advanced Legal Pack (ALP) can detect and manage license attributions and obligations by providing detailed information on effective, declared, and observed licenses for open-source AI/ML models on Hugging Face. Additionally, Lifecycle and ALP provide legal coverage for obligations that are specific to AI/ML models like “restriction on use of output to train other models” or “restriction on use of output for commercial purposes”.
Usage Scenario
Open source governance teams can ensure, through license policies, that AI models meet organizational guidelines and are acceptable for use.
AI/ML Quality Risk
Architects and AppSec teams can create policies to ensure that the models used by data science or development teams meet certain quality criteria. For instance, Lifecycle supports policies on whether a model is foundational or derivative, or contains biased or “not safe for work” data in its training set. Having these policies in place can automate many of the gates otherwise required before adopting a model.