Hugging Face Repositories

Hugging Face is a machine learning platform and community best known for its extensive library of pre-trained transformer models for various tasks, such as natural language processing, computer vision, and audio analysis. The platform provides tools for building, training, and deploying machine learning models and datasets, and spaces to share and collaborate with others.

Nexus Repository 3 supports proxying Hugging Face models, which lets you store and distribute them within your organization for better efficiency and control.

Think of Hugging Face as a platform with different tools for machine learning:

Models
Models are the core of Hugging Face: pre-trained brains ready to perform tasks. They're like recipes that have already been perfected.
Datasets
These are the ingredients for the recipes. They're collections of data used to train and evaluate the models.
Spaces
Spaces are like interactive demos or apps built with Hugging Face models. You can use them to test out models to see how they work.

The proxy support is for Hugging Face models only. Datasets and spaces are not yet supported, and models related to those may fail to download.

Configuring a Hugging Face Proxy Repository

Follow these steps to set up a Hugging Face proxy repository in Nexus Repository 3:

Select Create Repository and choose the huggingface (proxy) recipe from the list.
Provide the proxy name and the remote storage URL:
```
https://huggingface.co
```
Select an appropriate blob store. Note that we recommend you use an NFS/EFS/Azure file storage for optimal performance. In testing, AWS S3 and Azure Blob stores exhibit performance issues that can impact download speeds.
Configure authentication when the Hugging Face repository requires it, for example to download gated Hugging Face models:
1. Under HTTP, select Authentication and choose Preemptive Bearer Token.
2. Enter the bearer token provided by Hugging Face.
Set the connection timeout in HTTP request settings:
```
connection timeout = 120
```
Adjust the timeout values to accommodate potential network slowness or large model sizes.
Optionally, disable strict content validation.
See Content Type Mismatch Errors under Troubleshooting below.
Save the repository configuration.

Use File Storage for Hugging Face Repositories

Use an NFS/EFS/Azure file storage for optimal performance. In testing, AWS S3 and Azure Blob stores exhibit performance issues that can impact download speeds.

Client Configuration

To configure your clients, set the following environment variables:

HF_ENDPOINT
Override the default Hugging Face endpoint to point to your Nexus Repository Hugging Face proxy repository URL.
```
export HF_ENDPOINT="http://localhost:8081/repository/<repository_name>"
```
When authentication is required on Nexus Repository, include the token in the HF_ENDPOINT as follows:
```
export HF_ENDPOINT="http://<token>:<password>@localhost:8081/repository/<repository_name>"
```
We recommend using user tokens to avoid including personal credentials in your local environment variable.
HF_HUB_DOWNLOAD_TIMEOUT
Increase the download timeout to prevent errors when downloading large models. The recommended value is 120 seconds.
```
export HF_HUB_DOWNLOAD_TIMEOUT=120
```
HF_HUB_ETAG_TIMEOUT
Increase the ETag timeout. The recommended value is 900 seconds.
```
export HF_HUB_ETAG_TIMEOUT=1800
```
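When the HF_ENDPOINT value is assembled in a script rather than typed by hand, credentials that contain characters such as `@`, `:`, or `/` need percent-encoding, or the URL will not parse as intended. The following is a minimal sketch of that step, using only the Python standard library. The function name `build_hf_endpoint` and the repository name `hf-proxy` are illustrative assumptions, not part of Nexus Repository or the Hugging Face client.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_hf_endpoint(base_url, repository, token_name=None, token_pass=None):
    """Return a Nexus Repository proxy URL suitable for HF_ENDPOINT.

    Hypothetical helper: credentials, when given, are percent-encoded so that
    characters such as '@', ':' or '/' cannot break the URL structure.
    """
    parts = urlsplit(base_url)
    netloc = parts.netloc
    if token_name is not None and token_pass is not None:
        userinfo = f"{quote(token_name, safe='')}:{quote(token_pass, safe='')}"
        netloc = f"{userinfo}@{netloc}"
    # Nexus Repository serves repositories under /repository/<repository_name>.
    path = f"{parts.path.rstrip('/')}/repository/{quote(repository, safe='')}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


if __name__ == "__main__":
    import os

    # Set the variable before importing huggingface_hub so the client picks it up.
    os.environ["HF_ENDPOINT"] = build_hf_endpoint("http://localhost:8081", "hf-proxy")
    print(os.environ["HF_ENDPOINT"])
```

Setting the variable from Python only affects the current process. Use the `export` form shown above when the value must apply to a whole shell session.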
Timeout on Large Models
You may experience timeout errors on the client side for larger models. Either retry the download after a couple of minutes, which gives Nexus Repository time to finish proxying the model, or increase the timeout values accordingly.
The first download of a new model through Nexus Repository is slow because the model must be cached first.
The client does not receive the model files until they are fully cached in Nexus Repository.
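The retry advice above can be automated with a small wrapper. This is a sketch under stated assumptions: `download_with_retry` is a hypothetical helper, and the attempt count and two-minute wait are illustrative defaults rather than values recommended by Nexus Repository. It catches every exception for brevity; in practice you would probably narrow this to the timeout errors your client raises.

```python
import time


def download_with_retry(download, attempts=3, wait_seconds=120, sleep=time.sleep):
    """Call download() until it succeeds or the attempts run out.

    Hypothetical helper: waiting between attempts gives the proxy time to
    finish caching the model before the client asks again.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except Exception:
            # Re-raise the last failure so the caller sees the real error.
            if attempt == attempts:
                raise
            sleep(wait_seconds)


# Example usage (requires huggingface_hub and a configured HF_ENDPOINT):
# from huggingface_hub import snapshot_download
# path = download_with_retry(
#     lambda: snapshot_download(repo_id="microsoft/OmniParser", repo_type="model")
# )
```

The `sleep` parameter exists only so the wait can be replaced in tests; normal callers can leave it at its default.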

Passing Credentials in the Authorization Header

The configuration above passes the Nexus Repository credentials in the URL string of the HF_ENDPOINT variable. Alternatively, you may base64-encode your user token and send it in the Authorization header instead.

Consider the following example:

```
import base64
import os
from huggingface_hub import snapshot_download

base64_token = base64.b64encode("<token_user>:<token_pass>".encode('utf-8')).decode('utf-8')
headers = {"Authorization": f"Basic {base64_token}"}
model_path = snapshot_download(repo_id="repo_id", repo_type="model", endpoint=os.environ["HF_ENDPOINT"], headers=headers)
```
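To keep the user token out of source code, the header can be built from environment variables instead of string literals. The sketch below assumes two variable names, `NEXUS_TOKEN_NAME` and `NEXUS_TOKEN_PASS`, which are hypothetical and not defined by Nexus Repository or the Hugging Face client.

```python
import base64
import os


def basic_auth_headers(env=os.environ):
    """Build a Basic Authorization header from environment variables.

    NEXUS_TOKEN_NAME and NEXUS_TOKEN_PASS are hypothetical variable names
    chosen for this example.
    """
    name = env.get("NEXUS_TOKEN_NAME")
    password = env.get("NEXUS_TOKEN_PASS")
    if not name or not password:
        raise RuntimeError("Set NEXUS_TOKEN_NAME and NEXUS_TOKEN_PASS first")
    encoded = base64.b64encode(f"{name}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Example usage (requires huggingface_hub and a configured HF_ENDPOINT):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="repo_id", repo_type="model",
#                   endpoint=os.environ["HF_ENDPOINT"],
#                   headers=basic_auth_headers())
```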

Client Timeout Settings

When you use Nexus Repository as a proxy, files are downloaded to the repository before they are returned to the requesting client.

Large models take time to download, especially during the initial caching process. Clients may time out when using the default timeout configuration. To prevent timeout errors, configure the following environment variables used by the client:

```
export HF_HUB_DOWNLOAD_TIMEOUT=120
export HF_HUB_ETAG_TIMEOUT=1800
```

Hugging Face may be slow and cause Nexus Repository to trigger timeout errors. As mentioned in the proxy configuration, we recommend setting the proxy HTTP request timeout to 120 seconds.

As an alternative to environment variables, you may configure the client timeout settings directly in the snapshot_download request:

```
from huggingface_hub import snapshot_download
snapshot_download(repo_id="microsoft/OmniParser", repo_type="model", local_dir="/Users/Documents/hugging", etag_timeout=900)
```

To set the timeout for transformers and diffusers, register a custom HTTP session with the client:

```
from diffusers import StableDiffusionPipeline
from huggingface_hub import configure_http_backend
from requests.adapters import HTTPAdapter
from requests import Session
from urllib3.util.retry import Retry

# Timeout in seconds applied to every request sent through the session.
timeout = 900
session = Session()
# Retry transient server errors with exponential backoff.
retries = Retry(total=5, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])

class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTP adapter that forces a fixed timeout on every request."""

    def __init__(self, *args, **kwargs):
        self.timeout = kwargs.pop("timeout", timeout)
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Override any per-request timeout with the adapter's value.
        kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)

adapter = TimeoutHTTPAdapter(max_retries=retries, timeout=timeout)
session.mount("https://", adapter)
session.mount("http://", adapter)

def get_custom_session():
    return session

# Register the session factory before loading any model.
configure_http_backend(get_custom_session)
pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
```

Model Versioning

Hugging Face uses Git commit hashes for versioning. Nexus Repository reflects these commit hashes when storing different model versions.

Example of requesting a model by its Git commit hash:

```
from huggingface_hub import snapshot_download
snapshot_download(repo_id="keras-io/deeplabv3p-resnet50", revision="9de024eda886161906e18b30cf84d8bcd8a0664c")
```

In Nexus Repository, you will see the same Hugging Face commit hashes after downloading more than one version of the same model.

Nexus Repository stores all files for each version, as it does not determine file-level changes between versions.
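To make the role of the commit hash concrete, the sketch below builds the kind of file URL a client requests for a pinned revision. It assumes the Hugging Face Hub's `<endpoint>/<repo_id>/resolve/<revision>/<filename>` URL layout and that the client simply substitutes the HF_ENDPOINT value for the Hub address. Both helper names, the repository name `hf-proxy`, and the file name `config.json` are illustrative, and the sketch makes no claim about how Nexus Repository lays out components internally.

```python
import re
from urllib.parse import quote

# A full Git commit hash is 40 lowercase hexadecimal characters.
_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_commit_hash(revision):
    """Return True when revision is a full commit hash rather than a branch or tag."""
    return _COMMIT_HASH.fullmatch(revision) is not None


def resolve_url(endpoint, repo_id, revision, filename):
    """Build a Hub-style download URL for one file at one revision (hypothetical helper)."""
    base = endpoint.rstrip("/")
    return f"{base}/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


if __name__ == "__main__":
    revision = "9de024eda886161906e18b30cf84d8bcd8a0664c"
    print(is_commit_hash(revision))
    print(resolve_url("http://localhost:8081/repository/hf-proxy",
                      "keras-io/deeplabv3p-resnet50", revision, "config.json"))
```

Pinning a full commit hash instead of a branch name such as `main` keeps repeated downloads on the same cached version.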

Troubleshooting

Content Type Mismatch Errors
For some Hugging Face models, image files in the repository have a content type that does not match their file type. This is a known Hugging Face issue. If you encounter 400 'Bad request' errors, disable strict content validation in your Nexus Repository configuration.
See the following example:
```
404 Detected content type [image/png], but expected [image/jpeg]
```
Slow Download Speeds
If you experience slow downloads from Hugging Face, increase the Nexus Repository request timeout as described above.
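To illustrate what a content type mismatch means, the sketch below compares the type implied by a file name with the type implied by the first bytes of the file. It is only an illustration of the idea and does not reproduce how Nexus Repository performs strict content validation. The function names and the two-entry signature table are assumptions made for this example.

```python
import mimetypes

# Leading bytes ("magic numbers") for the two image formats in the example error.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def detect_content_type(data):
    """Guess the content type from the leading bytes, or None if unknown."""
    for signature, content_type in _SIGNATURES.items():
        if data.startswith(signature):
            return content_type
    return None


def check_content_type(filename, data):
    """Return (expected, detected, matches) for a file name and its content."""
    expected, _ = mimetypes.guess_type(filename)
    detected = detect_content_type(data)
    # Unknown content is treated as a match because nothing contradicts the name.
    return expected, detected, detected is None or expected == detected


if __name__ == "__main__":
    png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
    # A PNG image stored under a .jpg name, as in the error message above.
    print(check_content_type("preview.jpg", png_bytes))
```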

Performance Metrics

Performance testing was conducted for Nexus Repository using a Hugging Face proxy repository. The objective was to evaluate system behavior under load, specifically regarding hardware utilization, response times, and throughput when handling large assets.

Deploying additional hardware resources may be necessary for environments expecting high concurrency due to the large asset sizes.
Configure HTTP timeouts appropriately to accommodate extended download durations for large models.

Test Environment

The tests ran on a two-node HA cluster (r6i.xlarge instances) with an RDS cluster (db.t3.medium with one writer and one read replica).

Simulation Tool: Gatling-based simulation.
Dataset: 45 Hugging Face models ranging in size from megabytes to tens of gigabytes.
Load Conditions: Tests were conducted with 16 and 32 concurrent virtual users, simulating asset downloads from the Hugging Face hub via Nexus Repository.

Key Findings

Due to the prolonged download times, configuring adequate HTTP connection timeout settings for Hugging Face proxies is recommended. Nexus Repository outperformed connecting directly to the Hugging Face Hub for large assets.

System Stability: No significant performance degradation or service interruptions occurred. CPU usage peaked at 60%, while memory utilization remained stable at 11.2%.
Throughput and Response Times: The format's characteristics resulted in a low request rate (~2 requests per second), as large file downloads extended response times to 20–30 minutes.
Concurrency Impact: Tests with 16 and 32 virtual users showed similar response times, with higher concurrency primarily affecting the total number of processed requests.
