Hugging Face Repositories
Hugging Face is a machine learning platform and community best known for its extensive library of pre-trained transformer models for various tasks, such as natural language processing, computer vision, and audio analysis. The platform provides tools for building, training, and deploying machine learning models and datasets, and spaces to share and collaborate with others.
Nexus Repository 3 supports proxying Hugging Face models, allowing you to store and distribute them within your organization for better efficiency and control.
Think of Hugging Face as a platform with different tools for machine learning:
Models
Models are the core of Hugging Face, pre-trained brains ready to perform tasks. They're like recipes that have already been perfected.
Datasets
These are the ingredients for the recipes. They're collections of data used to train and evaluate the models.
Spaces
Spaces are like interactive demos or apps built with Hugging Face models. You can use them to test out models to see how they work.
The proxy support is for Hugging Face models only. Datasets and spaces are not yet supported, and models related to those may fail to download.
Configuring a Hugging Face Proxy Repository
Follow these steps to set up a Hugging Face proxy repository in Nexus Repository 3:
Select Create Repository and choose hf (proxy) from the list.
Provide the proxy name and the remote storage URL: https://huggingface.co
Configure authentication when the Hugging Face repository requires it. This is needed to download gated Hugging Face models.
Select Authentication under HTTP, then choose Preemptive Bearer Token.
Enter the bearer token provided by Hugging Face.
Set the connection timeout in HTTP request settings:
connection timeout = 120
Adjust the timeout values to accommodate potential network slowness or large model sizes.
Optional: disable strict content validation. See Content Type Mismatch Errors under Troubleshooting below.
Save the repository configuration.
Use File Storage for Hugging Face Repositories
Use NFS, EFS, or Azure file storage for optimal performance. In testing, AWS S3 and Azure Blob stores exhibited performance issues that can reduce download speeds.
Client Configuration
To configure your clients, set the following environment variables:
HF_ENDPOINT
Override the default Hugging Face endpoint to point to your Nexus Repository Hugging Face proxy repository URL.
export HF_ENDPOINT="http://localhost:8081/repository/<repository_name>"
When authentication is required on Nexus Repository, include the token in HF_ENDPOINT as follows:
export HF_ENDPOINT="http://<token>:<password>@localhost:8081/repository/<repository_name>"
We recommend using user tokens to avoid including personal credentials in your local environment variable.
HF_HUB_DOWNLOAD_TIMEOUT
Increase the download timeout to prevent errors when downloading large models. The recommended value is 120 seconds.
export HF_HUB_DOWNLOAD_TIMEOUT=120
HF_HUB_ETAG_TIMEOUT
Increase the ETag timeout. The recommended value is 900 seconds; the example below sets a higher value of 1800 seconds.
export HF_HUB_ETAG_TIMEOUT=1800
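The same three variables can also be set from Python. The following is a minimal sketch, not an official client API: the helper name nexus_hf_env, the repository name hf-proxy, and the credentials are hypothetical. It percent-encodes the user token so that characters such as @ or / do not break the URL, and it assumes the variables are set before huggingface_hub is imported, because the library reads them at import time.

```python
import os
from urllib.parse import quote, urlsplit, urlunsplit


def nexus_hf_env(repo_url, token_name=None, token_pass=None,
                 download_timeout=120, etag_timeout=1800):
    """Return the client environment variables described above as a dict."""
    parts = urlsplit(repo_url)
    netloc = parts.netloc
    if token_name is not None and token_pass is not None:
        # Percent-encode every reserved character in the credentials.
        creds = f"{quote(token_name, safe='')}:{quote(token_pass, safe='')}"
        netloc = f"{creds}@{netloc}"
    # Drop any trailing slash so the client can append its own paths.
    endpoint = urlunsplit((parts.scheme, netloc, parts.path.rstrip("/"), "", ""))
    return {
        "HF_ENDPOINT": endpoint,
        "HF_HUB_DOWNLOAD_TIMEOUT": str(download_timeout),
        "HF_HUB_ETAG_TIMEOUT": str(etag_timeout),
    }


# Hypothetical repository name and user token; replace with your own values.
env = nexus_hf_env("http://localhost:8081/repository/hf-proxy",
                   token_name="user-token", token_pass="p@ss/word")
os.environ.update(env)  # do this before importing huggingface_hub
```

Setting the values in one place keeps the endpoint and both timeouts consistent across scripts that download through the proxy.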
Timeout on Large Models
You may experience timeout errors on the client side for larger models. Retry the download after a couple of minutes to allow Nexus Repository time to finish proxying the model or increase the timeout configuration value accordingly.
The first download of a new model through Nexus Repository is slow because the model must be cached first.
The model files will not be retrieved by the client until they are fully cached in Nexus Repository.
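The retry advice above can be automated. The following is a minimal sketch, assuming the download is wrapped in a callable; the helper name download_with_retry and its parameters are hypothetical, and the default of catching every Exception is deliberately broad, so you may want to narrow retry_on to the timeout exception your client actually raises.

```python
import time


def download_with_retry(download, attempts=3, wait_seconds=120,
                        retry_on=(Exception,), sleep=time.sleep):
    """Call download() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                # Give Nexus Repository time to finish caching the model.
                sleep(wait_seconds)
    raise last_error


# Example usage (requires huggingface_hub, so it is left commented out):
# from huggingface_hub import snapshot_download
# download_with_retry(lambda: snapshot_download(repo_id="microsoft/OmniParser"))
```

Because Nexus Repository continues caching after the client gives up, a later attempt is usually served from the cache and completes much faster than the first one.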
Client Timeout Settings
When using Nexus Repository as a proxy, files are downloaded to the repository before they are returned to the requesting client.
Large models take time to download, especially during the initial caching process. Clients may time out when using the default timeout configuration. To prevent timeout errors, configure the following environment variables used by the client:
export HF_HUB_DOWNLOAD_TIMEOUT=120
export HF_HUB_ETAG_TIMEOUT=1800
Hugging Face may be slow and cause Nexus Repository to trigger timeout errors. As mentioned in the proxy configuration, we recommend setting the proxy HTTP request timeout to 120 seconds.
As an alternative to environment variables, you may configure the client timeout settings directly in the snapshot_download call:
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/OmniParser",
    repo_type="model",
    local_dir="/Users/Documents/hugging",
    etag_timeout=900,
)
Set the timeout for transformers and diffusers:
from diffusers import StableDiffusionPipeline
from huggingface_hub import configure_http_backend
from requests.adapters import HTTPAdapter
from requests import Session
from urllib3.util.retry import Retry

timeout = 900
session = Session()
# Retry transient server errors with exponential backoff.
retries = Retry(total=5, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTP adapter that applies a fixed timeout to every request."""

    def __init__(self, *args, **kwargs):
        self.timeout = kwargs.pop("timeout", timeout)
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


adapter = TimeoutHTTPAdapter(max_retries=retries, timeout=timeout)
session.mount("https://", adapter)
session.mount("http://", adapter)


def get_custom_session():
    return session


# Make huggingface_hub use the session with the custom timeout.
configure_http_backend(get_custom_session)

pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
Model Versioning
Hugging Face uses Git commit hashes for versioning. Nexus Repository reflects these commit hashes when storing different model versions.
Example of Hugging Face using a Git commit hash:
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="keras-io/deeplabv3p-resnet50",
    revision="9de024eda886161906e18b30cf84d8bcd8a0664c",
)
On Nexus Repository you will see the same Hugging Face commit hashes when downloading more than one version of the same model.
Note
Nexus Repository stores all files for each version, as it does not determine file-level changes between versions.
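To see where the commit hash appears in a client request, the sketch below rebuilds the file URL pattern that the Hugging Face Hub uses for model files (endpoint, then repo_id, then resolve, then revision, then filename). The function name resolve_url, the repository name hf-proxy, and the file name config.json are hypothetical, and the sketch does not describe how Nexus Repository arranges cached components internally.

```python
def resolve_url(endpoint, repo_id, filename, revision="main"):
    """Build the URL a client requests for one model file at one revision."""
    return f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"


# Hypothetical proxy repository name and file name, for illustration only.
url = resolve_url(
    "http://localhost:8081/repository/hf-proxy",
    "keras-io/deeplabv3p-resnet50",
    "config.json",
    revision="9de024eda886161906e18b30cf84d8bcd8a0664c",
)
print(url)
```

Pinning revision to a full commit hash, rather than a branch name such as main, makes repeated downloads request the same version of every file.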
Troubleshooting
Content Type Mismatch Errors
Some Hugging Face models contain image files whose file type does not match the content type detected when the file is requested from the repository. This is a known Hugging Face issue. Disable strict content validation in your Nexus Repository configuration when you encounter 400 errors for a 'Bad request'. See the following example:
404 Detected content type [image/png], but expected [image/jpeg]
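To make the mismatch concrete, the sketch below compares the content type implied by a file's extension with the type implied by its leading bytes. It only illustrates the kind of discrepancy reported in the error above and is not Nexus Repository's validation logic; the function names and the file name preview.jpg are hypothetical.

```python
from pathlib import PurePosixPath

# Leading bytes ("magic numbers") of two common image formats.
SIGNATURES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}
# Content type that each file extension implies.
EXTENSIONS = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def detect_type(data):
    """Return the content type implied by the leading bytes, or None."""
    for content_type, magic in SIGNATURES.items():
        if data.startswith(magic):
            return content_type
    return None


def check_mismatch(filename, data):
    """Return (detected, expected) when the two differ, otherwise None."""
    expected = EXTENSIONS.get(PurePosixPath(filename).suffix.lower())
    detected = detect_type(data)
    if expected and detected and expected != detected:
        return detected, expected
    return None


# A PNG payload stored under a .jpg name, as in the error message above.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
result = check_mismatch("preview.jpg", png_bytes)
```

A file like this downloads normally once strict content validation is disabled, because the repository no longer rejects the response over the type difference.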
Slow Download Speeds
If you experience slow downloads from Hugging Face, increase the Nexus Repository request timeout as described above.
Performance Metrics
Performance testing was conducted for Nexus Repository using a Hugging Face proxy repository. The objective was to evaluate system behavior under load, specifically regarding hardware utilization, response times, and throughput when handling large assets.
Deploying additional hardware resources may be necessary for environments expecting high concurrency due to the large asset sizes.
Configure HTTP timeouts appropriately to accommodate extended download durations for large models.
Test Environment
A two-node HA cluster (r6i.xlarge instances) with an RDS cluster (db.t3.medium with one writer and one read replica).
Simulation Tool: Gatling-based simulation.
Dataset: 45 Hugging Face models ranging in size from megabytes to tens of gigabytes.
Load Conditions: Tests were conducted with 16 and 32 concurrent virtual users, simulating asset downloads from the Hugging Face hub via Nexus Repository.
Key Findings
Due to the prolonged download times, configuring adequate HTTP connection timeout settings for Hugging Face proxies is recommended. Nexus Repository outperformed connecting directly to the Hugging Face Hub for large assets.
System Stability: No significant performance degradation or service interruptions occurred. CPU usage peaked at 60%, while memory utilization remained stable at 11.2%.
Throughput and Response Times: The format's characteristics resulted in a low request rate (~2 requests per second), as large file downloads extended response times to 20–30 minutes.
Concurrency Impact: Tests with 16 and 32 virtual users showed similar response times, with higher concurrency primarily affecting the total number of processed requests.