Cleanup Policies


NEW IN RELEASE 3.14

Summary

The cleanup feature removes components from your repositories using policies customized to suit your needs. Cleanup policies can be shared between repositories. The Cleanup Task, which performs the cleanup, runs at a frequency you choose.

By default, no policies are created or assigned to any repositories. However, it is highly recommended that you develop an internal cleanup strategy based on repository usage, available storage space, and component size. It is further recommended that you determine cleanup and related policies before disk space issues arise, rather than reacting once they do.

Policy setup may vary by format or repository and very likely varies per installation.

Anything deleted by cleanup policies is soft deleted. Disk space is not reclaimed until an Admin - Compact blob store task is run. How often you run that task, and whether you automate it, should match your level of caution. See Configuring and Executing Tasks for more general information on tasks.

Despite the soft deletion, a misconfigured cleanup policy can remove most or even all of your components, so use caution and preview the results before applying a policy.

Cleanup Policies

A Cleanup Policy can be created for a specific format, or for all formats, from the Cleanup Policies menu item in the Repository section of the Administration menu. The criteria within a policy are ANDed together, so the policy removes only components that meet all of the specified conditions (e.g. components published more than X days ago AND last downloaded more than Y days ago).
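To make the AND semantics concrete, here is a minimal Python sketch. The `Component` record, its fields, and the helper names are hypothetical simplifications for illustration, not NXRM types or APIs. Each enabled criterion is modeled as a predicate, and a component is selected only if every predicate returns true.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Component:
    # Hypothetical, simplified view of a component; not an NXRM type.
    name: str
    last_published: datetime
    last_downloaded: datetime


# A criterion is modeled as a predicate over a component.
Criterion = Callable[[Component], bool]


def published_before(days: int, now: datetime) -> Criterion:
    cutoff = now - timedelta(days=days)
    return lambda c: c.last_published < cutoff


def last_downloaded_before(days: int, now: datetime) -> Criterion:
    cutoff = now - timedelta(days=days)
    return lambda c: c.last_downloaded < cutoff


def matches_policy(component: Component, criteria: List[Criterion]) -> bool:
    # Criteria are ANDed: the component is selected only if every criterion matches.
    # Assumption of this sketch: a policy with no criteria selects nothing.
    return bool(criteria) and all(criterion(component) for criterion in criteria)


now = datetime(2024, 1, 31)
policy = [published_before(30, now), last_downloaded_before(30, now)]
stale = Component("stale", now - timedelta(days=60), now - timedelta(days=45))
in_use = Component("in-use", now - timedelta(days=60), now - timedelta(days=10))
print(matches_policy(stale, policy), matches_policy(in_use, policy))  # True False
```

Both components are old enough for the first criterion, but only `stale` also satisfies the download criterion, so only it would be selected.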

The creation screen is shown below. The Name and Format fields are required and cannot be edited once the policy is created. Any criterion you enable requires a value, but it can be edited later.

Before creation, but after you enter criteria, you can see a sampling of up to 50 components the criteria would delete by clicking the Preview results button.

Clicking it opens a dialog where you select a repository; the available repositories are limited to the format selected for the policy, if applicable. Clicking Preview returns the first 50 matching components in alphabetical order, along with the total number of components that would be cleaned up. The results are sortable by column, and filtering applies to the entire result set rather than just the first 50, in case you are looking for a specific record or records.
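The preview behavior described above (an alphabetical sample of up to 50 components, a total count, and a filter over all results) can be sketched as follows. The function names are hypothetical, and Python's default string ordering (code point order) is an assumption about what "alphabetically" means in the UI.

```python
from typing import List, Tuple


def preview(matching_names: List[str], limit: int = 50) -> Tuple[List[str], int]:
    # Sample: the first `limit` matches in alphabetical order, plus the total count.
    ordered = sorted(matching_names)
    return ordered[:limit], len(ordered)


def filter_results(matching_names: List[str], text: str) -> List[str]:
    # The filter runs over the entire result set, not only the 50-item sample.
    return sorted(name for name in matching_names if text in name)


names = [f"comp-{i:03d}" for i in range(120)]
sample, total = preview(names)
print(len(sample), total)  # 50 120
print(filter_results(names, "comp-11"))  # found even though outside the sample
```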

After creation, a policy can be associated with one or more repositories of the specified format when creating or editing those repositories. See more about this in the Cleanup Policy section of Managing Repositories and Repository Groups. Cleanup policies can be assigned to proxy and hosted repositories, but not to group repositories.

Criteria

The table below shows the available cleanup criteria and the formats each criterion is available for:

Criteria | Maven | Docker¹ | Yum | npm | NuGet | RubyGems | PyPI | Go | Raw | Git LFS | Bower | Apt | CocoaPods | Conda
Published Before | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes² | Yes | Yes | Yes
Last Downloaded Before | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes² | Yes | Yes | Yes
Release Type | Yes | N/A | Yes | Yes | | | | | | | | | |
Asset Name Matcher (NEW IN 3.19) | Yes | Yes | N/A | Yes | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes | Yes | Yes

¹ - The Docker cleanup policy evaluates tagged components only. Cleanup is 'complete' only after the Docker - Delete unused manifests and images task has run.
² - Bower support applies to proxy repositories only.

"N/A" in the table above means Sonatype does not believe the criterion applies to the format as built (while acknowledging that some users may have their own schemes). If you believe this is not the case, contact us at nexus-feedback@sonatype.com and let us know.

Published Before (Days)

This criterion takes a number of days that determines how long content is kept after it was last published. For example, if it is set to 30 days and none of the assets associated with a given component have been modified (including redeploys) within the last 30 days, the component is soft deleted when the task executes the policy.
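A minimal sketch of the Published Before check, assuming you already have the last-modified time of each asset in a component. The function name and inputs are hypothetical, and how NXRM treats a modification exactly `days` days ago is not stated here, so the strict comparison is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_published_before(asset_modified: Iterable[datetime], days: int, now: datetime) -> bool:
    # The component qualifies only if *no* asset was modified (e.g. redeployed)
    # within the last `days` days. The strict "<" at the cutoff is an assumption.
    cutoff = now - timedelta(days=days)
    return all(modified < cutoff for modified in asset_modified)


now = datetime(2024, 3, 1)
print(is_published_before([now - timedelta(days=40), now - timedelta(days=35)], 30, now))  # True
print(is_published_before([now - timedelta(days=40), now - timedelta(days=5)], 30, now))   # False
```

A single recently redeployed asset is enough to keep the whole component.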

Last Downloaded Before (Days)

This criterion takes a number of days that determines how long a component may go without being downloaded before it is cleaned up. For example, if it is set to 30 days, any component for which none of the associated assets have been downloaded in the last 30 days is soft deleted when the task executes the policy.

When we mark the component’s last download date, we evaluate all the associated assets and use the last downloaded date of the most recently downloaded asset as the component’s last downloaded date.
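The rule that a component's last download date comes from its most recently downloaded asset can be sketched as below. Inputs and names are hypothetical, and the handling of components that were never downloaded is an assumption of this sketch, not documented behavior.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def component_last_downloaded(asset_downloads: Iterable[Optional[datetime]]) -> Optional[datetime]:
    # The component's date is the most recent download among its assets;
    # None marks an asset that has never been downloaded.
    times = [t for t in asset_downloads if t is not None]
    return max(times) if times else None


def is_last_downloaded_before(asset_downloads: Iterable[Optional[datetime]], days: int, now: datetime) -> bool:
    last = component_last_downloaded(asset_downloads)
    if last is None:
        # Assumption: a never-downloaded component is read literally as
        # "not downloaded in the last N days".
        return True
    return last < now - timedelta(days=days)


now = datetime(2024, 3, 1)
assets = [now - timedelta(days=90), now - timedelta(days=12), None]
print(is_last_downloaded_before(assets, 30, now))  # False: one asset was downloaded 12 days ago
```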

Release Type

This criterion can be configured to clean up either prereleases or releases. Note that for formats such as Maven, this may only be relevant to repositories with a mixed version policy. For example, on a mixed Maven repository, all snapshots would be cleaned up if this criterion were set to prerelease.

Prereleases are identified by different criteria for different formats:

Format | What is considered a prerelease
Maven | The version contains -SNAPSHOT.
npm | npm uses semantic versioning, so a version that contains "-" is considered a prerelease.
Yum | The "release" property in the RPM header contains one of the following (not case-sensitive): alpha, beta, rc, pre, prerelease, snapshot.
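The prerelease rules in the table can be expressed as a small Python function. It follows the table literally with plain substring checks; whether NXRM matches the Yum markers as substrings or as whole tokens is an assumption of this sketch, and the function name is hypothetical. Note that a substring check on a short marker such as rc can also hit unrelated text, which is one more reason to preview results.

```python
YUM_PRERELEASE_MARKERS = ("alpha", "beta", "rc", "pre", "prerelease", "snapshot")


def is_prerelease(fmt: str, value: str) -> bool:
    """`value` is the version (Maven, npm) or the RPM header "release" field (Yum)."""
    fmt = fmt.lower()
    if fmt == "maven":
        return "-SNAPSHOT" in value
    if fmt == "npm":
        # Semantic versioning marks prereleases with a hyphen, e.g. 2.0.0-beta.1.
        return "-" in value
    if fmt == "yum":
        # Case-insensitive substring check (assumption: substring, not token, matching).
        release = value.lower()
        return any(marker in release for marker in YUM_PRERELEASE_MARKERS)
    raise ValueError(f"Release Type is not available for format {fmt!r}")


print(is_prerelease("maven", "1.2.0-SNAPSHOT"))  # True
print(is_prerelease("yum", "1.el8"))             # False
```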

Asset Name Matcher

Expressions are powerful and should be used with care. Always preview results to help avoid side effects.

Components in a blobstore consist of one or more assets (individual blobs). Asset names can be viewed when an asset is selected in the Browse or Search views.

The specified expression is evaluated against asset names. If any one asset name matches, the associated component and all of its assets (even those that did not match) are deleted.
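The "one match removes the whole component" rule can be sketched in Python. Python's re module is used only as a stand-in for the Lucene syntax; re.fullmatch mimics the fact that Lucene regular expressions are always anchored, i.e. matched against the entire asset name. The component data and function name are hypothetical.

```python
import re
from typing import Dict, List


def assets_removed(components: Dict[str, List[str]], expression: str) -> List[str]:
    # re.fullmatch approximates Lucene's anchored matching against the whole name.
    pattern = re.compile(expression)
    removed: List[str] = []
    for assets in components.values():
        # One matching asset is enough to remove the whole component,
        # including its assets that did not match.
        if any(pattern.fullmatch(asset) for asset in assets):
            removed.extend(assets)
    return sorted(removed)


components = {
    "lib 1.0": ["lib/1.0/lib-1.0.jar", "lib/1.0/lib-1.0.pom"],
    "app 2.0": ["app/2.0/app-2.0.jar"],
}
print(assets_removed(components, r"lib/.*\.pom"))  # the .jar goes too
print(assets_removed(components, r"/lib/.*"))      # [] because asset names have no leading slash
```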

An asset name does not begin with a leading slash. This is different from the asset request path used when evaluating Content Selector (CSEL) or Routing Rule expressions.

Expression Syntax

The expression engine uses the Elasticsearch Regexp query syntax, which comes from Apache Lucene.

These expressions are not Perl (PCRE) or Java java.util.regex.Pattern compatible regular expressions, and they support a limited set of operators.

Expression Examples
Cleanup case: All hello assets with point versions 2-9.
Contents of target repository: hello/-/hello-0.0.1.tgz, hello/-/hello-0.0.2.tgz, hello/-/hello-0.0.3.tgz
Policy expression: hello/-/hello-0.0.[2-9].tgz
Remaining items after cleanup execution: hello/-/hello-0.0.1.tgz

Cleanup case: Everything from the org and com groups.
Contents of target repository: org/example/test.jar, com/example/test.jar, test/example/test.jar
Policy expression: (org|com)/.*
Remaining items after cleanup execution: test/example/test.jar

Cleanup case: Everything from the org.sonatype group on teams 2-3.
Contents of target repository: org/sonatype/team1/ui/5.0/ui-5.0.jar, org/sonatype/team2/format/1.0/format-1.0.jar, org/sonatype/team3/database/10.0/database-10.0.jar
Policy expression: org/sonatype/team[2-3].*
Remaining items after cleanup execution: org/sonatype/team1/ui/5.0/ui-5.0.jar

Cleanup case: A specific single Apt asset.
Contents of target repository: pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb, pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb, pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb
Policy expression: pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb
Remaining items after cleanup execution: pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb, pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb
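To sanity-check the expression examples, the sketch below replays them with Python's re.fullmatch as an approximation of Lucene's anchored matching. This works here because the examples only use operators both syntaxes share (., [ ], ( | ), *); other expressions may behave differently, so always confirm with the Preview button. For simplicity, each listed asset is treated as its own component. Note that an unescaped . matches any character, not just a literal dot.

```python
import re
from typing import List


def remaining_after_cleanup(asset_names: List[str], expression: str) -> List[str]:
    # Simplification: each asset is its own component.
    pattern = re.compile(expression)
    return [name for name in asset_names if not pattern.fullmatch(name)]


# (contents, expression, expected remaining), copied from the examples.
EXAMPLES = [
    (["hello/-/hello-0.0.1.tgz", "hello/-/hello-0.0.2.tgz", "hello/-/hello-0.0.3.tgz"],
     "hello/-/hello-0.0.[2-9].tgz",
     ["hello/-/hello-0.0.1.tgz"]),
    (["org/example/test.jar", "com/example/test.jar", "test/example/test.jar"],
     "(org|com)/.*",
     ["test/example/test.jar"]),
    (["org/sonatype/team1/ui/5.0/ui-5.0.jar",
      "org/sonatype/team2/format/1.0/format-1.0.jar",
      "org/sonatype/team3/database/10.0/database-10.0.jar"],
     "org/sonatype/team[2-3].*",
     ["org/sonatype/team1/ui/5.0/ui-5.0.jar"]),
    (["pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb",
      "pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb",
      "pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb"],
     "pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb",
     ["pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb",
      "pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb"]),
]

for contents, expression, expected in EXAMPLES:
    assert remaining_after_cleanup(contents, expression) == expected, expression
print("all examples reproduced")
```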

Cleanup Task

When a server that supports cleanup starts, a task named "Cleanup service" of type Admin - Cleanup repositories using their associated policies is created automatically. By default, this task is scheduled to run daily at 1 AM server time. As with other tasks, it can be edited, disabled, and run manually. If deleted, it is automatically recreated on server restart. For more on tasks in general, see Configuring and Executing Tasks.

When run, this task cleans up every repository that has a policy other than None set. There is no partial execution: the task either runs or does not. It cannot be created manually.

General Guide (FAQ)

What Cleanup Policies are built in?

NXRM does not come with any built-in cleanup policies. The assumption is that you will want to define your own policies to your specifications and then assign them to the appropriate repositories, since no single solution suits every environment or problem.

Is there anything more I need to do other than apply policies to free up space?

Yes. Regardless of policy, no space in a file-based blob store is reclaimed until you run an Admin - Compact blob store task.

Except in urgent situations where you need space back immediately, Sonatype recommends backing up your blob store before you run Admin - Compact blob store, so that you have a copy of everything Cleanup soft deleted.

How Docker Repositories Use Space

Docker repositories store and serve images. An image consists of Docker tags, a manifest, and layers.

A manifest describes what layers make up an image. Each manifest can reference many layers. 

A Docker tag is an alias pointing at a manifest. A 'latest' tag usually exists and points at the most recently published or proxied image.

A layer is unique and only stored once in a repository but can be referenced by many manifests.

Tags, manifests and layers are individual assets or blobs inside a blob store.

Blob store space is almost entirely consumed by Docker layers. Space consumed by manifests and tags is negligible by comparison.

Finally, the Docker V2 API implements resumable uploads. Uploads can be abandoned during normal use, which can leave large files dangling in temporary blob store files.

How much space does a Docker Image consume inside a blobstore?

Each image part is stored as an individual asset blob inside NXRM, and each has its own size. If you select any one part (blob, manifest, or tag) in the NXRM UI, the asset Summary → File size field shows the size of that individual asset as NXRM stores it in the blob store.

An image manifest asset lists the size of each layer. Even if you add up all the sizes listed in a manifest, the total may still not match what docker image ls prints after the image is pulled with docker pull, depending on the operating system where the command is run. This is normal.

A crude way to estimate the total physical size of an image's layers is to download the image manifest file and add up the sizes of the individual layers listed in it. This roughly equals the disk space needed to store that single image on the NXRM side. However, since images share layers, it would not be accurate to simply add up these per-image totals and conclude that this is how much storage an NXRM blob store needs, because the totals would count layers shared with other images more than once.
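The crude calculation above, and why per-image totals overcount, can be sketched in Python. It assumes the manifest is in the Docker Image Manifest V2, Schema 2 JSON format, where each entry in `layers` carries a `digest` and a `size` in bytes; manifest lists and schema 1 manifests are shaped differently and are not handled. The config blob is ignored, and the manifests below are made-up examples.

```python
import json
from typing import Dict, Iterable


def image_layer_bytes(manifest_json: str) -> int:
    # Sum of the layer sizes listed in one image manifest (config blob excluded).
    manifest = json.loads(manifest_json)
    return sum(layer["size"] for layer in manifest.get("layers", []))


def unique_layer_bytes(manifest_jsons: Iterable[str]) -> int:
    # Count each layer once by digest, since images share layers.
    sizes: Dict[str, int] = {}
    for text in manifest_jsons:
        for layer in json.loads(text).get("layers", []):
            sizes[layer["digest"]] = layer["size"]
    return sum(sizes.values())


# Made-up manifests that share one layer.
m1 = json.dumps({"schemaVersion": 2, "layers": [
    {"digest": "sha256:aaa", "size": 100}, {"digest": "sha256:bbb", "size": 50}]})
m2 = json.dumps({"schemaVersion": 2, "layers": [
    {"digest": "sha256:aaa", "size": 100}, {"digest": "sha256:ccc", "size": 25}]})
print(image_layer_bytes(m1) + image_layer_bytes(m2))  # naive total: 275
print(unique_layer_bytes([m1, m2]))                  # shared layer counted once: 175
```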

Docker Cleanup Strategies

The following table outlines what tasks will soft-delete blobs in blob stores for Docker repositories:

Optimal run order | Task type | Feature | What it deletes
1 | Docker - Delete incomplete uploads | Tasks | Soft-deletes dangling uploads in temporary blob store storage that have not been resumed
2 | Admin - Cleanup repositories using their associated policies | Cleanup Policies | Soft-deletes old published or downloaded Docker components, i.e. tags, not layers or manifests
3 | Docker - Delete unused manifests and images | Tasks | Soft-deletes orphaned layers and manifests no longer referenced by tags, possibly orphaned by cleanup policies
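Task 3 follows the cleanup policy task because of reachability: cleanup removes tags, and any manifest or layer no longer reachable from a remaining tag becomes an orphan. The sketch below models that idea with hypothetical in-memory maps; it illustrates the concept only and is not the actual logic of the Docker - Delete unused manifests and images task.

```python
from typing import Dict, Set, Tuple


def find_orphans(tags: Dict[str, str],
                 manifests: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """tags: tag -> manifest digest; manifests: manifest digest -> layer digests."""
    live_manifests = set(tags.values())
    live_layers: Set[str] = set()
    for digest in live_manifests:
        live_layers |= manifests.get(digest, set())
    all_layers = set().union(*manifests.values())
    orphan_manifests = set(manifests) - live_manifests
    orphan_layers = all_layers - live_layers
    return orphan_manifests, orphan_layers


# Cleanup has removed the tag for 0.9, so its manifest m0 is no longer referenced.
tags = {"app:1.0": "m1", "app:latest": "m1"}
manifests = {"m1": {"l1", "l2"}, "m0": {"l0", "l1"}}
print(find_orphans(tags, manifests))  # ({'m0'}, {'l0'}); l1 survives because m1 uses it
```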

The following table outlines what features actually perform a hard-delete (i.e. free storage space) of soft-deleted blobs:

Feature | Blob store type
Admin - Compact blob store task | File
S3 blob store configuration Expiration Days | AWS S3

How can I tell what repositories are using the most space?

Refer to the Support article: Investigating Blobstore and Repository Size and Space Usage, and the nx-blob-repo-space-report.groovy script that is provided.

For purposes of repository size, you'll want to look at totalBytes within the output.

Note: The above script may have performance issues with large blob stores.

How do Cleanup Policies impact other "delete" tasks?

The functionality documented on this page should remove the need for the Maven - Delete unused SNAPSHOT and Repository - Delete unused components tasks, by using the Last Downloaded Before criterion.

Maven - Delete SNAPSHOT tasks are not yet completely replaced.

As mentioned above, Docker - Delete unused manifests and images is not replaced. In fact, it must be run after your cleanup policies to remove orphaned layers and manifests.

Docker - Delete incomplete uploads, Admin - Cleanup tags, and any other delete or cleanup task not mentioned here are not covered by policy cleanup at all and must continue to be used as they are. See Configuring and Executing Tasks for more about these tasks in general.

How do I upgrade existing tasks?

There is no automatic migration, so you must manually create equivalent policies, assign them to repositories, and delete or disable the existing tasks. Because cleanup is also implemented as a scheduled task, there is no conflict if both remain running; however, running both is a drain on resources. We recommend switching over to this feature once you have configured it and are comfortable with it.

What is the needed security to use this feature?

Only users with the admin (i.e. nexus:*) privilege can use the Cleanup Policy left navigation item. Any user with privilege to edit a repository (such as nexus:repository-admin:maven2:maven-central:edit for the default maven-central repository) can adjust the policy a repository uses. Permission to edit the cleanup task is covered by the same permissions as other tasks (nx-tasks-update or nx-tasks-all). There is no individual privilege for just the Cleanup Task.

What if I'm on an older version of NXRM3?

If you're on an older version of NXRM3, you'll need to use the existing delete tasks available in your version. See the FAQ question How do Cleanup Policies impact other "delete" tasks? for more about those tasks.

Any other tips?

See the Keeping Disk Usage Low subpage for further tips.