Cleanup Policies

Impact for Those Migrating to H2 or PostgreSQL

A regular expression that matched only specific assets before migrating to H2 or PostgreSQL may match more assets after migration. As a result, your cleanup policies may remove more content than they did previously.

Further, there are differences in the Asset Name Matcher that could prevent cleanup policies that used the Asset Name Matcher in OrientDB environments from identifying expected components in PostgreSQL environments.

See the Asset Name Matcher section for examples; expect to review and update cleanup policies as part of migration.

As described in Components, Repositories, and Repository Formats (Components and Assets in Docker for the Docker format), repositories contain components and associated assets. If you are not cleaning out old and unused components, your repositories will grow quickly; over time, this will present risks to your deployment:

  • Storage costs will increase

  • Performance is impacted

  • Artifact discovery will take longer

  • Consuming all available storage will result in server failure

You can create cleanup policies and assign them to one or more repositories so that a scheduled cleanup task (Admin - Cleanup repositories using their associated policies) will automatically soft delete artifacts from the repository. A soft delete means that artifacts are only marked for removal and not yet deleted from the disk. Disk space is not reclaimed until the Admin - Compact blob store task runs.

Note

The Admin - Compact blob store task does not apply to S3 blob stores, which are cleaned up using AWS Lifecycle.

See Tasks for more general information on tasks.

Cleanup Policy Security Requirements

Only users with the admin (i.e., nexus:*) privilege can use the Cleanup Policy left navigation item. Any user with the privilege to edit a repository (such as nexus:repository-admin:maven2:maven-central:edit for the default maven-central repository) can adjust the policy a repository uses.

Editing the cleanup task requires either the nx-tasks-update or the nx-tasks-all privilege.

Creating a Cleanup Policy

Note

Nexus Repository does not come with any built-in pre-configured cleanup policies. These are something you should define as best suit your environment.

You can create and assign one or more cleanup policies to a repository under Administration → Repositories. You can assign cleanup policies to both proxy and hosted repositories.

Each policy's criteria will be ANDed together, removing only components that meet all of the specified conditions.
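
The AND behavior described above can be sketched in a few lines of Python. This is an illustration only: the Component class, its fields, and the example thresholds are hypothetical and are not part of Nexus Repository, and the strict "greater than" comparisons are an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    # Hypothetical, simplified view of a component for illustration.
    name: str
    age_days: int
    days_since_last_download: int


Criterion = Callable[[Component], bool]


def matches_policy(component: Component, criteria: List[Criterion]) -> bool:
    """A component is selected only if every criterion matches (logical AND)."""
    return all(criterion(component) for criterion in criteria)


# Example policy: older than 30 days AND not downloaded in the last 14 days.
policy = [
    lambda c: c.age_days > 30,
    lambda c: c.days_since_last_download > 14,
]

old_but_used = Component("lib-a", age_days=90, days_since_last_download=2)
old_and_unused = Component("lib-b", age_days=90, days_since_last_download=60)

print(matches_policy(old_but_used, policy))    # fails the usage criterion
print(matches_policy(old_and_unused, policy))  # meets both criteria
```

In this sketch, adding a criterion can only narrow the set of selected components, never widen it.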

Name and Format are required fields, and the name cannot be edited after the policy is created. You can also enter cleanup criteria to define the policy; these fields can be edited later.

The policy name may contain only letters, digits, underscores (_), hyphens (-), and dots (.), and may not start with an underscore or a dot.
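
If you generate policy names in scripts, a quick pre-check can catch invalid names before you reach the UI. The pattern below is a sketch derived only from the naming rule stated above. It is not taken from Nexus Repository itself, and it assumes that "letters" means ASCII letters.

```python
import re

# Hypothetical pattern built from the documented rule, not from product code.
# First character: letter, digit, or hyphen (not underscore or dot).
# Remaining characters: letters, digits, underscores, hyphens, or dots.
POLICY_NAME = re.compile(r"[A-Za-z0-9-][A-Za-z0-9_.-]*")


def is_valid_policy_name(name: str) -> bool:
    """Return True if the whole name satisfies the stated naming rule."""
    return POLICY_NAME.fullmatch(name) is not None


for candidate in ["maven-releases_90d.v2", "_hidden", ".dotfirst", "has space"]:
    print(candidate, is_valid_policy_name(candidate))
```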

Cleanup Criteria

The table below lists the available cleanup criteria and the formats to which they can apply:

Format | Component Age | Component Usage | Release Type | Asset Name Matcher | Retain Select Versions
All Formats | Confirmed | Confirmed | N/A | N/A | N/A
Apt | Confirmed | Confirmed | N/A | Confirmed | N/A
Bower² | Confirmed | Confirmed | N/A | Confirmed | N/A
CocoaPods | Confirmed | Confirmed | N/A | Confirmed | N/A
Conan | Confirmed | Confirmed | N/A | Confirmed | N/A
Conda | Confirmed | Confirmed | N/A | Confirmed | N/A
Docker¹ | Confirmed | Confirmed | N/A | Confirmed | Confirmed
GitLFS | Confirmed | Confirmed | N/A | N/A | N/A
Go | Confirmed | Confirmed | N/A | Confirmed | N/A
Helm | Confirmed | Confirmed | N/A | Confirmed | N/A
Maven | Confirmed | Confirmed | Confirmed | Confirmed | Confirmed
npm | Confirmed | Confirmed | Confirmed | Confirmed | N/A
NuGet | Confirmed | Confirmed | N/A | Confirmed | N/A
p2 | Confirmed | Confirmed | N/A | Confirmed | N/A
PyPI | Confirmed | Confirmed | N/A | Confirmed | N/A
R | Confirmed | Confirmed | N/A | Confirmed | N/A
Raw | Confirmed | Confirmed | N/A | Confirmed | N/A
RubyGems | Confirmed | Confirmed | N/A | Confirmed | N/A
Yum | Confirmed | Confirmed | Confirmed | Confirmed | N/A

¹ - Cleanup only evaluates tagged manifests for Docker. Untagged manifests are not considered components in the Nexus Repository. Cleanup is only complete when the Docker - Delete unused manifests and images task has run.

² - Bower functionality is for proxy repositories only.

Note

"N/A" in the above table means Sonatype does not believe that the criteria applies to the format as built but acknowledges that it is possible that some customers may have their own schemes. If you believe this to be untrue, feel free to contact us at nexus-feedback@sonatype.com and let us know.

Component Age (Days)

This criterion sets how long to keep content based on component age. Age is calculated from when the component was first downloaded from the public repository (for proxy repositories) or uploaded to a hosted repository. For example, if the criterion is set to 30 days, the cleanup policy will soft delete components that have not been modified (i.e., no one has re-deployed content to the same path) within the last 30 days.

Component Usage (Days)

This criterion sets how long to keep content based on when a component was last downloaded.

Note

If the component has never been downloaded, the policy will use the published or updated date instead.

Release Type

Use this criterion to set the cleanup policy to either PRERELEASES or RELEASES.

Prereleases are identified differently by format:

Format | What is considered prerelease
Maven | The version contains -SNAPSHOT
npm | npm uses semantic versioning, so a version is considered prerelease if it contains the "-" character.
Yum | The non-case-sensitive "release" property in the RPM header contains one of the following: alpha, beta, rc, pre, prerelease, snapshot
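
The per-format rules above can be expressed as a small function, which can help when checking your own version strings against them. This is a sketch under assumptions: the function name and parameters are hypothetical, the Maven check is treated as case-sensitive, and the Yum check is treated as a case-insensitive substring test on the "release" value.

```python
# Markers listed for the Yum format in the table above.
YUM_PRERELEASE_MARKERS = ("alpha", "beta", "rc", "pre", "prerelease", "snapshot")


def is_prerelease(fmt: str, version: str, rpm_release: str = "") -> bool:
    """Classify a version as prerelease using the documented per-format rules.

    rpm_release is only used for Yum and stands for the "release"
    property of the RPM header (hypothetical parameter name).
    """
    fmt = fmt.lower()
    if fmt == "maven":
        return "-SNAPSHOT" in version
    if fmt == "npm":
        return "-" in version
    if fmt == "yum":
        release = rpm_release.lower()
        return any(marker in release for marker in YUM_PRERELEASE_MARKERS)
    raise ValueError(f"Release Type is not listed for format: {fmt}")


print(is_prerelease("maven", "1.0-SNAPSHOT"))
print(is_prerelease("npm", "2.0.0-beta.1"))
print(is_prerelease("yum", "1.0", rpm_release="0.1.RC2"))
```

Note that a plain substring test is broad: in this sketch, any release value that happens to contain "rc" or "pre" is classified as a prerelease.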

Asset Name Matcher

Components comprise one or more assets (individual files) in a blob store. The Asset Name Matcher allows you to define rules based on asset names to determine which assets should be included in cleanup operations.

You can view asset names after selecting an asset in the Browse or Search views. The specified RegEx will be evaluated against asset names. If there is a match, Nexus Repository deletes the associated component and all of its contained assets.

Important

There are important breaking changes in Asset Name Matcher functionality between H2/PostgreSQL and OrientDB environments. Review the sections below carefully before migrating.

Leading Slash in Asset Names in PostgreSQL vs. OrientDB

In OrientDB environments, asset names do not include a leading slash. Consequently, an asset name matcher like com0/.* would successfully identify matching assets.

In PostgreSQL environments, asset names always begin with a leading slash. To achieve the same matching behavior as in OrientDB, the asset name matcher must therefore include this leading slash (e.g., /com0/.*). If the matcher omits the leading slash, no assets will be matched.

When migrating to PostgreSQL, it's crucial to revise your cleanup policies that employ an asset name matcher to include the leading slash in asset names. Failure to do so could result in assets not being matched and cleaned up as expected.

PostgreSQL/H2 Expression Syntax

In H2 or PostgreSQL environments, the Asset Name Matcher uses Java regular expressions. While Lucene regular expressions are anchored by default (meaning that they must match the entire asset name if there isn’t a wildcard somewhere in the regex), Java regular expressions are not anchored by default.

This means that a regular expression that only matched specific options before migrating to H2 or PostgreSQL may match more items after migration.

Example of an Expression That Matches More Items After Migration to PostgreSQL/H2

This example contains the following assets for consideration:

/antlr/antlr/2.7.2/antlr-2.7.2.jar

/org/antlr/antlr-master/3.1.3/antlr-master-3.1.3.pom

The example Sonatype Nexus Repository instance has a cleanup policy where the Asset Name Matcher uses the regex antlr.*.

When cleanup ran on OrientDB, /antlr/antlr/2.7.2/antlr-2.7.2.jar was removed while /org/antlr/antlr-master/3.1.3/antlr-master-3.1.3.pom was not.

After upgrading to an H2 instance and re-running the same cleanup policy, both components were removed.
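
The difference between anchored and unanchored matching in this example can be reproduced with a short script. The sketch below uses Python's re module purely as a stand-in: it is neither the Lucene engine nor java.util.regex, although the two simple patterns shown behave the same way in it. The helper names are hypothetical, and the explicitly anchored expression ^/antlr/.* is only one possible way to narrow the match; verify any revised expression with the cleanup preview before relying on it.

```python
import re

assets = [
    "/antlr/antlr/2.7.2/antlr-2.7.2.jar",
    "/org/antlr/antlr-master/3.1.3/antlr-master-3.1.3.pom",
]


def anchored_matches(pattern, names):
    """Whole-name matching, standing in for the anchored behavior.

    The leading slash is stripped first because OrientDB asset names
    do not include it.
    """
    return [n for n in names if re.fullmatch(pattern, n.lstrip("/"))]


def unanchored_matches(pattern, names):
    """Match anywhere in the name, standing in for unanchored behavior."""
    return [n for n in names if re.search(pattern, n)]


before = anchored_matches(r"antlr.*", assets)         # only the first asset
after = unanchored_matches(r"antlr.*", assets)        # both assets
narrowed = unanchored_matches(r"^/antlr/.*", assets)  # only the first asset

print(before)
print(after)
print(narrowed)
```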

OrientDB Expression Syntax

Note

In OrientDB, an asset name does not begin with a leading slash. This is different than the asset request path value used when evaluating Content Selector (CSEL) or Routing Rule expressions.

The expression engine uses the Elasticsearch regexp query syntax, which comes from Apache Lucene.

The expressions are not Perl (PCRE) or Java util.regex.Pattern compatible regular expressions and use a limited set of operators.

Expression Examples

Example 1

Components in Target Repository:
hello/-/hello-0.0.1.tgz
hello/-/hello-0.0.2.tgz
hello/-/hello-0.0.3.tgz

Policy Expression: hello/-/hello-0.0.[2-9].tgz

Cleanup Case: All hello components with point versions 2-9.

Remaining Components After Cleanup Execution:
hello/-/hello-0.0.1.tgz

Example 2

Components in Target Repository:
org/example/test.jar
com/example/test.jar
test/example/test.jar

Policy Expression: (org|com)/.*

Cleanup Case: Everything from the org and com groups.

Remaining Components After Cleanup Execution:
test/example/test.jar

Example 3

Components in Target Repository:
org/sonatype/team1/ui/5.0/ui-5.0.jar
org/sonatype/team2/format/1.0/format-1.0.jar
org/sonatype/team3/database/10.0/database-10.0.jar

Policy Expression: org/sonatype/team[2-3].*

Cleanup Case: Everything from the org.sonatype group on teams 2-3.

Remaining Components After Cleanup Execution:
org/sonatype/team1/ui/5.0/ui-5.0.jar

Example 4

Components in Target Repository:
pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb
pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb
pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb

Policy Expression: pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb

Cleanup Case: A specific single Apt component.

Remaining Components After Cleanup Execution:
pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb
pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb

Note

Cleanup preview shows the component name, but the expression may be evaluated against the path. For example, in the first example above, the preview might show hello-0.0.2.tgz and hello-0.0.3.tgz even though the path contains hello/-/.

Retain Select Versions (Pro Only)

As of release 3.65.0, if you use a PostgreSQL database, you can exclude the latest "x" number of versions from your cleanup policy for Maven and Docker formats. Sonatype Nexus Repository will use the version number (for Maven) or component age (for Docker) to determine what to keep.

For Maven, this feature is only available for released versions (i.e., you must select Releases as the Release Type).

For both Maven and Docker, you must select at least one other cleanup criterion to enable this option.

To use this feature, select the checkbox labeled Except, do not remove any component that meets the following criterion.

Then, under Number of Versions, define the number of versions to exclude from cleanup.
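
To reason about how a retention count interacts with the other criteria, a small model can help. The sketch below is an assumption-laden illustration, not the product's algorithm: it assumes the newest versions are protected across all versions of a component, and it orders versions with a simplified numeric comparison of dotted release versions, whereas real Maven version ordering has more rules. The function names are hypothetical.

```python
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Simplified ordering for dotted numeric versions such as "1.10.0"."""
    return tuple(int(part) for part in version.split("."))


def versions_to_remove(all_versions: List[str],
                       matched_by_other_criteria: List[str],
                       retain: int) -> List[str]:
    """Drop the newest `retain` versions from the set selected for cleanup."""
    newest_first = sorted(all_versions, key=version_key, reverse=True)
    protected = set(newest_first[:retain])
    return [v for v in matched_by_other_criteria if v not in protected]


versions = ["1.0.0", "1.2.0", "1.10.0", "1.9.0"]
# Assume the other criteria (for example, Component Age) selected every version.
print(versions_to_remove(versions, versions, retain=2))
```

With a retention count of 2, the two newest versions (1.10.0 and 1.9.0) are kept in this model even though the other criteria selected them.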

Preview Cleanup Policy Results

This section covers previewing what a cleanup policy will mark for deletion from Sonatype Nexus Repository.

PostgreSQL Database Cleanup Preview Experience (Pro Only)

As of release 3.62.0, before you save a cleanup policy, you can generate and download a list of components that the policy would remove as it is currently configured.

After defining your cleanup criteria, select a repository from the Preview Cleanup Policy Results drop-down menu; then, select Generate CSV Report.

This generates and downloads a .csv file containing the complete list of components that would be removed from the selected repository if you were to apply the cleanup policy you are creating.

The downloaded file follows a "<cleanup policy name>-<repository name>-<timestamp>.csv" naming convention and includes component namespaces, names, versions, and paths.

Note

If you are using Content Replication, remember that deletion is not replicated; the cleanup policy will only remove components from the specific instance on which it is run.

Preview CSV Generation Performance

Generating a Cleanup Preview CSV can take some time depending on deployment size and configuration.

The table below provides a rough estimate of how long generating a CSV might take based on our internal testing using the following specifications:

  • Deployed on AWS using the following:

    • ECS c6i.4xlarge

    • Aurora PostgreSQL db.r6g.large

  • Both deployed in the same Availability Zone

  • Hosted repository type

  • Raw format

Components in Repository | Time to Generate CSV Report
1M | ~1min
5M | ~2min
10M | ~4min
20M | ~8min
23M | ~10min
25M | ~15min
27M | ~25min
30M | ~40min

H2 and OrientDB Cleanup Preview Experience

For those using H2 or OrientDB, before you save the policy, you can preview a sample of the components that the policy would remove. Select a repository from the Preview Repository drop-down menu; then, select the Preview button to return a sample of the components that the cleanup policy would delete if applied to that repository at that point in time. You can also use the preview feature after saving the cleanup policy.

Note that this sample may be an incomplete list of what the policy may actually remove when run. Use the filter to check for specific results not shown in the sample.

To avoid unreasonable wait times in cases where the database or cleanup policy is very complex, there is a 1-minute timeout on the preview feature.

Cleanup Task

When you start a server with cleanup abilities enabled, Nexus Repository automatically creates a task named Cleanup service with the type Admin - Cleanup repositories using their associated policies. By default, this task is scheduled to run daily at 1 AM server time. Similar to other tasks, you can edit, disable, and manually execute this task if desired. If you do delete this task, Nexus Repository will automatically recreate it on server restart.

When run, this task deletes components and assets based on configured cleanup policies. To recover storage, you may also need to run the Admin - Compact blob store task.

This task executes the cleanup of all repositories that have a policy other than None set. There is no partial execution. This task cannot be manually created and either runs or does not.

Hard Deleting Cleaned Up Components

Cleanup tasks remove components and associated assets from the database; however, as with other delete operations in Nexus Repository, the associated blobs (files) are not immediately removed from storage. Blob store space is reclaimed automatically if the Admin - Compact blob store task is scheduled. For those using H2 or PostgreSQL databases, the Admin - cleanup unused asset blobs task must also run before the Admin - Compact blob store task executes.

If you must immediately reclaim storage space, proceed with the following:

  1. If using an H2 or PostgreSQL database, run the Admin - cleanup unused asset blobs task for the related format(s) and wait for completion.

  2. Regardless of the database, you must then run the Admin - Compact blob store task.

For Azure blob stores, the Admin - Compact blob store task asks the Azure blob store to perform a hard delete by calling the delete function on the Azure client. This function then marks the specified blob for deletion, and it is deleted during garbage collection on the Azure side. This may vary depending on whether or not you have the soft delete feature enabled as described in Azure's documentation.

The Admin - Compact blob store task is not used for S3 blob stores. Instead, Nexus Repository creates an AWS bucket lifecycle policy based on the blob store configuration to schedule blobs for deletion. (See our documentation on configuring expiration days for S3 blob stores.)
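
The order of operations described in this section can be summarized as a small decision helper. This is a sketch only: the function name and its parameters are hypothetical, it merely lists task names and does not call any Nexus Repository API, and it assumes the Admin - cleanup unused asset blobs step still applies to H2 and PostgreSQL deployments that use S3 blob stores.

```python
from typing import List


def reclaim_storage_steps(database: str, blob_store_type: str) -> List[str]:
    """Return the steps, in order, to reclaim space after cleanup has run."""
    steps = []
    if database.lower() in ("h2", "postgresql"):
        # H2 and PostgreSQL need this task before blobs can be compacted.
        steps.append("Admin - cleanup unused asset blobs")
    if blob_store_type.lower() == "s3":
        # No compact task for S3: the bucket lifecycle policy expires blobs.
        steps.append("AWS bucket lifecycle policy (Expiration Days)")
    else:
        steps.append("Admin - Compact blob store")
    return steps


print(reclaim_storage_steps("PostgreSQL", "File"))
print(reclaim_storage_steps("OrientDB", "File"))
print(reclaim_storage_steps("PostgreSQL", "S3"))
```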

See Tasks for more general information on configuring tasks.

Docker Cleanup Strategies

Docker has a unique way of managing components and assets (See Components and Assets in Docker) and therefore requires some thought when designing a cleanup strategy.

The following table outlines which tasks soft-delete blobs in blob stores for Docker repositories:

Optimal Order | Task Type | Feature | What It Deletes
1 | Docker - Delete incomplete uploads | Tasks | Soft-deletes dangling uploads in temporary blob store storage that have not been resumed
2 | Admin - Cleanup repositories using their associated policies | Cleanup Policies | Soft-deletes old published or downloaded Docker components (i.e., tags, not layers or manifests)
3 | Docker - Delete unused manifests and images | Tasks | Soft-deletes orphaned layers and manifests no longer referenced by tags, possibly orphaned by cleanup policies

The following table outlines what features perform a hard delete (i.e. free storage space) of soft-deleted blobs:

Feature | Blob Store Type
Admin - Compact blob store task | File
S3 blob store configuration Expiration Days | AWS S3

Clean Up Components That Have Never Been Downloaded

One very common use case for cleanup policies is to clean up components that have never been downloaded.

To do this, configure a cleanup policy using the Component Usage (Days) criterion. This criterion tells the cleanup policy to remove components that haven't been downloaded in a specified number of days.

If a component has never been downloaded, the policy will use the component's published or updated date instead. To further explain, Sonatype Nexus Repository identifies components for cleanup by looking at the most recent (maximum) blob created date from all assets on the component. For components with multiple assets, that can be considered the last updated time for the component.

So, if a component was not downloaded, created, or updated in the specified number of days, the policy will identify it for removal.
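
The date logic described above can be sketched as follows. This is an illustration under assumptions, not the product's implementation: the function and parameter names are hypothetical, and the comparison against the cutoff is assumed to be strict.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def usage_reference_date(last_downloaded: Optional[datetime],
                         asset_blob_created: List[datetime]) -> datetime:
    """Pick the date that the Component Usage criterion is compared against.

    If the component was never downloaded, fall back to the most recent
    blob created date across all of its assets.
    """
    if last_downloaded is not None:
        return last_downloaded
    return max(asset_blob_created)


def is_unused(last_downloaded: Optional[datetime],
              asset_blob_created: List[datetime],
              days: int,
              now: datetime) -> bool:
    """True if the reference date is older than the configured number of days."""
    cutoff = now - timedelta(days=days)
    return usage_reference_date(last_downloaded, asset_blob_created) < cutoff


now = datetime(2024, 6, 1)
# Never downloaded, but one asset was updated a week ago: kept.
print(is_unused(None, [datetime(2024, 1, 1), datetime(2024, 5, 25)], 30, now))
# Never downloaded and untouched for months: identified for removal.
print(is_unused(None, [datetime(2024, 1, 1), datetime(2024, 2, 1)], 30, now))
```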

Additional Information

Java- versus SQL-based Cleanup

As of release 3.65.0, all Sonatype Nexus Repository Pro instances using PostgreSQL databases will use SQL-based cleanup by default.

As illustrated in our provided Cleanup Performance Data, SQL-based cleanup is proven to take considerably less time than Java-based cleanup.

Determining Which Repositories Use the Most Space

Refer to the Support article: Investigating Blob Store and Repository Size and Space Usage, and the nx-blob-repo-space-report.groovy script that is provided.

For purposes of repository size, you'll want to look at totalBytes within the output.

The above script may have performance issues with large blob stores and is not applicable to S3 or PostgreSQL; it only works with file-based blob stores and OrientDB.

Cleanup Policies' Impact to Other "Delete" Tasks

The implementation documented on this page should replace the need for any Maven - Delete unused SNAPSHOT and Repository - Delete unused components tasks by using the Last Downloaded Before criteria.

Maven - Delete SNAPSHOT tasks are not yet completely replaced.

The Docker - Delete unused manifests and images task is not replaced. You must run it after your cleanup policies to remove orphaned layers and manifests.

Docker - Delete Incomplete Uploads, Admin - Cleanup tags, and other delete- or cleanup-specific tasks not mentioned here are not covered by cleanup policies and should continue to be used as they are.

For more on tasks in general, see Tasks.

Upgrading Existing Tasks

There is no migration in place, so you must manually create similar policies, assign them to repositories, and delete or disable the existing tasks. Because cleanup is also implemented as a scheduled task, there is no collision if both remain running; however, doing so is a resource drain. We recommend switching over to this feature once it is configured and you are comfortable with it.