Cleanup Policies
Cleanup Policies are the automation rules for removing content stored in the repositories of your Nexus Repository instance. Unless components are removed at roughly the rate they are added, the number of components grows quickly over time.
When not managed early, this growth presents the following risks to your deployment:
A continuing increase in storage costs as more artifacts are published
Impact on performance as searching takes longer
Discovery is challenging as you sort through more artifacts
Consuming the available storage results in server failure and outages
Cleanup Policies and Tasks are not configured by default. Define the policies that best suit your development lifecycle.
Security Requirements
Only users with admin privileges (i.e., nexus:*) may create cleanup policies.
Users with edit repository privileges may add a cleanup policy to the repository, for example:
nexus:repository-admin:maven2:maven-central:edit
Users with either of the following privileges may modify cleanup tasks:
nx-tasks-update, nx-tasks-all
Cleanup Policies Workflow
Nexus Repository cleans up components using a set of rules, or cleanup policies, set on the repository configuration and a series of tasks to safely flag and remove the components.
These are the steps to setting up cleanup policies:
Administrators first create Cleanup Policies depending on their requirements.
Add one or more Cleanup Policies to a repository's configuration.
Cleanup Tasks are set to run regularly:
Admin - Cleanup repositories using their associated policies
Admin - Cleanup unused asset blobs
Cleanup tasks "soft delete" components by flagging them for removal. Components still consume space but may be recovered when needed.
Run the compact blob store task during off-peak hours for each blob store to reclaim disk space. These tasks completely remove the components from the storage disk, freeing up the available space.
Admin - Compact blob store
Creating Cleanup Policies
Cleanup Policies are located in the repository section of the administration menu and require admin privileges to create or modify. They are intended for use in one or more repositories, typically associated with a specific repository format type.
Select the Create Cleanup Policy button from the Cleanup Policies view
Provide a unique name for the policy
Only letters, digits, underscores (_), hyphens (-), and dots (.) are allowed, and the name may not start with an underscore or dot.
Select a target format for the policy
The 'All formats' option may be selected for any repository type; however, most formats have specific cleanup criteria available for use.
Optionally enter notes in the Description field (limit of 400 characters)
Understanding what the cleanup policy is for at a later date may be challenging. We recommend providing details on your policy for auditing purposes.
Select at least one criterion for the policy
See the available cleanup criteria by format table below for details.
Select the Save button
See Cleanup Criteria for using each criterion.
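The naming and description rules in the steps above can be checked before a policy is created. The following is a minimal sketch, not the product's own validation: the function name `validate_policy_fields` is hypothetical, and it assumes that "letters" and "digits" mean ASCII characters.

```python
import re

# Assumption: "letters" and "digits" are read as ASCII; the product may accept more.
# The first character may be anything allowed except an underscore or a dot.
POLICY_NAME_RE = re.compile(r"[A-Za-z0-9-][A-Za-z0-9_.-]*")
MAX_DESCRIPTION_LENGTH = 400  # limit stated for the Description field


def validate_policy_fields(name: str, description: str = "") -> list:
    """Return a list of problems; an empty list means both fields look valid."""
    problems = []
    if not POLICY_NAME_RE.fullmatch(name):
        problems.append(
            "name: only letters, digits, '_', '-', '.' are allowed; "
            "must not start with '_' or '.'"
        )
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description: longer than 400 characters")
    return problems
```

A call such as `validate_policy_fields("_tmp")` returns one problem because of the leading underscore, while `validate_policy_fields("maven-snapshots.30d")` returns an empty list.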
Strategies for Creating a Cleanup Policy
Cleanup Policies are intended to target components to be removed from either hosted or proxy repositories.
The criteria of the policy are combined together to remove only the components that meet every condition specified.
Multiple cleanup policies may be applied to the same repository.
It is possible for policies to have overlapping criteria targeting the same components.
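The way criteria combine can be pictured with a small model. This is an illustrative sketch only: the `Policy` class, the component fields, and the two example policies are hypothetical. It also assumes that each policy applied to a repository is evaluated on its own, so a component is selected when any one policy matches it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Component = Dict[str, object]            # hypothetical component record
Criterion = Callable[[Component], bool]


@dataclass
class Policy:
    name: str
    criteria: List[Criterion]            # a policy needs at least one criterion

    def matches(self, component: Component) -> bool:
        # Every criterion must hold for the policy to select the component.
        return all(criterion(component) for criterion in self.criteria)


def selected_for_cleanup(component: Component, policies: List[Policy]) -> bool:
    # Assumption: policies are evaluated independently, so one match is enough.
    return any(policy.matches(component) for policy in policies)


old_prereleases = Policy(
    "old-prereleases",
    [lambda c: c["age_days"] > 30, lambda c: c["prerelease"]],
)
stale = Policy("stale", [lambda c: c["days_since_download"] > 365])
```

With both policies applied, a 45-day-old release that was downloaded recently is kept: it fails the prerelease criterion of the first policy and the usage criterion of the second.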
Adding A Cleanup Policy to a Repository
Cleanup policies may be assigned to both proxy and hosted repositories. Users require repository admin edit privileges to add cleanup policies to a repository.
Select a repository from the Repositories view in the Administration menu or create a new one
Navigate to the Cleanup section
Optionally use the search filter to limit the available cleanup policies from the list
Select the required cleanup policies from the available section
Use the right-facing arrow to move the cleanup policy to the applied section
Repeat for as many policies as are required
Select the Save button
Previewing Cleanup Policy Results
This section covers previewing a sample of the results of a cleanup policy against a specific repository for testing purposes. These results are not a comprehensive audit, as the results may differ depending on the selected criteria and when the cleanup policy runs.
Select a repository from the Preview Repository drop-down menu below the Save button
Select the Preview button to return a sample of the components
Use the filter to check for specific results not shown in the sample
The sample may be an incomplete list of what may be removed
There is a 1-minute timeout on the preview to reduce the impact to performance
PostgreSQL Cleanup Preview
Generate the complete list of components that the policy would remove as configured.
Select a repository from the Preview Repository drop-down menu below the Save button
Select the Generate CSV Report button
The CSV file is downloaded once the query is complete
Filename: <cleanup policy name>-<repository name>-<timestamp>.csv
Fields: namespace, name, version, path
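Once downloaded, the report can be summarized with a short script before the policy is applied. The sketch below counts how many versions of each component the policy would remove. It assumes the file starts with a header row naming the four fields listed above; the function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def versions_per_component(csv_text: str) -> Counter:
    """Count how many versions of each (namespace, name) pair the report lists."""
    # Assumption: the first row is a header with the four documented field names.
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["namespace"], row["name"]) for row in reader)


# Hypothetical report content, for illustration only.
sample = """namespace,name,version,path
org.example,app,1.0-SNAPSHOT,/org/example/app/1.0-SNAPSHOT/app-1.0-SNAPSHOT.jar
org.example,app,1.1-SNAPSHOT,/org/example/app/1.1-SNAPSHOT/app-1.1-SNAPSHOT.jar
org.example,lib,2.0-SNAPSHOT,/org/example/lib/2.0-SNAPSHOT/lib-2.0-SNAPSHOT.jar
"""
counts = versions_per_component(sample)
```

For a real report, read the downloaded file's text and pass it to the same function.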
Generating the cleanup preview CSV takes time depending on deployment size and configuration. The table below provides generation times based on our internal testing with the following specifications:
Deployed on an AWS ECS c6i.4xlarge instance with an Aurora PostgreSQL db.r6g.large database
Components in Repository (millions) | Time to Generate CSV Report (minutes) |
---|---|
1 | ~1 |
5 | ~2 |
20 | ~8 |
25 | ~15 |
27 | ~25 |
30 | ~40 |
Cleanup Tasks
Nexus Repository automatically creates a few system tasks to soft delete components identified for cleanup by the cleanup policies. These tasks are not created manually; if deleted, they are re-added when the service restarts.
Cleanup service: Admin - Cleanup repositories using their associated policies
This task soft deletes components based on the repository's configured cleanup policies. It may be rescheduled or manually executed. By default, this task is set to run once an hour.
Cleanup unused {format} blobs from nexus: Admin - Cleanup unused asset blobs
These tasks soft delete orphaned assets that are no longer needed after a component is removed. A task is added when a repository of a new format is created. By default, these tasks run every 30 minutes.
Hard Deleting Components
Cleanup Policies soft delete the components selected for removal, after which Nexus Repository no longer displays them in the user interface.
However, these components are not immediately deleted from storage and will still use disk space.
Create and schedule the Admin - Compact blob store task to reclaim disk space.
Create the task for every blob store that requires cleanup.
Azure Blob Store Cleanup
The compact blob store task requests the Azure blob store to mark blobs for deletion. These are later hard-deleted during garbage collection on the Azure side.
This behavior may vary depending on whether the soft delete feature is enabled, as described in Azure's documentation.
AWS S3 Blob Store Cleanup
AWS S3-based blob stores use a bucket lifecycle policy managed on the S3 blob store configuration to delete components. When components are soft-deleted using cleanup policies, the expiration days property sets the lifecycle on the blob in the S3 bucket.
The compact blob store task is not used for S3 blob stores.
Docker Cleanup Strategies
Docker's tagging, manifests, and layers are unique ways of managing components and assets that require additional configuration when designing a cleanup strategy.
See Components and Assets in Docker to learn more about the docker format.
Docker - Delete incomplete uploads: Soft-delete uploads to the temporary storage that are not complete
Admin - Cleanup repositories using their associated policies: Soft-delete old published or downloaded docker components, i.e., tags, not layers or manifests
Docker - Delete unused manifests and images: Soft-delete orphaned layers and manifests no longer referenced by tags, possibly orphaned by cleanup policies
Once the above tasks have run, the following tasks are needed to hard delete the components and reclaim space depending on your deployment.
Run the Admin - Compact blob store task for file-based blob stores
Set the Expiration Days configuration on object-based blob stores such as S3
Additional Information
Clean Up Components That Have Never Been Downloaded
Use the Component Usage (Days) criterion to clean up components that have never been downloaded. This criterion removes components that haven't been downloaded in a specified number of days. The date the component was published is used when the component has never been downloaded.
Cleanup Policies Do Not Remove Components from Replicated Repositories
Content Replication does not replicate the deletion of components on remote repositories. Cleanup policies only remove components from the specific instance on which they run. The remote repository requires its own setup of cleanup policies.
SQL-Based Cleanup Performance
As of release 3.65.0, Nexus Repository Pro instances using PostgreSQL databases use SQL-based cleanup by default. SQL-based cleanup is proven to take considerably less time than Java-based cleanup.
See the metrics at Cleanup Performance Data.
Determining the Space Your Repositories Are Using
Use the following Support article to determine the space your repositories are using.
Investigating Blob Store and Repository Size and Space Usage
Replace the Following Tasks with Cleanup Policies
Maven - Delete unused SNAPSHOT
Repository - Delete unused components
Cleanup Criteria
The table below lists the available cleanup criteria and the formats to which they apply:
Format | Component Age | Component Usage | Release Type | Retain Select Versions | Asset Name Matcher |
---|---|---|---|---|---|
¹ - Cleanup only evaluates tagged manifests for Docker.
² - Bower functionality is for proxy repositories only.
Component Age (Days)
This criterion sets how long to keep content based on component age.
Proxy repositories: based on when the component was first downloaded
Hosted repositories: based on when the component was uploaded or updated
Component Usage (Days)
This criterion sets how long to keep content based on when a component was last downloaded. The published or updated dates are used when the component has never been downloaded.
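The fallback rule for Component Usage can be sketched as follows. This is an illustration of the rule as described, not the product's implementation; the function names are hypothetical, and whether a component exactly at the boundary qualifies is an assumption (this sketch treats the comparison as strict).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def usage_reference_date(last_downloaded: Optional[datetime],
                         published_or_updated: datetime) -> datetime:
    # Fall back to the published/updated date when there has been no download.
    return last_downloaded if last_downloaded is not None else published_or_updated


def matches_component_usage(days: int,
                            last_downloaded: Optional[datetime],
                            published_or_updated: datetime,
                            now: datetime) -> bool:
    # Assumption: strict comparison against the cutoff.
    cutoff = now - timedelta(days=days)
    return usage_reference_date(last_downloaded, published_or_updated) < cutoff


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
published = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent_download = datetime(2024, 5, 20, tzinfo=timezone.utc)
```

With a 90-day setting, the component published on 2024-01-01 matches when it has never been downloaded, but not when it was last downloaded on 2024-05-20.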
Release Type
Use to set the cleanup policy to either PRERELEASES or RELEASES. What counts as a prerelease differs by format:
Maven: Versions contain the -SNAPSHOT phrase
npm: Uses semantic versioning, where a version is a prerelease when it contains the dash "-" character
Yum: The non-case-sensitive "release" property in the RPM header contains one of the following: alpha, beta, rc, pre, prerelease, snapshot
Retain Select Versions (PostgreSQL Only)
Those using a PostgreSQL database have the option to exclude the most recent versions from the cleanup policy. Select the number of versions to keep even when matching other criteria.
Maven: The version number is used; available for the release type Releases
Docker: The age of the manifest is used.
Select the checkbox
Select the number of versions to keep
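The effect of retaining versions can be pictured as a filter applied after the other criteria have matched. The sketch below is illustrative only: the function name is hypothetical, and it assumes the caller supplies the versions already ordered newest first (by version number for Maven, by manifest age for Docker).

```python
from typing import List, Sequence


def apply_retention(versions_newest_first: Sequence[str],
                    matched: Sequence[str],
                    keep: int) -> List[str]:
    """Drop the `keep` most recent versions from the set matched for cleanup."""
    retained = set(versions_newest_first[:keep])
    return [version for version in matched if version not in retained]
```

For example, with versions 3.0, 2.0, 1.0, and 0.9 all matching the other criteria and two versions retained, only 1.0 and 0.9 remain selected for cleanup.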
Asset Name Matcher
Rules are based on the component name, namespace, or path in the repository. Supported regular expression patterns differ between the legacy OrientDB and the newer PostgreSQL and H2 environments.
When migrating to PostgreSQL or H2, legacy cleanup policies may result in more assets being removed than expected.
PostgreSQL and H2 Expressions
In H2 or PostgreSQL environments, the Asset Name Matcher uses Java regular expressions.
Not compatible with OrientDB Lucene regular expressions
Java regular expressions may match any part of the component path
When migrating to PostgreSQL, revise cleanup policies to include the leading slash in asset matcher names. Failure to do so may result in assets not being matched and cleaned up as expected.
OrientDB Expressions
Expressions in OrientDB use the Elasticsearch regular expression query syntax, from Apache Lucene.
Not compatible with Perl (PCRE) or Java util.regex.Pattern regular expressions
Expressions must match the entire name when wildcards are not used.
Asset names do not require a leading slash and use a limited set of operators
Asset matchers in OrientDB are different than the asset request path value used when evaluating content selector or routing rule expressions
Comparison between OrientDB and PostgreSQL
This example contains the following assets for consideration:
Pattern:
antlr.*
Repository:
/antlr/antlr/2.7.2/antlr-2.7.2.jar
/org/antlr/antlr-master/3.1.3/antlr-master-3.1.3.pom
OrientDB - the first component is matched while the second is not
H2 or PostgreSQL - both components are matched
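The difference comes down to whole-name matching versus matching anywhere in the path. The sketch below reproduces the comparison using Python's `re` module as a stand-in for both engines, which is an approximation: Lucene and Java regular expressions each have their own syntax. The function names are hypothetical.

```python
import re

PATTERN = "antlr.*"
ASSETS = [
    "/antlr/antlr/2.7.2/antlr-2.7.2.jar",
    "/org/antlr/antlr-master/3.1.3/antlr-master-3.1.3.pom",
]


def orientdb_style_match(pattern: str, path: str) -> bool:
    # Whole-name match against the asset name without its leading slash.
    return re.fullmatch(pattern, path.lstrip("/")) is not None


def sql_style_match(pattern: str, path: str) -> bool:
    # The expression may match any part of the path, leading slash included.
    return re.search(pattern, path) is not None


orientdb_results = [orientdb_style_match(PATTERN, p) for p in ASSETS]
sql_results = [sql_style_match(PATTERN, p) for p in ASSETS]
```

Here `orientdb_results` is `[True, False]` and `sql_results` is `[True, True]`, which is why an unrevised legacy pattern can select more assets after a migration.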
Expression Examples
The following examples demonstrate a specific asset name matcher against a repository and the remaining components after using the matcher in a cleanup policy.
Components in a version range
Pattern:
/hello/-/hello-0.0.[1-2].tgz
Repository:
/hello/-/hello-0.0.1.tgz
/hello/-/hello-0.0.2.tgz
/hello/-/hello-0.0.3.tgz
Remaining:
/hello/-/hello-0.0.3.tgz
Components with a specific path
Pattern:
/(org|com)/.*
Repository:
/org/example/test.jar
/com/example/test.jar
/test/example/test.jar
Remaining:
/test/example/test.jar
Components not belonging to a specific team
Pattern:
/org/sonatype/^(team2)/.*
Repository:
/org/sonatype/team1/ui/5.0/ui-5.0.jar
/org/sonatype/team2/format/1.0/format-1.0.jar
/org/sonatype/team3/database/10.0/database-10.0.jar
Remaining:
/org/sonatype/team2/format/1.0/format-1.0.jar
A specific component
Pattern:
/pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb
Repository:
/pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb
/pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb
/pool/main/z/zsh/zsh-common_5.4.2-3ubuntu3_all.deb
Remaining:
/pool/main/libc/libcap2/libcap2_2.25-1.2_amd64.deb
/pool/main/z/zsh/zsh_5.4.2-3ubuntu3_amd64.deb
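A matcher can be dry-run against a list of known paths before it is saved in a policy. The helper below is a hypothetical sketch that uses Python's `re.search` to approximate the "match any part of the path" behavior described for PostgreSQL and H2. Python and Java regular expressions agree on the simple constructs used in the version range, path, and single-component examples, but not on every feature.

```python
import re
from typing import Iterable, List


def remaining_after_cleanup(pattern: str, asset_paths: Iterable[str]) -> List[str]:
    """Return the paths that the matcher does NOT select, i.e. what would remain."""
    matcher = re.compile(pattern)
    return [path for path in asset_paths if not matcher.search(path)]


version_range = remaining_after_cleanup(
    "/hello/-/hello-0.0.[1-2].tgz",
    ["/hello/-/hello-0.0.1.tgz", "/hello/-/hello-0.0.2.tgz", "/hello/-/hello-0.0.3.tgz"],
)
specific_path = remaining_after_cleanup(
    "/(org|com)/.*",
    ["/org/example/test.jar", "/com/example/test.jar", "/test/example/test.jar"],
)
```

An unescaped dot matches any character, so a pattern meant to select exactly one file is stricter when its dots are written as `\.`.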