Repository Manager Concepts

In modern software development, managing your software efficiently and securely is crucial. A repository manager acts as a centralized location for storing, retrieving, and distributing software components and artifacts. It is your organization's system of record, or library of software, covering everything from compiled code to container images to third-party software.

Tip

A system of record is essentially the definitive, trusted source for specific pieces of data within an organization. It's where the most accurate and up-to-date version of that data resides.

This document explains the core concepts of repository managers, their role in DevOps pipelines, and best practices for secure artifact management.

What is a Repository Manager?

A repository manager is a dedicated server application that stores and manages binary software artifacts and dependencies used in your development process. Think of it as a private, curated app store for your organization's software components.

Eliminate shadow downloads with centralized artifact storage.
Simplify dependency resolution for both private and public components.
Cache remote artifacts to reduce build times and improve build performance.
Control access to artifacts and open-source components that present risk to the organization.
Integrate with DevOps pipelines for seamless artifact lifecycle management and distribution.

Instead of relying on scattered downloads or inconsistent internal practices, a repository manager provides a single source of truth for all your artifacts, including:

Open-source components: Libraries, frameworks, and other dependencies from public repositories like Maven Central, npm, PyPI, and NuGet.
Build artifacts: Outputs of your build process, ready for deployment.
Internal artifacts: Components developed within your organization, such as libraries, modules, and applications.
Third-party components: Software obtained from commercial vendors.

Why Use a Repository Manager?

The benefits of using a repository manager are numerous and impact every stage of the software development lifecycle:

Simple Dependency Management: Nexus Repository acts as a proxy for external repositories, caching frequently used components. This speeds up builds, reduces network traffic, and provides a consistent and reliable source for dependencies. By caching external dependencies, Nexus Repository insulates your builds from network outages or issues with external repositories.
Centralized Storage: A single repository for all your artifacts eliminates the chaos of managing dependencies across different locations. Integrate with your CI/CD pipeline to automate the deployment of artifacts to any environment.
Enhanced Security: Nexus Repository allows you to implement security policies and control access to artifacts. Prevent the use of vulnerable components and protect your software supply chain.
Designed for Automation: Nexus Repository provides a rich API that allows for automation and integration with other systems.

Components

A component is a package of resources that your software application uses (e.g., a library or a framework). Some examples of components include the following:

Java byte code in class files
C object files
text files (e.g., properties files, XML files, JavaScript code, HTML, CSS)
binary files such as images, PDF files, sound files

Components can come in numerous formats, including the following:

Java JAR, WAR, and EAR formats
plain ZIP or .tar.gz files
other package formats such as NuGet packages, Ruby gems, npm packages
executable formats, Android APK files, various installer formats

Components can be as complex as an entire application or as simple as a static resource; they can even comprise multiple nested components along with assets. For example, a Java web application may be packaged as a WAR component containing multiple JAR components and JavaScript libraries. These JARs and libraries are standalone components in other contexts, even though here they are included as part of the WAR component.

While we use the generic term "component" in Nexus Repository, components are also called artifacts, packages, bundles, archives, and other terms.

Each component is identified by a unique set of coordinates. For example, you may have heard of GAV (group, artifact, and version) coordinates for Maven; however, coordinate names and usage strategies vary between formats.

Assets

An asset is a single file associated with a component. Many formats have a one-to-one mapping for component to asset; however, more complex formats have numerous assets associated with a component. For example, a typical JAR component in a Maven repository is defined at least by the POM and the JAR files; each file as well as additional files (e.g., Javadoc, Sources JAR) is a separate asset belonging to the same component.
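The relationship between a component and its assets can be sketched as a small data model. This is only an illustration: the class names, field names, coordinates, and file sizes below are hypothetical and do not reflect how Nexus Repository stores data internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Asset:
    """A single file stored in a repository (hypothetical model)."""
    path: str
    size_bytes: int


@dataclass
class Component:
    """A component identified by coordinates, owning one or more assets."""
    group: str
    name: str
    version: str
    assets: List[Asset] = field(default_factory=list)

    def total_size(self) -> int:
        # The component's footprint is the sum of its assets' sizes.
        return sum(asset.size_bytes for asset in self.assets)


# Illustrative Maven-style component: the POM, the JAR, and a sources JAR
# are three separate assets that belong to the same component.
component = Component(
    group="com.example",
    name="my-lib",
    version="1.0.0",
    assets=[
        Asset("com/example/my-lib/1.0.0/my-lib-1.0.0.pom", 1_200),
        Asset("com/example/my-lib/1.0.0/my-lib-1.0.0.jar", 48_000),
        Asset("com/example/my-lib/1.0.0/my-lib-1.0.0-sources.jar", 30_000),
    ],
)
print(len(component.assets), component.total_size())
```

A format with a one-to-one mapping would simply have a single entry in the assets list.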

In the Docker format, assets have unique identifiers called Docker layers. You can reuse these assets for different components (i.e., Docker images).

Components in Repositories

The open source community as well as proprietary vendors are continually creating new components. For example, there are libraries and frameworks written in various languages on different platforms that developers use for application development every day. Developers typically build applications for a specific domain by combining multiple components' features with their own custom components containing their application code.

To make consumption and usage easier, components are aggregated into collections called repositories; these are typically available on the Internet as a service. Different platforms may use terms such as "registry" and others to refer to repositories.

Examples of repositories available on the Internet as a service include the following:

Central Repository, also known as Maven Central
NuGet Gallery
RubyGems.org
npmjs.org
Docker Hub

Numerous tools like the following access components in these repositories:

package managers like npm, nuget, gem
build tools such as Maven, Gradle, rake, grunt
IDEs such as Eclipse, IntelliJ, Visual Studio

Repository Formats

Different repositories use different technologies to store and expose the components in them to client tools. These technologies define a repository format that is closely related to the tools interacting with the repository.

For example, the Maven repository format relies on a specific directory structure defined by the components' identifiers and several XML-formatted files for metadata. Component interaction is performed via plain HTTP commands and some additional custom interaction with the XML files.
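As a rough illustration of how the directory structure follows from a component's identifiers, the sketch below builds the path of a release artifact from its coordinates using the standard Maven layout. The function name and parameters are hypothetical helpers, and snapshot versions, which use timestamped file names, are deliberately ignored.

```python
from typing import Optional


def maven_path(group_id: str, artifact_id: str, version: str,
               extension: str = "jar",
               classifier: Optional[str] = None) -> str:
    """Build the repository-relative path for a release artifact."""
    # Dots in the group identifier become directory separators.
    group_dir = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}"
    if classifier:
        # Classifiers (e.g., "sources") are appended to the file name.
        file_name += f"-{classifier}"
    return f"{group_dir}/{artifact_id}/{version}/{file_name}.{extension}"


def maven_metadata_path(group_id: str, artifact_id: str) -> str:
    """Path of the XML metadata file that lists available versions."""
    return f"{group_id.replace('.', '/')}/{artifact_id}/maven-metadata.xml"


print(maven_path("com.example", "my-lib", "1.0.0"))
print(maven_path("com.example", "my-lib", "1.0.0", "pom"))
print(maven_path("com.example", "my-lib", "1.0.0", classifier="sources"))
print(maven_metadata_path("com.example", "my-lib"))
```

A client tool can therefore retrieve any of these files with an HTTP GET against the repository URL followed by the computed path.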

Other repository formats use databases for storage and REST API interactions or different directory structures with format-specific files for the metadata.

Docker Components

Components in a Docker repository are formally called "Docker images." An image contains a single asset: a tagged manifest file.

A manifest can either reference a set of layers that make up an image or a set of manifests that are not associated with a component. A tagged manifest is an asset that represents a component.
See the Docker documentation on manifests
A Docker tag is an alias pointing at a manifest (i.e., the name attribute of the component).
A layer is unique and only stored once in a repository; however, many manifests are able to reference one layer. A Docker layer, for example, could be a specific operating system referenced by multiple Docker images.
Tags, manifests, and layers are each assets inside a blob store.

Docker Image Space Consumption

Blob store space is almost entirely consumed by Docker layers while manifests and tags consume comparatively little space.

To view the size of a specific asset via the Nexus Repository user interface, do the following:

Navigate to Search → Docker
Select a specific image (component); assets within that component display in a list.
Select a specific asset to view its summary. The file size field displays this asset's size.

A crude way to estimate the total physical size of an image's layers is to download the image manifest file and add up the individual layer sizes listed in it. The result is roughly the amount of disk space needed to store that single image on the Nexus Repository side. However, since images share layers, it would not be accurate to add up these per-image totals and conclude that the sum is how much storage a Nexus Repository blob store needs. Layers shared with other images would be counted more than once.

An image manifest asset lists each layer size inside the manifest file. It is normal for the total of all sizes listed in a manifest to differ from what docker image ls prints after the image is pulled from Docker.
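The manifest-based estimate described above can be sketched as follows. The two manifests are made-up examples that keep only the "layers" list with its "digest" and "size" fields, as found in Docker image manifests; the digests, sizes, and function names are illustrative assumptions, not output from a real registry.

```python
from typing import Dict, List


def image_size(manifest: dict) -> int:
    """Sum the sizes of all layers listed in one image manifest."""
    return sum(layer["size"] for layer in manifest["layers"])


def deduplicated_size(manifests: List[dict]) -> int:
    """Sum layer sizes across images, counting each digest only once."""
    unique_layers: Dict[str, int] = {}
    for manifest in manifests:
        for layer in manifest["layers"]:
            unique_layers[layer["digest"]] = layer["size"]
    return sum(unique_layers.values())


# Both hypothetical images are built on the same base layer.
manifest_a = {"schemaVersion": 2, "layers": [
    {"digest": "sha256:base", "size": 30_000_000},
    {"digest": "sha256:app-a", "size": 5_000_000},
]}
manifest_b = {"schemaVersion": 2, "layers": [
    {"digest": "sha256:base", "size": 30_000_000},
    {"digest": "sha256:app-b", "size": 7_000_000},
]}

naive_total = image_size(manifest_a) + image_size(manifest_b)
shared_total = deduplicated_size([manifest_a, manifest_b])
print(naive_total, shared_total)
```

In this example the naive sum counts the shared base layer twice, while the deduplicated figure counts it once, which is closer to how a blob store holds layers.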

The Docker V2 API implements resumable uploads. Uploads can be abandoned during normal use, which can leave large files dangling in temporary blob store files.

Proxy Repositories

Dependency managers use components from public open-source repositories to build applications. They use a local cache to reduce the requests and bandwidth used to download components from the remote repository. This speeds up build times and keeps the final output consistent.

This introduces risk as the local cache is not centrally managed. In a modern DevOps pipeline, organizations have multiple build servers and development teams. Managing multiple component caches is not efficient or cost-effective. Routing traffic through a proxy repository is a primary use case when using a universal artifact repository like Nexus Repository.

A proxy repository is a substitute access point and managed cache for a remote repository. The remote may be a public repository for open-source components or a private repository, such as another Nexus Repository instance. A proxy responds in the same way as the remote repository does while allowing your organization to centrally manage the cache, ensuring your dependencies are always available, and greatly reducing traffic to external services.

Here is a simple example of the proxy in action.

Your build requests a component from the proxy repository.
When the component is not cached, the request is forwarded to the remote repository, downloaded, and cached to the local storage.
The component is then forwarded to your build.

Future requests for the same component skip the external request and immediately deliver the component.
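The request flow above can be sketched as a small in-memory cache in front of a remote fetch function. This is a minimal illustration under stated assumptions, not how Nexus Repository is implemented: the class, the stand-in remote function, and the component path are hypothetical, and cache expiry is omitted.

```python
from typing import Callable, Dict


class CachingProxy:
    """Serve components from a local cache, fetching remotely on a miss."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._cache: Dict[str, bytes] = {}
        self.remote_requests = 0  # counts how often the remote was contacted

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: forward the request and store the result locally.
            self.remote_requests += 1
            self._cache[path] = self._fetch_remote(path)
        # Cache hit (or freshly cached): deliver from local storage.
        return self._cache[path]


def fake_remote(path: str) -> bytes:
    """Stand-in for an HTTP download from the remote repository."""
    return f"contents of {path}".encode()


proxy = CachingProxy(fake_remote)
path = "com/example/my-lib/1.0.0/my-lib-1.0.0.jar"
first = proxy.get(path)   # forwarded to the remote, then cached
second = proxy.get(path)  # served from the cache
print(proxy.remote_requests)
```

Both calls return the same bytes, but only the first one reaches the remote repository.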

Routing Rules

Routing rules limit the requests a proxy sends to external repositories to only the artifacts that should come from that proxy. They help block dependency-confusion-style attacks against an organization's internal namespace. They also speed up access times for group repositories by limiting requests to only the proxies where the components should be fetched.

Use routing rules on all proxy repositories.
BLOCK access to your component namespaces to prevent dependency confusion attacks.
ALLOW access to only the components needed from the public repository.
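The BLOCK and ALLOW behavior can be sketched as a list of regular expressions evaluated against the request path. This is an illustration only: the class, the assumption that a matcher must match the whole path, and the example namespaces are hypothetical, so check the product documentation for the exact matching semantics.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RoutingRule:
    mode: str            # "ALLOW" or "BLOCK"
    matchers: List[str]  # regular expressions tested against the request path

    def __post_init__(self) -> None:
        if self.mode not in ("ALLOW", "BLOCK"):
            raise ValueError(f"unknown mode: {self.mode}")

    def permits(self, path: str) -> bool:
        """Return True if the request may be sent to the remote repository."""
        matched = any(re.fullmatch(m, path) for m in self.matchers)
        # ALLOW: only matching paths pass. BLOCK: matching paths are refused.
        return matched if self.mode == "ALLOW" else not matched


# BLOCK rule: never ask the public repository for the internal namespace.
block_internal = RoutingRule("BLOCK", [r"/com/example/internal/.*"])
print(block_internal.permits("/com/example/internal/billing/1.0/billing-1.0.jar"))
print(block_internal.permits("/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar"))

# ALLOW rule: only fetch a known set of namespaces from this proxy.
allow_apache = RoutingRule("ALLOW", [r"/org/apache/.*"])
print(allow_apache.permits("/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar"))
print(allow_apache.permits("/com/other/tool/2.0/tool-2.0.jar"))
```

With the BLOCK rule in place, a request for an internal component name is never forwarded to the public repository, so a malicious package published there under the same name cannot be served.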

Remote Teams

Development teams are often located across the globe. Remote teams may choose to run a local repository manager to proxy components from a centrally managed server. When building artifacts remotely they may also want to set up a bidirectional flow of artifacts using a combination of proxies and hosted repositories. Here are some considerations to keep in mind.

The central hub-and-spoke model, where built artifacts are written to a central server, is far easier to manage and scale than a bidirectional configuration. In a bidirectional setup, every additional remote location must be connected to every existing one, so the number of connections and proxies grows quadratically rather than linearly. Such setups are also more complicated to back up and to recover after any form of outage.
Avoid large group repositories with hundreds of remote proxies as members. They severely degrade performance because the group may need to check each remote repository for every request. Limit each group repository to the proxies needed for the build, and use a single hosted repository for teams where possible.
When proxying remote group repositories, it is easy to create circular references. This happens when a proxy points at a remote group repository that itself includes a proxy back to the source repository.
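One way to catch circular references before they reach production is to model the repositories as a directed graph and search it for cycles. The sketch below is a generic depth-first search, not a Nexus Repository feature; the repository names and the shape of the topology dictionary are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(references: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular chain of repository references, or None."""
    visiting: List[str] = []  # repositories on the current search path
    done = set()              # repositories already proven cycle-free

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # We came back to a repository on the current path: a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for target in references.get(node, []):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in references:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Each key references its group members or, for a proxy, its remote.
bidirectional = {
    "site-a/group": ["site-a/hosted", "site-a/proxy-of-b"],
    "site-a/proxy-of-b": ["site-b/group"],
    "site-b/group": ["site-b/hosted", "site-b/proxy-of-a"],
    "site-b/proxy-of-a": ["site-a/group"],
}
hub_and_spoke = {
    "spoke/group": ["spoke/proxy-of-hub"],
    "spoke/proxy-of-hub": ["hub/group"],
    "hub/group": ["hub/hosted"],
}
print(find_cycle(bidirectional))
print(find_cycle(hub_and_spoke))
```

The bidirectional topology yields a chain that starts and ends at the same group, while the hub-and-spoke topology yields None.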

Clean Up Your Proxies

Unless you start with a clear onboarding process, strict RBAC roles, and good documentation, you may accumulate proxy repositories over time and no longer remember why they exist or who uses them. Every proxy in a group repository adds time to dependency resolution and overhead to the server. Regularly review whether each proxy is still needed and take proxies offline when they are not actively being used.

Use custom group repositories for teams that need additional proxies.
Remote proxies should use HTTPS to protect against man-in-the-middle attacks.
Periodically audit each proxy to confirm that its remote URL is still valid and that its remote authentication is correct.
If a proxy will be permanently offline, consider exporting its contents and reimporting them into a third-party hosted repository.
Avoid duplicate proxies to the same URL. Proxies can be reused by multiple teams in group repositories.
The order of repositories in group repositories matters. Keep the local hosted repositories first. Searching the hosted repositories is far quicker and keeps you from looking for your internal components in external proxies.
Review proxy caching intervals to improve response times.
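The effect of member ordering in a group repository can be sketched with a simplified lookup that walks the members in order and stops at the first hit. The member names, their contents, and the resolve helper are hypothetical; a real group repository has additional behavior, such as routing rules and caching.

```python
from typing import List, Optional, Set, Tuple

Member = Tuple[str, Set[str]]  # (repository name, paths it can serve)


def resolve(path: str, members: List[Member]) -> Tuple[Optional[str], int]:
    """Return the member that serves the path and how many were checked."""
    for checked, (name, contents) in enumerate(members, start=1):
        if path in contents:
            return name, checked
    return None, len(members)


hosted = ("internal-hosted", {"com/example/app/1.0/app-1.0.jar"})
proxy_one = ("public-proxy-1", {"org/apache/lib/2.0/lib-2.0.jar"})
proxy_two = ("public-proxy-2", {"org/other/tool/3.0/tool-3.0.jar"})

internal_path = "com/example/app/1.0/app-1.0.jar"

# Hosted repository first: the internal component is found immediately.
print(resolve(internal_path, [hosted, proxy_one, proxy_two]))

# Proxies first: both external proxies are consulted before the hosted one.
print(resolve(internal_path, [proxy_one, proxy_two, hosted]))
```

In the second ordering the internal component is still found, but only after two proxies have been asked for it, which costs time and exposes internal component names to external lookups.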