Proxy Repository Concepts
Explain the value of proxy repositories
Discuss audit guidelines for proxy repositories
Share best practices when using proxies at scale
What is a proxy repository?
Dependency managers use components from public open-source repositories such as Central, npm, and PyPI to build applications. They often check a local cache first to see whether a component already exists, which reduces the number of requests and the bandwidth used to download components from the remote repository. This speeds up build times and keeps the final output consistent; however, it can introduce risk because the local cache is not centrally managed or consistent everywhere the application is built. In a modern DevOps pipeline, organizations have multiple build servers and development teams, so managing many separate component caches is neither efficient nor cost-effective. Redirecting traffic through proxy repositories is a primary use case for a universal artifact repository like Repository Manager.
A proxy repository is a substitute access point and managed cache for remote repositories. These can be public repositories for open-source components or private repositories, such as another Nexus Repository instance. They respond in exactly the same way as the public repository would; however, they allow your organization to centrally manage the cache, ensure your dependencies are always available, and greatly reduce traffic to external servers. Here is a simple example of the proxy in action.
Your build makes a request for a component from the proxy repository.
If the component is not cached, the proxy forwards the request to the remote repository, downloads the component, and caches it in local storage.
The component is then forwarded to your build.
Future requests for the same component skip the external request and immediately deliver the component.
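The four steps above can be sketched as a small in-memory model. This is only an illustration of the cache-then-forward behavior, not how Repository Manager is implemented; the class name, the fake_remote function, and the component path are all hypothetical.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of a proxy repository: a managed cache in front of a remote."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote    # stands in for the request to the remote
        self._cache: Dict[str, bytes] = {}   # stands in for local storage
        self.remote_requests = 0             # requests that actually left the proxy

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: forward to the remote, then store the result locally.
            self.remote_requests += 1
            self._cache[path] = self._fetch_remote(path)
        # Serve from local storage (either a hit or the freshly cached copy).
        return self._cache[path]


def fake_remote(path: str) -> bytes:
    # Hypothetical remote repository; a real one would be reached over HTTPS.
    return f"contents of {path}".encode()


proxy = CachingProxy(fake_remote)
first = proxy.get("org/example/lib/1.0/lib-1.0.jar")   # forwarded and cached
second = proxy.get("org/example/lib/1.0/lib-1.0.jar")  # served from the cache
```

After both calls, proxy.remote_requests is 1: only the first request reached the remote.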
As part of a Nexus Repository Manager Health check workshop, we review the server configuration and provide detailed recommendations on how to better optimize your repository instance. There are a number of recommendations regarding proxy repositories that are fairly universal to any production instance. We will go through the most common ones here.
Routing Rules
Routing rules are useful tools that are often overlooked. Administrators use routing rules to limit the requests a proxy sends to its remote repository to only the artifacts needed from that remote. Routing rules can help protect against dependency-confusion-style attacks on an organization’s internal namespace. They also speed up access times for group repositories by limiting requests to only the proxies where the components should be fetched.
Recommendations
It is a best practice to use routing rules on all proxy repositories.
For public repositories, BLOCK access to internal namespaces to prevent dependency confusion attacks.
For other repositories, only ALLOW access to the namespaces that are required from that repository.
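The BLOCK and ALLOW recommendations above can be illustrated with a minimal sketch. It assumes a rule is a mode plus a list of regular expressions that must match the whole request path; the RoutingRule class, the namespaces, and the paths are hypothetical and are not the product's API.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RoutingRule:
    mode: str            # "BLOCK" or "ALLOW"
    matchers: List[str]  # regular expressions tested against the request path

    def __post_init__(self) -> None:
        if self.mode not in ("BLOCK", "ALLOW"):
            raise ValueError(f"unknown mode: {self.mode}")

    def permits(self, path: str) -> bool:
        # Assumption: a matcher must match the entire path, not a substring.
        matched = any(re.fullmatch(m, path) for m in self.matchers)
        # BLOCK rejects matching paths; ALLOW accepts only matching paths.
        return not matched if self.mode == "BLOCK" else matched


# Public proxy: never ask the outside world for the internal namespace.
public_rule = RoutingRule("BLOCK", ["/com/example/internal/.*"])

# Partner proxy: only ask for the namespace that partner actually provides.
partner_rule = RoutingRule("ALLOW", ["/com/partner/.*"])

internal_path = "/com/example/internal/app/1.0/app-1.0.jar"
open_source_path = "/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar"
partner_path = "/com/partner/sdk/2.0/sdk-2.0.jar"
```

With these rules, public_rule.permits(internal_path) is False, so a request for an internal component is never forwarded to the public remote, while partner_rule.permits(open_source_path) is False, so the partner proxy is skipped for everything outside its namespace.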
Remote Teams
Development teams are often located across the globe. Remote teams may choose to run a local repository manager to proxy components from a centrally managed server. When building artifacts remotely, they may also want to set up a bidirectional flow of artifacts using a combination of proxies and hosted repositories. Here are some considerations to keep in mind.
Recommendations
The central hub-and-spoke model, where built artifacts are written to a central server, is far easier to manage and scale than a bidirectional configuration. In a bidirectional setup, each additional remote location must be connected to every existing location, so the number of connections and proxies grows quadratically rather than by one per site. Such setups are also more complicated to back up and to recover after any form of outage.
Avoid large group repositories (hundreds of members) in which many remote proxies are configured. They significantly degrade performance because the group may need to check each remote repository for every request. Keep group repositories to only the proxies that are needed for the build, and use a single hosted repository for teams where possible.
When proxying remote group repositories, it is easy to create circular references. This happens when a proxy points at a remote group repository that itself includes a proxy back to the source repository.
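A circular reference of this kind can be spotted by treating each repository as a node and each "group contains member" or "proxy points at remote" relationship as an edge, then looking for a cycle. The sketch below uses a depth-first search; the topology and repository names are hypothetical, and in practice the edges would have to be collected from each server's configuration.

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end), or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    stack: List[str] = []  # current path from the start node

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        stack.append(node)
        for nxt in edges.get(node, []):
            if state.get(nxt, UNSEEN) == IN_PROGRESS:
                # nxt is already on the current path: we have looped back to it.
                return stack[stack.index(nxt):] + [nxt]
            if state.get(nxt, UNSEEN) == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = DONE
        return None

    for start in edges:
        if state.get(start, UNSEEN) == UNSEEN:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Hypothetical two-site setup: each site's group includes a proxy of the other's group.
topology = {
    "hub/group-all": ["hub/hosted-releases", "hub/proxy-of-site-a"],
    "hub/proxy-of-site-a": ["site-a/group-all"],
    "site-a/group-all": ["site-a/hosted-releases", "site-a/proxy-of-hub"],
    "site-a/proxy-of-hub": ["hub/group-all"],
}
cycle = find_cycle(topology)
```

Here cycle lists the loop from hub/group-all through site-a and back. Pointing site-a/proxy-of-hub at hub/hosted-releases instead of the hub's group would break the loop.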
Audit your Proxies
Unless you start out with a clear onboarding process, strict RBAC roles, and good documentation, you may accumulate a number of proxies over time and not remember why they exist or who uses them. Every proxy in a group repository adds time to resolve dependencies and overhead to the server. The best practice is to regularly review whether each proxy is still needed and to take proxies offline when they are not actively being used.
Recommendations
Consider custom group repositories for teams that need additional proxies.
Remote proxies should use HTTPS to protect against man-in-the-middle attacks.
Periodically audit each proxy's remote URL to confirm that it is still valid and that its remote authentication is correct.
If a proxy will be permanently offline, consider exporting and reimporting it as a 3rd-party hosted repository.
Avoid duplicate proxies to the same URL. Proxies can be reused by multiple teams in group repositories.
The order of repositories in group repositories matters. Keep the local hosted repositories first. Searching the hosted repositories is far quicker and keeps you from looking for your internal components in external proxies.
Review proxy caching intervals to improve response times.
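Several of these checks are mechanical enough to script. The following is a minimal sketch that flags non-HTTPS remotes, duplicate remote URLs, and groups where a hosted repository is listed after a proxy. The Repo data model and the sample repositories are hypothetical; a real audit would load this information from the server's configuration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlsplit


@dataclass
class Repo:
    name: str
    kind: str                                          # "hosted", "proxy", or "group"
    remote_url: str = ""                               # proxies only
    members: List[str] = field(default_factory=list)   # groups only, in search order


def audit(repos: List[Repo]) -> List[str]:
    findings: List[str] = []
    by_name = {r.name: r for r in repos}
    by_url: Dict[str, List[str]] = defaultdict(list)

    for r in repos:
        if r.kind == "proxy":
            if urlsplit(r.remote_url).scheme != "https":
                findings.append(f"{r.name}: remote URL is not HTTPS")
            # Treat URLs that differ only by a trailing slash as the same remote.
            by_url[r.remote_url.rstrip("/")].append(r.name)

    for url, names in by_url.items():
        if len(names) > 1:
            findings.append(f"duplicate proxies for {url}: {', '.join(names)}")

    for r in repos:
        if r.kind == "group":
            kinds = [by_name[m].kind for m in r.members if m in by_name]
            # Hosted members should come before the first proxy in the group.
            if "proxy" in kinds and "hosted" in kinds[kinds.index("proxy"):]:
                findings.append(f"{r.name}: hosted member listed after a proxy")

    return findings


# Sample data for illustration only.
repos = [
    Repo("releases", "hosted"),
    Repo("central", "proxy", "https://repo1.maven.org/maven2/"),
    Repo("central-copy", "proxy", "https://repo1.maven.org/maven2"),
    Repo("legacy", "proxy", "http://repo.example.com/maven"),
    Repo("public", "group", members=["central", "releases", "legacy"]),
]
findings = audit(repos)
```

For the sample data, findings contains three entries: the plain-HTTP legacy proxy, the duplicated Central proxies, and the public group, whose hosted releases repository sits behind a proxy in the search order.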