Backup and Restore in Amazon Web Services

This section covers recovery from backup in Amazon Web Services (AWS). While the details here are specific to AWS, a similarly effective deployment model is likely possible with other cloud vendors. As a best practice, you should also examine current AWS documentation before implementing your own deployment to validate that AWS services have not changed in a way that will impact your needs and to estimate deployment costs. 

Planning Your Backup Strategy

When determining your backup strategy, consider the following:

  • Your organization’s data retention policy
  • Your budget
  • The scenarios against which you are trying to protect, such as the following:
    • AWS outages
    • Data corruption
    • Inadvertent large-scale deletes
    • Deployment errors 
    • Cyber attacks 

You will need to account for these areas of your Nexus Repository deployment when creating a backup strategy:

  • The database
  • The blob store(s)
  • The file system

While backed up independently, the database does include references to objects in the blob store(s), so you should ensure that the backups of both occur around the same time to minimize the drift between the two.
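
As a rough illustration of keeping drift in check, the sketch below compares the completion times of a database backup and a blob store backup. The function names and the 15-minute default tolerance are assumptions made for this example; choose a tolerance that matches your own RPO.

```python
from datetime import datetime, timedelta, timezone


def backup_drift(db_backup_time: datetime, blob_backup_time: datetime) -> timedelta:
    """Return the absolute time gap between the two backups."""
    return abs(db_backup_time - blob_backup_time)


def within_drift_tolerance(
    db_backup_time: datetime,
    blob_backup_time: datetime,
    max_drift: timedelta = timedelta(minutes=15),  # assumed tolerance, not a Sonatype value
) -> bool:
    """Return True when the two backups are close enough to restore together."""
    return backup_drift(db_backup_time, blob_backup_time) <= max_drift


# Example: a database snapshot at 02:00 UTC and a blob store backup at 02:10 UTC.
db_time = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
blob_time = datetime(2024, 1, 1, 2, 10, tzinfo=timezone.utc)
print(backup_drift(db_time, blob_time))            # 0:10:00
print(within_drift_tolerance(db_time, blob_time))  # True
```

A check like this could run after each backup cycle and raise an alert when the two backups fall too far apart.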

As you review the backup strategies in the following sections, there are two important terms to remember:

  • Recovery Point Objective (RPO) - the maximum amount of data loss, measured as a span of time before the incident, that is acceptable if a restore becomes necessary
  • Recovery Time Objective (RTO) - the length of time required to restore the service 

Database Backups 

The backup method described here requires an external database. If you are using H2 or Orient, you should shut down the instance before taking the snapshot to avoid capturing the database in an inconsistent state.

You can configure Amazon’s Relational Database Service (RDS) to automatically snapshot RDS instances as frequently as every 5 minutes; you can also create a manual snapshot on demand. It is important to configure the retention policy for these snapshots to meet your data retention requirements.
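
The sketch below shows one way to set the automated backup retention period and take a manual snapshot using the boto3 RDS client. It assumes boto3 is installed and credentials are configured; the helper names and the "nexus-db" instance identifier are hypothetical. The client is passed in as a parameter so the helpers can be exercised without touching AWS.

```python
from datetime import datetime, timezone


def set_backup_retention(rds_client, instance_id, retention_days):
    """Set how many days RDS keeps automated backups for an instance."""
    # RDS accepts 0-35 days; 0 disables automated backups, so reject it here.
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        BackupRetentionPeriod=retention_days,
        ApplyImmediately=True,
    )


def take_manual_snapshot(rds_client, instance_id, now=None):
    """Create an on-demand snapshot and return its identifier."""
    now = now or datetime.now(timezone.utc)
    snapshot_id = f"{instance_id}-manual-{now:%Y%m%d-%H%M%S}"
    rds_client.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    return snapshot_id


# Usage (hypothetical instance name):
#   import boto3
#   rds = boto3.client("rds")
#   set_backup_retention(rds, "nexus-db", 14)
#   take_manual_snapshot(rds, "nexus-db")
```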

RDS provides a mechanism for restoring to a specific point in time. Depending upon the nature of the incident, you may still be able to recover Nexus Repository content (not configuration) added after the incident using the Repair - reconcile component database from blob store task.
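
A point-in-time restore could be scripted along the following lines with the boto3 RDS client. This is a minimal sketch: the helper name and instance identifiers are hypothetical. Note that RDS restores into a new instance rather than overwriting the source, so Nexus Repository would then need to be pointed at the restored instance.

```python
from datetime import datetime, timezone


def restore_to_point_in_time(rds_client, source_id, target_id, restore_time=None):
    """Restore source_id into a new instance named target_id.

    With no restore_time, RDS uses the latest restorable time.
    """
    params = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        if restore_time.tzinfo is None:
            raise ValueError("restore_time must be timezone-aware")
        # RestoreTime and UseLatestRestorableTime are mutually exclusive.
        params["RestoreTime"] = restore_time.astimezone(timezone.utc)
    return rds_client.restore_db_instance_to_point_in_time(**params)


# Usage (hypothetical identifiers):
#   import boto3
#   rds = boto3.client("rds")
#   just_before_incident = datetime(2024, 1, 1, 1, 55, tzinfo=timezone.utc)
#   restore_to_point_in_time(rds, "nexus-db", "nexus-db-restored", just_before_incident)
```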

If you have a support contract, contact Sonatype support for assistance before using any repair task.

You can also configure RDS to automatically replicate backup snapshots to a distinct region. This allows you to restore Nexus Repository to a secondary AWS region in the event of a complete failure of the deployed region. When using this replication, you must also replicate the blob storage to the same region as the RDS backup.
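
One possible way to enable this replication is the boto3 call start_db_instance_automated_backups_replication, issued against an RDS client created in the destination region. The sketch below rests on assumptions: the helper names, regions, account ID, and 7-day retention are illustrative only. Encrypted instances also need a KMS key in the destination region.

```python
def rds_instance_arn(region, account_id, instance_id):
    """Build the ARN of an RDS DB instance."""
    return f"arn:aws:rds:{region}:{account_id}:db:{instance_id}"


def replicate_automated_backups(destination_rds_client, source_arn,
                                retention_days=7, kms_key_id=None):
    """Start replicating a source instance's automated backups.

    destination_rds_client must be created in the region that will
    receive the replicated backups.
    """
    params = {
        "SourceDBInstanceArn": source_arn,
        "BackupRetentionPeriod": retention_days,
    }
    if kms_key_id is not None:
        # KMS key in the destination region, needed for encrypted instances.
        params["KmsKeyId"] = kms_key_id
    return destination_rds_client.start_db_instance_automated_backups_replication(**params)


# Usage (hypothetical regions, account, and instance):
#   import boto3
#   dest = boto3.client("rds", region_name="us-west-2")
#   arn = rds_instance_arn("us-east-1", "123456789012", "nexus-db")
#   replicate_automated_backups(dest, arn, retention_days=14)
```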

Snapshots of an RDS instance are associated with that instance. In some scenarios, the snapshots may be deleted automatically when an RDS instance is removed. To avoid this, you should periodically export database snapshot data to S3. These exported snapshots are also important should access to the database be maliciously compromised.
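
A periodic export could be driven by the boto3 start_export_task call, as in this sketch. The helper name, bucket, prefix, and ARNs are hypothetical. The IAM role must be able to write to the bucket, and a KMS key is required to encrypt the exported data.

```python
def export_snapshot_to_s3(rds_client, task_id, snapshot_arn, bucket,
                          iam_role_arn, kms_key_id, prefix="rds-exports"):
    """Start an export of an RDS snapshot to an S3 bucket."""
    return rds_client.start_export_task(
        ExportTaskIdentifier=task_id,   # must be unique per export
        SourceArn=snapshot_arn,         # ARN of the snapshot to export
        S3BucketName=bucket,
        S3Prefix=prefix,
        IamRoleArn=iam_role_arn,        # role RDS assumes to write to S3
        KmsKeyId=kms_key_id,            # key used to encrypt the export
    )


# Usage (all identifiers are placeholders):
#   import boto3
#   rds = boto3.client("rds")
#   export_snapshot_to_s3(
#       rds,
#       task_id="nexus-db-export-20240101",
#       snapshot_arn="arn:aws:rds:us-east-1:123456789012:snapshot:nexus-db-manual-20240101",
#       bucket="example-nexus-db-exports",
#       iam_role_arn="arn:aws:iam::123456789012:role/example-rds-export-role",
#       kms_key_id="arn:aws:kms:us-east-1:123456789012:key/example-key-id",
#   )
```

Keeping the export bucket in a separate account with restricted access is one way to protect these copies if the primary account is compromised.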

Blob Store Backups 

Note that you can't use all file systems for all purposes; consult the System Requirements for more information.

Amazon S3’s standard storage classes largely provide redundancy across at least three availability zones (AZs). When used for Nexus Repository’s blob storage in conjunction with an available RDS database, this allows an RTO on the order of minutes from detecting an AWS AZ outage.

Amazon’s S3 service also supports cross-region replication. AWS documentation states that replication typically occurs within 15 minutes; however, some objects may take longer. This means you can expect an average RPO of 15 minutes or less for blobs, and all blobs should eventually be recoverable.
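
As an illustration, a replication rule covering an entire blob store bucket might be built and applied with boto3 as sketched below. The helper names, rule ID, bucket names, and ARNs are assumptions made for this example. S3 replication requires versioning to be enabled on both the source and destination buckets.

```python
def build_replication_config(role_arn, destination_bucket_arn,
                             rule_id="nexus-blobstore-replication"):
    """Build a replication configuration that covers every object in the bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


def enable_replication(s3_client, source_bucket, role_arn, destination_bucket_arn):
    """Apply the replication configuration to the source bucket."""
    return s3_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, destination_bucket_arn),
    )


# Usage (placeholder names):
#   import boto3
#   s3 = boto3.client("s3")
#   enable_replication(
#       s3,
#       "example-nexus-blobs",
#       "arn:aws:iam::123456789012:role/example-s3-replication-role",
#       "arn:aws:s3:::example-nexus-blobs-replica",
#   )
```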

File System Backups 

If using EBS, point-in-time snapshots can help provide multi-AZ redundancy. While an EBS volume resides in only one AZ, its snapshots are stored in multiple AZs. In the event of an AZ failure, you could use EBS snapshots to provision a new EBS volume in another AZ. However, this is not automatic; you will need to implement a process such as specifying the EBS snapshot ID in the EC2 launch configuration. See the AWS documentation for more information.

Amazon EBS also offers a higher-durability volume type, io2, that is designed to provide 99.999% durability with an annual failure rate (AFR) of 0.001%, where failure refers to a complete or partial loss of the volume.
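
The manual recovery path for EBS, taking a snapshot and later creating a volume from it in a different AZ, can be sketched with the boto3 EC2 client as follows. The helper names, volume ID, AZ, and the gp3 default volume type are assumptions made for illustration. Attaching the new volume to a replacement EC2 instance is left out.

```python
def snapshot_volume(ec2_client, volume_id,
                    description="Nexus Repository file system backup"):
    """Create a point-in-time snapshot of an EBS volume and return its ID."""
    response = ec2_client.create_snapshot(VolumeId=volume_id, Description=description)
    return response["SnapshotId"]


def restore_volume_in_az(ec2_client, snapshot_id, availability_zone, volume_type="gp3"):
    """Create a new EBS volume from a snapshot in the given AZ and return its ID."""
    response = ec2_client.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,  # may differ from the original volume's AZ
        VolumeType=volume_type,
    )
    return response["VolumeId"]


# Usage (placeholder IDs):
#   import boto3
#   ec2 = boto3.client("ec2")
#   snap_id = snapshot_volume(ec2, "vol-0123456789abcdef0")
#   # Later, after an outage in the original AZ:
#   new_volume_id = restore_volume_in_az(ec2, snap_id, "us-east-1b")
```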

Amazon EFS Standard is inherently designed to protect against losing an entire AZ. 

Nexus Repository also stores Elasticsearch indexes on the local disk. Snapshots of the file system do not guarantee a consistent index, but you can rebuild this while Nexus Repository is online.

Expected Results

The table below outlines various recovery methods and the RPO and RTO you can expect to achieve given your selected database, blob storage, and file system. The expected RTO does not include the time required to rebuild the Elasticsearch index. Primary Nexus Repository services are expected to be functional while this occurs; even large instances typically rebuild the index in less than 1 hour.

Recovery Method Used                                                 Expected
Database                Blob Storage                    File System  RPO             RTO
----------------------  ------------------------------  -----------  --------------  -------
AZ Failover             AZ Redundancy                   Snapshot     No loss         Minutes
Point of Time Snapshot  Same Region Replication (SRR)   Snapshot     5-15 minutes *  Minutes
Cross Region Snapshot   Cross Region Replication (CRR)  Snapshot     5-15 minutes *  Minutes

* Use of the Repair - Reconcile component database from blob store task within Nexus Repository may further reduce the RPO with some delay after recovery.