Backup and Restore in Amazon Web Services
This section covers recovery from backup in Amazon Web Services (AWS). While the details here are specific to AWS, a similarly effective deployment model is likely possible with other cloud vendors. As a best practice, review current AWS documentation before implementing your deployment, both to confirm that AWS services have not changed in a way that affects your needs and to estimate deployment costs.
Planning Your Backup Strategy
When determining your backup strategy, consider the following:
Your organization’s data retention policy
Your budget
The scenarios against which you are trying to protect, such as the following:
AWS outages
Data corruption
Inadvertent large-scale deletes
Deployment errors
Cyber attacks
You will need to account for these areas of your Nexus Repository deployment when creating a backup strategy:
Database
Blob store(s)
File system
Although the database and the blob store(s) are backed up independently, the database contains references to objects in the blob store(s). Back up both at around the same time to minimize drift between the two.
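As a minimal sketch of that timing check, the Python below compares the completion times of a database backup and a blob store backup against a drift tolerance. The function names and the 15-minute tolerance are hypothetical; how you obtain the two timestamps (for example, from RDS snapshot metadata and your blob store backup tooling) is outside the scope of the sketch.

```python
from datetime import datetime, timedelta


def backup_drift(db_backup_time: datetime, blob_backup_time: datetime) -> timedelta:
    """Return the absolute gap between the database and blob store backups."""
    return abs(db_backup_time - blob_backup_time)


def backups_consistent(
    db_backup_time: datetime,
    blob_backup_time: datetime,
    tolerance: timedelta = timedelta(minutes=15),  # hypothetical policy value
) -> bool:
    """True if the two backups were taken close enough together to restore as a pair."""
    return backup_drift(db_backup_time, blob_backup_time) <= tolerance


if __name__ == "__main__":
    db_time = datetime(2024, 5, 1, 2, 0)
    blob_time = datetime(2024, 5, 1, 2, 7)
    print(backup_drift(db_time, blob_time))        # 0:07:00
    print(backups_consistent(db_time, blob_time))  # True
```

The tolerance is a policy decision: the smaller it is, the fewer database records can point at blobs that are missing from the paired blob backup (or the reverse).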
As you review the backup strategies in the following sections, there are two important terms to remember:
Recovery Point Objective (RPO): the maximum acceptable amount of data loss, measured as the time between the last recoverable backup and the incident
Recovery Time Objective (RTO): the maximum acceptable length of time to restore the service after an incident
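To make the two objectives concrete, the following sketch measures the RPO and RTO actually achieved in a hypothetical incident and compares them with your targets. The `Incident` fields and the function names are illustrative assumptions, not part of Nexus Repository or AWS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    restore_point: datetime     # newest state that the restore recovered
    failure_time: datetime      # when the outage, corruption, or deletion happened
    service_restored: datetime  # when Nexus Repository was usable again


def achieved_rpo(incident: Incident) -> timedelta:
    """Data written between the restore point and the failure is lost."""
    return incident.failure_time - incident.restore_point


def achieved_rto(incident: Incident) -> timedelta:
    """Downtime runs from the failure until service is restored."""
    return incident.service_restored - incident.failure_time


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data-loss and downtime targets were met."""
    return achieved_rpo(incident) <= rpo and achieved_rto(incident) <= rto


if __name__ == "__main__":
    incident = Incident(
        restore_point=datetime(2024, 5, 1, 10, 0),
        failure_time=datetime(2024, 5, 1, 10, 5),
        service_restored=datetime(2024, 5, 1, 10, 45),
    )
    print(achieved_rpo(incident), achieved_rto(incident))  # 0:05:00 0:40:00
    print(meets_objectives(incident, timedelta(minutes=15), timedelta(hours=1)))  # True
```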
Database Backups
Note
The backup method described here requires an external database. If you use H2 or OrientDB, shut down the instance before taking the snapshot so that you do not capture the database in an inconsistent state.
Amazon’s Relational Database Service (RDS) automated backups combine a daily snapshot with transaction logs that RDS uploads every 5 minutes, so you can typically restore to within the last 5 minutes; you can also create a manual snapshot on demand. Configure the automated backup retention period, and manage manual snapshots (which RDS keeps until you delete them), to meet your data retention requirements.
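RDS enforces the retention period for automated backups itself, but manual snapshots remain until you delete them. A minimal sketch of a retention rule for manual snapshots might look like the following. It assumes you have already listed snapshot identifiers and creation times (for example, from the RDS `DescribeDBSnapshots` API), and it only decides which snapshots fall outside the window rather than deleting anything. The function name and the `keep_min` safeguard are hypothetical.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(
    snapshots: dict[str, datetime],
    now: datetime,
    retention: timedelta,
    keep_min: int = 1,
) -> list[str]:
    """Return IDs of snapshots older than `retention`, newest first.

    The newest `keep_min` snapshots are always kept, even if they are
    older than the retention window, so pruning never leaves you with
    no snapshot at all.
    """
    ordered = sorted(snapshots.items(), key=lambda item: item[1], reverse=True)
    expired = []
    for index, (snapshot_id, created) in enumerate(ordered):
        if index < keep_min:
            continue  # protect the most recent snapshots
        if now - created > retention:
            expired.append(snapshot_id)
    return expired
```

Keeping a minimum count guards against a stalled backup job or a misconfigured clock causing the rule to empty your snapshot history.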
RDS also supports point-in-time restore, which creates a new DB instance from the state of the database at a time you specify. Depending on the nature of the incident, you may still be able to recover Nexus Repository content (not configuration) added after the restore point by using the Repair - reconcile component database from the blob store task.
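When you use point-in-time restore, you usually want the latest moment before the incident began. The sketch below picks that target from an assumed restorable window; the one-minute safety margin, the function name, and the way you determine the incident start are all assumptions. Content written to the blob store between the chosen time and the incident is what the reconcile task might later recover.

```python
from datetime import datetime, timedelta


def choose_restore_time(
    incident_start: datetime,
    earliest_restorable: datetime,
    latest_restorable: datetime,
    safety_margin: timedelta = timedelta(minutes=1),  # hypothetical buffer
) -> datetime:
    """Return the latest restorable moment that is safely before the incident."""
    target = incident_start - safety_margin
    if target < earliest_restorable:
        raise ValueError("incident predates the retained backup window")
    # RDS cannot restore past its latest restorable time.
    return min(target, latest_restorable)
```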
Note
If you have a support contract, contact Sonatype support for assistance before using any repair task.
You can also configure RDS to automatically replicate backups to a different AWS region. This allows you to restore Nexus Repository to a secondary region if the region where it is deployed fails completely. When using this replication, you must replicate the blob storage to the same region as the RDS backups.
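The region-matching requirement is easy to get wrong when RDS and S3 replication are configured separately. A minimal sketch of a configuration check follows; `DrPlan` and `validate_dr_plan` are hypothetical names, and the region strings are examples only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrPlan:
    primary_region: str       # region where Nexus Repository runs
    rds_backup_region: str    # destination of replicated RDS backups
    blob_replica_region: str  # destination of blob store replication


def validate_dr_plan(plan: DrPlan) -> list[str]:
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if plan.rds_backup_region == plan.primary_region:
        problems.append("RDS backups are not replicated out of the primary region")
    if plan.blob_replica_region == plan.primary_region:
        problems.append("blob storage is not replicated out of the primary region")
    if plan.blob_replica_region != plan.rds_backup_region:
        problems.append("blob storage and RDS backups replicate to different regions")
    return problems


if __name__ == "__main__":
    print(validate_dr_plan(DrPlan("us-east-1", "us-west-2", "us-west-2")))  # []
```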
RDS snapshots are associated with the instance that created them, and automated backups are deleted when the RDS instance is deleted unless you choose to retain them. To avoid losing this history, periodically export database snapshot data to S3. These exports are also important if an attacker maliciously compromises access to the database.
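One way to keep the periodic exports on schedule is a small check that runs on a timer and reports which snapshots still need exporting and whether the newest export is too old. This is a sketch under assumptions: the inputs (snapshot creation times and the set of snapshot IDs already exported) and the maximum age are hypothetical, and starting the export itself (for example, through the RDS `StartExportTask` API) is not shown.

```python
from datetime import datetime, timedelta


def exports_needed(snapshots: dict[str, datetime], exported: set[str]) -> list[str]:
    """Snapshot IDs not yet exported to S3, oldest first."""
    pending = [sid for sid in snapshots if sid not in exported]
    return sorted(pending, key=lambda sid: snapshots[sid])


def export_overdue(
    snapshots: dict[str, datetime],
    exported: set[str],
    now: datetime,
    max_age: timedelta,
) -> bool:
    """True if nothing has been exported or the newest export is older than max_age."""
    exported_times = [created for sid, created in snapshots.items() if sid in exported]
    if not exported_times:
        return True
    return now - max(exported_times) > max_age
```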
Blob Store Backups
Not every file system is suitable for every purpose; consult the Sonatype Nexus Repository System Requirements for more information.
Most Amazon S3 storage classes, including S3 Standard, store data redundantly across at least three Availability Zones (AZs); the S3 One Zone storage classes are the exception. When used for Nexus Repository’s blob storage together with an available RDS database, this allows an RTO on the order of minutes after an AWS AZ outage is detected.
Amazon S3 also supports cross-region replication (CRR). AWS documentation states that most objects replicate within 15 minutes, although some objects may take longer. You can therefore expect a typical RPO of 15 minutes or less for blobs, and eventual recovery of all blobs once replication catches up.
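Because replication is asynchronous, the blob data at risk in a regional failure is whatever had not finished replicating. The sketch below estimates that window from hypothetical per-blob upload and replication-completion times. S3 does report a per-object replication status (such as `PENDING`), but how you collect these timestamps, and the function names, are assumptions here.

```python
from datetime import datetime, timedelta


def blobs_at_risk(
    uploads: dict[str, datetime],
    replicated: dict[str, datetime],
    failure_time: datetime,
) -> list[str]:
    """Blobs written before the failure whose replica had not completed by then."""
    at_risk = []
    for blob_id, uploaded in uploads.items():
        if uploaded > failure_time:
            continue  # written after the failure, so not part of this estimate
        completed = replicated.get(blob_id)
        if completed is None or completed > failure_time:
            at_risk.append(blob_id)
    return sorted(at_risk)


def blob_rpo(
    uploads: dict[str, datetime],
    replicated: dict[str, datetime],
    failure_time: datetime,
) -> timedelta:
    """Age of the oldest unreplicated blob at failure time (zero if none)."""
    pending = blobs_at_risk(uploads, replicated, failure_time)
    if not pending:
        return timedelta(0)
    return failure_time - min(uploads[b] for b in pending)
```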
File System Backups
If you use Amazon EBS, point-in-time snapshots can help provide multi-AZ redundancy. Although an EBS volume resides in a single AZ, EBS stores snapshots across multiple AZs, so in the event of an AZ failure you can use a snapshot to provision a new EBS volume in another AZ. This is not automatic; you will need to implement a process for it, such as specifying the EBS snapshot ID in the EC2 launch template or launch configuration. See the AWS documentation for more information.
Amazon EBS also offers a higher-durability volume type, io2, which is designed to provide 99.999% durability with an annual failure rate (AFR) of 0.001%, where failure refers to a complete or partial loss of the volume.
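To put the io2 annual failure rate in perspective across several volumes, the sketch below computes the probability that at least one of N volumes fails within a given period, 1 - (1 - AFR)^(N × years). It assumes failures are independent and that the stated AFR applies uniformly, which is a simplification rather than an AWS guarantee.

```python
def probability_any_failure(afr: float, volumes: int, years: float = 1.0) -> float:
    """Chance that at least one of `volumes` fails within `years`.

    Assumes independent failures at a constant annual failure rate (AFR),
    expressed as a fraction (0.001% -> 0.00001).
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be a fraction between 0 and 1")
    survive_all = (1.0 - afr) ** (volumes * years)
    return 1.0 - survive_all


if __name__ == "__main__":
    io2_afr = 0.001 / 100  # 0.001% as stated for io2
    for n in (1, 10, 100):
        print(n, f"{probability_any_failure(io2_afr, n):.6%}")
```

For small AFRs the result is close to N × AFR, so a fleet of 100 io2 volumes has roughly a 0.1% chance per year that at least one volume fails.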
Amazon EFS Standard is inherently designed to protect against losing an entire AZ.
Nexus Repository also stores Elasticsearch indexes on the local disk. File system snapshots do not guarantee a consistent index, but you can rebuild the index while Nexus Repository is online.
Expected Results
The table below outlines various recovery methods and the RPO and RTO you can expect to achieve given your selected database, blob storage, and file system. The expected RTO does not include the time required to rebuild the Elasticsearch index. Primary Nexus Repository services are expected to be functional while this occurs; even large instances typically rebuild the index in less than 1 hour.
Database Recovery Method | Blob Storage Recovery Method | File System Recovery Method | Expected RPO | Expected RTO |
AZ Failover | AZ Redundancy | Snapshot | No loss | Minutes |
Point-in-Time Snapshot | Same Region Replication (SRR) | Snapshot | 5-15 minutes | Minutes |
Cross Region Snapshot | Cross Region Replication (CRR) | Snapshot | 5-15 minutes | Minutes |
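If you want to evaluate the table programmatically, for example in a planning script, its rows can be encoded as data and filtered by a target RPO. This is a hypothetical sketch: the worst-case RPO values are taken from the upper bound of each range in the table (with "No loss" as 0), and RTO is kept as the table's qualitative label because the table does not quantify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    database: str
    blob_storage: str
    file_system: str
    worst_case_rpo_minutes: int  # upper bound of the table's RPO range
    rto: str                     # qualitative label from the table


OPTIONS = [
    RecoveryOption("AZ Failover", "AZ Redundancy", "Snapshot", 0, "Minutes"),
    RecoveryOption("Point-in-Time Snapshot", "Same Region Replication (SRR)", "Snapshot", 15, "Minutes"),
    RecoveryOption("Cross Region Snapshot", "Cross Region Replication (CRR)", "Snapshot", 15, "Minutes"),
]


def options_meeting_rpo(max_rpo_minutes: int) -> list[RecoveryOption]:
    """Rows whose worst-case RPO is within the target."""
    return [o for o in OPTIONS if o.worst_case_rpo_minutes <= max_rpo_minutes]
```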