210609 De-Duplication Database baselining
This Problem is Live
Problem Statement
The live de-duplication database requires re-baselining to ensure all de-duplication chunks are maintained in a single database.
FAQs
What’s the impact on my data?
Only jobs that have extended retention periods will be affected.
For those jobs, any data that was created immediately before the backup job and destroyed immediately after it, i.e. data that only existed for one night and was then deleted entirely, will no longer be retained by that backup job.
Expiring data in this way is common practice to prevent storage bloat. Jobs with shorter retention periods will be retained out as normal, and none of their data is affected.
Will it cost me more in Storage or Licensing?
The new jobs will not add net new storage and will not affect your licensing costs. The new jobs only re-baseline the data of the affected jobs.
Will it increase resource utilization, e.g. CPU utilization, on my environment?
There may be a small increase in utilization, but we are throttling the jobs to limit it. If you do experience any adverse effects, please let the BackupSimple team know through the normal channels and we will work to accommodate you.
What jobs will you run and when?
We have been running the jobs already and we would like to continue to do so. We have seen no adverse effects and no customers have raised concerns, but if you do experience any issues please let us know immediately.
The table below shows the jobs affected and the actions we would like to run in your environment.
Cause
During scheduled Vendor Maintenance on the Backup System, the De-Duplication database became split-brained due to unexpected routing to a secondary database. De-duplication chunks were then split across two databases, and the primary database now needs to be re-baselined to 9th June 2021.
Impact
Only jobs that have extended retention periods will be affected.
For those jobs, any data that was created immediately before the backup job and destroyed immediately after it, i.e. data that only existed for one night and was then deleted entirely, will no longer be retained by that backup job.
Mitigation
A new series of Baselining Jobs, each a standard Differential, will be run to realign the split-brain chunks to the primary database.
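As a rough illustration only, the sketch below models the idea of re-baselining with a toy chunk index. It is not the Vendor's implementation: the names `chunk_id` and `rebaseline`, the use of SHA-256 as a chunk identifier, and the in-memory dictionaries standing in for the primary and secondary databases are all assumptions made for the example. It shows a differential pass re-reading the data that still exists and adding any missing chunks to the primary database, and why data that no longer exists at the source cannot be realigned.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    """Hypothetical chunk identifier: SHA-256 of the chunk contents."""
    return hashlib.sha256(data).hexdigest()


def rebaseline(primary: dict, source_chunks: list) -> list:
    """Toy differential pass: add every source chunk missing from primary.

    Returns the ids of the chunks that were added.
    """
    added = []
    for data in source_chunks:
        cid = chunk_id(data)
        if cid not in primary:
            primary[cid] = data
            added.append(cid)
    return added


# Toy state after the split-brain (all values are made up for illustration).
primary = {chunk_id(b"alpha"): b"alpha"}
secondary = {
    chunk_id(b"beta"): b"beta",            # still present at the source
    chunk_id(b"one-night"): b"one-night",  # created and deleted within a day
}

# Data that exists at the source when the Baselining Job runs.
current_data = [b"alpha", b"beta", b"gamma"]

added = rebaseline(primary, current_data)

# Chunks of data that no longer exists cannot be re-read, so they stay
# stranded in the secondary index.
stranded = [cid for cid in secondary if cid not in primary]
```

In this toy model, running `rebaseline` a second time adds nothing, which mirrors the expectation that the jobs do not create net new storage once the primary database is aligned.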
Timeline
The following table gives a timeline of events throughout the problem.
Date | Event |
---|---|
9th June 2021 | Scheduled Maintenance begins |
12th June 2021 | Health Checks indicate De-duplication issues; Investigation begins |
14th June 2021 | Route to Secondary database closed; De-duplication Split Brain closed |
15th June 2021 | Impact Analysis begins; Data is intact but de-duplication is not running as efficiently due to missing chunks |
20th June 2021 | Deeper Analysis across customers for impact; Re-baselining jobs begin |
24th June 2021 | Customer Communication Sent |
25th June 2021 | Deduplication repair process ongoing with Vendor. Dedup verification jobs finished. Running checks on headers. |
26th June 2021 | We are continuing to work with the Vendor to close any possible gaps. Additional jobs started. |
27th June 2021 | We are closely monitoring all jobs and responding to any job failures with the Vendor right away to mitigate issues. |
28th June 2021 | We are closely monitoring all jobs and responding to any job failures with the Vendor right away to mitigate issues. |
29th June 2021 | We are closely monitoring all jobs and responding to any job failures with the Vendor right away to mitigate issues. |
30th June 2021 | We are closely monitoring all jobs and responding to any job failures with the Vendor right away to mitigate issues. |
1st July 2021 | We are closely monitoring all jobs and responding to any job failures with the Vendor right away to mitigate issues. |
2nd July 2021 | We are finishing running full jobs to replace stubs. |