Get backups running on gdp-0
For Terraswarm review. Back up 01→02, 02→03, 03→04, 04→01.
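The rotation above can be written down as a small mapping. This is only a sketch: the full hostnames gdp-01 through gdp-04 are assumed from the "01→02" shorthand, and `backup_ring` is a hypothetical helper, not part of any existing tooling.

```python
def backup_ring(hosts):
    """Pair each host with the next one in the list, wrapping around at the end."""
    return {src: hosts[(i + 1) % len(hosts)] for i, src in enumerate(hosts)}


# Assumed hostnames; only the numeric suffixes appear in the description.
HOSTS = ["gdp-01", "gdp-02", "gdp-03", "gdp-04"]
RING = backup_ring(HOSTS)

for src, dst in RING.items():
    print(f"{src} is backed up to {dst}")
```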
#1 Updated by Nitesh Mor over 3 years ago
- Status changed from New to In Progress
- Split the 4TB disk on gdp-0 into two logical volumes each: one for the server's own data and one for the backup of a remote log-server.
- Created a separate user, gdp-backup, with SSH keys set up for password-less login.
- Relaxed the umask setting for the log-server slightly, so that gdp-backup can read the data.
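To illustrate the umask change, the sketch below shows how the umask in effect determines the permissions of newly created files on a POSIX system. The two values are assumptions chosen for illustration (0o077 as a strict setting, 0o027 as a relaxed one that grants group read); the ticket does not state the actual values, nor whether gdp-backup gets access through group or world permissions. The file name is hypothetical.

```python
import os
import stat
import tempfile


def mode_created_under(umask_value):
    """Create a file while `umask_value` is in effect and return its permission bits."""
    previous = os.umask(umask_value)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.gdplog")  # hypothetical file name
            # Request 0o666; the bits named in the umask are cleared on creation.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # restore the caller's umask


print(oct(mode_created_under(0o077)))  # strict: readable by the owner only
print(oct(mode_created_under(0o027)))  # relaxed: group members can read too
```

Note that the umask only affects files created after the change; existing files keep their old modes until they are changed explicitly.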
Data transfer in progress. Looks like a few hours for gdp-01's 100+GB data files.
#3 Updated by Nitesh Mor over 3 years ago
It turns out that naively doing an rsync is not a very good strategy. The default behavior of rsync is to:
- make a decision whether a file needs to be synced or not based on modification time and filesize.
- if so, then do a full sync based on a rolling checksum.
With such an approach, if every log gets a single new record appended, this kind of rsync will require reading the entire contents of the disk. However, since we know that the .gdplog files are only ever appended to, we can safely use the --append flag. For .gdptidx files, this assumption does not necessarily hold. But we can always regenerate them from the data file, so we don't need to back them up at all.
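A minimal sketch of what such an invocation might look like, built as an argument list in Python. The directory paths are placeholders, and running the pull from the backup host as gdp-backup is an assumption; only the use of --append and the decision to skip .gdptidx files come from the notes above. `build_rsync_cmd` is a hypothetical helper.

```python
import shlex


def build_rsync_cmd(src_host, src_dir, dst_dir, user="gdp-backup"):
    """Return an rsync argv that pulls log data from src_host into dst_dir."""
    return [
        "rsync",
        "--archive",            # recurse and preserve times, permissions, etc.
        "--append",             # transfer only the bytes past the receiver's current length
        "--exclude=*.gdptidx",  # index files can be regenerated, so skip them
        f"{user}@{src_host}:{src_dir.rstrip('/')}/",
        f"{dst_dir.rstrip('/')}/",
    ]


# Placeholder paths; the real data and backup directories are not given in this ticket.
cmd = build_rsync_cmd("gdp-01", "/path/to/logs", "/path/to/backup")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Plain --append trusts that the bytes already on the receiving side match the start of the source file. rsync also offers --append-verify, which includes the existing data in the checksum verification at the cost of reading it again.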