Scanning for corrupt indices at gdplogd startup
Even with appropriate flushing of appends to the data file and to the index files (timestamp=>recno), an application or system crash at just the wrong moment can leave the index files out of sync with the data file.
The root cause is that updating the data file and the index files is not an atomic operation. It would be nice to have ACID semantics for such updates, but one could argue that the index files can always be rebuilt from the data file, and hence strong ACID semantics aren't needed for the entire operation. However, this makes it necessary to detect such corruption and repair it.
There are various ways to do this; here is one possible way. The corruption is limited to a single record (assuming the data is flushed with each append individually). At startup time, gdplogd has to scan the local disk for the available records anyway. An extra step would be to verify the integrity of each index file by looking only at its most recent entry and, if needed, repairing it. A further optimization could be to keep soft state on persistent storage that marks logs as non-corrupt on a clean shutdown, so that the check can be skipped for them at the next startup.
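
The startup check described above could look roughly like the following Python sketch. It is only an illustration: gdplogd itself is not written in Python, and the on-disk layout (REC_HDR, IDX_ENT), the file names, and all function names here are hypothetical, not the real gdplogd format or API. The sketch compares the last index entry with the last complete record in the data file and rebuilds the index from the data file when they disagree.

```python
import os
import struct

# Hypothetical on-disk layout, for illustration only (not the real gdplogd format).
REC_HDR = struct.Struct(">QdI")  # data record header: recno, timestamp, payload length
IDX_ENT = struct.Struct(">dQ")   # index entry: timestamp, recno


def append_record(data_path, recno, ts, payload):
    """Append one record to the data file and flush it to disk."""
    with open(data_path, "ab") as f:
        f.write(REC_HDR.pack(recno, ts, len(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def append_index(index_path, ts, recno):
    """Append one timestamp=>recno entry to the index file and flush it."""
    with open(index_path, "ab") as f:
        f.write(IDX_ENT.pack(ts, recno))
        f.flush()
        os.fsync(f.fileno())


def scan_data(data_path):
    """Yield (recno, timestamp) for every complete record in the data file."""
    with open(data_path, "rb") as f:
        while True:
            hdr = f.read(REC_HDR.size)
            if len(hdr) < REC_HDR.size:
                return  # end of file, or a torn header
            recno, ts, length = REC_HDR.unpack(hdr)
            if len(f.read(length)) < length:
                return  # torn final record: ignore it
            yield recno, ts


def last_index_entry(index_path):
    """Return the last complete (timestamp, recno) entry, or None if there is none."""
    size = os.path.getsize(index_path) if os.path.exists(index_path) else 0
    whole = size - size % IDX_ENT.size  # drop a partially written trailing entry
    if whole == 0:
        return None
    with open(index_path, "rb") as f:
        f.seek(whole - IDX_ENT.size)
        return IDX_ENT.unpack(f.read(IDX_ENT.size))


def check_and_repair(data_path, index_path):
    """Check only the most recent index entry; rebuild the index on mismatch.

    Returns True if the index had to be rebuilt, False if it was in sync.
    """
    records = list(scan_data(data_path))  # stands in for the startup scan
    expected = (records[-1][1], records[-1][0]) if records else None
    size = os.path.getsize(index_path) if os.path.exists(index_path) else 0
    torn = size % IDX_ENT.size != 0
    if not torn and last_index_entry(index_path) == expected:
        return False
    # Rebuild into a temporary file, then swap it in with a rename.
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "wb") as f:
        for recno, ts in records:
            f.write(IDX_ENT.pack(ts, recno))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, index_path)
    return True
```

For simplicity the sketch rebuilds the whole index rather than patching the single missing entry; since the damage is assumed to be limited to one record, appending or truncating one entry would be a cheaper repair. Writing the rebuilt index to a temporary file and renaming it into place avoids leaving a half-written index if the daemon crashes again during the repair.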