Bug #47

No sync to disk after timestamp index update

Added by Nitesh Mor almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: gdplogd
Start date: 08/23/2016
Due date:
% Done: 0%


Description

In gdplogd's timestamp index maintenance, there ought to be a DB->sync() call after every update (see source:gdplogd/logd_disklog.c#L294). At present, after a record append, the last-modified timestamps in the log server's filesystem show that the gdptidx file is not updated for a while (it probably stays cached in memory). Read-by-timestamp works correctly, which means the index is being updated, at least in memory.

History

#1 Updated by Eric Allman almost 7 years ago

  • Status changed from New to Resolved

DB->sync does an fsync system call, which is slow and expensive. Doing three on every write (since it would have to happen for the data file and both indices) is not a good idea.

On the other hand, I have changed the code so that all three files are synced when the log is closed, and, more importantly, I close all logs when the daemon shuts down. This should somewhat reduce the window during which unsynced updates can be lost.

Nitesh and I discussed having some sort of "fsck for logs" functionality. That could be done either as a separate program that is run before gdplogd starts up or as part of gdplogd as it starts up. The advantage of the former is that it reduces the code in the daemon, which we want to keep as small as possible. The advantage of the latter is that it might enable recovery while the server is running (probably when it detects a corrupt log). At some point we definitely need such a utility.

No matter how it is implemented, any "fsck for logs" could be very time-consuming, since it might have to read the entire data file in order to rebuild the indices. One way around that might be to sync the on-disk files every N writes or M milliseconds, thus bounding the likely corruption. However, a thorough scan would still require reading all of the files in their entirety.
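
The "every N writes or M milliseconds" idea could be sketched roughly as below. This is only an illustration, not the gdplogd code: the sync_policy type and its functions are hypothetical names, and the caller is assumed to supply the current time in milliseconds and to perform the actual sync (for example DB->sync(db, 0) on each index plus an fsync of the data file) whenever the policy says one is due.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for "sync every N writes or M milliseconds". */
typedef struct {
    uint32_t max_writes;      /* N: sync once this many appends are unsynced */
    int64_t  max_age_ms;      /* M: sync once the oldest unsynced append is this old */
    uint32_t pending;         /* appends since the last sync */
    int64_t  first_dirty_ms;  /* time of the oldest unsynced append */
} sync_policy;

void sync_policy_init(sync_policy *p, uint32_t max_writes, int64_t max_age_ms)
{
    assert(p != NULL);
    p->max_writes = max_writes;
    p->max_age_ms = max_age_ms;
    p->pending = 0;
    p->first_dirty_ms = 0;
}

/* True if there is unsynced data and either bound has been reached.
 * Can also be called from a periodic timer when no appends arrive. */
bool sync_policy_due(const sync_policy *p, int64_t now_ms)
{
    if (p->pending == 0)
        return false;
    return p->pending >= p->max_writes ||
           now_ms - p->first_dirty_ms >= p->max_age_ms;
}

/* Record one append at time now_ms; returns true if a sync is now due. */
bool sync_policy_note_append(sync_policy *p, int64_t now_ms)
{
    if (p->pending == 0)
        p->first_dirty_ms = now_ms;
    p->pending++;
    return sync_policy_due(p, now_ms);
}

/* Call after the data file and both indices have been synced. */
void sync_policy_note_sync(sync_policy *p)
{
    p->pending = 0;
}
```

With a policy like this, at most N appends (or roughly M milliseconds of appends, if something also polls sync_policy_due from a timer) would be unsynced at the time of a crash, which is the bound on likely corruption mentioned above.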

#2 Updated by Nitesh Mor almost 7 years ago

  • Status changed from Resolved to Closed

Doing a sync at the time of closing and a potential recovery operation before/during gdplogd startup seems to be a good enough compromise, given the cost of fsync with each append.

Closing this issue.
