On-Disk SQLite-based Log Format

The updated on-disk format is designed to be much more extensible than the previous format, and to support new functionality, especially vis-à-vis security. To these ends, the updated format is implemented with SQLite3, a database system specifically designed for embedded use.

Each log is implemented using a single SQLite database, which is a single file on disk. The schema is simple:

    CREATE TABLE log_entry (
       hash BLOB PRIMARY KEY,
       recno INTEGER,
       timestamp INTEGER,
       prevhash BLOB,
       value BLOB,
       sig BLOB);
    CREATE INDEX recno_index
       ON log_entry(recno);
    CREATE INDEX timestamp_index
       ON log_entry(timestamp);

This allows query-by-hash (using the primary key), by record number (using the secondary index), or by timestamp (ditto). Each log has an initial entry containing the serialized metadata using the log name as the primary key. Since ultimately the log name will be equal to the hash of the metadata, this will be consistent with other records.

There is no need for record numbers to be unique or contiguous (i.e., there can be gaps). The intent is that a "read by recno" will return a (possibly empty) set containing all the records with the matching record number. In contrast, "read by timestamp" will return the record with the lowest timestamp that is greater than the key value. In contrast, "read by hash" must return a unique record with an exact match (or no record at all).

Record hashes (hash and prevhash values) are over recno, timestamp, prevhash, and value. The signature is optional for long term storage, since the hash chain provides the same assurances, as long as all leaf records contain signatures. See Hashing and Signatures in GDP 2018 for more details.

This will probably have to be updated to include Nitesh-style proofs.