Hashing and Signatures in GDP 2018

GDP 2018 relies much more heavily on hashing, both to validate PDUs (e.g., signed ACKs) and to validate log records. This version adds hash backlinks between records, query by record hash, and similar security features. Hashing is also the first step in producing a signature: the signature is always taken over the hash, not over the raw data.

Problem Statement

There are two cases for hashes/signatures: transient and persistent.

  • Transient hashes/signatures are for signing commands to demonstrate authenticity. Once verified, the signature can be discarded. This is the relatively easy case, since it doesn't matter if the underlying data format changes later.

  • Persistent hashes/signatures are for signing data. Even if the signature itself is not stored, it is essential that the hash never change. Thus, signing the marshaled protobuf does not work: protobuf serialization is not canonical, so any future change to the schema (or even to the implementation) can change the serialized bytes, and with them the hash of every record already in the log.
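
To make the serialization problem concrete, the sketch below encodes one logical record two ways. It uses JSON from the Python standard library purely as a stand-in for protobuf (no protobuf code is involved), and the field names are illustrative. Both encodings decode to identical data, but the bytes differ, so a hash taken over the bytes differs too.

```python
import hashlib
import json

# One logical record; the field names are illustrative only.
record = {"recno": 7, "timestamp": 1530000000, "value": "hello"}

# Two valid encodings of the same content: different key order and spacing.
encoding_a = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
encoding_b = json.dumps(dict(reversed(list(record.items())))).encode()

# Both decode back to the same data...
assert json.loads(encoding_a) == json.loads(encoding_b)

digest_a = hashlib.sha256(encoding_a).hexdigest()
digest_b = hashlib.sha256(encoding_b).hexdigest()

# ...but a hash over the serialized bytes is not stable.
print(digest_a == digest_b)  # False
```

The same thing can happen with protobuf if field order, default-value handling, or unknown-field handling changes between implementations or schema versions.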

Transient Hashes and Signatures

The easiest approach would be to break the PDU into two (or possibly three) parts: a body and a signature over the body. The body would include the command, the request id, the sequence number, the hash of the previous record, and any command-specific payload. The simplest implementation would be to marshal the body, hash and sign the serialized bytes, and put both into an enclosing protobuf. However, this involves considerable data copying, since the serialized body must be copied into the second protobuf.
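
A minimal sketch of this two-layer "envelope" approach follows. It assumes a hypothetical fixed binary layout (`marshal_body`) in place of protobuf marshaling, and it uses HMAC-SHA256 as a stand-in for the real public-key signature, only because HMAC is in the Python standard library. `CommandBody`, `Envelope`, `seal`, `verify`, and `SIGNING_KEY` are all hypothetical names.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass

# Hypothetical key. Real signatures would be public-key; HMAC is a stand-in.
SIGNING_KEY = b"example-key"

@dataclass
class CommandBody:
    command: int
    rid: int
    seqid: int
    prevhash: bytes   # hash of the previous record (32 bytes)
    payload: bytes    # command-specific payload

def marshal_body(body: CommandBody) -> bytes:
    # Hypothetical fixed layout standing in for protobuf marshaling:
    # command (1 byte), rid and seqid (8 bytes each, big-endian),
    # prevhash, payload length (4 bytes), payload.
    header = struct.pack(">BQQ", body.command, body.rid, body.seqid)
    return header + body.prevhash + struct.pack(">I", len(body.payload)) + body.payload

@dataclass
class Envelope:
    body: bytes       # the serialized body, copied into the outer message
    digest: bytes
    sig: bytes

def seal(body: CommandBody) -> Envelope:
    serialized = marshal_body(body)
    digest = hashlib.sha256(serialized).digest()
    # The signature is taken over the hash, not over the raw bytes.
    sig = hmac.new(SIGNING_KEY, digest, hashlib.sha256).digest()
    return Envelope(body=serialized, digest=digest, sig=sig)

def verify(env: Envelope) -> bool:
    digest = hashlib.sha256(env.body).digest()
    expected = hmac.new(SIGNING_KEY, digest, hashlib.sha256).digest()
    return hmac.compare_digest(digest, env.digest) and hmac.compare_digest(expected, env.sig)
```

For example, `verify(seal(CommandBody(1, 42, 0, bytes(32), b"data")))` returns `True`. The copy the text warns about is visible in `seal`: the whole serialized body becomes a field of the outer `Envelope`.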

The alternative is for each command body to have an associated module that hashes its contents directly. This complicates the implementation but avoids the extra copy.
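
The per-command alternative can be sketched as a table of hashing functions, one per command, each feeding its fields straight into an incremental hash. The command codes, field names, and byte layout below are hypothetical, chosen only for illustration.

```python
import hashlib
import struct
from typing import Callable, Dict

# Hypothetical command codes; the real GDP values are not specified here.
CMD_APPEND = 1
CMD_READ = 2

def hash_common(h, rid: int, seqid: int, prevhash: bytes) -> None:
    # Fields shared by every command body, in a fixed order and byte order.
    h.update(struct.pack(">QQ", rid, seqid))
    h.update(prevhash)

def hash_append(h, fields: dict) -> None:
    # Length-prefix the payload so field boundaries are unambiguous.
    h.update(struct.pack(">I", len(fields["payload"])))
    h.update(fields["payload"])

def hash_read(h, fields: dict) -> None:
    h.update(struct.pack(">Q", fields["recno"]))

# One hashing module per command, selected by command code.
HASHERS: Dict[int, Callable] = {CMD_APPEND: hash_append, CMD_READ: hash_read}

def hash_command(cmd: int, rid: int, seqid: int, prevhash: bytes, fields: dict) -> bytes:
    h = hashlib.sha256()
    h.update(bytes([cmd]))
    hash_common(h, rid, seqid, prevhash)
    # Fields go directly into the hash; the payload is never copied
    # into an intermediate serialized body.
    HASHERS[cmd](h, fields)
    return h.digest()
```

Because `update()` is incremental, feeding fields one at a time yields the same digest as hashing their concatenation, so no intermediate buffer is needed. The cost is that every new command needs its own hashing function, kept in sync with its wire format.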

Persistent Hashes and Signatures

These apply only to payloads containing persistent data. At this time, those are the APPEND command and the ACK_CONTENT response. Note also that the command code must not be included in the hash: APPEND and ACK_CONTENT are different commands, yet they carry the same record, which must hash to the same value in both.

It seems like the easiest thing to do is to run the hash over:

  • The (nominal) record number, in some specified byte order.
  • The timestamp in some defined format.
  • The hash of the previous record.
  • The opaque data payload.
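
A minimal sketch of such a record hash follows, with the unspecified choices filled in as explicit assumptions: SHA-256 (consistent with the 32-byte hash columns in the schema, but not stated), record number and timestamp as signed 64-bit big-endian integers, the timestamp in nanoseconds since the epoch, and an all-zero previous hash for the first record. `record_hash`, `build_chain`, and `verify_chain` are hypothetical names.

```python
import hashlib
import struct

HASH_LEN = 32  # matches BLOB(32) in the schema; SHA-256 is an assumption

def record_hash(recno: int, timestamp_ns: int, prevhash: bytes, payload: bytes) -> bytes:
    # Assumed encoding: recno and timestamp as big-endian int64, then the
    # previous record's hash, then the payload. No command code, rid, or
    # seqid, so APPEND and ACK_CONTENT produce the same hash.
    if len(prevhash) != HASH_LEN:
        raise ValueError("prevhash must be 32 bytes")
    h = hashlib.sha256()
    h.update(struct.pack(">qq", recno, timestamp_ns))
    h.update(prevhash)
    # The payload is the only variable-length field and comes last,
    # so no length prefix is needed.
    h.update(payload)
    return h.digest()

def build_chain(records):
    # records: iterable of (recno, timestamp_ns, payload).
    # The first record links to an all-zero hash (an assumption).
    prev = bytes(HASH_LEN)
    chain = []
    for recno, ts, payload in records:
        hv = record_hash(recno, ts, prev, payload)
        chain.append((hv, recno, ts, prev, payload))
        prev = hv
    return chain

def verify_chain(chain) -> bool:
    # Recompute every hash and check that each backlink points at its predecessor.
    prev = bytes(HASH_LEN)
    for hv, recno, ts, prevhash, payload in chain:
        if prevhash != prev or record_hash(recno, ts, prevhash, payload) != hv:
            return False
        prev = hv
    return True
```

Because each hash covers the previous hash, altering any record's payload changes its hash and breaks the backlink stored in the next record.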

This is analogous to the database schema:

    CREATE TABLE log_entry (
       hash BLOB(32) PRIMARY KEY,
       recno INTEGER,
       timestamp INTEGER,
       prevhash BLOB(32),
       value BLOB,
       sig BLOB);
    CREATE INDEX recno_index
       ON log_entry(recno);
    CREATE INDEX timestamp_index
       ON log_entry(timestamp);
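
The schema can be exercised with Python's built-in `sqlite3` module. The sketch below stores two chained records, looks one up by hash, and follows the `prevhash` backlinks. The hash encoding (big-endian int64 record number and timestamp, then previous hash, then payload, under SHA-256) and the all-zero hash before the first record are assumptions, and `sig` is left NULL because no signing key is modeled here.

```python
import hashlib
import sqlite3
import struct

SCHEMA = """
CREATE TABLE log_entry (
   hash BLOB(32) PRIMARY KEY,
   recno INTEGER,
   timestamp INTEGER,
   prevhash BLOB(32),
   value BLOB,
   sig BLOB);
CREATE INDEX recno_index ON log_entry(recno);
CREATE INDEX timestamp_index ON log_entry(timestamp);
"""

def record_hash(recno, timestamp, prevhash, value):
    # Assumed encoding: big-endian int64 recno and timestamp, prevhash, payload.
    return hashlib.sha256(struct.pack(">qq", recno, timestamp) + prevhash + value).digest()

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

prev = bytes(32)  # assumed "no previous record" marker
for recno, ts, value in [(1, 1000, b"first"), (2, 2000, b"second")]:
    h = record_hash(recno, ts, prev, value)
    # sig is NULL here; a real log would store a signature over h.
    db.execute(
        "INSERT INTO log_entry (hash, recno, timestamp, prevhash, value, sig) "
        "VALUES (?, ?, ?, ?, ?, NULL)",
        (h, recno, ts, prev, value),
    )
    prev = h

def fetch_by_hash(db, h):
    # Query by record hash (the primary key).
    return db.execute("SELECT recno, value FROM log_entry WHERE hash = ?", (h,)).fetchone()

def walk_back(db, head):
    # Follow prevhash backlinks from the newest record to the first.
    recnos = []
    h = head
    while h != bytes(32):
        row = db.execute("SELECT recno, prevhash FROM log_entry WHERE hash = ?", (h,)).fetchone()
        if row is None:
            break
        recnos.append(row[0])
        h = row[1]
    return recnos
```

After the loop, `prev` holds the hash of the newest record, so `fetch_by_hash(db, prev)` returns `(2, b"second")` and `walk_back(db, prev)` returns `[2, 1]`.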

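Note that it is not feasible to simply run the signature over the entire Layer 5 PDU after converting the command to a fixed value, since the PDU also includes the request identifier (rid) and sequence number (seqid), which are themselves transient. The APPEND that writes a record and a later ACK_CONTENT that returns it will generally carry different values for both, so a hash over the whole PDU would not match.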
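
The sketch below illustrates this with a hypothetical PDU layout: a one-byte command, then rid, seqid, record number, and timestamp as big-endian 64-bit integers, then the previous hash and payload. The command codes and field values are made up. Even with the command normalized to a fixed value, hashing the whole PDU gives different results for the APPEND and the ACK_CONTENT, while a hash over only the persistent fields agrees.

```python
import hashlib
import struct

# Hypothetical command codes and PDU layout, for illustration only.
CMD_APPEND, CMD_ACK_CONTENT = 1, 2
FIXED_CMD = 0  # the fixed value the command is replaced with before hashing

def pdu_bytes(cmd, rid, seqid, recno, ts, prevhash, payload):
    return struct.pack(">BQQqq", cmd, rid, seqid, recno, ts) + prevhash + payload

def whole_pdu_hash(cmd, rid, seqid, recno, ts, prevhash, payload):
    # The command is normalized, but rid and seqid are still hashed.
    return hashlib.sha256(pdu_bytes(FIXED_CMD, rid, seqid, recno, ts, prevhash, payload)).digest()

def record_only_hash(recno, ts, prevhash, payload):
    # Only the persistent fields are hashed.
    return hashlib.sha256(struct.pack(">qq", recno, ts) + prevhash + payload).digest()

record = (5, 123456789, bytes(32), b"payload")
# The same record, carried with unrelated request ids and sequence numbers.
append = (CMD_APPEND, 17, 3) + record
ack = (CMD_ACK_CONTENT, 901, 0) + record

print(whole_pdu_hash(*append) == whole_pdu_hash(*ack))              # False
print(record_only_hash(*append[3:]) == record_only_hash(*ack[3:]))  # True
```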

Future Issues

This discussion does not provide for future Nitesh-style proofs. That's probably OK for this purpose. It might be made slightly more extensible if the metadata for each persistent record included the version number of the hash (that is, a version number that would indicate what the hash was taken over, not the version number of the hash function itself).
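
One way the version number could work is sketched below: each stored record carries a hypothetical `hash_version` field that selects which fields the hash covers, so records hashed under an older layout remain verifiable after a new layout is introduced. Both layouts, the `extra` field in version 2, and all function names are assumptions made for illustration; SHA-256 is used for both versions, since the version describes what is hashed, not the hash function.

```python
import hashlib
import struct

# Hypothetical layout 1: recno, timestamp, prevhash, payload.
def _hash_v1(rec):
    h = hashlib.sha256()
    h.update(struct.pack(">qq", rec["recno"], rec["timestamp"]))
    h.update(rec["prevhash"])
    h.update(rec["value"])
    return h.digest()

# Hypothetical layout 2: also covers an extra length-prefixed metadata
# field, e.g. something a future proof scheme might need.
def _hash_v2(rec):
    h = hashlib.sha256()
    h.update(struct.pack(">qq", rec["recno"], rec["timestamp"]))
    h.update(rec["prevhash"])
    h.update(struct.pack(">I", len(rec["extra"])))
    h.update(rec["extra"])
    h.update(rec["value"])
    return h.digest()

HASH_LAYOUTS = {1: _hash_v1, 2: _hash_v2}

def record_hash(rec):
    # hash_version says which fields were hashed, not which hash function.
    try:
        layout = HASH_LAYOUTS[rec["hash_version"]]
    except KeyError:
        raise ValueError(f"unknown hash version {rec['hash_version']}") from None
    return layout(rec)

def verify(rec):
    return record_hash(rec) == rec["hash"]
```

A reader that encounters an unknown version fails explicitly rather than silently computing the wrong hash.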