Preview of GDP version 2.1

A high-level overview of changes we expect to deliver in GDP version 2.1.
Added by Eric Allman almost 5 years ago

Following is a brief summary of significant user-visible features and changes expected to be in version 2.1 of the Global Data Plane. Note: version 2.1 has not yet been released, and not all of the described features have been pushed to the repository.

This summary is still a rough draft. Everything herein is subject to change.

Overview

Log Naming

One of the major intended properties of the GDP is that readers can verify the provenance of the data that they read. This is predicated on the reader having the correct internal name of the log, which is a SHA-256 of the log metadata. The metadata includes the public key matching the secret key needed to sign data written to the log. Thus, given the internal name of the log, one can guarantee that the metadata matches the desired log, and the data signatures match the key in the metadata.
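The name check described above can be sketched in a few lines. This is only an illustration: it assumes the metadata is available as one serialized byte string (the actual GDP serialization is not described here), and the function names and metadata fields are hypothetical.

```python
import hashlib
import hmac


def gdpname_from_metadata(metadata: bytes) -> bytes:
    """Return the 256-bit internal name: the SHA-256 digest of the log metadata."""
    return hashlib.sha256(metadata).digest()


def metadata_matches_name(gdpname: bytes, metadata: bytes) -> bool:
    """Check that metadata handed to a reader hashes to the name the reader asked for."""
    # compare_digest does a constant-time comparison of the two digests.
    return hmac.compare_digest(gdpname, gdpname_from_metadata(metadata))


# Hypothetical serialized metadata; the real layout is an assumption.
metadata = b"creator=alice@example.com;owner_pubkey=<public key bytes>"
name = gdpname_from_metadata(metadata)

print(len(name) * 8)                                 # 256
print(metadata_matches_name(name, metadata))         # True
print(metadata_matches_name(name, metadata + b"x"))  # False
```

Once the metadata is known to match the name, the public key it carries can be used to check the signatures on the data.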

However, humans don't work well with long binary strings, so there has to be some way of mapping a human-oriented name to the GDP name. Before 2.1, GDP names were computed as the SHA-256 of the human-readable name and these security properties did not apply. The new technique is to use a Human-Oriented Name to GDPname Directory, as described in the next section.

Because the algorithm used to create the 256-bit GDPname of a log has changed, the old human-oriented names will in general no longer work to access logs that were created using the old algorithm. See the What This Means For You section for a deeper discussion of this issue.

Human-Oriented Name to GDPname Directory (HONGDS)

Since there is no longer an algorithmic way to derive an internal GDPname from a human-oriented name, this now has to be done using a directory service. This service maintains a database mapping the human-oriented name to the internal GDPname.

The current implementation is a stub that directly accesses a MySQL database from applications. This has several implications:

  • The implementation is easily spoofable; in particular, there is no way for a client to confirm the authenticity of a name mapping.
  • The implementation doesn't scale well, since the database is not distributed or replicated.
  • The implementation isn't durable.

When new logs are created, the human-oriented name is inserted into the log metadata, so it is theoretically possible to rebuild this database from the logs alone, but this is also not scalable. For now, this service should be considered a work in progress.
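As a rough sketch of what such a directory does, the following uses an in-memory SQLite database as a stand-in for the MySQL server. The table name, column names, function names, and metadata layout are all made up for illustration and are not the actual HONGDS schema.

```python
import hashlib
import sqlite3

# In-memory SQLite stands in for MySQL; the schema here is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE human_to_gdp (hname TEXT PRIMARY KEY, gname BLOB NOT NULL)")


def name_add(hname: str, gname: bytes) -> None:
    """Record a mapping from a human-oriented name to a 32-byte GDPname."""
    if len(gname) != 32:
        raise ValueError("a GDPname is 256 bits (32 bytes)")
    db.execute("INSERT INTO human_to_gdp (hname, gname) VALUES (?, ?)", (hname, gname))


def name_resolve(hname: str):
    """Return the GDPname for hname, or None if the directory has no entry."""
    row = db.execute("SELECT gname FROM human_to_gdp WHERE hname = ?", (hname,)).fetchone()
    return row[0] if row else None


def rebuild(logs) -> None:
    """Rebuild the directory from (human name, serialized metadata) pairs read from logs."""
    db.execute("DELETE FROM human_to_gdp")
    for hname, metadata in logs:
        name_add(hname, hashlib.sha256(metadata).digest())


meta = b"hname=edu.example.test;owner_pubkey=<public key bytes>"  # hypothetical metadata
rebuild([("edu.example.test", meta)])
print(name_resolve("edu.example.test") == hashlib.sha256(meta).digest())  # True
print(name_resolve("no.such.name"))                                       # None
```

Note that nothing in this lookup lets the caller confirm that the returned mapping is authentic, and that the rebuild has to visit every log.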

Docker Packaging

  • Easier installation.
  • Multiple versions can easily run in parallel.

Miscellaneous Smaller Changes

There is a new system environment variable GDP_NAME_ROOT that you can use to simulate namespaces. For example, if GDP_NAME_ROOT=edu.berkeley.edu.eecs.eric then specifying a name such as test on the command line will actually search for edu.berkeley.edu.eecs.eric.test. Input names that already have dots will be tried as originally specified before prefixing the qualification.
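The lookup order described above might look roughly like this. The function name is hypothetical, and the text does not say whether an undotted name is also tried without the prefix; this sketch assumes it is not.

```python
import os


def candidate_names(name, root=None):
    """List the names to try, in order, for a name given on the command line."""
    if root is None:
        root = os.environ.get("GDP_NAME_ROOT", "")
    if not root:
        return [name]
    qualified = f"{root}.{name}"
    if "." in name:
        # Already dotted: try it as given first, then with the root prefixed.
        return [name, qualified]
    return [qualified]


root = "edu.berkeley.edu.eecs.eric"
print(candidate_names("test", root))
# ['edu.berkeley.edu.eecs.eric.test']
print(candidate_names("org.example.log", root))
# ['org.example.log', 'edu.berkeley.edu.eecs.eric.org.example.log']
```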

Logs now have two keypairs: owner and writer. Previously they only had a writer keypair. If no writer keypair exists, the owner keypair will be replicated into the writer keypair. This should be largely transparent.

Results read using asynchronous reads of large data sets will be returned more reliably in version 2.1. Previously, if the network layer reordered results (e.g., due to packet loss) some data might not be delivered to the application.
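One common way to tolerate reordering is a reorder buffer keyed by sequence number. The sketch below shows that general technique only; it is an assumption for illustration, not a description of how version 2.1 actually handles asynchronous results.

```python
class ReorderBuffer:
    """Deliver results in sequence order even if they arrive out of order."""

    def __init__(self, first_seqno=1):
        self.next_seqno = first_seqno
        self.pending = {}  # seqno -> payload, held until its turn comes

    def receive(self, seqno, payload):
        """Accept one result; return the payloads that are now deliverable, in order."""
        if seqno < self.next_seqno or seqno in self.pending:
            return []  # duplicate of something already seen
        self.pending[seqno] = payload
        ready = []
        while self.next_seqno in self.pending:
            ready.append(self.pending.pop(self.next_seqno))
            self.next_seqno += 1
        return ready


buf = ReorderBuffer()
print(buf.receive(2, "b"))  # []  (held until result 1 arrives)
print(buf.receive(1, "a"))  # ['a', 'b']
print(buf.receive(3, "c"))  # ['c']
```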

What This Means For You

Naming Changes

Old logs will not be accessible using the names you are familiar with unless you use compatibility workarounds (see below).

Since existing logs were created using a different algorithm for computing the name, the primary security property (verifying that the name matches the metadata) will fail. Old logs must be accessed without checking this condition. See the Back Compatibility section for more information.

Installation and Configuration

  • Use Docker packages.
  • Install MySQL for Human-Oriented Name to GDPname Directory Service (HONGDS). [There should be a script to do this.]
  • Initialize HONGDS database.
  • Create pointer to HONGDS using administrative parameters. See the Details section for more information.

Log Creation

There is a new programmatic API for log creation. In particular, gdp_gin_create has changed substantially. Programs that wish to create logs directly will need to be updated.

Scripts that use gdp-create to create logs will probably not be affected.

The existing Log Creation Service will need to be modified so that it also updates HONGDS in addition to its other functions.

Back Compatibility

To ease the transition from the old method of naming logs (using the SHA-256 of the human-oriented name) to the new method (using the SHA-256 of the log metadata), the administrative parameter swarm.gdp.compat.lognames can be set to true. Existing names can also be made usable by manually adding entries to the HONGDS database.
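The exact behavior of the compatibility parameter is not spelled out here. The following sketch assumes it makes name resolution fall back to the old hash-of-name scheme when the directory has no entry, and that the name-versus-metadata check is skipped for such logs. The function name, the dict standing in for HONGDS, and the UTF-8 encoding are all assumptions.

```python
import hashlib


def resolve(hname, directory, compat_lognames=False):
    """Return (gdpname, verify_name) for a human-oriented name, or None if unresolved."""
    gname = directory.get(hname)
    if gname is not None:
        # New-style log: the name is the SHA-256 of the metadata, so check it.
        return gname, True
    if compat_lognames:
        # Old-style log: the name was the SHA-256 of the human-oriented name,
        # so the name-matches-metadata check cannot succeed and is skipped.
        return hashlib.sha256(hname.encode("utf-8")).digest(), False
    return None


directory = {"new.log": bytes(32)}  # placeholder 32-byte GDPname
print(resolve("new.log", directory) == (bytes(32), True))     # True
print(resolve("old.log", directory))                          # None
print(resolve("old.log", directory, compat_lognames=True)[1])  # False
```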

Details

Log Creation

New and Changed APIs

  • gdp_create_info_new — added.
  • gdp_create_info_free — added.
  • gdp_create_info_add_metadata — added.
  • gdp_create_info_new_owner_key — added.
  • gdp_create_info_new_writer_key — added.
  • gdp_create_info_save_keys — added.
  • gdp_create_info_set_creator — added.
  • gdp_create_info_set_expiration — added. This API will probably change in the future.
  • gdp_create_info_set_owner_key — added.
  • gdp_create_info_set_writer_key — added.
  • gdp_gin_create — new parameters.
  • gdp_name_parse — was gdp_parse_name; new parameters. The old API still exists but is deprecated. Works with the Human to GDP Name Directory.
  • gdp_name_root_set — added.
  • gdp_name_root_get — added.

Changed Applications

There are several updates to the gdp-create command:

  • -K sets owner key location. As before, if it points to a file it should be an existing, previously created secret key; otherwise it should be a directory, into which the secret key will be saved. The default is to look for a subdirectory named KEYS, and if not found use the current directory.
  • The new flag -W is equivalent to -K, but for the writer key.
  • A new flag -w specifies that separate owner and writer keys should be created; by default the owner key is used as the writer key. If -W is specified and points to a file, -w is implied.
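The rules for -K, -W, and -w can be summarized in a short sketch. The function names and return shapes are invented for illustration; this mirrors the description above and is not taken from the gdp-create source.

```python
import tempfile
from pathlib import Path


def owner_key_location(k_option=None, cwd="."):
    """Return (path, is_existing_key) for the owner secret key."""
    if k_option is None:
        # Default: prefer a KEYS subdirectory, else the current directory.
        keys_dir = Path(cwd) / "KEYS"
        return (keys_dir if keys_dir.is_dir() else Path(cwd)), False
    path = Path(k_option)
    if path.is_file():
        return path, True   # existing, previously created secret key
    return path, False      # a directory into which a new key will be saved


def wants_separate_writer_key(w_flag, W_option=None):
    """-w asks for a separate writer key; -W pointing at a file implies -w."""
    return bool(w_flag) or (W_option is not None and Path(W_option).is_file())


with tempfile.TemporaryDirectory() as tmp:
    print(owner_key_location(cwd=tmp) == (Path(tmp), False))           # True
    (Path(tmp) / "KEYS").mkdir()
    print(owner_key_location(cwd=tmp) == (Path(tmp) / "KEYS", False))  # True
    key = Path(tmp) / "writer.pem"                                     # hypothetical key file
    key.write_text("placeholder")
    print(wants_separate_writer_key(False, str(key)))                  # True
```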

Human-Oriented Name to GDPname Directory

Configuration

There are several runtime configuration parameters controlling access to the Human-Oriented Name to GDPname Directory. Only the first of these is likely to be needed in most cases.

  • swarm.gdp.namedb.host — the IP name of the MySQL server host; generally the only parameter that must be set assuming the local server was set up using the standard defaults.
  • swarm.gdp.namedb.user
  • swarm.gdp.namedb.passwd
  • swarm.gdp.namedb.database
  • swarm.gdp.namedb.table

New APIs

  • gdp_name_resolve — accesses the Human to GDP Name Directory (added). Most applications should call gdp_name_parse.
  • gdp_name_update — added.

New Applications

  • gdp-name-add

Security

New and Changed APIs

  • gdp_datum_vrfy — added.
  • gdp_open_info_set_vrfy — new API to turn on read-side proof (signature) validation.
  • Searches for secret keys slightly expanded: swarm.gdp.crypto.key.dir is now a search path.
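To illustrate what treating swarm.gdp.crypto.key.dir as a search path means, here is a generic search-path lookup. The separator (the platform's path separator), the function name, and the key file name are assumptions; the actual lookup rules are not given here.

```python
import os
import tempfile
from pathlib import Path


def search_path_find(search_path, filename):
    """Return the first existing file named filename along search_path, or None."""
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # ignore empty entries such as a trailing separator
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    (Path(b) / "mylog.pem").write_text("placeholder for a secret key")
    path = os.pathsep.join([a, b])
    print(search_path_find(path, "mylog.pem") == Path(b) / "mylog.pem")  # True
    print(search_path_find(path, "other.pem"))                           # None
```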

Changed Applications

  • gdp-reader: -V flag to verify read results.

Miscellaneous

New and Changed APIs

  • ep_dbg_backtrace — takes a file pointer parameter (used to default to the debug file).
  • ep_file_search — resolve a filesystem search path (added).
  • ep_funclist_push — changed parameters for called function.
  • ep_time_diff_usec — added.
  • ep_time_from_nsec — new parameter.
  • ep_time_from_sec — added.
  • ep_time_from_usec — added.
  • ep_time_zero — added.

Changed Applications

  • gdp-reader: -o flag to set data output location.
  • log-view — for the moment, no longer supported.

Changed Semantics

  • Environment can override administrative parameters (on a compile flag; may not survive the cut due to security issues).
  • Asynchronous results handled better (requires new generation of router). Previously, large asynchronous data reads might terminate early in the presence of network glitches.
  • Some parameter renaming for consistency:
    • swarm.gdp.crypto.md.alg → swarm.gdp.crypto.digest.alg
