Global Data Plane Programmatic API

Editor: Eric Allman, U.C. Berkeley Swarm Lab, eric@cs.berkeley.edu
Version 2.1.0, 2018-10-20

This document describes the procedural programmatic interface to the Global Data Plane.  The native code is written in C for maximum flexibility and performance, but it is expected that most applications will be written in higher level languages, and hence there will be multiple language bindings for the GDP library.  There is also a REST interface that is not described in this document.

The GDP library uses the EP portability library, and applications are free to use that library as well; in particular, since the GDP library makes extensive use of the EP library, some efficiencies may result from using them both.  This document does not attempt to define the EP library and describes it only as necessary.  One EP concept that appears commonly, however, is the EP_STAT data type, which represents a status code that a function returns to indicate completion status.  It includes a "severity" (e.g., OK, ERROR, SEVERE), a "registry" (in our case always UC Berkeley), a "module" (e.g., GDP or the EP library itself), and detail information.  An OK status can return a positive integer as extra information.
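To make the shape of such a status code concrete, the following is a minimal sketch of a status word that packs severity, registry, module, and detail into 32 bits.  The demo_ names, the field widths, and the severity values are all assumptions made for illustration; they are not taken from the EP library, whose real definitions live in its own headers.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths and severity values are illustrative assumptions,
 * not the EP library's actual definitions. */
enum { DEMO_SEV_BITS = 3, DEMO_REG_BITS = 11, DEMO_MOD_BITS = 8, DEMO_DET_BITS = 10 };
enum { DEMO_SEV_OK = 0, DEMO_SEV_ERROR = 1, DEMO_SEV_SEVERE = 2 };

enum {
    DEMO_MOD_SHIFT = DEMO_DET_BITS,
    DEMO_REG_SHIFT = DEMO_DET_BITS + DEMO_MOD_BITS,
    DEMO_SEV_SHIFT = DEMO_DET_BITS + DEMO_MOD_BITS + DEMO_REG_BITS
};

typedef struct { uint32_t code; } demo_stat_t;

/* Build a status from its four components. */
static demo_stat_t demo_stat_new(uint32_t sev, uint32_t reg, uint32_t mod, uint32_t det)
{
    demo_stat_t s;
    assert(sev < (1u << DEMO_SEV_BITS) && reg < (1u << DEMO_REG_BITS));
    assert(mod < (1u << DEMO_MOD_BITS) && det < (1u << DEMO_DET_BITS));
    s.code = (sev << DEMO_SEV_SHIFT) | (reg << DEMO_REG_SHIFT)
           | (mod << DEMO_MOD_SHIFT) | det;
    return s;
}

static uint32_t demo_stat_severity(demo_stat_t s) { return s.code >> DEMO_SEV_SHIFT; }
static uint32_t demo_stat_module(demo_stat_t s)
{
    return (s.code >> DEMO_MOD_SHIFT) & ((1u << DEMO_MOD_BITS) - 1);
}
static uint32_t demo_stat_detail(demo_stat_t s) { return s.code & ((1u << DEMO_DET_BITS) - 1); }
static int demo_stat_isok(demo_stat_t s) { return demo_stat_severity(s) == DEMO_SEV_OK; }

/* An OK status carrying a non-negative integer in the non-severity bits. */
static demo_stat_t demo_stat_from_int(uint32_t n)
{
    demo_stat_t s;
    assert(n < (1u << DEMO_SEV_SHIFT));
    s.code = n;                 /* severity bits are zero, i.e. DEMO_SEV_OK */
    return s;
}
static uint32_t demo_stat_to_int(demo_stat_t s) { return s.code & ((1u << DEMO_SEV_SHIFT) - 1); }
```

With this layout a caller can test demo_stat_isok(s) first and only then decide whether to read the integer payload or the module and detail fields.  Real GDP code should use the EP_STAT macros that appear in this document's examples, such as EP_STAT_ISOK.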

The code distribution includes an "apps" directory with two example programs: gdp-writer.c and gdp-reader.c, that show simple cases of how to append to a GOB and read from a GOB (including subscriptions).

This document corresponds to version 2.1 of the GDP library and associated applications.  Note that this interface is massively incompatible with version 0.9.

0  To Be Done

1  Terminology

Datum
A unit of data in the GDP; essentially a record.  Each datum has associated with it a record number (numbered sequentially from one), a commit timestamp (the time the record was committed into the GDP, as distinct from the time that the data originated), an associated blob containing the data itself, which we expect to be encrypted, and in most cases a signature.  Beware: because of our unique single-writer commit protocol, record numbers are not guaranteed to be unique nor continuous; for example, there can be no record 3 but two record 4s in some circumstances.  This should be rare.
GDP Object (GOB)
This represents an addressable entity in the Global Data Plane, which may be a log or a service.  It is always addressed with a location-independent 256-bit name (a "GDPName").  Applications interact with GOBs via GOB Instances (GINs).
GOB Instance (GIN)
An instance of a Global Data Plane log.  An application can have multiple instances of a given GDP Object.  It is the equivalent of a Unix File Descriptor (as distinct from a Unix file) that is created by gdp_gin_open.
GDP Name
The name of a Global Data Plane Object.  This is a 256-bit, opaque, globally unique number created as the SHA-256 hash of the metadata of the object.  The usual representation for printing is a 43 character base64 encoded string.
Human-Oriented Name
A name that human beings like to use, such as a conventional file name.  These are unrelated to the internal name.  A Human-Oriented Name to GDPname Directory Service (HONGDS) maintains a database mapping from one to the other.
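Because the Datum entry above warns that record numbers may occasionally repeat or skip, a reader that cares about completeness may want to check for both.  The following is a minimal sketch of such a check.  The demo_ names are hypothetical, and demo_recno_t is a stand-in that assumes gdp_recno_t behaves like a signed 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t demo_recno_t;   /* stand-in for gdp_recno_t (assumed signed 64-bit) */

typedef struct {
    size_t duplicates;          /* record numbers seen more than once */
    size_t missing;             /* record numbers skipped over */
} demo_recno_report_t;

/* Scan record numbers given in non-decreasing order, expecting them to
 * start at 1, and count repeated and skipped numbers. */
static demo_recno_report_t demo_check_recnos(const demo_recno_t *recnos, size_t n)
{
    demo_recno_report_t rep = { 0, 0 };
    demo_recno_t prev = 0;

    for (size_t i = 0; i < n; i++) {
        assert(recnos[i] >= 1 && recnos[i] >= prev);
        if (recnos[i] == prev)
            rep.duplicates++;
        else
            rep.missing += (size_t) (recnos[i] - prev - 1);
        prev = recnos[i];
    }
    return rep;
}
```

For the situation described in the Datum entry (no record 3 but two record 4s), the sequence 1, 2, 4, 4, 5 yields one duplicate and one missing record number.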

In earlier versions of the GDP, both GINs and GOBs were referred to as GCLs (GDP Channel-Logs). These names are now broken apart for clarity. For historic reasons, some of the interfaces and much of the documentation still refer to GCLs, but that name is officially deprecated.

A Note on Naming: Function and type names are mostly intended to map cleanly to class and method names, which occasionally leads to somewhat tortured names.  For example, function names beginning with gdp_gin_ operate on objects of type gdp_gin_t (with a few exceptions such as gdp_gin_new) and will take a pointer to a gdp_gin_t as the first ("self") argument.  In most cases, everything that has a type (identified by a name ending _t) is probably a class.  More concretely, gdp_gin_read_by_recno(gin, ...) where gin is type gdp_gin_t is intended to map to a method "gin.read_by_recno(...)" (i.e., the operation read_by_recno on a variable of class gdp_gin), gdp_buf_getlength(buf, ...) maps to buf.getlength(...), etc.  In some cases these also apply to what would be class methods; for example gdp_gin_create(...) would map to a class method gdp_gin.create(...).


2  Theory of Operation

GDP-based applications rely on three major pieces: an in-process GDP library, a GDP Log Daemon, and the Routing Layer.  In the future there will also be a service layer, but to date that is ad hoc at best.  This document describes the GDP library API.

The primary point of the GDP library is to speak the network protocol between the application and the GDP Daemon. The library is threaded, with (at the moment) at least two threads: one to process events (data arriving from the daemon, although others can be added), and the other to run the application itself. This allows the application to pretend it is a sequential program while still allowing asynchronous input from the GDP Daemon (e.g., processing results from subscription requests).  Applications are free to create other threads as desired.  The code has been written to do the locking as necessary, but stress tests have not been run, so you may find unhappy results.

The primary abstraction is the GDP Object (GOB). A GOB represents the rendezvous point for communication in the data plane. It is not directly tied to either a file or a network connection. On creation, a GOB is assigned a 256-bit opaque name. A GOB is append-only to writers.  For the moment you can access the dataplane in one of two modes: synchronous mode (e.g., using gdp_gin_read_by_* for reading) or an asynchronous mode (e.g., using gdp_gin_read_by_*_async or gdp_gin_subscribe for reading). In asynchronous mode the original call return status applies to the sending of the command, while final results (including any read data) will be returned using callbacks or an event interface.  These are described in more detail below.

All GOBs are named with an opaque, location-independent, 256-bit number from a flat namespace.  When printed it is shown as a base64-encoded value that is tweaked to be valid in a URI (that is, "+" and "/" are replaced with "-" and "_").  Applications may choose to overlay these unsightly names with some sort of directory service.  Such a directory service is planned but not yet implemented.
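The printable form can be illustrated with a small URI-safe base64 encoder.  The sketch below turns 32 bytes (256 bits) into 43 characters with no padding, using "-" and "_" in place of "+" and "/".  It is an illustration under the description above, with hypothetical demo_ names; applications should call gdp_printable_name rather than rely on this producing byte-identical output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NAME_LEN   32      /* 256 bits */
#define DEMO_PNAME_LEN  43      /* ceil(256 / 6) characters, unpadded */

static const char demo_b64url[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encode a 256-bit name as 43 URI-safe base64 characters plus a NUL. */
static void demo_printable_name(const uint8_t name[DEMO_NAME_LEN],
                                char out[DEMO_PNAME_LEN + 1])
{
    uint32_t acc = 0;           /* bit accumulator; only the low bits matter */
    int nbits = 0;              /* number of unconsumed bits in acc */
    size_t o = 0;

    for (size_t i = 0; i < DEMO_NAME_LEN; i++) {
        acc = (acc << 8) | name[i];
        nbits += 8;
        while (nbits >= 6) {
            nbits -= 6;
            out[o++] = demo_b64url[(acc >> nbits) & 0x3f];
        }
    }
    if (nbits > 0)              /* 4 leftover bits, padded with zero bits */
        out[o++] = demo_b64url[(acc << (6 - nbits)) & 0x3f];
    assert(o == DEMO_PNAME_LEN);
    out[o] = '\0';
}
```

The 43-character length follows from the arithmetic: 42 characters carry 252 bits, and the final character carries the remaining 4 bits.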

Applications using the GDP library should #include <gdp/gdp.h> for all the essential definitions.



3  GDP Operations

3.1 Data Types and Initialization


Name

GDP data types and basic utilities

Synopsis

#include <gdp/gdp.h>

// names: internal and printable
gdp_name_t InternalGdpName; // 256-bit number
gdp_pname_t PrintableGdpName; // base-64 encoded string
bool GDP_NAME_SAME(gdp_name_t a, gdp_name_t b);
bool gdp_name_is_valid(gdp_name_t gdpname);

// convert between printable and internal names
char *gdp_printable_name(const gdp_name_t InternalFormat,
    gdp_pname_t PrintableFormat);
EP_STAT gdp_internal_name(const gdp_pname_t PrintableFormat,
    gdp_name_t Internal);

// this will do human-friendly name lookup (may be expensive)
EP_STAT gdp_parse_name(const char *external,
    gdp_name_t Internal);

gdp_gin_t *GdpLogInstance;

gdp_datum_t *GdpDatum; // data storage unit
gdp_hash_t *GdpHash; // hash function over a gdp_datum_t
gdp_sig_t *GdpSignature; // signature (multiple usage)
gdp_recno_t RecordNumber; // datum record number
EP_TIME_SPEC TimeStampSpec; // timestamp: see libep

Notes



Name

gdp_init — Initialize the GDP library

Synopsis

#include <gdp/gdp.h>

EP_STAT gdp_init(const char *gdpd_addr)

Notes


Name

GDP_LIB_VERSION — GDP library version

Synopsis

#include <gdp/gdp_version.h>

Notes


3.2  GOB Management

This section covers operations on GOBs as a whole, notably creating, opening, and closing.



Name

gdp_gin_create — Create an append-only GOB, returning a GIN

Synopsis

EP_STAT gdp_gin_create(gdp_create_info_t *gci,
    const char *human_name,
    gdp_gin_t **ginp)

Notes



Name

gdp_create_info_new — allocate a new creation information data structure

Synopsis

gdp_create_info_t *gdp_create_info_new(void)

Notes


Name

gdp_create_info_free — Free the creation information structure

Synopsis

void gdp_create_info_free(gdp_create_info_t *info)

Notes


Name

gdp_create_info_set_owner_key, gdp_create_info_set_writer_key — Set owner/writer key for a new GOB

Synopsis

EP_STAT gdp_create_info_set_owner_key(
    gdp_create_info_t *info,
    EP_CRYPTO_KEY *skey,
    const char *dig_alg_name)
EP_STAT gdp_create_info_set_writer_key(
    gdp_create_info_t *info,
    EP_CRYPTO_KEY *skey,
    const char *dig_alg_name)

Notes


Name

gdp_create_info_new_owner_key, gdp_create_info_new_writer_key — Create new owner/writer key for a new GOB

Synopsis

EP_STAT gdp_create_info_new_owner_key(
    gdp_create_info_t *info,
    const char *dig_alg_name,
    const char *key_alg_name,
    int key_bits,
    const char *curve_name,
    const char *key_enc_alg_name)
EP_STAT gdp_create_info_new_writer_key(
    gdp_create_info_t *info,
    const char *dig_alg_name,
    const char *key_alg_name,
    int key_bits,
    const char *curve_name,
    const char *key_enc_alg_name)

Notes


Name

gdp_create_info_set_creator — set name and domain of entity creating this GOB

Synopsis

EP_STAT gdp_create_info_set_creator(
    gdp_create_info_t *info,
    const char *user,
    const char *domain)

Notes


Name

gdp_create_info_set_creation_service — set the name of the GDP creation service

Synopsis

EP_STAT gdp_create_info_set_creation_service(
                gdp_create_info_t *info,
    const char *creation_service_name)

Notes


Name

gdp_create_info_set_expiration — set GOB expiration parameters

Synopsis

EP_STAT gdp_create_info_add_expiration(
    gdp_create_info_t *info,
<to be determined>)

Notes


Name

gdp_create_info_add_metadata — add user-defined metadata to the GOB metadata

Synopsis

EP_STAT gdp_create_info_add_metadata(
    gdp_create_info_t *info,
    uint32_t md_name,
    size_t md_len,
    const char *md_val)

Notes


Name

gdp_gin_open — Open an existing GOB, returning a GIN

Synopsis

EP_STAT gdp_gin_open(gdp_name_t name,
    gdp_iomode_t rw,
    gdp_open_info_t *info,
    gdp_gin_t **ginp)

Notes


Name

gdp_open_info_new — Create a new open information data structure

Synopsis

gdp_open_info_t *gdp_open_info_new(void)

Notes


Name

gdp_open_info_free — Free the open information structure

Synopsis

void gdp_open_info_free(gdp_open_info_t *info)

Notes


Name

gdp_open_info_set_signing_key — Set signing key for an open GOB

Synopsis

EP_STAT gdp_open_info_set_signing_key(
    gdp_open_info_t *info,
    EP_CRYPTO_KEY *skey)

Notes


Name

gdp_open_info_set_signkey_cb — Set a callback function to read a signing key

Synopsis

EP_STAT gdp_open_info_set_signkey_cb(
    gdp_open_info_t *info,
    EP_STAT (*signkey_cb)(
        gdp_name_t gname,
        void *signkey_udata,
        EP_CRYPTO_KEY **skey),
    void *signkey_udata)

Notes


Name

gdp_open_info_set_caching — set caching behavior

Synopsis

EP_STAT gdp_open_info_set_caching(
     gdp_open_info_t *info,
     bool keep_in_cache)

Notes


Name

gdp_open_info_set_no_skey_nonfatal — set whether a missing secret key is nonfatal

Synopsis

EP_STAT gdp_open_info_set_no_skey_nonfatal(
     gdp_open_info_t *info,
     bool no_skey_nonfatal)

Notes


Name

gdp_open_info_set_vrfy — set log verification behavior

Synopsis

EP_STAT gdp_open_info_set_vrfy(
     gdp_open_info_t *info,
     bool do_verification)

Notes


Name

gdp_gin_close — Close a GOB Instance (GIN) and release resources

Synopsis

EP_STAT gdp_gin_close(gdp_gin_t *gin)

Notes


3.3  Synchronous Operations

Synchronous operations block until the operation is complete.  They are the easiest interface for simple programs, but may not perform as well as the asynchronous versions.  The synchronous calls only read or write single records at a time; to operate on many records in one call, use the asynchronous versions.

If synchronous operations do not receive an acknowledgement, they will attempt to re-send the request after a timeout.


Name

gdp_gin_append — Append a record to a GOB

Synopsis

EP_STAT gdp_gin_append(gdp_gin_t *gin,
    gdp_datum_t *datum,
    gdp_hash_t *prevhash)

Notes



Name

gdp_gin_read_by_recno, gdp_gin_read_by_ts, gdp_gin_read_by_hash — Read from a readable GIN

Synopsis

EP_STAT gdp_gin_read_by_recno(gdp_gin_t *gin,
    gdp_recno_t recno,
    gdp_datum_t *datum)
EP_STAT gdp_gin_read_by_ts(gdp_gin_t *gin,
    EP_TIME_SPEC *ts,
    gdp_datum_t *datum)
EP_STAT gdp_gin_read_by_hash(gdp_gin_t *gin,
    gdp_hash_t *hash,
    gdp_datum_t *datum)

Notes


3.4  Asynchronous Operations (Asynchronous I/O, Subscriptions, and Events)

Asynchronous operations allow an application to subscribe to one or more GOBs and receive events as those GOBs see activity.  The event mechanism is intended to be extensible for possible future expansion.

Every event has a type, a pointer to the GIN handle, and a pointer to a datum.  Applications could in principle define their own event types, but at the moment this functionality is not exposed.

All asynchronous operations return status and/or data via either a callback function or the event interface.  Callback functions may not be called in the same thread as the operation initiation.  If no callback function is given then the event interface is used; this has the effect of serializing the event stream.  In either case, it is the responsibility of the caller to free the event after use using gdp_event_free.

Note that asynchronous calls do not do retransmissions.



Name

gdp_event_t — event structure

Synopsis

typedef struct _gdp_event    gdp_event_t;

Notes


Name

gdp_event_cbfunc_t — event callback function type

Synopsis

typedef void (*gdp_event_cbfunc_t)(gdp_event_t *gev);

Notes



Name

gdp_gin_read_by_recno_async, gdp_gin_read_by_ts_async, gdp_gin_read_by_hash_async — Asynchronously read records from a readable GOB

Synopsis

typedef void (*gdp_event_cbfunc_t)(gdp_event_t *gev);

EP_STAT gdp_gin_read_by_recno_async(
    gdp_gin_t *gin,
    gdp_recno_t start,
    int32_t numrecs,
    gdp_event_cbfunc_t cbfunc,
    void *udata)
EP_STAT gdp_gin_read_by_ts_async(
    gdp_gin_t *gin,
    EP_TIME_SPEC *start,
    int32_t numrecs,
    gdp_event_cbfunc_t cbfunc,
    void *udata)
EP_STAT gdp_gin_read_by_hash_async(
    gdp_gin_t *gin,
    int32_t n_hashes,
    gdp_hash_t **hashes,
    gdp_event_cbfunc_t cbfunc,
    void *udata)

Notes

The rows below are consistent with a GOB that already contains 20 records.

start   numrecs   Behavior
1       10        Returns records 1–10 immediately and terminates the read.
-10     10        Returns records 11–20 immediately and terminates the read.
0       any       Returns "4.02 bad option" failure.
-10     20        Returns records 11–20.
1       0         Returns records 1–20 immediately.
-30     30        Returns records 1–20 immediately.
30      10        Currently undefined.  Should probably wait until 10 more records are added before starting to return the data.
any     -1        Returns "4.02 bad option" failure.




Name

gdp_gin_append_async — Asynchronously append one or more records to a writable GOB

Synopsis

EP_STAT gdp_gin_append_async(
    gdp_gin_t *gin,
    int32_t n_datums,
    gdp_datum_t **datums,
    gdp_hash_t *prevhash,
    gdp_event_cbfunc_t cbfunc,
    void *udata)

Notes


Name

gdp_gin_subscribe_by_* — Subscribe to a readable GOB

Synopsis

EP_STAT gdp_gin_subscribe_by_recno(
    gdp_gin_t *gin,
    gdp_recno_t start,
    int32_t numrecs,
    gdp_sub_qos_t *qos,
    gdp_event_cbfunc_t cbfunc,
    void *udata)
EP_STAT gdp_gin_subscribe_by_ts(
    gdp_gin_t *gin,
    EP_TIME_SPEC *start,
    int32_t numrecs,
    gdp_sub_qos_t *qos,
    gdp_event_cbfunc_t cbfunc,
    void *udata)

Notes

The rows below are consistent with a GOB that already contains 20 records.

start   numrecs   Behavior
1       10        Returns records 1–10 immediately and terminates the subscription.
-10     10        Returns records 11–20 immediately and terminates the subscription.
0       0         Starts returning data when record 21 is published and continues forever.
-10     20        Returns records 11–20 immediately, then returns records 21–30 as they are published.  The subscription then terminates.
1       0         Returns records 1–20 immediately, then returns any new records published in perpetuity.
-30     30        Returns records 1–20 immediately, then returns records 21–30 as they are published.
30      10        Currently undefined.  Should probably wait until 10 more records are added before starting to return the data.
any     -1        Returns "4.02 bad option" failure.


Name

gdp_sub_qos_new, gdp_sub_qos_free — allocate/free subscription quality of service information

Synopsis

gdp_sub_qos_t *gdp_sub_qos_new(void)
void gdp_sub_qos_free(
    gdp_sub_qos_t *qos)

Notes


Name

gdp_sub_qos_set_xyzzy — set xyzzy qos

Synopsis

gdp_sub_qos_set_xyzzy(
gdp_sub_qos_t *qos,
xxx yyy)

Notes


Name

gdp_gin_unsubscribe — Unsubscribe GIN from an associated GOB

Synopsis

EP_STAT gdp_gin_unsubscribe(gdp_gin_t *gin,
    gdp_event_cbfunc_t cbfunc,
    void *udata)

Notes



Name

gdp_event_next — get next asynchronous event

Synopsis

gdp_event_t *gdp_event_next(
gdp_gin_t *gin,
EP_TIME_SPEC *timeout)

Notes


Name

gdp_event_gettype — extract the type from the event

Synopsis

int gdp_event_gettype(gdp_event_t *gev)

Notes

Event Name            Meaning
GDP_EVENT_DATA        Data is returned in the event from a previous subscription or asynchronous read.
GDP_EVENT_DONE        Indicates the end of a subscription or asynchronous read.
GDP_EVENT_SHUTDOWN    Subscription is terminated because the log daemon has shut down.
GDP_EVENT_CREATED     Status is returned from an asynchronous append, create, or other similar operation.
GDP_EVENT_MISSING     The requested data was not available at this time, but more data may be available.
GDP_EVENT_SUCCESS     Generic asynchronous success status.  See the detailed status using gdp_event_getstat.
GDP_EVENT_FAILURE     Generic asynchronous failure status.  See the detailed status using gdp_event_getstat.

Name

gdp_event_getstat — extract the detailed result status from the event

Synopsis

EP_STAT gdp_event_getstat(gdp_event_t *gev)

Notes


Name

gdp_event_getgin — extract the GIN handle from the event

Synopsis

gdp_gin_t *gdp_event_getgin(gdp_event_t *gev)

Notes


Name

gdp_event_getdatum — get the datum associated with this event

Synopsis

gdp_datum_t *gdp_event_getdatum(gdp_event_t *gev)

Notes


Name

gdp_event_getudata — get user data associated with this event

Synopsis

void *gdp_event_getudata(gdp_event_t *gev)
    

Notes


3.6  Utilities


Name

gdp_name_parse, gdp_printable_name — parse an external representation to internal, create printable version of name

Synopsis

EP_STAT gdp_name_parse(
    const char *external_name,
    gdp_name_t gob_name,
    char **extended_name)

char *gdp_printable_name(const gdp_name_t gob_name,
    gdp_pname_t printable)

Notes


Name

gdp_name_root_set, gdp_name_root_get — set/get the root name used by gdp_name_parse

Synopsis

EP_STAT gdp_name_root_set(
const char *root_name)
const char *gdp_name_root_get(void)

Notes


Name

gdp_gin_getname — Return the name of a GOB from a GIN

Synopsis

EP_STAT gdp_gin_getname(gdp_gin_t *gin,
		gdp_name_t namebuf)

Notes



Name

gdp_getstat — Return information about a GIN  [NOT YET IMPLEMENTED]

Synopsis

EP_STAT gdp_getstat(gdp_name_t gobname,
		gob_stat_t *statbuf)

Notes



Name

gdp_gin_getnrecs — return the number of records in an existing GOB

Synopsis

gdp_recno_t gdp_gin_getnrecs(gdp_gin_t *gin)

Notes


Name

gdp_gin_gethashalg — get the hash algorithm used by a GOB

Synopsis

    int gdp_gin_gethashalg(const gdp_gin_t *gin)

Notes


Name

gdp_gin_getsigalg — get the signature algorithm used by a GOB

Synopsis

    int gdp_gin_getsigalg(const gdp_gin_t *gin)

Notes


Name

gdp_gin_set_append_filter — filter appended data

Synopsis

void gdp_gin_set_append_filter(gdp_gin_t *gin,
             EP_STAT (*filter)(gdp_datum_t *, void *),
             void *filterdata)

Notes


Name

gdp_gin_set_read_filter — filter read data

Synopsis

void gdp_gin_set_read_filter(gdp_gin_t *gin,
             EP_STAT (*filter)(gdp_datum_t *, void *),
             void *filterdata)

Notes



Name

gdp_gin_print — print a GIN and associated GOB (for debugging)

Synopsis

void gdp_gin_print(const gdp_gin_t *gin, FILE *fp)

Notes


4  Datums (Records)

GOBs are represented as a series of records of type gdp_datum_t.  Each record has a record number, a commit timestamp, associated data, and possible signature information if the record was signed.  Record numbers are of type gdp_recno_t and normally count up by one as records are added; as noted under Datum in the Terminology section, in rare circumstances they are not guaranteed to be unique or contiguous.  Data is represented in dynamic buffers, as described below.

4.1  Datum Headers


Name

gdp_datum_new, gdp_datum_reset, gdp_datum_free, gdp_datum_print — allocate/free/print a datum structure

Synopsis

gdp_datum_t *gdp_datum_new(void)
void gdp_datum_reset(gdp_datum_t *datum)
void gdp_datum_free(gdp_datum_t *datum)
void gdp_datum_print(const gdp_datum_t *datum, FILE *fp, uint32_t flags)

Notes


Name

gdp_datum_getrecno — get the record number from a datum

Synopsis

    gdp_recno_t gdp_datum_getrecno(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getts — get the timestamp from a datum

Synopsis

    void gdp_datum_getts(const gdp_datum_t *datum, EP_TIME_SPEC *ts)

Notes


Name

gdp_datum_getdlen — get the data length from a datum

Synopsis

    size_t gdp_datum_getdlen(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getdbuf — get the data buffer from a datum

Synopsis

    gdp_buf_t *gdp_datum_getdbuf(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getsig — get the signature from a datum

Synopsis

    gdp_sig_t *gdp_datum_getsig(const gdp_datum_t *datum)

Notes


4.2  Data Buffers

Data buffers grow dynamically as needed.
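The difference between reading, peeking, and draining is easiest to see on a toy model.  The following sketch is a simple growable byte buffer with hypothetical demo_buf_ functions named after the gdp_buf_ calls listed in this section.  It is not the GDP implementation, and the return-value conventions (0 or -1 for the int functions, a byte count for the size_t functions) are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;        /* heap storage, grown on demand */
    size_t len;                 /* bytes currently held */
    size_t cap;                 /* bytes allocated */
} demo_buf_t;

/* Append sz bytes, growing the storage as needed. */
static int demo_buf_write(demo_buf_t *b, const void *in, size_t sz)
{
    assert(b != NULL);
    if (sz == 0)
        return 0;
    if (b->len + sz > b->cap) {
        size_t ncap = b->cap ? b->cap : 16;
        unsigned char *p;
        while (ncap < b->len + sz)
            ncap *= 2;
        p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, in, sz);
    b->len += sz;
    return 0;
}

/* Copy up to sz bytes from the front without removing them. */
static size_t demo_buf_peek(const demo_buf_t *b, void *out, size_t sz)
{
    assert(b != NULL);
    if (sz > b->len)
        sz = b->len;
    if (sz > 0)
        memcpy(out, b->data, sz);
    return sz;
}

/* Discard sz bytes from the front. */
static int demo_buf_drain(demo_buf_t *b, size_t sz)
{
    assert(b != NULL);
    if (sz > b->len)
        return -1;
    if (sz > 0)
        memmove(b->data, b->data + sz, b->len - sz);
    b->len -= sz;
    return 0;
}

/* Copy up to sz bytes from the front and remove them: peek, then drain. */
static size_t demo_buf_read(demo_buf_t *b, void *out, size_t sz)
{
    size_t n = demo_buf_peek(b, out, sz);
    demo_buf_drain(b, n);
    return n;
}
```

A buffer is started as demo_buf_t b = { 0 } and its storage released with free(b.data).  After writing "hello", a two-byte peek leaves the length at 5, whereas a two-byte read leaves it at 3.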


Name

gdp_buf_new, gdp_buf_reset, gdp_buf_free — allocate, reset, or free a buffer

Synopsis

gdp_buf_t *gdp_buf_new(void)
void gdp_buf_reset(gdp_buf_t *b)
void gdp_buf_free(gdp_buf_t *b)

Notes


Name

gdp_buf_getlength — return the length of the data in the buffer

Synopsis

size_t gdp_buf_getlength(gdp_buf_t *b)

Notes


Name

gdp_buf_read, gdp_buf_peek, gdp_buf_drain — remove or peek at data in a buffer

Synopsis

size_t gdp_buf_read(gdp_buf_t *b, void *out, size_t sz)
size_t gdp_buf_peek(gdp_buf_t *b, void *out, size_t sz)
int gdp_buf_drain(gdp_buf_t *b, size_t sz)

Notes


Name

gdp_buf_write, gdp_buf_printf — copy data into a buffer

Synopsis

int gdp_buf_write(gdp_buf_t *b, void *in, size_t sz)
int gdp_buf_printf(gdp_buf_t *b, const char *fmt, ...)

Notes


Name

gdp_buf_move — move data from one buffer into another

Synopsis

int gdp_buf_move(gdp_buf_t *ob, gdp_buf_t *ib, size_t sz)

Notes


Name

gdp_buf_dump — print the contents of the buffer for debugging

Synopsis

void gdp_buf_dump(gdp_buf_t *b, FILE *fp)

Notes


4.3  Timestamps

The time abstraction is imported directly from the ep library.  Times are represented as follows:

#pragma pack(push, 1)
typedef struct
{
     int64_t    tv_sec;         // seconds since January 1, 1970
     uint32_t   tv_nsec;        // nanoseconds
     float      tv_accuracy;    // accuracy in seconds
} EP_TIME_SPEC;
#pragma pack(pop)

Note that the host system struct timespec may not match this structure; some systems still represent the time with only four bytes for tv_sec, which expires in 2038.  The tv_accuracy field indicates an estimate for how accurate the clock is; for example, if you are running NTP this value is likely to be on the order of a few tens to a few hundreds of milliseconds, but if you set your clock manually it is likely to be several seconds or worse.
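Since the host struct timespec may differ in layout, a conversion is needed whenever a host time is placed in this structure.  The following sketch shows one way to do that.  It uses a locally defined demo_time_spec_t with the same three fields so that it compiles without the EP headers; the demo_ names are hypothetical, and the EP library may already provide its own conversion routines.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#pragma pack(push, 1)
typedef struct {
    int64_t  tv_sec;            /* seconds since January 1, 1970 */
    uint32_t tv_nsec;           /* nanoseconds */
    float    tv_accuracy;       /* accuracy in seconds */
} demo_time_spec_t;             /* same fields as EP_TIME_SPEC above */
#pragma pack(pop)

/* Widen a host struct timespec into the fixed-size representation.
 * The caller supplies the clock accuracy estimate, in seconds. */
static demo_time_spec_t demo_from_timespec(const struct timespec *ts, float accuracy)
{
    demo_time_spec_t out;

    assert(ts != NULL);
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    out.tv_sec = (int64_t) ts->tv_sec;      /* always 64 bits, whatever time_t is */
    out.tv_nsec = (uint32_t) ts->tv_nsec;
    out.tv_accuracy = accuracy;
    return out;
}
```

With the packing shown, the structure occupies 16 bytes (8 + 4 + 4) regardless of the host's time_t width, which is what makes it suitable for storage and transmission.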

5  Signing and Encryption

5.1  Signing

Each log should have a public key in the metadata which is used to verify writes to the log.  The library hides most of the details of this, but some still appear.

The gdp-create command automatically creates a public/secret keypair unless otherwise specified.  See the man page for details.  The public part of the key is inserted into the log metadata and stored with the log.  The secret part is stored somewhere on your local filesystem, typically KEYS/gob-id.pem.  Normally gdp-create will encrypt the secret key with another key entered from the command line, although this can also be turned off.

When a GDP application attempts to open a log using gdp_gin_open, the library will attempt to find a secret key by searching the directories named in the swarm.gdp.crypto.key.path administrative parameter for a file having the same name as the log (with a .pem file suffix).  If that secret key is encrypted, the library will prompt the (human) user for the secret key password.  The default path is ".", "KEYS", "~/.swarm/gdp/keys", "/usr/local/etc/swarm/gdp/keys", and "/etc/swarm/gdp/keys".
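The search just described amounts to trying "directory/logname.pem" in each directory of the path, in order.  The sketch below illustrates that loop using the default path listed above.  The demo_ names are hypothetical, the existence check is passed in as a function so the logic can be exercised without touching the filesystem, and "~" is left unexpanded; the library's actual lookup may differ in all of these details.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Default search path as listed in the text; "~" is not expanded here. */
static const char *const demo_key_dirs[] = {
    ".", "KEYS", "~/.swarm/gdp/keys",
    "/usr/local/etc/swarm/gdp/keys", "/etc/swarm/gdp/keys"
};

typedef int (*demo_exists_fn)(const char *path);

/* Try <dir>/<logname>.pem in each directory in order.  Returns the index of
 * the first directory where exists() reports a hit and leaves that path in
 * out; returns -1 (and an empty out) if no candidate matches. */
static int demo_find_key(const char *logname, demo_exists_fn exists,
                         char *out, size_t outsz)
{
    size_t ndirs = sizeof demo_key_dirs / sizeof demo_key_dirs[0];

    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outsz, "%s/%s.pem", demo_key_dirs[i], logname);
        if (n < 0 || (size_t) n >= outsz)
            continue;                   /* candidate did not fit; skip it */
        if (exists(out))
            return (int) i;
    }
    if (outsz > 0)
        out[0] = '\0';
    return -1;
}

/* Stub predicate for demonstration: pretends only files under KEYS/ exist. */
static int demo_stub_only_in_keys(const char *path)
{
    return strncmp(path, "KEYS/", 5) == 0;
}
```

Calling demo_find_key("mylog", demo_stub_only_in_keys, path, sizeof path) returns 1 and leaves "KEYS/mylog.pem" in path.  A real predicate might wrap access() or fopen().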

Once the secret key has been located and decrypted, all further append requests will be signed using the secret key and verified by the log daemon against the public key in the log metadata.

5.2  Encryption

Encryption is explicitly not part of the GDP.  Ideally the GDP will never see unencrypted data.  However, read and write filters (see gdp_gin_set_read_filter and gdp_gin_set_append_filter for details) can be used to set encryption and decryption hooks for externally implemented encryption.  We need to make this easier.



To be done

Header files
Version info
PRIgdp_recno macro

Appendix A:  Examples

The following examples have not been validated with the v2 API, so they may be inaccurate.

The following pseudo-code example is excerpted from apps/gdp-writer.c.

#include <gdp/gdp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// usage_error() and the other *_error() helpers are application-defined (not shown)

int main(int argc, char **argv)
{
	gdp_gin_t *gin;
	gdp_create_info_t *gci;
	EP_STAT estat;
	gdp_datum_t *d;
	char buf[200];

	// general startup and initialization
	if (argc < 2)
		usage_error();
	estat = gdp_init(NULL);		// NULL: assumed to select the default daemon address
	if (!EP_STAT_ISOK(estat))
		initialization_error(estat);
	d = gdp_datum_new();

	// attempt to create a GOB with the human-oriented name given on the command line
	gci = gdp_create_info_new();
	estat = gdp_gin_create(gci, argv[1], &gin);
	gdp_create_info_free(gci);
	if (!EP_STAT_ISOK(estat))
		creation_error(estat);

	// read lines from standard input
	while (fgets(buf, sizeof buf, stdin) != NULL)
	{
		char *p = strchr(buf, '\n');
		if (p != NULL)
			*p = '\0';

		// write them to the dataplane, starting each record from an empty datum
		gdp_datum_reset(d);
		if (gdp_buf_write(gdp_datum_getdbuf(d), buf, strlen(buf)) < 0)
			estat = GDP_STAT_BUFFER_FAILURE;
		else
			estat = gdp_gin_append(gin, d, NULL);	// NULL prevhash: assumed acceptable
		EP_STAT_CHECK(estat, break);
	}

	// cleanup and exit
	gdp_datum_free(d);
	gdp_gin_close(gin);
	exit(!EP_STAT_ISOK(estat));
}

This example is a similar excerpt from apps/gdp-reader.c (without using subscriptions):

#include <gdp/gdp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	gdp_gin_t *gin;
	EP_STAT estat;
	gdp_name_t gobiname;	// internal name of GOB
	gdp_datum_t *d;
	gdp_recno_t recno;

	// general startup and initialization
	if (argc < 2)
		usage_error();
	estat = gdp_init(NULL);		// NULL: assumed to select the default daemon address
	if (!EP_STAT_ISOK(estat))
		initialization_error(estat);
	d = gdp_datum_new();

	// parse command line name to internal format
	estat = gdp_parse_name(argv[1], gobiname);
	if (!EP_STAT_ISOK(estat))
		name_syntax_error();

	// attempt to open the GOB (NULL: no open info, assumed acceptable)
	estat = gdp_gin_open(gobiname, GDP_MODE_RO, NULL, &gin);
	if (!EP_STAT_ISOK(estat))
		open_error(estat, argv[1]);

	// read records one at a time until a read fails
	recno = 1;
	for (;;)
	{
		estat = gdp_gin_read_by_recno(gin, recno++, d);
		EP_STAT_CHECK(estat, break);
		gdp_datum_print(d, stdout, 0);	// 0: no print flags
	}
	exit(0);
}


If you want to use subscriptions, the recno variable can be removed and the for loop replaced with:

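	// enable the subscription: start at record 1, numrecs 0 (no limit),
	// no QoS, and no callback, so events arrive through gdp_event_next
	estat = gdp_gin_subscribe_by_recno(gin, 1, 0, NULL, NULL, NULL);
	if (!EP_STAT_ISOK(estat))
		subscribe_error(estat, argv[1]);

	for (;;)
	{
		// NULL timeout: assumed to wait indefinitely
		gdp_event_t *gev = gdp_event_next(gin, NULL);
		if (gdp_event_gettype(gev) == GDP_EVENT_DATA)
			gdp_datum_print(gdp_event_getdatum(gev), stdout, 0);
		gdp_event_free(gev);	// free every event, not only data events
	}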


Appendix B:  Compiling and Linking

The GDP library uses a reduced version of libep and also uses the libevent library version 2.1. These will need to be included both during compilation and linking.

At compile time you must use:

-Ilibevent_includes_parent -Ilibep_includes_parent

Note that these take the parent of the directory containing the include files. For example, if the include files for libevent are in /usr/local/include/event2 and the include files for libep are in /usr/local/include/ep you only need to specify the one flag "-I/usr/local/include".

For linking you must use:

-Llibevent_libraries -levent -levent_pthreads -Llibep_libraries -lep

As before, if the libraries for libevent and libep are in the same directory you only need a single -L flag.

Libep is a library that I produced several years ago intended for use in sendmail. This uses a stripped down version of that library that excludes several things that would not be helpful here. For more details of the original (full) library, see http://www.neophilic.com/blogs/eric.php/2014/05/12/libep-portable-c-runtime.

For additional information, see the README file in the distribution directory.

Appendix C: Open Questions

This section is really an addendum to the document — a "scratch area" to keep track of issues that we still need to consider.  It may not be up to date.

C.1 Access Control

Do this using Access Control Lists (so each user/app has a keypair) or by passing public/secret keys around (so each GOB has a secret keypair). The latter makes revocation impossible (even for write access), so I prefer the ACL approach. Third way?

Revocation? Deep vs. Shallow. Deep = take away permissions that have already been given. Shallow = you can only prevent an accessor from getting to new versions. Argument: deep revocation is hard to do from a technical perspective and ultimately futile (someone might have taken a photo of a screen while they still had access), but is still what people are used to (Unix and SQL permissions work this way). Shallow is all that can really be guaranteed. Also, anything involving Certificate Revocation Lists (CRLs) is doomed to failure. This implies that ACLs are the correct direction.

ACLs get us into the question of identity. Pretending that a keypair represents an identity doesn't work in the real world where bad players simply create new "identities" (keypairs) when an old identity has become untrusted. See the extensive work in email sender reputation. However, when a bad player creates a new identity/keypair they do not get access to any previous grants, so this may be sufficient.

C.2 Naming

If each GOB has a secret keypair, then the public key is sufficient to name the entity. If not, then assigning a GOB a GUID on creation seems like the best approach. Having the user assign a name seems like a non-starter, if only because of the possibility of conflicts.

There will probably be some need for external naming, e.g., some overlay directory structure. That might be a different gob_type.

This seems like an open research topic.

C.3 Orphans, Expiration, Charging, and Accounting

If a GOB isn't linked into a directory structure and everyone forgets its name then it will live forever (or until it expires). This could be quite common if a GOB is temporary, that is, not a candidate for long-term archival.

Expiration could be an issue without some sort of charging, which implies accounting.

Charging and accounting will affect the API. It seems like on GOB creation the creator needs to offer payment for both carrying and storing the data. This payment would presumably accrue to the actors providing the actual service. Payment for storage might be limited time or indefinite time (i.e., it would be an endowment).

The creator could also specify a cost for any potential consumer in order to access the GOB. Such payments would accrue to the creator of the GOB, and might be used to fund continued access, i.e. it could be rolled back into the endowment. This would lean toward making less-used data disappear: appealing in some ways, but anathema to librarians and historians.

As for API effects, it seems that GOB creation needs to include a payment for initial service, a cost for access, and an account into which to deposit any consumer payments. Accessing a GOB only requires an offered payment (which is probably best viewed as a bid rather than a payment, thus allowing multiple providers to compete for access).

Note that none of this is dependent on the form of payment. It does however assume that there is a mutually agreed upon form of payment, i.e., a universal currency.

C.4 Quality of Service

Is Quality of Service specified on a particular GOB, a particular open instance of a GOB, or between a pair of endpoints?

What does QoS actually mean? For example, in a live media stream it probably means the resolution of the data stream (which determines real-time bandwidth), latency, and possibly jitter, but after that stream is stored the QoS will be resolution (as before), delivery bandwidth (how quickly you can download the video, for example), and possibly jitter of the network connection (that is, how even the data flow will be). Delivery bandwidth depends on the entire path between the data source and the data sink, and may be higher or lower than the bandwidth required to send a real-time version of the stream — for example, over a slow network link.
