Global Data Plane Programmatic API

Editor: Eric Allman, U.C. Berkeley Swarm Lab, eric@cs.berkeley.edu
Version 0.8.0, 2017-07-27

This document describes the procedural programmatic interface to the Global Data Plane.  The native code is written in C for maximum flexibility and performance, but it is expected that most applications will be written in higher level languages, and hence there will be multiple language bindings for the GDP library.  There is also a REST interface that is not described in this document.

The GDP library uses the EP portability library, and applications are free to use that library as well; in particular, since the GDP library makes extensive use of the EP library, some efficiencies may result from using them both.  However, this document does not attempt to define the EP library and describes it only as necessary.  One EP concept that appears commonly is the EP_STAT data type, which represents a status code that a function can return to indicate its completion status.  An EP_STAT includes a "severity" (e.g., OK, ERROR, SEVERE), a "registry" (in our case always UC Berkeley), a "module" (e.g., GDP or the EP library itself), and detail information.  An OK status can return a positive integer as extra information.
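To make the idea of a status word concrete, the following sketch packs the four fields into a single 32-bit value. The demo_ names and the field widths are hypothetical and chosen only for illustration; the EP library defines its own encoding, and applications should use its accessor macros (such as EP_STAT_ISOK, used in the examples in Appendix A) rather than decoding bits themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real EP_STAT encoding:
 * bits 31..29 severity, 28..18 registry, 17..10 module, 9..0 detail. */
typedef struct { uint32_t code; } demo_stat_t;

enum demo_severity {
    DEMO_SEV_OK = 0,
    DEMO_SEV_WARN = 1,
    DEMO_SEV_ERROR = 2,
    DEMO_SEV_SEVERE = 3,
};

static demo_stat_t demo_stat_new(unsigned sev, unsigned reg,
                                 unsigned mod, unsigned det)
{
    demo_stat_t s;
    s.code = ((uint32_t)(sev & 0x7u) << 29) |
             ((uint32_t)(reg & 0x7FFu) << 18) |
             ((uint32_t)(mod & 0xFFu) << 10) |
             (uint32_t)(det & 0x3FFu);
    return s;
}

/* An OK status reuses the low 29 bits to carry a non-negative value. */
static demo_stat_t demo_stat_ok_value(uint32_t value)
{
    demo_stat_t s;
    s.code = value & 0x1FFFFFFFu;   /* severity bits stay 0 (OK) */
    return s;
}

static unsigned demo_stat_severity(demo_stat_t s) { return s.code >> 29; }
static unsigned demo_stat_registry(demo_stat_t s) { return (s.code >> 18) & 0x7FFu; }
static unsigned demo_stat_module(demo_stat_t s)   { return (s.code >> 10) & 0xFFu; }
static unsigned demo_stat_detail(demo_stat_t s)   { return s.code & 0x3FFu; }
static bool demo_stat_isok(demo_stat_t s) { return demo_stat_severity(s) == DEMO_SEV_OK; }
static uint32_t demo_stat_value(demo_stat_t s) { return s.code & 0x1FFFFFFFu; }
```

Keeping severity in the top bits means a single comparison answers "did this succeed?", while the remaining fields only need to be decoded when reporting an error.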

The code distribution includes an "apps" directory with two example programs, gdp-writer.c and gdp-reader.c, which show simple cases of how to append to a GCL and read from a GCL (including subscriptions).

1  Terminology

Datum
A unit of data in the GDP; essentially a record.  Each datum has associated with it a record number (numbered sequentially from one), a commit timestamp (the time the record was committed into the GDP, as distinct from the time that the data originated), an associated blob containing the data itself, which we expect to be encrypted, and in most cases a signature.
GDP Channel/Log
Global Data Plane Channel/Log.  This represents an addressable entity in the Global Data Plane, which may be a log or a service.

2  Theory of Operation, Data Types, and Initialization

GDP-based applications rely on three pieces: an in-process GDP library, a GDP Log Daemon, and the Routing Layer.  This document describes the GDP library API.

The primary point of the GDP library is to speak the network protocol between the application and the GDP Daemon.  The library is threaded, with (at the moment) two threads: one processes events (such as data arriving from the daemon; other event sources can be added), and the other runs the application itself.  This allows the application to be written as if it were a sequential program while still accepting asynchronous input from the GDP Daemon (e.g., results from subscription requests).  Applications are free to create other threads as desired.  The code has been written to do locking as necessary, but stress tests have not been run, so you may still run into threading bugs.

The primary abstraction is the GDP Channel-Log (GCL).  A GCL represents the rendezvous point for communication in the data plane.  It is not directly tied to either a file or a network connection.  On creation, a GCL is assigned a 256-bit opaque name.  A GCL is append-only to writers.  For the moment you can access the data plane in one of two modes: synchronous mode (using gdp_gcl_read_by_recno or gdp_gcl_read_by_ts for reading) and asynchronous mode (using gdp_gcl_subscribe_by_recno or gdp_gcl_subscribe_by_ts for reading).  To use asynchronous mode you must subscribe to any GCLs of interest and then call gdp_event_next repeatedly to read the results.  These are described in more detail below.

All GCLs are named with an opaque, location-independent, 256-bit number from a flat namespace.  When printed, the name is shown as a base64-encoded value that is tweaked to be valid in a URI (that is, "+" and "/" are replaced with "-" and "_").  Applications may choose to overlay these unsightly names with some sort of directory service.
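The printable form can be produced with an ordinary base64 encoder that uses the URI-safe alphabet. The sketch below encodes a 32-byte name; it assumes that no "=" padding is emitted, which gives 43 characters (42 full 6-bit groups plus one 4-bit remainder). The function name is hypothetical; applications should call gdp_printable_name and gdp_internal_name rather than relying on this sketch matching the library's exact output.

```c
#include <stddef.h>
#include <stdint.h>

/* URI-safe base64 alphabet: '+' becomes '-' and '/' becomes '_'. */
static const char b64url[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encode a 32-byte (256-bit) name without '=' padding (an assumption).
 * The output is 43 characters plus a NUL, so `out` needs 44 bytes. */
static void encode_name256(const uint8_t in[32], char out[44])
{
    size_t i, o = 0;
    uint32_t acc = 0;   /* bit accumulator; only the low bits matter */
    int nbits = 0;      /* number of not-yet-emitted bits in acc */

    for (i = 0; i < 32; i++) {
        acc = (acc << 8) | in[i];
        nbits += 8;
        while (nbits >= 6) {
            nbits -= 6;
            out[o++] = b64url[(acc >> nbits) & 0x3F];
        }
    }
    if (nbits > 0)      /* 256 = 42 * 6 + 4: pad the last 4 bits with zeros */
        out[o++] = b64url[(acc << (6 - nbits)) & 0x3F];
    out[o] = '\0';
}
```

Because the alphabet contains only letters, digits, "-", and "_", the result can be placed in a URI path component without percent-escaping.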

Applications using the GDP library should #include <gdp/gdp.h> for all the essential definitions.


Name

GDP data types and basic utilities

Synopsis

#include <gdp/gdp.h>

gdp_name_t InternalGdpName; // 256-bit number
gdp_pname_t PrintableGdpName; // base-64 encoded string
bool GDP_NAME_SAME(gdp_name_t a, gdp_name_t b);
bool gdp_name_is_valid(gdp_name_t gname);
char *gdp_printable_name(const gdp_name_t Internal, gdp_pname_t Printable);
EP_STAT gdp_internal_name(const gdp_pname_t Printable, gdp_name_t Internal);
EP_STAT gdp_parse_name(const char *external, gdp_name_t Internal);

gdp_gcl_t *GdpChannelLog;

gdp_datum_t *GdpDatum;
gdp_recno_t RecordNumber;
EP_TIME_SPEC TimeStampSpec;

Notes



Name

gdp_init — Initialize the GDP library

Synopsis

#include <gdp/gdp.h>

EP_STAT gdp_init(const char *gdpd_addr)

Notes


Name

GDP_LIB_VERSION — GDP library version

Synopsis

#include <gdp/gdp_version.h>

Notes


3  GCL Operations

3.1  GCL Synchronous Operations

Synchronous operations block until the operation is complete.  They are the easiest interface for simple programs.


Name

gdp_gcl_create — Create an append-only GCL on a specified log daemon node   [TEMPORARY INTERFACE]

Synopsis

EP_STAT gdp_gcl_create(gdp_name_t gcl_name,
		gdp_name_t logdname,
		gdp_gclmd_t *gmd,
		gdp_gcl_t **gclp)

Notes



Name

gdp_gcl_open — Open an existing GCL

Synopsis

EP_STAT gdp_gcl_open(gdp_name_t name,
		gcl_iomode_t rw,
		gdp_gcl_open_info_t *info,
		gdp_gcl_t **gclp)

Notes


Name

gdp_gcl_open_info_new — Create a new open information data structure

Synopsis

gdp_gcl_open_info_t *gdp_gcl_open_info_new(void)

Notes



Name

gdp_gcl_open_info_free — Free the open information structure

Synopsis

void gdp_gcl_open_info_free(gdp_gcl_open_info_t *info)

Notes


Name

gdp_gcl_open_info_set_signing_key — Set signing key for an open GCL

Synopsis

EP_STAT gdp_gcl_open_info_set_signing_key(
    gdp_gcl_open_info_t *info,
    EP_CRYPTO_KEY *skey)

Notes


Name

gdp_gcl_open_info_set_signkey_cb — Set a callback function to read a signing key

Synopsis

EP_STAT gdp_gcl_open_info_set_signkey_cb(
		gdp_gcl_open_info_t *info,
		EP_STAT (*signkey_cb)(
			gdp_name_t gname,
			void *signkey_udata,
			EP_CRYPTO_KEY **skey),
		void *signkey_udata)

Notes


Name

gdp_gcl_open_info_set_caching — set caching behavior

Synopsis

EP_STAT gdp_gcl_open_info_set_caching(
     gdp_gcl_open_info_t *info,
     bool keep_in_cache)

Notes


Name

gdp_gcl_close — Close a GCL and release resources

Synopsis

EP_STAT gdp_gcl_close(gdp_gcl_t *gcl)

Notes



Name

gdp_gcl_getname — Return the name of a GCL

Synopsis

EP_STAT gdp_gcl_getname(gdp_gcl_t *gcl,
		gdp_name_t namebuf)

Notes



Name

gdp_gcl_getstat — Return information about a GCL  [NOT YET IMPLEMENTED]

Synopsis

EP_STAT gdp_gcl_getstat(gdp_name_t gclname,
		gcl_stat_t *statbuf)

Notes



Name

gdp_gcl_print — print a GCL handle (for debugging)

Synopsis

void gdp_gcl_print(const gdp_gcl_t *gclh, FILE *fp)

Notes


Name

gdp_gcl_append — Append a record to a writable GCL

Synopsis

EP_STAT gdp_gcl_append(gdp_gcl_t *gcl,
		gdp_datum_t *datum)

Notes



Name

gdp_gcl_read_by_recno, gdp_gcl_read_by_ts — Read from a readable GCL

Synopsis

EP_STAT gdp_gcl_read_by_recno(gdp_gcl_t *gcl,
		gdp_recno_t recno,
		gdp_datum_t *datum)
EP_STAT gdp_gcl_read_by_ts(gdp_gcl_t *gcl,
		EP_TIME_SPEC *ts,
		gdp_datum_t *datum)

Notes


Name

gdp_parse_name — parse an external representation to internal

Synopsis

EP_STAT gdp_parse_name(const char *ext, gdp_name_t gcl_name)

Notes


3.2  GCL Asynchronous Operations (Asynchronous I/O, Subscriptions, and Events)

Asynchronous operations allow an application to subscribe to one or more GCLs and receive events as those GCLs see activity.  The event mechanism is intended to be extensible for possible future expansion.

Every event has a type, a pointer to the GCL handle, and a pointer to a datum.  Applications could in principle define their own event types, but at the moment this functionality is not exposed.

All asynchronous operations return status and/or data via either a callback function or the event interface.  Callback functions are not necessarily called in the same thread that initiated the operation.  If no callback function is given then the event interface is used; this has the effect of serializing the event stream.  In either case, the caller is responsible for freeing each event after use by calling gdp_event_free.

Note that asynchronous calls do not perform retransmissions.
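The event interface can be pictured as a queue shared between the library's event thread, which adds events as data arrives, and the application thread, which blocks in gdp_event_next until an event is available. The sketch below models only that hand-off with POSIX threads. All demo_ names are hypothetical, the simulated event thread stands in for the connection to the GDP daemon, and this is not how the GDP library is actually implemented.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for gdp_event_t: a type and a record number. */
typedef struct demo_event {
    int type;
    long recno;
    struct demo_event *next;
} demo_event_t;

enum { DEMO_EVENT_DATA = 1, DEMO_EVENT_EOS = 2 };

static demo_event_t *q_head, *q_tail;
static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;

/* Event-thread side: append an event to the queue and wake a waiter. */
static void demo_event_post(int type, long recno)
{
    demo_event_t *ev = malloc(sizeof *ev);
    if (ev == NULL)
        abort();
    ev->type = type;
    ev->recno = recno;
    ev->next = NULL;
    pthread_mutex_lock(&q_mutex);
    if (q_tail != NULL)
        q_tail->next = ev;
    else
        q_head = ev;
    q_tail = ev;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_mutex);
}

/* Application side: block until an event is queued, then dequeue it.
 * The caller owns the returned event and must free() it. */
static demo_event_t *demo_event_next(void)
{
    demo_event_t *ev;
    pthread_mutex_lock(&q_mutex);
    while (q_head == NULL)
        pthread_cond_wait(&q_cond, &q_mutex);
    ev = q_head;
    q_head = ev->next;
    if (q_head == NULL)
        q_tail = NULL;
    pthread_mutex_unlock(&q_mutex);
    return ev;
}

/* Simulated event thread: n data events, then end-of-subscription. */
static void *demo_event_thread(void *arg)
{
    long n = *(long *)arg;
    for (long i = 1; i <= n; i++)
        demo_event_post(DEMO_EVENT_DATA, i);
    demo_event_post(DEMO_EVENT_EOS, 0);
    return NULL;
}

/* Application loop: returns the number of data events received,
 * or -1 if the thread could not start or events arrived out of order. */
static long demo_consume(long n)
{
    pthread_t tid;
    long expect = 1;
    int in_order = 1;

    if (pthread_create(&tid, NULL, demo_event_thread, &n) != 0)
        return -1;
    for (;;) {
        demo_event_t *ev = demo_event_next();
        int type = ev->type;
        if (type == DEMO_EVENT_DATA && ev->recno != expect++)
            in_order = 0;
        free(ev);               /* plays the role of gdp_event_free */
        if (type == DEMO_EVENT_EOS)
            break;
    }
    pthread_join(tid, NULL);
    return in_order ? expect - 1 : -1;
}
```

Because one queue is drained by one consumer, events come out in the order they were posted, which is the serializing effect described above; the consumer frees every event, whatever its type.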



Name

gdp_event_t — event structure

Synopsis

typedef struct _gdp_event    gdp_event_t;

Notes


Name

gdp_event_cbfunc_t — event callback function type

Synopsis

typedef void (*gdp_event_cbfunc_t)(gdp_event_t *gev);

Notes



Name

gdp_gcl_read_async — Asynchronously read data from a GCL

Synopsis

EP_STAT gdp_gcl_read_async(
		gdp_gcl_t *gcl,
		gdp_recno_t recno,
		gdp_event_cbfunc_t cbfunc,
		void *udata)

Notes


Name

gdp_gcl_append_async — Asynchronously append to a writable GCL

Synopsis

EP_STAT gdp_gcl_append_async(
		gdp_gcl_t *gcl,
		gdp_datum_t *datum,
		gdp_event_cbfunc_t cbfunc,
		void *udata)

Notes


Name

gdp_gcl_subscribe_by_* — Subscribe to a readable GCL

Synopsis

EP_STAT gdp_gcl_subscribe_by_recno(
		gdp_gcl_t *gcl,
		gdp_recno_t start,
		int32_t numrecs,
		gdp_sub_qos_t *qos,
		gdp_event_cbfunc_t cbfunc,
		void *udata)
EP_STAT gdp_gcl_subscribe_by_ts(
		gdp_gcl_t *gcl,
		EP_TIME_SPEC *start,
		int32_t numrecs,
		gdp_sub_qos_t *qos,
		gdp_event_cbfunc_t cbfunc,
		void *udata)

Notes

The examples below assume that the GCL currently holds 20 records.

start   numrecs   Behavior
1       10        Returns records 1–10 immediately and terminates the subscription.
-10     10        Returns records 11–20 immediately and terminates the subscription.
0       0         Starts returning data when record 21 is published and continues forever.
-10     20        Returns records 11–20 immediately, then returns records 21–30 as they are published.  The subscription then terminates.
1       0         Returns records 1–20 immediately, then returns any new records published in perpetuity.
-30     30        Returns records 1–20 immediately, then returns records 21–30 as they are published.
30      10        Currently undefined.  Should probably wait until 10 more records are added before starting to return the data.
any     -1        Returns "4.02 bad option" failure.
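Read together, the rows suggest a simple rule: a negative start counts back from the last record (but never before record 1), a start of 0 means the next record to be published, and a numrecs of 0 means no limit. The sketch below encodes that reading as a pure function so that each row can be checked, again assuming a GCL with 20 records. The names are hypothetical, the rule is inferred from the table rather than taken from the log daemon's source, and the undefined case (a start beyond the next record) is simply reported as such.

```c
#include <stdint.h>

typedef enum {
    DEMO_SUB_OK,
    DEMO_SUB_BAD_OPTION,
    DEMO_SUB_UNDEFINED
} demo_sub_result_t;

/* A range is empty when first > last; last == INT64_MAX means "forever". */
typedef struct {
    int64_t imm_first, imm_last;   /* records returned immediately */
    int64_t fut_first, fut_last;   /* records returned as they are published */
} demo_sub_plan_t;

static demo_sub_result_t demo_sub_plan(int64_t nrecs, int64_t start,
                                       int32_t numrecs, demo_sub_plan_t *p)
{
    int64_t last;

    if (numrecs < 0)
        return DEMO_SUB_BAD_OPTION;     /* "4.02 bad option" */
    if (start == 0) {
        start = nrecs + 1;              /* the next record to be published */
    } else if (start < 0) {
        start = nrecs + start + 1;      /* count back from the last record... */
        if (start < 1)
            start = 1;                  /* ...but never before record 1 */
    }
    if (start > nrecs + 1)
        return DEMO_SUB_UNDEFINED;      /* e.g., start 30 with 20 records */

    last = (numrecs == 0) ? INT64_MAX : start + numrecs - 1;
    p->imm_first = start;
    p->imm_last = (last < nrecs) ? last : nrecs;
    p->fut_first = nrecs + 1;
    p->fut_last = last;
    return DEMO_SUB_OK;
}
```

The same arithmetic appears to describe gdp_gcl_multiread below, which returns only the immediate range and rejects a start of 0.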


Name

gdp_sub_qos_new, gdp_sub_qos_free — allocate/free subscription quality of service information

Synopsis

gdp_sub_qos_t *gdp_sub_qos_new(void)
void gdp_sub_qos_free(
gdp_sub_qos_t *qos)

Notes


Name

gdp_sub_qos_set_xyzzy — set xyzzy qos

Synopsis

gdp_sub_qos_set_xyzzy(
gdp_sub_qos_t *qos,
xxx yyy)

Notes



Name

gdp_gcl_multiread, gdp_gcl_multiread_ts — Read multiple records from a readable GCL

Synopsis

EP_STAT gdp_gcl_multiread(
		gdp_gcl_t *gcl,
		gdp_recno_t start,
		int32_t numrecs,
		gdp_event_cbfunc_t cbfunc,
		void *udata)
EP_STAT gdp_gcl_multiread_ts(
		gdp_gcl_t *gcl,
		EP_TIME_SPEC *start,
		int32_t numrecs,
		gdp_event_cbfunc_t cbfunc,
		void *udata)
void (*cbfunc)(gdp_event_t *gev)

Notes

The examples below assume that the GCL currently holds 20 records.

start   numrecs   Behavior
1       10        Returns records 1–10 immediately and terminates the read.
-10     10        Returns records 11–20 immediately and terminates the read.
0       any       Returns "4.02 bad option" failure.
-10     20        Returns records 11–20.
1       0         Returns records 1–20 immediately.
-30     30        Returns records 1–20 immediately.
30      10        Currently undefined.  Should probably wait until 10 more records are added before starting to return the data.
any     -1        Returns "4.02 bad option" failure.


Name

gdp_gcl_unsubscribe — Unsubscribe from a GCL

Synopsis

EP_STAT gdp_gcl_unsubscribe(gdp_gcl_t *gcl,
		gdp_event_cbfunc_t cbfunc,
		void *udata)

Notes



Name

gdp_event_next — get next asynchronous event

Synopsis

gdp_event_t *gdp_event_next(
gdp_gcl_t *gcl,
EP_TIME_SPEC *timeout)

Notes


Name

gdp_event_gettype — extract the type from the event

Synopsis

int gdp_event_gettype(gdp_event_t *gev)

Notes

Event Name           Meaning
GDP_EVENT_DATA       Data is returned in the event from a previous subscription or asynchronous read.
GDP_EVENT_EOS        Subscription is terminated.
GDP_EVENT_SHUTDOWN   Subscription is terminated because the log daemon has shut down.
GDP_EVENT_CREATED    Status is returned from an asynchronous append, create, or other similar operation.
GDP_EVENT_MISSING    The requested data was not available at this time, but more data may be available.
GDP_EVENT_SUCCESS    Generic asynchronous success status.  See the detailed status using gdp_event_getstat.
GDP_EVENT_FAILURE    Generic asynchronous failure status.  See the detailed status using gdp_event_getstat.
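An event loop usually branches on the value returned by gdp_event_gettype. The sketch below shows one possible policy for deciding whether to keep calling gdp_event_next after each event type in the table. The enumeration is a local stand-in with arbitrary values so that the example compiles on its own; real code should use the GDP_EVENT_* constants from <gdp/gdp.h>, and treating GDP_EVENT_FAILURE as fatal is an application choice, not something the library requires.

```c
#include <stdbool.h>

/* Local stand-ins for the GDP_EVENT_* constants; the values are arbitrary. */
enum demo_event_type {
    DEMO_EVENT_DATA,
    DEMO_EVENT_EOS,
    DEMO_EVENT_SHUTDOWN,
    DEMO_EVENT_CREATED,
    DEMO_EVENT_MISSING,
    DEMO_EVENT_SUCCESS,
    DEMO_EVENT_FAILURE,
};

/* Returns true if the application should keep waiting for events. */
static bool demo_keep_waiting(enum demo_event_type type)
{
    switch (type) {
    case DEMO_EVENT_DATA:       /* process the datum, then wait for more */
    case DEMO_EVENT_MISSING:    /* nothing available now; more may arrive */
    case DEMO_EVENT_CREATED:    /* an asynchronous append/create completed */
    case DEMO_EVENT_SUCCESS:    /* generic success; details via the status */
        return true;
    case DEMO_EVENT_EOS:        /* the subscription is over */
    case DEMO_EVENT_SHUTDOWN:   /* the log daemon went away */
        return false;
    case DEMO_EVENT_FAILURE:    /* inspect the detailed status, then stop */
    default:
        return false;
    }
}
```

Whichever branch is taken, the event itself should still be released with gdp_event_free.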

Name

gdp_event_getstat — extract the detailed result status from the event

Synopsis

EP_STAT gdp_event_getstat(gdp_event_t *gev)

Notes


Name

gdp_event_getgcl — extract the GCL handle from the event

Synopsis

gdp_gcl_t *gdp_event_getgcl(gdp_event_t *gev)

Notes


Name

gdp_event_getdatum — get the datum associated with this event

Synopsis

gdp_datum_t *gdp_event_getdatum(gdp_event_t *gev)

Notes


Name

gdp_event_getudata — get user data associated with this event

Synopsis

void *gdp_event_getudata(gdp_event_t *gev)
    

Notes


3.3  GCL Open Information

There is potentially a huge amount of information that might be provided when a GCL is opened.  Because this set is open-ended, it is abstracted out into a separate API.  The gdp_gcl_open_info_t data structure encapsulates this information and can be passed into gdp_gcl_open.


Name

gdp_gcl_open_info_new — create a new open information data structure

Synopsis

gdp_gcl_open_info_t *gdp_gcl_open_info_new(void)

Notes


Name

gdp_gcl_open_info_free — free an existing open information data structure

Synopsis

void gdp_gcl_open_info_free(gdp_gcl_open_info_t *info)

Notes


Name

gdp_gcl_open_info_set_signing_key — set the signing key in an open information data structure

Synopsis

EP_STAT gdp_gcl_open_info_set_signing_key(
		gdp_gcl_open_info_t *info,
		EP_CRYPTO_KEY *skey)

Notes




4  Datums (Records)

GCLs are represented as a series of records of type gdp_datum_t.  Each record has a record number, a commit timestamp, associated data, and possible signature information if the record was signed.  Record numbers are of type gdp_recno_t and count up by one as records are added (i.e., record numbers are unique within a GCL and dense).  Data is represented in dynamic buffers, as described below.

4.1  Datum Headers


Name

gdp_datum_new, gdp_datum_free, gdp_datum_print — allocate/free/print a datum structure

Synopsis

gdp_datum_t *gdp_datum_new(void)
void gdp_datum_free(gdp_datum_t *datum)
void gdp_datum_print(const gdp_datum_t *datum,
		FILE *fp,
		uint32_t flags)

Notes


Name

gdp_datum_getrecno — get the record number from a datum

Synopsis

    gdp_recno_t gdp_datum_getrecno(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getts — get the timestamp from a datum

Synopsis

    void gdp_datum_getts(const gdp_datum_t *datum, EP_TIME_SPEC *ts)

Notes


Name

gdp_datum_getdlen — get the data length from a datum

Synopsis

    size_t gdp_datum_getdlen(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getdbuf — get the data buffer from a datum

Synopsis

    gdp_buf_t *gdp_datum_getdbuf(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getmdalg — get the signature message digest algorithm from a datum

Synopsis

    int gdp_datum_getmdalg(const gdp_datum_t *datum)

Notes


Name

gdp_datum_getsig — get the signature from a datum

Synopsis

    gdp_buf_t *gdp_datum_getsig(const gdp_datum_t *datum)

Notes


4.2  Data Buffers

Data buffers grow dynamically as needed.


Name

gdp_buf_new, gdp_buf_reset, gdp_buf_free — allocate, reset, or free a buffer

Synopsis

gdp_buf_t *gdp_buf_new(void)
void gdp_buf_reset(gdp_buf_t *b)
void gdp_buf_free(gdp_buf_t *b)

Notes


Name

gdp_buf_getlength — return the length of the data in the buffer

Synopsis

size_t gdp_buf_getlength(gdp_buf_t *b)

Notes


Name

gdp_buf_read, gdp_buf_peek, gdp_buf_drain — remove or peek at data in a buffer

Synopsis

size_t gdp_buf_read(gdp_buf_t *b, void *out, size_t sz)
size_t gdp_buf_peek(gdp_buf_t *b, void *out, size_t sz)
int gdp_buf_drain(gdp_buf_t *b, size_t sz)

Notes
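Judging by their names, these functions treat the buffer as a FIFO byte queue: gdp_buf_read copies bytes out and removes them, gdp_buf_peek copies them without removing them, and gdp_buf_drain discards them without copying. The toy buffer below (hypothetical toybuf_ names) illustrates those semantics, including growth on write. It assumes that each call acts on at most the bytes currently present and that read and peek return the number of bytes copied; it is not the gdp_buf_t implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Toy FIFO byte buffer that grows on demand. */
typedef struct {
    unsigned char *data;
    size_t start;   /* offset of the first unread byte */
    size_t len;     /* number of unread bytes */
    size_t cap;     /* allocated size */
} toybuf_t;

static int toybuf_write(toybuf_t *b, const void *in, size_t sz)
{
    if (sz == 0)
        return 0;
    if (b->start + b->len + sz > b->cap) {
        if (b->len > 0)     /* slide unread bytes to the front */
            memmove(b->data, b->data + b->start, b->len);
        b->start = 0;
        if (b->len + sz > b->cap) {     /* still too small: grow */
            size_t ncap = b->cap ? b->cap : 16;
            unsigned char *nd;
            while (ncap < b->len + sz)
                ncap *= 2;
            nd = realloc(b->data, ncap);
            if (nd == NULL)
                return -1;
            b->data = nd;
            b->cap = ncap;
        }
    }
    memcpy(b->data + b->start + b->len, in, sz);
    b->len += sz;
    return 0;
}

/* Copy up to sz bytes out without consuming them. */
static size_t toybuf_peek(const toybuf_t *b, void *out, size_t sz)
{
    if (sz > b->len)
        sz = b->len;
    if (sz > 0)
        memcpy(out, b->data + b->start, sz);
    return sz;
}

/* Discard up to sz bytes without copying them. */
static int toybuf_drain(toybuf_t *b, size_t sz)
{
    if (sz > b->len)
        sz = b->len;
    b->start += sz;
    b->len -= sz;
    return 0;
}

/* Copy up to sz bytes out and consume them (peek + drain). */
static size_t toybuf_read(toybuf_t *b, void *out, size_t sz)
{
    size_t n = toybuf_peek(b, out, sz);
    toybuf_drain(b, n);
    return n;
}

static void toybuf_free(toybuf_t *b)
{
    free(b->data);
    b->data = NULL;
    b->start = b->len = b->cap = 0;
}
```

Expressing read as peek followed by drain keeps the three operations consistent with one another.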


Name

gdp_buf_write, gdp_buf_printf — copy data into a buffer

Synopsis

int gdp_buf_write(gdp_buf_t *b, void *in, size_t sz)
int gdp_buf_printf(gdp_buf_t *b, const char *fmt, ...)

Notes


Name

gdp_buf_move — move data from one buffer into another

Synopsis

int gdp_buf_move(gdp_buf_t *ob, gdp_buf_t *ib, size_t sz)

Notes


Name

gdp_buf_dump — print the contents of the buffer for debugging

Synopsis

void gdp_buf_dump(gdp_buf_t *b, FILE *fp)

Notes


4.3  Timestamps

The time abstraction is imported directly from the ep library.  Times are represented as follows:

#pragma pack(push, 1)
typedef struct
{
	int64_t		tv_sec;		// seconds since January 1, 1970
	uint32_t	tv_nsec;	// nanoseconds
	float		tv_accuracy;	// accuracy in seconds
} EP_TIME_SPEC;
#pragma pack(pop)

Note that the host system struct timespec may not match this structure; some systems still represent tv_sec in only four bytes, which overflows in 2038.  The tv_accuracy field gives an estimate of how accurate the clock is; for example, if you are running NTP this value is likely to be on the order of a few tens to a few hundreds of milliseconds, but if you set your clock manually it is likely to be several seconds or worse.
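A common task is filling this structure from the host clock. The sketch below repeats the layout given above under the hypothetical name demo_time_spec_t (so it cannot clash with the EP headers) and widens a POSIX struct timespec into it; with packing the structure is 8 + 4 + 4 = 16 bytes. The accuracy is supplied by the caller because the host clock does not report one. It assumes a POSIX system with clock_gettime; in real code the EP library's own time functions are the natural choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Same layout as EP_TIME_SPEC above, renamed for this sketch. */
#pragma pack(push, 1)
typedef struct {
    int64_t  tv_sec;        /* seconds since January 1, 1970 */
    uint32_t tv_nsec;       /* nanoseconds */
    float    tv_accuracy;   /* accuracy in seconds */
} demo_time_spec_t;
#pragma pack(pop)

/* Widen a host timespec, whose tv_sec may be only 32 bits wide. */
static demo_time_spec_t demo_time_from_timespec(const struct timespec *ts,
                                                float accuracy)
{
    demo_time_spec_t t;
    t.tv_sec = (int64_t)ts->tv_sec;
    t.tv_nsec = (uint32_t)ts->tv_nsec;
    t.tv_accuracy = accuracy;
    return t;
}

/* Read the realtime clock; returns 0 on success, -1 on failure. */
static int demo_time_now(demo_time_spec_t *out, float accuracy)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    *out = demo_time_from_timespec(&ts, accuracy);
    return 0;
}
```

A caller synchronized with NTP might pass an accuracy of about 0.1 seconds, while one with a manually set clock might pass several seconds, following the guidance above.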

5  Signing and Encryption

5.1  Signing

Each log should have a public key in the metadata which is used to verify writes to the log.  The library hides most of the details of this, but some still appear.

The gcl-create command automatically creates a public/secret keypair unless otherwise specified.  See the man page for details.  The public part of the key is inserted into the log metadata and stored with the log.  The secret part is stored somewhere on your local filesystem, typically KEYS/gcl-id.pem.  Normally gcl-create will encrypt the secret key with another key entered from the command line, although this can also be turned off.

When a GDP application attempts to open a log using gdp_gcl_open, the library will attempt to find a secret key by searching the directories named in the swarm.gdp.crypto.key.path administrative parameter for a file having the same name as the log (with a .pem file suffix).  If that secret key is encrypted, the library will prompt the (human) user for the secret key password.  The default path is ".", "KEYS", "~/.swarm/gdp/keys", "/usr/local/etc/swarm/gdp/keys", and "/etc/swarm/gdp/keys".

Once the secret key has been located and decrypted, all further append requests will be signed using the secret key and verified by the log daemon against the public key in the log metadata.
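The key lookup described above amounts to trying each directory in order and taking the first readable file named after the log. The sketch below shows that search, expanding a leading "~/" from $HOME. The function and array names are hypothetical, the directory list is passed in as an array rather than read from swarm.gdp.crypto.key.path, and decrypting the key and prompting for its password are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* The default search path listed above, in order. */
static const char *const demo_default_key_dirs[] = {
    ".", "KEYS", "~/.swarm/gdp/keys",
    "/usr/local/etc/swarm/gdp/keys", "/etc/swarm/gdp/keys",
};

/* Search dirs[0..ndirs-1] for "<logname>.pem".  On success, copy the
 * path into out (outsz bytes) and return 0; otherwise return -1. */
static int demo_find_key(const char *const dirs[], size_t ndirs,
                         const char *logname, char *out, size_t outsz)
{
    for (size_t i = 0; i < ndirs; i++) {
        const char *dir = dirs[i];
        const char *home = "";
        int n;

        if (strncmp(dir, "~/", 2) == 0) {
            home = getenv("HOME");
            if (home == NULL)
                continue;       /* cannot expand "~"; skip this entry */
            dir += 1;           /* keep the "/" that follows "~" */
        }
        n = snprintf(out, outsz, "%s%s/%s.pem", home, dir, logname);
        if (n < 0 || (size_t)n >= outsz)
            continue;           /* path does not fit in the buffer */
        if (access(out, R_OK) == 0)
            return 0;           /* first readable match wins */
    }
    return -1;
}
```

A caller would typically pass demo_default_key_dirs with a count of 5; because the search stops at the first match, a key in the current directory shadows one of the same name in the system-wide directories.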

5.2  Encryption

Encryption is explicitly not part of the GDP.  Ideally the GDP will never see unencrypted data.  However, read and write filters (see the next section) can be used to set encryption and decryption hooks for externally implemented encryption.

6  Miscellaneous and Utilities


Name

gdp_gcl_getnrecs — return the number of records in an existing GCL

Synopsis

gdp_recno_t gdp_gcl_getnrecs(gdp_gcl_t *gcl)

Notes


Name

gdp_gcl_set_append_filter — filter appended data

Synopsis

void gdp_gcl_set_append_filter(gdp_gcl_t *gcl,
             EP_STAT (*filter)(gdp_datum_t *, void *),
             void *filterdata)

Notes




Name

gdp_gcl_set_read_filter — filter read data

Synopsis

void gdp_gcl_set_read_filter(gdp_gcl_t *gcl,
             EP_STAT (*filter)(gdp_datum_t *, void *),
             void *filterdata)

Notes
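Since the GDP does not encrypt data itself (see section 5.2), the append and read filters are a natural place for an application to hook in its own transformation, with the read filter undoing what the append filter did. The sketch below shows that pairing with stand-in types: demo_datum_t is just a byte array rather than a gdp_datum_t, demo_gcl_t is a hypothetical handle holding the two hooks, and the filters return int rather than EP_STAT. The XOR transform is a placeholder to make the round trip visible; it is not encryption and must not be used as such.

```c
#include <stddef.h>

/* Stand-in for gdp_datum_t: just the payload bytes. */
typedef struct {
    unsigned char bytes[256];
    size_t len;
} demo_datum_t;

/* Filter shape modeled on the documented callback; returns 0 on success. */
typedef int (*demo_filter_t)(demo_datum_t *datum, void *filterdata);

/* Placeholder transform: XOR with a repeating key.  NOT encryption. */
static int demo_xor_filter(demo_datum_t *datum, void *filterdata)
{
    const char *key = filterdata;
    size_t klen = 0;

    while (key[klen] != '\0')
        klen++;
    if (klen == 0)
        return -1;
    for (size_t i = 0; i < datum->len; i++)
        datum->bytes[i] ^= (unsigned char)key[i % klen];
    return 0;
}

/* Hypothetical handle holding the two hooks and their private data. */
typedef struct {
    demo_filter_t append_filter, read_filter;
    void *append_data, *read_data;
} demo_gcl_t;

/* In this sketch the append filter runs before data is sent... */
static int demo_append(demo_gcl_t *g, demo_datum_t *d)
{
    return g->append_filter ? g->append_filter(d, g->append_data) : 0;
}

/* ...and the read filter runs before the application sees the data. */
static int demo_read(demo_gcl_t *g, demo_datum_t *d)
{
    return g->read_filter ? g->read_filter(d, g->read_data) : 0;
}
```

Passing the key through the filterdata pointer rather than a global lets different GCL handles use different keys.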


To be done

Header files
Version info
PRIgdp_recno macro

Appendix A:  Examples

The following example is a simplified excerpt from apps/gdp-writer.c.

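#include <gdp/gdp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    gdp_gcl_t *gcl;
    EP_STAT estat;
    gdp_name_t gcliname;        // internal name of GCL
    gdp_name_t logdname;        // log daemon to create it on (argv[2] is this example's convention)
    gdp_datum_t *d;
    char buf[200];              // one line of input

    // general startup and initialization
    if (argc < 3)
    {
        fprintf(stderr, "usage: %s gcl-name logd-name\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    estat = gdp_init(NULL);     // NULL is assumed to select the default daemon address
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: GDP initialization failed\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    d = gdp_datum_new();

    // parse command line names to internal format
    estat = gdp_parse_name(argv[1], gcliname);
    if (EP_STAT_ISOK(estat))
        estat = gdp_parse_name(argv[2], logdname);
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: cannot parse name\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    // attempt to create that GCL on the given log daemon, with no metadata
    estat = gdp_gcl_create(gcliname, logdname, NULL, &gcl);
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: cannot create %s\n", argv[0], argv[1]);
        exit(EXIT_FAILURE);
    }

    // read lines from standard input
    while (fgets(buf, sizeof buf, stdin) != NULL)
    {
        char *p = strchr(buf, '\n');
        if (p != NULL)
            *p = '\0';

        // write each line to the dataplane as one record
        gdp_buf_reset(gdp_datum_getdbuf(d));
        if (gdp_buf_write(gdp_datum_getdbuf(d), buf, strlen(buf)) < 0)
            estat = GDP_STAT_BUFFER_FAILURE;
        else
            estat = gdp_gcl_append(gcl, d);
        EP_STAT_CHECK(estat, break);
    }

    // cleanup and exit
    gdp_datum_free(d);
    gdp_gcl_close(gcl);
    exit(!EP_STAT_ISOK(estat));
}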

This example is a similar excerpt from apps/gdp-reader.c (without using subscriptions):

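#include <gdp/gdp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    gdp_gcl_t *gcl;
    EP_STAT estat;
    gdp_name_t gcliname;        // internal name of GCL
    gdp_datum_t *d;
    gdp_recno_t recno;

    // general startup and initialization
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s gcl-name\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    estat = gdp_init(NULL);     // NULL is assumed to select the default daemon address
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: GDP initialization failed\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    d = gdp_datum_new();

    // parse command line name to internal format
    estat = gdp_parse_name(argv[1], gcliname);
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: cannot parse name %s\n", argv[0], argv[1]);
        exit(EXIT_FAILURE);
    }

    // attempt to open the GCL read-only, with no open information
    estat = gdp_gcl_open(gcliname, GDP_MODE_RO, NULL, &gcl);
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: cannot open %s\n", argv[0], argv[1]);
        exit(EXIT_FAILURE);
    }

    // read records in order, starting at 1, until a read fails
    recno = 1;
    for (;;)
    {
        estat = gdp_gcl_read_by_recno(gcl, recno++, d);
        EP_STAT_CHECK(estat, break);
        gdp_datum_print(d, stdout, 0);  // 0: no special print flags
    }

    // cleanup and exit
    gdp_datum_free(d);
    gdp_gcl_close(gcl);
    exit(0);
}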


If you want to use subscriptions, the recno variable can be removed and the for loop replaced with:

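    // enable the subscription: start at record 1 with no record limit
    // (numrecs 0), no QoS, and no callback, so results arrive as events
    estat = gdp_gcl_subscribe_by_recno(gcl, 1, 0, NULL, NULL, NULL);
    if (!EP_STAT_ISOK(estat))
    {
        fprintf(stderr, "%s: cannot subscribe to %s\n", argv[0], argv[1]);
        exit(EXIT_FAILURE);
    }

    for (;;)
    {
        // a NULL timeout is assumed to mean "wait indefinitely"
        gdp_event_t *gev = gdp_event_next(gcl, NULL);
        if (gev == NULL)
            continue;
        int evtype = gdp_event_gettype(gev);
        if (evtype == GDP_EVENT_DATA)
            gdp_datum_print(gdp_event_getdatum(gev), stdout, 0);
        gdp_event_free(gev);            // free every event, whatever its type
        if (evtype == GDP_EVENT_EOS || evtype == GDP_EVENT_SHUTDOWN)
            break;
    }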


Appendix B:  Compiling and Linking

The GDP library uses a reduced version of libep and also uses the libevent library version 2.1. These will need to be included both during compilation and linking.

At compile time you must use:

-Ilibevent_includes_parent -Ilibep_includes_parent

Note that these take the parent of the directory containing the include files. For example, if the include files for libevent are in /usr/local/include/event2 and the include files for libep are in /usr/local/include/ep you only need to specify the one flag "-I/usr/local/include".

For linking you must use:

-Llibevent_libraries -levent -levent_pthreads -Llibep_libraries -lep

As before, if the libraries for libevent and libep are in the same directory you only need a single -L flag.

Libep is a library that I produced several years ago, originally intended for use in sendmail.  The GDP uses a stripped-down version of that library that excludes several things that would not be helpful here.  For more details of the original (full) library, see http://www.neophilic.com/blogs/eric.php/2014/05/12/libep-portable-c-runtime.

For additional information, see the README file in the distribution directory.

Appendix C: Open Questions

This section is really an addendum to the document — a "scratch area" to keep track of issues that we still need to consider.  It may not be up to date.

C.1 Access Control

Do this using Access Control Lists (so each user/app has a keypair) or by passing public/secret keys around (so each GCL has a secret keypair). The latter makes revocation impossible (even for write access), so I prefer the ACL approach. Third way?

Revocation? Deep vs. Shallow. Deep = take away permissions that have already been given. Shallow = you can only prevent an accessor from getting to new versions. Argument: deep revocation is hard to do from a technical perspective and ultimately futile (someone might have taken a photo of a screen while they still had access), but is still what people are used to (Unix and SQL permissions work this way). Shallow is all that can really be guaranteed. Also, anything involving Certificate Revocation Lists (CRLs) is doomed to failure. This implies that ACLs are the correct direction.

ACLs get us into the question of identity. Pretending that a keypair represents an identity doesn't work in the real world where bad players simply create new "identities" (keypairs) when an old identity has become untrusted. See the extensive work in email sender reputation. However, when a bad player creates a new identity/keypair they do not get access to any previous grants, so this may be sufficient.

C.2 Naming

If each GCL has a secret keypair, then the public key is sufficient to name the entity. If not, then assigning a GCL a GUID on creation seems like the best approach. Having the user assign a name seems like a non-starter, if only because of the possibility of conflicts.

There will probably be some need for external naming, e.g., some overlay directory structure. That might be a different gcl_type.

This seems like an open research topic.

C.3 Orphans, Expiration, Charging, and Accounting

If a GCL isn't linked into a directory structure and everyone forgets its name then it will live forever (or until it expires). This could be quite common if a GCL is temporary, that is, not a candidate for long-term archival.

Expiration could be an issue without some sort of charging, which implies accounting.

Charging and accounting will affect the API. It seems like on GCL creation the creator needs to offer payment for both carrying and storing the data. This payment would presumably accrue to the actors providing the actual service. Payment for storage might be limited time or indefinite time (i.e., it would be an endowment).

The creator could also specify a cost for any potential consumer in order to access the GCL. Such payments would accrue to the creator of the GCL, and might be used to fund continued access, i.e. it could be rolled back into the endowment. This would lean toward making less-used data disappear: appealing in some ways, but anathema to librarians and historians.

As for API effects, it seems that GCL creation needs to include a payment for initial service, a cost for access, and an account into which to deposit any consumer payments. Accessing a GCL only requires an offered payment (which is probably best viewed as a bid rather than a payment, thus allowing multiple providers to compete for access).

Note that none of this is dependent on the form of payment. It does however assume that there is a mutually agreed upon form of payment, i.e., a universal currency.

C.4 Quality of Service

Is Quality of Service specified on a particular GCL, a particular open instance of a GCL, or between a pair of endpoints?

What does QoS actually mean? For example, in a live media stream it probably means the resolution of the data stream (which determines real-time bandwidth), latency, and possibly jitter, but after that stream is stored the QoS will be resolution (as before), delivery bandwidth (how quickly you can download the video, for example), and possibly jitter of the network connection (that is, how even the data flow will be). Delivery bandwidth depends on the entire path between the data source and the data sink, and may be higher or lower than the bandwidth required to send a real-time version of the stream (for example, lower over a slow network link).
