This document describes the procedural programmatic interface to the Global Data Plane. The native code is written in C for maximum flexibility and performance, but it is expected that most applications will be written in higher level languages, and hence there will be multiple language bindings for the GDP library. There is also a REST interface that is not described in this document.
The GDP library uses the EP portability library, and applications are
free to use that library as well; in particular, since the GDP library
makes extensive use of the EP library, some efficiencies may result from
using them both. This document does not attempt to define the EP library
and describes it only as necessary. One EP concept that appears commonly
is the EP_STAT data type, which represents a status code that a function
can return to indicate completion status. It includes a "severity"
(e.g., OK, ERROR, SEVERE), a "registry" (in our case always UC Berkeley),
a "module" (e.g., GDP or the EP library itself), and detail information.
An OK status can return a positive integer as extra information.
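To make the shape of such a status code concrete, the following is a minimal sketch of packing a severity, registry, module, and detail into one 32-bit word. The type demo_stat_t, the field widths, the severity values, and all helper names are assumptions made for illustration; they are not the libep definitions, and real code should use EP_STAT together with the macros libep provides (such as EP_STAT_ISOK).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, NOT the libep one: 3-bit severity, 11-bit registry,
 * 8-bit module, 10-bit detail, packed from high bits to low bits. */
typedef uint32_t demo_stat_t;

enum { DEMO_SEV_OK = 0, DEMO_SEV_WARN = 3, DEMO_SEV_ERROR = 4, DEMO_SEV_SEVERE = 5 };

/* Build a status word from its four components. */
demo_stat_t demo_stat_new(unsigned sev, unsigned reg, unsigned mod, unsigned detail)
{
    return ((demo_stat_t)(sev & 0x7u) << 29) |
           ((demo_stat_t)(reg & 0x7FFu) << 18) |
           ((demo_stat_t)(mod & 0xFFu) << 10) |
           (demo_stat_t)(detail & 0x3FFu);
}

unsigned demo_stat_severity(demo_stat_t st) { return st >> 29; }
unsigned demo_stat_registry(demo_stat_t st) { return (st >> 18) & 0x7FFu; }
unsigned demo_stat_module(demo_stat_t st)   { return (st >> 10) & 0xFFu; }
unsigned demo_stat_detail(demo_stat_t st)   { return st & 0x3FFu; }

/* Anything below the warning level counts as success in this sketch. */
bool demo_stat_isok(demo_stat_t st) { return demo_stat_severity(st) < DEMO_SEV_WARN; }

/* An OK status carrying a positive integer in its low 29 bits. */
demo_stat_t demo_stat_ok_with(uint32_t value) { return value & 0x1FFFFFFFu; }
uint32_t demo_stat_ok_value(demo_stat_t st)   { return st & 0x1FFFFFFFu; }
```

The point of the sketch is only that a single word can carry both a coarse success/failure indication and enough detail to identify the failing module, which is why callers are expected to test the status rather than compare it with a single constant.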
The code distribution includes an "apps" directory with two example programs: gdp-writer.c and gdp-reader.c, that show simple cases of how to append to a GCL and read from a GCL (including subscriptions).
GDP-based applications rely on three pieces: an in-process GDP library, a GDP Log Daemon, and the Routing Layer. This document describes the GDP library API.
The primary point of the GDP library is to speak the network protocol between the application and the GDP Daemon. The library is threaded, with (at the moment) two threads: one to process events (data arriving from the daemon, although others can be added), and the other to run the application itself. This allows the application to pretend it is a sequential program while still allowing asynchronous input from the GDP Daemon (e.g., processing results from subscription requests). Applications are free to create other threads as desired. The code has been written to do the locking as necessary, but stress tests have not been run, so you may find unhappy results.
The primary abstraction is the GDP Channel-Log (GCL). A GCL represents
the rendezvous point for communication in the data plane. It is not
directly tied to either a file or a network connection. On creation, a GCL
is assigned a 256-bit opaque name. A GCL is append-only to writers.
For the moment you can access the dataplane in one of two modes:
synchronous mode (using gdp_gcl_read for reading) or asynchronous mode
(using gdp_gcl_subscribe for reading). To use it in asynchronous mode
you must subscribe to any GCLs of interest and then call gdp_event_next
repeatedly to read the results. These are described in more detail below.
All GCLs are named with an opaque, location-independent, 256-bit number from a flat namespace. When printed, the name is shown as a base64-encoded value that is tweaked to be valid in a URI (that is, "+" and "/" are replaced with "-" and "_"). Applications may choose to overlay these unsightly names with some sort of directory service.
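As an illustration of the printable form, here is a minimal sketch of URI-safe base64 encoding applied to a 256-bit value. The function name demo_printable_name and the two constants are hypothetical, and the sketch assumes that no "=" padding is emitted (which gives 43 characters for 32 bytes); applications should use gdp_printable_name rather than this code.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NAME_BYTES 32  /* 256 bits */
#define DEMO_PNAME_LEN  43  /* ceil(256 / 6) characters, assuming no padding */

/* Encode a 32-byte name using the base64 alphabet with '-' and '_' in place
 * of '+' and '/'. out must have room for DEMO_PNAME_LEN + 1 bytes. */
void demo_printable_name(const uint8_t name[DEMO_NAME_BYTES],
                         char out[DEMO_PNAME_LEN + 1])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    uint32_t acc = 0;   /* bit accumulator; stale high bits are never read */
    int nbits = 0;      /* number of not-yet-emitted bits in acc */
    size_t o = 0;

    for (size_t i = 0; i < DEMO_NAME_BYTES; i++)
    {
        acc = (acc << 8) | name[i];
        nbits += 8;
        while (nbits >= 6)
        {
            nbits -= 6;
            out[o++] = alphabet[(acc >> nbits) & 0x3F];
        }
    }
    if (nbits > 0)      /* left-align the final partial group of bits */
        out[o++] = alphabet[(acc << (6 - nbits)) & 0x3F];
    out[o] = '\0';
}
```

Because neither "-" nor "_" needs escaping in a URI, a name encoded this way can be embedded directly in a path or query string.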
Applications using the GDP library should #include <gdp/gdp.h> for all the essential definitions.
GDP data types and basic utilities
#include <gdp/gdp.h>
gdp_name_t InternalGdpName; // 256-bit number
gdp_pname_t PrintableGdpName; // base-64 encoded string
bool GDP_NAME_SAME(gdp_name_t a, gdp_name_t b);
bool gdp_name_is_valid(gdp_name_t gname);
char *gdp_printable_name(const gdp_name_t Internal, gdp_pname_t Printable);
EP_STAT gdp_internal_name(const gdp_pname_t Printable, gdp_name_t Internal);
EP_STAT gdp_parse_name(const char *external, gdp_name_t Internal);
gdp_gcl_t *GdpChannelLog;
gdp_datum_t *GdpDatum;
gdp_recno_t RecordNumber;
EP_TIME_SPEC TimeStampSpec;
#include <gdp/gdp.h>
EP_STAT gdp_init(const char *gdpd_addr)
The gdpd_addr parameter is the address to use to contact the GDP routing layer, in the format "host:port". If NULL, a system default is used.
If the returned status is not EP_STAT_OK then the library failed to initialize (for example, by being unable to acquire resources). Failure to check this status may result in mysterious failures later.
GDP_LIB_VERSION — GDP library version
#include <gdp/gdp_version.h>
The gdp_version.h file defines the integer constant GDP_LIB_VERSION as the major, minor, and patch level of this version of the GDP library, for example, 0x010203 for version 1.2.3. It can be used during compilation. There is also a string GdpVersion that is suitable for printing.
Synchronous operations block until the operation is complete. They are the easiest interface for simple programs.
EP_STAT gdp_gcl_create(gdp_name_t gcl_name, gdp_name_t logdname,
gdp_gclmd_t *gmd,
gdp_gcl_t **gclp)
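The handle for the newly created GCL is returned in *gclp.
EP_STAT gdp_gcl_open(gdp_name_t name, gcl_iomode_t rw, gdp_gcl_open_info_t *info, gdp_gcl_t **gclp)
The access mode is specified by rw, which may be GDP_MODE_RO (read only), GDP_MODE_AO (append only), or GDP_MODE_RA (read and append). The GCL handle is returned in *gclp.
gdp_gcl_open_info_t *gdp_gcl_open_info_new(void)
void gdp_gcl_open_info_free(gdp_gcl_open_info_t *info)
gdp_gcl_open_info_new allocates an open information datastructure and gdp_gcl_open_info_free frees info. The datastructure can be passed to gdp_gcl_open.
EP_STAT gdp_gcl_open_info_set_signing_key(
gdp_gcl_open_info_t *info,
EP_CRYPTO_KEY *skey)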
gdp_gcl_open_info_set_signkey_cb — Set a callback function to read a signing key
EP_STAT gdp_gcl_open_info_set_signkey_cb(
gdp_gcl_open_info_t *info,
EP_STAT (*signkey_cb)(
gdp_name_t gname,
void *signkey_udata,
EP_CRYPTO_KEY **skey),
void *signkey_udata)
If a gdp_gcl_open call requires a secret key, and that key was not passed in using gdp_gcl_open_info_set_signing_key, the callback function signkey_cb is invoked to get a key. It will only be invoked if the key is required (notably because it isn't already cached).
The signkey_udata parameter is passed through to signkey_cb if it is invoked.
EP_STAT gdp_gcl_open_info_set_caching(
gdp_gcl_open_info_t *info,
bool keep_in_cache)
Controls whether the GCL is kept in the cache after gdp_gcl_close is called.
Setting keep_in_cache to TRUE may cause a cleanup thread to be spawned.
EP_STAT gdp_gcl_close(gdp_gcl_t *gcl)
EP_STAT gdp_gcl_getname(gdp_gcl_t *gcl, gdp_name_t namebuf)
Copies the name of gcl into namebuf. This is typically used after gdp_gcl_create so that the name can be shared with other nodes that want to gdp_gcl_open it.
EP_STAT gdp_gcl_getstat(gdp_name_t gclname, gcl_stat_t *statbuf)
void gdp_gcl_print(const gdp_gcl_t *gclh, FILE *fp)
EP_STAT gdp_gcl_append(gdp_gcl_t *gcl, gdp_datum_t *datum)
EP_STAT gdp_gcl_read_by_recno(gdp_gcl_t *gcl, gdp_recno_t recno, gdp_datum_t *datum)
EP_STAT gdp_gcl_read_by_ts(gdp_gcl_t *gcl, EP_TIME_SPEC *ts, gdp_datum_t *datum)
gdp_gcl_read_by_recno reads the specified record number and returns it in the user-supplied datum (see below).
gdp_gcl_read_by_ts reads the record dated on or immediately after the indicated timestamp.
EP_STAT gdp_parse_name(const char *ext, gdp_name_t gcl_name)
Asynchronous operations allow an application to subscribe to one or more GCLs and receive events as those GCLs see activity. The event mechanism is intended to be extensible for possible future expansion.
Every event has a type, a pointer to the GCL handle, and a pointer to a datum. Applications could in principle define their own event types, but at the moment this functionality is not exposed.
All asynchronous operations return status and/or data via either a
callback function or the event interface. Callback functions may not
be called in the same thread as the operation initiation. If no
callback function is given then the event interface is used; this has the
effect of serializing the event stream. In either case, it is the
responsibility of the caller to free the event after use using gdp_event_free.
Note that asynchronous calls do not do retransmissions.
gdp_event_t — event structure
typedef struct _gdp_event gdp_event_t;
gdp_event_cbfunc_t — event callback function type
typedef void (*gdp_event_cbfunc_t)(gdp_event_t *gev);
If no callback function is supplied, events are retrieved using gdp_event_next.
gdp_gcl_read_async — Asynchronously read data from a GCL
EP_STAT gdp_gcl_read_async(
gdp_gcl_t *gcl,
gdp_recno_t recno,
gdp_event_cbfunc_t *cbfunc,
void *udata)
Returns GDP_STAT_OK if the read command is successfully sent, and a later callback or event will give the actual status; otherwise no callback or event will occur.
The result is delivered either through the event interface (if cbfunc is NULL) or through cbfunc.
The event type is GDP_EVENT_DATA if the read succeeded, or an error event if the read failed.
gdp_gcl_append_async — Asynchronously append to a writable GCL
EP_STAT gdp_gcl_append_async(
gdp_gcl_t *gcl,
gdp_datum_t *datum,
gdp_event_cbfunc_t *cbfunc,
void *udata)
Appends datum to the GCL.
Returns GDP_STAT_OK if the append command is successfully sent, and a later callback or event will give the actual status; otherwise no callback or event will occur.
The status is delivered either through the event interface (if cbfunc is NULL) or through cbfunc, with an event type of GDP_EVENT_ASTAT.
EP_STAT gdp_gcl_subscribe_by_recno(
gdp_gcl_t *gcl, gdp_recno_t start,
int32_t numrecs,
gdp_sub_qos_t *qos,
gdp_event_cbfunc_t *cbfunc,
void *udata)
EP_STAT gdp_gcl_subscribe_by_ts(
gdp_gcl_t *gcl, EP_TIME_SPEC *start,
int32_t numrecs,
gdp_sub_qos_t *qos,
gdp_event_cbfunc_t *cbfunc,
void *udata)
If cbfunc is specified, arranges to call the callback when a message is generated on the gcl. The event must be freed after use with gdp_event_free(gev).
If no cbfunc is specified, subscription information is available through the gdp_gcl_event interface (see below).
The udata parameter is passed through untouched in generated events. See below for the definition of gdp_event_t.
numrecs records are returned, after which the subscription is terminated. If numrecs is 0 it waits for data forever.
The start parameter tells when to start the subscription (that is, the starting record number for gdp_gcl_subscribe_by_recno or the earliest time of interest for gdp_gcl_subscribe_by_ts).
In gdp_gcl_subscribe_by_recno, if start is negative, returns the most recent -start records. If a negative start indicates going back more records than are available, it starts from the first record.
If start specifies an existing record but there are fewer than numrecs records available, this returns the available records and then waits for the additional data to appear as it is published.
If start is zero in gdp_gcl_subscribe_by_recno, or if it points past the last record already in the log in gdp_gcl_subscribe_by_ts, no current records are returned (i.e., it returns new records as they are published).
The following examples assume a log that already contains 20 records:
start | numrecs | Behavior |
1 | 10 | Returns records 1–10 immediately and terminates the subscription. |
-10 | 10 | Returns records 11–20 immediately and terminates the subscription. |
0 | 0 | Starts returning data when record 21 is published and continues forever. |
-10 | 20 | Returns records 11–20 immediately, then returns records 21–30 as they are published. The subscription then terminates. |
1 | 0 | Returns records 1–20 immediately, then returns any new records published in perpetuity. |
-30 | 30 | Returns records 1–20 immediately, then returns records 21–30 as they are published. |
30 | 10 | Currently undefined. Should probably wait until 10 more records are added before starting to return the data. |
any | -1 | Returns "4.02 bad option" failure. |
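The subscription rules for gdp_gcl_subscribe_by_recno can be summarized as a small pure function over the number of records already in the log (the table's examples correspond to 20 records). This is only a model for reasoning about the table, not library code: the names demo_sub_plan and demo_subscribe_plan are hypothetical, and treating a start more than one past the end of the log as "undefined" (while allowing a start exactly one past the end) is an assumption.

```c
#include <stdint.h>

enum demo_sub_status { DEMO_SUB_OK, DEMO_SUB_BAD_OPTION, DEMO_SUB_UNDEFINED };

struct demo_sub_plan
{
    enum demo_sub_status status;
    int64_t first;      /* first record number that will be delivered */
    int64_t immediate;  /* records delivered from data already in the log */
    int64_t pending;    /* records still awaited; -1 means wait forever */
};

/* Model of subscribing to a log that currently holds nrecs records. */
struct demo_sub_plan demo_subscribe_plan(int64_t nrecs, int64_t start, int32_t numrecs)
{
    struct demo_sub_plan p = { DEMO_SUB_OK, 0, 0, 0 };

    if (numrecs < 0)            /* negative counts are rejected */
    {
        p.status = DEMO_SUB_BAD_OPTION;
        return p;
    }
    if (start > nrecs + 1)      /* beyond the end of the log: undefined */
    {
        p.status = DEMO_SUB_UNDEFINED;
        return p;
    }

    if (start < 0)              /* the most recent -start records, clamped */
        p.first = (nrecs + start + 1 < 1) ? 1 : nrecs + start + 1;
    else if (start == 0)        /* only records published from now on */
        p.first = nrecs + 1;
    else
        p.first = start;

    int64_t avail = nrecs - p.first + 1;
    p.immediate = (numrecs > 0 && avail > numrecs) ? numrecs : avail;
    p.pending = (numrecs == 0) ? -1 : numrecs - p.immediate;
    return p;
}
```

For example, demo_subscribe_plan(20, -10, 20) yields first = 11, immediate = 10, and pending = 10, matching the row that returns records 11–20 at once and records 21–30 as they are published.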
gdp_sub_qos_new, gdp_sub_qos_free — allocate/free subscription quality of service information
gdp_sub_qos_t *gdp_sub_qos_new(void)
void *gdp_sub_qos_free(
gdp_sub_qos_t *qos)
gdp_sub_qos_set_xyzzy — set xyzzy qos
gdp_sub_qos_set_xyzzy(
gdp_sub_qos_t *qos,
xxx yyy)
EP_STAT gdp_gcl_multiread(
gdp_gcl_t *gcl, gdp_recno_t start,
int32_t numrecs,
gdp_event_cbfunc_t cbfunc,
void *udata)
EP_STAT gdp_gcl_multiread_ts(
gdp_gcl_t *gcl, EP_TIME_SPEC *start,
int32_t numrecs,
gdp_event_cbfunc_t cbfunc,
void *udata)
void (*cbfunc)(gdp_event_t *gev)
If cbfunc is specified, arranges to call the callback when a message is generated on the gcl. See below for the definition of gdp_event_t. The event must be freed after use with gdp_event_free(gev).
If no cbfunc is specified, subscription information is available through the gdp_gcl_event interface (see below).
The udata parameter is passed through untouched in generated events. See below for the definition of gdp_event_t.
numrecs records are returned, after which the read is terminated. If numrecs is 0 it reads to the end of the existing data.
The start parameter tells where to start the read (that is, the starting record number).
If start is negative, returns the most recent -start records. If a negative start indicates going back more records than are available, it starts from the first record.
If start specifies an existing record but there are fewer than numrecs records available, only the existing records are returned.
If start is zero, an error is returned.
The following examples assume a log that already contains 20 records:
start | numrecs | Behavior |
1 | 10 | Returns records 1–10 immediately and terminates the read. |
-10 | 10 | Returns records 11–20 immediately and terminates the read. |
0 | any | Returns "4.02 bad option" failure. |
-10 | 20 | Returns records 11–20. |
1 | 0 | Returns records 1–20 immediately. |
-30 | 30 | Returns records 1–20 immediately. |
30 | 10 | Currently undefined. Should probably wait until 10 more records are added before starting to return the data. |
any | -1 | Returns "4.02 bad option" failure. |
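Unlike a subscription, a multiread never waits for future records and rejects a start of zero. A compact model of the multiread table, again for a log holding a given number of existing records, might look like the sketch below. The function name demo_multiread_count and its return conventions (-1 for "bad option", -2 for the undefined case) are hypothetical and are not part of the GDP API.

```c
#include <stdint.h>

/* Model of a multiread on a log holding nrecs records.
 * Returns the number of records delivered and stores the first record
 * number in *first; returns -1 for "bad option" and -2 for the case the
 * table marks as currently undefined (start beyond the existing data). */
int64_t demo_multiread_count(int64_t nrecs, int64_t start, int32_t numrecs,
                             int64_t *first)
{
    if (start == 0 || numrecs < 0)
        return -1;
    if (start > nrecs)
        return -2;

    *first = start;
    if (start < 0)              /* the most recent -start records, clamped */
    {
        *first = nrecs + start + 1;
        if (*first < 1)
            *first = 1;
    }

    int64_t avail = nrecs - *first + 1;
    /* numrecs == 0 means "read to the end of the existing data" */
    return (numrecs > 0 && numrecs < avail) ? numrecs : avail;
}
```

With 20 existing records, a start of -10 and numrecs of 20 delivers only the 10 records 11–20, whereas the equivalent subscription would go on to wait for 10 more.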
EP_STAT gdp_gcl_unsubscribe(gdp_gcl_t *gcl, gdp_event_cbfunc_t *cbfunc, void *udata)
If cbfunc and/or udata is NULL they are treated as wildcards. For example, "gdp_gcl_unsubscribe(gcl, NULL, NULL)" deletes all subscriptions for the given GCL.
gdp_event_t *gdp_event_next(
gdp_gcl_t *gcl,
EP_TIME_SPEC *timeout)
Returns the next event, or NULL if the timeout expires. If timeout is NULL, it waits forever.
int gdp_event_gettype(gdp_event_t *gev)
Event Name | Meaning |
GDP_EVENT_DATA | Data is returned in the event from a previous subscription or asynchronous read. |
GDP_EVENT_EOS | Subscription is terminated. |
GDP_EVENT_SHUTDOWN | Subscription is terminated because the log daemon has shut down. |
GDP_EVENT_CREATED | Status is returned from an asynchronous append, create, or other similar operation. |
GDP_EVENT_MISSING | The requested data was not available at this time, but more data may be available. |
GDP_EVENT_SUCCESS | Generic asynchronous success status. See the detailed status using gdp_event_getstat. |
GDP_EVENT_FAILURE | Generic asynchronous failure status. See the detailed status using gdp_event_getstat. |
gdp_event_getstat — extract the detailed result status from the event
EP_STAT gdp_event_getstat(gdp_event_t *gev)
The status is usually EP_STAT_OK, but may be otherwise in some event types such as GDP_EVENT_ASTAT.
gdp_gcl_t *gdp_event_getgcl(gdp_event_t *gev)
gdp_datum_t *gdp_event_getdatum(gdp_event_t *gev)
gdp_event_getudata — get user data associated with this event
void *gdp_event_getudata(gdp_event_t *gev)
There is potentially a huge amount of information that might be provided
when a GCL is opened. Because this set is open-ended, it is
abstracted out into a separate API. The gdp_gcl_open_info_t
datastructure encapsulates this information and can be passed into gdp_gcl_open.
gdp_gcl_open_info_new — create new open information datastructure
gdp_gcl_open_info_t *gdp_gcl_open_info_new(void)
Allocates a new gdp_gcl_open_info_t datastructure.
gdp_gcl_open_info_free — free an existing open information datastructure
void gdp_gcl_open_info_free(gdp_gcl_open_info_t *info)
Frees an existing gdp_gcl_open_info_t datastructure.
gdp_gcl_open_info_set_signing_key — set the signing key in an open information datastructure
EP_STAT gdp_gcl_open_info_set_signing_key(gdp_gcl_open_info_t *info, EP_CRYPTO_KEY *skey)
Uses skey to do the signing for any GCL opened using this datastructure. When info is freed, the signing key will be freed as well.
GCLs are represented as a series of records of type gdp_datum_t.
Each record has a record number, a commit timestamp, associated data, and
possible signature information if the record was signed. Record
numbers are of type gdp_recno_t and count up by one as
records are added (i.e., record numbers are unique within a GCL and
dense). Data is represented in dynamic buffers, as described below.
gdp_datum_new, gdp_datum_free, gdp_datum_print — allocate/free/print a datum structure
gdp_datum_t *gdp_datum_new(void)
void gdp_datum_free(gdp_datum_t *datum)
void gdp_datum_print(const gdp_datum_t *datum, FILE *fp, uint32_t flags)
gdp_datum_new allocates a new empty datum.
gdp_datum_free frees a datum.
gdp_datum_print writes a description of the datum (including the data contents) to the given file. If flags includes the GDP_DATUM_PRTEXT bit, it shows the datum as plain text (the default shows it as a hex dump). It is up to the caller to determine that the datum is printable. If the GDP_DATUM_PRSIG bit is set, signature information is included. If the GDP_DATUM_PRDEBUG flag is set, additional information about the datum is printed.
gdp_recno_t gdp_datum_getrecno(const gdp_datum_t *datum)
void gdp_datum_getts(const gdp_datum_t *datum, EP_TIME_SPEC *ts)
size_t gdp_datum_getdlen(const gdp_datum_t *datum)
gdp_buf_t *gdp_datum_getdbuf(const gdp_datum_t *datum)
int gdp_datum_getmdalg(const gdp_datum_t *datum)
gdp_buf_t *gdp_datum_getsig(const gdp_datum_t *datum)
Data buffers grow dynamically as needed.
gdp_buf_t *gdp_buf_new(void)
void gdp_buf_reset(gdp_buf_t *b)
void gdp_buf_free(gdp_buf_t *b)
gdp_buf_new creates a new, empty buffer.
gdp_buf_reset clears the buffer, leaving it in the same condition as when it was first created.
gdp_buf_free frees the buffer. It must not be used again after being freed.
size_t gdp_buf_getlength(gdp_buf_t *b)
size_t gdp_buf_read(gdp_buf_t *b, void *out, size_t sz)
size_t gdp_buf_peek(gdp_buf_t *b, void *out, size_t sz)
int gdp_buf_drain(gdp_buf_t *b, size_t sz)
Data can be consumed from the buffer using gdp_buf_read; data is copied from the buffer into a memory area.
Data can be examined without consuming it using gdp_buf_peek. This is identical to gdp_buf_read except that the data remains in the buffer.
Data can be discarded from the buffer using gdp_buf_drain.
In all cases, sz is the number of bytes to copy out and/or discard.
int gdp_buf_write(gdp_buf_t *b, void *in, size_t sz)
int gdp_buf_printf(gdp_buf_t *b, const char *fmt, ...)
gdp_buf_write copies sz bytes into the buffer from the memory area in and returns 0 on success or -1 on failure.
gdp_buf_printf essentially does a "printf" into the buffer and returns the number of bytes appended.
int gdp_buf_move(gdp_buf_t *ob, gdp_buf_t *ib, size_t sz)
Moves sz bytes of ib to the end of ob. This combines the effects of gdp_buf_read and gdp_buf_write.
void gdp_buf_dump(gdp_buf_t *b, FILE *fp)
The time abstraction is imported directly from the ep library. Times are represented as follows:
#pragma pack(push, 1)
typedef struct
{
    int64_t tv_sec; // seconds since January 1, 1970
    uint32_t tv_nsec; // nanoseconds
    float tv_accuracy; // accuracy in seconds
} EP_TIME_SPEC;
#pragma pack(pop)
Note that the host system struct timespec may not match this
structure; some systems still represent the time with only four bytes for
tv_sec, which expires in 2038. The tv_accuracy field indicates
an estimate for how accurate the clock is; for example, if you are running
NTP this value is likely to be on the order of a few tens to a few hundreds
of milliseconds, but if you set your clock manually it is likely to be
several seconds or worse.
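Because the host struct timespec may differ in layout, code that fills in a time value should copy field by field rather than cast between the two types. The sketch below uses a local stand-in type, demo_time_spec, with the same fields as EP_TIME_SPEC so that it compiles without the ep headers; the type and function names are hypothetical, and the caller is assumed to supply its own accuracy estimate.

```c
#include <stdint.h>
#include <time.h>

#pragma pack(push, 1)
typedef struct
{
    int64_t  tv_sec;        /* seconds since January 1, 1970 */
    uint32_t tv_nsec;       /* nanoseconds */
    float    tv_accuracy;   /* accuracy in seconds */
} demo_time_spec;           /* local stand-in for EP_TIME_SPEC */
#pragma pack(pop)

/* Copy a host timespec field by field, widening tv_sec to 64 bits.
 * The accuracy estimate (in seconds) is supplied by the caller. */
demo_time_spec demo_from_timespec(const struct timespec *ts, float accuracy)
{
    demo_time_spec out;

    out.tv_sec = (int64_t) ts->tv_sec;
    out.tv_nsec = (uint32_t) ts->tv_nsec;
    out.tv_accuracy = accuracy;
    return out;
}
```

With byte packing the structure occupies 16 bytes (8 + 4 + 4) regardless of how wide the host time_t is.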
Each log should have a public key in the metadata which is used to verify writes to the log. The library hides most of the details of this, but some still appear.
The gcl-create command automatically creates a public/secret keypair unless otherwise specified. See the man page for details. The public part of the key is inserted into the log metadata and stored with the log. The secret part is stored somewhere on your local filesystem, typically KEYS/gcl-id.pem. Normally gcl-create will encrypt the secret key with another key entered from the command line, although this can also be turned off.
When a GDP application attempts to open a log using gdp_gcl_open, the library will attempt to find a secret key by searching the directories named in the swarm.gdp.crypto.key.path administrative parameter for a file having the same name as the log (with a .pem file suffix). If that secret key is encrypted, the library will prompt the (human) user for the secret key password. The default path is ".", "KEYS", "~/.swarm/gdp/keys", "/usr/local/etc/swarm/gdp/keys", and "/etc/swarm/gdp/keys".
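The search described above amounts to trying "directory/name.pem" for each directory in the path, in order. The following sketch only builds the candidate file names from the default path quoted above; the names demo_key_dirs and demo_key_candidate are hypothetical, the "~" is left unexpanded, and the real library may differ in how it reads the administrative parameter and opens the files.

```c
#include <stddef.h>
#include <stdio.h>

/* Default search directories, as quoted in the text above. */
static const char *const demo_key_dirs[] = {
    ".",
    "KEYS",
    "~/.swarm/gdp/keys",    /* "~" is not expanded in this sketch */
    "/usr/local/etc/swarm/gdp/keys",
    "/etc/swarm/gdp/keys",
};
#define DEMO_NKEYDIRS (sizeof demo_key_dirs / sizeof demo_key_dirs[0])

/* Write the idx-th candidate key file name for the given log name into buf.
 * Returns 0 on success, or -1 if idx is out of range or buf is too small. */
int demo_key_candidate(const char *logname, size_t idx, char *buf, size_t bufsize)
{
    int n;

    if (idx >= DEMO_NKEYDIRS)
        return -1;
    n = snprintf(buf, bufsize, "%s/%s.pem", demo_key_dirs[idx], logname);
    return (n < 0 || (size_t) n >= bufsize) ? -1 : 0;
}
```

A caller would loop idx upward from 0 and stop at the first candidate that can be opened, which is why a key placed in "." or "KEYS" takes precedence over a system-wide one.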
Once the secret key has been located and decrypted, all further append requests will be signed using the secret key and verified by the log daemon against the public key in the log metadata.
Encryption is explicitly not part of the GDP. Ideally the GDP will never see unencrypted data. However, read and write filters (see the next section) can be used to set encryption and decryption hooks for externally implemented encryption.
gdp_gcl_getnrecs — return the number of records in an existing GCL
gdp_recno_t gdp_gcl_getnrecs(gdp_gcl_t *gcl)
gdp_gcl_set_append_filter — filter appended data
void gdp_gcl_set_append_filter(gdp_gcl_t *gcl,
EP_STAT (*filter)(gdp_datum_t *, void *),
void *filterdata)
gdp_gcl_set_read_filter — filter read data
void gdp_gcl_set_read_filter(gdp_gcl_t *gcl,
EP_STAT (*filter)(gdp_datum_t *, void *),
void *filterdata)
Header files
Version info
PRIgdp_recno macro
The following pseudo-code example is excerpted from apps/gdp-writer.c.
#include <gdp/gdp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
    gdp_gcl_t *gcl;
    EP_STAT estat;
    gdp_name_t gcliname; // internal name of GCL
    gdp_name_t logdname; // internal name of the log daemon (set elsewhere; not shown)
    gdp_datum_t *d;
    char buf[200]; // one line of input
    // general startup and initialization
    if (argc < 2)
        usage_error();
    estat = gdp_init(NULL); // NULL selects the system default address
    if (!EP_STAT_ISOK(estat))
        initialization_error(estat);
    d = gdp_datum_new();
    // parse command line name to internal format
    estat = gdp_parse_name(argv[1], gcliname);
    if (!EP_STAT_ISOK(estat))
        name_syntax_error();
    // attempt to create that name (passing NULL for the metadata is an assumption)
    estat = gdp_gcl_create(gcliname, logdname, NULL, &gcl);
    if (!EP_STAT_ISOK(estat))
        creation_error(estat);
    // read lines from standard input
    while (fgets(buf, sizeof buf, stdin) != NULL)
    {
        // strip the trailing newline, if any
        char *p = strchr(buf, '\n');
        if (p != NULL)
            *p = '\0';
        // write them to the dataplane
        if (gdp_buf_write(gdp_datum_getdbuf(d), buf, strlen(buf)) < 0)
            estat = GDP_STAT_BUFFER_FAILURE;
        else
            estat = gdp_gcl_append(gcl, d);
        EP_STAT_CHECK(estat, break);
    }
    // cleanup and exit
    gdp_gcl_close(gcl);
    exit(!EP_STAT_ISOK(estat));
}
This example is a similar excerpt from apps/gdp-reader.c (without using subscriptions):
#include <gdp/gdp.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    gdp_gcl_t *gcl;
    EP_STAT estat;
    gdp_name_t gcliname; // internal name of GCL
    gdp_datum_t *d;
    gdp_recno_t recno;
    // general startup and initialization
    if (argc < 2)
        usage_error();
    estat = gdp_init(NULL); // NULL selects the system default address
    if (!EP_STAT_ISOK(estat))
        initialization_error(estat);
    d = gdp_datum_new();
    // parse command line name to internal format
    estat = gdp_parse_name(argv[1], gcliname);
    if (!EP_STAT_ISOK(estat))
        name_syntax_error();
    // attempt to open the GCL (passing NULL for the open info is an assumption)
    estat = gdp_gcl_open(gcliname, GDP_MODE_RO, NULL, &gcl);
    if (!EP_STAT_ISOK(estat))
        open_error(estat, argv[1]);
    // read records in order until a read fails
    recno = 1;
    for (;;)
    {
        estat = gdp_gcl_read_by_recno(gcl, recno++, d);
        EP_STAT_CHECK(estat, break);
        gdp_datum_print(d, stdout, 0); // flags of 0 gives the default hex dump
    }
    exit(0);
}
If you want to use subscriptions, the recno variable can be removed and the for loop replaced with:
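// enable the subscription (numrecs of 0 waits for data forever; no QoS, no callback)
estat = gdp_gcl_subscribe_by_recno(gcl, 1, 0, NULL, NULL, NULL);
if (!EP_STAT_ISOK(estat))
    subscribe_error(estat, argv[1]);
for (;;)
{
    // a NULL timeout waits forever for the next event
    gdp_event_t *gev = gdp_event_next(gcl, NULL);
    if (gdp_event_gettype(gev) == GDP_EVENT_DATA)
        gdp_datum_print(gdp_event_getdatum(gev), stdout, 0);
    // the caller must free every event, whatever its type
    gdp_event_free(gev);
}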
The GDP library uses a reduced version of libep and also uses the libevent library version 2.1. These will need to be included both during compilation and linking.
At compile time you must use:
-Ilibevent_includes_parent -Ilibep_includes_parent
Note that these take the parent of the directory containing the include
files. For example, if the include files for libevent are in /usr/local/include/event2
and the include files for libep are in /usr/local/include/ep you
only need to specify the one flag "-I/usr/local/include".
At link time you must use:
-Llibevent_libraries -levent -levent_pthreads -Llibep_libraries -lep
As before, if the libraries for libevent and libep are in the same directory you only need a single -L flag.
For additional information, see the README file in the distribution directory.
This section is really an addendum to the document — a "scratch area" to keep track of issues that we still need to consider. It may not be up to date.
Do this using Access Control Lists (so each user/app has a keypair) or by passing public/secret keys around (so each GCL has a secret keypair). The latter makes revocation impossible (even for write access), so I prefer the ACL approach. Third way?
Revocation? Deep vs. Shallow. Deep = take away permissions that have already been given. Shallow = you can only prevent an accessor from getting to new versions. Argument: deep revocation is hard to do from a technical perspective and ultimately futile (someone might have taken a photo of a screen while they still had access), but is still what people are used to (Unix and SQL permissions work this way). Shallow is all that can really be guaranteed. Also, anything involving Certificate Revocation Lists (CRLs) is doomed to failure. This implies that ACLs are the correct direction.
ACLs get us into the question of identity. Pretending that a keypair represents an identity doesn't work in the real world where bad players simply create new "identities" (keypairs) when an old identity has become untrusted. See the extensive work in email sender reputation. However, when a bad player creates a new identity/keypair they do not get access to any previous grants, so this may be sufficient.
If each GCL has a secret keypair, then the public key is sufficient to name the entity. If not, then assigning a GCL a GUID on creation seems like the best approach. Having the user assign a name seems like a non-starter, if only because of the possibility of conflicts.
There will probably be some need for external naming, e.g., some overlay directory structure. That might be a different gcl_type.
This seems like an open research topic.
If a GCL isn't linked into a directory structure and everyone forgets its name then it will live forever (or until it expires). This could be quite common if a GCL is temporary, that is, not a candidate for long-term archival.
Expiration could be an issue without some sort of charging, which implies accounting.
Charging and accounting will affect the API. It seems like on GCL creation the creator needs to offer payment for both carrying and storing the data. This payment would presumably accrue to the actors providing the actual service. Payment for storage might be limited time or indefinite time (i.e., it would be an endowment).
The creator could also specify a cost for any potential consumer in order to access the GCL. Such payments would accrue to the creator of the GCL, and might be used to fund continued access, i.e. it could be rolled back into the endowment. This would lean toward making less-used data disappear: appealing in some ways, but anathema to librarians and historians.
As for API effects, it seems that GCL creation needs to include a payment for initial service, a cost for access, and an account into which to deposit any consumer payments. Accessing a GCL only requires an offered payment (which is probably best viewed as a bid rather than a payment, thus allowing multiple providers to compete for access).
Note that none of this is dependent on the form of payment. It does however assume that there is a mutually agreed upon form of payment, i.e., a universal currency.
Is Quality of Service specified on a particular GCL, a particular open instance of a GCL, or between a pair of endpoints?
What does QoS actually mean? For example, in a live media stream it probably means the resolution of the data stream (which determines real-time bandwidth), latency, and possibly jitter, but after that stream is stored the QoS will be resolution (as before), delivery bandwidth (how quickly you can download the video, for example), and possibly jitter of the network connection (that is, how even the data flow will be). Delivery bandwidth depends on the entire path between the data source and the data sink, and may be higher or lower than the bandwidth required to send a real-time version of the stream (lower, for example, over a slow network link).