GDP Version 0 to Version 2 Transition — Code

This document briefly describes the differences between Version 0 of the GDP API and Version 2 (introduced around June 2018). For more details of the current API, see doc/gdp-programmatic-api.html. If you are not familiar with the old API, please do not read this document; go directly to the current documentation.

This only describes changes to the C Programmatic API, but the concepts should be relevant across all language bindings.

Overview

The API has been updated to fit object-oriented paradigms better. For example, function names beginning with gdp_gin_ operate on objects of type gdp_gin_t (with a few exceptions such as gdp_gin_new) and will take a pointer to a gdp_gin_t as the first ("self") argument. In most cases, anything with a named type (identified by a name ending in _t) can be treated as a class.

The asynchronous APIs are now the primary focus, ahead of the synchronous APIs. In particular, the asynchronous versions can handle sets of records in a single call, which improves performance and makes handling of holes and branches in a log more elegant. As a result, the "multiread" routines have been merged with the "async" routines.

Applications now manipulate a "GDP Instance" (GIN) instead of a "GDP Channel-Log" (GCL). This has semantic implications for asynchronous calls, and it has led to renaming above and beyond the other semantic changes. The name "GCL" has been deprecated.

These changes also coincide with a change in the on-the-wire network protocol, which has a few subtle but important implications. Notably, the size of an individual PDU (Protocol Data Unit) has been reduced from approximately 4GB to approximately 65kB so that large protocol elements do not flood the network and create convoys. This in turn limits the maximum size of any log entry (a.k.a. "record" or "datum").

Name Changes

Names marked with * also have parameter changes.

OLD NEW
gdp_gcl_t gdp_gin_t
gdp_gcl_create gdp_gin_create
gdp_gcl_open gdp_gin_open
gdp_gcl_open_info_t gdp_open_info_t
gdp_gcl_open_info_new gdp_open_info_new
gdp_gcl_open_info_free gdp_open_info_free
gdp_gcl_open_info_set_signing_key gdp_open_info_set_signing_key
gdp_gcl_open_info_set_signkey_cb gdp_open_info_set_signkey_cb
gdp_gcl_open_info_set_caching gdp_open_info_set_caching
gdp_gcl_close gdp_gin_close
gdp_gcl_append gdp_gin_append*
gdp_gcl_append_async gdp_gin_append_async*
gdp_gcl_read gdp_gin_read_by_recno
gdp_gcl_read_async gdp_gin_read_by_recno_async*
gdp_gcl_read_ts gdp_gin_read_by_ts
new gdp_gin_read_by_ts_async*
new gdp_gin_read_by_hash
new gdp_gin_read_by_hash_async*
gdp_gcl_subscribe gdp_gin_subscribe_by_recno*
gdp_gcl_subscribe_ts gdp_gin_subscribe_by_ts*
gdp_gcl_unsubscribe gdp_gin_unsubscribe*
gdp_gcl_multiread gdp_gin_read_by_recno_async
gdp_gcl_multiread_ts gdp_gin_read_by_ts_async
gdp_gcl_getmetadata gdp_gin_getmetadata
gdp_gcl_newsegment deleted
gdp_gcl_set_append_filter gdp_gin_set_append_filter
gdp_gcl_set_read_filter gdp_gin_set_read_filter
gdp_gcl_getname gdp_gin_getname
gdp_gcl_getnrecs gdp_gin_getnrecs
gdp_gcl_print gdp_gin_print
gdp_gclmd_t gdp_md_t
gdp_gclmd_id_t gdp_md_id_t
gdp_gclmd_new gdp_md_new
gdp_gclmd_free gdp_md_free
gdp_gclmd_add gdp_md_add
gdp_gclmd_get gdp_md_get
gdp_gclmd_find gdp_md_find
gdp_gclmd_print gdp_md_print*
GDP_EVENT_EOS GDP_EVENT_DONE
new gdp_hash_t
new gdp_hash_new
new gdp_hash_free
new gdp_hash_reset
new gdp_hash_set
new gdp_hash_getlength
new gdp_hash_getptr
new gdp_sig_t
new gdp_sig_new
new gdp_sig_reset
new gdp_sig_free
new gdp_sig_set
new gdp_sig_copy
new gdp_sig_dup
new gdp_sig_getlength
new gdp_sig_getptr

Details

Appends, Hashes, and Signatures

The long(ish) term intent is that all records (datums) will be cryptographically linked in an Authenticated Data Structure. We are discussing many ways of doing this, but all of them involve hash chains of records. As a result, the "append" interfaces now take a prevhash parameter which is a hash of the previously written record.
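
As a rough illustration of the hash-chain idea, the following C sketch links toy records through a prevhash field and then detects tampering. Everything in it is an assumption made for illustration: toy_record, toy_chain_link, and toy_chain_verify are hypothetical names rather than GDP API, the real record layout is not specified here, and 64-bit FNV-1a merely stands in for a cryptographic hash (which it is not).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Stand-in hash: 64-bit FNV-1a. NOT cryptographic; illustration only. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical record: a payload plus the hash of the preceding record. */
typedef struct {
    uint64_t prevhash;      /* 0 for the first record in the log */
    const char *payload;
} toy_record;

/* Hash of a record covers both its prevhash and its payload. */
static uint64_t toy_record_hash(const toy_record *r)
{
    uint64_t h = fnv1a(FNV_OFFSET, &r->prevhash, sizeof r->prevhash);
    return fnv1a(h, r->payload, strlen(r->payload));
}

/* Fill in each prevhash from its predecessor; returns the last record's hash. */
static uint64_t toy_chain_link(toy_record *recs, size_t n)
{
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        recs[i].prevhash = prev;
        prev = toy_record_hash(&recs[i]);
    }
    return prev;
}

/* Recompute the chain and compare against the trusted hash of the last record. */
static bool toy_chain_verify(const toy_record *recs, size_t n, uint64_t expected_last)
{
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].prevhash != prev)
            return false;
        prev = toy_record_hash(&recs[i]);
    }
    return prev == expected_last;
}
```

Because each hash covers the previous one, a holder of only the final hash can detect a change to any earlier record, which is why the writer needs to know the hash of the last record it wrote.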

When writing consecutive records, the GDP library can maintain the previous hash and insert it automatically if the prevhash parameter is NULL. However, when a writer initializes, it must determine the hash of the previous record. Ideally the writer would not trust the underlying infrastructure, and would instead save the hash of the previous record written (an exception being made for the first record in the log). This should be done by saving the previous hash on local stable storage. It's possible that the GDP library could manage this state, but that is not yet implemented. At this point the details of how this should work are unclear, so this feature is only partly implemented.
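
One way a writer might keep the previous hash on local stable storage is sketched below in standard C. This is only a sketch under stated assumptions: save_prev_hash and load_prev_hash are hypothetical helpers, not part of the GDP library, the hash is treated as an opaque byte string, and the file name is up to the application. A production version would also need to force the data to disk (for example with fsync on POSIX systems), which standard C cannot express.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the hash to "<path>.tmp", then rename it over
 * <path> so a crash never leaves a half-written file in place.
 * Note: replacing an existing file with rename() works on POSIX systems;
 * the C standard leaves that case implementation-defined.
 * Returns 0 on success, -1 on failure.
 */
static int save_prev_hash(const char *path, const unsigned char *hash, size_t len)
{
    char tmp[512];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                      /* path too long for the buffer */

    FILE *fp = fopen(tmp, "wb");
    if (fp == NULL)
        return -1;

    bool ok = (fwrite(hash, 1, len, fp) == len);
    ok = (fclose(fp) == 0) && ok;       /* fclose flushes buffered data */
    ok = ok && (rename(tmp, path) == 0);
    if (!ok)
        remove(tmp);                    /* clean up the temporary file */
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read a saved hash into buf (at most cap bytes).
 * Returns the number of bytes read, or -1 if nothing usable was saved,
 * which a writer would expect only before the first record is written.
 */
static long load_prev_hash(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, cap, fp);
    bool failed = (ferror(fp) != 0);
    fclose(fp);
    return failed ? -1 : (long)n;
}
```

At startup the writer would call load_prev_hash and pass the result as the prevhash of its first append; after each successful append it would call save_prev_hash with the hash of the record just written.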

Similarly, it is important that readers be able to validate signatures for themselves. This is the rationale behind elevating signatures (gdp_sig_t) to first-class citizens.

Hashes will become more important as readers start checking the provenance of data returned by servers. The details of that are still in the research arena and are out of scope of this document.

Subscriptions and Asynchronous Reads

Calling gdp_event_next with a given GIN will only return events from asynchronous reads and subscriptions issued on that GIN. Previously, if a log was opened twice (and hence had two GCL handles), the data might be delivered on a handle other than the one that requested it. For example, consider the (old) code:

gdp_name_t gcl_name;
gdp_gcl_t *gcl1, *gcl2;
extern gdp_event_cbfunc_t cb1, cb2;
EP_STAT estat;

// open the same log twice
estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl1);
estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl2);

// subscribe to the end, and read from the beginning
estat = gdp_gcl_subscribe(gcl1, 0, 20, NULL, &cb1, NULL);
estat = gdp_gcl_multiread(gcl2, 1, 100, &cb2, NULL);

would cause cb1 and cb2 to be called somewhat randomly with results of the multiread from the beginning of the log and the results of the subscribe from the end of the log.

The new code would be:

gdp_name_t log_name;
gdp_gin_t *gin1, *gin2;
extern gdp_event_cbfunc_t cb1, cb2;
EP_STAT estat;

// open the same log twice
estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin1);
estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin2);

// subscribe to the end, and read from the beginning
estat = gdp_gin_subscribe_by_recno(gin1, 0, 20, NULL, &cb1, NULL);
estat = gdp_gin_read_by_recno_async(gin2, 1, 100, &cb2, NULL);

would return the results from the subscription exclusively to cb1 and the results of the read exclusively to cb2, which was probably what was intended.

Similarly, gdp_gin_unsubscribe only deletes subscriptions that were created on a specific GIN; previously, which subscriptions it deleted was somewhat random.

Appending Multiple Records

The old gdp_gcl_append_async call added only one datum to a log. The new gdp_gin_append_async call can append multiple datums in one call. Besides sending fewer network commands, this allows the individual datums to be linked together in a hash chain with only the last datum digitally signed, which is much more efficient.

Beware, however, that all the datums must fit within a single network PDU (Protocol Data Unit), and the maximum PDU size has been reduced to approximately 65kB to avoid network congestion.
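
An application with many datums to append might therefore split them into batches before calling the append routine. The sketch below shows one way to decide how many datums go into the next batch. It rests on assumptions: datums_per_pdu and ASSUMED_PDU_BUDGET are hypothetical names, the budget is a guess at the usable payload rather than a value defined by the GDP library, and per-datum protocol overhead (headers, hashes, the signature) is ignored.

```c
#include <stddef.h>

/* Hypothetical payload budget, chosen to stay below "approximately 65kB". */
#define ASSUMED_PDU_BUDGET 65000u

/*
 * Returns how many leading entries of sizes[0..n-1] fit together within
 * budget bytes. Returns 0 if the first datum alone exceeds the budget,
 * in which case it cannot be sent at all and the caller must reject it.
 */
static size_t datums_per_pdu(const size_t *sizes, size_t n, size_t budget)
{
    size_t used = 0;
    size_t count = 0;

    /* used never exceeds budget, so budget - used cannot underflow */
    while (count < n && sizes[count] <= budget - used) {
        used += sizes[count];
        count++;
    }
    return count;
}
```

A caller would loop, taking datums_per_pdu(sizes + done, n - done, ASSUMED_PDU_BUDGET) datums per append call and advancing done by that count until every datum has been sent.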

Compatibility

There is also a compatibility header, gdp_compat_v0.h, that makes it feasible, to the extent possible, to run programs coded against the old API. To use it, add #include <gdp_compat_v0.h> to the program. It deals only with the syntactic issues, however; it does not hide the semantic changes.