% Changes Between GDP API v0 and API v2
This document briefly describes the differences between Version 0
of the GDP API and Version 2 (introduced around June 2018). For
more details of the current API, see `doc/gdp-programmatic-api.html`.
If you are not familiar with the old API, please do not read this
document; go directly to the current documentation.
This document describes only changes to the C programmatic API, but the concepts should carry over to all language bindings.
Overview
The API has been updated to be a better fit to object-oriented
paradigms. For example, function names beginning with `gdp_gin_`
operate on objects of type `gdp_gin_t` (with a few exceptions
such as `gdp_gin_new`) and take a pointer to a `gdp_gin_t`
as the first ("self") argument. In most cases, anything that
has a type (identified by a name ending in `_t`) can be treated as a class.
The asynchronous APIs are now the primary focus, ahead of the synchronous APIs. In particular, the asynchronous versions can handle sets of records in a single call, which improves performance and makes handling of holes and branches in a log more elegant. As a result, the "multiread" routines have been merged with the "async" routines.
Applications now manipulate a "GDP Instance" (GIN) instead of a "GDP Channel-Log" (GCL). This has semantic implications for asynchronous calls, and it has led to renaming above and beyond the other semantic changes. The name "GCL" is deprecated.
These changes also coincide with a change in the on-the-wire network protocol, which has a few subtle but important implications. Notably, the size of an individual PDU (Protocol Data Unit) has been reduced from approximately 4 GB to approximately 65 kB so that large protocol elements do not flood the network and create convoys. This in turn limits the maximum size of any log entry (a.k.a. "record" or "datum").
Name Changes
Names marked with * also have parameter changes.
OLD | NEW |
---|---|
`gdp_gcl_t` | `gdp_gin_t` |
`gdp_gcl_create` | `gdp_gin_create` |
`gdp_gcl_open` | `gdp_gin_open` |
`gdp_gcl_open_info_t` | `gdp_open_info_t` |
`gdp_gcl_open_info_new` | `gdp_open_info_new` |
`gdp_gcl_open_info_free` | `gdp_open_info_free` |
`gdp_gcl_open_info_set_signing_key` | `gdp_open_info_set_signing_key` |
`gdp_gcl_open_info_set_signkey_cb` | `gdp_open_info_set_signkey_cb` |
`gdp_gcl_open_info_set_caching` | `gdp_open_info_set_caching` |
`gdp_gcl_close` | `gdp_gin_close` |
`gdp_gcl_append` | `gdp_gin_append` * |
`gdp_gcl_append_async` | `gdp_gin_append_async` * |
`gdp_gcl_read` | `gdp_gin_read_by_recno` |
`gdp_gcl_read_async` | `gdp_gin_read_by_recno_async` * |
`gdp_gcl_read_ts` | `gdp_gin_read_by_ts` |
new | `gdp_gin_read_by_ts_async` * |
new | `gdp_gin_read_by_hash` |
new | `gdp_gin_read_by_hash_async` * |
`gdp_gcl_subscribe` | `gdp_gin_subscribe_by_recno` * |
`gdp_gcl_subscribe_ts` | `gdp_gin_subscribe_by_ts` * |
`gdp_gcl_unsubscribe` | `gdp_gin_unsubscribe` * |
`gdp_gcl_multiread` | `gdp_gin_read_by_recno_async` |
`gdp_gcl_multiread_ts` | `gdp_gin_read_by_ts_async` |
`gdp_gcl_getmetadata` | `gdp_gin_getmetadata` |
`gdp_gcl_newsegment` | deleted |
`gdp_gcl_set_append_filter` | `gdp_gin_set_append_filter` |
`gdp_gcl_set_read_filter` | `gdp_gin_set_read_filter` |
`gdp_gcl_getname` | `gdp_gin_getname` |
`gdp_gcl_getnrecs` | `gdp_gin_getnrecs` |
`gdp_gcl_print` | `gdp_gin_print` |
`gdp_gclmd_t` | `gdp_md_t` |
`gdp_gclmd_id_t` | `gdp_md_id_t` |
`gdp_gclmd_new` | `gdp_md_new` |
`gdp_gclmd_free` | `gdp_md_free` |
`gdp_gclmd_add` | `gdp_md_add` |
`gdp_gclmd_get` | `gdp_md_get` |
`gdp_gclmd_find` | `gdp_md_find` |
`gdp_gclmd_print` | `gdp_md_print` * |
`GDP_EVENT_EOS` | `GDP_EVENT_DONE` |
new | `gdp_hash_t` |
new | `gdp_hash_new` |
new | `gdp_hash_free` |
new | `gdp_hash_reset` |
new | `gdp_hash_set` |
new | `gdp_hash_getlength` |
new | `gdp_hash_getptr` |
new | `gdp_sig_t` |
new | `gdp_sig_new` |
new | `gdp_sig_reset` |
new | `gdp_sig_free` |
new | `gdp_sig_set` |
new | `gdp_sig_copy` |
new | `gdp_sig_dup` |
new | `gdp_sig_getlength` |
new | `gdp_sig_getptr` |
Details
Appends, Hashes, and Signatures
The long(ish) term intent is that all records (datums) will be
cryptographically linked in an Authenticated Data Structure.
We are discussing many ways of doing this, but all of them involve
hash chains of records. As a result, the "append" interfaces now
take a `prevhash` parameter, which is a hash of the previously
written record.
When writing consecutive records, the GDP library can maintain
the previous hash and insert it automatically if the `prevhash`
parameter is `NULL`. However, when a writer initializes, it must
determine the hash of the previous record. Ideally the writer
would not trust the underlying infrastructure, and would instead
save the hash of the previous record written (an exception being
made for the first record in the log). This should be done by
saving the previous hash on local stable storage. It's possible
that the GDP library could manage this state, but that is not yet
implemented. At this point the details of how this should work
are unclear, so this feature is only partly implemented.
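The linking idea can be illustrated with a minimal, self-contained sketch. It does not use the GDP library: `record_t`, `chain_append`, and `chain_verify` are hypothetical names, and the 64-bit FNV-1a function is only a non-cryptographic stand-in for whatever hash the GDP actually uses. Each record stores the hash of its predecessor, so altering an earlier record breaks every later link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy 64-bit FNV-1a hash: a NON-cryptographic stand-in for a real hash. */
static uint64_t toy_hash(const void *buf, size_t len, uint64_t seed)
{
    const unsigned char *p = buf;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hypothetical record layout for this sketch; not a GDP type. */
typedef struct {
    uint64_t prevhash;   /* hash of the previous record, 0 for the first */
    char     data[32];   /* payload, NUL-terminated in this sketch */
} record_t;

/* The hash of a record covers both its payload and its prevhash link. */
static uint64_t record_hash(const record_t *r)
{
    return toy_hash(r->data, strlen(r->data), r->prevhash);
}

/* Fill in chain[n], linking it to chain[n - 1] (if any). */
static void chain_append(record_t *chain, size_t n, const char *data)
{
    assert(strlen(data) < sizeof chain[n].data);
    chain[n].prevhash = (n == 0) ? 0 : record_hash(&chain[n - 1]);
    strcpy(chain[n].data, data);
}

/* Return 1 if every record's prevhash matches its predecessor, else 0. */
static int chain_verify(const record_t *chain, size_t nrecs)
{
    for (size_t i = 1; i < nrecs; i++)
        if (chain[i].prevhash != record_hash(&chain[i - 1]))
            return 0;
    return 1;
}
```

A writer that restarts would need the equivalent of `record_hash` for its last written record, which is why the text above suggests saving that value on local stable storage.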
Similarly, it is important that readers be able to validate
signatures for themselves. This is the rationale behind elevating
them (`gdp_sig_t`) to first-class citizens.
Hashes will become more important as readers start checking the provenance of data returned by servers. The details of that are still in the research arena and are out of scope of this document.
Subscriptions and Asynchronous Reads
Calling `gdp_event_next` with a given GIN will only return events
from asynchronous reads and subscriptions listed on that GIN.
Previously, if a log was opened twice (and hence had two GCL
handles) the data might be returned on a different instance.
For example, consider the (old) code:
```c
gdp_name_t gcl_name;
gdp_gcl_t *gcl1, *gcl2;
extern gdp_event_cbfunc_t cb1, cb2;
EP_STAT estat;

// open the same log twice
estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl1);
estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl2);

// subscribe to the end, and read from the beginning
estat = gdp_gcl_subscribe(gcl1, 0, 20, NULL, &cb1, NULL);
estat = gdp_gcl_multiread(gcl2, 1, 100, &cb2, NULL);
```
would cause `cb1` and `cb2` to be called somewhat randomly
with results of the multiread from the beginning of the log and
the results of the subscribe from the end of the log.
The new code would be:
```c
gdp_name_t log_name;
gdp_gin_t *gin1, *gin2;
extern gdp_event_cbfunc_t cb1, cb2;
EP_STAT estat;

// open the same log twice
estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin1);
estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin2);

// subscribe to the end, and read from the beginning
estat = gdp_gin_subscribe_by_recno(gin1, 0, 20, NULL, &cb1, NULL);
estat = gdp_gin_read_by_recno_async(gin2, 1, 100, &cb2, NULL);
```
would return the results from the subscription exclusively
to `cb1` and the results of the read exclusively to `cb2`,
which was probably what was intended.
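The per-instance routing described here can be modeled in a few lines. The sketch below is not GDP code: `handle_t`, `event_t`, and `event_next_for` are hypothetical stand-ins, and the "queue" is a plain array. It only illustrates the rule that an event is delivered to the handle the request was issued on, even when two handles refer to the same log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a GIN handle and an event; not GDP types. */
typedef struct { int id; } handle_t;

typedef struct {
    const handle_t *owner;  /* the handle the request was issued on */
    int             recno;  /* record number carried by the event */
    int             taken;  /* nonzero once the event has been delivered */
} event_t;

/*
 * Return the oldest undelivered event issued on `h`, or NULL if none.
 * Events that belong to other handles are skipped, never delivered here.
 */
static event_t *event_next_for(event_t *queue, size_t n, const handle_t *h)
{
    assert(h != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!queue[i].taken && queue[i].owner == h) {
            queue[i].taken = 1;
            return &queue[i];
        }
    }
    return NULL;
}
```

In the old model the `owner` check was effectively on the log rather than on the handle, which is how results could arrive on the wrong instance.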
Similarly, `gdp_gin_unsubscribe` only deletes subscriptions
that were created on a specific GIN; previously it was
somewhat random.
Appending Multiple Records
The old `gdp_gcl_append_async` call added only one
datum to a log. The new `gdp_gin_append_async` call can
append multiple datums in one call. Besides sending fewer
network commands, this allows the individual datums to be linked
together in a hash chain with only the last datum digitally
signed, which is much more efficient.
Beware, however, that all the datums must fit within a single network PDU (Protocol Data Unit), whose maximum size has been reduced to approximately 65 kB to avoid network congestion.
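An application that batches datums may want to check the batch against the PDU budget before appending. The following is a rough sketch under stated assumptions: `MAX_PDU_BYTES` and `PER_DATUM_OVERHEAD` are illustrative values chosen for this example, not constants from the GDP headers, and `datums_that_fit` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: illustrative numbers only. The real limit and the real
 * per-datum overhead are set by the GDP protocol, not by this sketch. */
#define MAX_PDU_BYTES      65535u
#define PER_DATUM_OVERHEAD 64u    /* hypothetical header and hash overhead */

/* Return how many leading datums (given their payload sizes) fit in one PDU. */
static size_t datums_that_fit(const size_t *sizes, size_t n)
{
    size_t used = 0, count = 0;

    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Reject oversized payloads first so the addition cannot overflow. */
        if (sizes[i] > MAX_PDU_BYTES ||
            sizes[i] + PER_DATUM_OVERHEAD > MAX_PDU_BYTES - used)
            break;
        used += sizes[i] + PER_DATUM_OVERHEAD;
        count++;
    }
    return count;
}
```

A caller could append the first `count` datums in one call and carry the remainder over to the next call.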
Compatibility
At some point, if there is demand, we may add a `<gdp_compat_v0.h>` header
that will, to the extent possible, make it feasible to run programs
coded against the old API. This will deal only with the syntactic
issues, however.
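As a purely hypothetical illustration of what "syntactic only" could mean, such a header might map old names to new ones with macros for the entries in the name table that have no parameter changes. No such header exists yet. In the sketch below, the `gdp_gin_t` structure, the `gdp_recno_t` typedef, and the body of `gdp_gin_getnrecs` are simplified stand-ins so the example compiles on its own; a real shim would include the GDP header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so this sketch is self-contained; NOT the real GDP definitions. */
typedef int64_t gdp_recno_t;
typedef struct gdp_gin { gdp_recno_t nrecs; } gdp_gin_t;
static gdp_recno_t gdp_gin_getnrecs(const gdp_gin_t *gin) { return gin->nrecs; }

/* Hypothetical shim: pure renames only, taken from the name table. */
#define gdp_gcl_t         gdp_gin_t
#define gdp_gcl_getnrecs  gdp_gin_getnrecs

/* "Old" client code: compiles unchanged against the v2 names. */
static gdp_recno_t old_client_count(const gdp_gcl_t *gcl)
{
    assert(gcl != NULL);
    return gdp_gcl_getnrecs(gcl);
}
```

Entries marked with * in the table could not be handled this way, since their parameter lists differ, and semantic changes such as per-GIN event delivery are outside the reach of any macro layer.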