% Changes Between GDP API v0 and API v2

This document briefly describes the differences between Version 0
of the GDP API and Version 2 (introduced around June 2018).  For
more details of the current API, see `doc/gdp-programmatic-api.html`.
If you are not familiar with the old API, please do not read this
document; go directly to the current documentation.

This only describes changes to the C Programmatic API, but the
concepts should be relevant across all language bindings.

# Overview

The API has been updated to better fit object-oriented
paradigms.  For example, function names beginning with `gdp_gin_`
operate on objects of type `gdp_gin_t` (with a few exceptions
such as `gdp_gin_new`) and take a pointer to a `gdp_gin_t`
as the first ("self") argument.  In most cases, anything that
has a type (identified by a name ending in `_t`) is probably a class.

The asynchronous APIs are now the primary focus, ahead of the
synchronous APIs.  In particular, the asynchronous versions can
handle sets of records in a single call, which improves
performance and makes handling of holes and branches in a log
more elegant.  As a result, the "multiread" routines have been
merged with the "async" routines.

Applications now manipulate a "GDP Instance" (GIN) instead of
a "GDP Channel-Log" (GCL).  This has semantic implications for
asynchronous calls, and it has led to renaming above and beyond
the other semantic changes.  The name "GCL" has been deprecated.

These changes also coincide with a change in the on-the-wire network
protocol, which has a few subtle but important implications.  Notably,
the size of an individual PDU (Protocol Data Unit) has been reduced
from approximately 4GB to approximately 65kB in order to avoid large
protocol elements flooding the network and creating convoys.  This
in turn limits the maximum size of any log entry (a.k.a.
"record" or "datum").


# Name Changes

Names marked with \* also have parameter changes.

| OLD                                 | NEW                             |
|-------------------------------------|---------------------------------|
| `gdp_gcl_t`                         | `gdp_gin_t`                     |
| `gdp_gcl_create`                    | `gdp_gin_create`                |
| `gdp_gcl_open`                      | `gdp_gin_open`                  |
| `gdp_gcl_open_info_t`               | `gdp_open_info_t`               |
| `gdp_gcl_open_info_new`             | `gdp_open_info_new`             |
| `gdp_gcl_open_info_free`            | `gdp_open_info_free`            |
| `gdp_gcl_open_info_set_signing_key` | `gdp_open_info_set_signing_key` |
| `gdp_gcl_open_info_set_signkey_cb`  | `gdp_open_info_set_signkey_cb`  |
| `gdp_gcl_open_info_set_caching`     | `gdp_open_info_set_caching`     |
| `gdp_gcl_close`                     | `gdp_gin_close`                 |
| `gdp_gcl_append`                    | `gdp_gin_append`\*              |
| `gdp_gcl_append_async`              | `gdp_gin_append_async`\*        |
| `gdp_gcl_read`                      | `gdp_gin_read_by_recno`         |
| `gdp_gcl_read_async`                | `gdp_gin_read_by_recno_async`\* |
| `gdp_gcl_read_ts`                   | `gdp_gin_read_by_ts`            |
| _new_                               | `gdp_gin_read_by_ts_async`\*    |
| _new_                               | `gdp_gin_read_by_hash`          |
| _new_                               | `gdp_gin_read_by_hash_async`\*  |
| `gdp_gcl_subscribe`                 | `gdp_gin_subscribe_by_recno`\*  |
| `gdp_gcl_subscribe_ts`              | `gdp_gin_subscribe_by_ts`\*     |
| `gdp_gcl_unsubscribe`               | `gdp_gin_unsubscribe`\*         |
| `gdp_gcl_multiread`                 | `gdp_gin_read_by_recno_async`   |
| `gdp_gcl_multiread_ts`              | `gdp_gin_read_by_ts_async`      |
| `gdp_gcl_getmetadata`               | `gdp_gin_getmetadata`           |
| `gdp_gcl_newsegment`                | _deleted_                       |
| `gdp_gcl_set_append_filter`         | `gdp_gin_set_append_filter`     |
| `gdp_gcl_set_read_filter`           | `gdp_gin_set_read_filter`       |
| `gdp_gcl_getname`                   | `gdp_gin_getname`               |
| `gdp_gcl_getnrecs`                  | `gdp_gin_getnrecs`              |
| `gdp_gcl_print`                     | `gdp_gin_print`                 |
|||
| `gdp_gclmd_t`                       | `gdp_md_t`                      |
| `gdp_gclmd_id_t`                    | `gdp_md_id_t`                   |
| `gdp_gclmd_new`                     | `gdp_md_new`                    |
| `gdp_gclmd_free`                    | `gdp_md_free`                   |
| `gdp_gclmd_add`                     | `gdp_md_add`                    |
| `gdp_gclmd_get`                     | `gdp_md_get`                    |
| `gdp_gclmd_find`                    | `gdp_md_find`                   |
| `gdp_gclmd_print`                   | `gdp_md_print`\*                |
|||
| `GDP_EVENT_EOS`                     | `GDP_EVENT_DONE`                |
|||
| _new_                               | `gdp_hash_t`                    |
| _new_                               | `gdp_hash_new`                  |
| _new_                               | `gdp_hash_free`                 |
| _new_                               | `gdp_hash_reset`                |
| _new_                               | `gdp_hash_set`                  |
| _new_                               | `gdp_hash_getlength`            |
| _new_                               | `gdp_hash_getptr`               |
|||
| _new_                               | `gdp_sig_t`                     |
| _new_                               | `gdp_sig_new`                   |
| _new_                               | `gdp_sig_reset`                 |
| _new_                               | `gdp_sig_free`                  |
| _new_                               | `gdp_sig_set`                   |
| _new_                               | `gdp_sig_copy`                  |
| _new_                               | `gdp_sig_dup`                   |
| _new_                               | `gdp_sig_getlength`             |
| _new_                               | `gdp_sig_getptr`                |

# Details

## Appends, Hashes, and Signatures

The long(ish) term intent is that all records (datums) will be
cryptographically linked in an Authenticated Data Structure.
We are discussing many ways of doing this, but all of them involve
hash chains of records.  As a result, the "append" interfaces now
take a `prevhash` parameter, which is a hash of the previously
written record.
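
To illustrate the chaining idea only, here is a minimal sketch in plain C.
It does not use the GDP library: every `toy_*` name is hypothetical, and
FNV-1a stands in for a real cryptographic hash purely to keep the example
self-contained (it is not suitable for real use).  Each record stores the
hash of its predecessor, so altering a record breaks the link held by the
record that follows it.

``` c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy record: NOT a GDP datum. */
typedef struct {
    uint64_t prevhash;      /* hash of the previous record, 0 for the first */
    const char *payload;    /* NUL-terminated record contents */
} toy_record_t;

/* FNV-1a, used here only as a stand-in for a cryptographic hash. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* The hash of a record covers both its prevhash and its payload. */
uint64_t toy_record_hash(const toy_record_t *r)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    h = fnv1a(h, &r->prevhash, sizeof r->prevhash);
    return fnv1a(h, r->payload, strlen(r->payload));
}

/* Append a record, linking it to the hash of the last record written. */
void toy_append(toy_record_t *log, size_t *n, uint64_t *last,
                const char *payload)
{
    log[*n].prevhash = *last;
    log[*n].payload = payload;
    *last = toy_record_hash(&log[*n]);
    (*n)++;
}

/* Return the index of the first record whose prevhash does not match
 * the hash of its predecessor, or n if the whole chain is consistent. */
size_t toy_verify(const toy_record_t *log, size_t n)
{
    uint64_t expect = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].prevhash != expect)
            return i;
        expect = toy_record_hash(&log[i]);
    }
    return n;
}
```

In this sketch the `last` variable plays the role of the state a writer
would carry from one append to the next.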

When writing consecutive records, the GDP library can maintain
the previous hash and insert it automatically if the `prevhash`
parameter is `NULL`.  However, when a writer initializes, it must
determine the hash of the previous record.  Ideally the writer
would not trust the underlying infrastructure, and would instead
save the hash of the previous record written (an exception being
made for the first record in the log).  This should be done by
saving the previous hash on local stable storage.  The GDP library
could conceivably manage this state itself, but the details of how
that should work are still unclear, so this feature is only partly
implemented.
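
Until the library manages this state, a writer could persist the hash
itself.  The following is a rough sketch under stated assumptions, not part
of the GDP API: `prevhash_save`, `prevhash_load`, and `PREVHASH_MAX` are
hypothetical names, the hash is treated as an opaque byte string, and
`rename` is assumed to replace an existing file atomically (as it does on
POSIX systems).  Converting the loaded bytes to and from a `gdp_hash_t` is
left to the calls documented in the current API reference.

``` c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define PREVHASH_MAX 64     /* assumed upper bound on hash length in bytes */

/* Write the hash to a temporary file, then rename it over `path`, so a
 * crash never leaves a half-written hash behind.  A production version
 * would also force the data to disk (e.g. fsync) before renaming.
 * Returns 0 on success, -1 on error. */
int prevhash_save(const char *path, const unsigned char *hash, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);

    if (len == 0 || len > PREVHASH_MAX || n < 0 || (size_t) n >= sizeof tmp)
        return -1;

    FILE *fp = fopen(tmp, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(hash, 1, len, fp) == len;
    ok = (fclose(fp) == 0) && ok;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Read a previously saved hash into `buf`.  Returns the number of bytes
 * read, 0 if no hash has been saved yet (nothing written so far), or -1
 * on any other error. */
long prevhash_load(const char *path, unsigned char *buf, size_t bufsize)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return errno == ENOENT ? 0 : -1;
    size_t n = fread(buf, 1, bufsize, fp);
    int bad = ferror(fp);
    fclose(fp);
    return bad ? -1 : (long) n;
}
```

A writer would call `prevhash_load` at startup and `prevhash_save` after
each successful append; a return of 0 from `prevhash_load` corresponds to
the first-record exception mentioned above.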

Similarly, it is important that readers be able to validate
signatures for themselves.  This is the rationale behind elevating
signatures (`gdp_sig_t`) to first-class citizens.

Hashes will become more important as readers start checking the
provenance of data returned by servers.  The details of that are
still in the research arena and are out of scope of this document.

## Subscriptions and Asynchronous Reads

Calling `gdp_event_next` with a given GIN will only return events
from asynchronous reads and subscriptions issued on that GIN.
Previously, if a log was opened twice (and hence had two GCL
handles) the data might be returned on the other handle.
For example, consider the (old) code:

``` c
gdp_name_t gcl_name;
gdp_gcl_t *gcl1, *gcl2;
extern gdp_event_cbfunc_t cb1, cb2;
EP_STAT estat;

// open the same log twice
estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl1);
estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl2);

// subscribe to the end, and read from the beginning
estat = gdp_gcl_subscribe(gcl1, 0, 20, NULL, &cb1, NULL);
estat = gdp_gcl_multiread(gcl2, 1, 100, &cb2, NULL);
```

This would cause `cb1` and `cb2` to be called unpredictably
with results of the multiread from the beginning of the log and
the results of the subscribe from the end of the log.

The new code would be:

``` c
gdp_name_t log_name;
gdp_gin_t *gin1, *gin2;
extern gdp_event_cbfunc_t cb1, cb2;
EP_STAT estat;

// open the same log twice
estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin1);
estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin2);

// subscribe to the end, and read from the beginning
estat = gdp_gin_subscribe_by_recno(gin1, 0, 20, NULL, &cb1, NULL);
estat = gdp_gin_read_by_recno_async(gin2, 1, 100, &cb2, NULL);
```

This returns the results from the subscription exclusively
to `cb1` and the results of the read exclusively to `cb2`,
which was probably what was intended.

Similarly, `gdp_gin_unsubscribe` only deletes subscriptions
that were created on a specific GIN; previously, which
subscriptions it deleted was somewhat random.

## Appending Multiple Records

The old `gdp_gcl_append_async` call only added one
datum to a log.  The new `gdp_gin_append_async` call can
append multiple datums in one call.  Besides sending fewer
network commands, this allows the individual datums to be linked
together in a hash chain with only the last datum digitally
signed, which is much more efficient.

Beware however that all the datums must fit within a single
network PDU (Protocol Data Unit), and the maximum size has been
reduced to approximately 65kB to avoid network congestion.
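
An application that queues many datums therefore has to split them into
batches before appending.  The sketch below shows one way to do that split.
It is only an illustration: `batch_count` is a hypothetical helper, and both
the byte budget and the per-datum overhead are assumptions that the caller
must choose, since the exact limit and framing cost are not given here.

``` c
#include <stddef.h>

/* Count how many consecutive datums, beginning at index `start`, fit in
 * one batch of at most `budget` bytes when each datum costs its own size
 * plus `overhead` bytes of (assumed) framing.  Returns 0 if the datum at
 * `start` cannot fit in a batch by itself. */
size_t batch_count(const size_t *sizes, size_t n, size_t start,
                   size_t budget, size_t overhead)
{
    size_t used = 0;
    size_t count = 0;

    for (size_t i = start; i < n; i++) {
        size_t need = sizes[i] + overhead;
        if (need > budget - used)   /* used never exceeds budget */
            break;
        used += need;
        count++;
    }
    return count;
}
```

A caller would loop, passing each batch of `count` datums to a single
append call and advancing `start` by `count`; a result of 0 means the datum
at `start` is too large to send at all and must be rejected or split by the
application.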

# Compatibility

At some point, if there is demand, we may add a `<gdp_compat_v0.h>`
that will, to the extent possible, make it feasible to run programs
coded against the old API.  However, this will only deal with the
syntactic issues.