% Changes Between GDP API v0 and API v2

This document briefly describes the differences between Version 0
of the GDP API and Version 2 (introduced around June 2018).  For
more details of the current API, see `doc/gdp-programmatic-api.html`.
If you are not familiar with the old API, please do not read this
document; go directly to the current documentation.

This only describes changes to the C Programmatic API, but the
concepts should be relevant across all language bindings.

# Overview

The API has been updated to fit object-oriented paradigms better.
For example, function names beginning with `gdp_gin_` operate on
objects of type `gdp_gin_t` (with a few exceptions such as
`gdp_gin_new`) and take a pointer to a `gdp_gin_t` as the first
("self") argument.  In most cases, any type (identified by a name
ending in `_t`) can be treated as a class.

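To make the convention concrete, here is a minimal sketch of a class written in that style.  The type `demo_buf_t` and all of its functions are hypothetical and are not part of the GDP API; the method suffixes (`_new`, `_set`, `_getlength`, `_getptr`, `_free`) merely mirror those of the new `gdp_hash_*` and `gdp_sig_*` families listed below, whose real signatures may differ.

``` c
#include <stdlib.h>
#include <string.h>

/* Hypothetical class following the "<prefix>_t" convention (not GDP code). */
typedef struct demo_buf {
    size_t len;            /* number of valid bytes in data */
    unsigned char *data;   /* owned copy of the contents, or NULL */
} demo_buf_t;

/* Constructor: the exception that takes no "self" argument. */
demo_buf_t *demo_buf_new(void)
{
    demo_buf_t *self = malloc(sizeof *self);
    if (self != NULL) {
        self->len = 0;
        self->data = NULL;
    }
    return self;
}

/* Methods: the object ("self") is always the first argument. */
int demo_buf_set(demo_buf_t *self, const void *src, size_t len)
{
    unsigned char *copy = malloc(len > 0 ? len : 1);
    if (copy == NULL)
        return -1;
    if (len > 0)
        memcpy(copy, src, len);
    free(self->data);      /* release the previous contents */
    self->data = copy;
    self->len = len;
    return 0;
}

size_t demo_buf_getlength(const demo_buf_t *self)
{
    return self->len;
}

const void *demo_buf_getptr(const demo_buf_t *self)
{
    return self->data;
}

/* Destructor: frees the contents and the object itself. */
void demo_buf_free(demo_buf_t *self)
{
    if (self == NULL)
        return;
    free(self->data);
    free(self);
}
```

Only the constructor lacks a "self" pointer, which parallels the `gdp_gin_new` exception noted above.
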
The asynchronous APIs are now the primary focus, rather than the
synchronous APIs.  In particular, the asynchronous versions can
handle sets of records in a single call, which improves
performance and makes handling of holes and branches in a log
more elegant.  As a result, the "multiread" routines have been
merged with the "async" routines.

Applications now manipulate a "GDP Instance" (GIN) instead of
a "GDP Channel-Log" (GCL).  This has semantic implications for
asynchronous calls (see *Subscriptions and Asynchronous Reads*
below), and it has led to renaming above and beyond the other
semantic changes.  The name "GCL" has been deprecated.

These changes also coincide with a change in the on-the-wire network
protocol, which has a few subtle but important implications.  Notably,
the size of an individual PDU (Protocol Data Unit) has been reduced
from approximately 4GB to approximately 65kB to prevent large
protocol elements from flooding the network and creating convoys.
This in turn limits the maximum size of any log entry (a.k.a.
"record" or "datum").

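# Name Changes

Names marked with \* also have parameter changes.

| OLD | NEW |
|-------------------------------------|---------------------------------|
| `gdp_gcl_t` | `gdp_gin_t` |
| `gdp_gcl_create` | `gdp_gin_create` |
| `gdp_gcl_open` | `gdp_gin_open` |
| `gdp_gcl_open_info_t` | `gdp_open_info_t` |
| `gdp_gcl_open_info_new` | `gdp_open_info_new` |
| `gdp_gcl_open_info_free` | `gdp_open_info_free` |
| `gdp_gcl_open_info_set_signing_key` | `gdp_open_info_set_signing_key` |
| `gdp_gcl_open_info_set_signkey_cb` | `gdp_open_info_set_signkey_cb` |
| `gdp_gcl_open_info_set_caching` | `gdp_open_info_set_caching` |
| `gdp_gcl_close` | `gdp_gin_close` |
| `gdp_gcl_append` | `gdp_gin_append`\* |
| `gdp_gcl_append_async` | `gdp_gin_append_async`\* |
| `gdp_gcl_read` | `gdp_gin_read_by_recno` |
| `gdp_gcl_read_async` | `gdp_gin_read_by_recno_async`\* |
| `gdp_gcl_read_ts` | `gdp_gin_read_by_ts` |
| _new_ | `gdp_gin_read_by_ts_async`\* |
| _new_ | `gdp_gin_read_by_hash` |
| _new_ | `gdp_gin_read_by_hash_async`\* |
| `gdp_gcl_subscribe` | `gdp_gin_subscribe_by_recno`\* |
| `gdp_gcl_subscribe_ts` | `gdp_gin_subscribe_by_ts`\* |
| `gdp_gcl_unsubscribe` | `gdp_gin_unsubscribe`\* |
| `gdp_gcl_multiread` | `gdp_gin_read_by_recno_async` |
| `gdp_gcl_multiread_ts` | `gdp_gin_read_by_ts_async` |
| `gdp_gcl_getmetadata` | `gdp_gin_getmetadata` |
| `gdp_gcl_newsegment` | _deleted_ |
| `gdp_gcl_set_append_filter` | `gdp_gin_set_append_filter` |
| `gdp_gcl_set_read_filter` | `gdp_gin_set_read_filter` |
| `gdp_gcl_getname` | `gdp_gin_getname` |
| `gdp_gcl_getnrecs` | `gdp_gin_getnrecs` |
| `gdp_gcl_print` | `gdp_gin_print` |
| | |
| `gdp_gclmd_t` | `gdp_md_t` |
| `gdp_gclmd_id_t` | `gdp_md_id_t` |
| `gdp_gclmd_new` | `gdp_md_new` |
| `gdp_gclmd_free` | `gdp_md_free` |
| `gdp_gclmd_add` | `gdp_md_add` |
| `gdp_gclmd_get` | `gdp_md_get` |
| `gdp_gclmd_find` | `gdp_md_find` |
| `gdp_gclmd_print` | `gdp_md_print`\* |
| | |
| `GDP_EVENT_EOS` | `GDP_EVENT_DONE` |
| | |
| _new_ | `gdp_hash_t` |
| _new_ | `gdp_hash_new` |
| _new_ | `gdp_hash_free` |
| _new_ | `gdp_hash_reset` |
| _new_ | `gdp_hash_set` |
| _new_ | `gdp_hash_getlength` |
| _new_ | `gdp_hash_getptr` |
| | |
| _new_ | `gdp_sig_t` |
| _new_ | `gdp_sig_new` |
| _new_ | `gdp_sig_reset` |
| _new_ | `gdp_sig_free` |
| _new_ | `gdp_sig_set` |
| _new_ | `gdp_sig_copy` |
| _new_ | `gdp_sig_dup` |
| _new_ | `gdp_sig_getlength` |
| _new_ | `gdp_sig_getptr` |
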
||
61 | | `gdp_gcl_read` | `gdp_gin_read_by_recno` | |
||
62 | | `gdp_gcl_read_async` | `gdp_gin_read_by_recno_async`\* | |
||
63 | | `gdp_gcl_read_ts` | `gdp_gin_read_by_ts` | |
||
64 | 4e44b3f7 | Eric Allman | | _new_ | `gdp_gin_read_by_ts_async`\* | |
65 | d9827038 | Eric Allman | | _new_ | `gdp_gin_read_by_hash` | |
66 | | _new_ | `gdp_gin_read_by_hash_async`\* | |
||
67 | | `gdp_gcl_subscribe` | `gdp_gin_subscribe_by_recno`\* | |
||
68 | | `gdp_gcl_subscribe_ts` | `gdp_gin_subscribe_by_ts`\* | |
||
69 | | `gdp_gcl_unsubscribe` | `gdp_gin_unsubscribe`\* | |
||
70 | 4e44b3f7 | Eric Allman | | `gdp_gcl_multiread` | `gdp_gin_read_by_recno_async` | |
71 | | `gdp_gcl_multiread_ts` | `gdp_gin_read_by_ts_async` | |
||
72 | d9827038 | Eric Allman | | `gdp_gcl_getmetadata` | `gdp_gin_getmetadata` | |
73 | | `gdp_gcl_newsegment` | _deleted_ | |
||
74 | | `gdp_gcl_set_append_filter` | `gdp_gin_set_append_filter` | |
||
75 | | `gdp_gcl_set_read_filter` | `gdp_gin_set_read_filter` | |
||
76 | | `gdp_gcl_getname` | `gdp_gin_getname` | |
||
77 | | `gdp_gcl_getnrecs` | `gdp_gin_getnrecs` | |
||
78 | | `gdp_gcl_print` | `gdp_gin_print` | |
||
79 | ||| |
||
80 | | `gdp_gclmd_t` | `gdp_md_t` | |
||
81 | | `gdp_gclmd_id_t` | `gdp_md_id_t` | |
||
82 | | `gdp_gclmd_new` | `gdp_md_new` | |
||
83 | | `gdp_gclmd_free` | `gdp_md_free` | |
||
84 | | `gdp_gclmd_add` | `gdp_md_add` | |
||
85 | | `gdp_gclmd_get` | `gdp_md_get` | |
||
86 | | `gdp_gclmd_find` | `gdp_md_find` | |
||
87 | | `gdp_gclmd_print` | `gdp_md_print`\* | |
||
88 | ||| |
||
89 | | `GDP_EVENT_EOS` | `GDP_EVENT_DONE` | |
||
90 | ||| |
||
91 | | _new_ | `gdp_hash_t` | |
||
92 | | _new_ | `gdp_hash_new` | |
||
93 | | _new_ | `gdp_hash_free` | |
||
94 | | _new_ | `gdp_hash_reset` | |
||
95 | | _new_ | `gdp_hash_set` | |
||
96 | | _new_ | `gdp_hash_getlength` | |
||
97 | | _new_ | `gdp_hash_getptr` | |
||
98 | ||| |
||
99 | | _new_ | `gdp_sig_t` | |
||
100 | | _new_ | `gdp_sig_new` | |
||
101 | | _new_ | `gdp_sig_reset` | |
||
102 | | _new_ | `gdp_sig_free` | |
||
103 | | _new_ | `gdp_sig_set` | |
||
104 | | _new_ | `gdp_sig_copy` | |
||
105 | | _new_ | `gdp_sig_dup` | |
||
106 | | _new_ | `gdp_sig_getlength` | |
||
107 | | _new_ | `gdp_sig_getptr` | |
||
108 | |||
109 | 7f982ac3 | Eric Allman | # Details |
110 | d9827038 | Eric Allman | |
111 | 7f982ac3 | Eric Allman | ## Appends, Hashes, and Signatures |
112 | d9827038 | Eric Allman | |
113 | The long(ish) term intent is that all records (datums) will be |
||
114 | cryptographically linked in an Authenticated Data Structure. |
||
115 | We are discussing many ways of doing this, but all of them involve |
||
116 | hash chains of records. As a result, the "append" interfaces now |
||
117 | take a `prevhash` parameter which is a hash of the previously |
||
118 | written record. |
||
119 | |||
120 | When writing consecutive records, the GDP library can maintain |
||
121 | the previous hash and insert it automatically if the `prevhash` |
||
122 | parameter is `NULL`. However, when a writer initializes, it must |
||
123 | determine the hash of the previous record. Ideally the writer |
||
124 | would not trust the underlying infrastructure, and would instead |
||
125 | save the hash of the previous record written (an exception being |
||
126 | made for the first record in the log). This should be done by |
||
127 | saving the previous hash on local stable storage. It's possible |
||
128 | that the GDP library could manage this state, but that is not yet |
||
129 | implemented. At this point the details of how this should work |
||
130 | are unclear, so this feature is only partly implemented. |
||
131 | |||
132 | Similarly, it is important that readers be able to validate |
||
133 | signatures for themselves. This is the rationale behind elevating |
||
134 | them (`gdp_sig_t`) to first-class citizens. |
||
135 | |||
136 | Hashes will become more important as readers start checking the |
||
137 | provenance of data returned by servers. The details of that are |
||
138 | still in the research arena and are out of scope of this document. |
||
139 | |||
140 | 7f982ac3 | Eric Allman | ## Subscriptions and Asynchronous Reads |
141 | d9827038 | Eric Allman | |
142 | Calling `gdp_event_next` with a given GIN will only return events |
||
143 | from asynchronous reads and subscriptions listed on that GIN. |
||
144 | Previously, if a log was opened twice (and hence had two GCL |
||
145 | handles) the data might be returned on a different instance. |
||
146 | For example, consider the (old) code: |
||
147 | |||
148 | ``` c |
||
149 | gdp_name_t gcl_name; |
||
150 | gdp_gcl_t *gcl1, *gcl2; |
||
151 | extern gdp_event_cbfunc_t cb1, cb2; |
||
152 | EP_STAT estat; |
||
153 | |||
154 | // open the same log twice |
||
155 | estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl1); |
||
156 | estat = gdp_gcl_open(gcl_name, GDP_MODE_RO, NULL, &gcl2); |
||
157 | |||
158 | // subscribe to the end, and read from the beginning |
||
159 | estat = gdp_gcl_subscribe(gcl1, 0, 20, NULL, &cb1, NULL); |
||
160 | estat = gdp_gcl_multiread(gcl2, 1, 100, &cb2, NULL); |
||
161 | ``` |
||
162 | |||
163 | would cause `cb1` and `cb2` to be called somewhat randomly |
||
164 | with results of the multiread from the beginning of the log and |
||
165 | the results of the subscribe from the end of the log. |
||
166 | |||
167 | The new code would be: |
||
168 | |||
169 | ``` c |
||
170 | gdp_name_t log_name; |
||
171 | gdp_gin_t *gin1, *gin2; |
||
172 | extern gdp_event_cbfunc_t cb1, cb2; |
||
173 | EP_STAT estat; |
||
174 | |||
175 | // open the same log twice |
||
176 | estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin1); |
||
177 | estat = gdp_gin_open(log_name, GDP_MODE_RO, NULL, &gin2); |
||
178 | |||
179 | // subscribe to the end, and read from the beginning |
||
180 | estat = gdp_gin_subscribe_by_recno(gin1, 0, 20, NULL, &cb1, NULL); |
||
181 | estat = gdp_gin_read_by_recno_async(gin2, 1, 100, &cb2, NULL); |
||
182 | ``` |
||
183 | |||
184 | would return the results from the subscription exclusively |
||
185 | to `cb1` and the results of the read exclusively to `cb2`, |
||
186 | which was probably what was intended. |
||
187 | |||
188 | Similarly, `gdp_gin_unsubscribe` only deletes subscriptions |
||
189 | that were created on a specific GIN; previously it was |
||
190 | somewhat random. |
||
191 | |||
192 | 7f982ac3 | Eric Allman | ## Appending Multiple Records |
193 | d9827038 | Eric Allman | |
194 | The old `gdp_gcl_append_async` call previously only added one |
||
195 | datum to a log. The new `gdp_gin_append_async` call can |
||
196 | append multiple datums in one call. Besides sending fewer |
||
197 | network commands, this allows the individual datums to be linked |
||
198 | together in a hash chain with only the last datum digitally |
||
199 | signed, which is much more efficient. |
||
200 | |||
201 | Beware however that all the datums must fit within a single |
||
202 | network PDU (Protocol Data Unit), and the maximum size has been |
||
203 | reduced to approximately 65k to avoid network congestion. |
||
204 | |||
205 | 7f982ac3 | Eric Allman | # Compatibility |
206 | d9827038 | Eric Allman | |
207 | At some point, if there is demand, we may add a `<gdp_compat_v0.h>` |
||
208 | that will to the extent possible make it feasible to run programs |
||
209 | coded against the old API. This will only deal with the syntactic |
||
210 | issues however. |