Global Data Plane Library Implementation

Eric Allman
2015-02-26

This document is not yet complete.

This document describes the internals of the Global Data Plane (GDP) run-time library at a conceptual level.  This library is linked into any client that wishes to participate in the Global Data Plane.  The base library is implemented in C, which is what this document will assume, but bindings for the external interfaces are available for other languages.  See the document Global Data Plane Programmatic API for details of that interface.

Overview

To be completed.

Main modules:

Generally speaking, the GDP library is structured as an event-driven program with a synchronous API.  One thread services events (e.g., responses from the GDP daemon) while the main thread executes the user application.  When the application needs to contact the daemon, it sends the message and then waits on a condition variable until signaled.  In the meantime, the event loop waits for a response, uses the GOB associative cache to associate it with the appropriate GOB handle, and then signals the application to collect the results.
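The wait-and-signal pattern described above can be sketched with POSIX threads.  This is a minimal illustration only: the type sketch_req_t and the functions sketch_event_thread and sketch_invoke are hypothetical names invented here, and the "reply" is fabricated locally rather than read from a daemon.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending request; not the library's layout. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            done;     /* set once the reply has been processed */
    int             status;   /* result filled in by the event thread */
} sketch_req_t;

/* Plays the role of the event thread: "receives" a reply and signals. */
static void *sketch_event_thread(void *arg)
{
    sketch_req_t *req = arg;

    assert(req != NULL);
    pthread_mutex_lock(&req->mutex);
    req->status = 0;          /* pretend the daemon reported success */
    req->done = true;
    pthread_cond_signal(&req->cond);
    pthread_mutex_unlock(&req->mutex);
    return NULL;
}

/* Plays the role of a synchronous API call made by the application. */
int sketch_invoke(void)
{
    sketch_req_t req;
    pthread_t tid;
    int status = -1;

    pthread_mutex_init(&req.mutex, NULL);
    pthread_cond_init(&req.cond, NULL);
    req.done = false;
    req.status = -1;

    /* In the library the event thread already exists; it is started per
     * call here only to keep the sketch self-contained. */
    if (pthread_create(&tid, NULL, sketch_event_thread, &req) == 0) {
        pthread_mutex_lock(&req.mutex);
        while (!req.done)     /* the loop guards against spurious wakeups */
            pthread_cond_wait(&req.cond, &req.mutex);
        status = req.status;
        pthread_mutex_unlock(&req.mutex);
        pthread_join(tid, NULL);
    }

    pthread_cond_destroy(&req.cond);
    pthread_mutex_destroy(&req.mutex);
    return status;
}
```

The predicate (done) is checked under the mutex, so the application sees the result whether the signal arrives before or after it starts waiting.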

Data is exchanged through a data structure of type gdp_datum_t, which contains a record number, a timestamp, and a data buffer.  When sending data to the GDP daemon, the application creates a datum, fills it in with data, and sends it to the daemon.  When receiving data from the GDP daemon, the application passes a datum into the GDP library, which fills it in.  Generally, when sending data to the daemon, the record number and timestamp in the datum are ignored and are replaced with the real record number and timestamp after the write completes.

Important Data Types and Structures

Module Details

API

The API module (gdp/gdp_api.c) defines all the externally visible routines.  Since these are already documented in the GDP Programmatic API document, suffice it to say that it does the "translation" between the internal protocol and the external API.  Approximately speaking, it packages up parameters into a request, invokes the request, and translates the updated request into any return codes.

Request Management

Implemented in gdp/gdp_req.c.

Internally the data flow is managed through a series of requests.  In many cases there will be only one request active on a given GOB at a time, but this is not necessarily true, especially in the GDP daemon when handling subscriptions (each subscription is a separate request).  Requests (potentially) have a pointer to a GOB handle, a pointer to a protocol data unit (in internal form; essentially a packet), the status code from the operation embodied in the request, and special information for use when processing subscriptions.

The internal routines are:

_gdp_req_new
Create a new request and fill it in with a GDP protocol command, GOB handle, I/O channel, and flags (passed in as parameters) as well as space for a packet (in internal form) and request ID.
_gdp_req_free
Free the request and all associated resources such as the space for packet information.  It also decrements the reference count on the GOB handle indicated in the request.
_gdp_req_freeall
Free all requests associated with a particular GOB handle.  GOBs that have pending subscriptions will have one request per subscription, which are linked off the GOB handle.
_gdp_req_find
Given a GOB handle and a request ID, find the associated request on the list associated with that GOB handle.  Note that request IDs need only be unique within a particular GOB handle.
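The per-handle request list and the lookup by request ID might look roughly like the following.  The structures and the names sketch_req_link and sketch_req_find are hypothetical and heavily simplified; the real request and handle structures carry much more state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request: just an ID and a list link. */
typedef struct sketch_req {
    uint32_t           rid;    /* unique only within one GOB handle */
    struct sketch_req *next;
} sketch_req_t;

/* Hypothetical, simplified GOB handle: just the head of its request list. */
typedef struct {
    sketch_req_t *reqs;
} sketch_gob_t;

/* Link a request onto the list hanging off a GOB handle. */
void sketch_req_link(sketch_gob_t *gob, sketch_req_t *req)
{
    assert(gob != NULL && req != NULL);
    req->next = gob->reqs;
    gob->reqs = req;
}

/* Find the request with the given ID on this handle, or NULL if none. */
sketch_req_t *sketch_req_find(const sketch_gob_t *gob, uint32_t rid)
{
    sketch_req_t *req;

    assert(gob != NULL);
    for (req = gob->reqs; req != NULL; req = req->next) {
        if (req->rid == rid)
            return req;
    }
    return NULL;
}
```

Because the search is scoped to one handle, two handles can each hold a request with the same ID without ambiguity.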


Datums

Implemented in gdp/gdp_datum.c.

As described above, a datum is the internal version of a GOB record.  The routines, which are externally visible, are:

gdp_datum_new
Create a new empty datum, including its associated (empty) buffer.
gdp_datum_free
Free the datum, including its associated data buffer.
gdp_datum_getrecno
Get the record number.
gdp_datum_getts
Get the timestamp.
gdp_datum_getdlen
Get the length of the data buffer.
gdp_datum_getbuf
Get the data buffer itself.
gdp_datum_print
Print a datum in a format suitable for debugging use.

When sending data to the GDP, the application creates a datum, gets the buffer from that datum, and adds the data to the buffer.  When receiving data from the GDP, the application creates a datum, passes it to the appropriate read API, and on return can access the returned data buffer, the record number, and the timestamp.

GOB Associative Cache

Implemented in gdp/gdp_gob_cache.c.

The primary purpose of the GOB Associative Cache is to allow quick association between a GOB name and the associated handle.  When a packet is received that contains a GOB name, this delivers the handle that contains the necessary state information.

_gdp_gob_cache_init
Initializes the GOB cache.  Called only once on startup.
_gdp_gob_cache_get
Extracts the GOB handle from the cache based on name and I/O mode.  If it is found the reference count on the GOB handle is incremented; if not, it returns NULL.
_gdp_gob_cache_add
Adds the GOB handle to the cache.
_gdp_gob_cache_drop
Removes the GOB name → handle association from the cache.
_gdp_gob_incref
Increments the reference count on the GOB handle.
_gdp_gob_decref
Decrements the reference count on the GOB handle.  If the reference count reaches zero the handle becomes a candidate for cleanup, but this is deferred because, in the common case, another request for this GOB will appear shortly.
_gdp_gob_newhandle
Creates a new GOB handle.  Note that this is just the library data — it sends no protocol to the GDP daemon.
_gdp_gob_freehandle
Does the actual deallocation of the handle.  Removes the GOB from the cache (by calling _gdp_gob_cache_drop).  If the GOB includes a free function, that function is called (this is used by the GDP daemon).  It then frees the memory allocated to the handle itself.
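The interplay between cache lookup, reference counting, and deferred cleanup can be sketched as follows.  Everything here is an assumption made for illustration: the fixed-size array, the string names, and the functions sketch_cache_add, sketch_cache_get, sketch_decref, and sketch_cache_reclaim are hypothetical, and the real cache is keyed by GOB name rather than a C string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CACHE_SIZE 8

/* Hypothetical, simplified handle: a name and a reference count. */
typedef struct {
    const char *name;
    int         refcnt;
} sketch_gob_t;

static sketch_gob_t *sketch_cache[SKETCH_CACHE_SIZE];

/* Add a handle to the first free slot; false if the cache is full. */
bool sketch_cache_add(sketch_gob_t *gob)
{
    for (size_t i = 0; i < SKETCH_CACHE_SIZE; i++) {
        if (sketch_cache[i] == NULL) {
            sketch_cache[i] = gob;
            return true;
        }
    }
    return false;
}

/* Look up by name; on a hit the reference count is incremented. */
sketch_gob_t *sketch_cache_get(const char *name)
{
    for (size_t i = 0; i < SKETCH_CACHE_SIZE; i++) {
        sketch_gob_t *gob = sketch_cache[i];
        if (gob != NULL && strcmp(gob->name, name) == 0) {
            gob->refcnt++;
            return gob;
        }
    }
    return NULL;
}

/* Drop one reference.  At zero the handle stays cached: cleanup is
 * deferred in case another request for the same GOB arrives shortly. */
void sketch_decref(sketch_gob_t *gob)
{
    assert(gob->refcnt > 0);
    gob->refcnt--;
}

/* Later sweep: drop every unreferenced handle; returns how many. */
size_t sketch_cache_reclaim(void)
{
    size_t dropped = 0;

    for (size_t i = 0; i < SKETCH_CACHE_SIZE; i++) {
        if (sketch_cache[i] != NULL && sketch_cache[i]->refcnt == 0) {
            sketch_cache[i] = NULL;   /* a real sweep would free it here */
            dropped++;
        }
    }
    return dropped;
}
```

When the sweep runs is a policy decision that this sketch leaves open.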

Protocol Data Units

Implemented in gdp/gdp_pdu.c.

Packets are marshalled and demarshalled in gdp/gdp_pdu.c.  Each packet has the following fields (with the number of octets for the field):

One field is used to indicate both commands and acknowledgements/negative acknowledgements.  See the comments in gdp/gdp_pdu.h for the details of those values.  This is worth emphasizing: the command field in the protocol encodes both imperative commands (e.g., "write this data") and responses ("that data was written" or "could not write that data").  Both forms are described in the code as "commands", and even share a single dispatch table.

The routines for handling packets are:

_gdp_pdu_new
Allocates a new (empty) packet.
_gdp_pdu_free
Frees a packet.
_gdp_pdu_out
Given a packet structure and an output buffer, converts that packet to external format and writes it to the buffer.  Under normal circumstances this buffer is associated with an I/O channel, and hence is written to the communication socket automatically.
_gdp_pdu_in
Reads a packet from an I/O buffer and converts it to internal format.  This routine can return before the entire packet has been read, in which case it returns the special status code GDP_STAT_KEEP_READING.  In most cases it should be called in a loop until a successful status is returned.  As with _gdp_pdu_out, the I/O buffer is normally associated with an I/O channel.  See the discussion of the event loop for more details.
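The "keep reading" behavior of _gdp_pdu_in can be illustrated with a toy parser.  The wire format used here (one length octet followed by that many payload octets) is invented for the example and is not the GDP packet format; sketch_pdu_t, sketch_pdu_in, and the status names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_stat { SKETCH_OK, SKETCH_KEEP_READING };

/* Hypothetical internal form of a toy packet. */
typedef struct {
    uint8_t len;
    uint8_t payload[255];
} sketch_pdu_t;

/* Try to parse one packet from the first `avail` octets of `buf`.
 * Nothing is consumed unless the whole packet is present, so the caller
 * can simply retry once more data has arrived. */
enum sketch_stat sketch_pdu_in(const uint8_t *buf, size_t avail,
                               sketch_pdu_t *pdu)
{
    assert(buf != NULL && pdu != NULL);

    if (avail < 1)
        return SKETCH_KEEP_READING;       /* length octet not here yet */
    if (avail < (size_t)1 + buf[0])
        return SKETCH_KEEP_READING;       /* payload still incomplete */

    pdu->len = buf[0];
    memcpy(pdu->payload, buf + 1, pdu->len);
    return SKETCH_OK;
}
```

A caller would invoke sketch_pdu_in each time more octets arrive and stop retrying once it returns SKETCH_OK.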
_gdp_pdu_dump
Prints a packet in a form suitable only for debugging.

GDP Protocol

Implemented in gdp/gdp_proto.c.

The basic model is that users (e.g., the API layer) create a request with _gdp_req_new which contains a command (what operation needs to be done), an optional PDU buffer, an optional pointer to a GOB handle on which to perform the operation, the connection on which to operate, and some flag bits.  The PDU buffer in turn contains all the information to be passed to or from the service, including data, timestamps, record numbers, etc.  For example, on read the PDU will contain the record number to be read, and the response will fill in the rest of the information.  The client then calls _gdp_invoke, passing it the request.  That routine in turn sends the request using _gdp_req_send, waits on the condition variable contained in the request to get the final return status, and returns that.

The sending part, implemented by _gdp_req_send, links the request to the requesting GOB (so it can be found when the reply eventually comes in), makes sure that the GOB name is in the associative name → handle cache, and sends the packet.

When the reply message eventually comes in, it triggers an event in the main I/O loop; that is handed to (another) thread for processing.  This is done through the bufferevent interface, part of libevent2, which invokes callbacks when events happen on sockets (that is, it provides functionality similar to select, or more accurately kqueue or /dev/poll).  The primary callback used is gdp_read_cb.  That routine allocates a new packet and reads a packet into that area.  If the packet is incomplete (i.e., it hasn't all been read in) the packet is freed and the callback returns (it will be called again later when more of the packet is read).  If the entire packet is available, it calls _gdp_pdu_process (indirectly, via the process field in the channel) to interpret it.

[The code currently has a non-functional #ifdef for GDP_PDU_QUEUE.  This is for a future extension allowing the packet to be dropped into another queue for interpretation from a process in the thread pool so that the read thread can focus entirely on reading and handing off packets.  This technique is already used in gdplogd, and will only be used in the client library if necessary for performance.]

PDU processing in _gdp_pdu_process involves finding the associated GOB, if available, and from that finding the associated request.  If no request is found a new request is created; this is the case with spontaneous commands.  There is some processing of datums to handle the case where a request had an existing datum that needs to be replaced with a new one (for example, a read request that passed in a datum with the record number and returns a datum with the associated timestamp and data; since the read API passes in the datum, there is some shuffling necessary with the underlying data buffers so that the caller can actually access the returned data).  The request (now a response) is then passed to _gdp_req_dispatch for processing.

Processing in _gdp_req_dispatch is done through a simple dispatch table indexed by command.  Requests can be either commands or responses (ack/nak); in most cases, client programs should only receive responses, and will get a "not implemented" if any commands are received.  Responses fall into 2½ classes: successes (acks), client naks, and server naks.  There are several of each of these, which roughly correspond to HTTP response codes (2xx, 4xx, and 5xx respectively) or CoAP codes (2.xx, 4.xx, and 5.xx).  These piggyback on three routines (ack_success, nak_client, and nak_server), the latter two of which are essentially identical, doing nothing but passing the error on up the stack.  The ack_success routine checks for some nonsensical situations and passes the (interpreted) status back up to _gdp_pdu_process.
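A dispatch table of this kind might be sketched as below.  The command ranges are the ones given under Response Confusion (128–191 for acks, 192–223 for client naks, 224–254 for server naks); the enum, the handler signature, and all sketch_ names are hypothetical, and the handlers do nothing beyond reporting their class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_class {
    SKETCH_ACK,
    SKETCH_NAK_CLIENT,
    SKETCH_NAK_SERVER,
    SKETCH_NOT_IMPLEMENTED
};

typedef enum sketch_class (*sketch_handler_t)(uint8_t cmd);

static enum sketch_class sketch_ack_success(uint8_t cmd)
{
    (void)cmd;
    return SKETCH_ACK;
}

static enum sketch_class sketch_nak_client(uint8_t cmd)
{
    (void)cmd;
    return SKETCH_NAK_CLIENT;
}

static enum sketch_class sketch_nak_server(uint8_t cmd)
{
    (void)cmd;
    return SKETCH_NAK_SERVER;
}

/* What a client does with anything that is not a response. */
static enum sketch_class sketch_not_implemented(uint8_t cmd)
{
    (void)cmd;
    return SKETCH_NOT_IMPLEMENTED;
}

static sketch_handler_t sketch_table[256];

/* Fill the table once at startup: one handler per command octet. */
void sketch_dispatch_init(void)
{
    for (int i = 0; i < 256; i++) {
        if (i >= 128 && i <= 191)
            sketch_table[i] = sketch_ack_success;
        else if (i >= 192 && i <= 223)
            sketch_table[i] = sketch_nak_client;
        else if (i >= 224 && i <= 254)
            sketch_table[i] = sketch_nak_server;
        else
            sketch_table[i] = sketch_not_implemented;
    }
}

/* Dispatch is a single indexed call; no searching is involved. */
enum sketch_class sketch_dispatch(uint8_t cmd)
{
    assert(sketch_table[cmd] != NULL);    /* init must have run */
    return sketch_table[cmd](cmd);
}
```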

The response from the command is then interpreted by _gdp_pdu_process.  There are two cases.  The simpler one is when the command/response was a simple ack/nak, in which case the status is stored in the request and the thread waiting on that request is poked to wake up.  The other case is when the request is a subscription: the request (which here must be a response) must be turned into an event, so a new event is created and passed off to the event subsystem for delivery (described elsewhere).

Response Confusion

One major confusion results from the large variety of response codes from various existing subsystems.  Two common status encodings are HTTP status codes and CoAP status codes.  These overlap (mostly), so we can (mostly) treat them as the same thing.  They are the primary mode of passing protocol status in the GDP protocol, but since those responses are only eight bits the codes are offset: HTTP/CoAP codes 200–263 become commands 128–191, codes 400–431 become commands 192–223, and codes 500–530 become commands 224–254.
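The offsetting can be written out as a pair of conversion functions.  This sketch assumes that each class is shifted by a constant (200 maps to 128, 400 to 192, 500 to 224), so the upper bound of each code range follows from the size of the corresponding command range; the function names are hypothetical.

```c
#include <assert.h>

/* Map an HTTP/CoAP-style code to its command octet, or -1 if the code
 * has no slot in the eight-bit command space. */
int sketch_code_to_cmd(int code)
{
    if (code >= 200 && code <= 263)
        return code - 200 + 128;    /* acks:        128..191 */
    if (code >= 400 && code <= 431)
        return code - 400 + 192;    /* client naks: 192..223 */
    if (code >= 500 && code <= 530)
        return code - 500 + 224;    /* server naks: 224..254 */
    return -1;
}

/* Inverse mapping: command octet back to the HTTP/CoAP-style code,
 * or -1 if the octet is not a response. */
int sketch_cmd_to_code(int cmd)
{
    if (cmd >= 128 && cmd <= 191)
        return cmd - 128 + 200;
    if (cmd >= 192 && cmd <= 223)
        return cmd - 192 + 400;
    if (cmd >= 224 && cmd <= 254)
        return cmd - 224 + 500;
    return -1;
}

/* Check that every response octet survives a round trip. */
void sketch_roundtrip_check(void)
{
    for (int cmd = 128; cmd <= 254; cmd++)
        assert(sketch_code_to_cmd(sketch_cmd_to_code(cmd)) == cmd);
}
```

For example, under these assumptions a 404 is carried on the wire as command 196.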

Internally the GDP library uses the EP_STAT abstraction as a status lingua franca.  EP_STATs allow encoding of response codes in a single integer (so they can be passed easily back from functions).  Those integers encode a severity (most importantly, success or failure), a registry and a module identifier (which for this purpose can be treated as one piece), and detail information.  The module is used to create broad categories: for example, one module corresponds to Unix errnos, allowing them to be passed back directly.  Another module is specific to the GDP.  There are several generic status codes defined in gdp/gdp_stat.h such as GDP_STAT_KEEP_READING (a warning that a partial packet has been read but the rest has yet to be read) or GDP_STAT_CORRUPT_GOB (a severe error saying that the disk representation of a GOB is corrupt and cannot be read).  The HTTP/CoAP codes are encoded in the same module, but back in their original positions, i.e., in the 200–599 range.
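Packing severity, module, and detail into one integer can be sketched as follows.  The field widths, the severity values, and every sketch_ name are assumptions chosen for illustration; they are not taken from the EP_STAT definition, and the registry and module are treated as a single field as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field widths: 3 + 19 + 10 = 32 bits. */
#define SKETCH_SEV_BITS 3
#define SKETCH_MOD_BITS 19    /* registry and module as one field */
#define SKETCH_DET_BITS 10    /* wide enough for details up to 1023 */

/* Hypothetical severity values, ordered from best to worst. */
enum sketch_sev {
    SKETCH_SEV_OK     = 0,
    SKETCH_SEV_WARN   = 1,
    SKETCH_SEV_ERROR  = 2,
    SKETCH_SEV_SEVERE = 3
};

typedef uint32_t sketch_stat_t;

/* Pack the three fields into one integer. */
sketch_stat_t sketch_stat_new(enum sketch_sev sev, uint32_t module,
                              uint32_t detail)
{
    assert(module < (UINT32_C(1) << SKETCH_MOD_BITS));
    assert(detail < (UINT32_C(1) << SKETCH_DET_BITS));
    return ((uint32_t)sev << (SKETCH_MOD_BITS + SKETCH_DET_BITS))
         | (module << SKETCH_DET_BITS)
         | detail;
}

enum sketch_sev sketch_stat_sev(sketch_stat_t s)
{
    return (enum sketch_sev)(s >> (SKETCH_MOD_BITS + SKETCH_DET_BITS));
}

uint32_t sketch_stat_module(sketch_stat_t s)
{
    return (s >> SKETCH_DET_BITS) & ((UINT32_C(1) << SKETCH_MOD_BITS) - 1);
}

uint32_t sketch_stat_detail(sketch_stat_t s)
{
    return s & ((UINT32_C(1) << SKETCH_DET_BITS) - 1);
}

/* Failure test based on severity alone, ignoring module and detail. */
bool sketch_stat_isfail(sketch_stat_t s)
{
    return sketch_stat_sev(s) >= SKETCH_SEV_ERROR;
}
```

Because the result is a plain integer, a function can return it directly and the caller can test for failure without knowing which module produced it.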

Event Processing

Events (see gdp/gdp_event.c) are a way of delivering information to a client without using an RPC-style blocking response.  Specifically, a client can issue several commands and then wait for the responses to arrive in arbitrary order.  The implementation is simple: as messages are read that cannot be immediately processed they are turned into events.  Those events are linked onto an active list.  The client can collect events using gdp_event_next, which takes them off the active list.  The client is responsible for freeing them using gdp_event_free.
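The active list and the ownership rule (the client frees what it collects) can be sketched as a simple FIFO.  The event structure and the functions sketch_event_post, sketch_event_next, and sketch_event_free are hypothetical; the sketch omits the locking a two-thread design needs, and it returns NULL on an empty list instead of waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified event: an integer payload and a list link. */
typedef struct sketch_event {
    int                  payload;
    struct sketch_event *next;
} sketch_event_t;

static sketch_event_t *sketch_head;   /* oldest undelivered event */
static sketch_event_t *sketch_tail;   /* newest undelivered event */

/* Called on the reading side: turn a message into a queued event. */
bool sketch_event_post(int payload)
{
    sketch_event_t *ev = malloc(sizeof *ev);

    if (ev == NULL)
        return false;
    ev->payload = payload;
    ev->next = NULL;

    assert((sketch_head == NULL) == (sketch_tail == NULL));
    if (sketch_tail == NULL)
        sketch_head = ev;
    else
        sketch_tail->next = ev;
    sketch_tail = ev;
    return true;
}

/* Called by the client: unlink and return the oldest event, or NULL.
 * Ownership passes to the caller. */
sketch_event_t *sketch_event_next(void)
{
    sketch_event_t *ev = sketch_head;

    if (ev == NULL)
        return NULL;
    sketch_head = ev->next;
    if (sketch_head == NULL)
        sketch_tail = NULL;
    ev->next = NULL;
    return ev;
}

/* The client releases each event once it is done with it. */
void sketch_event_free(sketch_event_t *ev)
{
    free(ev);
}
```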

Event Loop

To be written.