% Global Dataplane

This directory contains the source code for the Global Dataplane (GDP).

NOTE WELL: This is an incomplete implementation of the GDP. There will be incompatible changes in the future. Use in production at your own risk, void where prohibited by law, etc., etc. See Implementation Status below.

If you are a user of the GDP you probably do not want to start from the source code. See README-deb.md for installing from the Debian packages (which includes Ubuntu).

If you are running on any system other than Debian, you have to compile from source code. At the time of this writing, the GDP also compiles on macOS, FreeBSD, and Red Hat. For details on compiling from source, see https://gdp.cs.berkeley.edu/redmine/projects/gdp/wiki/Compiling_the_GDP_from_Source or README-compiling.md. The former is probably more up to date.

If you are going to be working with the source code you should definitely register for an account at https://gdp.cs.berkeley.edu/redmine/account/register. Unfortunately the CAPTCHA isn't sufficient to thwart bogus accounts, so if you don't have a berkeley.edu address please contact us in advance so we'll know to approve your account.

In most cases you will only need the client libraries and applications. Specifically, you do not need to run your own routers and log servers. If you do want to administer your own servers, see doc/admin/gdp-server-admin.md.

If you are actually doing development on the GDP itself, please see README-developers.md for more information.

The remainder of this document is broken into two parts. The first assumes you are just going to use the pre-compiled applications from the apps directory. The second assumes you are going to be writing your own programs and linking them against the GDP library.

GDP General Use

This section applies to everyone using the GDP. It primarily discusses configuration.

Configuration

If you are using the servers at Berkeley you should just need to run adm/gdp-bin-setup.sh. If you are running your own servers you can examine that script to see what is needed. In particular, you'll need to adjust the parameters specifying:

  • The address of the routing node(s) for clients to use to contact the GDP.
  • The name of the log creation service for your cluster.
  • The location of the Human-Oriented Name to GDPname Directory.

Configuration files are simple "name=value" pairs, one per line. There is a built-in search path ".gdp/params:~/.gdp/params:/usr/local/etc/gdp/params:/etc/gdp/params" that can be overridden by the EP_PARAM_PATH environment variable. (Note: if a program is running setuid then only the two system paths are searched, and EP_PARAM_PATH is ignored.) Those directories are searched for a file named "gdp". The major parameters of interest are:

  • swarm.gdp.routers (file: gdp)
  • swarm.gdp.hongdb.host (file: gdp)
  • swarm.gdp.creation-service.name (file: gdp-create)

See man 7 gdp (or man gdp/gdp.7) for details of configuration.
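To make the search rules above concrete, here is a minimal Python sketch of how the candidate parameter files could be enumerated. It is not the library's code: the function names are hypothetical, and it assumes that "the two system paths" are the last two entries of the built-in path. It does not say which file wins when several exist; see the man page for that.

```python
import os
from pathlib import Path

# Built-in search path, quoted from the text above.
DEFAULT_PARAM_PATH = ".gdp/params:~/.gdp/params:/usr/local/etc/gdp/params:/etc/gdp/params"
# Assumption: the "two system paths" are the last two entries of the default path.
SYSTEM_PARAM_PATH = "/usr/local/etc/gdp/params:/etc/gdp/params"


def param_dirs(environ=None, setuid=False):
    """Return the directories to search, in order (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if setuid:
        # setuid programs ignore EP_PARAM_PATH and use only the system paths.
        path = SYSTEM_PARAM_PATH
    else:
        path = environ.get("EP_PARAM_PATH", DEFAULT_PARAM_PATH)
    return [Path(p).expanduser() for p in path.split(":") if p]


def candidate_files(name="gdp", environ=None, setuid=False):
    """Return the files that would be looked for, e.g. name="gdp" or "gdp-create"."""
    return [d / name for d in param_dirs(environ, setuid)]


if __name__ == "__main__":
    for f in candidate_files():
        print(f, "(exists)" if f.is_file() else "(missing)")
```

Running the script prints each candidate file for the current environment, which can help when a parameter does not seem to take effect.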

Example

In file /etc/gdp/params/gdp:

swarm.gdp.routers=mygdp.example.com; gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu
swarm.gdp.hongdb.host=gdp-hongd.cs.berkeley.edu

This tells application programs where to look for routers and where to find the Human-Oriented Name to GDPname Directory service.
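As an illustration of the file format, the following Python sketch parses the example above. It is only a sketch under stated assumptions: the real parser lives in the C library, the helper names here are hypothetical, and trimming whitespace around names, values, and router entries is an assumption based on how the example is written.

```python
def parse_params(text):
    """Parse "name=value" lines into a dict; lines without "=" are skipped."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            # Assumption: surrounding whitespace is not significant.
            params[name.strip()] = value.strip()
    return params


def router_list(params):
    """Split swarm.gdp.routers on ";" as written in the example."""
    raw = params.get("swarm.gdp.routers", "")
    return [r.strip() for r in raw.split(";") if r.strip()]


EXAMPLE = """\
swarm.gdp.routers=mygdp.example.com; gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu
swarm.gdp.hongdb.host=gdp-hongd.cs.berkeley.edu
"""

params = parse_params(EXAMPLE)
print(router_list(params))
print(params["swarm.gdp.hongdb.host"])
```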

Supplied Applications

All of the supplied applications have man pages included. This description will just give an overview of the programs; see the associated man pages for details.

The gdp-writer program reads records from the standard input and writes to the target log. It is invoked as:

gdp-writer log-name

The log-name is the name of the GCL to be appended to. Lines are read from the input and written to the log, where each input line creates one log record. See the gdp-writer(1) man page for more details.
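If you want to drive gdp-writer from a script, one possible approach is to feed it lines on standard input. The Python sketch below assumes gdp-writer is on your PATH; the helper names and the log name in the usage note are hypothetical. Because each input line becomes one record, the helper rejects records that contain a newline.

```python
import subprocess


def records_to_stdin(records):
    """Join records into the text fed to gdp-writer, one record per line."""
    for r in records:
        if "\n" in r:
            # An embedded newline would split one record into several.
            raise ValueError("record contains a newline: %r" % (r,))
    return "".join(r + "\n" for r in records)


def write_records(log_name, records, run=subprocess.run):
    """Invoke `gdp-writer log-name` with the records on standard input."""
    return run(["gdp-writer", log_name],
               input=records_to_stdin(records), text=True, check=True)
```

For example, `write_records("edu.example.demo.log", ["first", "second"])` would append two records to that (hypothetical) log.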

The gdp-reader program reads records from the log and writes them to the standard output. It is invoked as:

gdp-reader [-f firstrec] [-n nrecs] [-s] [-t] [-v] log-name

The -f and -n flags specify the first record number (starting from 1) and the maximum number of records to read. gdp-reader reads records until it has returned nrecs records or reached the end of the log, whichever comes first; without -n it reads all of them. The -s flag turns on subscription mode, causing gdp-reader to wait until new records are added instead of stopping at the end of the log. It will terminate if nrecs records have been displayed or if the program is interrupted. By default gdp-reader assumes that the data may be binary, so it gives a hexadecimal dump; the -t flag outputs text only. The -v flag prints signature information associated with the record. For example:

gdp-reader -f 2 -s edu.berkeley.eecs.eric.sensor45

will return all the data already recorded in the log starting from record 2 and then wait until more data is written; the new data will be immediately printed. If -s were used without -f, no existing data would be printed, only new data. If neither flag were specified all the existing data would be printed, and then gdp-reader would exit.
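The flag combinations described above can be assembled programmatically. This Python sketch only builds the argument list from the synopsis; the function name is hypothetical, and it uses no flags beyond those listed in this document.

```python
def gdp_reader_argv(log_name, first=None, nrecs=None,
                    subscribe=False, text=False, verbose=False):
    """Build a gdp-reader command line from the synopsis above."""
    argv = ["gdp-reader"]
    if first is not None:
        argv += ["-f", str(first)]   # first record number, starting from 1
    if nrecs is not None:
        argv += ["-n", str(nrecs)]   # maximum number of records to read
    if subscribe:
        argv.append("-s")            # wait for new records
    if text:
        argv.append("-t")            # text output instead of a hex dump
    if verbose:
        argv.append("-v")            # print signature information
    argv.append(log_name)
    return argv


# Reproduces the example above: start at record 2, then subscribe.
print(gdp_reader_argv("edu.berkeley.eecs.eric.sensor45", first=2, subscribe=True))
```

The resulting list can be passed to `subprocess.run` if gdp-reader is installed.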

Writing GDP-based Programs

The native programming interface for the GDP is in C. The API is documented in doc/developer/gdp-programmatic-api.html. To compile, you'll need to use:

-I_path-to-include-files_ `mysql_config --include`

If the GDP is installed in the standard system directories you can skip the -I flag.

To link, you'll need:

-L_path-to-libraries_ \
-lgdp -lep -lcrypto -levent -levent_pthreads -pthread \
`mysql_config --libs`

The -L flag is not needed if the GDP is installed in the standard system directories. You may need additional -L flags for some of the other packages.

[Yes, we know that we should have a gdp-config that acts like mysql_config to simplify things. In the meantime, see apps/Makefile for a template.]
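Until something like gdp-config exists, a small script can assemble the flags listed above. The Python sketch below is a stand-in under stated assumptions, not a supported tool: the function names are hypothetical, and the mysql_include and mysql_libs arguments are expected to hold the output of `mysql_config --include` and `mysql_config --libs`.

```python
import shlex

# Libraries named in the link line above.
GDP_LIBS = ["-lgdp", "-lep", "-lcrypto", "-levent", "-levent_pthreads", "-pthread"]


def gdp_compile_flags(include_dir=None, mysql_include=""):
    """Compile flags; include_dir may be omitted for a standard system install."""
    flags = ["-I" + include_dir] if include_dir else []
    return flags + shlex.split(mysql_include)


def gdp_link_flags(lib_dir=None, mysql_libs=""):
    """Link flags; lib_dir may be omitted for a standard system install."""
    flags = ["-L" + lib_dir] if lib_dir else []
    return flags + GDP_LIBS + shlex.split(mysql_libs)
```

The mysql_config output could be captured with, for example, `subprocess.check_output(["mysql_config", "--libs"], text=True)` before calling these helpers.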

There is also a binding for Python in lang/python which is well tested. Documentation is in lang/python/README. There are also bindings for Java in lang/java and JavaScript in lang/js, neither of which has been maintained; they probably do not work at the present time.

Implementation Status

There are many functions that are not yet working. This list focuses primarily on items that will require incompatible changes, at least internally (that is, recompilation may be necessary). We hope we have the API in a reasonably good state, but we are not ready to guarantee that there will be no more flag days.

This list is probably incomplete.

  • The security model is incomplete. The extensions for hash chains have been started but are incomplete. Signatures should still work, but signing every append request is too slow for a large system.

  • Similarly, acknowledgements (server to client) should be signed (or otherwise validated). The wire protocol has been updated to accommodate this, but it isn't implemented.

  • Log names, which should ultimately be the hash of the log metadata, are essentially random. This means that at some point existing logs will become inaccessible (since the naming scheme will change). This will require a log name directory service (to allow human-friendly names), which in turn probably requires the Control Plane interface.

  • Log Replication has not been integrated. This is needed for better durability.

  • Log Migration does not exist.

  • Log Expiration is not implemented. This means that all data in all logs lasts forever, and logs never disappear.

  • The current PDU (Protocol Data Unit) format has been radically updated to allow future flexibility, but new features (such as header compression) still don't exist. It all works with the updated router infrastructure which should be faster than the old version, but there is still a lot to do.

  • The maximum size of a PDU has been reduced in the hopes that it would improve performance. This means that individual writes to the GDP can no longer be particularly large (e.g., entire videos). Changing this back to allow large writes would entail another flag day. Also, at the moment the failures you get when writing an overly large record are obscure at best.

  • There is no Control Plane interface. This means that functions that should be automated (e.g., log placement, directory service) must be done manually. This may change how end users and applications interact with the system. The log creation service exists, but is a complete hack and needs to be cleaned up.

  • The on-disk representation has been updated to allow for richer semantics, but performance and reliability testing remain to be done. It isn't clear that the new format is more compact than the old format, but it does more.

YOU HAVE BEEN WARNED!!!