GLOBAL DATAPLANE
================

This directory contains the source code for the Global Dataplane (GDP).

**NOTE WELL: This is an incomplete implementation of the GDP.  There
*will* be incompatible changes in the future.  Use in production at
your own risk, void where prohibited by law, etc., etc.  See
Implementation Status below.**

If you are a user of the GDP you probably do not want to start from
the source code.  See `README-deb.md` for installing from the Debian
packages (which includes Ubuntu).

If you are running on any system other than Debian, you have to
compile from source code.  At the time of this writing, the GDP
also compiles on macOS, FreeBSD, and Red Hat.  For details on
compiling from source, see
<https://gdp.cs.berkeley.edu/redmine/projects/gdp/wiki/Compiling_the_GDP_from_Source>
or `README-compiling.md`.  The former is probably more up to date.

If you are going to be working with the source code you should
definitely register for an account at
<https://gdp.cs.berkeley.edu/redmine/account/register>.
Unfortunately the CAPTCHA isn't sufficient to thwart bogus accounts,
so if you don't have a berkeley.edu address please contact us in
advance so we'll know to approve your account.

In most cases you will only need the client libraries and
applications.  Specifically, you do not need to run your own
routers and log servers.  If you do want to administer your own
servers, see `doc/admin/gdp-server-admin.md`.

If you are actually doing development on the GDP itself, please
see `README-developers.md` for more information.

The remainder of this document is broken into two parts.  The first
assumes you are just going to use the pre-compiled applications from
the `apps` directory.  The second assumes you are going to be
writing your own programs and linking them against the GDP library.

GDP General Use
---------------

This section applies to everyone using the GDP.  It primarily
discusses configuration.

### Configuration

If you are using the servers at Berkeley you should just need to
run `adm/gdp-bin-setup.sh`.  If you are running your own servers
you can examine that script to see what is needed.  In particular,
you'll need to adjust the parameters specifying:

  * The address of the routing node(s) for clients to use to
    contact the GDP.
  * The name of the log creation service for your cluster.
  * The location of the Human-Oriented Name to GDPname Directory.

Configuration files are simple "name=value" pairs, one per line.
There is a built-in search path
"`.gdp/params:~/.gdp/params:/usr/local/etc/gdp/params:/etc/gdp/params`"
that can be overridden by the `EP_PARAM_PATH` environment variable.
(Note: if a program is running setuid then only the two
system paths are searched, and `EP_PARAM_PATH` is ignored.)
Those directories are searched for a file named "`gdp`".  The
major parameters of interest are:

  * `swarm.gdp.routers`                (file: `gdp`)
  * `swarm.gdp.hongdb.host`            (file: `gdp`)
  * `swarm.gdp.creation-service.name`  (file: `gdp-create`)

See `man 7 gdp` (or `man gdp/gdp.7`) for details of configuration.

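As a rough illustration of the search rules above, the following Python
sketch lists the files that would be consulted for a given parameter file
name.  It is not how the GDP library itself is implemented: the function
name `candidate_param_files` and its `setuid` argument are made up for this
example, and it assumes that `EP_PARAM_PATH` uses the same colon-separated
syntax as the built-in path and that the "two system paths" are the
`/usr/local/etc` and `/etc` entries.

```python
import os

# Built-in search path quoted in the text above.
DEFAULT_PARAM_PATH = ".gdp/params:~/.gdp/params:/usr/local/etc/gdp/params:/etc/gdp/params"
# Assumption: these are the "two system paths" used for setuid programs.
SYSTEM_PARAM_PATH = "/usr/local/etc/gdp/params:/etc/gdp/params"


def candidate_param_files(name="gdp", environ=None, setuid=False):
    """Return the parameter files to look for, in search order."""
    environ = os.environ if environ is None else environ
    if setuid:
        # EP_PARAM_PATH is ignored for setuid programs.
        path = SYSTEM_PARAM_PATH
    else:
        path = environ.get("EP_PARAM_PATH", DEFAULT_PARAM_PATH)
    dirs = [d for d in path.split(":") if d]
    return [os.path.join(os.path.expanduser(d), name) for d in dirs]


if __name__ == "__main__":
    # Show which of the candidate files actually exist on this machine.
    for candidate in candidate_param_files("gdp"):
        print(candidate, "(exists)" if os.path.isfile(candidate) else "(missing)")
```
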
#### Example

In file `/etc/gdp/params/gdp`:

	swarm.gdp.routers=mygdp.example.com; gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu
	swarm.gdp.hongdb.host=gdp-hongd.cs.berkeley.edu

This tells application programs where to look for routers and
where to find the Human-Oriented Name to GDPname Directory service.

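To make the file format concrete, here is a small Python sketch that parses
text in the "name=value" format and splits the router list on semicolons, as
in the example file.  The helper names are invented for illustration, and
skipping blank lines and `#` comments and trimming whitespace are
assumptions rather than documented behavior; `man 7 gdp` has the real rules.

```python
def parse_params(text):
    """Parse "name=value" lines into a dict (illustrative only)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments are skipped.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value pair; ignored in this sketch
        params[name.strip()] = value.strip()
    return params


def router_list(params):
    """Split swarm.gdp.routers on ';', as in the example file."""
    raw = params.get("swarm.gdp.routers", "")
    return [r.strip() for r in raw.split(";") if r.strip()]


example = """\
swarm.gdp.routers=mygdp.example.com; gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu
swarm.gdp.hongdb.host=gdp-hongd.cs.berkeley.edu
"""
params = parse_params(example)
print(router_list(params))
print(params["swarm.gdp.hongdb.host"])
```
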
### Supplied Applications

All of the supplied applications have man pages included.  This
description will just give an overview of the programs; see the
associated man pages for details.

The `gdp-writer` program reads records from the standard input
and writes to the target log.  It is invoked as:

	gdp-writer log-name

The _log-name_ is the name of the log (GCL) to be appended to.  Lines are
read from the input and written to the log, where each input
line creates one log record.  See the gdp-writer(1) man page
for more details.

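Because each input line becomes one record, a program can append records by
piping newline-terminated lines to `gdp-writer`.  The Python sketch below
does this with the standard `subprocess` module.  `records_to_stdin` and
`append_records` are hypothetical helper names, and the sketch assumes that
`gdp-writer` is on your `PATH` and is run with no options.

```python
import subprocess


def records_to_stdin(records):
    """Join records into gdp-writer input: one record per line."""
    for rec in records:
        if "\n" in rec:
            # An embedded newline would be split into several records.
            raise ValueError("record contains a newline: %r" % rec)
    return "".join(rec + "\n" for rec in records)


def append_records(log_name, records, writer="gdp-writer"):
    """Run gdp-writer on log_name, feeding the records on standard input."""
    subprocess.run([writer, log_name], input=records_to_stdin(records),
                   text=True, check=True)


# Only the formatting step is exercised here; append_records() needs a
# working GDP installation and an existing log.
print(repr(records_to_stdin(["temp=21.5", "temp=21.7"])))
```
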
The `gdp-reader` program reads records from the log and writes
them to the standard output.  It is invoked as:

	gdp-reader [-f firstrec] [-n nrecs] [-s] [-t] [-v] log-name

The `-f` and `-n` flags specify the first record number (starting
from 1) and the maximum number of records to read.  By default
`gdp-reader` reads records until it has returned `nrecs` records
or reached the end of the log, whichever comes first.
The `-s` flag turns on subscription mode, causing `gdp-reader` to
wait until new records are added instead of stopping at the end of
the log.  It will terminate if `nrecs` records have been displayed
or if the program is interrupted.  By default `gdp-reader` assumes
that the data may be binary, so it gives a hexadecimal dump; the
`-t` flag outputs text only.  The `-v` flag prints signature
information associated with the record.  For example:

	gdp-reader -f 2 -s edu.berkeley.eecs.eric.sensor45

will return all the data already recorded in the log starting
from record 2 and then wait until more data is written; the new
data will be immediately printed.  If `-s` were used without `-f`,
no existing data would be printed, only new data.  If neither
flag were specified all the existing data would be printed, and
then `gdp-reader` would exit.

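The flag combinations described above can be captured in a small helper
that builds a `gdp-reader` command line, for example to hand to
`subprocess.run`.  This is a Python sketch for illustration only:
`reader_argv` is an invented name, and it uses nothing beyond the flags
listed in the synopsis above.

```python
def reader_argv(log_name, first=None, nrecs=None, subscribe=False,
                text=False, verbose=False):
    """Build an argument list for gdp-reader from the documented flags."""
    argv = ["gdp-reader"]
    if first is not None:
        argv += ["-f", str(first)]   # first record number (starting from 1)
    if nrecs is not None:
        argv += ["-n", str(nrecs)]   # maximum number of records to read
    if subscribe:
        argv.append("-s")            # wait for new records
    if text:
        argv.append("-t")            # text output instead of a hex dump
    if verbose:
        argv.append("-v")            # print signature information
    argv.append(log_name)
    return argv


log = "edu.berkeley.eecs.eric.sensor45"
# Existing data from record 2 onward, then new data as it arrives.
print(" ".join(reader_argv(log, first=2, subscribe=True)))
# Only new data, printed as text.
print(" ".join(reader_argv(log, subscribe=True, text=True)))
# All existing data, then exit.
print(" ".join(reader_argv(log)))
```
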
Writing GDP-based Programs
--------------------------

The native programming interface for the GDP is in C.  The API is
documented in `doc/developer/gdp-programmatic-api.html`.  To compile,
you'll need to use:

    -I_path-to-include-files_ `mysql_config --include`

If the GDP is installed in the standard system directories you
can skip the `-I` flag.

To link, you'll need:

    -L_path-to-libraries_ \
	-lgdp -lep -lcrypto -levent -levent_pthreads -pthread \
	`mysql_config --libs`

The `-L` flag is not needed if the GDP is installed in the standard
system directories.  You may need additional `-L` flags for some
of the other packages.

[Yes, we know that we should have a `gdp-config` that acts like
`mysql_config` to simplify things.  In the meantime, see
`apps/Makefile` for a template.]

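Until something like `gdp-config` exists, the flags above have to be
assembled by hand.  As one way to picture the full command, the Python
sketch below builds a single compile-and-link invocation.  Everything
specific in it is an assumption made for illustration: the helper names,
the use of `cc`, the source file `myapp.c`, and the `/opt/gdp` directories.
Only the `-I`, `-L`, and `-l` flags and the two `mysql_config` options come
from the text above.

```python
import shlex
import subprocess


def mysql_config(option):
    """Return the flags printed by `mysql_config <option>` as a list."""
    out = subprocess.run(["mysql_config", option], capture_output=True,
                         text=True, check=True).stdout
    return shlex.split(out)


def gdp_build_argv(source, output, include_dir=None, lib_dir=None,
                   mysql_cflags=(), mysql_libs=(), cc="cc"):
    """Assemble one compile-and-link command from the flags listed above."""
    argv = [cc]
    if include_dir:                       # not needed for a system install
        argv.append("-I" + include_dir)
    argv += list(mysql_cflags)
    argv += ["-o", output, source]
    if lib_dir:                           # not needed for a system install
        argv.append("-L" + lib_dir)
    argv += ["-lgdp", "-lep", "-lcrypto", "-levent", "-levent_pthreads",
             "-pthread"]
    argv += list(mysql_libs)
    return argv


# Hypothetical source file and install prefix.  On a real system, pass
# mysql_config("--include") and mysql_config("--libs") instead of the
# empty defaults.
print(" ".join(gdp_build_argv("myapp.c", "myapp",
                              include_dir="/opt/gdp/include",
                              lib_dir="/opt/gdp/lib")))
```
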
There is also a binding for Python in `lang/python` which is well
tested.  Documentation is in `lang/python/README`.  There are also
bindings for Java in `lang/java` and JavaScript in `lang/js`, neither
of which has been maintained; they probably do not work at the
present time.

Implementation Status
---------------------

There are many functions that are not yet working.  This list
focuses primarily on items that will require incompatible changes,
at least internally (that is, recompilation may be necessary).
We hope we have the API in a reasonably good state, but we are
not ready to guarantee that there will be no more flag days.

This list is probably incomplete.

* The security model is incomplete.  The extensions for hash
chains have been started but are incomplete.  Signatures should
still work, but signing every append request is too slow for
a large system.

* Similarly, acknowledgements (server to client) should be signed
(or otherwise validated).  The wire protocol has been updated to
accommodate this, but it isn't implemented.

* Log names, which should ultimately be the hash of the log
metadata, are essentially random.  This means that at some point
existing logs will become inaccessible (since the naming scheme
will change).  This will require a log name directory service
(to allow human-friendly names), which in turn probably requires
the Control Plane interface.

* Log Replication has not been integrated.  This is needed for
better durability.

* Log Migration does not exist.

* Log Expiration is not implemented.  This means that all data in
all logs lasts forever, and logs never disappear.

* The current PDU (Protocol Data Unit) format has been radically
updated to allow future flexibility, but new features (such as
header compression) still don't exist.  It all works with the
updated router infrastructure which should be faster than the old
version, but there is still a lot to do.

* The maximum size of a PDU has been reduced in the hopes that
it would improve performance.  This means that individual writes
to the GDP can no longer be particularly large (e.g., entire
videos).  Changing this back to allow large writes would entail
another flag day.  Also, at the moment the errors you get when
writing an overly large record are obscure at best.

* There is no Control Plane interface.  This means that functions
that should be automated (e.g., log placement, directory service)
must be done manually.  This may change how end users and
applications interact with the system.  The log creation service
exists, but is a complete hack and needs to be cleaned up.

* The on-disk representation has been updated to allow for richer
semantics, but performance and reliability testing remain to be
done.  It isn't clear that the new format is more compact than
the old format, but it does more.

**YOU HAVE BEEN WARNED!!!**

<!-- vim: set ai sw=4 sts=4 ts=4 : -->
<!-- Use "pandoc -sS -o README.html README.md"
to process this to HTML -->