Bug #58
403 forbidden errors for many consecutive asynchronous writes (15 bytes @ 100 Hz)
Description
I'm piping data of around 15 bytes per reading (line) at 100 Hz to ./gdp-writer -a edu.berkeley.eecs.bid.mellis.arduino101 (with ~/.ep_adm_params/gdp set to swarm.gdp.routers=gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu). Sometimes the data writes correctly for a second or so at a time, but mostly no data is written (a subscribed GDP reader sees nothing) and the writer prints a continuous string of errors:
» Got input «-587 -107 4054
Asynchronous event type 6: recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
» Got input «-581 -113 4103
Asynchronous event type 6: recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
» Got input «-565 -116 4098
Asynchronous event type 6: recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
» Got input «-584 -117 4060
Asynchronous event type 6: recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
» Got input «-579 -118 4081
Asynchronous event type 6: recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
» Got input «-566 -113 4061
Asynchronous event type 6: recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
» Got input «-573 -110 4095
I've seen this at lower rates as well (e.g. 50 Hz), though it happens less often. In synchronous mode I don't see any errors, but writing to the log quickly backs up.
Specifically, I'm reading data output by an Arduino 101 over USB using:
sudo cu -s 115200 -l /dev/tty.usbmodem1411131 | ./gdp-writer -a edu.berkeley.eecs.bid.mellis.arduino101
Pinging gdp-01.eecs.berkeley.edu gives values all over the place:
PING gdp-01.eecs.berkeley.edu (128.32.33.233): 56 data bytes
64 bytes from 128.32.33.233: icmp_seq=0 ttl=61 time=9.010 ms
64 bytes from 128.32.33.233: icmp_seq=1 ttl=61 time=34.585 ms
64 bytes from 128.32.33.233: icmp_seq=2 ttl=61 time=9.591 ms
64 bytes from 128.32.33.233: icmp_seq=3 ttl=61 time=1.926 ms
64 bytes from 128.32.33.233: icmp_seq=4 ttl=61 time=37.147 ms
64 bytes from 128.32.33.233: icmp_seq=5 ttl=61 time=168.484 ms
64 bytes from 128.32.33.233: icmp_seq=6 ttl=61 time=157.103 ms
64 bytes from 128.32.33.233: icmp_seq=7 ttl=61 time=158.294 ms
64 bytes from 128.32.33.233: icmp_seq=8 ttl=61 time=217.128 ms
64 bytes from 128.32.33.233: icmp_seq=9 ttl=61 time=1.966 ms
64 bytes from 128.32.33.233: icmp_seq=10 ttl=61 time=12.873 ms
64 bytes from 128.32.33.233: icmp_seq=11 ttl=61 time=1.830 ms
64 bytes from 128.32.33.233: icmp_seq=12 ttl=61 time=1.673 ms
64 bytes from 128.32.33.233: icmp_seq=13 ttl=61 time=34.974 ms
64 bytes from 128.32.33.233: icmp_seq=14 ttl=61 time=2.448 ms
64 bytes from 128.32.33.233: icmp_seq=15 ttl=61 time=2.100 ms
64 bytes from 128.32.33.233: icmp_seq=16 ttl=61 time=42.286 ms
64 bytes from 128.32.33.233: icmp_seq=17 ttl=61 time=35.831 ms
64 bytes from 128.32.33.233: icmp_seq=18 ttl=61 time=26.300 ms
This is on eecs-secure wifi in BiD, on a Mac running macOS 10.12, using commit 202d751 of the GDP.
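For reference, the router setting mentioned above corresponds to a ~/.ep_adm_params/gdp file containing a line like the following (a sketch based only on the values quoted in this report, not the exact file):
swarm.gdp.routers=gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu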
History
#1 Updated by David Mellis about 7 years ago
At the moment, this is happening reasonably often for data at 10 Hz. Weirdly, ping times seem to be better:
PING gdp-01.eecs.berkeley.edu (128.32.33.233): 56 data bytes
64 bytes from 128.32.33.233: icmp_seq=0 ttl=61 time=5.821 ms
64 bytes from 128.32.33.233: icmp_seq=1 ttl=61 time=2.079 ms
64 bytes from 128.32.33.233: icmp_seq=2 ttl=61 time=2.181 ms
64 bytes from 128.32.33.233: icmp_seq=3 ttl=61 time=2.279 ms
Request timeout for icmp_seq 4
64 bytes from 128.32.33.233: icmp_seq=5 ttl=61 time=4.321 ms
64 bytes from 128.32.33.233: icmp_seq=6 ttl=61 time=4.917 ms
64 bytes from 128.32.33.233: icmp_seq=7 ttl=61 time=4.611 ms
64 bytes from 128.32.33.233: icmp_seq=8 ttl=61 time=1.518 ms
64 bytes from 128.32.33.233: icmp_seq=9 ttl=61 time=4.193 ms
64 bytes from 128.32.33.233: icmp_seq=10 ttl=61 time=1.946 ms
64 bytes from 128.32.33.233: icmp_seq=11 ttl=61 time=1.767 ms
64 bytes from 128.32.33.233: icmp_seq=12 ttl=61 time=1.605 ms
64 bytes from 128.32.33.233: icmp_seq=13 ttl=61 time=2.317 ms
64 bytes from 128.32.33.233: icmp_seq=14 ttl=61 time=28.407 ms
64 bytes from 128.32.33.233: icmp_seq=15 ttl=61 time=6.121 ms
64 bytes from 128.32.33.233: icmp_seq=16 ttl=61 time=1.842 ms
64 bytes from 128.32.33.233: icmp_seq=17 ttl=61 time=2.185 ms
64 bytes from 128.32.33.233: icmp_seq=18 ttl=61 time=2.392 ms
64 bytes from 128.32.33.233: icmp_seq=19 ttl=61 time=1.960 ms
64 bytes from 128.32.33.233: icmp_seq=20 ttl=61 time=12.125 ms
64 bytes from 128.32.33.233: icmp_seq=21 ttl=61 time=2.007 ms
64 bytes from 128.32.33.233: icmp_seq=22 ttl=61 time=16.474 ms
64 bytes from 128.32.33.233: icmp_seq=23 ttl=61 time=2.290 ms
64 bytes from 128.32.33.233: icmp_seq=24 ttl=61 time=2.551 ms
64 bytes from 128.32.33.233: icmp_seq=25 ttl=61 time=3.492 ms
64 bytes from 128.32.33.233: icmp_seq=26 ttl=61 time=2.573 ms
64 bytes from 128.32.33.233: icmp_seq=27 ttl=61 time=2.009 ms
64 bytes from 128.32.33.233: icmp_seq=28 ttl=61 time=1.959 ms
#2 Updated by Nitesh Mor about 7 years ago
I believe the bug is caused by the timing of multiple worker threads. I verified that:
- the bug exists on the latest code (and such a write pattern occasionally crashes the log-server)
- restricting the number of worker threads by passing the -n 1 parameter when starting the log-server works fine (checked at 200 Hz); see the sketch after this list
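A minimal sketch of that workaround, assuming the stock gdplogd binary is the log-server being started (any other options you normally pass stay the same):
gdplogd -n 1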
#3 Updated by Eric Allman about 7 years ago
- Category set to gdplogd
- Status changed from New to In Progress
- Assignee set to Eric Allman
The problem is that asynchronous appends are handed off to the thread pool in gdplogd. Those threads are not guaranteed to run in the order of input, so you get out-of-order writes. This has to be fixed in the log daemon by setting the runtime parameter swarm.gdplogd.sequencing.allowgaps = true. We should be able to reboot with the new parameter set today.
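For reference, a minimal sketch of that setting in the same key=value style as the ~/.ep_adm_params/gdp file quoted in the description (which parameters file gdplogd actually reads it from is an assumption here; the parameter name and value come from the comment above):
swarm.gdplogd.sequencing.allowgaps=true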
#4 Updated by Eric Allman about 7 years ago
- Status changed from In Progress to Feedback
David, the servers have been rebooted. If I'm right, this problem should be fixed. Please let me know.
#5 Updated by Nitesh Mor almost 7 years ago
- Status changed from Feedback to Closed
Marking this closed for now, since I cannot seem to reproduce it.
If the issue persists, please reopen.