Project

General

Profile

Bug #58

403 forbidden errors for many consecutive asynchronous writes (15 bytes @ 100 Hz)

Added by David Mellis about 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
gdplogd
Start date:
09/27/2016
Due date:
% Done:

0%


Description

I'm piping data of around 15 bytes per reading (line) at 100 Hz to ./gdp-writer -a edu.berkeley.eecs.bid.mellis.arduino101 (with ~/.ep_adm_params/gdp set to swarm.gdp.routers=gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu). Sometimes the data seems to write correctly for a second or so at a time, but mostly it seems no data is written (a subscribed GDP reader sees nothing) and the writer gives a successive string of:

»                                                                                                                                                                                              Got input «-587    -107    4054
 Asynchronous event type 6:
                            recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
»                                                                                              Got input «-581    -113    4103
 Asynchronous event type 6:
                            recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
»                                                                                              Got input «-565    -116    4098
 Asynchronous event type 6:
                            recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
»                                                                                              Got input «-584    -117    4060
 Asynchronous event type 6:
                            recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
»                                                                                              Got input «-579    -118    4081
 Asynchronous event type 6:
                            recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
»                                                                                              Got input «-566    -113    4061
 Asynchronous event type 6:
                            recno 17469, stat ERROR: 403 forbidden [Berkeley:Swarm-GDP:403]
»                                                                                              Got input «-573    -110    4095

I've seen this at lower rates as well (e.g. 50 Hz) but it seems to happen less often at those lower speeds. In synchronous mode, I don't see any errors, but the writing to the log quickly backs up.

Specifically, I'm reading data being output by an Arduino 101 over USB using: sudo cu -s 115200 -l /dev/tty.usbmodem1411131 | ./gdp-writer -a edu.berkeley.eecs.bid.mellis.arduino101. Pinging gdp-01.eecs.berkeley.edu gives values over the place:

PING gdp-01.eecs.berkeley.edu (128.32.33.233): 56 data bytes
64 bytes from 128.32.33.233: icmp_seq=0 ttl=61 time=9.010 ms
64 bytes from 128.32.33.233: icmp_seq=1 ttl=61 time=34.585 ms
64 bytes from 128.32.33.233: icmp_seq=2 ttl=61 time=9.591 ms
64 bytes from 128.32.33.233: icmp_seq=3 ttl=61 time=1.926 ms
64 bytes from 128.32.33.233: icmp_seq=4 ttl=61 time=37.147 ms
64 bytes from 128.32.33.233: icmp_seq=5 ttl=61 time=168.484 ms
64 bytes from 128.32.33.233: icmp_seq=6 ttl=61 time=157.103 ms
64 bytes from 128.32.33.233: icmp_seq=7 ttl=61 time=158.294 ms
64 bytes from 128.32.33.233: icmp_seq=8 ttl=61 time=217.128 ms
64 bytes from 128.32.33.233: icmp_seq=9 ttl=61 time=1.966 ms
64 bytes from 128.32.33.233: icmp_seq=10 ttl=61 time=12.873 ms
64 bytes from 128.32.33.233: icmp_seq=11 ttl=61 time=1.830 ms
64 bytes from 128.32.33.233: icmp_seq=12 ttl=61 time=1.673 ms
64 bytes from 128.32.33.233: icmp_seq=13 ttl=61 time=34.974 ms
64 bytes from 128.32.33.233: icmp_seq=14 ttl=61 time=2.448 ms
64 bytes from 128.32.33.233: icmp_seq=15 ttl=61 time=2.100 ms
64 bytes from 128.32.33.233: icmp_seq=16 ttl=61 time=42.286 ms
64 bytes from 128.32.33.233: icmp_seq=17 ttl=61 time=35.831 ms
64 bytes from 128.32.33.233: icmp_seq=18 ttl=61 time=26.300 ms

This is on eecs-secure wifi in BiD, on a Mac running macOS 10.12, using commit 202d751 of the GDP.

History

#1 Updated by David Mellis about 5 years ago

At the moment, this is happening reasonably often for data at 10 Hz. Weirdly, ping times seem to be better:

PING gdp-01.eecs.berkeley.edu (128.32.33.233): 56 data bytes
64 bytes from 128.32.33.233: icmp_seq=0 ttl=61 time=5.821 ms
64 bytes from 128.32.33.233: icmp_seq=1 ttl=61 time=2.079 ms
64 bytes from 128.32.33.233: icmp_seq=2 ttl=61 time=2.181 ms
64 bytes from 128.32.33.233: icmp_seq=3 ttl=61 time=2.279 ms
Request timeout for icmp_seq 4
64 bytes from 128.32.33.233: icmp_seq=5 ttl=61 time=4.321 ms
64 bytes from 128.32.33.233: icmp_seq=6 ttl=61 time=4.917 ms
64 bytes from 128.32.33.233: icmp_seq=7 ttl=61 time=4.611 ms
64 bytes from 128.32.33.233: icmp_seq=8 ttl=61 time=1.518 ms
64 bytes from 128.32.33.233: icmp_seq=9 ttl=61 time=4.193 ms
64 bytes from 128.32.33.233: icmp_seq=10 ttl=61 time=1.946 ms
64 bytes from 128.32.33.233: icmp_seq=11 ttl=61 time=1.767 ms
64 bytes from 128.32.33.233: icmp_seq=12 ttl=61 time=1.605 ms
64 bytes from 128.32.33.233: icmp_seq=13 ttl=61 time=2.317 ms
64 bytes from 128.32.33.233: icmp_seq=14 ttl=61 time=28.407 ms
64 bytes from 128.32.33.233: icmp_seq=15 ttl=61 time=6.121 ms
64 bytes from 128.32.33.233: icmp_seq=16 ttl=61 time=1.842 ms
64 bytes from 128.32.33.233: icmp_seq=17 ttl=61 time=2.185 ms
64 bytes from 128.32.33.233: icmp_seq=18 ttl=61 time=2.392 ms
64 bytes from 128.32.33.233: icmp_seq=19 ttl=61 time=1.960 ms
64 bytes from 128.32.33.233: icmp_seq=20 ttl=61 time=12.125 ms
64 bytes from 128.32.33.233: icmp_seq=21 ttl=61 time=2.007 ms
64 bytes from 128.32.33.233: icmp_seq=22 ttl=61 time=16.474 ms
64 bytes from 128.32.33.233: icmp_seq=23 ttl=61 time=2.290 ms
64 bytes from 128.32.33.233: icmp_seq=24 ttl=61 time=2.551 ms
64 bytes from 128.32.33.233: icmp_seq=25 ttl=61 time=3.492 ms
64 bytes from 128.32.33.233: icmp_seq=26 ttl=61 time=2.573 ms
64 bytes from 128.32.33.233: icmp_seq=27 ttl=61 time=2.009 ms
64 bytes from 128.32.33.233: icmp_seq=28 ttl=61 time=1.959 ms

#2 Updated by Nitesh Mor about 5 years ago

I believe the bug is because of the timing of multiple worker threads. I verified that

  • the bug exists on the latest code (and such a write pattern crashes the log-server occasionally)
  • restricting the number of worker threads by using -n 1 parameter when starting up a log-server works fine (checked at 200Hz)

#3 Updated by Eric Allman about 5 years ago

  • Category set to gdplogd
  • Status changed from New to In Progress
  • Assignee set to Eric Allman

The problem is that asynchronous appends are sent off to the thread pool in gdplogd. Those threads are not guaranteed to be run in the order of input, so you get out-of-order writes. This has to be fixed in the log daemon, after setting the runtime parameter swarm.gdplogd.sequencing.allowgaps = true. We should be able to reboot with the new parameter set today.

#4 Updated by Eric Allman about 5 years ago

  • Status changed from In Progress to Feedback

David, the servers have been rebooted. If I'm right, this problem should be fixed. Please let me know.

#5 Updated by Nitesh Mor about 5 years ago

  • Status changed from Feedback to Closed

Marking this closed for now, I can not seem to reproduce this.

If the issue persists, please reopen.

Also available in: Atom PDF