Bug #61

GDP Error 600

Added by Nima Mousavi almost 4 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Routing Layer
Start date: 09/30/2016
Due date:
% Done: 0%


Description

I've been getting error 600 today when I try to create a gcl, or try to write to some of the logs that I created before.

pi@raspberrypi:~/Desktop/gdpRepo/gdp $ apps/gcl-create
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
exiting with status ERROR: 600 no route available [Berkeley:Swarm-GDP:600]

and when accessing "tCcbytv6gY0BdzvMx_JHw9ovPGwcpzvptFJiZ1k2u7Y" to write:
pi@raspberrypi:~/Desktop/gdpRepo/gdp/lang/python/apps $ python writer_test.py tCcbytv6gY0BdzvMx_JHw9ovPGwcpzvptFJiZ1k2u7Y ../../../../../ContextEngine27/test/DgAAAOhbtHYAAAAA-Fm0diD2h36MEfB2AAAAAAAAAAA.pem
Traceback (most recent call last):
File "writer_test.py", line 68, in <module>
main(sys.argv[1], sys.argv[2])
File "writer_test.py", line 50, in main
open_info={'skey':skey})
File "/usr/local/lib/python2.7/dist-packages/gdp/GDP_GCL.py", line 116, in __new__
check_EP_STAT(estat)
File "/usr/local/lib/python2.7/dist-packages/gdp/MISC.py", line 169, in check_EP_STAT
raise EP_STAT_SEV_ERROR(ep_stat)
gdp.MISC.EP_STAT_SEV_ERROR: 'ERROR: 600 no route available [Berkeley:Swarm-GDP:600]'
*** Error in `python': double free or corruption (fasttop): 0x01d80c10 ***
Aborted

Is something wrong with the servers? I've tried with three different systems (Raspbian, Ubuntu, and Ubuntu Mate) all with the same issue. I'm able to write to my other log (GCL name: 58jK2obVbOma7OwQNkgA7kuYqrEVcy4Tw5hMlREn5jY) with no issue.

Thanks.

RPi.txt (10.8 KB) Nima Mousavi, 10/07/2016 02:42 PM

SwarmBox.txt (11.1 KB) Nima Mousavi, 10/07/2016 02:42 PM

SwarmBox_2.txt (21.4 KB) Nima Mousavi, 10/09/2016 10:07 AM

History

#1 Updated by Eric Allman almost 4 years ago

  • Project changed from GDP to Click GDP Routers

The problem is in the router. The log is hosted on gdp-03, and the router on gdp-03 knows about it, but that name has not been propagated to other routers (or it was propagated and then lost). I should be able to fix the immediate problem by restarting the log daemon on gdp-03, but the bug remains.

#2 Updated by Eric Allman almost 4 years ago

Eric Allman wrote:

The problem is in the router. The log is hosted on gdp-03, and the router on gdp-03 knows about it, but that name has not been propagated to other routers (or it was propagated and then lost). I should be able to fix the immediate problem by restarting the log daemon on gdp-03, but the bug remains.

Also, gdplogd should re-advertise periodically, and advertisements should time out.
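As a rough illustration of the behavior proposed here (periodic re-advertisement plus timeouts), the sketch below keeps a name-to-next-hop table whose entries expire unless refreshed. It is not the router's code: the class name, the method names, and the 90-second lifetime are all hypothetical, chosen only to show the idea.

```python
import time


class AdvertisementTable:
    """Toy name -> next-hop table whose entries expire unless refreshed."""

    def __init__(self, ttl_seconds=90.0, clock=time.time):
        # Hypothetical lifetime; not a value taken from GDP.
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # name -> (next_hop, expiry time)

    def advertise(self, name, next_hop):
        """Record or refresh an advertisement for `name`."""
        self._entries[name] = (next_hop, self._clock() + self.ttl_seconds)

    def lookup(self, name):
        """Return the next hop, or None if never advertised or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        next_hop, expires_at = entry
        if self._clock() >= expires_at:
            # Stale entry: drop it so a later advertisement starts fresh.
            del self._entries[name]
            return None
        return next_hop
```

With a table like this, a lost advertisement would heal on the next periodic re-advertisement, and a log server that disappears would stop being routed to once its entries time out.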

#3 Updated by Eric Allman almost 4 years ago

Another update: it looks like the router on gdp-01 is the only router that doesn't know about that log, and restarting the gdp-01 router doesn't seem to fix the problem. At this point I've restarted the routers on gdp-01 and gdp-03 and the gdplogd on gdp-03 (twice) and the problem persists.

#4 Updated by Eric Allman almost 4 years ago

To make things more problematic, the other log (58jK2obVbOma7OwQNkgA7kuYqrEVcy4Tw5hMlREn5jY) is only accessible by the router on gdp-01. To test accessibility, use:

gdp-reader -G gdp-$r.eecs.berkeley.edu -n 1 $log

where $log is the name of the log and $r is 01, 02, 03, or 04. The -G flag forces the client to talk to the named router.
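To run that check against all four routers in one go, something like the following might do. It only wraps the gdp-reader command line quoted above; the helper names are made up for this sketch, and it reports the raw exit status without assuming what gdp-reader returns on failure.

```python
import subprocess

# Router numbers listed in the comment above.
ROUTERS = ["01", "02", "03", "04"]


def build_probe_command(router, log_name):
    """Build the gdp-reader command line quoted above for one router."""
    host = "gdp-%s.eecs.berkeley.edu" % router
    return ["gdp-reader", "-G", host, "-n", "1", log_name]


def probe_all(log_name, run=subprocess.call):
    """Run the probe against every router; return {router: exit status}."""
    results = {}
    for router in ROUTERS:
        results[router] = run(build_probe_command(router, log_name))
    return results
```

Calling probe_all("58jK2obVbOma7OwQNkgA7kuYqrEVcy4Tw5hMlREn5jY") on a machine with the client tools installed would run the four probes in turn; the `run` parameter exists only so the loop can be exercised without gdp-reader present.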

#5 Updated by Nima Mousavi almost 4 years ago

I just tested whether I have access or not, and it seems like on some routers I receive error 600.
Is there anything I should do at this point?
Should I add gdp-04 to the EP_Param file?

Thanks.

#6 Updated by Eric Allman almost 4 years ago

  • Status changed from New to Feedback

Nima, please give it another try. I fixed a router bug late yesterday that might be the problem.

#7 Updated by Nima Mousavi almost 4 years ago

Hi,
As far as I can see, error 600 is gone. I am able to run my code on one of my RPis (Ubuntu Mate), and I have yet to try Raspbian. For some reason, I get timeouts on the Ubuntu Swarmbox with the same code that works on the RPi.

File "/home/sbuser/ContextEngine27/test/../python/ContextEngineInterface/ioClass.py", line 58, in __init__
self.gclHandle = gdp.GDP_GCL(self.gclName, gdp.GDP_MODE_RO)
File "/usr/lib/python2.7/dist-packages/gdp/GDP_GCL.py", line 113, in __init__
check_EP_STAT(estat)
File "/usr/lib/python2.7/dist-packages/gdp/MISC.py", line 159, in check_EP_STAT
raise EP_STAT_SEV_ERROR(ep_stat)
gdp.MISC.EP_STAT_SEV_ERROR: 'ERROR: Connection timed out [EPLIB:errno:110]'

#8 Updated by Eric Allman almost 4 years ago

That looks like the client can't open a connection to the GDP router. This could be because of a network configuration problem, or possibly because the swarm.gdp.routers parameter is set incorrectly. You might check /etc/ep_adm_params/gdp to see if that value is set. This could happen if the value got set to localhost at some point.

#9 Updated by Nima Mousavi almost 4 years ago

I checked that parameter before. On RPi (which works) I have the following parameters set in /usr/local/etc/ep_adm_params/gdp:
swarm.gdp.routers=gdp-01.eecs.berkeley.edu;gdp-02.eecs.berkeley.edu;gdp-03.eecs.berkeley.edu
swarm.gdp.zeroconf.enable=false

I used to have the same parameters in /usr/local/etc/ep_adm_params/gdp on the Swarmbox as well, and I just tried adding a similar file at /etc/ep_adm_params/gdp. I still get the timeout error.
Is there any way to check the swarm.gdp.routers parameter during runtime? (perhaps with the python API)
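Short of a runtime query, one way to see what each file sets is to parse them directly. The sketch below is only that: it reads the two paths mentioned in this thread and reports swarm.gdp.routers from each. It does not show which value the library actually resolved, the search order between the two files is an assumption, and treating '#' lines as comments is also an assumption.

```python
import os

# Both locations are mentioned in this thread; the order in which the GDP
# library actually searches them is an assumption here.
CANDIDATE_FILES = [
    "/usr/local/etc/ep_adm_params/gdp",
    "/etc/ep_adm_params/gdp",
]


def parse_params(text):
    """Parse key=value lines, skipping blanks and (assumed) '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def routers_by_file(paths=CANDIDATE_FILES):
    """Return {path: [router, ...]} for each existing file that sets
    swarm.gdp.routers."""
    found = {}
    for path in paths:
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            value = parse_params(handle.read()).get("swarm.gdp.routers")
        if value is not None:
            found[path] = [r.strip() for r in value.split(";") if r.strip()]
    return found
```

If the two files disagree, or one of them lists localhost, that would be worth reporting here.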

#10 Updated by Eric Allman almost 4 years ago

Assuming you have the regular client software installed, please capture the output of

gdp-reader -n 1 -D50 edu.berkeley.cs.eric.test.00

This should work, but I'm guessing it will give you the "Connection timed out" error again. Please forward the debug output (you can attach it as a file to this issue report).

#11 Updated by Nima Mousavi almost 4 years ago

I am receiving a new error on my RPi (which was previously working).
I have attached the logs for both systems. On the Swarmbox I get a timeout error, and on the RPi I get an assertion error.

#12 Updated by Eric Allman almost 4 years ago

The gdplogds on gdp-01 and gdp-04 seem to have hung, hence the timeouts. It was hard to restart on both systems: I ultimately had to use kill -9.

Nima, let's try one more time. I can see I'm working over the weekend. You can ignore the rest of this text; it's for reference.

Last log information from gdp-01:

2016-10-07 17:46:55.206126 get_open_handle: opening eQW7QifD5vMNK55KOupmrwq5K7zkwJEExdGhB4dCksM
2016-10-07 17:46:55.206232 get_open_handle: opening Xoc5ieK1prdAWKTjoN4pOmXk9lIs7CF6nGf5lgIuhXM
2016-10-07 17:46:55.491205 *** Error in `/usr/sbin/gdplogd': double free or corruption (!prev): 0x00007f1cf404bc90 ***
2016-10-07 17:46:58.357339 _gdp_advertise => OK
2016-10-07 17:46:58.357373 logd_advertise_all => OK
2016-10-07 17:47:28.356868 _gdp_advertise => OK
2016-10-07 17:47:28.356904 logd_advertise_all => OK
2016-10-07 17:47:58.363473 _gdp_advertise => OK
2016-10-07 17:47:58.363509 logd_advertise_all => OK
2016-10-07 17:48:28.365047 _gdp_advertise => OK
2016-10-07 17:48:28.365078 logd_advertise_all => OK
2016-10-07 17:48:58.356869 _gdp_advertise => OK
2016-10-07 17:48:58.356899 logd_advertise_all => OK
2016-10-07 17:49:28.356309 _gdp_advertise => OK
2016-10-07 17:49:28.356341 logd_advertise_all => OK
2016-10-07 17:49:58.361279 _gdp_advertise => OK
2016-10-07 17:49:58.361311 logd_advertise_all => OK
2016-10-07 17:50:28.364872 _gdp_advertise => OK
2016-10-07 17:50:28.364903 logd_advertise_all => OK
2016-10-07 17:50:58.356465 _gdp_advertise => OK
2016-10-07 17:50:58.356500 logd_advertise_all => OK
2016-10-07 17:51:28.361453 _gdp_advertise => OK
2016-10-07 17:51:28.361483 logd_advertise_all => OK
2016-10-07 17:51:58.356736 2016-10-07 10:51:58.356615 -0700 gdplogd: _gdp_gcl_cache_drop: ref count 16 != 0: ABORT: invalid reference count [Berkeley:Swarm-GDP:32]
2016-10-07 17:51:58.356775
2016-10-07 17:51:58.356789 GCL@0x7f1cf0020af0: S5apbSYzgC1ebGy5XYyZnqMjQQATN4i2Z4Ji10b18Ac
2016-10-07 17:51:58.356801  iomode = 3, refcnt = 16, reqs = (nil), nrecs = 2667
2016-10-07 17:51:58.356813  flags = 0x1e<INCACHE,ISLOCKED,INUSE,DEFER_FREE>
2016-10-07 17:51:58.421791 2016-10-07 10:51:58.421661 -0700 gdplogd: _gdp_gcl_cache_drop: ref count 4 != 0: ABORT: invalid reference count [Berkeley:Swarm-GDP:32]
2016-10-07 17:51:58.421828
2016-10-07 17:51:58.421841 GCL@0x7f1d000011a0: fe41EFHar8w_23O77e0eY0PikxVKDJbq3kwqBgr0dwA
2016-10-07 17:51:58.421853  iomode = 3, refcnt = 4, reqs = (nil), nrecs = 2904732
2016-10-07 17:51:58.421865  flags = 0x1e<INCACHE,ISLOCKED,INUSE,DEFER_FREE>

Oddly, there were no errors listed in the log on gdp-04. It just stopped:

2016-10-07 20:28:57.012876 _gdp_advertise => OK
2016-10-07 20:28:57.012911 logd_advertise_all => OK
2016-10-07 20:29:19.966541 get_open_handle: opening u8Gh7ILi0kja3c-aKZi8l6JRTeA3REEEOgQu8AZoNgA
2016-10-07 20:29:27.005061 _gdp_advertise => OK
2016-10-07 20:29:27.005092 logd_advertise_all => OK
2016-10-07 20:29:57.012943 _gdp_advertise => OK
2016-10-07 20:29:57.012973 logd_advertise_all => OK
2016-10-07 20:30:01.317192 get_open_handle: opening SMbUbrFOnBYjH_XP4WxVJsATKCWXKXQ_THZ35AIH3s0
2016-10-07 20:30:27.013284 _gdp_advertise => OK
2016-10-07 20:30:27.013318 logd_advertise_all => OK
2016-10-07 20:30:41.824314 get_open_handle: opening 8qqFnqXstJFOEHuANOPzk6vJxQIZNofJ-zd9AFHtmzI
2016-10-07 20:30:57.000444 _gdp_advertise => OK
2016-10-07 20:30:57.000475 logd_advertise_all => OK
2016-10-07 20:31:27.000133 _gdp_advertise => OK
2016-10-07 20:31:27.000165 logd_advertise_all => OK
2016-10-07 20:31:56.999505 _gdp_advertise => OK
2016-10-07 20:31:56.999535 logd_advertise_all => OK
2016-10-07 20:32:26.999794 _gdp_advertise => OK
2016-10-07 20:32:26.999826 logd_advertise_all => OK

(the UTC time when I did this tail was after 22:00).
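Since a hung gdplogd shows up in these logs as the periodic "logd_advertise_all => OK" lines simply stopping, a small script could flag that condition. The sketch below assumes the log format shown above (a UTC date and time as the first two fields of each line); the function names are invented for this example.

```python
from datetime import datetime

MARKER = "logd_advertise_all => OK"


def advertise_times(lines):
    """Extract the leading timestamp of every logd_advertise_all line."""
    times = []
    for line in lines:
        if MARKER in line:
            stamp = " ".join(line.split()[:2])
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
    return times


def seconds_since_last_advertise(lines, now):
    """Seconds between the last advertisement in the log and `now` (UTC),
    or None if the log contains no advertisement lines."""
    times = advertise_times(lines)
    if not times:
        return None
    return (now - times[-1]).total_seconds()
```

Fed the gdp-04 excerpt above with `now` set to 22:00 UTC, this reports well over an hour of silence, against the roughly 30-second spacing visible while the daemon was healthy.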

#13 Updated by Nima Mousavi almost 4 years ago

I just pulled gdp again and tried one more time.

On my RPi:
* It seems to be working, and it prints this on my screen (repeated a gazillion times) when I access the log (edu.berkeley.eecs.swarmlab.device.c098e5300003):
Connection to 75.80.49.247 9999 port [tcp/*] succeeded!
* It also works correctly when I write to log (58jK2obVbOma7OwQNkgA7kuYqrEVcy4Tw5hMlREn5jY).

On the swarm box:
* Reading from log (edu.berkeley.eecs.swarmlab.device.c098e5300003) does not work; it throws a timeout error.
* Subscribing to log (58jK2obVbOma7OwQNkgA7kuYqrEVcy4Tw5hMlREn5jY), which I'm writing to with my RPi, and writing to log (tCcbytv6gY0BdzvMx_JHw9ovPGwcpzvptFJiZ1k2u7Y) both work correctly.
* I've attached the result of running (gdp-reader -G gdp-$r.eecs.berkeley.edu -n 1 $log) on the swarm box.

#14 Updated by Rick Pratt about 2 years ago

  • Subject changed from GDP Error 600 to GDP Error 600 (net3 router)

#15 Updated by Rick Pratt about 2 years ago

  • Subject changed from GDP Error 600 (net3 router) to GDP Error 600
  • Category set to GDP Version 0

#16 Updated by Eric Allman over 1 year ago

  • Project changed from Click GDP Routers to GDP
  • Category changed from GDP Version 0 to Routing Layer

#17 Updated by Eric Allman over 1 year ago

  • Status changed from Feedback to Closed
