Project

General

Profile

Bug #77

GDP Log Server Crashes when we end subscription

Added by Syed Hasnain over 3 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Start date:
10/16/2016
Due date:
% Done:

0%


Description

If we end the gdp-reader process after subscribing to log using ctrl+c, the log server/GDP stops working. If multiple programs subscribe to a log and only one does that, same result. Are we not supposed to exit a process that way ?

The writer part hangs and displays this error message

Assertion failed at gdp_gcl_ops.c:514: require:
(gcl) != NULL && EP_UT_BITSET(GCLF_INUSE, (gcl)->flags)
Aborted (core dumped)

output.txt Magnifier (35.3 KB) Syed Hasnain, 10/16/2016 05:20 PM

router.txt Magnifier (3.88 KB) Syed Hasnain, 10/17/2016 01:36 PM

gdplogd-log.txt Magnifier (458 KB) Syed Hasnain, 10/17/2016 01:36 PM


Related issues

Duplicated by GDP - Bug #78: A routing failure causes gdplogd to become non-responsive. Closed 10/17/2016

History

#1 Updated by Nitesh Mor over 3 years ago

I am sorry, you have to be more specific. It is most certainly an annoying bug, but I can not reproduce it as it is.

  • What version of code are you running?
  • What do you mean when you say "log server/GDP" stops working?
  • What steps can one take in order to reproduce the said bug?

I hate to say it, but please read this write-up on how to report bugs effectively.

#2 Updated by Syed Hasnain over 3 years ago

  1. If the release notes are correct it seems I am running release 0.4.0. However I did a pull from repo earlier today. Is there some other way to check the version?

  2. The writer program is writing to multiple logs, once multiple clients subscribe to them, we can read the data correctly. However, when one client ends the subscription by killing the process, the other client sees one new record on that log, before everything seems to hang. The writer can't write from there on either. To fix the problem we have to restart the gdplogd service.

  3. To reproduce the problem, start gdp-writer. Start a client program to read from that log (in our case always on different devices), kill the subscription of one client. The writer program hangs and displays this error:
    Assertion failed at gdp_gcl_ops.c:514: require:
    (gcl) != NULL && EP_UT_BITSET(GCLF_INUSE, (gcl)->flags)
    Aborted (core dumped)

#3 Updated by Nitesh Mor over 3 years ago

If you are running version 0.4.0, that would be severely outdated. Since you are running from the source code, a git commit number would be helpful.

In the source repository, git show will show you the commit number.

$ git show
commit 7bf6e9b20d68d4962be0a18805ba1e53b34939ab
Author: Nitesh Mor <mor.nitesh@gmail.com>
Date:   Sat Oct 15 22:32:24 2016 -0700
...

In order to make sure that your binaries are compiled from that specific code, you can pass an additional command line argument -D*=10 (or similar). For instance, if you run gdp-reader -D *=10 <remaining arguments>, the first two lines will tell you the git commit. See for an example (the long nonsensical string is the git commit):

_gdp_lib_init(NULL)
    @(#)libgdp 0.7.2 (2016-10-16 15:01) 7bf6e9b20d68d4962be0a18805ba1e53b34939ab++

In case of log-server, such commit number will in /var/log/gdp/gdplogd.log right after you restart the log-server (assuming you are already running it with something like D*=xx. One way to check this would be to look into the output of ps -ef | grep gdplogd.

If you find out the version of the code on all devices that are interacting (readers, writers, log-servers), that will be helpful.

I can not seem to reproduce this with multiple subscribers either (I am on commit 7bf6e9b20d68d4962be0a18805ba1e53b34939ab). The reason I believe you are running an older version of code somewhere is that the the assertion failure was fixed in one of the recent commits.

#4 Updated by Nitesh Mor over 3 years ago

I just saw the gdplogd.log file in #76. You are definitely running an older version of log-server (git commit 2023d77857ad557a53cf8d9b044ed94e7c0228f2). Eric and I have pushed some fixes for the log-server after that. This makes me believe that your clients aren't up to date either.

Let's make sure you are running the latest version of code on your server and clients. Please report back if you can still reproduce this bug after updating.

#5 Updated by Syed Hasnain over 3 years ago

Ok that might be the problem. The client side on our swarmboxes seem to be updated.
For gdp-reader

_gdp_lib_init(NULL)
        @(#)libgdp 0.7.2 (2016-10-16 17:26) 7bf6e9b20d68d4962be0a18805ba1e53b34939ab++

However, the client side on Viswam's laptop was not, I have updated that.

For the server side, I did git pull in top directory followed by make install

sbuser@TamuSB003:~/gdp$ git show
commit 7bf6e9b20d68d4962be0a18805ba1e53b34939ab
Author: Nitesh Mor <mor.nitesh@gmail.com>
Date:   Sat Oct 15 22:32:24 2016 -0700

However, when I restart the service the log still sees the same old commit 
2016-10-16 23:43:10.530842 2016-10-16 16:43:10 -0700 Running /usr/sbin/gdplogd -D50
2016-10-16 23:43:10.532486 _gdp_lib_init(NULL)
2016-10-16 23:43:10.532500      @(#)libgdp 0.7.2 (2016-10-14 07:32) 2023d77857ad557a53cf8d9b044ed94e7c0228f2
2016-10-16 23:43:10.532757 Setting my name:
2016-10-16 23:43:10.532767      edu.tamu.TamuSB003.gdplogd
2016-10-16 23:43:10.532770      GFh8bl8BQXeuvUYHu-z7mhS7mRMdKj2bmUvaXkwcjMM
2016-10-16 23:43:10.532773 My GDP routing name = GFh8bl8BQXeuvUYHu-z7mhS7mRMdKj2bmUvaXkwcjMM

Is there any step in between that I am missing?

Thank you for all the help Nitesh, this seems to be a naive issue, probably at my end.

#6 Updated by Nitesh Mor over 3 years ago

Let's try updating the log-server one more time. Assuming you are in the top directory of gdp source, can you do the following and post the output on the console:

$ sudo service gdplogd stop
$ sudo killall -9 gdplogd    # Just to make sure it is not running
$ sudo sh -c 'echo "GDPLOGD_ARGS=-D50" > /usr/local/etc/gdp.conf.sh'
$ git pull
$ git describe
$ sudo make install
$ sudo service gdplogd start
$ ps -ef | grep gdplogd
$ grep -A 2 -B 2 _gdp_lib_init /var/log/gdp/gdplogd.log

#7 Updated by Syed Hasnain over 3 years ago

It doesn't seem to work. Even after doing that it seems I am running the server still at commit

2016-10-17 00:16:14.151889 2016-10-16 17:16:14 -0700 Running /usr/sbin/gdplogd -D50
2016-10-17 00:16:14.154347 _gdp_lib_init(NULL)
2016-10-17 00:16:14.154365      @(#)libgdp 0.7.2 (2016-10-14 07:32) 2023d77857ad557a53cf8d9b044ed94e7c0228f2
2016-10-17 00:16:14.154717 Setting my name:
2016-10-17 00:16:14.154731      edu.tamu.TamuSB003.gdplogd
2016-10-17 00:16:14.154736      GFh8bl8BQXeuvUYHu-z7mhS7mRMdKj2bmUvaXkwcjMM
2016-10-17 00:16:14.154740 My GDP routing name = GFh8bl8BQXeuvUYHu-z7mhS7mRMdKj2bmUvaXkwcjMM
2016-10-17 00:16:14.154791 gdp_lib_init: OK

Additionally the result of git show is :

sbuser@TamuSB003:~/gdp$ git show
commit 7bf6e9b20d68d4962be0a18805ba1e53b34939ab
Author: Nitesh Mor <mor.nitesh@gmail.com>
Date:   Sat Oct 15 22:32:24 2016 -0700
......

Attached is the complete output as on console.

#8 Updated by Nitesh Mor over 3 years ago

Ok, let us try one more time. At least it is now running with higher debug setting. Could you run the following?

$ sudo service gdplogd stop
$ sudo killall -9 gdplogd        # Just to make sure it is not running
$ strings gdplogd/gdplogd    # Check if we the binary gets rebuilt at all or not
$ make clean                        # Just clean up everything
$ make                                  # this will make everything from scratch
$ strings gdplogd/gdplogd | grep libgdp              # make sure we have the newest version
$ sudo make install
$ sudo service gdplogd start
$ ps -ef | grep gdplogd
$ grep -A 2 -B 2 _gdp_lib_init /var/log/gdp/gdplogd.log

Thanks for your patience.

#9 Updated by Nitesh Mor over 3 years ago

  • Assignee set to Nitesh Mor

#10 Updated by Nitesh Mor over 3 years ago

Just checking in: did updating to the latest version of code solve your issue?

#11 Updated by Syed Hasnain over 3 years ago

Yes I did, the log for gdplogd confirms that It is using the latest commit. However, the problem seems to be there. Let me send you the logs and details in a short while. Just making sure everything is ok on my side

#12 Updated by Syed Hasnain over 3 years ago

gdplogd was updated to latest commit. I still see the problem. The gdp-reader is also the latest commit. gdp-writer is modified version of the original code that reads input from udp port. I have the following issue now.

I start the program to write to GDP (This program alone works correctly). The program writes to multiple logs. Separately I start the gdp-reader with subscription mode (-s flag). After reading certain data if I exit by killing process (ctrl + c), a few things happen. gdp-writer hangs for a certain time, doesn't read any data, after that I read data but I am not sure if it is being written to log. If I restart the program, it stays hung at opening first log and resturns:

sbuser@swarmnuc015:~/gdp/apps$ sudo ./gdp-writer 4570         #The input argument is the port number 
[sudo] password for sbuser:

Opening log tamu.watch.2KMX:
2016-10-17 15:43:16.809999 -0500 gdp-writer: _gdp_req_freeall: couldn't acquire req lock: ERROR: request freed while in use [Berkeley:Swarm-GDP:31]

gdp-writer: exiting with status ERROR: Connection timed out [EPLIB:errno:110]

If I try to subscribe to the log again, while gdp-writer is till running I can't. If I try to read log simply from beginning I can't. The gdp-reader returns:

sbuser@swarmnuc015:~$ gdp-reader -D *=10 tamu.watch.2KMX
_gdp_lib_init(NULL)
        @(#)libgdp 0.7.2 (2016-10-17 11:07) 7bf6e9b20d68d4962be0a18805ba1e53b349                                                                                                             39ab
My GDP routing name = -gDL2gYy5PJbdXXdvSvqGST7K_5P72mlM2lTFZaXbnQ
gdp_lib_init: OK
Trying Zeroconf:
: Cache exhausted
: All for now
_gdp_chan_open(165.91.121.43)
Trying 165.91.121.43
_gdp_chan_open: talking to router at 165.91.121.43:8007
gdp_init: OK
Reading GCL 107OxcL32wGOxpz7Q_NVoJ2kQMIC93pP-Is584lzODE

>>> _gdp_invoke(req=0xb14a80 rid=1): CMD_OPEN_RO (68), gcl@0xb13490
<<< _gdp_invoke(0xb14a80 rid=1) CMD_OPEN_RO: ERROR: Connection timed out [EPLIB:errno:110]
Couldn't open GCL 107OxcL32wGOxpz7Q_NVoJ2kQMIC93pP-Is584lzODE:
        ERROR: Connection timed out [EPLIB:errno:110]
2016-10-17 15:46:26.022126 -0500 gdp-reader: _gdp_req_freeall: couldn't acquire req lock: ERROR: request freed while in use [Berkeley:Swarm-GDP:31]

<<< _gdp_req_freeall(0xb134d0): ERROR: request freed while in use [Berkeley:Swarm-GDP:31]
<<< gdp_gcl_open(107OxcL32wGOxpz7Q_NVoJ2kQMIC93pP-Is584lzODE): ERROR: Connection timed out [EPLIB:errno:110]
[ERROR] gdp-reader: Cannot open GCL:
    ERROR: Connection timed out [EPLIB:errno:110]
exiting after 0 records with status ERROR: Connection timed out [EPLIB:errno:110]

To fix the issue I have to restart the gdplogd service

I have attached the corresponding logs.

By the way, do you recommend that I switch to Berkeley setup for now? If it is a problem at my side, I don't want to crash your system too.

#13 Updated by Nitesh Mor over 3 years ago

  • Status changed from New to In Progress

#14 Updated by Syed Hasnain over 3 years ago

just checking if you were able to reproduce it?

#15 Updated by Nitesh Mor over 3 years ago

Yes, I can reproduce it now. What I see different in your logs that I didn't see earlier on my side was a NAK_R_NOROUTE, primarily because I use a different version of the router for testing and development. It appears that the behavior of two routers is different, which triggered this bug for you but not for me.

#16 Updated by Nitesh Mor over 3 years ago

  • Duplicated by Bug #78: A routing failure causes gdplogd to become non-responsive. added

#17 Updated by Nitesh Mor over 3 years ago

A more accurate description of the problem in #78

#18 Updated by Eric Allman over 3 years ago

  • Status changed from In Progress to Resolved

It looks like this is the same problem as #78, which is closed. Syed, could you see if this is fixed? You may need to run it against the latest version of gdplogd (our servers are still a bit behind). In the meantime I'm marking this as "Resolved" (but not Closed).

#19 Updated by Eric Allman about 3 years ago

  • Status changed from Resolved to Closed

Closing because no further comments have arrived.

Also available in: Atom PDF