Project

General

Profile

Bug #75

Subscription times out

Added by Nitesh Mor almost 4 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
gdplogd
Start date:
10/15/2016
Due date:
% Done:

0%


Description

As reported in #74, #55 and an email from Rico, it looks like there is a common bug, which is: subscriptions going away is no data is received for a while. Here's one way to reproduce it:

  1. Create a log logname using gcl-create
  2. Start a gdp-writer logname process and a separate gdp-reader -s -t logname
  3. Append some data using the running gdp-writer process reading from STDIN. It should appear in the gdp-reader process maintaining a subscription.
  4. Leave the processes running for a while and go drink some coffee.
  5. Come back and append some more data using the gdp-writer. Ideally it should be delivered to gdp-reader subscription, but no data is delivered.

Attached are log files from server, writer and subscriber. Note that this is in a setup with the older Python based router, so no issues with the router here (as noted in #74).

dead-subscription-subscriber.log Magnifier (43.3 KB) Nitesh Mor, 10/15/2016 03:56 PM

dead-subscription-writer.log Magnifier (44.1 KB) Nitesh Mor, 10/15/2016 03:56 PM

dead-subscription-server.log Magnifier (50.5 KB) Nitesh Mor, 10/15/2016 03:56 PM

new_clone_output.txt Magnifier (26.3 KB) Rico Maestro, 11/12/2016 02:11 AM

old_clone_output.txt Magnifier (31.6 KB) Rico Maestro, 11/12/2016 02:11 AM


Related issues

Related to GDP - Bug #74: Asyn Write/Subscribe Closed 10/14/2016
Related to GDP - Bug #55: GDP log subscribe hangs Closed 09/26/2016

History

#1 Updated by Nitesh Mor almost 4 years ago

  • Related to Bug #74: Asyn Write/Subscribe added

#2 Updated by Nitesh Mor almost 4 years ago

  • Related to Bug #55: GDP log subscribe hangs added

#3 Updated by Nitesh Mor almost 4 years ago

  • Category set to gdplogd
  • Status changed from New to In Progress

I have been able to narrow down the bug to subscription renewals in gdplogd.

Specifically, any post processing code assigned in cmd_subscribe in source:gdplogd/logd_proto.c never gets called in source:gdp/gdp_main.c#L213. See source:gdplogd/logd_proto.c#L911, req gets assigned to a new value in cmd_subscribe, but the caller (gdp_pdu_proc_cmd) still has the old reference.

#4 Updated by Nitesh Mor almost 4 years ago

  • Status changed from In Progress to Feedback

Hopefully fixed in commit:7bf6e9b2. It is a server side bug, and still needs to be deployed to production infrastructure.

#5 Updated by Rico Maestro almost 4 years ago

Hi Nitesh,

I have compiled the latest version of the gdplogd codes from the repo and tried if the bug is already fixed. The subscription still goes away after some time.

I tried using an old version of the client (cloned last Feb 2016) and in that version, the subscription does not go away. I've noticed some differences when the client is renewing the subscription (I'm assuming it is doing that, please correct me if I'm wrong). I've run the two lang/python/apps/reader_test_subscribe.py simultaneously. Portions of the debug outputs are shown below. This is after writing four times in the log, which the two programs were able to catch and print.

Debug output of latest clone:

*** Processing ack/nak 128=ACK_SUCCESS from socket 6
gdp_pdu_proc_resp(0x7f8651454f50 ACK_SUCCESS) gcl 0x7f8651554750
gdp_pdu_proc_resp: searching gcl 0x7f8651554750 for rid 2
gdp_req_dispatch >>> ACK_SUCCESS (128) [gcl->refcnt 9], req@0x7f8651454180:
nextrec=0, numrecs=0, chan=0x7f86514537d0
postproc=0x0, sub_cb=0x0, udata=0x0
state=WAITING, stat=OK [67896520 = 0x40c04c8]
act_ts=2016-11-12T09:51:20.479507000Z
flags=0x1d4
GCL@0x7f8651554750: F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
iomode = 1, refcnt = 9, reqs = 0x7f8651454180, nrecs = 475
flags = 0xa
PDU@0x7f8651521c70:
v=3, ttl=15, rsvd1=0, cmd=72=CMD_SUBSCRIBE
dst=F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
src=KAR3dp3v4hwbGif0-trmRwadHj0SpD3VdZGVKCgw5NQ
rid=2, olen=8, chan=0x7f86514537d0, seqno=0
flags=0x2
datum=0x7f8651454290, recno=476, dbuf=0x7f8651454310, dlen=4
ts=(none)
sigmdalg=0x0, siglen=0, sig=0x0
total header=96
rPDU@0x7f8651454f50:
v=3, ttl=15, rsvd1=0, cmd=128=ACK_SUCCESS
dst=KAR3dp3v4hwbGif0-trmRwadHj0SpD3VdZGVKCgw5NQ
src=F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
rid=2, olen=8, chan=0x7f86514537d0, seqno=0
flags=0x2
datum=0x7f8651549f30, recno=476, dbuf=0x7f8651453c00, dlen=0
ts=(none)
sigmdalg=0x0, siglen=0, sig=0x0
total header=96
ack
success: received ACK_SUCCESS for CMD_SUBSCRIBE
gdp_req_dispatch <<< ACK_SUCCESS [gcl->refcnt 9]
OK [67896520 = 0x40c04c8]
<<< gdp_invoke(0x7f8651454180 rid=2) CMD_SUBSCRIBE: OK [67896520 = 0x40c04c8]
req@0x7f8651454180:
nextrec=0, numrecs=0, chan=0x7f86514537d0
postproc=0x0, sub_cb=0x0, udata=0x0
state=ACTIVE, stat=OK [67896520 = 0x40c04c8]
act_ts=2016-11-12T09:51:20.479602000Z
flags=0x1d7<ASYNCIO,DONE,CLT
SUBSCR,PERSIST,ALLOC_RID,ON_GCL_LIST,ON_CHAN_LIST>
GCL@0x7f8651554750: F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
iomode = 1, refcnt = 9, reqs = 0x7f8651454180, nrecs = 475
flags = 0xa
PDU@0x7f8651454f50:
v=3, ttl=15, rsvd1=0, cmd=128=ACK
SUCCESS
dst=KAR3dp3v4hwbGif0-trmRwadHj0SpD3VdZGVKCgw5NQ
src=F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
rid=2, olen=8, chan=0x7f86514537d0, seqno=0
flags=0x2
datum=0x7f8651454290, recno=476, dbuf=0x7f8651454310, dlen=0
ts=(none)
sigmdalg=0x0, siglen=0, sig=0x0
total header=96

subscr_resub(F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c) ->
OK [67896520 = 0x40c04c8]

Debug output of clone from Feb 2016:

*** Processing ack/nak 128=ACK_SUCCESS from socket 6
gdp_req_dispatch >>> ACK_SUCCESS (128), req@0x7fab31480b80:
nextrec=0, numrecs=0, chan=0x7fab31518480
postproc=0x0, sub_cb=0x0, udata=0x0
state=WAITING, stat=OK [67896520 = 0x40c04c8]
act_ts=2016-11-12T09:51:19.020515000Z
flags=0x194
GCL@0x7fab31441ac0: F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
iomode = 1, refcnt = 8, reqs = 0x7fab31480b80, nrecs = 479
PDU@0x7fab31487d50:
v=3, ttl=15, rsvd1=0, cmd=72=CMD_SUBSCRIBE
dst=F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
src=pm2VIFGXIG8LeVOHFFznjawUjbpte0CsbDuF_GJeGtM
rid=2, olen=8, chan=0x7fab31518480, seqno=0
flags=0x2
datum=0x7fab3146e520, recno=480, dbuf=0x7fab3151c540, dlen=4
ts=(none)
sigmdalg=0x3, siglen=78, sig=0x7fab315219e0
total header=96
rPDU@0x7fab31519890:
v=3, ttl=15, rsvd1=0, cmd=128=ACK_SUCCESS
dst=pm2VIFGXIG8LeVOHFFznjawUjbpte0CsbDuF_GJeGtM
src=F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
rid=2, olen=8, chan=0x7fab31518480, seqno=0
flags=0x2
datum=0x7fab3147e290, recno=480, dbuf=0x7fab3148b280, dlen=0
ts=(none)
sigmdalg=0x3, siglen=78, sig=0x7fab315219e0
total header=96
ack
success: received 72
gdp_req_dispatch <<< ACK_SUCCESS
OK [67896520 = 0x40c04c8]
gdp_invoke(CMD_SUBSCRIBE) <<< OK [67896520 = 0x40c04c8]
req@0x7fab31480b80:
nextrec=0, numrecs=0, chan=0x7fab31518480
postproc=0x0, sub_cb=0x0, udata=0x0
state=ACTIVE, stat=OK [67896520 = 0x40c04c8]
act_ts=2016-11-12T09:51:19.020633000Z
flags=0x196
GCL@0x7fab31441ac0: F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
iomode = 1, refcnt = 8, reqs = 0x7fab31480b80, nrecs = 479
PDU@0x7fab31519890:
v=3, ttl=15, rsvd1=0, cmd=128=ACK
SUCCESS
dst=pm2VIFGXIG8LeVOHFFznjawUjbpte0CsbDuF_GJeGtM
src=F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c
rid=2, olen=8, chan=0x7fab31518480, seqno=0
flags=0x2
datum=0x7fab3146e520, recno=480, dbuf=0x7fab3148b280, dlen=0
ts=(none)
sigmdalg=0x3, siglen=78, sig=0x7fab315219e0
total header=96

subscr_resub(F7Tz7h_UoOCrUt2JV55UvJuyQHlAyzSJsYcawwRf66c) ->
OK [67896520 = 0x40c04c8]

After this part, I wrote in the log again and the program from the latest clone was not able to get the newly appended data.

Some differences I've noticed (although I don't know if these are actually intended due to updates):
1. The "recno" did not change in the latest program even after adding additional data on the log.
2. The "flags" are not the same.
3. The fields "sigmdalg", "siglen" and "sig" are zero in the latest program.

Hope this helps.

#6 Updated by Eric Allman almost 4 years ago

  • Assignee changed from Nitesh Mor to Eric Allman

Rico, I'm not able to reproduce this on my latest code version (and yes, there have been some changes to the way subscriptions are processed). Please run the command

strings gdplogd | grep '@(#)[gl]'

You should get output that looks something like:

@(#)gdplogd 0.7.2 (2016-11-11 13:39) 029cf0078995be366ca1e4c2b1d8d8f9e5322bc6++
@(#)libgdp 0.7.2 (2016-11-11 13:40) 029cf0078995be366ca1e4c2b1d8d8f9e5322bc6++

The magic numbers are the git revision. These will let me determine what version you are running.

#7 Updated by Rico Maestro almost 4 years ago

Hi Eric. Here is the output of the command in our server.

@(#)gdplogd 0.7.2 (2016-11-12 11:53) 4f976ad9f75dad291b4f4516b5ddee4025503957
@(#)libgdp 0.7.2 (2016-11-12 11:53) 4f976ad9f75dad291b4f4516b5ddee4025503957

#8 Updated by Eric Allman almost 4 years ago

I'm completely stumped — I can't reproduce this at all, at least using the current code (I tried both the latest gdplogd and the ones running on our public servers). It's possible that older clients are broken, but Rico seems to be running the latest version of the code. I'll keep looking, but I'm running out of ideas.

#9 Updated by Rico Maestro almost 4 years ago

Hi Eric,

After rereading the documentations for installing and running the log server, and checking if I've followed them properly, I think I've found my mistake. I followed everything in the documents. However, since we're migrating to a new hardware, I decided to copy the folders in /var/swarm/gdp/gcls from our old log serve. The problem is that I forgot to change the owner and the permissions for those folders. And since the GCL I'm using for testing is an old GCL, it is in the folders with wrong permissions. After changing everything to the correct owner and permissions, the subscriptions are now functioning properly. Apologies for this mistake and thanks for your patience in this matter.

#10 Updated by Eric Allman almost 4 years ago

  • Status changed from Feedback to Closed

Rico, I am relieved that the problem's been tracked down. Good sleuthing job, and thanks for letting us know.

Also available in: Atom PDF