Bug #40
gcl-create calls abort: there should be a version of _gdp_gcl_decref that does not invoke GDP_ASSERT_GOOD_GCL
Description
I suspect that gdp-03.eecs.berkeley.edu might be down.
The more immediate symptom is that gcl-create is failing with an abort.
bash-3.2$ /Users/cxh/ptII/vendors/gdp/gdp/apps/gcl-create -k none -D '*=70' -s edu.berkeley.eecs.gdp-01.gdplogd cxh.test11 >& ~/Downloads/gcl-create.log.txt
Abort trap: 6
bash-3.2$
I added some fprintfs to the end of _gdp_gcl_create():
fail0:
    fprintf(stderr, "%s:%d:_gdp_gcl_create: fail0!\n", __FILE__, __LINE__);
    if (gcl != NULL) {
        fprintf(stderr, "%s:%d:_gdp_gcl_create: fail0, calling _gdp_gcl_decref\n", __FILE__, __LINE__);
        _gdp_gcl_decref(&gcl);
    }
    if (req != NULL) {
        fprintf(stderr, "%s:%d:_gdp_gcl_create: fail0, calling _gdp_recfref\n", __FILE__, __LINE__);
        _gdp_req_free(&req);
    }
    {
        char ebuf[100];
        ep_dbg_cprintf(Dbg, 8, "Could not create GCL: %s\n",
                ep_stat_tostr(estat, ebuf, sizeof ebuf));
    }
    fprintf(stderr, "%s:%d:_gdp_gcl_create: fail0, returning estat\n", __FILE__, __LINE__);
    return estat;
}
During the run, it looks like the gcl is not null, but it is not valid?
gdp_invoke(CMD_CREATE) <<< ERROR: Operation timed out [EPLIB:errno:60]
req@0x7ffe895007b0:
nextrec=0, numrecs=0, chan=0x7ffe89604f10
postproc=0x0, sub_cb=0x0, udata=0x0
state=ACTIVE, stat=OK
act_ts=1970-01-01T00:00:00.000000000Z
flags=0x140<ALLOC_RID,ON_CHAN_LIST>
GCL@0x0: NULL
PDU@0x7ffe89403530:
v=3, ttl=15, rsvd1=0, cmd=66=CMD_CREATE
dst=m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U
src=fb7vsMiCnmvxIssRD0H5RXoYa9QZeAvl9b92lkyJt5g
rid=2, olen=0, chan=0x0, seqno=0
flags=0
datum=0x7ffe894039c0, recno=(none), dbuf=0x7ffe89403a40, dlen=117
ts=(none)
sigmdalg=0x0, siglen=0, sig=0x0
total header=80
gdp_gcl_ops.c:283:_gdp_gcl_create: fail0!
gdp_gcl_ops.c:285:_gdp_gcl_create: fail0, calling _gdp_gcl_decref
_gdp_gcl_decref(0x7ffe89403f50)...
Assertion failed at gdp_gcl_cache.c:466: require: (gcl) != NULL && EP_UT_BITSET(GCLF_INUSE, (gcl)->flags)
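There should be a version of _gdp_gcl_decref that does not invoke GDP_ASSERT_GOOD_GCL(gcl), so that the failure cleanup path can release a GCL that is not marked in use without aborting.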
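A minimal sketch of what such a variant might look like, using simplified stand-in types. `mock_gcl_t`, `mock_gcl_is_good`, `mock_gcl_decref_checked`, and the value chosen for `GCLF_INUSE` are all hypothetical and are not the libgdp definitions; only the tested condition mirrors the `require` expression in the assertion message above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed flag value, for illustration only; not taken from libgdp. */
#define GCLF_INUSE 0x0002u

/* Hypothetical stand-in for the real GCL handle, which has more fields. */
typedef struct {
    unsigned int flags;
    int refcnt;
} mock_gcl_t;

/* The condition the failing assertion requires, expressed as a predicate. */
bool mock_gcl_is_good(const mock_gcl_t *gcl)
{
    return gcl != NULL && (gcl->flags & GCLF_INUSE) != 0;
}

/*
 * Hypothetical non-asserting decref. If the handle is NULL or not marked
 * in use, it logs, leaves everything untouched, and returns false instead
 * of aborting. Otherwise it drops one reference, clears the caller's
 * pointer, and returns true.
 */
bool mock_gcl_decref_checked(mock_gcl_t **gclp)
{
    if (gclp == NULL || !mock_gcl_is_good(*gclp)) {
        fprintf(stderr, "decref: GCL is NULL or not in use, skipping\n");
        return false;
    }
    if ((*gclp)->refcnt > 0)
        (*gclp)->refcnt--;
    *gclp = NULL;
    return true;
}
```

With something along these lines, the fail0 path could call the checked variant and carry on returning estat even when the GCL was never fully set up.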
Attached is the complete log of the command.
In general, having assertions that call abort makes it very difficult to write robust IoT code.
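One possible alternative, sketched here under assumptions, is a "soft" require that logs the failed condition and returns an error status to the caller rather than calling abort. `SOFT_REQUIRE`, `soft_stat_t`, and `close_handle` are hypothetical names made up for this illustration; they are not part of libgdp or libep.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status type; the real library uses EP_STAT. */
typedef enum { STAT_OK = 0, STAT_ASSERT_FAILED = 1 } soft_stat_t;

/*
 * Hypothetical non-aborting require: on failure, report the condition
 * in the same style as the existing assertion message, then return
 * failstat from the enclosing function.
 */
#define SOFT_REQUIRE(cond, failstat)                                      \
    do {                                                                  \
        if (!(cond)) {                                                    \
            fprintf(stderr, "Assertion failed at %s:%d: require: %s\n",   \
                    __FILE__, __LINE__, #cond);                           \
            return (failstat);                                            \
        }                                                                 \
    } while (0)

/* Example caller: a bad handle yields an error status, not a crash. */
soft_stat_t close_handle(const int *handle)
{
    SOFT_REQUIRE(handle != NULL, STAT_ASSERT_FAILED);
    SOFT_REQUIRE(*handle >= 0, STAT_ASSERT_FAILED);
    return STAT_OK;
}
```

The trade-off is that callers must check the returned status, but a long-running application keeps control when a precondition is violated.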
Related issues
History
#1 Updated by Nitesh Mor almost 7 years ago
- Assignee set to Eric Allman
Christopher Brooks wrote:
I suspect that gdp-03.eecs.berkeley.edu might be down.
gdp-03 is working fine (just checked). The problem is with the GDP router. gcl-create works if you connect directly to the router on gdp-03. The following works fine:
$ ./gcl-create -D*=10 -k none -G gdp-03.eecs.berkeley.edu -s edu.berkeley.eecs.gdp-03.gdplogd edu.berkeley.eecs.mor.aug14.test1
But going to gdp-03 via gdp-01 results in the assertion error:
$ ./gcl-create -D*=10 -k none -s edu.berkeley.eecs.gdp-03.gdplogd edu.berkeley.eecs.mor.aug14.test2
_gdp_chan_open(gdp-01.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu; gdp-03.eecs.berkeley.edu; gdp-04.eecs.berkeley.edu)
Trying gdp-01.eecs.berkeley.edu
Talking to router at gdp-01.eecs.berkeley.edu:8007
_gdp_advertise => OK
gdp_init: OK
Couldn't open GCL 0aZ_mffIWtEZQ6CLTpEsA2oQ9a0b54aMgKYxI0ZUePc:
ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
Creating log as mor@io
_gdp_req_unsend: req 0x191fef0 has NULL GCL
_gdp_req_unsend: req 0x191fef0 has NULL GCL
Assertion failed at gdp_gcl_cache.c:466: require: (gcl) != NULL && EP_UT_BITSET(GCLF_INUSE, (gcl)->flags)
Aborted
Regardless, the assertion error still stands.
#2 Updated by Anonymous almost 7 years ago
Is gdp-03 up? I'm getting failures from both Mac OS and RHEL:
~~~
bash-3.2$ ./gcl-create -D*=10 -k none -G gdp-03.eecs.berkeley.edu -s edu.berkeley.eecs.gdp-03.gdplogd edu.berkeley.eecs.cxh.aug15.test2
_gdp_chan_open(gdp-03.eecs.berkeley.edu)
Trying gdp-03.eecs.berkeley.edu
Talking to router at gdp-03.eecs.berkeley.edu:8007
_gdp_advertise => OK
gdp_init: OK
Couldn't open GCL QU606BrplvT6p4HCZ5exUwhNUonhm8CyO6BQGCzrumE:
ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
Creating log as cxh@terramac1.local
gdp_gcl_ops.c:250:_gdp_gcl_create: create a new pseudo-GCL for the daemon so we can correlate the results
gdp_gcl_ops.c:255:_gdp_gcl_create: create the request
gdp_gcl_ops.c:260:_gdp_gcl_create: send the name of the log to be created in the payload
gdp_gcl_ops.c:264:_gdp_gcl_create: add the metadata to the output stream
gdp_gcl_ops.c:268:_gdp_gcl_create: about to call gdp_invoke()
_gdp_req_unsend: req 0x7feb12e00360 not on GCL list
gdp_pdu_proc_resp: no req for incoming response
PDU@0x7feb12d058e0:
v=2, ttl=0, rsvd1=0, cmd=240=NAK_R_NOROUTE
dst=C584dLeKIDgrPdi2ZemGCp4kLXl1RDMz7FFjIK6ypNM
src=ra41V0YihrYsPkUji64k0Z_i6KMfgOQcbUmHDo4-vMs
rid=2, olen=0, chan=0x7feb12d04f50, seqno=0
flags=0
datum=0x7feb12d06020, recno=(none), dbuf=0x7feb12d060a0, dlen=0
ts=(none)
sigmdalg=0x0, siglen=0, sig=0x0
total header=80
GCL@0x7feb12f005b0: ra41V0YihrYsPkUji64k0Z_i6KMfgOQcbUmHDo4-vMs
iomode = 0, refcnt = 2, reqs = 0x0, nrecs = 0
flags = 0xa
freefunc = 0x0, gclmd = 0x0, digest = 0x0, x = 0x0
utime = 2016-08-15T22:34:26
_gdp_req_unsend: req 0x7feb12e00360 has NULL GCL
gdp_pdu_proc_resp: no req for incoming response
PDU@0x7feb12d058e0:
v=2, ttl=0, rsvd1=0, cmd=240=NAK_R_NOROUTE
dst=C584dLeKIDgrPdi2ZemGCp4kLXl1RDMz7FFjIK6ypNM
src=ra41V0YihrYsPkUji64k0Z_i6KMfgOQcbUmHDo4-vMs
rid=2, olen=0, chan=0x7feb12d04f50, seqno=0
flags=0
datum=0x7feb12d06020, recno=(none), dbuf=0x7feb12d060a0, dlen=0
ts=(none)
sigmdalg=0x0, siglen=0, sig=0x0
total header=80
GCL@0x7feb12f005b0: ra41V0YihrYsPkUji64k0Z_i6KMfgOQcbUmHDo4-vMs
iomode = 0, refcnt = 3, reqs = 0x0, nrecs = 0
flags = 0xa
freefunc = 0x0, gclmd = 0x0, digest = 0x0, x = 0x0
utime = 2016-08-15T22:34:41
_gdp_req_unsend: req 0x7feb12e00360 has NULL GCL
gdp_gcl_ops.c:283:_gdp_gcl_create: fail0!
gdp_gcl_ops.c:285:_gdp_gcl_create: fail0, calling _gdp_gcl_decref
gdp_gcl_ops.c:289:_gdp_gcl_create: fail0, calling _gdp_recfref
Could not create GCL: ERROR: Operation timed out [EPLIB:errno:60]
gdp_gcl_ops.c:299:_gdp_gcl_create: fail0, returning estat
gdp_api.c:313:gdp_gcl_create: _gdp_gcl_create() returned
gdp_api.c:318:gdp_gcl_create: about to print estat
gdp_gcl_create: ERROR: Operation timed out [EPLIB:errno:60]
exiting with status ERROR: Operation timed out [EPLIB:errno:60]
bash-3.2$
~~~
#3 Updated by Anonymous almost 7 years ago
- Category set to libgdp
- Priority changed from Normal to Low
I can't reproduce this. It was probably because some of the daemons were not working, or else there was a fix. I'm moving this to low priority. It could be closed, though the bug might still exist. I don't seem to have the ability to close bugs?
#4 Updated by Anonymous almost 7 years ago
- Assignee changed from Eric Allman to Anonymous
Assigning this to myself to see if I can get a close button.
#5 Updated by Anonymous almost 7 years ago
- Status changed from New to Closed
- Assignee changed from Anonymous to Eric Allman
I'm going to close this because I can no longer reproduce it. The bug probably still persists though.
#6 Updated by Anonymous almost 7 years ago
I can't reopen this, but it is happening again:
bash-3.2$ $PTII/vendors/gdp/gdp/apps/gcl-create -D*=10 -k none -G gdp-01.eecs.berkeley.edu -s edu.berkeley.eecs.gdp-03.gdplogd $LOGNAME.5
_gdp_lib_init(NULL)
@(#)libgdp 0.7.0 (2016-08-22 09:08) 88929fffe1c744d41409d42683e4760b25d2102e
My GDP routing name = _TdAJ6UhFwCjwkxFyEWeZbYrSYukss1su42ED0sTZRU
gdp_lib_init: OK
_gdp_chan_open(gdp-01.eecs.berkeley.edu)
Trying gdp-01.eecs.berkeley.edu
_gdp_chan_open: talking to router at gdp-01.eecs.berkeley.edu:8007
_gdp_advertise => OK
gdp_init: OK
Couldn't open GCL ZljkBm2KG9yw17pzPnRg8GEC9owAJWXx4DtQi30Cdfc:
ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
Creating log as cxh@terramac1.local
_gdp_req_unsend: req 0x7fd2c1408a80 has NULL GCL
_gdp_req_unsend: req 0x7fd2c1408a80 has NULL GCL
Assertion failed at gdp_gcl_cache.c:466: require: (gcl) != NULL && EP_UT_BITSET(GCLF_INUSE, (gcl)->flags)
Abort trap: 6
bash-3.2$
#7 Updated by Eric Allman almost 7 years ago
- Status changed from Closed to In Progress
#8 Updated by Eric Allman over 6 years ago
- Status changed from In Progress to Resolved
I believe this is related to (or the same as) issue #83, which has been fixed.
#9 Updated by Eric Allman over 6 years ago
- Related to Bug #83: Assertions should not crash the calling process: assertion in gdp_gcl_close() causes the application to exit added
#10 Updated by Eric Allman about 6 years ago
- Status changed from Resolved to Closed
Closed due to duplication (#83).