Feature #81
Need a way to unsubscribe from subscriptions
% Done: 100%
Description
I think we may need a way to unsubscribe from a subscription.
gdplogd/logd_proto.c invokes a private _gdp_gcl_unsubscribe(); shouldn't gdp-reader.c do something similar?
It looks like gdp-reader is leaking memory when subscribing, part of which could be from not closing the subscription.
Also, how do I add Edward to the list of interested people for this issue?
Details below.
Edward wrote:
This subscription demo now works:
$PTII/org/terraswarm/accessor/demo/WatchHandRaisingLight/GDPSubscribe.xml
One thing you could help with would be to figure out how to properly
shut down a GDP subscription. GDPHelper has this:
public void unsubscribe() {
    // FIXME: Properly close the C side. How to do that?
    _subscribed = false;
}
The subscription keeps running after the model stops, which is a
problem. Maybe the cause of some deadlocks I've seen...
gdp/doc/gdp-programmatic-api.html says:
Name
gdp_gcl_unsubscribe — Unsubscribe from a GCL [NOT IMPLEMENTED; this interface is probably wrong]
Synopsis
EP_STAT gdp_gcl_unsubscribe(gdp_gcl_t *gcl,
                            gdp_event_cbfunc_t *cbfunc,
                            void *udata)
Notes
Unsubscribes to the given {gcl, cbfunc, udata} tuple.
(Should subscribe/multiread return a handle to be passed to this function rather than this interface?)
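For what it's worth, the handle-returning alternative hinted at in that parenthetical might look roughly like the sketch below. Everything in it (the gdp_subscription_t type and both function names) is hypothetical and does not exist in the tree; it is only meant to make the design question concrete.

// hypothetical opaque handle returned by a subscribe call
typedef struct gdp_subscription gdp_subscription_t;

// hypothetical: like gdp_gcl_subscribe, but hands back a handle for later cancellation
EP_STAT gdp_gcl_subscribe_handle(gdp_gcl_t *gcl,
                gdp_recno_t start,
                int32_t numrecs,
                gdp_event_cbfunc_t cbfunc,
                void *udata,
                gdp_subscription_t **subp);

// hypothetical: cancel exactly the subscription identified by the handle
EP_STAT gdp_subscription_cancel(gdp_subscription_t *sub);

With a handle, the caller would not need to reproduce the {gcl, cbfunc, udata} tuple at unsubscribe time.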
There is no public gdp_gcl_unsubscribe, but gdp/gdp_subscr.c has:
/*
**  Unsubscribe all requests for a given gcl and destination.
*/

EP_STAT
_gdp_gcl_unsubscribe(gdp_gcl_t *gcl, gdp_name_t dest)
{
	EP_STAT estat;
	gdp_req_t *req;
	gdp_req_t *nextreq;

	do
	{
		estat = EP_STAT_OK;
		for (req = LIST_FIRST(&gcl->reqs); req != NULL; req = nextreq)
		{
			estat = _gdp_req_lock(req);
			EP_STAT_CHECK(estat, break);
			nextreq = LIST_NEXT(req, gcllist);
			if (!GDP_NAME_SAME(req->pdu->dst, dest))
			{
				_gdp_req_unlock(req);
				continue;
			}

			// remove subscription for this destination
			EP_ASSERT_INVARIANT(req->gcl == gcl);
			LIST_REMOVE(req, gcllist);
			_gdp_req_free(&req);
		}
	} while (!EP_STAT_ISOK(estat));

	return estat;
}
It looks like gdp/gdplogd/logd_proto.c has a cmd_close() function that invokes _gdp_gcl_unsubscribe():
/*
**  CMD_CLOSE --- close an open GCL
**
**	XXX Since GCLs are shared between clients you really can't just
**	close things willy-nilly.  Thus, this is currently a no-op
**	until such time as reference counting works.
**
**	XXX We need to have a way of expiring unused GCLs that are not
**	closed.
*/

EP_STAT
cmd_close(gdp_req_t *req)
{
	EP_STAT estat = EP_STAT_OK;

	req->pdu->cmd = GDP_ACK_SUCCESS;

	// should have no input data; ignore anything there
	flush_input_data(req, "cmd_close");

	// a bit wierd to open the GCL only to close it again....
	estat = get_open_handle(req, GDP_MODE_ANY);
	if (!EP_STAT_ISOK(estat))
	{
		return gdpd_gcl_error(req->pdu->dst, "cmd_close: GCL not open",
				estat, GDP_STAT_NAK_BADREQ);
	}

	// remove any subscriptions
	_gdp_gcl_unsubscribe(req->gcl, req->pdu->src);

	// return number of records
	req->pdu->datum->recno = req->gcl->nrecs;

	if (ep_dbg_test(Dbg, 20))
	{
		char ebuf[100];
		ep_dbg_printf("<<< cmd_close(%s): %s\n", req->gcl->pname,
				ep_stat_tostr(estat, ebuf, sizeof ebuf));
	}

	// drop reference
	_gdp_gcl_decref(&req->gcl);
	return estat;
}
It looks like gdp-reader is leaking memory during subscription; it could be that things are not being closed.
I don't think it is worth fixing memory leaks this close to the annual meeting, but the leaks do indicate that we are not freeing resources.
To replicate this, under RHEL, I created a log ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245
and then ran
valgrind --leak-check=full gdp-reader -n 1 -s ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245
Then, on a different machine, I ran
echo "bar" | $PTII/vendors/gdp/gdp/apps/gdp-writer -G gdp-03.eecs.berkeley.edu ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245
The output of valgrind shows:
==6724== LEAK SUMMARY:
==6724==    definitely lost: 67 bytes in 3 blocks
==6724==    indirectly lost: 0 bytes in 0 blocks
==6724==      possibly lost: 33,923 bytes in 11 blocks
==6724==    still reachable: 30,858 bytes in 640 blocks
==6724==         suppressed: 0 bytes in 0 blocks
==6724== Reachable blocks (those to which a pointer was found) are not shown.
==6724== To see them, rerun with: --leak-check=full --show-reachable=yes
It looks like we are leaking around subscription creation. The 560-byte "possibly lost" block below is thread-local storage allocated by pthread_create() inside _gdp_gcl_subscribe(), which suggests the subscription thread is never shut down or cleaned up:
==6724== 560 bytes in 1 blocks are possibly lost in loss record 634 of 646
==6724==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==6724==    by 0x344A611A82: _dl_allocate_tls (in /lib64/ld-2.12.so)
==6724==    by 0x344B2072CC: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==6724==    by 0x40FFE5: _gdp_gcl_subscribe (gdp_subscr.c:252)
==6724==    by 0x405BE0: gdp_gcl_subscribe (gdp_api.c:472)
==6724==    by 0x405465: do_multiread (gdp-reader.c:238)
==6724==    by 0x405939: main (gdp-reader.c:499)
For your reference, I've attached the output of valgrind.
Related issues
History
#1 Updated by Nitesh Mor about 7 years ago
Christopher Brooks wrote:
Also, how do I add Edward to the list of interested people for this issue?
Add 'Watchers' (right side bar).
#2 Updated by Anonymous about 7 years ago
How do I add Edward to the list of watchers? I told him to go to the page and click on the link, but that does not seem reliable.
A common bug-tracking best practice is to add a mailing list to the list of watchers for a bug.
#3 Updated by Nitesh Mor about 7 years ago
Christopher Brooks wrote:
How do I add Edward to the list of watchers? I told him to go to the page and click on the link, but that does not seem reliable.
A common bug-tracking best practice is to add a mailing list to the list of watchers for a bug.
Individuals can 'Watch' any page (bug, wiki page, etc.), or you can explicitly add someone by using the 'Add' link next to Watchers: search for their name and check it.
#4 Updated by Anonymous about 7 years ago
- File Feature81Screeshot.pdf added
I don't have an Add Watchers choice on https://gdp.cs.berkeley.edu/redmine/issues/81. I have
Issues
View all issues
Summary
I've attached a screenshot.
Anyway, I see now that gdplogd is probably calling _gdp_gcl_unsubscribe() on the server side. I'm wondering if I need something similar on the client side.
#5 Updated by Nitesh Mor about 7 years ago
Adding/removing watchers was a permissions-related thing. My bad; it should be fixed now.
Your analysis is correct. There is no explicit command for the server to stop sending data as of now. For the moment, a poor man's unsubscribe is to call gdp_gcl_close()
on the client side, which will tell the server to free up any resources (including removing any state related to subscriptions). If you need to access the same log again, you can re-open it.
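For illustration, here is a minimal C sketch of that workaround: subscribe, drain some events, then use gdp_gcl_close() as the de facto unsubscribe. It assumes the 2016-era C API (gdp_init, gdp_parse_name, gdp_gcl_open, gdp_gcl_subscribe, gdp_event_next, gdp_event_free, gdp_gcl_close); the exact signatures should be checked against gdp/gdp.h in your checkout, and the log name is a placeholder.

#include <gdp/gdp.h>
#include <stdio.h>

int
main(void)
{
	EP_STAT estat;
	gdp_name_t gclname;
	gdp_gcl_t *gcl;

	// NULL => use the default router; the log name below is a placeholder
	estat = gdp_init(NULL);
	estat = gdp_parse_name("some.log.name", gclname);

	// open read-only; the open_info argument may not exist in older checkouts
	estat = gdp_gcl_open(gclname, GDP_MODE_RO, NULL, &gcl);

	// subscribe starting at record 1, no record limit, no callback (use the event queue)
	estat = gdp_gcl_subscribe(gcl, 1, 0, NULL, NULL, NULL);

	// drain a few events from the queue
	for (int i = 0; i < 3; i++)
	{
		gdp_event_t *gev = gdp_event_next(gcl, NULL);	// block until an event arrives
		if (gev == NULL)
			break;
		printf("got event %d\n", i + 1);
		gdp_event_free(gev);
	}

	// poor man's unsubscribe: closing the GCL lets gdplogd drop the
	// subscription state (cmd_close calls _gdp_gcl_unsubscribe, as quoted above);
	// re-open the log later if it is needed again
	estat = gdp_gcl_close(gcl);
	return EP_STAT_ISOK(estat) ? 0 : 1;
}

Note that this closes all use of the handle, not just the subscription, which is why a real gdp_gcl_unsubscribe (or a handle-based interface like the one sketched earlier) would still be useful.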
#6 Updated by Anonymous about 7 years ago
- File subscribeLog.txt added
I'm closing the gcl after the first.
I'm not sure, but it is starting to look like a deadlock.
$PTII/ptolemy/actor/lib/jjs/modules/gdp/test/auto/GDPLogSubscribe.xml is a Ptolemy model that creates a log, appends to it, reads from it, and subscribes to it.
The model runs fine once.
The second time it hangs during the append.
Attached is a log. Towards the bottom is
>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
<<< gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK
6992 ms. Memory: 459264K Free: 224738K (49%)
GDP_GCL.close()
>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
<<< gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK
which shows that the first run completed.
Then we have
<<< gdp_gcl_open(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK
GDPLogAppend2: Append data: 1 (Thread[GDPLogSubscribe,1,main])
GDPHelper.append(1)
ep_thr_mutex_lock gdp_datum.c:58 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_datum.c:63 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
gdp_datum_new => 0x7fcf71e08f60
>>> gdp_gcl_append
This is where we attempt to append a 1 to the log.
Then, not much happens
ep_thr_mutex_lock gdp_req.c:132 0x186a90fd0 (&ReqFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_req.c:138 0x186a90fd0 (&ReqFreeListMutex) [0x700001da2000]
ep_thr_mutex_init gdp_req.c:143 0x7fcf71e32540 (&req->mutex) [0x700001da2000]
ep_thr_mutex_lock ep_hash.c:169 0x7fcf70b56830 (&hp->mutex) [0x700001da2000]
ep_thr_mutex_unlock ep_hash.c:175 0x7fcf70b56830 (&hp->mutex) [0x700001da2000]
ep_thr_cond_init gdp_req.c:144 0x7fcf71e32580 (&req->cond) [0x700001da2000]
_gdp_req_lock: req @ 0x7fcf71e32540
ep_thr_mutex_lock gdp_req.c:332 0x7fcf71e32540 (&req->mutex) [0x700001da2000]
ep_thr_mutex_lock gdp_pdu.c:778 0x186a8ff00 (&PduFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_pdu.c:781 0x186a8ff00 (&PduFreeListMutex) [0x700001da2000]
ep_thr_mutex_lock gdp_datum.c:58 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_datum.c:63 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
gdp_datum_new => 0x7fcf71e09600
_gdp_pdu_new => 0x7fcf6f6e4170
ep_thr_mutex_lock gdp_req.c:170 0x7fcf7198f2d0 (&chan->mutex) [0x700001da2000]
ep_thr_cond_wait: Operation timed out
gdp_event_next: ep_thr_cond_wait => 60
ep_thr_mutex_unlock gdp_event.c:175 0x186a8faf0 (&ActiveListMutex) [0x7000022b4000]
gdp_event_next => 0x0
gdp_event_next: gcl 0x7fcf6f7f9f30
ep_thr_mutex_lock gdp_event.c:132 0x186a8faf0 (&ActiveListMutex) [0x7000022b4000]
gdp_event_next: empty ActiveList; waiting
ep_thr_cond_wait-cv gdp_event.c:141 0x186a8fb40 (&ActiveListSig) [0x7000022b4000]
ep_thr_cond-wait-mtx gdp_event.c:141 0x186a8faf0 (&ActiveListSig) [0x7000022b4000]
I/O event loop timeout
subscr_poker_thread: poking
_gdp_req_lock: req @ 0x7fcf6f28c3a0
ep_thr_mutex_lock gdp_req.c:332 0x7fcf6f28c3a0 (&req->mutex) [0x7000021b1000]
subscr_poker_thread: at 2016-10-23T00:23:40.820829000Z checking req@0x7fcf6f28c3a0:
    nextrec=0, numrecs=0, chan=0x7fcf7198f2d0
    postproc=0x0, sub_cb=0x0, udata=0x0
    state=IDLE, stat=OK [67896520 = 0x40c04c8]
    act_ts=2016-10-23T00:23:27.590895000Z
    flags=0x1d7<ASYNCIO,DONE,CLT_SUBSCR,PERSIST,ALLOC_RID,ON_GCL_LIST,ON_CHAN_LIST>
    GCL@0x7fcf6f7f9f30: m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U
        iomode = 3, refcnt = 5, reqs = 0x7fcf6f28c3a0, nrecs = 0
        flags = 0xa<INCACHE,INUSE>
    PDU@0x7fcf6f6e3fe0: v=3, ttl=15, rsvd1=0, cmd=128=ACK_SUCCESS
        dst=roUarUJGXIj_ME8LOnz6Q0ntIoa63esyGi-bPsbMwEQ
        src=KMo2Rsb7djEfraiGRXex_PtqBKtCgWA0q71jfceSGKg
        rid=6, olen=8, chan=0x7fcf7198f2d0, seqno=0
        flags=0x2<HAS_RECNO>
        datum=0x0
        total header=88
_gdp_req_unlock: req @ 0x7fcf6f28c3a0
ep_thr_mutex_unlock gdp_req.c:348 0x7fcf6f28c3a0 (&req->mutex) [0x7000021b1000]
_gdp_req_lock: req @ 0x7fcf6f6e3ed0
...
Attached is the complete log.
I suppose I could try to replicate this with C or Python...
#7 Updated by Anonymous about 7 years ago
I created gdp/gdp/lang/python/test/writer_subscriber_test2.py that hangs during the second append when creating a second log.
Here's how to run it, and the resulting hang:
bash-3.2$ pwd
/Users/cxh/ptII/vendors/gdp/gdp/lang/python/test
bash-3.2$ python writer_subscriber_test2.py
Name: python.test.writer_subscriber_test2.LN2QAXRZE9
About to create python.test.writer_subscriber_test2.LN2QAXRZE9
Created python.test.writer_subscriber_test2.LN2QAXRZE9
Get the writer
Get the subscriber
Make the subscribe call
About to loop
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 1, 'data': '1', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 388691000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 2, 'data': '2', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 447695000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 3, 'data': '3', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 506352000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 4, 'data': '4', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 576832000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 5, 'data': '5', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 636540000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 6, 'data': '6', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 694816000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 7, 'data': '7', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 754734000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 8, 'data': '8', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 814780000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 9, 'data': '9', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 885064000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 10, 'data': '10', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 948893000L}, 'sigalg': 0}
OK
Name: python.test.writer_subscriber_test2.TIXDLJL570
About to create python.test.writer_subscriber_test2.TIXDLJL570
Created python.test.writer_subscriber_test2.TIXDLJL570
Get the writer
Get the subscriber
Make the subscribe call
About to loop
About to append data
#8 Updated by Nitesh Mor about 7 years ago
Yes, it appears to be a deadlock. Fixed in commit:795b9a94. Multiple subscriptions should work as expected now.
#9 Updated by Nitesh Mor about 7 years ago
- Assignee changed from Eric Allman to Nitesh Mor
#10 Updated by Nitesh Mor about 7 years ago
- Copied to Bug #82: Memory leaks in subscriptions (and elsewhere) added
#11 Updated by Anonymous about 7 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
- Estimated time set to 5.00
Thanks! That fix resolves this bug.