
Feature #81

Need a way to unsubscribe from subscriptions

Added by Anonymous about 7 years ago. Updated about 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
libgdp
Start date:
10/22/2016
Due date:
% Done:

100%

Estimated time:
5.00 h

Description

I think we may need a way to unsubscribe from a subscription.

gdplogd/logd_proto.c invokes a private _gdp_gcl_unsubscribe(); shouldn't gdp-reader.c do the same?

It looks like gdp-reader is leaking memory when subscribing, part of which could be from not closing the subscription.

Also, how do I add Edward to the list of interested people for this issue?

Details below.

Edward wrote:

This subscription demo now works:

$PTII/org/terraswarm/accessor/demo/WatchHandRaisingLight/GDPSubscribe.xml

One thing you could help with would be to figure out how to properly
shut down a GDP subscription.

GDPHelper has this:

public void unsubscribe() {
    // FIXME: Properly close the C side. How to do that?
    _subscribed = false;
}

The subscription keeps running after the model stops, which is a
problem. Maybe the cause of some deadlocks I've seen...

gdp/doc/gdp-programmatic-api.html says:

Name
gdp_gcl_unsubscribe — Unsubscribe from a GCL [NOT IMPLEMENTED; this interface is probably wrong]
Synopsis

EP_STAT gdp_gcl_unsubscribe(gdp_gcl_t *gcl,
gdp_event_cbfunc_t *cbfunc,
void *udata)

Notes

Unsubscribes from the given {gcl, cbfunc, udata} tuple.
(Should subscribe/multiread return a handle to be passed to this function rather than this interface?)

There is no public gdp_gcl_unsubscribe, but gdp/gdp_subscr.c has:

/*
**  Unsubscribe all requests for a given gcl and destination.
*/

EP_STAT
_gdp_gcl_unsubscribe(
        gdp_gcl_t *gcl,
        gdp_name_t dest)
{
    EP_STAT estat;
    gdp_req_t *req;
    gdp_req_t *nextreq;

    do
    {
        estat = EP_STAT_OK;
        for (req = LIST_FIRST(&gcl->reqs); req != NULL; req = nextreq)
        {
            estat = _gdp_req_lock(req);
            EP_STAT_CHECK(estat, break);
            nextreq = LIST_NEXT(req, gcllist);
            if (!GDP_NAME_SAME(req->pdu->dst, dest))
            {
                _gdp_req_unlock(req);
                continue;
            }

            // remove subscription for this destination
            EP_ASSERT_INVARIANT(req->gcl == gcl);
            LIST_REMOVE(req, gcllist);
            _gdp_req_free(&req);
        }
    } while (!EP_STAT_ISOK(estat));
    return estat;
}

It looks like gdp/gdplogd/logd_proto.c has a cmd_close() function that invokes _gdp_gcl_unsubscribe():

/*
**  CMD_CLOSE --- close an open GCL
**
**      XXX Since GCLs are shared between clients you really can't just
**          close things willy-nilly.  Thus, this is currently a no-op
**          until such time as reference counting works.
**
**      XXX We need to have a way of expiring unused GCLs that are not
**          closed.
*/

EP_STAT
cmd_close(gdp_req_t *req)
{
    EP_STAT estat = EP_STAT_OK;

    req->pdu->cmd = GDP_ACK_SUCCESS;

    // should have no input data; ignore anything there
    flush_input_data(req, "cmd_close");

    // a bit weird to open the GCL only to close it again....
    estat = get_open_handle(req, GDP_MODE_ANY);
    if (!EP_STAT_ISOK(estat))
    {
        return gdpd_gcl_error(req->pdu->dst, "cmd_close: GCL not open",
                              estat, GDP_STAT_NAK_BADREQ);
    }

    // remove any subscriptions
    _gdp_gcl_unsubscribe(req->gcl, req->pdu->src);

    // return number of records
    req->pdu->datum->recno = req->gcl->nrecs;

    if (ep_dbg_test(Dbg, 20))
    {
        char ebuf[100];
        ep_dbg_printf("<<< cmd_close(%s): %s\n", req->gcl->pname,
                      ep_stat_tostr(estat, ebuf, sizeof ebuf));
    }

    // drop reference
    _gdp_gcl_decref(&req->gcl);

    return estat;
}

It looks like gdp-reader is leaking memory during subscription; it could be that things are not being closed.

I don't think it is worth fixing memory leaks this close to the annual meeting, but the leaks do indicate that we are not freeing resources.

To replicate this, under RHEL, I created a log ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245

and then ran

valgrind --leak-check=full gdp-reader -n 1 -s ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245

Then, on a different machine, I ran

echo "bar" | $PTII/vendors/gdp/gdp/apps/gdp-writer -G gdp-03.eecs.berkeley.edu ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245

The output of valgrind shows:

==6724== LEAK SUMMARY:
==6724==    definitely lost: 67 bytes in 3 blocks
==6724==    indirectly lost: 0 bytes in 0 blocks
==6724==      possibly lost: 33,923 bytes in 11 blocks
==6724==    still reachable: 30,858 bytes in 640 blocks
==6724==         suppressed: 0 bytes in 0 blocks
==6724== Reachable blocks (those to which a pointer was found) are not shown.
==6724== To see them, rerun with: --leak-check=full --show-reachable=yes

It looks like we may be leaking memory around subscription creation:

==6724== 560 bytes in 1 blocks are possibly lost in loss record 634 of 646
==6724==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==6724==    by 0x344A611A82: _dl_allocate_tls (in /lib64/ld-2.12.so)
==6724==    by 0x344B2072CC: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==6724==    by 0x40FFE5: _gdp_gcl_subscribe (gdp_subscr.c:252)
==6724==    by 0x405BE0: gdp_gcl_subscribe (gdp_api.c:472)
==6724==    by 0x405465: do_multiread (gdp-reader.c:238)
==6724==    by 0x405939: main (gdp-reader.c:499)

For your reference, I've attached the output of valgrind.

gdp-reader-valgrind.txt (9.88 KB) Anonymous, 10/22/2016 03:39 PM

Feature81Screeshot.pdf (153 KB) Anonymous, 10/22/2016 04:20 PM

subscribeLog.txt (116 KB) Anonymous, 10/22/2016 05:25 PM


Related issues

Copied to GDP - Bug #82: Memory leaks in subscriptions (and elsewhere) Closed 10/22/2016

History

#1 Updated by Nitesh Mor about 7 years ago

Christopher Brooks wrote:

Also, how do I add Edward to the list of interested people for this issue

Add 'Watchers' (right side bar).

#2 Updated by Anonymous about 7 years ago

How do I add Edward to the list of watchers? I told him to go to the page and click on the link, but that seems unreliable.

A common bug support best practice is to add a mailing list to the list of watchers for a bug.

#3 Updated by Nitesh Mor about 7 years ago

Christopher Brooks wrote:

How do I add Edward to the list of watchers? I told him to go to the page and click on the link, but that seems unreliable.

A common bug support best practice is to add a mailing list to the list of watchers for a bug.

Individuals can 'Watch' any page (bug/wiki-page/etc), or you can explicitly add someone by using the 'Add' link next to watchers, search for their name and check their name.

#4 Updated by Anonymous about 7 years ago

I don't have an Add Watchers choice on https://gdp.cs.berkeley.edu/redmine/issues/81. I have

Issues

View all issues
Summary

I've attached a screenshot.

Anyway, I see now that gdplogd is probably calling _gdp_gcl_unsubscribe() on the server side. I'm wondering if I need something similar on the client side.

#5 Updated by Nitesh Mor about 7 years ago

Adding/removing watchers was a permissions-related thing. My bad; it should be fixed now.

Your analysis is correct. There is no explicit command for the server to stop sending data as of now. For the moment, a poor-man's unsubscription is to call gdp_gcl_close on the client side, which will tell the server to free up any resources (including removing any state related to subscriptions). If you need to access the same log again, you can re-open it.
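Sketched in pseudocode, the workaround described above might look like this on the client side. The call names (gdp_gcl_subscribe, gdp_event_next, gdp_gcl_close, gdp_gcl_open) are the ones that appear elsewhere in this thread; the argument lists are illustrative, not the real prototypes.

```
// Pseudocode sketch of the close/re-open "poor man's unsubscribe";
// argument lists are illustrative only.
estat = gdp_gcl_subscribe(gcl, ...);     // start receiving records
// ... consume records with gdp_event_next() ...
estat = gdp_gcl_close(gcl);              // server frees resources,
                                         // including subscription state
// later, if the same log is needed again:
estat = gdp_gcl_open(name, mode, &gcl);  // re-open, re-subscribe if needed
```

The trade-off is that closing drops all state for the log, not just the one subscription, so any other outstanding requests on that gcl go away too.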

#6 Updated by Anonymous about 7 years ago

I'm closing the gcl after the first run.

I'm not sure, but it is starting to look like a deadlock.

$PTII/ptolemy/actor/lib/jjs/modules/gdp/test/auto/GDPLogSubscribe.xml is a Ptolemy model that creates a log, appends to it, reads from it and subscribes from it.

The model runs fine once.

The second time it hangs during the append.

Attached is a log. Towards the bottom is

>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
<<< gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK
6992 ms. Memory: 459264K Free: 224738K (49%)
GDP_GCL.close()

>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
<<< gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK

which shows that the first run completed.

Then we have

<<< gdp_gcl_open(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK
GDPLogAppend2: Append data: 1 (Thread[GDPLogSubscribe,1,main])
GDPHelper.append(1)
ep_thr_mutex_lock gdp_datum.c:58 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_datum.c:63 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
gdp_datum_new => 0x7fcf71e08f60

>>> gdp_gcl_append

This is where we attempt to append a 1 to the log.

Then, not much happens:

ep_thr_mutex_lock gdp_req.c:132 0x186a90fd0 (&ReqFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_req.c:138 0x186a90fd0 (&ReqFreeListMutex) [0x700001da2000]
ep_thr_mutex_init gdp_req.c:143 0x7fcf71e32540 (&req->mutex) [0x700001da2000]
ep_thr_mutex_lock ep_hash.c:169 0x7fcf70b56830 (&hp->mutex) [0x700001da2000]
ep_thr_mutex_unlock ep_hash.c:175 0x7fcf70b56830 (&hp->mutex) [0x700001da2000]
ep_thr_cond_init gdp_req.c:144 0x7fcf71e32580 (&req->cond) [0x700001da2000]
_gdp_req_lock: req @ 0x7fcf71e32540
ep_thr_mutex_lock gdp_req.c:332 0x7fcf71e32540 (&req->mutex) [0x700001da2000]
ep_thr_mutex_lock gdp_pdu.c:778 0x186a8ff00 (&PduFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_pdu.c:781 0x186a8ff00 (&PduFreeListMutex) [0x700001da2000]
ep_thr_mutex_lock gdp_datum.c:58 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
ep_thr_mutex_unlock gdp_datum.c:63 0x186a8fd20 (&DatumFreeListMutex) [0x700001da2000]
gdp_datum_new => 0x7fcf71e09600
_gdp_pdu_new => 0x7fcf6f6e4170
ep_thr_mutex_lock gdp_req.c:170 0x7fcf7198f2d0 (&chan->mutex) [0x700001da2000]
ep_thr_cond_wait: Operation timed out
gdp_event_next: ep_thr_cond_wait => 60
ep_thr_mutex_unlock gdp_event.c:175 0x186a8faf0 (&ActiveListMutex) [0x7000022b4000]
gdp_event_next => 0x0
gdp_event_next: gcl 0x7fcf6f7f9f30
ep_thr_mutex_lock gdp_event.c:132 0x186a8faf0 (&ActiveListMutex) [0x7000022b4000]
gdp_event_next: empty ActiveList; waiting
ep_thr_cond_wait-cv gdp_event.c:141 0x186a8fb40 (&ActiveListSig) [0x7000022b4000]
ep_thr_cond-wait-mtx gdp_event.c:141 0x186a8faf0 (&ActiveListSig) [0x7000022b4000]
I/O event loop timeout

subscr_poker_thread: poking
_gdp_req_lock: req @ 0x7fcf6f28c3a0
ep_thr_mutex_lock gdp_req.c:332 0x7fcf6f28c3a0 (&req->mutex) [0x7000021b1000]
subscr_poker_thread: at 2016-10-23T00:23:40.820829000Z checking req@0x7fcf6f28c3a0:
    nextrec=0, numrecs=0, chan=0x7fcf7198f2d0
    postproc=0x0, sub_cb=0x0, udata=0x0
    state=IDLE, stat=OK [67896520 = 0x40c04c8]
    act_ts=2016-10-23T00:23:27.590895000Z
    flags=0x1d7<ASYNCIO,DONE,CLT_SUBSCR,PERSIST,ALLOC_RID,ON_GCL_LIST,ON_CHAN_LIST>
    GCL@0x7fcf6f7f9f30: m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U
        iomode = 3, refcnt = 5, reqs = 0x7fcf6f28c3a0, nrecs = 0
        flags = 0xa<INCACHE,INUSE>
    PDU@0x7fcf6f6e3fe0:
        v=3, ttl=15, rsvd1=0, cmd=128=ACK_SUCCESS
        dst=roUarUJGXIj_ME8LOnz6Q0ntIoa63esyGi-bPsbMwEQ
        src=KMo2Rsb7djEfraiGRXex_PtqBKtCgWA0q71jfceSGKg
        rid=6, olen=8, chan=0x7fcf7198f2d0, seqno=0
        flags=0x2<HAS_RECNO>
        datum=0x0
        total header=88
_gdp_req_unlock: req @ 0x7fcf6f28c3a0
ep_thr_mutex_unlock gdp_req.c:348 0x7fcf6f28c3a0 (&req->mutex) [0x7000021b1000]
_gdp_req_lock: req @ 0x7fcf6f6e3ed0
...

Attached is the complete log.

I suppose I could try to replicate this with C or Python...

#7 Updated by Anonymous about 7 years ago

I created gdp/gdp/lang/python/test/writer_subscriber_test2.py that hangs during the second append when creating a second log.

Here's how to run it, and where it hangs:

bash-3.2$ pwd
/Users/cxh/ptII/vendors/gdp/gdp/lang/python/test
bash-3.2$ python writer_subscriber_test2.py
Name: python.test.writer_subscriber_test2.LN2QAXRZE9
About to create python.test.writer_subscriber_test2.LN2QAXRZE9
Created python.test.writer_subscriber_test2.LN2QAXRZE9
Get the writer
Get the subscriber
Make the subscribe call
About to loop
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 1, 'data': '1', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 388691000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 2, 'data': '2', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 447695000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 3, 'data': '3', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 506352000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 4, 'data': '4', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 576832000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 5, 'data': '5', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 636540000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 6, 'data': '6', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 694816000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 7, 'data': '7', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 754734000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 8, 'data': '8', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 814780000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 9, 'data': '9', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 885064000L}, 'sigalg': 0}
About to append data
Done appending data
About to call get_next_event()
{'sig': '', 'recno': 10, 'data': '10', 'ts': {'tv_sec': 1477187224, 'tv_accuracy': 0.5, 'tv_nsec': 948893000L}, 'sigalg': 0}
OK
Name: python.test.writer_subscriber_test2.TIXDLJL570
About to create python.test.writer_subscriber_test2.TIXDLJL570
Created python.test.writer_subscriber_test2.TIXDLJL570
Get the writer
Get the subscriber
Make the subscribe call
About to loop
About to append data

#8 Updated by Nitesh Mor about 7 years ago

Yes, it appears to be a deadlock. Fixed in commit:795b9a94. Multiple subscriptions should work as expected now.

#9 Updated by Nitesh Mor about 7 years ago

  • Assignee changed from Eric Allman to Nitesh Mor

#10 Updated by Nitesh Mor about 7 years ago

  • Copied to Bug #82: Memory leaks in subscriptions (and elsewhere) added

#11 Updated by Anonymous about 7 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100
  • Estimated time set to 5.00

Thanks! That commit fixes this bug.
