Project

General

Profile

Bug #82

Memory leaks in subscriptions (and elsewhere)

Added by Nitesh Mor over 3 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
libgdp
Start date:
10/22/2016
Due date:
% Done:

0%


Description

(copied from #81 filed by Christopher)

It looks like gdp-reader is leaking memory during subscription, it could be that things are not being closed.

I don't think it is worth fixing memory leaks this close to the annual meeting, but the leaks do indicate that we are not freeing resources.

To replicate this, under RHEL, I create a log ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245

and then ran

valgrind --leak-check=full gdp-reader -n 1 -s ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245

Then, on a different machine, ran

echo "bar" | $PTII/vendors/gdp/gdp/apps/gdp-writer -G gdp-03.eecs.berkeley.edu ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.33254535198571245

The output of valgrind shows:

==6724== LEAK SUMMARY:
==6724==    definitely lost: 67 bytes in 3 blocks
==6724==    indirectly lost: 0 bytes in 0 blocks
==6724==      possibly lost: 33,923 bytes in 11 blocks
==6724==    still reachable: 30,858 bytes in 640 blocks
==6724==         suppressed: 0 bytes in 0 blocks
==6724== Reachable blocks (those to which a pointer was found) are not shown.
==6724== To see them, rerun with: --leak-check=full --show-reachable=yes

It looks like we are leaking around subscription creation?

==6724== 560 bytes in 1 blocks are possibly lost in loss record 634 of 646
==6724==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==6724==    by 0x344A611A82: _dl_allocate_tls (in /lib64/ld-2.12.so)
==6724==    by 0x344B2072CC: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==6724==    by 0x40FFE5: _gdp_gcl_subscribe (gdp_subscr.c:252)
==6724==    by 0x405BE0: gdp_gcl_subscribe (gdp_api.c:472)
==6724==    by 0x405465: do_multiread (gdp-reader.c:238)
==6724==    by 0x405939: main (gdp-reader.c:499)

For your reference, I've attached the output of valgrind.

gdp-reader-valgrind.txt Magnifier (9.88 KB) Anonymous, 10/22/2016 03:39 PM

gcl-subscribe-valgrind-14Nov2016.txt Magnifier (10.5 KB) Anonymous, 11/14/2016 07:31 AM


Related issues

Copied from GDP - Feature #81: Need a way to unsubscribe from subscriptions Closed 10/22/2016

History

#1 Updated by Nitesh Mor over 3 years ago

  • Copied from Feature #81: Need a way to unsubscribe from subscriptions added

#2 Updated by Nitesh Mor over 3 years ago

  • File deleted (Feature81Screeshot.pdf)

#3 Updated by Nitesh Mor over 3 years ago

  • File deleted (subscribeLog.txt)

#4 Updated by Eric Allman over 3 years ago

  • Status changed from New to Feedback
  • Assignee changed from Nitesh Mor to Eric Allman

I can't seem to reproduce this on the latest version (although admittedly I'm testing on MacOS; Linux may be different). However, I do notice that the memory leak shown is actually happening in pthread_create, so I wonder if it isn't in the pthread implementation. It is the case that gdp_gcl_subscribe will spawn one long-running thread, which doesn't terminate even if the subscription does, but that thread is shared among all subscriptions. If so, this will leak a fixed amount of memory per instantiation, and have only minimal effect.

#5 Updated by Anonymous over 3 years ago

I can still reproduce this under RHEL with today's version of the GDP.

[cxh@terra apps]$ gcl-create -k none ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.6
Created new GCL n63QFLX1yqjy0hWddy5JpxVjAkuIrSTCfX7sLQbPHew
    on log server edu.berkeley.eecs.gdp-01.gdplogd
exiting with status OK
[cxh@terra apps]$ valgrind --leak-check=full gdp-reader -D "*=10" -n 1 -s ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.6 >& /tmp/gcl-subscribe-valgrind-14Nov2016.txt

See the attached file for the output of valgrind.

Then, on a Mac, write to the log

bash-3.2$ echo "bar" | $PTII/vendors/gdp/gdp/apps/gdp-writer -D "*=10" -G gdp-03.eecs.berkeley.edu ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.6
_gdp_lib_init(NULL)
        @(#)libgdp 0.7.2 (2016-11-14 07:23) 401e1b73efd8e611c830e0a6aed865a02b5aa86c
My GDP routing name = xuoK-hghn0wRLw8tzT9rgIGRVX2XhIWHShXd1-9J0iU
gdp_lib_init: OK
_gdp_chan_open(gdp-03.eecs.berkeley.edu)
Trying gdp-03.eecs.berkeley.edu
_gdp_chan_open: talking to router at gdp-03.eecs.berkeley.edu:8007
gdp_init: OK

>>> _gdp_invoke(req=0x7fbbaa800030 rid=1): CMD_OPEN_AO (67), gcl@0x7fbbaa8002e0
<<< _gdp_invoke(0x7fbbaa800030 rid=1) CMD_OPEN_AO: OK [67896520 = 0x40c04c8]
Opened GCL n63QFLX1yqjy0hWddy5JpxVjAkuIrSTCfX7sLQbPHew
<<< gdp_gcl_open(n63QFLX1yqjy0hWddy5JpxVjAkuIrSTCfX7sLQbPHew): OK
GDPname: n63QFLX1yqjy0hWddy5JpxVjAkuIrSTCfX7sLQbPHew (0 recs)

Starting to read input
Got input «bar»

>>> _gdp_invoke(req=0x7fbbaa800030 rid=0): CMD_APPEND (71), gcl@0x7fbbaa8002e0
<<< _gdp_invoke(0x7fbbaa800030 rid=0) CMD_APPEND: OK [67896521 = 0x40c04c9]
recno 1, len 0, ts 2016-11-14T15:30:10.391485000Z
<<< gdp_gcl_close(n63QFLX1yqjy0hWddy5JpxVjAkuIrSTCfX7sLQbPHew): OK
gdp-writer: exiting with status OK

Below is an analysis of the leaks I see

==7885== 6 bytes in 1 blocks are definitely lost in loss record 2 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x3FA4C03674: avahi_malloc (in /usr/lib64/libavahi-common.so.3.5.1)
==7885==    by 0x3FA4C039A8: avahi_strdup (in /usr/lib64/libavahi-common.so.3.5.1)
==7885==    by 0x410983: gdp_zc_scan (gdp_zc_client.c:602)
==7885==    by 0x406A87: _gdp_chan_open (gdp_chan.c:239)
==7885==    by 0x405CFE: gdp_init (gdp_api.c:254)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885==
==7885== 10 bytes in 1 blocks are definitely lost in loss record 11 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x3FA4C03674: avahi_malloc (in /usr/lib64/libavahi-common.so.3.5.1)
==7885==    by 0x3FA4C039A8: avahi_strdup (in /usr/lib64/libavahi-common.so.3.5.1)
==7885==    by 0x410963: gdp_zc_scan (gdp_zc_client.c:600)
==7885==    by 0x406A87: _gdp_chan_open (gdp_chan.c:239)
==7885==    by 0x405CFE: gdp_init (gdp_api.c:254)
==7885==    by 0x40580A: main (gdp-reader.c:456)

Avahi leaks. it would be good to check to see if there is a way to free the string that is leaked. An alternative would be to build with "make all_noavahi"

==7885== 51 bytes in 1 blocks are possibly lost in loss record 586 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x41531C: ep_mem_istrdup (ep_mem.c:319)
==7885==    by 0x41536C: ep_mem_strdup (ep_mem.c:337)
==7885==    by 0x411D12: read_param_file (ep_adm.c:107)
==7885==    by 0x411DFA: ep_adm_readparams (ep_adm.c:208)
==7885==    by 0x40B729: gdp_lib_init (gdp_main.c:548)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885==
==7885== 51 bytes in 1 blocks are definitely lost in loss record 587 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x41531C: ep_mem_istrdup (ep_mem.c:319)
==7885==    by 0x41536C: ep_mem_strdup (ep_mem.c:337)
==7885==    by 0x411D12: read_param_file (ep_adm.c:107)
==7885==    by 0x411DFA: ep_adm_readparams (ep_adm.c:208)
==7885==    by 0x40B729: gdp_lib_init (gdp_main.c:548)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885==
==7885== 80 bytes in 1 blocks are possibly lost in loss record 599 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415E9F: ep_rpool_new (ep_rpool.c:169)
==7885==    by 0x414932: ep_hash_new (ep_hash.c:107)
==7885==    by 0x4162AA: ep_stat_reg_strings (ep_stat.c:84)
==7885==    by 0x4162EB: _ep_stat_init (ep_statcodes.c:71)
==7885==    by 0x414B7F: ep_lib_init (ep_lib.c:50)
==7885==    by 0x40B71D: gdp_lib_init (gdp_main.c:547)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885== 80 bytes in 1 blocks are possibly lost in loss record 600 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415E9F: ep_rpool_new (ep_rpool.c:169)
==7885==    by 0x414932: ep_hash_new (ep_hash.c:107)
==7885==    by 0x411A97: get_param_path (ep_adm.c:130)
==7885==    by 0x411D9C: ep_adm_readparams (ep_adm.c:195)
==7885==    by 0x414B8B: ep_lib_init (ep_lib.c:51)
==7885==    by 0x40B71D: gdp_lib_init (gdp_main.c:547)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)

The above look like ep librar leaks surround reading the params file. Not a huge deal, but if a long running processes invokes gdp_init many times, there will be problems.

==7885==
==7885== 80 bytes in 1 blocks are possibly lost in loss record 601 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415E9F: ep_rpool_new (ep_rpool.c:169)
==7885==    by 0x414932: ep_hash_new (ep_hash.c:107)
==7885==    by 0x408C6D: _gdp_gcl_cache_init (gdp_gcl_cache.c:94)
==7885==    by 0x40B993: gdp_lib_init (gdp_main.c:623)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)

Some sort of cache leak.

==7885== 560 bytes in 1 blocks are possibly lost in loss record 638 of 650
==7885==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==7885==    by 0x344A611A82: _dl_allocate_tls (in /lib64/ld-2.12.so)
==7885==    by 0x344B2072CC: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==7885==    by 0x40FFE5: _gdp_gcl_subscribe (gdp_subscr.c:252)
==7885==    by 0x405BE0: gdp_gcl_subscribe (gdp_api.c:472)
==7885==    by 0x405465: do_multiread (gdp-reader.c:238)
==7885==    by 0x405939: main (gdp-reader.c:499)

Here's that pthread leak. See http://stackoverflow.com/questions/5610677/valgrind-memory-leak-errors-when-using-pthread-create

==7885== 4,096 bytes in 1 blocks are possibly lost in loss record 646 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415982: ep_rpool_ialloc (ep_rpool.c:405)
==7885==    by 0x415B28: ep_rpool_zalloc (ep_rpool.c:271)
==7885==    by 0x414948: ep_hash_new (ep_hash.c:112)
==7885==    by 0x4162AA: ep_stat_reg_strings (ep_stat.c:84)
==7885==    by 0x4162EB: _ep_stat_init (ep_statcodes.c:71)
==7885==    by 0x414B7F: ep_lib_init (ep_lib.c:50)
==7885==    by 0x40B71D: gdp_lib_init (gdp_main.c:547)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885==
==7885== 4,096 bytes in 1 blocks are possibly lost in loss record 647 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415982: ep_rpool_ialloc (ep_rpool.c:405)
==7885==    by 0x415B28: ep_rpool_zalloc (ep_rpool.c:271)
==7885==    by 0x414948: ep_hash_new (ep_hash.c:112)
==7885==    by 0x411A97: get_param_path (ep_adm.c:130)
==7885==    by 0x411D9C: ep_adm_readparams (ep_adm.c:195)
==7885==    by 0x414B8B: ep_lib_init (ep_lib.c:51)
==7885==    by 0x40B71D: gdp_lib_init (gdp_main.c:547)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885==
==7885== 4,096 bytes in 1 blocks are possibly lost in loss record 648 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415982: ep_rpool_ialloc (ep_rpool.c:405)
==7885==    by 0x414725: ep_hash_insert (ep_hash.c:208)
==7885==    by 0x4162CC: ep_stat_reg_strings (ep_stat.c:91)
==7885==    by 0x40FD14: _gdp_stat_init (gdp_stat.c:112)
==7885==    by 0x40B7AE: gdp_lib_init (gdp_main.c:567)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885== 4,096 bytes in 1 blocks are possibly lost in loss record 649 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415982: ep_rpool_ialloc (ep_rpool.c:405)
==7885==    by 0x414725: ep_hash_insert (ep_hash.c:208)
==7885==    by 0x408B68: _gdp_gcl_cache_add (gdp_gcl_cache.c:134)
==7885==    by 0x40F62A: _gdp_req_send (gdp_req.c:365)
==7885==    by 0x40EBB9: _gdp_invoke (gdp_proto.c:112)
==7885==    by 0x40998C: _gdp_gcl_open (gdp_gcl_ops.c:299)
==7885==    by 0x405FA0: gdp_gcl_open (gdp_api.c:381)
==7885==    by 0x4058BB: main (gdp-reader.c:484)
==7885==
==7885== 16,128 bytes in 1 blocks are possibly lost in loss record 650 of 650
==7885==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==7885==    by 0x41513E: system_malloc (ep_mem.c:102)
==7885==    by 0x4151CA: ep_mem_ialloc (ep_mem.c:171)
==7885==    by 0x415982: ep_rpool_ialloc (ep_rpool.c:405)
==7885==    by 0x415B28: ep_rpool_zalloc (ep_rpool.c:271)
==7885==    by 0x414948: ep_hash_new (ep_hash.c:112)
==7885==    by 0x408C6D: _gdp_gcl_cache_init (gdp_gcl_cache.c:94)
==7885==    by 0x40B993: gdp_lib_init (gdp_main.c:623)
==7885==    by 0x405CD1: gdp_init (gdp_api.c:249)
==7885==    by 0x40580A: main (gdp-reader.c:456)
==7885==
==7885== LEAK SUMMARY:
==7885==    definitely lost: 67 bytes in 3 blocks
==7885==    indirectly lost: 0 bytes in 0 blocks
==7885==      possibly lost: 33,923 bytes in 11 blocks
==7885==    still reachable: 30,859 bytes in 642 blocks
==7885==         suppressed: 0 bytes in 0 blocks
==7885== Reachable blocks (those to which a pointer was found) are not shown.
==7885== To see them, rerun with: --leak-check=full --show-reachable=yes
==7885==
==7885== For counts of detected and suppressed errors, rerun with: -v
==7885== ERROR SUMMARY: 14 errors from 14 contexts (suppressed: 6 from 6)

More ep leaks.

The amount of memory leaked is not huge, but should be addressed. Long running processes should be checked for memory leaks as well.

Valgrind does not work as well under Mac OS X as it does under Linux.

#6 Updated by Eric Allman over 3 years ago

There are many cases where memory is allocated during startup and never released; for example, when reading the administrative parameters memory is allocated for those parameters and stored in memory. These don't seem like memory "leaks" to me since it is intended to last for the life of the process, and the memory used should not increase as the process runs. Several of your cases fall into the category.

Avahi

These might be true leaks. I suspect that Avahi is mature enough that it doesn't leak if used properly, but there's some local code that definitely does memory allocation and may or may not free everything. If there are leaks here they should be slow leaks, since that code is only used when trying to reconnect to a GDP router. I'll have to examine this code in more detail.

gdp_init related memory

This is all related to reading and storing administrative parameter and are definitely in the "startup only" class. Even multiple calls to gdp_init won't cause this to leak, since that code is protected by the _GdpLibInitialized flag.

Pthread-related memory

There's a single thread started at the first subscription to renew subscriptions (which are actually leases). This thread knows nothing of the context of individual subscriptions, and multiple subscriptions share one thread. With some effort (I'm not sure how much) I could probably change that thread to terminate if there are no remaining subscriptions, but this isn't a high priority since the memory usage will not grow in long-running processes.

"More ep leaks"

The first and third blocks are creating and initializing a hash table to store the internal EP_STAT ⇒ string mappings. This is allocated once during startup and after that is read-only.

The second block is the hash table storing runtime administrative parameters. This is also allocated once during startup and is then read-only.

The fourth and fifth blocks are the hash table implementing the GCL cache. In particular, the fifth block is allocating the hash table header, of which there will be exactly one. The fourth block is a bit trickier; it is the individual entry in the hash chain. If a process is opening huge numbers of GCLs this might leak a bit for each new GCL, but only for new names. This would require updating the way the hash table implementation does memory management. This is a legitimate bug, and although it's unlikely to impact most client programs, it will impact gdplogd. I'll open a new issue for this case.

Other than this last one, none of these look like they are leaks that will get worse as a client process runs. I'm not saying that they don't exist, but these particular ones don't seem concerning.

#7 Updated by Anonymous over 3 years ago

Thanks for taking a look, I agree that these leaks are of low priority.

In an embedded system a process could run for a very long time, or could have very little memory, so leaks are important.

I consider it to be good programming hygiene to not leak memory, which may require providing a shutdown method. However, there are bigger fish to fry.

For the Java interface to the GDP, Java loads a shared library, which is loaded only once. So, _GdpLibInitialized should work as planned. As a quick test, I called gdp_init() 100x in gdp-reader.c and the memory leaks did not change, so we are set.

It would be good to take a look at the memory usage writer as well.

#8 Updated by Eric Allman over 1 year ago

  • Status changed from Feedback to Closed

Also available in: Atom PDF