Project

General

Profile

Bug #97

GDP client crash and invalid writes

Added by Anonymous over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
libgdp
Start date:
04/06/2017
Due date:
% Done:

0%


Description

Under RHEL and Darwin, I'm seeing problems with free(), problems with invalid writes and crashes.

To replicate on terra.eecs.berkeley.edu as the jenkins user try the following commands:

export LD_LIBRARY_PATH=/usr/local/lib:/usr/lib64
export EP_PARAM_PATH=/home/jenkins/workspace/ep_adm_params
( cd /home/jenkins/workspace/gdp_router; python ./src/gdp_router.py -l routerLog.txt ) &
/home/jenkins/workspace/gdp/gdplogd/gdplogd -F -N terra.EECS.Berkeley.EDU &
/home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16166

Crash with:

My GDP routing name = terra.EECS.Berkeley.EDU
Created new GCL 2u7js-Fclq6R8as0m5T5V4Vc0YjRUkQQ5yJdt7svorY
        on log server terra.EECS.Berkeley.EDU
*** glibc detected *** /home/jenkins/workspace/gdp/apps/gcl-create: free(): invalid next size (fast): 0x0000000000a7fd60 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3382675dee]
/lib64/libc.so.6[0x3382678c80]
/home/jenkins/workspace/gdp/apps/gcl-create[0x41abaa]
/home/jenkins/workspace/gdp/apps/gcl-create[0x40c219]
/home/jenkins/workspace/gdp/apps/gcl-create[0x406642]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x338261ed1d]
/home/jenkins/workspace/gdp/apps/gcl-create[0x405789]
======= Memory map: ========
00400000-0042a000 r-xp 00000000 08:04 261358121                          /home/jenkins/workspace/gdp/apps/gcl-create
0062a000-0062e000 rw-p 0002a000 08:04 261358121                          /home/jenkins/workspace/gdp/apps/gcl-create
00a6f000-00a90000 rw-p 00000000 00:00 0                                  [heap]
3382200000-3382220000 r-xp 00000000 08:02 791569                         /lib64/ld-2.12.so
338241f000-3382421000 r--p 0001f000 08:02 791569                         /lib64/ld-2.12.so
3382421000-3382422000 rw-p 00021000 08:02 791569                         /lib64/ld-2.12.so
3382422000-3382423000 rw-p 00000000 00:00 0
3382600000-338278a000 r-xp 00000000 08:02 791570                         /lib64/libc-2.12.so
338278a000-338298a000 ---p 0018a000 08:02 791570                         /lib64/libc-2.12.so
338298a000-338298e000 r--p 0018a000 08:02 791570                         /lib64/libc-2.12.so
338298e000-3382990000 rw-p 0018e000 08:02 791570                         /lib64/libc-2.12.so
3382990000-3382994000 rw-p 00000000 00:00 0
3382a00000-3382a02000 r-xp 00000000 08:02 791575                         /lib64/libdl-2.12.so
3382a02000-3382c02000 ---p 00002000 08:02 791575                         /lib64/libdl-2.12.so
3382c02000-3382c03000 r--p 00002000 08:02 791575                         /lib64/libdl-2.12.so
3382c03000-3382c04000 rw-p 00003000 08:02 791575                         /lib64/libdl-2.12.so
3382e00000-3382e17000 r-xp 00000000 08:02 791571                         /lib64/libpthread-2.12.so
3382e17000-3383017000 ---p 00017000 08:02 791571                         /lib64/libpthread-2.12.so
3383017000-3383018000 r--p 00017000 08:02 791571                         /lib64/libpthread-2.12.so
3383018000-3383019000 rw-p 00018000 08:02 791571                         /lib64/libpthread-2.12.so
3383019000-338301d000 rw-p 00000000 00:00 0
3383600000-3383615000 r-xp 00000000 08:02 791580                         /lib64/libz.so.1.2.3
3383615000-3383814000 ---p 00015000 08:02 791580                         /lib64/libz.so.1.2.3
3383814000-3383815000 r--p 00014000 08:02 791580                         /lib64/libz.so.1.2.3
3383815000-3383816000 rw-p 00015000 08:02 791580                         /lib64/libz.so.1.2.3
3383a00000-3383a07000 r-xp 00000000 08:02 791572                         /lib64/librt-2.12.so
3383a07000-3383c06000 ---p 00007000 08:02 791572                         /lib64/librt-2.12.so
3383c06000-3383c07000 r--p 00006000 08:02 791572                         /lib64/librt-2.12.so
3383c07000-3383c08000 rw-p 00007000 08:02 791572                         /lib64/librt-2.12.so
3384200000-338420c000 r-xp 00000000 08:02 2368054                        /usr/lib64/libjansson.so.4.9.0
338420c000-338440b000 ---p 0000c000 08:02 2368054                        /usr/lib64/libjansson.so.4.9.0
338440b000-338440c000 rw-p 0000b000 08:02 2368054                        /usr/lib64/libjansson.so.4.9.0
3385a00000-3385b6f000 r-xp 00000000 08:02 791652                         /lib64/libdb-4.7.so
3385b6f000-3385d6f000 ---p 0016f000 08:02 791652                         /lib64/libdb-4.7.so
3385d6f000-3385d74000 rw-p 0016f000 08:02 791652                         /lib64/libdb-4.7.so
3385d74000-3385d75000 rw-p 00000000 00:00 0
3386640000-338683f000 ---p 00040000 08:02 791574                         /lib64/libdbus-1.so.3.4.0
338683f000-3386840000 r--p 0003f000 08:02 791574                         /lib64/libdbus-1.so.3.4.0
3386840000-3386841000 rw-p 00040000 08:02 791574                         /lib64/libdbus-1.so.3.4.0
3386e00000-3386fba000 r-xp 00000000 08:02 2362027                        /usr/lib64/libcrypto.so.1.0.1e
3386fba000-33871ba000 ---p 001ba000 08:02 2362027                        /usr/lib64/libcrypto.so.1.0.1e
33871ba000-33871d5000 r--p 001ba000 08:02 2362027                        /usr/lib64/libcrypto.so.1.0.1e
33871d5000-33871e1000 rw-p 001d5000 08:02 2362027                        /usr/lib64/libcrypto.so.1.0.1e
33871e1000-33871e5000 rw-p 00000000 00:00 0
3388e00000-3388e0b000 r-xp 00000000 08:02 2368897                        /usr/lib64/libavahi-common.so.3.5.1
3388e0b000-338900b000 ---p 0000b000 08:02 2368897                        /usr/lib64/libavahi-common.so.3.5.1
338900b000-338900c000 rw-p 0000b000 08:02 2368897                        /usr/lib64/libavahi-common.so.3.5.1
3389600000-3389610000 r-xp 00000000 08:02 2368899                        /usr/lib64/libavahi-client.so.3.2.5
3389610000-338980f000 ---p 00010000 08:02 2368899                        /usr/lib64/libavahi-client.so.3.2.5
338980f000-3389810000 rw-p 0000f000 08:02 2368899                        /usr/lib64/libavahi-client.so.3.2.5
7f432c000000-7f432c021000 rw-p 00000000 00:00 0
7f432c021000-7f4330000000 ---p 00000000 00:00 0
7f4334000000-7f4334021000 rw-p 00000000 00:00 0
7f4334021000-7f4338000000 ---p 00000000 00:00 0
7f4338908000-7f433891e000 r-xp 00000000 08:02 791642                     /lib64/libgcc_s-4.4.7-20120601.so.1
7f433891e000-7f4338b1d000 ---p 00016000 08:02 791642                     /lib64/libgcc_s-4.4.7-20120601.so.1
7f4338b1d000-7f4338b1e000 rw-p 00015000 08:02 791642                     /lib64/libgcc_s-4.4.7-20120601.so.1
7f4338b38000-7f4338b39000 ---p 00000000 00:00 0
7f4338b39000-7f4339539000 rw-p 00000000 00:00 0
7f4339539000-7f4339546000 r-xp 00000000 08:02 786522                     /lib64/libnss_files-2.12.so
7f4339546000-7f4339745000 ---p 0000d000 08:02 786522                     /lib64/libnss_files-2.12.so
7f4339745000-7f4339746000 r--p 0000c000 08:02 786522                     /lib64/libnss_files-2.12.so
7f4339746000-7f4339747000 rw-p 0000d000 08:02 786522                     /lib64/libnss_files-2.12.so
7f4339747000-7f433974d000 rw-p 00000000 00:00 0
7f4339765000-7f4339768000 rw-p 00000000 00:00 0
7f4339768000-7f433976a000 r-xp 00000000 08:02 2374673                    /usr/local/lib/libevent_pthreads-2.0.so.5.1.9
7f433976a000-7f4339969000 ---p 00002000 08:02 2374673                    /usr/local/lib/libevent_pthreads-2.0.so.5.1.9
7f4339969000-7f433996a000 rw-p 00001000 08:02 2374673                    /usr/local/lib/libevent_pthreads-2.0.so.5.1.9
7f433996a000-7f43399ae000 r-xp 00000000 08:02 2368999                    /usr/local/lib/libevent-2.0.so.5.1.9
7f43399ae000-7f4339bad000 ---p 00044000 08:02 2368999                    /usr/local/lib/libevent-2.0.so.5.1.9
7f4339bad000-7f4339baf000 rw-p 00043000 08:02 2368999                    /usr/local/lib/libevent-2.0.so.5.1.9
7f4339baf000-7f4339bb1000 rw-p 00000000 00:00 0
7fff784b4000-7fff784c9000 rw-p 00000000 00:00 0                          [stack]
7fff78573000-7fff78574000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

If the daemons are running, then using valgrind and incrementing the log name has some invalid writes:

bash-4.1$ valgrind /home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16167
==22124== Memcheck, a memory error detector
==22124== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==22124== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==22124== Command: /home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16167
==22124==
==22124== Invalid write of size 8
==22124==    at 0x40A3F0: _gdp_gcl_cache_add (gdp_gcl_cache.c:164)
==22124==    by 0x4122A7: _gdp_req_send (gdp_req.c:450)
==22124==    by 0x411942: _gdp_invoke (gdp_proto.c:127)
==22124==    by 0x40BCD2: _gdp_gcl_create (gdp_gcl_ops.c:281)
==22124==    by 0x407884: gdp_gcl_create (gdp_api.c:328)
==22124==    by 0x406500: main (gcl-create.c:620)
==22124==  Address 0x50aa8b8 is 72 bytes inside a block of size 248 free'd
==22124==    at 0x4A06430: free (vg_replace_malloc.c:446)
==22124==    by 0x41ABA9: ep_mem_free (ep_mem.c:121)
==22124==    by 0x407E42: gdp_gcl_open (gdp_api.c:420)
==22124==    by 0x405D40: main (gcl-create.c:308)
==22124==
==22124== Thread 2:
==22124== Invalid write of size 8
==22124==    at 0x4096B5: _gdp_gcl_touch (gdp_gcl_cache.c:339)
==22124==    by 0x409828: _gdp_gcl_incref (gdp_gcl_cache.c:594)
==22124==    by 0x409E15: _gdp_gcl_cache_get (gdp_gcl_cache.c:242)
==22124==    by 0x40DD6F: _gdp_pdu_process (gdp_main.c:322)
==22124==    by 0x408F09: gdp_read_cb (gdp_chan.c:88)
==22124==    by 0x4C28672: bufferevent_run_deferred_callbacks_unlocked (bufferevent.c:188)
==22124==    by 0x4C213EA: event_base_loop (event.c:1391)
==22124==    by 0x40D588: run_event_loop (gdp_main.c:593)
==22124==    by 0x3382E07AA0: start_thread (in /lib64/libpthread-2.12.so)
==22124==    by 0x628A6FF: ???
==22124==  Address 0x50aa8b8 is 72 bytes inside a block of size 248 free'd
==22124==    at 0x4A06430: free (vg_replace_malloc.c:446)
==22124==    by 0x41ABA9: ep_mem_free (ep_mem.c:121)
==22124==    by 0x407E42: gdp_gcl_open (gdp_api.c:420)
==22124==    by 0x405D40: main (gcl-create.c:308)
==22124==
==22124== Invalid write of size 8
==22124==    at 0x409768: _gdp_gcl_touch (gdp_gcl_cache.c:342)
==22124==    by 0x409828: _gdp_gcl_incref (gdp_gcl_cache.c:594)
==22124==    by 0x409E15: _gdp_gcl_cache_get (gdp_gcl_cache.c:242)
==22124==    by 0x40DD6F: _gdp_pdu_process (gdp_main.c:322)
==22124==    by 0x408F09: gdp_read_cb (gdp_chan.c:88)
==22124==    by 0x4C28672: bufferevent_run_deferred_callbacks_unlocked (bufferevent.c:188)
==22124==    by 0x4C213EA: event_base_loop (event.c:1391)
==22124==    by 0x40D588: run_event_loop (gdp_main.c:593)
==22124==    by 0x3382E07AA0: start_thread (in /lib64/libpthread-2.12.so)
==22124==    by 0x628A6FF: ???
==22124==  Address 0x50aa8b8 is 72 bytes inside a block of size 248 free'd
==22124==    at 0x4A06430: free (vg_replace_malloc.c:446)
==22124==    by 0x41ABA9: ep_mem_free (ep_mem.c:121)
==22124==    by 0x407E42: gdp_gcl_open (gdp_api.c:420)
==22124==    by 0x405D40: main (gcl-create.c:308)
==22124==
exiting with status ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
==22124==
==22124== HEAP SUMMARY:
==22124==     in use at exit: 64,176 bytes in 649 blocks
==22124==   total heap usage: 945 allocs, 296 frees, 169,209 bytes allocated
==22124==
==22124== LEAK SUMMARY:
==22124==    definitely lost: 51 bytes in 1 blocks
==22124==    indirectly lost: 0 bytes in 0 blocks
==22124==      possibly lost: 33,427 bytes in 10 blocks
==22124==    still reachable: 30,698 bytes in 638 blocks
==22124==         suppressed: 0 bytes in 0 blocks
==22124== Rerun with --leak-check=full to see details of leaked memory
==22124==
==22124== For counts of detected and suppressed errors, rerun with: -v
==22124== ERROR SUMMARY: 7 errors from 3 contexts (suppressed: 6 from 6)

Perhaps running the above on an Ubuntu host would be of use.

Also, under Darwin, using the GDP Java interface crashes in _gdp_gcl_touch, which looks like the same problem:

bash-3.2% $PTII/bin/vergil ptolemy/actor/lib/jjs/modules/gdp/test/auto/GDPLogSubscribeJS.xml  

Updating local copy of the accessors repository.
In "/Users/cxh/ptII/org/terraswarm/accessor/accessors", about to execute:
    svn update --non-interactive --trust-server-cert --accept postpone
Updating '.':
At revision 1477.
All Done
Updated accessor repository.
GDPLogCreate.js: initialize()
GDPLogAppend.js: initialize()
JavaScriptGDPLogName: ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.9761887910993737
GDPLogCreate.js: create() Start.
GDPHelper.GDPHelper(ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.9761887910993737, 3, edu.berkeley.eecs.gdp-01.gdplogd):
GDPManager: Using configuration files in /Users/cxh/.ep_adm_params
GDPManager: gdp settings:
swarm.gdp.routers=gdp-03.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu
GDP_GCL.java: newGCL(org.terraswarm.gdp.GDP_NAME@613fd327, 3, org.terraswarm.gdp.GDP_NAME@4a3490b4)
GDP_GCL.java: GDP_GCL(org.terraswarm.gdp.GDP_NAME@613fd327(7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8^@), RA, org.terraswarm.gdp.GDP_NAME@4a3490b\
4(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U^@)

>>> gdp_gcl_open(7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8)
_gdp_lib_init(NULL)
        @(#)libgdp 0.7.2 (2017-04-06 15:21) d4712db48058060d35c2c2dad78976148115d24e++
My GDP routing name = mTWjJbHkcxgK0_uoJtdrbNXjPwZKnxIIcPto8870A6Y
gdp_lib_init: OK
_gdp_chan_open(gdp-03.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu)
Trying gdp-03.eecs.berkeley.edu
_gdp_chan_open: trying host gdp-03.eecs.berkeley.edu port 8007
_gdp_chan_open: talking to router at gdp-03.eecs.berkeley.edu:8007
gdp_event_loop: starting up I/O base loop
Inserted events:
  0x7fe2c09e1860 [fd  115] Write Persist
  0x7fe2c09e17e0 [fd  115] Read Persist
  0x7fe2c35611d0 [fd  -1] Persist Timeout=1491517963.442131
Active events:
_gdp_req_new: allocated new pdu @ 0x7fe2c0bd3890
_gdp_pdu_out, fd = 115, basemd = 0x0: CMD_ADVERTISE
gdp_init: OK
_gdp_req_new: allocated new pdu @ 0x7fe2c0bd3890

>>> _gdp_invoke(req=0x7fe2c0bd3770 rid=1): CMD_OPEN_RA (75), gcl@0x7fe2c0bd3a30
        datum @ 0x7fe2c0bd3910: recno -1, len 0, no timestamp
_gdp_pdu_out, fd = 115, basemd = 0x0: CMD_OPEN_RA
_gdp_pdu_in(NAK_R_NOROUTE) => OK

*** Processing ack/nak 240=NAK_R_NOROUTE from socket 115
gdp_pdu_proc_resp(0x7fe2c09e1d10 NAK_R_NOROUTE) gcl 0x7fe2c0bd3a30
_gdp_req_dispatch >>> NAK_R_NOROUTE (240) [gcl->refcnt 2]
nak_router: received NAK_R_NOROUTE for CMD_OPEN_RA
_gdp_req_dispatch <<< NAK_R_NOROUTE [gcl->refcnt 2]
    ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
<<< _gdp_invoke(0x7fe2c0bd3770 rid=1) CMD_OPEN_RA: ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
Couldn't open GCL 7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8:
        ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
<<< gdp_gcl_open(7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8): ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
GDP_GCL: gdp_gcl_open() failed, trying to create the log and call gdp_gcl_open() again.
GDP_GCL.java: create(org.terraswarm.gdp.GDP_NAME@613fd327, , org.terraswarm.gdp.GDP_NAME@4a3490b4, {})

>>> gdp_gcl_create
_gdp_gcl_create: gcl=7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8
        logd=m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U
_gdp_req_new: allocated new pdu @ 0x7fe2c0bd3890

>>> _gdp_invoke(req=0x7fe2c0bd3770 rid=2): CMD_CREATE (66), gcl@0x7fe2c08e71b0
        datum @ 0x7fe2c0bd3910: recno -1, len 34, no timestamp
00000000 ec 4a 9b 0c 2d 0b e1 98 3c 9f 67 7b c8 34 01 b0
         \354  J  ⌧  ⌧  -  ⌧  \341  ⌧  <  ⌧  g  {  \310  4  ⌧  \260
00000010 84 38 a0 48 f1 87 bd b8 a7 8e 32 41 86 8c 0d 0f
         ⌧  8  \240  H  \361  ⌧  \275  \270  \247  ⌧  2  A  ⌧  ⌧  ⌧  ⌧
00000020 00 00
         ⌧  ⌧
_gdp_pdu_out, fd = 115, basemd = 0x0: CMD_CREATE
_gdp_pdu_in(ACK_DATA_CREATED) => OK

*** Processing ack/nak 129=ACK_DATA_CREATED from socket 115
gdp_pdu_proc_resp(0x7fe2c09e1d10 ACK_DATA_CREATED) gcl 0x7fe2c08e71b0
_gdp_req_dispatch >>> ACK_DATA_CREATED (129) [gcl->refcnt 2]
ack_success: received ACK_DATA_CREATED for CMD_CREATE
_gdp_req_dispatch <<< ACK_DATA_CREATED [gcl->refcnt 2]
    OK [201 = 0xc9]
<<< _gdp_invoke(0x7fe2c0bd3770 rid=2) CMD_CREATE: OK [201 = 0xc9]
<<< gdp_gcl_create: OK [201 = 0xc9]

>>> gdp_gcl_open(7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8)
_ep_thr_mutex_lock: mutex 0x7fe2c08e71b0 (gcl) has order 12,  but 31 is already locked (gdp_gcl_cache.c:227)
gdp_gcl_open(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): using existing GCL @ 0x7fe2c08e71b0
<<< gdp_gcl_open(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK

GDPLogCreate.js: create() Done!
GDPHelper.GDPHelper(ptolemy.actor.lib.jjs.modules.gdp.test.auto.GDPLogSubscribeJS.0.9761887910993737, 1, edu.berkeley.eecs.gdp-01.gdplogd):
GDPManager: Using configuration files in /Users/cxh/.ep_adm_params
GDPManager: gdp settings:
swarm.gdp.routers=gdp-03.eecs.berkeley.edu; gdp-02.eecs.berkeley.edu
GDP_GCL.java: newGCL(org.terraswarm.gdp.GDP_NAME@495424f0, 1, org.terraswarm.gdp.GDP_NAME@39fd2905)
GDP_GCL.java: GDP_GCL(org.terraswarm.gdp.GDP_NAME@495424f0(7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8^@), RO, org.terraswarm.gdp.GDP_NAME@39fd290\
5(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U^@)

>>> gdp_gcl_open(7EqbDC0L4Zg8n2d7yDQBsIQ4oEjxh724p44yQYaMDQ8)
_ep_thr_mutex_lock: mutex 0x7fe2c08e71b0 (gcl) has order 12,  but 31 is already locked (gdp_gcl_cache.c:227)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000001530a4535, pid=77816, tid=0x000000000001091f
#
# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [jna4710083213482355109.tmp+0x6535]  _gdp_gcl_touch+0x175
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/cxh/ptII/ptolemy/actor/lib/jjs/modules/gdp/test/auto/hs_err_pid77816.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Abort trap: 6
bash-3.2$ 

The Java stack trace is not much help. The crashed thread is #29, included below.

Process:               java [77816]
Path:                  /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java
Identifier:            net.java.openjdk.cmd
Version:               1.0 (1.0)
Code Type:             X86-64 (Native)
Parent Process:        bash [73040]
Responsible:           java [77816]
User ID:               502

Date/Time:             2017-04-06 15:32:14.083 -0700
OS Version:            Mac OS X 10.12.3 (16D32)
Report Version:        12
Anonymous UUID:        95DE4877-FDC5-7AC4-9E0A-28B9C5B2F6A4

Sleep/Wake UUID:       C99488A9-D84C-42E1-9E36-4892270F185E

Time Awake Since Boot: 250000 seconds
Time Since Wake:       13000 seconds

System Integrity Protection: enabled

Crashed Thread:        29  Java: GDPLogSubscribeJS

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000419
Exception Note:        EXC_CORPSE_NOTIFY

.....

Stack: [0x000070000c333000,0x000070000c433000],  sp=0x000070000c4300d0,  free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [jna4710083213482355109.tmp+0x6535]  _gdp_gcl_touch+0x175
C  [jna4710083213482355109.tmp+0x6177]  _gdp_gcl_incref+0x67
C  [jna4710083213482355109.tmp+0x5fda]  _gdp_gcl_cache_get+0x8a
C  [jna4710083213482355109.tmp+0x1b83]  gdp_gcl_open+0x153
C  [jna5070547521473446232.tmp+0xd514]  ffi_call_unix64+0x4c
C  0x00007fe2c083ce00

....
Thread 29 Crashed:: Java: GDPLogSubscribeJS
0   libsystem_kernel.dylib          0x00007fffb113add6 __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fffb1226787 pthread_kill + 90
2   libsystem_c.dylib               0x00007fffb10a0420 abort + 129
3   libjvm.dylib                    0x0000000104b2faeb os::abort(bool) + 25
4   libjvm.dylib                    0x0000000104c55e3e VMError::report_and_die() + 2304
5   libjvm.dylib                    0x0000000104b31716 JVM_handle_bsd_signal + 1131
6   libjvm.dylib                    0x0000000104b2d97b signalHandler(int, __siginfo*, void*) + 47
7   libsystem_platform.dylib        0x00007fffb1219bba _sigtramp + 26
8   jna4710083213482355109.tmp      0x00000001530bc4b8 0x15309e000 + 124088
9   jna4710083213482355109.tmp      0x00000001530a4177 _gdp_gcl_incref + 103 (gdp_gcl_cache.c:595)
10  jna4710083213482355109.tmp      0x00000001530a3fda _gdp_gcl_cache_get + 138 (gdp_gcl_cache.c:248)
11  jna4710083213482355109.tmp      0x000000015309fb83 gdp_gcl_open + 339 (gdp_api.c:394)
12  jna5070547521473446232.tmp      0x0000000126ce0514 ffi_call_unix64 + 76
13  jna5070547521473446232.tmp      0x0000000126ce042d ffi_call + 781
14  jna5070547521473446232.tmp      0x0000000126cd7565 0x126cd3000 + 17765
15  ???                             0x000000010565b9d4 0 + 4385520084

gdpTestsRhel11Apr2016.txt Magnifier (14.8 KB) Anonymous, 04/11/2017 12:47 PM

gdpPtolemyDarwin11Apr2017.txt Magnifier (110 KB) Anonymous, 04/11/2017 12:49 PM

Screen Shot 2017-04-11 at 12.49.40 PM.png (688 KB) Anonymous, 04/11/2017 12:50 PM

PtolemyRhel11Apr2017.txt Magnifier (38.2 KB) Anonymous, 04/11/2017 01:15 PM

hs_err_pid2764.log Magnifier (69.5 KB) Anonymous, 04/11/2017 01:15 PM

95

History

#1 Updated by Anonymous over 6 years ago

Hi Eric,
Any word on this? I can't test the under Mac or RHEL because the GDP crashes

_Christopher

#2 Updated by Eric Allman over 6 years ago

  • Status changed from New to In Progress

Well, I've never been able to reproduce this, so I can't be sure that it's gone. However, I did push a batch of changes today that includes a number of changes to locking, which in turn affects memory memory management, so I'm waiting for Jenkins to do its thing to see if anything changes. For what it's worth, valgrind doesn't seem to complain.

#3 Updated by Anonymous over 6 years ago

95

Under RHEL, valglrind looks better, but I'm getting a new error:

[cxh@terra junit]$     export EP_PARAM_PATH=/home/cxh/src/ptII/vendors/gdp/ep_adm_params
[cxh@terra junit]$            ( cd /home/cxh/src/ptII/vendors/gdp/gdp_router; python ./src/gdp_router.py -l routerLog.txt ) &
[1] 2219
[cxh@terra junit]$            /home/cxh/src/ptII/vendors/gdp/gdp/gdplogd/gdplogd -F -N terra.EECS.Berkeley.EDU &
[2] 2224
[cxh@terra junit]$ My GDP routing name = terra.EECS.Berkeley.EDU
avahi_service_browser_new() failed: Too many clients

[cxh@terra junit]$ valgrind /home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16182
==2236== Memcheck, a memory error detector
==2236== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==2236== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==2236== Command: /home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16182
==2236==
avahi_service_browser_new() failed: Too many clients
[ERROR] gcl-create: Cannot create gdp.runTests.16182: already exists
==2236==
==2236== HEAP SUMMARY:
==2236==     in use at exit: 64,112 bytes in 645 blocks
==2236==   total heap usage: 894 allocs, 249 frees, 83,741 bytes allocated
==2236==
==2236== LEAK SUMMARY:
==2236==    definitely lost: 32 bytes in 1 blocks
==2236==    indirectly lost: 96 bytes in 1 blocks
==2236==      possibly lost: 33,386 bytes in 10 blocks
==2236==    still reachable: 30,598 bytes in 633 blocks
==2236==         suppressed: 0 bytes in 0 blocks
==2236== Rerun with --leak-check=full to see details of leaked memory
==2236==
==2236== For counts of detected and suppressed errors, rerun with: -v
==2236== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
[cxh@terra junit]$ valgrind /home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16183
==2240== Memcheck, a memory error detector
==2240== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==2240== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==2240== Command: /home/jenkins/workspace/gdp/apps/gcl-create -k none -s terra.EECS.Berkeley.EDU gdp.runTests.16183
==2240==
avahi_service_browser_new() failed: Too many clients
Created new GCL rh58-gzsgdGsxBll9OkcFeacxlwGIDkg_ls730NO91I
        on log server terra.EECS.Berkeley.EDU
Assertion failed at gcl-create:gdp_gcl_ops.c:473: _ep_thr_mutex_lock: mutex 0x509b1c0 (gcl) already self-locked

Under Darwin, the tests in gdp/test succeeded!

Under RHEL, when running as cxh on terra.eecs, one of the tests in gdp/test fails, see the attached file (gdpTestsRhel11Apr2016.txt (sorry about the 2016, I thought I'd be able to adjust the date.)

Under Darwin, the Ptolemy model /ptolemy/actor/lib/jjs/modules/gdp/test/auto/GDPLogSubscribe.xml fails, see gdpPtolemyDarwin11Apr2017.txt and Screen Shot 2017-04-11 at 12.49.40 PM.png

The problem also occurs under RHEL.

It looks like the Subscriber is missing data. The expected values are

{"nil", "1", "2", "3", "4"}

Instead, I'm getting

{"nil", "1", "3", "4", "5"}

What happened to 2?

#4 Updated by Anonymous over 6 years ago

I got a core dump under RHEL using the Java interface, see below and the attached files.

To replicate on terra:

cd /home/cxh/src/ptII/ptolemy/actor/lib/jjs/modules/gdp/test/junit
make

Below is the error:

*** Processing ack/nak 240=NAK_R_NOROUTE from socket 35
gdp_pdu_proc_resp(0x7f5ccc000db0 NAK_R_NOROUTE) gcl 0x7f5cd04ba970
_gdp_req_dispatch >>> NAK_R_NOROUTE (240) [gcl->refcnt 4]
nak_router: received NAK_R_NOROUTE for CMD_CREATE
_gdp_req_dispatch <<< NAK_R_NOROUTE [gcl->refcnt 4]
    ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
<<< _gdp_invoke(0x7f5cd04baec0 rid=2) CMD_CREATE: ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
Could not create GCL: ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5cd67c2df4, pid=2764, tid=140038016612096
#
# JRE version: Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libgdp.0.7.so+0x12df4]  _gdp_gcl_unlock_trace+0x4
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/cxh/src/ptII/ptolemy/actor/lib/jjs/modules/gdp/test/hs_err_pid2764.log

#5 Updated by Eric Allman over 6 years ago

This looks like multiple problems.

Re Assertion failed at gcl-create:gdp_gcl_ops.c:473: _ep_thr_mutex_lock: mutex 0x509b1c0 (gcl) already self-locked: is there any chance you aren't running the very latest code (commit:00cf2f7b)? That was exactly the symptom fixed in that commit.

Re avahi_service_browser_new() failed: Too many clients: is this one of the problems, or is it just noise?

Re run_tests.py::test_t_multimultiread FAILED: That looks like log gdp.runTests.28330 is empty, so it's not surprising that the multiread fails. I'm not sure, but it looks like that might be due to the next problem:

Re run_tests.py::test_t_fwd_append FAILED: I hadn't updated that routine to use the new locking convention because it is for an experimental interface that really needs to change (and that interface has not been deployed). However, as a short term workaround I tweaked the locking so the test passes, even if the test is pretty meaningless at the moment.

I think this covers all the issues, but I may have missed something. So at this point the current commit is commit:c2f7e025.

#6 Updated by Eric Allman over 6 years ago

Re:

ERROR: 600 no route available [Berkeley:Swarm-GDP:600]

About all I can figure out from this is that it looks like gdplogd isn't running. But I don't see any evidence that it has crashed, either. Is there any more info?

#7 Updated by Anonymous over 6 years ago

I think we can close this issue.

Re Assertion failed at gcl-create:gdp_gcl_ops.c:473: _ep_thr_mutex_lock: mutex 0x509b1c0 (gcl) already self-locked: is there any chance you aren't running the very latest code (00cf2f7b)? That was exactly the symptom fixed in that commit.

I'm not sure.

Re avahi_service_browser_new() failed: Too many clients: is this one of the problems, or is it just noise?

I'm not sure. This is the first I've see of that.

Re run_tests.py::test_t_multimultiread FAILED: That looks like log gdp.runTests.28330 is empty, so it's not surprising that the multiread fails. I'm not sure, but it looks like that might be due to the next problem:

Ok.

->Re run_tests.py::test_t_fwd_append FAILED: I hadn't updated that routine to use the new locking convention because it is for an experimental interface that really needs to change (and that interface has not been deployed). However, as a short term workaround I tweaked the locking so the test passes, even if the test is pretty meaningless at the moment.

Ok,

It looks like this issue is fixed. There is still the problem of subscriber dropping data. I'll introduce another issue for that if it fails for a few times.

This issue can be closed.

#8 Updated by Eric Allman over 6 years ago

Opening the subscription problem as a separate issue would be a good idea. This issue is already too many things.

When you do that, could you check to see if the underlying log has the right data in it? Since you had said the values should be nil — 4, and it was actually nil — 5, I suspect that the problem is with the write rather than the subscribe. Also, are the writes synchronous or asynchronous?

#9 Updated by Eric Allman over 6 years ago

  • Status changed from In Progress to Closed

Also available in: Atom PDF