Bug #115

RHEL Java segfault while creating a log in ep_dbg_vprintf()

Added by Anonymous about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: libgdp
Start date: 09/07/2017
Due date:
% Done: 0%

Description

After updating the GDP/Java interface yesterday, I'm seeing a segfault under RHEL on terra.eecs.berkeley.edu as the terra user.

$PTII/bin/vergil ptolemy/actor/lib/jjs/modules/gdp/test/auto/GDPLogCreateAppendReadJS.xml

fails with:

*** Processing ack/nak 240=NAK_R_NOROUTE from socket 161
gdp_pdu_proc_resp(0x7f3c44001760 NAK_R_NOROUTE) gcl 0x7f3c3c4a21a0
nak_router: received NAK_R_NOROUTE for CMD_CREATE
_gdp_req_dispatch <<< NAK_R_NOROUTE(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
<<< _gdp_invoke(0x7f3c3c4a1ae0 rid=2) CMD_CREATE: ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
Could not create GCL: ERROR: 600 no route available [Berkeley:Swarm-GDP:600]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000003ed4a47c9c, pid=14633, tid=139900214380288
#
# JRE version: Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x47c9c]  _IO_vfprintf+0x3e5c
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/jenkins/hs_err_pid14633.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
<<< gdp_gcl_create(Aborted (core dumped)

See the attached gdpCrash.txt for the complete output.
The Java crash log (attached) shows:

Stack: [0x00007f3d0e850000,0x00007f3d0e951000],  sp=0x00007f3d0e94ca60,  free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x47c9c]  _IO_vfprintf+0x3e5c
C  [libgdp.0.8.so+0x2014e]  ep_dbg_vprintf+0x64

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.sun.jna.Native.invokeStructure(JI[Ljava/lang/Object;JJ)V+0
j  com.sun.jna.Native.invokeStructure(JI[Ljava/lang/Object;Lcom/sun/jna/Structure;)Lcom/sun/jna/Structure;+19
j  com.sun.jna.Function.invoke([Ljava/lang/Object;Ljava/lang/Class;Z)Ljava/lang/Object;+463
j  com.sun.jna.Function.invoke(Ljava/lang/Class;[Ljava/lang/Object;Ljava/util/Map;)Ljava/lang/Object;+262
j  com.sun.jna.Library$Handler.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;+316
j  com.sun.proxy.$Proxy3.gdp_gcl_create(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;Lcom/sun/jna/ptr/PointerByReference;Lcom/sun/jna/ptr/PointerByReference;)Lorg/terraswarm/gdp/EP_STAT$ByValue;+29
j  org.terraswarm.gdp.GDP_GCL.create(Lorg/terraswarm/gdp/GDP_NAME;Lorg/terraswarm/gdp/GDP_NAME;Ljava/util/Map;)V+161
j  org.terraswarm.gdp.GDP_GCL.create(Lorg/terraswarm/gdp/GDP_NAME;Lorg/terraswarm/gdp/GDP_NAME;)V+9
j  org.terraswarm.gdp.GDP_GCL.<init>(Lorg/terraswarm/gdp/GDP_NAME;Lorg/terraswarm/gdp/GDP_GCL$GDP_MODE;Lorg/terraswarm/gdp/GDP_NAME;)V+167
j  org.terraswarm.gdp.GDP_GCL.newGCL(Lorg/terraswarm/gdp/GDP_NAME;ILorg/terraswarm/gdp/GDP_NAME;)Lorg/terraswarm/gdp/GDP_GCL;+146
j  ptolemy.actor.lib.jjs.modules.gdp.GDPHelper.<init>(Ljava/lang/Object;Ljdk/nashorn/api/scripting/ScriptObjectMirror;Ljava/lang/String;ILjava/lang/String;)V+311
...

It looks like this error is happening when the log is being created.

The same test does not fail under Darwin.

gdpCrash.txt - Output from vergil showing GDP status (6.49 KB) Anonymous, 09/07/2017 01:33 PM

hs_err_pid14633.log - Java crash dump (92.5 KB) Anonymous, 09/07/2017 01:33 PM

History

#1 Updated by Eric Allman about 2 years ago

  • Category set to libgdp
  • Status changed from New to In Progress
  • Assignee set to Eric Allman

Unfortunately, I can't find a true backtrace of the C stack in those logs, so all I can see for sure is that it apparently died in ep_dbg_vprintf, as you noted. Your guess that it died in gdp_gcl_create seems plausible, but it's hard to determine the exact problem. Based on some guesswork, I've made a couple of tweaks that should either make the problem more obvious or patch around it, depending on what the problem actually is. That code has been pushed to the repo. Please give it a try.
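This thread does not record which tweaks were made, so the following is only an illustrative sketch of the kind of defensive guard that can keep a vprintf-style debug wrapper from crashing inside vfprintf. It is not the libgdp/libep code. The names dbg_vformat, dbg_format, and dbg_str are hypothetical; the sketch assumes two common failure modes (a NULL format string, and a NULL pointer passed for %s) and says nothing about whether either was the cause here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libep/libgdp function): substitute a
 * printable placeholder for a NULL string before it reaches "%s". */
const char *dbg_str(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Hypothetical helper: format a debug message into buf.
 * Guards against a NULL format and works on a copy of the va_list,
 * so the caller's list is left untouched and can be reused. */
int dbg_vformat(char *buf, size_t size, const char *fmt, va_list ap)
{
    assert(buf != NULL && size > 0);

    if (fmt == NULL) {
        /* Emit a visible marker instead of handing NULL to vsnprintf. */
        return snprintf(buf, size, "(null format)");
    }

    va_list ap2;
    va_copy(ap2, ap);
    int n = vsnprintf(buf, size, fmt, ap2);
    va_end(ap2);
    return n;
}

/* Variadic front end for dbg_vformat. */
int dbg_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = dbg_vformat(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

A caller would wrap any string argument that might be unset, for example dbg_format(buf, sizeof buf, "gcl %s: %d", dbg_str(name), code). Guards like these can only patch around a bad argument; finding the real cause still needs a native backtrace, which the JVM message above suggests obtaining by running "ulimit -c unlimited" before starting Java.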

#2 Updated by Anonymous about 2 years ago

  • Status changed from In Progress to Closed

Thanks, that seems to have fixed the segfault under RHEL. I'll need to find the time to update the other platforms and npm sometime soon.

I'm closing this one.
