Project

General

Profile

Bug #83

Assertions should not crash the calling process: assertion in gdp_gcl_close() causes the application to exit

Added by Anonymous over 3 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
libgdp
Start date:
10/23/2016
Due date:
% Done:

0%


Description

Vergil crashed because gdp_gcl_close() failed an assertion. The Java stack trace was:

...
Crashed Thread:        14  Java: Finalizer

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000008
...
Thread 14 Crashed:: Java: Finalizer
0   libsystem_kernel.dylib          0x00007fff96037f06 __pthread_kill + 10
1   libsystem_pthread.dylib         0x00007fff83c0f4ec pthread_kill + 90
2   libsystem_c.dylib               0x00007fff8af666df abort + 129
3   jna4136321097568769475.tmp      0x000000017a407ac1 ep_assert_failure + 273
4   jna4136321097568769475.tmp      0x000000017a3fce95 _gdp_gcl_close + 325
5   jna4136321097568769475.tmp      0x000000017a3f7d35 gdp_gcl_close + 117 (gdp_api.c:442)
6   jna3104559744272979422.tmp      0x000000017a3ed514 ffi_call_unix64 + 76
7   jna3104559744272979422.tmp      0x000000017a3ed42d ffi_call + 781
8   jna3104559744272979422.tmp      0x000000017a3e4565 0x17a3e0000 + 17765
9   ???                             0x000000010b5b49d4 0 + 4485499348
10  ???                             0x000000010b5a52bd 0 + 4485436093
11  ???                             0x000000010b5a5040 0 + 4485435456
12  ???                             0x000000010b5a5040 0 + 4485435456
13  ???                             0x000000010b5a5040 0 + 4485435456
14  ???                             0x000000010b5a5114 0 + 4485435668
15  ???                             0x000000010b5a5114 0 + 4485435668
16  ???                             0x000000010b5a52bd 0 + 4485436093

Assertions that cause the process to crash make it difficult to use the library. It would be better if gdp_gcl_close() did the best it could and then returned an error condition. In general closing a gdp should succeed even if there are issues.

I'm not sure how I triggered this, but at the time I had just run a model in Cape Code that used the GDP and then quickly was trying to generate code.

My guess is that multiple actors are attempting to close the gcl and one of them succeeded.

I've attached the Java crash log.

_

gdp_gcl_closeJavaCrash.txt Magnifier (86.7 KB) Anonymous, 10/23/2016 03:25 PM


Related issues

Related to GDP - Bug #40: gcl-create calls abort: there should be a version of _gdp_gcl_decref that does not invoke GDP_ASSERT_GOOD_GCL Closed 08/14/2016
Duplicates GDP - Bug #84: Assertion in gdp_gcl_create() crashes the calling process Closed 10/23/2016

History

#1 Updated by Nitesh Mor over 3 years ago

  • Duplicates Bug #84: Assertion in gdp_gcl_create() crashes the calling process added

#2 Updated by Eric Allman over 3 years ago

  • Status changed from New to Feedback

I'm not 100% convinced that this is actually a duplicate of Issue #84, since it happened on a call to gdp_gcl_close. It's probable that a NULL pointer was passed into gdp_gcl_close, apparently directly from Java. Note that had the assertion not been present the process almost certainly would have crashed anyway (with a NULL pointer dereference).

I'm setting this issue to Feedback status since it looks like a bug in the calling program, not libgdp. If there's no further indication that it's actually a GDP problem, we'll close it out in a week or so.

#3 Updated by Anonymous over 3 years ago

The problem seems to be that if I run the model twice, then it sometimes crashes because close() is called twice.

To replicate in Ptolemy, run

$PTII/bin/vergil $PTII/ptolemy/actor/lib/jjs/modules/gdp/test/auto/GDPLogCreateAppendReadJS.xml

Here is the end of a successful run

>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
gdp_gcl_ops.c459: _gdp_gcl_close(): gcl is not null
<<< gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK
13968 ms. Memory: 544256K Free: 210047K (39%)

The message about gcl is not null is because I modified gdp_gcl_ops.c so that it prints that message:

EP_STAT
_gdp_gcl_close(gdp_gcl_t *gcl,
                        gdp_chan_t *chan,
                        uint32_t reqflags)
{
        EP_STAT estat = EP_STAT_OK;
        gdp_req_t *req;
    int nrefs;

        errno = 0;                              // avoid spurious messages                                         
    if (gcl == NULL) {
          fprintf(stderr, "%s%d: _gdp_gcl_close(): gcl is null\n", __FILE__, __LINE__);
        } else {
          fprintf(stderr, "%s%d: _gdp_gcl_close(): gcl is not null\n", __FILE__, __LINE__);
        }
        GDP_ASSERT_GOOD_GCL(gcl);

Here is the continuation of the run, just after the 13968 ms. line above:

GDP_GCL.close(): gclh: native@0x7ffc5cc6cc00
GDP_GCL.close(): after removing gclh, gclh: native@0x7ffc5cc6cc00

>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
gdp_gcl_ops.c459: _gdp_gcl_close(): gcl is not null

>>> _gdp_invoke(req=0x7ffc5cb298b0 rid=0): CMD_CLOSE (69), gcl@0x7ffc5cc6cc00
        datum @ 0x7ffc5c4f9c30: recno -1, len 0, no timestamp
_gdp_pdu_out, fd = 120, basemd = 0x0: CMD_CLOSE
_gdp_pdu_in(ACK_SUCCESS) => OK

*** Processing ack/nak 128=ACK_SUCCESS from socket 120
gdp_pdu_proc_resp(0x7ffc5c88d060 ACK_SUCCESS) gcl 0x7ffc5cc6cc00
_gdp_req_dispatch >>> ACK_SUCCESS (128) [gcl->refcnt 2]
ack_success: received ACK_SUCCESS for CMD_CLOSE
_gdp_req_dispatch <<< ACK_SUCCESS [gcl->refcnt 2]
    OK [67896520 = 0x40c04c8]
<<< _gdp_invoke(0x7ffc5cb298b0 rid=0) CMD_CLOSE: OK [67896520 = 0x40c04c8]
<<< gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U): OK [67896520 = 0x40c04c8]
java.lang.Exception: GDP_GCL.close() was called
    at org.terraswarm.gdp.GDP_GCL.close(GDP_GCL.java:222)
        at org.terraswarm.gdp.GDP_GCL.finalize(GDP_GCL.java:298)
        at java.lang.System$2.invokeFinalize(System.java:1270)
        at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
        at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
GDP_GCL.close(): gclh: native@0x7ffc5cc6cc00
GDP_GCL.close(): after removing gclh, gclh: native@0x7ffc5cc6cc00

>>> gdp_gcl_close(m5FiAV65ufV8Oe3SinfrOZPcbzXtlNOG-lmD-KSsE1U)
gdp_gcl_ops.c459: _gdp_gcl_close(): gcl is not null
Assertion failed at gdp_gcl_ops.c:461: require:
        (gcl) != NULL && EP_UT_BITSET(GCLF_INUSE, (gcl)->flags)
Abort trap: 6
bash-3.2$

The GDP_GCL messages are from gdp/lang/java/org/terraswarm/gdp/GDP_GCL.java

    /** Close the GCL.                                                                                             
     */
    public void close() {
        new Exception("GDP_GCL.close() was called").printStackTrace();
        System.out.println("GDP_GCL.close(): gclh: " + gclh);
        // Remove ourselves from the global list.                                                                  
        _allGclhs.remove(gclh);

        System.out.println("GDP_GCL.close(): after removing gclh, gclh: " + gclh);
        // Free the associated gdp_gcl_t.                                                                          
        Gdp07Library.INSTANCE.gdp_gcl_close(gclh);
    }

What's happening here is that the GDP_GCL Java object is being finalized so close() is being called again.

Calling close() with a gcl that has already been closed should not ever fail.

Passing null or any other garbage as a gcl should also not ever fail.

Take a look at how fuzz testing works. We should expect the user to pass anything as an argument and the gdp library code should detect that and return an error.

Have a library exit the process is overkill. An absurd extension would be to have the process reformat the hard drive because of an error like this.

The best approach would be to return an error value.

A big problem with crashing the process is that I can't put the close() in a Java try block. I think that a good library always allows the user to try to close something multiple times and at worst returns a message that says that it was already closed.

If the GDP is going to be used successfully as a library it needs to be more forgiving about how it is used. Crashing the process is not user-friendly.

#4 Updated by Eric Allman over 3 years ago

As of commit:8701c522 gdp_gcl_close should return GDP_STAT_GCL_NOT_OPEN when passed a bad GCL.

#5 Updated by Anonymous over 3 years ago

  • Status changed from Feedback to Closed

Cool, thanks, I think we can close this one, so I'm closing it.

#6 Updated by Eric Allman over 3 years ago

  • Related to Bug #40: gcl-create calls abort: there should be a version of _gdp_gcl_decref that does not invoke GDP_ASSERT_GOOD_GCL added

Also available in: Atom PDF