Bug #92
Deadlock in gdplogd
0%
Description
Christopher reports issues with non-responsive gdplogd
(see #91). It appears to be a deadlock, here's a backtrace from gdb
(running from @(#)libgdp 0.7.2 (2016-12-11 14:25)
; the git hash is missing for some reason).
root@gdp-01:~# ps -ef | grep gdplogd root 11621 5467 0 09:22 pts/2 00:00:00 grep --color=auto gdplogd gdp 22522 1 0 Dec19 ? 00:00:00 /bin/sh /usr/sbin/gdplogd-wrapper.sh gdp 22526 22522 0 Dec19 ? 00:00:12 /usr/sbin/gdplogd -D20 gdp 22527 22522 0 Dec19 ? 00:00:00 llogger -a /var/log/gdp/gdplogd.log
(gdb) attach 22526 Attaching to process 22526 [New LWP 22529] [New LWP 22530] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 0x00007f4d0e50798d in pthread_join (threadid=139968895817472, thread_return=thread_return@entry=0x0) at pthread_join.c:90 90 pthread_join.c: No such file or directory. (gdb) info threads Id Target Id Frame * 1 Thread 0x7f4d0f900880 (LWP 22526) "gdplogd" 0x00007f4d0e50798d in pthread_join (threadid=139968895817472, thread_return=thread_return@entry=0x0) at pthread_join.c:90 2 Thread 0x7f4d0cd0a700 (LWP 22529) "gdplogd" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 3 Thread 0x7f4d0c509700 (LWP 22530) "gdplogd" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 (gdb) thread apply all bt Thread 3 (Thread 0x7f4d0c509700 (LWP 22530)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007f4d0e508e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=383, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x00000000004110ce in _gdp_gcl_cache_reclaim (maxage=300) at gdp_gcl_cache.c:383 #4 0x000000000040ac01 in gcl_reclaim_resources () at logd_gcl.c:203 #5 0x000000000040584b in gdpd_reclaim_resources (fd=<optimized out>, what=<optimized out>, ctx=<optimized out>) at logd.c:82 #6 0x00007f4d0f53aa0c in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5 #7 0x0000000000414efb in run_event_loop (eli_=0x19c7ae0) at gdp_main.c:565 #8 0x00007f4d0e5066ba in start_thread (arg=0x7f4d0c509700) at pthread_create.c:333 #9 0x00007f4d0e23c82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 2 (Thread 0x7f4d0cd0a700 (LWP 22529)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007f4d0e508dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f4cfc07d1e0) at ../nptl/pthread_mutex_lock.c:80 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x7f4cfc07d1e0, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=457, name=name@entry=0x427450 "&gcl->mutex") at ep_thr.c:232 #3 0x00000000004112f9 in _gdp_gcl_lock (gcl=gcl@entry=0x7f4cfc07d1e0) at gdp_gcl_cache.c:457 #4 0x0000000000411746 in _gdp_gcl_cache_get ( gcl_name=gcl_name@entry=0x7f4d0cd09bb0 "J\232H\224\323V\210\252\251a\330\370KP\367\066\302G\363te\274\225F\214t\341\354\017M\032\021/var/swarm/gdp/gcls/_4a", mode=mode@entry=GDP_MODE_ANY) at gdp_gcl_cache.c:208 #5 0x000000000040630d in post_one_log ( gdpname=0x7f4d0cd09bb0 "J\232H\224\323V\210\252\251a\330\370KP\367\066\302G\363te\274\225F\214t\341\354\017M\032\021/var/swarm/gdp/gcls/_4a", ctx=<optimized out>) at logd_admin.c:296 #6 0x0000000000407112 in disk_foreach (func=0x4062d7 <post_one_log>, ctx=0x0) at logd_disklog.c:2380 #7 0x0000000000405d2b in admin_probe_thread (ctx=<optimized out>) at logd_admin.c:336 #8 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #9 0x00007f4d0e5066ba in start_thread (arg=0x7f4d0cd0a700) at pthread_create.c:333 #10 0x00007f4d0e23c82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 1 (Thread 0x7f4d0f900880 (LWP 22526)): #0 0x00007f4d0e50798d in pthread_join (threadid=139968895817472, thread_return=thread_return@entry=0x0) at pthread_join.c:90 #1 0x0000000000405cd0 in main (argc=<optimized out>, argv=<optimized out>) at logd.c:447 (gdb)
History
#1 Updated by Nitesh Mor almost 7 years ago
- Blocks Bug #91: edu.berkeley.eecs.gdp-01.gdplog down or hung? gdp create hanging under Node with Ubuntu, RHEL and Mac OS X added
#2 Updated by Nitesh Mor almost 7 years ago
Trace from another stuck instance
(gdb) thread apply all bt Thread 8 (Thread 0x7fae9effd700 (LWP 14029)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=202, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x000000000041171e in _gdp_gcl_cache_get (gcl_name=gcl_name@entry=0x7faea800594d "\242\267\210\375{~Y\375ʣ\256\266\061n\251'Os\215\366\363\024\016\362\210\343\315\364ő\205t\r\213\221\362,_\261\235\022\342'g\364 \273\216\335k\347S\266\r\252\"v\005eܳ\001\273", <incomplete sequence \307>, mode=mode@entry=0) at gdp_gcl_cache.c:202 #4 0x0000000000414998 in gdp_pdu_proc_cmd (pdu_=0x7faea8005930) at gdp_main.c:146 #5 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #6 0x00007faeafe526ba in start_thread (arg=0x7fae9effd700) at pthread_create.c:333 #7 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 7 (Thread 0x7fae9f7fe700 (LWP 14028)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=202, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x000000000041171e in _gdp_gcl_cache_get (gcl_name=gcl_name@entry=0x7faea8001fcd "W\246P2\262\256@\231\365\371\363\274Qy\250\021\262\354\340\354\310JJ\276)\027VG\017\006\336\035\061\027\264/\202\033\n\215ڹ\f\a\321I\276\226&-5\032\034\026\316L\320\366\241\321vf", <incomplete sequence \350>, mode=mode@entry=0) at gdp_gcl_cache.c:202 #4 0x0000000000414998 in gdp_pdu_proc_cmd (pdu_=0x7faea8001fb0) at gdp_main.c:146 #5 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #6 0x00007faeafe526ba in start_thread (arg=0x7fae9f7fe700) at pthread_create.c:333 #7 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 6 (Thread 0x7fae9ffff700 (LWP 14027)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=202, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x000000000041171e in _gdp_gcl_cache_get (gcl_name=gcl_name@entry=0x7faea8001d9d "\341\023\346\070A\033\242\350\070\352\261\061jU\374\027\321\335\030\240}h~d\003\070S}sV\340\066\a1\253a\324\316NLIXvr\260}\206", mode=mode@entry=0) at gdp_gcl_cache.c:202 #4 0x0000000000414998 in gdp_pdu_proc_cmd (pdu_=0x7faea8001d80) at gdp_main.c:146 #5 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #6 0x00007faeafe526ba in start_thread (arg=0x7fae9ffff700) at pthread_create.c:333 #7 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 5 (Thread 0x7faeace53700 (LWP 12377)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54dbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7faea00ba720) at ../nptl/pthread_mutex_lock.c:80 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x7faea00ba720, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=457, name=name@entry=0x427450 "&gcl->mutex") at ep_thr.c:232 #3 0x00000000004112f9 in _gdp_gcl_lock (gcl=gcl@entry=0x7faea00ba720) at gdp_gcl_cache.c:457 #4 0x0000000000411746 in _gdp_gcl_cache_get (gcl_name=gcl_name@entry=0x7faeace52bb0 "\372p\263'{Y\250?\341\305 A\362\n\376`\225\251\325\355ZE\336YvN\255\306\314\036\264\277/var/swarm/gdp/gcls/_fa", mode=mode@entry=GDP_MODE_ANY) at gdp_gcl_cache.c:208 #5 0x000000000040630d in post_one_log (gdpname=0x7faeace52bb0 "\372p\263'{Y\250?\341\305 A\362\n\376`\225\251\325\355ZE\336YvN\255\306\314\036\264\277/var/swarm/gdp/gcls/_fa", ctx=<optimized out>) at logd_admin.c:296 #6 0x0000000000407112 in disk_foreach (func=0x4062d7 <post_one_log>, ctx=0x0) at logd_disklog.c:2380 #7 0x0000000000405d2b in admin_probe_thread (ctx=<optimized out>) at logd_admin.c:336 #8 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #9 0x00007faeafe526ba in start_thread (arg=0x7faeace53700) at pthread_create.c:333 #10 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 4 (Thread 0x7faead654700 (LWP 12376)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=202, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x000000000041171e in _gdp_gcl_cache_get (gcl_name=gcl_name@entry=0x7fae9810c1ad "\307\033\345\307UC\205\r|h-!\033bE/\232\327sc\b\270\345\333'd\334(\315\371w\251P:\330k\021\266\247J\251\306\371Ǜ3)\224\257\347\a\270\345\324\346\031\255\214'Q\246", <incomplete sequence \345\246>, mode=mode@entry=0) at gdp_gcl_cache.c:202 #4 0x0000000000414998 in gdp_pdu_proc_cmd (pdu_=0x7fae9810c190) at gdp_main.c:146 #5 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #6 0x00007faeafe526ba in start_thread (arg=0x7faead654700) at pthread_create.c:333 #7 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 3 (Thread 0x7faeade55700 (LWP 27640)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=383, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x00000000004110ce in _gdp_gcl_cache_reclaim (maxage=300) at gdp_gcl_cache.c:383 #4 0x000000000040ac01 in gcl_reclaim_resources () at logd_gcl.c:203 #5 0x000000000040584b in gdpd_reclaim_resources (fd=<optimized out>, what=<optimized out>, ctx=<optimized out>) at logd.c:82 #6 0x00007faeb0e86a0c in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5 #7 0x0000000000414efb in run_event_loop (eli_=0x1d6eae0) at gdp_main.c:565 #8 0x00007faeafe526ba in start_thread (arg=0x7faeade55700) at pthread_create.c:333 #9 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 2 (Thread 0x7faeae656700 (LWP 27639)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007faeafe54e42 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x6372c0 <GclCacheMutex>) at ../nptl/pthread_mutex_lock.c:115 #2 0x0000000000422db8 in _ep_thr_mutex_lock (mtx=mtx@entry=0x6372c0 <GclCacheMutex>, file=file@entry=0x427e56 "gdp_gcl_cache.c", line=line@entry=202, name=name@entry=0x427e47 "&GclCacheMutex") at ep_thr.c:232 #3 0x000000000041171e in _gdp_gcl_cache_get (gcl_name=gcl_name@entry=0x7faea40bbd1d "\316\356\366c\344\222\021D`47\223\267\304 V^c", mode=mode@entry=0) at gdp_gcl_cache.c:202 #4 0x0000000000414998 in gdp_pdu_proc_cmd (pdu_=0x7faea40bbd00) at gdp_main.c:146 #5 0x00000000004238fa in worker_thread (a=<optimized out>) at ep_thr_pool.c:145 #6 0x00007faeafe526ba in start_thread (arg=0x7faeae656700) at pthread_create.c:333 #7 0x00007faeafb8882d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Thread 1 (Thread 0x7faeb124c880 (LWP 27634)): #0 0x00007faeafe5398d in pthread_join (threadid=140388218525440, thread_return=thread_return@entry=0x0) at pthread_join.c:90 #1 0x0000000000405cd0 in main (argc=<optimized out>, argv=<optimized out>) at logd.c:447
#3 Updated by Eric Allman almost 7 years ago
- Status changed from New to In Progress
This looks like it may be a problem that I fixed already, probably in the existing master branch. It might be commit:6f1724a8, but I"m not sure. I last updated the running gdplogd
code on 12/11 and a fair number of locking changes happened after that. No guarantees.
#4 Updated by Nitesh Mor almost 7 years ago
I will deploy the most recent code to gdp-01,02,03,04
and see how it goes. In any case, there is a core dump of a stuck-instance in /home/mor/core.27634
on gdp-01
, which might be useful in case this resurfaces again.
#5 Updated by Eric Allman over 6 years ago
- Status changed from In Progress to Resolved
- Assignee changed from Nitesh Mor to Eric Allman
I believe this was fixed in commit:b572158d. New code is now running on all four gdp-xx machines, but we need more time to ensure that we are deadlock free.
#6 Updated by Eric Allman over 6 years ago
- Blocks deleted (Bug #91: edu.berkeley.eecs.gdp-01.gdplog down or hung? gdp create hanging under Node with Ubuntu, RHEL and Mac OS X)
#7 Updated by Eric Allman over 6 years ago
- Status changed from Resolved to Closed