keiths / rpms / gdb

Forked from rpms/gdb 6 days ago
Clone
0911f37
http://sourceware.org/ml/gdb-patches/2009-12/msg00364.html
0911f37
Subject: [patch] related_breakpoint stale ref crash fix
0911f37
0911f37
Hi,
0911f37
0911f37
getting occasional random:
0911f37
 PASS: gdb.threads/local-watch-wrong-thread.exp: local watchpoint still triggers
0911f37
 PASS: gdb.threads/local-watch-wrong-thread.exp: let thread_function0 return
0911f37
 PASS: gdb.threads/local-watch-wrong-thread.exp: breakpoint on thread_function0's caller
0911f37
-PASS: gdb.threads/local-watch-wrong-thread.exp: local watchpoint automatically deleted
0911f37
+ERROR: Process no longer exists
0911f37
+UNRESOLVED: gdb.threads/local-watch-wrong-thread.exp: local watchpoint automatically deleted
0911f37
0911f37
It is even reproducible on HEAD using "input" file below and:
0911f37
	valgrind ../gdb -nx 
0911f37
0911f37
(gdb) (gdb) Breakpoint 6 at 0x400685: file ./gdb.threads/local-watch-wrong-thread.c, line 47.
0911f37
(gdb) ==31759== Invalid write of size 4
0911f37
==31759==    at 0x601A11: bpstat_check_breakpoint_conditions (breakpoint.c:3482)
0911f37
==31759==    by 0x601C70: bpstat_stop_status (breakpoint.c:3596)
0911f37
==31759==    by 0x65D228: handle_inferior_event (infrun.c:3589)
0911f37
==31759==    by 0x65A563: wait_for_inferior (infrun.c:2281)
0911f37
==31759==    by 0x659AA0: proceed (infrun.c:1883)
0911f37
==31759==    by 0x65300B: continue_1 (infcmd.c:668)
0911f37
==31759==    by 0x653282: continue_command (infcmd.c:760)
0911f37
==31759==    by 0x5C51D8: do_cfunc (cli-decode.c:67)
0911f37
==31759==    by 0x5C824F: cmd_func (cli-decode.c:1738)
0911f37
==31759==    by 0x48335A: execute_command (top.c:450)
0911f37
==31759==    by 0x67273E: command_handler (event-top.c:511)
0911f37
==31759==    by 0x672E53: command_line_handler (event-top.c:736)
0911f37
==31759==  Address 0xbbdc950 is 240 bytes inside a block of size 336 free'd
0911f37
==31759==    at 0x4A04D72: free (vg_replace_malloc.c:325)
0911f37
==31759==    by 0x486E4B: xfree (utils.c:1286)
0911f37
==31759==    by 0x60BC35: delete_breakpoint (breakpoint.c:8708)
0911f37
==31759==    by 0x60BDAF: delete_command (breakpoint.c:8765)
0911f37
==31759==    by 0x5C51D8: do_cfunc (cli-decode.c:67)
0911f37
==31759==    by 0x5C824F: cmd_func (cli-decode.c:1738)
0911f37
==31759==    by 0x48335A: execute_command (top.c:450)
0911f37
==31759==    by 0x67273E: command_handler (event-top.c:511)
0911f37
==31759==    by 0x672E53: command_line_handler (event-top.c:736)
0911f37
==31759==    by 0x672FCF: gdb_readline2 (event-top.c:817)
0911f37
==31759==    by 0x6725F7: stdin_event_handler (event-top.c:433)
0911f37
==31759==    by 0x670CDE: handle_file_event (event-loop.c:812)
0911f37
==31759== 
0911f37
0911f37
Watchpoint 4 deleted because the program has left the block in
0911f37
which its expression is valid.
0911f37
0911f37
0911f37
There is already automatic deletion of RELATED_BREAKPOINT in
0911f37
map_breakpoint_numbers but "delete breakpoints" (for all the breakpoints)
0911f37
calls delete_breakpoint from delete_command directly without calling
0911f37
map_breakpoint_numbers and it does not delete the associated
0911f37
bp_watchpoint_scope.
0911f37
0911f37
I find the attached patch is right for delete_breakpoint itself as such
0911f37
function should not leave stale references in the leftover data structures.
0911f37
How well could be other code cleaned up with this patch in place I have not
0911f37
targeted by this patch.
0911f37
0911f37
0911f37
The existing code expects accessibility of freed memory and discusses the
0911f37
current stale references problem:
0911f37
0911f37
void
0911f37
delete_breakpoint (struct breakpoint *bpt)
0911f37
[...]
0911f37
  /* Has this bp already been deleted?  This can happen because multiple
0911f37
     lists can hold pointers to bp's.  bpstat lists are especial culprits.
0911f37
0911f37
     One example of this happening is a watchpoint's scope bp.  When the
0911f37
     scope bp triggers, we notice that the watchpoint is out of scope, and
0911f37
     delete it.  We also delete its scope bp.  But the scope bp is marked
0911f37
     "auto-deleting", and is already on a bpstat.  That bpstat is then
0911f37
     checked for auto-deleting bp's, which are deleted.
0911f37
0911f37
     A real solution to this problem might involve reference counts in bp's,
0911f37
     and/or giving them pointers back to their referencing bpstat's, and
0911f37
     teaching delete_breakpoint to only free a bp's storage when no more
0911f37
     references were extent.  A cheaper bandaid was chosen.  */
0911f37
  if (bpt->type == bp_none)
0911f37
    return;
0911f37
[...]
0911f37
  bpt->type = bp_none;
0911f37
0911f37
  xfree (bpt);
0911f37
}
0911f37
0911f37
0911f37
While fixing this part may be difficult I find the attached patch easy enough
0911f37
fixing the IMO currently most common crash due to it.
0911f37
0911f37
No regressions on {x86_64,x86_64-m32,i686}-fedora12-linux-gnu.
0911f37
0911f37
0911f37
Thanks,
0911f37
Jan
0911f37
0911f37
0911f37
"input":
0911f37
set height 0
0911f37
set width 0
0911f37
set confirm no
0911f37
file ../testsuite/gdb.threads/local-watch-wrong-thread
0911f37
set can-use-hw-watchpoints 1
0911f37
break main
0911f37
run
0911f37
break local-watch-wrong-thread.c:36
0911f37
continue
0911f37
delete breakpoints
0911f37
watch *myp
0911f37
continue
0911f37
delete breakpoints
0911f37
echo MAKE watch\n
0911f37
watch *myp if trigger != 0
0911f37
echo MAKE break\n
0911f37
break local-watch-wrong-thread.c:60
0911f37
info break
0911f37
continue
0911f37
echo DELETE five\n
0911f37
delete 5
0911f37
set trigger=1
0911f37
continue
0911f37
set *myp=0
0911f37
break local-watch-wrong-thread.c:47
0911f37
continue
0911f37
0911f37
0911f37
0911f37
2009-12-23  Jan Kratochvil  <jan.kratochvil@redhat.com>
0911f37
0911f37
	* breakpoint.c (delete_breakpoint <bpt->related_breakpoint != NULL>):
0911f37
	New.
0911f37
0911f37
--- a/gdb/breakpoint.c
0911f37
+++ b/gdb/breakpoint.c
0911f37
@@ -8649,6 +8649,16 @@ delete_breakpoint (struct breakpoint *bpt)
0911f37
   if (bpt->type == bp_none)
0911f37
     return;
0911f37
 
0911f37
+  /* At least avoid this stale reference until the reference counting of
0911f37
+     breakpoints gets resolved.  */
0911f37
+  if (bpt->related_breakpoint != NULL)
0911f37
+    {
0911f37
+      gdb_assert (bpt->related_breakpoint->related_breakpoint == bpt);
0911f37
+      bpt->related_breakpoint->disposition = disp_del_at_next_stop;
0911f37
+      bpt->related_breakpoint->related_breakpoint = NULL;
0911f37
+      bpt->related_breakpoint = NULL;
0911f37
+    }
0911f37
+
0911f37
   observer_notify_breakpoint_deleted (bpt->number);
0911f37
 
0911f37
   if (breakpoint_chain == bpt)
0911f37