Dave Jones e67d827
From davej  Thu Sep 16 11:55:58 2010
Dave Jones e67d827
Return-Path: linux-kernel-owner@vger.kernel.org
Dave Jones e67d827
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on gelk
Dave Jones e67d827
X-Spam-Level: 
Dave Jones e67d827
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED,
Dave Jones e67d827
	T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=ham version=3.3.1
Dave Jones e67d827
Received: from mail.corp.redhat.com [10.5.5.52]
Dave Jones e67d827
	by gelk with IMAP (fetchmail-6.3.17)
Dave Jones e67d827
	for <davej@localhost> (single-drop); Thu, 16 Sep 2010 11:55:58 -0400 (EDT)
Dave Jones e67d827
Received: from zmta02.collab.prod.int.phx2.redhat.com (LHLO
Dave Jones e67d827
 zmta02.collab.prod.int.phx2.redhat.com) (10.5.5.32) by
Dave Jones e67d827
 mail04.corp.redhat.com with LMTP; Thu, 16 Sep 2010 11:51:27 -0400 (EDT)
Dave Jones e67d827
Received: from localhost (localhost.localdomain [127.0.0.1])
Dave Jones e67d827
	by zmta02.collab.prod.int.phx2.redhat.com (Postfix) with ESMTP id 4889C9FC56;
Dave Jones e67d827
	Thu, 16 Sep 2010 11:51:27 -0400 (EDT)
Dave Jones e67d827
Received: from zmta02.collab.prod.int.phx2.redhat.com ([127.0.0.1])
Dave Jones e67d827
	by localhost (zmta02.collab.prod.int.phx2.redhat.com [127.0.0.1]) (amavisd-new, port 10024)
Dave Jones e67d827
	with ESMTP id 94mQrmwfCpY4; Thu, 16 Sep 2010 11:51:27 -0400 (EDT)
Dave Jones e67d827
Received: from int-mx03.intmail.prod.int.phx2.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.16])
Dave Jones e67d827
	by zmta02.collab.prod.int.phx2.redhat.com (Postfix) with ESMTP id 0DBDB9FC4B;
Dave Jones e67d827
	Thu, 16 Sep 2010 11:51:27 -0400 (EDT)
Dave Jones e67d827
Received: from mx1.redhat.com (ext-mx05.extmail.prod.ext.phx2.redhat.com [10.5.110.9])
Dave Jones e67d827
	by int-mx03.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id o8GFpQnO003857;
Dave Jones e67d827
	Thu, 16 Sep 2010 11:51:26 -0400
Dave Jones e67d827
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
Dave Jones e67d827
	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o8GFFCFE031066;
Dave Jones e67d827
	Thu, 16 Sep 2010 11:51:17 -0400
Dave Jones e67d827
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
Dave Jones e67d827
	id S1755493Ab0IPPvH (ORCPT <rfc822;jasowang@redhat.com> + 41 others);
Dave Jones e67d827
	Thu, 16 Sep 2010 11:51:07 -0400
Dave Jones e67d827
Received: from casper.infradead.org ([85.118.1.10]:41834 "EHLO
Dave Jones e67d827
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
Dave Jones e67d827
	with ESMTP id S1754921Ab0IPPvC convert rfc822-to-8bit (ORCPT
Dave Jones e67d827
	<rfc822;linux-kernel@vger.kernel.org>);
Dave Jones e67d827
	Thu, 16 Sep 2010 11:51:02 -0400
Dave Jones e67d827
Received: from f199130.upc-f.chello.nl ([80.56.199.130] helo=laptop)
Dave Jones e67d827
	by casper.infradead.org with esmtpsa (Exim 4.72 #1 (Red Hat Linux))
Dave Jones e67d827
	id 1OwGjI-0003VE-Ux; Thu, 16 Sep 2010 15:50:33 +0000
Dave Jones e67d827
Received: by laptop (Postfix, from userid 1000)
Dave Jones e67d827
	id 6DCDB100AEB1D; Thu, 16 Sep 2010 17:50:32 +0200 (CEST)
Dave Jones e67d827
Subject: Re: 2.6.35-stable/ppc64/p7: suspicious rcu_dereference_check()
Dave Jones e67d827
 usage detected during 2.6.35-stable boot
Dave Jones e67d827
From: Peter Zijlstra <peterz@infradead.org>
Dave Jones e67d827
To: paulmck@linux.vnet.ibm.com
Dave Jones e67d827
Cc: Subrata Modak <subrata@linux.vnet.ibm.com>,
Dave Jones e67d827
        linux-kernel <linux-kernel@vger.kernel.org>,
Dave Jones e67d827
        Li Zefan <lizf@cn.fujitsu.com>, Linuxppc-dev <Linuxppc-dev@ozlabs.org>,
Dave Jones e67d827
        sachinp <sachinp@linux.vnet.ibm.com>,
Dave Jones e67d827
        DIVYA PRAKASH <dipraksh@linux.vnet.ibm.com>,
Dave Jones e67d827
        "Valdis.Kletnieks" <Valdis.Kletnieks@vt.edu>
Dave Jones e67d827
In-Reply-To: <20100809161200.GC3026@linux.vnet.ibm.com>
Dave Jones e67d827
References: <1280739132.15317.9.camel@subratamodak.linux.ibm.com>
Dave Jones e67d827
	 <20100809161200.GC3026@linux.vnet.ibm.com>
Dave Jones e67d827
Content-Type: text/plain; charset="UTF-8"
Dave Jones e67d827
Content-Transfer-Encoding: 8BIT
Dave Jones e67d827
Date: 	Thu, 16 Sep 2010 17:50:31 +0200
Dave Jones e67d827
Message-ID: <1284652231.2275.569.camel@laptop>
Dave Jones e67d827
Mime-Version: 1.0
Dave Jones e67d827
Sender: linux-kernel-owner@vger.kernel.org
Dave Jones e67d827
Precedence: bulk
Dave Jones e67d827
List-ID: <linux-kernel.vger.kernel.org>
Dave Jones e67d827
X-Mailing-List: 	linux-kernel@vger.kernel.org
Dave Jones e67d827
X-RedHat-Spam-Score: -2.31  (RCVD_IN_DNSWL_MED,T_RP_MATCHES_RCVD)
Dave Jones e67d827
X-Scanned-By: MIMEDefang 2.67 on 10.5.11.16
Dave Jones e67d827
X-Scanned-By: MIMEDefang 2.67 on 10.5.110.9
Dave Jones e67d827
Status: RO
Dave Jones e67d827
Content-Length: 6752
Dave Jones e67d827
Lines: 145
Dave Jones e67d827
Dave Jones e67d827
On Mon, 2010-08-09 at 09:12 -0700, Paul E. McKenney wrote:
Dave Jones e67d827
Dave Jones e67d827
> > [    0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
Dave Jones e67d827
> > [    0.052999] lockdep: fixing up alternatives.
Dave Jones e67d827
> > [    0.054105]
Dave Jones e67d827
> > [    0.054106] ===================================================
Dave Jones e67d827
> > [    0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
Dave Jones e67d827
> > [    0.054999] ---------------------------------------------------
Dave Jones e67d827
> > [    0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
Dave Jones e67d827
> > [    0.054999]
Dave Jones e67d827
> > [    0.054999] other info that might help us debug this:
Dave Jones e67d827
> > [    0.054999]
Dave Jones e67d827
> > [    0.054999]
Dave Jones e67d827
> > [    0.054999] rcu_scheduler_active = 1, debug_locks = 1
Dave Jones e67d827
> > [    0.054999] 3 locks held by swapper/1:
Dave Jones e67d827
> > [    0.054999]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
Dave Jones e67d827
> > [    0.054999]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
Dave Jones e67d827
> > [    0.054999]  #2:  (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
Dave Jones e67d827
> > [    0.054999]
Dave Jones e67d827
> > [    0.054999] stack backtrace:
Dave Jones e67d827
> > [    0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
Dave Jones e67d827
> > [    0.054999] Call Trace:
Dave Jones e67d827
> > [    0.054999]  [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
Dave Jones e67d827
> > [    0.054999]  [<ffffffff810325c3>] task_group+0x7b/0x8a
Dave Jones e67d827
> > [    0.054999]  [<ffffffff810325e5>] set_task_rq+0x13/0x40
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814be39a>] init_idle+0xd2/0x113
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814be78a>] fork_idle+0xb8/0xc7
Dave Jones e67d827
> > [    0.054999]  [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814be876>] _cpu_up+0xac/0x127
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814be946>] cpu_up+0x55/0x6a
Dave Jones e67d827
> > [    0.054999]  [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
Dave Jones e67d827
> > [    0.054999]  [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
Dave Jones e67d827
> > [    0.054999]  [<ffffffff814c353c>] ? restore_args+0x0/0x30
Dave Jones e67d827
> > [    0.054999]  [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
Dave Jones e67d827
> > [    0.054999]  [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
Dave Jones e67d827
> > [    0.056074] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
Dave Jones e67d827
> > [    0.130045]  #2lockdep: fixing up alternatives.
Dave Jones e67d827
> > [    0.203089]  #3 Ok.
Dave Jones e67d827
> > [    0.275286] Brought up 4 CPUs
Dave Jones e67d827
> > [    0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
Dave Jones e67d827
> 
Dave Jones e67d827
> This does look like a new one, thank you for reporting it!
Dave Jones e67d827
> 
Dave Jones e67d827
> Here is my analysis, which should at least provide some humor value to
Dave Jones e67d827
> those who understand the code better than I do.  ;-)
Dave Jones e67d827
> 
Dave Jones e67d827
> So the corresponding rcu_dereference_check() is in
Dave Jones e67d827
> task_subsys_state_check(), and is fetching the cpu_cgroup_subsys_id
Dave Jones e67d827
> element of the newly created task's task->cgroups->subsys[] array.
Dave Jones e67d827
> The "git grep" command finds only three uses of cpu_cgroup_subsys_id,
Dave Jones e67d827
> but no definition.
Dave Jones e67d827
> 
Dave Jones e67d827
> Now, fork_idle() invokes copy_process(), which invokes cgroup_fork(),
Dave Jones e67d827
> which sets the child process's ->cgroups pointer to that of the parent,
Dave Jones e67d827
> also invoking get_css_set(), which increments the corresponding reference
Dave Jones e67d827
> count, doing both operations under task_lock() protection (->alloc_lock).
Dave Jones e67d827
> Because fork_idle() does not specify any of CLONE_NEWNS, CLONE_NEWUTS,
Dave Jones e67d827
> CLONE_NEWIPC, CLONE_NEWPID, or CLONE_NEWNET, copy_namespaces() should
Dave Jones e67d827
> not create a new namespace, and so there should be no ns_cgroup_clone().
Dave Jones e67d827
> We should thus retain the parent's ->cgroups pointer.  And copy_process()
Dave Jones e67d827
> installs the new task in the various lists, so that the task is externally
Dave Jones e67d827
> accessible upon return.
Dave Jones e67d827
> 
Dave Jones e67d827
> After a non-error return from copy_process(), fork_init() invokes
Dave Jones e67d827
> init_idle_pid(), which does not appear to affect the task's cgroup
Dave Jones e67d827
> state.  Next fork_init() invokes init_idle(), which in turn invokes
Dave Jones e67d827
> __set_task_cpu(), which invokes set_task_rq(), which calls task_group()
Dave Jones e67d827
> several times, which calls task_subsys_state_check(), which calls the
Dave Jones e67d827
> rcu_dereference_check() that complained above.
Dave Jones e67d827
> 
Dave Jones e67d827
> However, the result returns by rcu_dereference_check() is stored into
Dave Jones e67d827
> the task structure:
Dave Jones e67d827
> 
Dave Jones e67d827
> 	p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
Dave Jones e67d827
> 	p->se.parent = task_group(p)->se[cpu];
Dave Jones e67d827
> 
Dave Jones e67d827
> This means that the corresponding structure must have been tied down with
Dave Jones e67d827
> a reference count or some such.  If such a reference has been taken, then
Dave Jones e67d827
> this complaint is a false positive, and could be suppressed by putting
Dave Jones e67d827
> rcu_read_lock() and rcu_read_unlock() around the call to init_idle()
Dave Jones e67d827
> from fork_idle().  However, although, reference to the enclosing ->cgroups
Dave Jones e67d827
> struct css_set is held, it is not clear to me that this reference applies
Dave Jones e67d827
> to the structures pointed to by the ->subsys[] array, especially given
Dave Jones e67d827
> that the cgroup_subsys_state structures referenced by this array have
Dave Jones e67d827
> their own reference count, which does not appear to me to be acquired
Dave Jones e67d827
> by this code path.
Dave Jones e67d827
> 
Dave Jones e67d827
> Or are the cgroup_subsys_state structures referenced by idle tasks
Dave Jones e67d827
> never freed or some such?
Dave Jones e67d827
Dave Jones e67d827
I would hope so!, the idle tasks should be part of the root cgroup,
Dave Jones e67d827
which is not removable.
Dave Jones e67d827
Dave Jones e67d827
The problem is that while we do in-fact hold rq->lock, the newly spawned
Dave Jones e67d827
idle thread's cpu is not yet set to the correct cpu so the lockdep check
Dave Jones e67d827
in task_group():
Dave Jones e67d827
Dave Jones e67d827
  lockdep_is_held(&task_rq(p)->lock)
Dave Jones e67d827
Dave Jones e67d827
will fail.
Dave Jones e67d827
Dave Jones e67d827
But of a chicken and egg problem. Setting the cpu needs to have the cpu
Dave Jones e67d827
set ;-)
Dave Jones e67d827
Dave Jones e67d827
Ingo, why do we have rq->lock there at all? The CPU isn't up and running
Dave Jones e67d827
yet, nothing should be touching it.
Dave Jones e67d827
Dave Jones e67d827
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Dave Jones e67d827
---
Dave Jones e67d827
 kernel/sched.c |   12 ++++++++++++
Dave Jones e67d827
 1 files changed, 12 insertions(+), 0 deletions(-)
Dave Jones e67d827
Dave Jones e67d827
diff --git a/kernel/sched.c b/kernel/sched.c
Dave Jones e67d827
index bd8b487..6241049 100644
Dave Jones e67d827
--- a/kernel/sched.c
Dave Jones e67d827
+++ b/kernel/sched.c
Dave Jones e67d827
@@ -5332,7 +5332,19 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
Dave Jones e67d827
 	idle->se.exec_start = sched_clock();
Dave Jones e67d827
 
Dave Jones e67d827
 	cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
Dave Jones e67d827
+	/*
Dave Jones e67d827
+	 * We're having a chicken and egg problem, even though we are
Dave Jones e67d827
+	 * holding rq->lock, the cpu isn't yet set to this cpu so the
Dave Jones e67d827
+	 * lockdep check in task_group() will fail.
Dave Jones e67d827
+	 *
Dave Jones e67d827
+	 * Similar case to sched_fork(). / Alternatively we could
Dave Jones e67d827
+	 * use task_rq_lock() here and obtain the other rq->lock.
Dave Jones e67d827
+	 *
Dave Jones e67d827
+	 * Silence PROVE_RCU
Dave Jones e67d827
+	 */
Dave Jones e67d827
+	rcu_read_lock();
Dave Jones e67d827
 	__set_task_cpu(idle, cpu);
Dave Jones e67d827
+	rcu_read_unlock();
Dave Jones e67d827
 
Dave Jones e67d827
 	rq->curr = rq->idle = idle;
Dave Jones e67d827
 #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
Dave Jones e67d827
Dave Jones e67d827
--
Dave Jones e67d827
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Dave Jones e67d827
the body of a message to majordomo@vger.kernel.org
Dave Jones e67d827
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Jones e67d827
Please read the FAQ at  http://www.tux.org/lkml/
Dave Jones e67d827