Date: 	Mon, 30 Jan 2012 22:37:28 +0100
Message-ID: <CAPRPZsAt+e3cy1YTriikpb2SNN=jOusvnPF0ByFeun+uaBa5Og@mail.gmail.com>
Subject: [PATCH] Unhandled IRQs on AMD E-450: temporarily switch to
 low-performance polling IRQ mode
From: Jeroen Van den Keybus <jeroen.vandenkeybus@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: Clemens Ladisch <clemens@ladisch.de>, "Huang, Shane" <Shane.Huang@amd.com>,
        Borislav Petkov <bp@amd64.org>, "Nguyen, Dong" <Dong.Nguyen@amd.com>,
        jesse.brandeburg@gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-RedHat-Spam-Score: -4.898  (DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,FREEMAIL_FROM,RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD)
X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12
X-Scanned-By: MIMEDefang 2.68 on 10.5.110.19
Status: RO
Content-Length: 7029
Lines: 189

It seems that some motherboard designs using the ASM1083 PCI/PCIe
bridge (PCI device ID 1b21:1080, Rev. 01) suffer from stuck IRQ lines
on the PCI bus (causing the kernel to emit 'IRQxx: nobody cared' and
disable the IRQ). The following patch is an attempt to mitigate the
serious impact of permanently disabling an IRQ in that case and
make PCI devices more usable on this platform.

It seems that the bridge fails to issue an IRQ deassertion message on
the PCIe bus when the relevant driver causes the interrupting PCI
device to deassert its IRQ line. To solve this issue, we tried
re-issuing an IRQ on a PCI device capable of doing so (e1000 in this
case), but we suspect that the attempt to re-assert/deassert may have
occurred too soon after the initial IRQ for the ASM1083. That did not
work, but if a new IRQ occurred after some delay, the related IRQ
deassertion message did eventually clear the IOAPIC IRQ. It would
be useful to re-enable the IRQ at that point.

Therefore the patch below to poll_spurious_irqs() in spurious.c is
proposed. It does the following:

1. lets the kernel decide that an IRQ is unhandled after only 10
positives (instead of 100,000);
2. briefly (a few seconds or so, currently 1 s) switches to polling
the IRQ at a higher rate than usual (100..1,000Hz instead of 10Hz,
currently 100Hz), but not so high as to cause excessive CPU load. Any
device drivers 'see' their interrupts handled with a higher latency
than usual, but they will still operate properly;
3. afterwards, simply reenables the IRQ.
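
The three steps above can be sketched as a small user-space state
machine. This is only an illustrative model, not kernel code: the
struct, the function names and the two constants are hypothetical, and
the values (10 unhandled interrupts, 100 polls) are taken from the
description in this mail.

```c
#include <stdbool.h>

/* Hypothetical model of the detect / poll / re-enable cycle. */
enum { DETECT_THRESHOLD = 10, POLL_CYCLES = 100 };

struct irq_model {
	bool polling;	/* true while the line is disabled and polled */
	int unhandled;	/* unhandled interrupts seen in interrupt mode */
	int polls;	/* timer polls performed in polling mode */
};

/* Step 1: count an unhandled interrupt; switch to polling at the threshold. */
static void model_unhandled_irq(struct irq_model *m)
{
	if (m->polling)
		return;
	if (++m->unhandled >= DETECT_THRESHOLD) {
		m->polling = true;
		m->polls = 0;
	}
}

/*
 * Steps 2 and 3: one poll-timer tick. Returns true while the timer
 * should be re-armed, false once the line has been re-enabled.
 */
static bool model_poll_tick(struct irq_model *m)
{
	if (!m->polling)
		return false;
	if (m->polls < POLL_CYCLES) {
		m->polls++;
		return true;
	}
	m->polling = false;	/* re-enable the line */
	m->unhandled = 0;
	return false;
}
```

If the line is still stuck after re-enabling, the next 10 unhandled
interrupts put the model back into polling mode, which is the repeating
sequence described in this mail.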

If proper operation of the PCIe legacy IRQ line emulation is restored
after 3, the system operates again at normal performance. If the IRQ
is still stuck after this procedure, the sequence repeats.

If a genuinely stuck IRQ is used with this solution, the system would
simply sustain short bursts of 10 unhandled IRQs per second, and use
polling mode indefinitely at a moderate 100Hz rate. It seemed a good
alternative to the default irqpoll behaviour to me, which is why I
left it in poll_spurious_irqs() (instead of creating a new kernel
option). Additionally, if any device happens to share an IRQ with a
faulty one, that device is no longer banned forever.
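
The timing figures can be checked with a little arithmetic. The helpers
below are hypothetical and exist only to make the numbers explicit: 100
polls at 100Hz give a 1 s polling phase, so a permanently stuck line
produces at most one burst of 10 unhandled IRQs per second. For
comparison, if the poll timer ran at an interval of HZ/10 (10Hz), the
same 100 polls would take 10 s.

```c
/* Illustrative helper, not kernel code: length of one polling phase in ms. */
static unsigned int poll_phase_ms(unsigned int polls, unsigned int poll_hz)
{
	return polls * 1000u / poll_hz;
}

/*
 * Upper bound on unhandled interrupts per second for a permanently
 * stuck line: one burst per polling phase, ignoring the (short) time
 * the burst itself takes.
 */
static unsigned int stuck_irqs_per_sec(unsigned int burst, unsigned int phase_ms)
{
	return burst * 1000u / phase_ms;
}
```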

Debugging output is still present and may be removed. Bad IRQ
reporting is also commented out now.

I have now tried it for about 2 months and I can conclude the following:

1. The patch works and, judging from my Firewire card interrupt on
IRQ16, which repeats every 64 secs, I can confirm that the IRQ usually
gets reset when a new IRQ arrives (polling mode runs for 64 seconds
every time).
2. When testing a SiL-3114 SATA PCI card behind the ASM1083, I could
keep this running at fairly high speeds (50..70MB/s) for an hour or
so, but eventually the SiL driver crashed. Under such conditions the
PCI system had to deal with a few hundred IRQs per second, with
polling mode kicking in every 5..10 seconds.

I would like to thank Clemens Ladisch for his invaluable help in
finding a solution (and providing a patch to avoid my SATA going down
every time during debugging).


Signed-off-by: Jeroen Van den Keybus <jeroen.vandenkeybus@gmail.com>

Make it less chatty.  Only kick it in if we detect an ASM1083 PCI bridge.

Josh Boyer <jwboyer@redhat.com>
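
As a rough user-space illustration of "only kick it in if we detect an
ASM1083", the check amounts to matching the vendor/device ID pair
1b21:1080 quoted earlier in this mail and setting a global flag. The
macro and function names below are hypothetical; the real hook is the
PCI fixup in the diff that follows.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID pair taken from the "1b21:1080" quoted in this mail. */
#define ASMEDIA_VENDOR_ID	0x1b21
#define ASM1083_DEVICE_ID	0x1080

/* Stand-in for the global flag that enables poll-and-retry. */
static int irq_poll_and_retry_model;

static bool is_asm1083_bridge(uint16_t vendor, uint16_t device)
{
	return vendor == ASMEDIA_VENDOR_ID && device == ASM1083_DEVICE_ID;
}

/* Called once per discovered PCI device in this model. */
static void model_pci_fixup(uint16_t vendor, uint16_t device)
{
	if (is_asm1083_bridge(vendor, device))
		irq_poll_and_retry_model = 1;
}
```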
======

--- linux-2.6.orig/kernel/irq/spurious.c
+++ linux-2.6/kernel/irq/spurious.c
@@ -18,6 +18,8 @@
 
 static int irqfixup __read_mostly;
 
+int irq_poll_and_retry = 0;
+
 #define POLL_SPURIOUS_IRQ_INTERVAL (HZ/10)
 static void poll_spurious_irqs(unsigned long dummy);
 static DEFINE_TIMER(poll_spurious_irq_timer, poll_spurious_irqs, 0, 0);
@@ -141,12 +143,13 @@ out:
 static void poll_spurious_irqs(unsigned long dummy)
 {
 	struct irq_desc *desc;
-	int i;
+	int i, poll_again;
 
 	if (atomic_inc_return(&irq_poll_active) != 1)
 		goto out;
 	irq_poll_cpu = smp_processor_id();
 
+	poll_again = 0; /* Will stay false as long as no polling candidate is found */
 	for_each_irq_desc(i, desc) {
 		unsigned int state;
 
@@ -159,14 +162,33 @@ static void poll_spurious_irqs(unsigned
 		if (!(state & IRQS_SPURIOUS_DISABLED))
 			continue;
 
-		local_irq_disable();
-		try_one_irq(i, desc, true);
-		local_irq_enable();
+		/* We end up here with a disabled spurious interrupt.
+		   desc->irqs_unhandled now tracks the number of times
+		   the interrupt has been polled */
+		if (irq_poll_and_retry) {
+			if (desc->irqs_unhandled < 100) { /* 100 polls, one per POLL_SPURIOUS_IRQ_INTERVAL */
+				local_irq_disable();
+				try_one_irq(i, desc, true);
+				local_irq_enable();
+				desc->irqs_unhandled++;
+				poll_again = 1;
+			} else {
+				irq_enable(desc); /* Reenable the interrupt line */
+				desc->depth--;
+				desc->istate &= (~IRQS_SPURIOUS_DISABLED);
+				desc->irqs_unhandled = 0;
+			}
+		} else {
+			local_irq_disable();
+			try_one_irq(i, desc, true);
+			local_irq_enable();
+		}
 	}
+	if (poll_again)
+		mod_timer(&poll_spurious_irq_timer,
+			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
 out:
 	atomic_dec(&irq_poll_active);
-	mod_timer(&poll_spurious_irq_timer,
-		  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
 }
 
 static inline int bad_action_ret(irqreturn_t action_ret)
@@ -177,11 +199,19 @@ static inline int bad_action_ret(irqretu
 }
 
 /*
- * If 99,900 of the previous 100,000 interrupts have not been handled
+ * If 9 of the previous 10 interrupts have not been handled
  * then assume that the IRQ is stuck in some manner. Drop a diagnostic
  * and try to turn the IRQ off.
  *
- * (The other 100-of-100,000 interrupts may have been a correctly
+ * Although this may cause early deactivation of a sporadically
+ * malfunctioning IRQ line, the poll system will:
+ * a) Poll it for 100 cycles at the POLL_SPURIOUS_IRQ_INTERVAL rate
+ * b) Reenable it afterwards
+ *
+ * In the worst case, with current settings, this will cause short
+ * bursts of 10 interrupts after each polling phase.
+ *
+ * (The other single interrupt may have been a correctly
  *  functioning device sharing an IRQ with the failing one)
  */
 static void
@@ -269,6 +299,8 @@ try_misrouted_irq(unsigned int irq, stru
 void note_interrupt(unsigned int irq, struct irq_desc *desc,
 		    irqreturn_t action_ret)
 {
+	int unhandled_thresh = 99900; /* default: 99,900 of 100,000 */
+
 	if (desc->istate & IRQS_POLL_INPROGRESS)
 		return;
 
@@ -302,19 +334,31 @@ void note_interrupt(unsigned int irq, st
 	}
 
 	desc->irq_count++;
-	if (likely(desc->irq_count < 100000))
-		return;
+	if (!irq_poll_and_retry) {
+		if (likely(desc->irq_count < 100000))
+			return;
+	} else if (likely(desc->irq_count < 10)) {
+		return;
+	}
 
 	desc->irq_count = 0;
-	if (unlikely(desc->irqs_unhandled > 99900)) {
+	if (irq_poll_and_retry)
+		unhandled_thresh = 9;
+
+	if (unlikely(desc->irqs_unhandled >= unhandled_thresh)) {
 		/*
-		 * The interrupt is stuck
+		 * The interrupt might be stuck
 		 */
-		__report_bad_irq(irq, desc, action_ret);
+		if (!irq_poll_and_retry) {
+			__report_bad_irq(irq, desc, action_ret);
+			printk(KERN_EMERG "Disabling IRQ %d\n", irq);
+		} else {
+			printk(KERN_INFO "IRQ %d might be stuck.  Polling\n",
+				irq);
+		}
 		/*
 		 * Now kill the IRQ
 		 */
-		printk(KERN_EMERG "Disabling IRQ #%d\n", irq);
 		desc->istate |= IRQS_SPURIOUS_DISABLED;
 		desc->depth++;
 		irq_disable(desc);
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -1677,6 +1677,22 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260a, quirk_intel_pcie_pm);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260b, quirk_intel_pcie_pm);
 
+/* ASM108x transparent PCI bridges apparently have broken IRQ deassert
+ * handling.  This causes interrupts to get "stuck" and eventually disabled.
+ * However, the interrupts are often shared and disabling them is fairly bad.
+ * It's been somewhat successful to switch to polling mode and retry after
+ * a bit, so let's do that.
+ */
+extern int irq_poll_and_retry;
+static void quirk_asm108x_poll_interrupts(struct pci_dev *dev)
+{
+	dev_info(&dev->dev, "Buggy bridge found [%04x:%04x]\n",
+		dev->vendor, dev->device);
+	dev_info(&dev->dev, "Stuck interrupts will be polled and retried\n");
+	irq_poll_and_retry = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ASMEDIA,	0x1080,	quirk_asm108x_poll_interrupts);
+
 #ifdef CONFIG_X86_IO_APIC
 /*
  * Boot interrupts on some chipsets cannot be turned off. For these chipsets,