linux/arch
Alex Chiang 66db2e6331 [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
This reverts commit e7b140365b.

Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:

	RCU happily ignores CPUs that don't have their bits set in
	cpu_online_map, so if there are RCU read-side critical sections
	in the irq handlers being run, RCU will ignore them.  If the
	other CPUs were running, they might sequence through the RCU
	state machine, which could result in data structures being
	yanked out from under those irq handlers, which in turn could
	result in oopses or worse.

Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.

This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
 [<a000000100016930>] show_stack+0x50/0xa0
                                sp=e0000009c922fa00 bsp=e0000009c92214d0
 [<a0000001000171a0>] show_regs+0x820/0x860
                                sp=e0000009c922fbd0 bsp=e0000009c9221478
 [<a00000010003c700>] die+0x1a0/0x2e0
                                sp=e0000009c922fbd0 bsp=e0000009c9221438
 [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                sp=e0000009c922fbd0 bsp=e0000009c92213d8
 [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000009c922fc60 bsp=e0000009c92213d8
 [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                sp=e0000009c922fe30 bsp=e0000009c9221398
 [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                sp=e0000009c922fe30 bsp=e0000009c9221330
 [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                sp=e0000009c922fe30 bsp=e0000009c92212f8
 [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                sp=e0000009c922fe30 bsp=e0000009c9221290
 [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                sp=e0000009c922fe30 bsp=e0000009c9221208
 [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                sp=e0000009c922fe30 bsp=e0000009c92211a8
 [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                sp=e0000009c922fe30 bsp=e0000009c9221168
 [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                sp=e0000009c922fe30 bsp=e0000009c9221148
 [<a000000100122610>] stop_cpu+0xd0/0x200
                                sp=e0000009c922fe30 bsp=e0000009c92210f0
 [<a0000001000e0440>] kthread+0xc0/0x140
                                sp=e0000009c922fe30 bsp=e0000009c92210c8
 [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                sp=e0000009c922fe30 bsp=e0000009c92210a0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e0000009c922fe30 bsp=e0000009c92210a0

I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.

Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.

As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:

	- 8-way rx6600
	- randomly toggling CPU online/offline state every 2 seconds
	- running CPU exercisers, memory hog, disk exercisers, and
	  network stressors
	- average system load around ~160

In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).

One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:

arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
	[remove cpu from cpu_online_map]

	fixup_irqs():
		for_each_irq:
			[break CPU affinities]

		local_irq_enable();
		mdelay(1);
		local_irq_disable();

So they are doing implicitly what ia64 is doing explicitly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:26 -08:00
..
alpha cpumask: Use cpu_*_mask accessors code: alpha 2009-02-16 17:32:00 +10:30
arm [ARM] Storage class should be before const qualifier 2009-02-10 09:59:19 +00:00
avr32 eeprom: More consistent symbol names 2009-01-26 21:19:57 +01:00
blackfin Blackfin arch: Remove outdated code 2009-02-04 16:49:45 +08:00
cris Merge branch 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 2009-01-14 19:58:40 -08:00
frv FRV: in_interrupt() requires #inclusion of linux/hardirq.h not asm/hardirq.h now 2009-02-09 08:51:35 -08:00
h8300 Merge branch 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 2009-01-14 19:58:40 -08:00
ia64 [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs" 2009-02-19 11:32:26 -08:00
m32r eeprom: More consistent symbol names 2009-01-26 21:19:57 +01:00
m68k m68knommu: remove the no longer used PCI support option 2009-01-27 16:42:02 +10:00
m68knommu m68knommu: fix 5329 ColdFire periphal addressing 2009-01-27 16:42:03 +10:00
mips x86: spinlocks: define dummy __raw_spin_is_contended 2009-02-09 08:15:39 -08:00
mn10300 [CVE-2009-0029] Rename old_readdir to sys_old_readdir 2009-01-14 14:15:15 +01:00
parisc Documentation: move DMA-mapping.txt to Doc/PCI/ 2009-01-29 18:19:29 -08:00
powerpc Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2009-02-17 14:23:49 -08:00
s390 KVM: Add kvm_arch_sync_events to sync with asynchronize events 2009-02-15 02:47:36 +02:00
sh sh: Fix up T-bit error handling in SH-4A mutex fastpath. 2009-01-29 11:56:03 +09:00
sparc sparc64: Fix probe_kernel_{read,write}(). 2009-02-08 22:32:31 -08:00
um mm: invoke oom-killer from page fault 2009-01-06 15:58:58 -08:00
x86 mm: clean up for early_pfn_to_nid() 2009-02-18 15:37:55 -08:00
xtensa byteorder: make swab.h include asm/swab.h like a regular header 2009-01-14 19:56:50 -08:00
.gitignore
Kconfig [CVE-2009-0029] System call wrapper infrastructure 2009-01-14 14:15:16 +01:00