So I've had a deadlock reported to me. I've found that the sequence of
events goes like this:
1) process A (modprobe) runs to remove ip_tables.ko
2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count
3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt
4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel
4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko
5) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.
6) Process C. forked by request_module process the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.
7) Process C attempts to lock the request module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)
Theres not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem
Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering modules owner to do module reference counting when nf_sockopt
calls a modules set/get routine. This prevents the deadlock by preventing set 4
from happening.
Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
remove operations (the same way rmmod does), and add an option to explicity
request blocking operation. So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicity try (and since
root can do any old stupid thing it would like.... :) ).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're now using a generic tuple decoding function in ICMP
connection tracking, ipv4_get_l4proto() might get called with a
fragmented packet from within an ICMP error. Remove the error
message we used to print when this happens.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updated the multiqueue.txt document to call out the correct kernel
options to select to enable multiqueue.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Denis V. Lunev <den@openvz.org>
addrconf_dad_failure calls addrconf_dad_stop which takes referenced address
and drops the count. So, in6_ifa_put perrformed at out: is extra. This
results in message: "Freeing alive inet6 address" and not released dst entries.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all are listed, same as the IPV4 devinet bug.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876
Not all ips are shown by "ip addr show" command when IPs number assigned to an
interface is more than 60-80 (in fact it depends on broadcast/label etc
presence on each address).
Steps to reproduce:
It's terribly simple to reproduce:
# for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done
# ip addr show
this will _not_ show all IPs.
Looks like the problem is in netlink/ipv4 message processing.
This is fix from bug submitter, it looks correct.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When msg_iovlen is zero we shouldn't try to dereference
msg_iov. Right now the only thing that tries to do so
is skb_copy_and_csum_datagram_iovec. Since the total
length should also be zero if msg_iovlen is zero, it's
sufficient to check the total length there and simply
return if it's zero.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
A hardware bug was revealed after a recent PCI MSI patch was made to
always disable legacy INTX when enabling MSI. The 5714/5780 chips
will not generate MSI when INTX is disabled, causing MSI failure
messages to be reported, and another patch was made to workaround the
problem by disabling MSI on ServerWorks HT1000 bridge chips commonly
found with the 5714.
We workaround this chip bug by enabling INTX after we enable MSI and
after we resume from suspend.
Update version to 3.81.
This problem was discovered by David Miller.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata clear horkage on ata_dev_init()
[libata, IDE] add new VIA bridge to VIA PATA drivers
pata_it821x: fix lost interrupt with atapi devices
Fix broken pata_via cable detection
dev->horkage should be cleared over device hotunplug/plug. Clear it
in ata_dev_init().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The earlier crash dump fix on x86_64 depended on patches in -mm which
are intended for post-2.6.23. Without those, it broke the build when
it went into 2.6.23-rc5.
This changes the field references in ELF_CORE_COPY_REGS back to those
still used in mainline.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the case when an nmi gets stucks the endflag stays equal to zero.
This causes the busy looping on other cpus to continue, even though the
nmi test is done.
On my machine with out the change below the system would hang right
after check_nmi_watchdog(). The change below just sets endflag prior to
checking if the test was successful or not.
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The CLFLUSH for the modified code line in text_poke was supposed
to speed up CPU recovery. Unfortunately it seems to cause hangs
on some VIA C3s (at least on VIA Esther Model 10 Stepping 9)
Remove it.
Thanks to Stefan Becker for reporting/testing.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the NMI watchdog on Intel CoreDuo processor where the kernel would
get stuck during boot. The issue is related to errata AE49, where the
PERFEVTSEL1 counter does not have a working enable bit. Thus it is not
possible to use it for NMI.
The patch creates a dedicated wd_ops for CoreDuo which falls back to
using PERFEVTSEL0. The other Intel processors supporting the
architectural PMU will keep on using PERFEVTSEL1 as this allows other
subsystems, such as perfmon, to use PERFEVTSEL0 for PEBS monitoring in
particular. Bug initially reported by Daniel Walker.
AK: Added comments
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fsid_source decided where to get the 'fsid' number to
return for a GETATTR based on the type of filehandle.
It can be from the device, from the fsid, or from the
UUID.
It is possible for the filehandle to be inconsistent
with the export information, so make sure the export information
actually has the info implied by the value returned by
fsid_source.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes in NFSd cause a directory which is mounted-on
to not appear properly when the filesystem containing it is exported.
*exp_get* now returns -ENOENT rather than NULL and when
commit 5d3dbbeaf5
removed the NULL checks, it didn't add a check for -ENOENT.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When PTRACE_SYSCALL was used and then PTRACE_DETACH is used, the
TIF_SYSCALL_TRACE flag is left set on the formerly-traced task. This
means that when a new tracer comes along and does PTRACE_ATTACH, it's
possible he gets a syscall tracing stop even though he's never used
PTRACE_SYSCALL. This happens if the task was in the middle of a system
call when the second PTRACE_ATTACH was done. The symptom is an
unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have
been provoked by his ptrace calls so far.
A few machines already fixed this in ptrace_disable (i386, ia64, m68k).
But all other machines do not, and still have this bug. On x86_64, this
constitutes a regression in IA32 compatibility support.
Since all machines now use TIF_SYSCALL_TRACE for this, I put the
clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather
than adding it to every other machine's ptrace_disable.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix "lost" interrupt problem when using dma with CD/DVD drives in some
configurations. This problem can make installing linux from media
impossible for distro's that have switched to libata-only configurations.
The simple fix is to eliminate the use of dma for reading drive status, etc,
by checking the number of bytes to transferred.
This change will only affect the behavior of atapi devices, not disks.
There is more info at http://bugzilla.redhat.com/show_bug.cgi?id=242229
This patch is for 2.6.22.1
Signed-off-by: Jeff Norden <jnorden@math.tntech.edu>
Reviewed-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Ocelot: remove remaining bits
[MIPS] TLB: Fix instruction bitmasks
[MIPS] R10000: Fix wrong test in dma-default.c
[MIPS] Provide empty irq_enable_hazard definition for legacy and R1 cores.
[MIPS] Sibyte: Remove broken dependency on EXPERIMENTAL from SIBYTE_SB1xxx_SOC.
[MIPS] Kconfig: whitespace cleanup.
[MIPS] PCI: Set need_domain_info if controller domain index is non-zero.
[MIPS] BCM1480: Fix computation of interrupt mask address register.
[MIPS] i8259: Add disable method.
[MIPS] tty: add the new ioctls and definitions.
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c-algo-bit: Read block data bugfix
i2c-pxa: Fix adapter number
i2c-gpio: Fix adapter number
The hardware adds one to the BRG value to get the divider, so it must
be subtracted by software. Without this patch, characters will occasionally
be corrupted.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Instantiation of 8MB pages on the TLB cache for the kernel static
mapping trashes r3 register on !CONFIG_8xx_CPU6 configurations.
This ensures r3 gets saved and restored.
This has been posted to linuxppc-embedded by Marcelo Tosatti
<marcelo@kvack.org>, but only an incomplete version of the patch
has been applied in c51e078f82.
This patch adds the rest of the fix.
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Following a strict interpretation the empty definition of irq_enable_hazard
has always been a bug - but an intentional one because it didn't bite.
This has now changed, for uniprocessor kernels mm/slab.c:do_drain()
[...]
on_each_cpu(do_drain, cachep, 1, 1);
check_irq_on();
[...]
may be compiled into a mtc0 c0_status; mfc0 c0_status sequence resulting
in a back-to-back hazard.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/sibyte/bcm1480/irq.o
arch/mips/sibyte/bcm1480/irq.c: In function 'bcm1480_mask_irq':
arch/mips/sibyte/bcm1480/irq.c:112: warning: cast to pointer from integer of different size
arch/mips/sibyte/bcm1480/irq.c:114: warning: cast to pointer from integer of different size
arch/mips/sibyte/bcm1480/irq.c: In function 'bcm1480_unmask_irq':
arch/mips/sibyte/bcm1480/irq.c:130: warning: cast to pointer from integer of different size
arch/mips/sibyte/bcm1480/irq.c:132: warning: cast to pointer from integer of different size
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Same as all the others, just put in the constants for the existing kernel
code and termios2 structure
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Previously, ibmebus derived a device's bus_id from its location code.
The location code is not guaranteed to be unique, so we might get bus_id
collisions if two devices share the same location code. The OFDT
full_name, however, is unique, so we use that instead (truncating it
on the left if it is too long).
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On PS3, A storage device may show up in the repository before the hypervisor
has finished probing:
- If its type is not yet known, it shows up as PS3_DEV_TYPE_STOR_DUMMY,
- If its regions are being probed, it shows up as having zero regions.
If any of these happen, consider the device not yet present. The storage
probe thread will retry later.
This fixes the timing-dependent problem where a kernel booted from FLASH ROM
sometimes cannot find the hard disk.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, running any SPE program on the ps3 will trigger a BUG_ON
when spufs_run_spu tries to clear the master run control bit, as lv1
does not make the master run control available to Linux.
This change makes SPE apps work again by disabling changes to the
master run control on PS3. Although we don't have the facility to
disable a SPE with supervisor-level privileges, it's better than
hitting the BUG_ON unconditionally.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The Cell BE Architecture spec states that the SPU MFC Class 0 interrupt
is edge-triggered. The current spu interrupt handler assumes this
behavior and does not clear the interrupt status.
The PS3 hypervisor visualizes all SPU interrupts as level, and on return
from the interrupt handler the hypervisor will deliver a new virtual
interrupt for any unmasked interrupts which for which the status has not
been cleared. This fix clears the interrupt status in the interrupt
handler.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The previous patch had the conditional inverted. This patch fixes it
so that we return the original position if it does not straddle a page.
Thanks to Bob Gilligan for spotting this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This fixes a bug in the way i2c-algo-bit handles I2C_M_RECV_LEN,
used to implement i2c_smbus_read_block_data(). Previously, in the
absence of PEC (rarely used!) it would NAK the "length" byte:
S addr Rd [A] [length] NA
That prevents the subsequent data bytes from being read:
S addr Rd [A] [length] { A [data] }* NA
The primary fix just reorders two code blocks, so the length used
in the "should I NAK now?" check incorporates the data which it
just read from the slave device.
However, that move also highlighted other fault handling glitches.
This fixes those by abstracting the RX path ack/nak logic, so it
can be used in more than one location. Also, a few CodingStyle
issues were also resolved.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>