Move all labels in the swsusp code to the second column, so that they won't
fool diff -p.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the loop from freeze_processes() to a separate function and call it
independently for user space processes and kernel threads so that the order
of freezing tasks is clearly visible.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the loop from thaw_processes() to a separate function and call it
independently for kernel threads and user space processes so that the order
of thawing tasks is clearly visible.
Drop thaw_kernel_threads() which is never used.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The power management semaphore is only used as mutex, so convert it.
[akpm@osdl.org: fix rotten bug]
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix http://bugzilla.kernel.org/show_bug.cgi?id=7534
Fix the freezing of processes so that it won't fail if there is a traced
process the parent of which has been stopped.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: maurice barnum <pixi+kbug@burble.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp measure and print the time needed to shrink memory during the
suspend.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't modify the cpus_allowed of the task initiating the suspend.
_cpu_down() already makes sure that the task doing the suspend doesn't run
on dying cpu.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp support i386 systems with PAE or without PSE.
This is done by creating temporary page tables located in resume-safe page
frames before the suspend image is restored in the same way as x86_64 does
it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Nigel Cunningham <ncunningham@linuxmail.org>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify process thawing so that we can thaw kernel space without thawing
userspace, and thaw kernelspace first. This will be useful in later
patches, where I intend to get swsusp thawing kernel threads only before
seeking to free memory.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor whitespace and formatting modifications for the freezer.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The freezer currently prints an '=' for every process that is frozen. This
is pretty pointless, as the equals sign says nothing about which process is
frozen, and makes logs look messier (especially if there were a large
number of processes running). All we really need to know is that we
started trying to freeze processes and what processes (if any) failed to
freeze, or that we succeeded.
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.
[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At some point after 2.6.13, in-kernel software suspend got "incomplete" for
the so-called "platform" mode. pm_ops->prepare() is never called. A
visible sign of this is the "moon" light on thinkpads not flashing during
suspend. Fix by readding the pm_ops->prepare call during suspend.
Signed-off-by: Stefan Seyfried <seife@suse.de>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
swsusp uses GFP_ATOMIC, but it can afford to use __GFP_WAIT, which will
permit it to reclaim clean pagecache instead of emitting scary
page-allocation-failure messages.
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently swsusp saves the contents of highmem pages by copying them to the
normal zone which is quite inefficient (eg. it requires two normal pages
to be used for saving one highmem page). This may be improved by using
highmem for saving the contents of saveable highmem pages.
Namely, during the suspend phase of the suspend-resume cycle we try to
allocate as many free highmem pages as there are saveable highmem pages.
If there are not enough highmem image pages to store the contents of all of
the saveable highmem pages, some of them will be stored in the "normal"
memory. Next, we allocate as many free "normal" pages as needed to store
the (remaining) image data. We use a memory bitmap to mark the allocated
free pages (ie. highmem as well as "normal" image pages).
Now, we use another memory bitmap to mark all of the saveable pages
(highmem as well as "normal") and the contents of the saveable pages are
copied into the image pages. Then, the second bitmap is used to save the
pfns corresponding to the saveable pages and the first one is used to save
their data.
During the resume phase the pfns of the pages that were saveable during the
suspend are loaded from the image and used to mark the "unsafe" page
frames. Next, we try to allocate as many free highmem page frames as to
load all of the image data that had been in the highmem before the suspend
and we allocate so many free "normal" page frames that the total number of
allocated free pages (highmem and "normal") is equal to the size of the
image. While doing this we have to make sure that there will be some extra
free "normal" and "safe" page frames for two lists of PBEs constructed
later.
Now, the image data are loaded, if possible, into their "original" page
frames. The image data that cannot be written into their "original" page
frames are loaded into "safe" page frames and their "original" kernel
virtual addresses, as well as the addresses of the "safe" pages containing
their copies, are stored in one of two lists of PBEs.
One list of PBEs is for the copies of "normal" suspend pages (ie. "normal"
pages that were saveable during the suspend) and it is used in the same way
as previously (ie. by the architecture-dependent parts of swsusp). The
other list of PBEs is for the copies of highmem suspend pages. The pages
in this list are restored (in a reversible way) right before the
arch-dependent code is called.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To be able to use swap files as suspend storage from the userland suspend
tools we need an additional ioctl() that will allow us to provide the kernel
with both the swap header's offset and the identification of the resume
partition.
The new ioctl() should be regarded as a replacement for the
SNAPSHOT_SET_SWAP_FILE ioctl() that from now on will be considered as
obsolete, but has to stay for backwards compatibility of the interface.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the kernel command line parameter "resume_offset=" allowing us to specify
the offset, in <PAGE_SIZE> units, from the beginning of the partition pointed
to by the "resume=" parameter at which the swap header is located.
This offset can be determined, for example, by an application using the FIBMAP
ioctl to obtain the swap header's block number for given file.
[akpm@osdl.org: we don't know what type sector_t is]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp use block device offsets instead of swap offsets to identify swap
locations and make it use the same code paths for writing as well as for
reading data.
This allows us to use the same code for handling swap files and swap
partitions and to simplify the code, eg. by dropping rw_swap_page_sync().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rearrange the code in kernel/power/swap.c so that the next patch is more
readable.
[This patch only moves the existing code.]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Linux kernel handles swap files almost in the same way as it handles swap
partitions and there are only two differences between these two types of swap
areas:
(1) swap files need not be contiguous,
(2) the header of a swap file is not in the first block of the partition
that holds it. From the swsusp's point of view (1) is not a problem,
because it is already taken care of by the swap-handling code, but (2) has
to be taken into consideration.
In principle the location of a swap file's header may be determined with the
help of appropriate filesystem driver. Unfortunately, however, it requires
the filesystem holding the swap file to be mounted, and if this filesystem is
journaled, it cannot be mounted during a resume from disk. For this reason we
need some other means by which swap areas can be identified.
For example, to identify a swap area we can use the partition that holds the
area and the offset from the beginning of this partition at which the swap
header is located.
The following patch allows swsusp to identify swap areas this way. It changes
swap_type_of() so that it takes an additional argument representing an offset
of the swap header within the partition represented by its first argument.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an ioctl to the userspace swsusp code that enables the usage of the
pmops->prepare, pmops->enter and pmops->finish methods (the in-kernel
suspend knows these as "platform method"). These are needed on many
machines to (among others) speed up resuming by letting the BIOS skip some
steps or let my hp nx5000 recognise the correct ac_adapter state after
resume again.
It also ensures on many machines, that changed hardware (unplugged AC
adapters) gets correctly detected and that kacpid does not run wild after
resume.
Signed-off-by: Stefan Seyfried <seife@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce pagefault_{disable,enable}() and use these where previously we did
manual preempt increments/decrements to make the pagefault handler do the
atomic thing.
Currently they still rely on the increased preempt count, but do not rely on
the disabled preemption, this might go away in the future.
(NOTE: the extra barrier() in pagefault_disable might fix some holes on
machines which have too many registers for their own good)
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new swap token patches replace the current token traversal algo. The old
algo had a crude timeout parameter that was used to handover the token from
one task to another. This algo, transfers the token to the tasks that are in
need of the token. The urgency for the token is based on the number of times
a task is required to swap-in pages. Accordingly, the priority of a task is
incremented if it has been badly affected due to swap-outs. To ensure that
the token doesnt bounce around rapidly, the token holders are given a priority
boost. The priority of tasks is also decremented, if their rate of swap-in's
keeps reducing. This way, the condition to check whether to pre-empt the swap
token, is a matter of comparing two task's priority fields.
[akpm@osdl.org: cleanups]
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The .eh_frame section contents is never written to, so it can as well
benefit from CONFIG_DEBUG_RODATA.
Diff-ed against firstfloor tree.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Now that binutils' ld is able to properly populate .eh_frame_hdr in the
Linux kernel case, here's a patch to add some functionality to the Dwarf2
unwinder to actually be able to make use of this (applies on firstfloor
tree with the previously sent patch to add debug output, but not on plain
2.6.19).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add debugging printks to the unwinder to allow easier debugging
when something goes wrong with it.
This can be controlled with the new unwinder_debug=N option
Most output is given by N=1
AK: Added documentation of unwinder_debug=
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Tighten the requirements on both input to and output from the Dwarf2
unwinder.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This avoids trouble with the page fault handler if the fault
happens inside an interrupt context.
Suggested by Linus
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Add sysctl for kstack_depth_to_print. This lets users change
the amount of raw stack data printed in dump_stack() without
having to reboot.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch is the meat of the PDA change. This patch makes several related
changes:
1: Most significantly, %gs is now used in the kernel. This means that on
entry, the old value of %gs is saved away, and it is reloaded with
__KERNEL_PDA.
2: entry.S constructs the stack in the shape of struct pt_regs, and this
is passed around the kernel so that the process's saved register
state can be accessed.
Unfortunately struct pt_regs doesn't currently have space for %gs
(or %fs). This patch extends pt_regs to add space for gs (no space
is allocated for %fs, since it won't be used, and it would just
complicate the code in entry.S to work around the space).
3: Because %gs is now saved on the stack like %ds, %es and the integer
registers, there are a number of places where it no longer needs to
be handled specially; namely context switch, and saving/restoring the
register state in a signal context.
4: And since kernel threads run in kernel space and call normal kernel
code, they need to be created with their %gs == __KERNEL_PDA.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Conflicts:
drivers/ata/libata-scsi.c
include/linux/libata.h
Futher merge of Linus's head and compilation fixups.
Signed-Off-By: David Howells <dhowells@redhat.com>
Conflicts:
drivers/infiniband/core/iwcm.c
drivers/net/chelsio/cxgb2.c
drivers/net/wireless/bcm43xx/bcm43xx_main.c
drivers/net/wireless/prism54/islpci_eth.c
drivers/usb/core/hub.h
drivers/usb/input/hid-core.c
net/core/netpoll.c
Fix up merge failures with Linus's head and fix new compilation failures.
Signed-Off-By: David Howells <dhowells@redhat.com>
By modyfing genlmsg_put() to take a genl_family and by adding
genlmsg_put_reply() the process of constructing the netlink
and generic netlink headers is simplified.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller calculate it correctly.
Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Show the drivers, which belong to the module:
$ ls -l /sys/module/usbcore/drivers/
hub -> ../../../bus/usb/drivers/hub
usb -> ../../../bus/usb/drivers/usb
usbfs -> ../../../bus/usb/drivers/usbfs
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] x86-64: Use stricter in process stack check for unwinder
[PATCH] i386: Fix compilation with UP genericarch
[PATCH] x86-64: Fix warning in io_apic.c
[PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind
[PATCH] x86_64: Align data segment to PAGE_SIZE boundary
The return value of create_write_pipe()/create_read_pipe() should be
checked by IS_ERR().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a problem with gcc4 mis-compiling the stack unwind code under
-Os, which resulted in 'stuck' messages whenever an assembly routine was
encountered.
(The second hunk is trivial cleanup.)
Signed-off-by: Jan Beulich <jbeulich@novell.com>
The return value of copy_process() should be checked by IS_ERR().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit f72fa70760, and solves
the problem that it tried to fix by simply making "__do_IRQ()" call the
note_interrupt() function without the lock held, the way everybody else
does.
It should be noted that all interrupt handling code must never allow the
descriptor actors to be entered "recursively" (that's why we do all the
magic IRQ_PENDING stuff in the first place), so there actually is
exclusion at that much higher level, even in the absense of locking.
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Acked-by:Pavel Emelianov <xemul@openvz.org>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.
For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.
To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.
Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).
However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().
In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).
Signed-Off-By: David Howells <dhowells@redhat.com>
Reclaim a word from the size of the work_struct by folding the pending bit and
the wq_data pointer together. This shouldn't cause misalignment problems as
all pointers should be at least 4-byte aligned.
Signed-Off-By: David Howells <dhowells@redhat.com>
Define a type for the work function prototype. It's not only kept in the
work_struct struct, it's also passed as an argument to several functions.
This makes it easier to change it.
Signed-Off-By: David Howells <dhowells@redhat.com>
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.
The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.
Signed-Off-By: David Howells <dhowells@redhat.com>
lockdep got confused by certain locks in modules:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Call Trace:
[<ffffffff8026f40d>] dump_trace+0xaa/0x3f2
[<ffffffff8026f78f>] show_trace+0x3a/0x60
[<ffffffff8026f9d1>] dump_stack+0x15/0x17
[<ffffffff802abfe8>] __lock_acquire+0x724/0x9bb
[<ffffffff802ac52b>] lock_acquire+0x4d/0x67
[<ffffffff80267139>] rt_spin_lock+0x3d/0x41
[<ffffffff8839ed3f>] :ip_conntrack:__ip_ct_refresh_acct+0x131/0x174
[<ffffffff883a1334>] :ip_conntrack:udp_packet+0xbf/0xcf
[<ffffffff8839f9af>] :ip_conntrack:ip_conntrack_in+0x394/0x4a7
[<ffffffff8023551f>] nf_iterate+0x41/0x7f
[<ffffffff8025946a>] nf_hook_slow+0x64/0xd5
[<ffffffff802369a2>] ip_rcv+0x24e/0x506
[...]
Steven Rostedt found the bug: static_obj() check did not take
PERCPU_ENOUGH_ROOM into account, so in-module DEFINE_PER_CPU-area locks
were triggering this message.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 0130b0b32e.
Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was
supposed to fix must be some unrelated memory corruption, and the "fix"
actually causes more problems:
"However, the new code does not look safe in all cases. If some other
task has opened more files while dup_fd() released oldf->file_lock, the
new code will update open_files to the new larger value. But newf was
allocated with the old smaller value of open_files, therefore subsequent
accesses to newf may try to write into unallocated memory."
so revert it.
Cc: Sharyathi Nagesh <sharyath@in.ibm.com>
Cc: Sergey Vlasov <vsu@altlinux.ru>
Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we get a mismatch between handlers on the same IRQ, all we get is "IRQ
handler type mismatch for IRQ n". Let's print the name of the
presently-registered handler with which we got the mismatch.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While testing kernel on machine with "irqpoll" option I've caught such a
lockup:
__do_IRQ()
spin_lock(&desc->lock);
desc->chip->ack(); /* IRQ is ACKed */
note_interrupt()
misrouted_irq()
handle_IRQ_event()
if (...)
local_irq_enable_in_hardirq();
/* interrupts are enabled from now */
...
__do_IRQ() /* same IRQ we've started from */
spin_lock(&desc->lock); /* LOCKUP */
Looking at misrouted_irq() code I've found that a potential deadlock like
this can also take place:
1CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = A */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = B */
if (desc->status & IRQ_INPROGRESS) {
2CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = B */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = A */
if (desc->status & IRQ_INPROGRESS) {
As the second lock on both CPUs is taken before checking that this irq is
being handled in another processor this may cause a deadlock. This issue
is only theoretical.
I propose the attached patch to fix booth problems: when trying to handle
misrouted IRQ active desc->lock may be unlocked.
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On running the Stress Test on machine for more than 72 hours following
error message was observed.
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
pc: c000000000060d90: .dup_fd+0x240/0x39c
lr: c000000000060d6c: .dup_fd+0x21c/0x39c
sp: c00000007ce2fa70
msr: 800000000000b032
dar: ffffffff00000028
dsisr: 40000000
current = 0xc000000074950980
paca = 0xc000000000454500
pid = 27330, comm = bash
0:mon> t
[c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
[c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
[c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
[c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
[c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
[c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
The problem is because of race window. When if(expand) block is executed in
dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be
modified. So actual open_files in oldf may not match with open_files
variable.
Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since it is becoming clear that there are just enough users of the binary
sysctl interface that completely removing the binary interface from the kernel
will not be an option for foreseeable future, we need to find a way to address
the sysctl maintenance issues.
The basic problem is that sysctl requires one central authority to allocate
sysctl numbers, or else conflicts and ABI breakage occur. The proc interface
to sysctl does not have that problem, as names are not densely allocated.
By not terminating a sysctl table until I have neither a ctl_name nor a
procname, it becomes simple to add sysctl entries that don't show up in the
binary sysctl interface. Which allows people to avoid allocating a binary
sysctl value when not needed.
I have audited the kernel code and in my reading I have not found a single
sysctl table that wasn't terminated by a completely zero filled entry. So
this change in behavior should not affect anything.
I think this mechanism eases the pain enough that combined with a little
disciple we can solve the reoccurring sysctl ABI breakage.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't warn about libpthread's access to kernel.version. When it receives
-ENOSYS it will read /proc/sys/kernel/version.
If anything else shows up print the sysctl number string.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cal Peake <cp@absolutedigital.net>
Cc: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the delayacct lock irqsave; this avoids the possible deadlock where
an interrupt is taken while holding the delayacct lock which needs to
take the delayacct lock.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cpu-hotplug locking has a minor race case caused because of setting the
variable "recursive" to NULL *after* releasing the cpu_bitmask_lock in the
function unlock_cpu_hotplug,instead of doing so before releasing the
cpu_bitmask_lock.
This was the cause of most of the recent false spurious lock_cpu_unlock
warnings.
This should fix the problem reported by Martin Lorenz reported in
http://lkml.org/lkml/2006/10/29/127.
Thanks to Srinivasa DS for pointing it out.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous commit (45c18b0bb5, aka "Fix
unlikely (but possible) race condition on task->user access") fixed a
potential oops due to __sigqueue_alloc() getting its "user" pointer out
of sync with switch_user(), and accessing a user pointer that had been
de-allocated on another CPU.
It still left another (much less serious) problem, where a concurrent
__sigqueue_alloc and swich_user could cause sigqueue_alloc to do signal
pending reference counting for a _different_ user than the one it then
actually ended up using. No oops, but we'd end up with the wrong signal
accounting.
Another case of Oleg's eagle-eyes picking up the problem.
This is trivially fixed by just making sure we load whichever "user"
structure we decide to use (it doesn't matter _which_ one we pick, we
just need to pick one) just once.
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a possible race condition when doing a "switch_uid()" from one
user to another, which could race with another thread doing a signal
allocation and looking at the old thread ->user pointer as it is freed.
This explains an oops reported by Lukasz Trabinski:
http://permalink.gmane.org/gmane.linux.kernel/462241
We fix this by delaying the (reference-counted) freeing of the user
structure until the thread signal handler lock has been released, so
that we know that the signal allocation has either seen the new value or
has properly incremented the reference count of the old one.
Race identified by Oleg Nesterov.
Cc: Lukasz Trabinski <lukasz@wsisiz.edu.pl>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is needed on bigendian 64bit architectures.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a swsusp debugging mode. This does everything that's needed for a suspend
except for actually suspending. So we can look in the log messages and work
out a) what code is being slow and b) which drivers are misbehaving.
(1)
# echo testproc > /sys/power/disk
# echo disk > /sys/power/state
This should turn off the non-boot CPU, freeze all processes, wait for 5
seconds and then thaw the processes and the CPU.
(2)
# echo test > /sys/power/disk
# echo disk > /sys/power/state
This should turn off the non-boot CPU, freeze all processes, shrink
memory, suspend all devices, wait for 5 seconds, resume the devices etc.
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Stefan Seyfried <seife@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it
shouldn't).
Add a warning printk, give any remaining users six months to migrate off it.
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
printk_ratelimit() has global state which makes it not useful for callers
which wish to perform ratelimiting at a particular frequency.
Add a printk_timed_ratelimit() which utilises caller-provided state storage to
permit more flexibility.
This function can in fact be used for things other than printk ratelimiting
and is perhaps poorly named.
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If there are no listeners, taskstats_exit_send() just returns because
taskstats_exit_alloc() didn't allocate *tidstats. This is wrong, each
sub-thread should do fill_tgid_exit() on exit, otherwise its ->delays is
not recorded in ->signal->stats and lost.
Q: We don't send TASKSTATS_TYPE_AGGR_TGID when single-threaded process
exits. Is it good? How can the listener figure out that it was actually a
process exit, not sub-thread?
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paranoid fix. The task can free its ->mm after the 'if (p->mm)' check.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For ndiswrapper, don't set the module->taints flags, just set the kernel
global tainted flag. This should allow ndiswrapper to continue to use GPL
symbols.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Florin Malita <fmalita@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()),
but then it does genlmsg_put()->nlmsg_put(). This means we forget to
reserve a room for 'struct nlmsghdr'.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send
potentially leaks a skb. Unless we pass 'rep_skb' to the netlink layer
we own sk_buff. This means we should always do kfree_skb() on failure.
[ Thomas acked and pointed out missing return value in original version ]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Thomas Graf <tgraf@suug.ch>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch (as812) changes the kerneldoc comments explaining the return
values from queue_work(), queue_delayed_work(), and
queue_delayed_work_on(). The updated comments explain more accurately the
meaning of the return code and avoid suggesting that a 0 value means the
routine was unsuccessful.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
_cpu_down() acquires `workqueue_mutex' on its process, but doen't release it
if __cpu_disable() fails.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I notice that the code which implements adjtime clears the time_adjust
value before using it. The attached patch makes the obvious fix.
Acked-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Jim Houston <jim.houston@ccur.com>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fill_tgid() should skip not only an already exited group leader. If the
task has ->exit_state != 0 it already did exit_notify(), so it also did
fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it
to avoid a double accounting.
This patch doesn't close the race completely, but it cleanups the code.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe.
->siglock allows us to traverse subthread without tasklist.
Q: delay accounting looks wrong to me. If sub-thread has already called
taskstats_exit_send() but didn't call release_task(self) yet it will be
accounted twice. The window is big. No?
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
signal_struct is (mostly) protected by ->sighand->siglock, I think we don't
need ->taskstats_lock to protect ->stats. This also allows us to simplify the
locking in fill_tgid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
taskstats_tgid_free() is called on copy_process's error path. This is wrong.
IF (clone_flags & CLONE_THREAD)
We should not clear ->signal->taskstats, current uses it,
it probably has a valid accumulated info.
ELSE
taskstats_tgid_init() set ->signal->taskstats = NULL,
there is nothing to free.
Move the callsite to __exit_signal(). We don't need any locking, entire
thread group is exiting, nobody should have a reference to soon to be
released ->signal.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. ts = timespec_sub(uptime, current->group_leader->start_time);
It is possible that current != tsk. Probably it was supposed
to be 'tsk->group_leader->start_time. But why we are reading
group_leader's start_time ? This accounting is per thread,
not per procees, I changed this to 'tsk->start_time.
Please corect me.
2. stats->ac_ppid = (tsk->parent) ? tsk->parent->pid : 0;
tsk->parent never == NULL, and it is unsafe to dereference it.
Both the task and it's parent may exit after the caller unlocks
tasklist_lock, the memory could be unmapped (DEBUG_SLAB).
(And we should use ->real_parent->tgid in fact).
Q: I don't understand the 'if (thread_group_leader(tsk))' check.
Why it is needed ?
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. fill_tgid() forgets to do put_task_struct(first).
2. release_task(first) can happen after fill_tgid() drops tasklist_lock,
it is unsafe to dereference first->signal.
This is a temporary fix, imho the locking should be reworked.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This means we can call it when the bitmap we want to fetch is declared
const.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the dwarf2 unwinder to do a binary search for CIEs
instead of a linear work. The linker is unfortunately not
able to build a proper lookup table at link time, instead it creates
one at runtime as soon as the bootmem allocator is usable (so you'll continue
using the linear lookup for the first [hopefully] few calls).
The code should be ready to utilize a build-time created table once
a fixed linker becomes available.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
If CONFIG_PROC_FS=n:
kernel/sysctl.c:148: warning: 'proc_do_cad_pid' used but never defined
kernel/built-in.o:(.data+0x1228): undefined reference to `proc_do_cad_pid'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The use of blocking notifier by _cpu_up and _cpu_down in cpu.c has two
problem.
1/ An interaction with the workqueue notifier causes lockdep to spit a
warning.
2/ A notifier could conceivable be added or removed while _cpu_up or
_cpu_down are in process. As each notifier is called twice (prepare
then commit/abort) this could be unhealthy.
To fix to we simply take cpu_add_remove_lock while adding or removing
notifiers to/from the list.
This makes the 'blocking' usage unnecessary as all accesses to cpu_chain
are now protected by cpu_add_remove_lock. So change "blocking" to "raw" in
all relevant places. This fixes 1.
Credit: Andrew Morton
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> (reporter)
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted)
[<c04051e3>] show_trace_log_lvl+0x58/0x16a
[<c04057f0>] show_trace+0xd/0x10
[<c0405900>] dump_stack+0x19/0x1b
[<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a
[<c04224c0>] free_task+0x15/0x24
[<c042378c>] copy_process+0x12bd/0x1324
[<c0423835>] do_fork+0x42/0x113
[<c04021dd>] sys_fork+0x19/0x1b
[<c0403fb7>] syscall_call+0x7/0xb
In copy_process(), dup_task_struct() also duplicates the ->pi_lock,
->pi_waiters and ->pi_blocked_on members. rt_mutex_debug_task_free()
called from free_task() validates these members. However free_task() can
be invoked before these members are reset for the new task.
Move the initialization code before the first bail that can hit free_task().
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce desc->name and eliminate the handle_irq_name() hack. Add
set_irq_chip_and_handler_name() to set the flow type and name at once.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
My fancy new swsusp IO code had a big memory leak. It's somewhat invisible
because the whole mem_map[] gets overwritten after resume, but it can cause us
to get low on memory during the actual suspend process.
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The integer divisions in the timer accounting code can round the result
down to 0. Adding 0 is without effect and the signal delivery stops.
Clamp the division result to minimum 1 to avoid this.
Problem was reported by Seongbae Park <spark@google.com>, who provided
also an inital patch.
Roland sayeth:
I have had some more time to think about the problem, and to reproduce it
using Toyo's test case. For the record, if my understanding of the problem
is correct, this happens only in one very particular case. First, the
expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
it's < nthreads so the division yields zero. Second, it only affects each
thread that is so new that its CPU time accumulation is zero so now+0 is
still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF
clocks when cputime_t is tick granularity (or the SCHED clock on
configurations where sched_clock's value only advances on clock ticks), this
is not hard to arrange with new threads starting up and blocking before they
accumulate a whole tick of CPU time. That's what happens in Toyo's test
case.
Note that in general it is fine for that division to round down to zero,
and set each thread's expiry time to its "now" time. The problem only
arises with thread's whose "now" value is still zero, so that now+0 winds up
0 and is interpreted as "not set" instead of ">= now". So it would be a
sufficient and more precise fix to just use max(ticks, 1) inside the loop
when setting each it_*_expires value.
But, it does no harm to round the division up to one and always advance
every thread's expiry time. If the thread didn't already fire timers for
the expiry time of "now", there is no expectation that it will do so before
the next tick anyway. So I followed Thomas's patch in lifting the max out
of the loops.
This patch also covers the reload cases, which are harder to write a test
for (and I didn't try). I've tested it with Toyo's case and it fixes that.
[toyoa@mvista.com: fix: min_t -> max_t]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Daniel Walker <dwalker@mvista.com>
Cc: Toyo Abe <toyoa@mvista.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Seongbae Park <spark@google.com>
Cc: Peter Mattis <pmattis@google.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid possible PIT livelock issues seen on SMP systems (and reported by
Andi), by not allowing it as a clocksource on SMP boxes.
However, since the PIT may no longer be present, we have to properly handle
the cases where SMP systems have TSC skew and fall back from the TSC.
Since the PIT isn't there, it would "fall back" to the TSC again. So this
changes the jiffies rating to 1, and the TSC-bad rating value to 0.
Thus you will get the following behavior priority on i386 systems:
tsc [if present & stable]
hpet [if present]
cyclone [if present]
acpi_pm [if present]
pit [if UP]
jiffies
Rather then the current more complicated:
tsc [if present & stable]
hpet [if present]
cyclone [if present]
acpi_pm [if present]
pit [if cpus < 4]
tsc [if present & unstable]
jiffies
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In general, lockdep warnings are intended to be non-fatal, so I have put in
various practical limits on internal data structure failure modes. We haven't
had a /single/ lockdep-internal crash ever since lockdep went upstream [the
unwinder crashes are outside of lockdep], and that's largely due to the good
internal checks it does.
Recursion within the dependency graph is currently limited to 20, that's
probably not enough on some many-CPU boxes - this patch doubles it to 40. I
have written the lockdep functions to have as small stackframes as possible,
so 40 should be OK too. (The practical recursion limit should be somewhere
between 100 and 200 entries. If we hit that then I'll change the algorithm to
be iteration-based. Graph walking logic is so easy to program via recursion,
so i'd like to keep recursion as long as possible.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=7371
sys_epoll_pwait needs to be listed as a conditional (weak)
entry point for CONFIG_EPOLL=n.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bug reported and fixed by Tilman Schmidt <tilman@imap.cc>: if lockdep is
enabled then log messages make it to /var/log/messages belatedly. The
reason is a missed wakeup of klogd.
Initially there was only a lockdep_internal() protection against lockdep
recursion within vprintk() - it grew the 'outer' lockdep_off()/on()
protection only later on. But that lockdep_off() made the
release_console_sem() within vprintk() always happen under the
lockdep_internal() condition, causing the bug.
The right solution to remove the inner protection against recursion here -
the outer one is enough.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>