Commit Graph

29147 Commits

Author SHA1 Message Date
Russell King ba84be2338 remove linux/hardirq.h from asm-generic/local.h
While looking at reducing the amount of architecture namespace pollution
in the generic kernel, I found that asm/irq.h is included in the vast
majority of compilations on ARM (around 650 files.)

Since asm/irq.h includes a sub-architecture include file on ARM, this
causes a negative impact on the ccache's ability to re-use the build
results from other sub-architectures, so we have a desire to reduce the
dependencies on asm/irq.h.

It turns out that a major cause of this is the needless include of
linux/hardirq.h into asm-generic/local.h.  The patch below removes this
include, resulting in some 250 to 300 files (around half) of the kernel
then omitting asm/irq.h.

My test builds still succeed, provided two ARM files are fixed
(arch/arm/kernel/traps.c and arch/arm/mm/fault.c) - so there may be
negative impacts for this on other architectures.

Note that x86 does not include asm/irq.h nor linux/hardirq.h in its
asm/local.h, so this patch can be viewed as bringing the generic version
into line with the x86 version.

[kosaki.motohiro@jp.fujitsu.com: add #include <linux/irqflags.h> to acpi/processor_idle.c]
[adobriyan@gmail.com: fix sparc64]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:13 -08:00
Eric Dumazet 179f7ebff6 percpu_counter: FBC_BATCH should be a variable
For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS

Considering more and more distros are using high NR_CPUS values, it makes
sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS.

A sensible value is 2*num_online_cpus(), with a minimum value of 32 (This
minimum value helps branch prediction in __percpu_counter_add())

We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically.

We rename FBC_BATCH to percpu_counter_batch since its not a constant
anymore.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:13 -08:00
David Brownell af9379c712 documentation: when to BUG(), and when to not BUG()
Provide some basic advice about when to use BUG()/BUG_ON(): never, unless
there's really no better option.

This matches my understanding of the standard policy ...  which seems not
to be written down so far, outside of LKML messages that I haven't
bookmarked.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:13 -08:00
Tejun Heo 5f820f648c poll: allow f_op->poll to sleep
f_op->poll is the only vfs operation which is not allowed to sleep.  It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions.  The non-sleep restriction
can be a bit tricky because ->poll is not called from an atomic context
and the result of accidentally sleeping in ->poll only shows up as
temporary busy looping when the timing is right or rather wrong.

This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events.  The
only added overhead is an extra function call during wake up and
negligible.

This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.

While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.

* s/type * symbol/type *symbol/		   : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c

Oleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Brad Boyer <flar@allandria.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roland McGrath <roland@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:12 -08:00
Darrick J. Wong 9fe06081ef Create a DIV_ROUND_CLOSEST macro to do division with rounding
Create a helper macro to divide two numbers and round the result to the
nearest whole number.  This is a helper macro for hwmon drivers that want
to convert incoming sysfs values per standard hwmon practice, though the
macro itself can be used by anyone.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:12 -08:00
Alexey Dobriyan f1883f86de Remove remaining unwinder code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Gabor Gombas <gombasg@sztaki.hu>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:11 -08:00
Matthew Wilcox ea43546750 atomic_t: unify all arch definitions
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h.  Move the type definition
to linux/types.h to break the loop.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:10 -08:00
KOSAKI Motohiro 9f572e3f96 mm: remove CONFIG_OUT_OF_LINE_PFN_TO_PAGE
No architectures use CONFIG_OUT_OF_LINE_PFN_TO_PAGE - it can be removed.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:10 -08:00
Oleg Nesterov 901608d904 mm: introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
mm->hiwater_xxx directly, this leads to 2 problems:

- taskstats_user_cmd() can call fill_pid()->xacct_add_tsk() at any
  moment before the task exits, so we should check the current values of
  rss/vm anyway.

- do_exit()->update_hiwater_xxx() calls are racy.  An exiting thread can
  be preempted right before mm->hiwater_xxx = new_val, and another thread
  can use A_LOT of memory and exit in between.  When the first thread
  resumes it can be the last thread in the thread group, in that case we
  report the wrong hiwater_xxx values which do not take A_LOT into
  account.

Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and change
xacct_add_tsk() to use them.  The first helper will also be used by
rusage->ru_maxrss accounting.

Kill do_exit()->update_hiwater_xxx() calls.  Unless we are going to
decrease rss/vm there is no point to update mm->hiwater_xxx, and nobody
can look at this mm_struct when exit_mmap() actually unmaps the memory.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:09 -08:00
Nick Piggin 856bf4d717 fs: sys_sync fix
s_syncing livelock avoidance was breaking data integrity guarantee of
sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
if there is a concurrent sys_sync happening.

This livelock avoidance is much less important now that we don't have the
get_super_to_sync() call after every sb that we sync.  This was replaced
by __put_super_and_need_restart.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:09 -08:00
Nick Piggin 4f5a99d64c fs: remove WB_SYNC_HOLD
Remove WB_SYNC_HOLD.  The primary motiviation is the design of my
anti-starvation code for fsync.  It requires taking an inode lock over the
sync operation, so we could run into lock ordering problems with multiple
inodes.  It is possible to take a single global lock to solve the ordering
problem, but then that would prevent a future nice implementation of "sync
multiple inodes" based on lock order via inode address.

Seems like a backward step to remove this, but actually it is busted
anyway: we can't use the inode lists for data integrity wait: an inode can
be taken off the dirty lists but still be under writeback.  In order to
satisfy data integrity semantics, we should wait for it to finish
writeback, but if we only search the dirty lists, we'll miss it.

It would be possible to have a "writeback" list, for sys_sync, I suppose.
But why complicate things by prematurely optimise?  For unmounting, we
could avoid the "livelock avoidance" code, which would be easier, but
again premature IMO.

Fixing the existing data integrity problem will come next.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:09 -08:00
Hugh Dickins edc315fd22 badpage: remove vma from page_remove_rmap
Remove page_remove_rmap()'s vma arg, which was only for the Eeek message.
And remove the BUG_ON(page_mapcount(page) == 0) from CONFIG_DEBUG_VM's
page_dup_rmap(): we're trying to be more resilient about that than BUGs.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:07 -08:00
Hugh Dickins 2509ef26db badpage: zap print_bad_pte on swap and file
Complete zap_pte_range()'s coverage of bad pagetable entries by calling
print_bad_pte() on a pte_file in a linear vma and on a bad swap entry.
That needs free_swap_and_cache() to tell it, which will also have shown
one of those "swap_free" errors (but with much less information).

Similar checks in fork's copy_one_pte()?  No, that would be more noisy
than helpful: we'll see them when parent and child exec or exit.

Where do_nonlinear_fault() calls print_bad_pte(): omit !VM_CAN_NONLINEAR
case, that could only be a bug in sys_remap_file_pages(), not a bad pte.
VM_FAULT_OOM rather than VM_FAULT_SIGBUS?  Well, okay, that is consistent
with what happens if do_swap_page() operates a bad swap entry; but don't
we have patches to be more careful about killing when VM_FAULT_OOM?

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:07 -08:00
Hugh Dickins 79f4b7bf39 badpage: simplify page_alloc flag check+clear
Simplify the PAGE_FLAGS checking and clearing when freeing and allocating
a page: check the same flags as before when freeing, clear ALL the flags
(unless PageReserved) when freeing, check ALL flags off when allocating.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:07 -08:00
Hugh Dickins 20137a490f swapfile: swapon randomize if nonrot
Swap allocation has always started from the beginning of the swap area;
but if we're dealing with a solidstate swap device which can only remap
blocks within limited zones, that would sooner wear out the first zone.

Therefore sys_swapon() test whether blk_queue is non-rotational, and if so
randomize the cluster_next starting position for allocation.

If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
with an "SS" at the right end of the kernel's "Adding ...  swap" message
(so that if it's both nonrot and discardable, "SSD" will be shown there).
Perhaps something should be shown in /proc/swaps (swapon -s), but we have
to be more cautious before making any addition to that format.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Joern Engel <joern@logfs.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Donjun Shin <djshin90@gmail.com>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
Hugh Dickins 7992fde72c swapfile: swap allocation use discard
When scan_swap_map() finds a free cluster of swap pages to allocate,
discard the old contents of the cluster if the device supports discard.
But don't bother when swap is so fragmented that we allocate single pages.

Be careful about racing allocations made while we're scanning for a
cluster; and hold up allocations made while we're discarding.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Joern Engel <joern@logfs.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Donjun Shin <djshin90@gmail.com>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
Hugh Dickins 6a6ba83175 swapfile: swapon use discard (trim)
When adding swap, all the old data on swap can be forgotten: sys_swapon()
discard all but the header page of the swap partition (or every extent but
the header of the swap file), to give a solidstate swap device the
opportunity to optimize its wear-levelling.

If that succeeds, note SWP_DISCARDABLE for later use, and report it with a
"D" at the right end of the kernel's "Adding ...  swap" message.  Perhaps
something should be shown in /proc/swaps (swapon -s), but we have to be
more cautious before making any addition to that format.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Joern Engel <joern@logfs.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Donjun Shin <djshin90@gmail.com>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
Hugh Dickins ebebbbe904 swapfile: rearrange scan and swap_info
Before making functional changes, rearrange scan_swap_map() to simplify
subsequent diffs.  Actually, there is one functional change in there:
leave cluster_nr negative while scanning for a new cluster - resetting it
early increased the likelihood that when we have difficulty finding a free
cluster, another task may come in and try doing exactly the same - just a
waste of cpu.

Before making functional changes, rearrange struct swap_info_struct
slightly: flags will be needed as an unsigned long (for wait_on_bit), next
is a good int to pair with prio, old_block_size is uninteresting so shift
it to the end.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
Hugh Dickins 22c6f8fdb3 swapfile: remove SWP_ACTIVE mask
Remove the SWP_ACTIVE mask: it just obscures the SWP_WRITEOK flag.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
KOSAKI Motohiro 69beeb1d34 mm: make vread() and vwrite() declaration
Sparse output following warnings.

mm/vmalloc.c:1436:6: warning: symbol 'vread' was not declared. Should it be static?
mm/vmalloc.c:1474:6: warning: symbol 'vwrite' was not declared. Should it be static?

However, it is used by /dev/kmem. fixed here.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
Hugh Dickins b962716b45 mm: optimize get_scan_ratio for no swap
Rik suggests a simplified get_scan_ratio() for !CONFIG_SWAP.  Yes, the gcc
optimizer gives us that, when nr_swap_pages is #defined as 0L.  Move usual
declaration to swapfile.c: it never belonged in page_alloc.c.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:04 -08:00
Hugh Dickins 60371d971a mm: add add_to_swap stub
If we add a failing stub for add_to_swap(), then we can remove the #ifdef
CONFIG_SWAP from mm/vmscan.c.

This was intended as a source cleanup, but looking more closely, it turns
out that the !CONFIG_SWAP case was going to keep_locked for an anonymous
page, whereas now it goes to the more suitable activate_locked, like the
CONFIG_SWAP nr_swap_pages 0 case.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:04 -08:00
Hugh Dickins ac47b003d0 mm: remove gfp_mask from add_to_swap
Remove gfp_mask argument from add_to_swap(): it's misleading because its
only caller, shrink_page_list(), is not atomic at that point; and in due
course (implementing discard) we'll sometimes want to allocate some memory
with GFP_NOIO (as is used in swap_writepage) when allocating swap.

No change to the gfp_mask passed down to add_to_swap_cache(): still use
__GFP_HIGH without __GFP_WAIT (with nomemalloc and nowarn as before):
though it's not obvious if that's the best combination to ask for here.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:04 -08:00
Hugh Dickins a2c43eed83 mm: try_to_free_swap replaces remove_exclusive_swap_page
remove_exclusive_swap_page(): its problem is in living up to its name.

It doesn't matter if someone else has a reference to the page (raised
page_count); it doesn't matter if the page is mapped into userspace
(raised page_mapcount - though that hints it may be worth keeping the
swap): all that matters is that there be no more references to the swap
(and no writeback in progress).

swapoff (try_to_unuse) has been removing pages from swapcache for years,
with no concern for page count or page mapcount, and we used to have a
comment in lookup_swap_cache() recognizing that: if you go for a page of
swapcache, you'll get the right page, but it could have been removed from
swapcache by the time you get page lock.

So, give up asking for exclusivity: get rid of
remove_exclusive_swap_page(), and remove_exclusive_swap_page_ref() and
remove_exclusive_swap_page_count() which were spawned for the recent LRU
work: replace them by the simpler try_to_free_swap() which just checks
page_swapcount().

Similarly, remove the page_count limitation from free_swap_and_count(),
but assume that it's worth holding on to the swap if page is mapped and
swap nowhere near full.  Add a vm_swap_full() test in free_swap_cache()?
It would be consistent, but I think we probably have enough for now.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:03 -08:00
Hugh Dickins 7b1fe59793 mm: reuse_swap_page replaces can_share_swap_page
A good place to free up old swap is where do_wp_page(), or do_swap_page(),
is about to redirty the page: the data on disk is then stale and won't be
read again; and if we do decide to write the page out later, using the
previous swap location makes an unnecessary disk seek very likely.

So give can_share_swap_page() the side-effect of delete_from_swap_cache()
when it safely can.  And can_share_swap_page() was always a misleading
name, the more so if it has a side-effect: rename it reuse_swap_page().

Irrelevant cleanup nearby: remove swap_token_default_timeout definition
from swap.h: it's used nowhere.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:03 -08:00
David Rientjes 2da02997e0 mm: add dirty_background_bytes and dirty_bytes sysctls
This change introduces two new sysctls to /proc/sys/vm:
dirty_background_bytes and dirty_bytes.

dirty_background_bytes is the counterpart to dirty_background_ratio and
dirty_bytes is the counterpart to dirty_ratio.

With growing memory capacities of individual machines, it's no longer
sufficient to specify dirty thresholds as a percentage of the amount of
dirtyable memory over the entire system.

dirty_background_bytes and dirty_bytes specify quantities of memory, in
bytes, that represent the dirty limits for the entire system.  If either
of these values is set, its value represents the amount of dirty memory
that is needed to commence either background or direct writeback.

When a `bytes' or `ratio' file is written, its counterpart becomes a
function of the written value.  For example, if dirty_bytes is written to
be 8096, 8K of memory is required to commence direct writeback.
dirty_ratio is then functionally equivalent to 8K / the amount of
dirtyable memory:

	dirtyable_memory = free pages + mapped pages + file cache

	dirty_background_bytes = dirty_background_ratio * dirtyable_memory
		-or-
	dirty_background_ratio = dirty_background_bytes / dirtyable_memory

		AND

	dirty_bytes = dirty_ratio * dirtyable_memory
		-or-
	dirty_ratio = dirty_bytes / dirtyable_memory

Only one of dirty_background_bytes and dirty_background_ratio may be
specified at a time, and only one of dirty_bytes and dirty_ratio may be
specified.  When one sysctl is written, the other appears as 0 when read.

The `bytes' files operate on a page size granularity since dirty limits
are compared with ZVC values, which are in page units.

Prior to this change, the minimum dirty_ratio was 5 as implemented by
get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
written value between 0 and 100.  This restriction is maintained, but
dirty_bytes has a lower limit of only one page.

Also prior to this change, the dirty_background_ratio could not equal or
exceed dirty_ratio.  This restriction is maintained in addition to
restricting dirty_background_bytes.  If either background threshold equals
or exceeds that of the dirty threshold, it is implicitly set to half the
dirty threshold.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:03 -08:00
David Rientjes 364aeb2849 mm: change dirty limit type specifiers to unsigned long
The background dirty and dirty limits are better defined with type
specifiers of unsigned long since negative writeback thresholds are not
possible.

These values, as returned by get_dirty_limits(), are normally compared
with ZVC values to determine whether writeback shall commence or be
throttled.  Such page counts cannot be negative, so declaring the page
limits as signed is unnecessary.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:02 -08:00
Hugh Dickins 2afd1c928f mm: make page_lock_anon_vma() static
page_lock_anon_vma() and page_unlock_anon_vma() were made available to
show_page_path() in vmscan.c; but now that has been removed, make them
static in rmap.c again, they're better kept private if possible.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:02 -08:00
Hugh Dickins b5934c5318 mm: add_active_or_unevictable into rmap
lru_cache_add_active_or_unevictable() and page_add_new_anon_rmap() always
appear together.  Save some symbol table space and some jumping around by
removing lru_cache_add_active_or_unevictable(), folding its code into
page_add_new_anon_rmap(): like how we add file pages to lru just after
adding them to page cache.

Remove the nearby "TODO: is this safe?" comments (yes, it is safe), and
change page_add_new_anon_rmap()'s address BUG_ON to VM_BUG_ON as
originally intended.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:02 -08:00
Hugh Dickins 6d91add09f mm: add Set,ClearPageSwapCache stubs
If we add NOOP stubs for SetPageSwapCache() and ClearPageSwapCache(), then
we can remove the #ifdef CONFIG_SWAPs from mm/migrate.c.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:02 -08:00
Hugh Dickins 3c1d43787b mm: remove GFP_HIGHUSER_PAGECACHE
GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
that harder to track down: remove it, and its out-of-work brothers
GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

Since we're making that improvement to hotremove_migrate_alloc(), I think
we can now also remove one of the "o"s from its comment.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:01 -08:00
Hugh Dickins e5991371ee mm: remove cgroup_mm_owner_callbacks
cgroup_mm_owner_callbacks() was brought in to support the memrlimit
controller, but sneaked into mainline ahead of it.  That controller has
now been shelved, and the mm_owner_changed() args were inadequate for it
anyway (they needed an mm pointer instead of a task pointer).

Remove the dead code, and restore mm_update_next_owner() locking to how it
was before: taking mmap_sem there does nothing for memcontrol.c, now the
only user of mm->owner.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:01 -08:00
KOSAKI Motohiro 64cdd548ff mm: cleanup: remove #ifdef CONFIG_MIGRATION
#ifdef in *.c file decrease source readability a bit.  removing is better.

This patch doesn't have any functional change.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
KOSAKI Motohiro 1b0bd11886 mm: get rid of pagevec_release_nonlru()
speculative page references patch (commit:
e286781d5f) removed last
pagevec_release_nonlru() caller.

So this function can be removed now.

This patch doesn't have any functional change.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Gary Hade c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
David Rientjes 75aa199410 oom: print triggering task's cpuset and mems allowed
When cpusets are enabled, it's necessary to print the triggering task's
set of allowable nodes so the subsequently printed meminfo can be
interpreted correctly.

We also print the task's cpuset name for informational purposes.

[rientjes@google.com: task lock current before dereferencing cpuset]
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:59 -08:00
Nick Piggin 1c0fe6e3bd mm: invoke oom-killer from page fault
Rather than have the pagefault handler kill a process directly if it gets
a VM_FAULT_OOM, have it call into the OOM killer.

With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
oom killing throttling, oom priority adjustment or selective disabling,
panic on oom, etc), it's silly to unconditionally kill the faulting
process at page fault time.  Create a hook for pagefault oom path to call
into instead.

Only converted x86 and uml so far.

[akpm@linux-foundation.org: make __out_of_memory() static]
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Mel Gorman 3340289ddf mm: report the MMU pagesize in /proc/pid/smaps
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA.  This matches the size used by the MMU in the
majority of cases.  However, one counter-example occurs on PPC64 kernels
whereby a kernel using 64K as a base pagesize may still use 4K pages for
the MMU on older processor.  To distinguish, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Mel Gorman 08fba69986 mm: report the pagesize backing a VMA in /proc/pid/smaps
It is useful to verify a hugepage-aware application is using the expected
pagesizes for its memory regions. This patch creates an entry called
KernelPageSize in /proc/pid/smaps that is the size of page used by the
kernel to back a VMA. The entry is not called PageSize as it is possible
the MMU uses a different size. This extension should not break any sensible
parser that skips lines containing unrecognised information.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
James Morris ac8cc0fa53 Merge branch 'next' into for-linus 2009-01-07 09:58:22 +11:00
David Howells 3699c53c48 CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #3]
Fix a regression in cap_capable() due to:

	commit 3b11a1dece
	Author: David Howells <dhowells@redhat.com>
	Date:   Fri Nov 14 10:39:26 2008 +1100

	    CRED: Differentiate objective and effective subjective credentials on a task

The problem is that the above patch allows a process to have two sets of
credentials, and for the most part uses the subjective credentials when
accessing current's creds.

There is, however, one exception: cap_capable(), and thus capable(), uses the
real/objective credentials of the target task, whether or not it is the current
task.

Ordinarily this doesn't matter, since usually the two cred pointers in current
point to the same set of creds.  However, sys_faccessat() makes use of this
facility to override the credentials of the calling process to make its test,
without affecting the creds as seen from other processes.

One of the things sys_faccessat() does is to make an adjustment to the
effective capabilities mask, which cap_capable(), as it stands, then ignores.

The affected capability check is in generic_permission():

	if (!(mask & MAY_EXEC) || execute_ok(inode))
		if (capable(CAP_DAC_OVERRIDE))
			return 0;

This change passes the set of credentials to be tested down into the commoncap
and SELinux code.  The security functions called by capable() and
has_capability() select the appropriate set of credentials from the process
being checked.

This can be tested by compiling the following program from the XFS testsuite:

/*
 *  t_access_root.c - trivial test program to show permission bug.
 *
 *  Written by Michael Kerrisk - copyright ownership not pursued.
 *  Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html
 */
#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>

#define UID 500
#define GID 100
#define PERM 0
#define TESTPATH "/tmp/t_access"

static void
errExit(char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
} /* errExit */

static void
accessTest(char *file, int mask, char *mstr)
{
    printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask));
} /* accessTest */

int
main(int argc, char *argv[])
{
    int fd, perm, uid, gid;
    char *testpath;
    char cmd[PATH_MAX + 20];

    testpath = (argc > 1) ? argv[1] : TESTPATH;
    perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM;
    uid = (argc > 3) ? atoi(argv[3]) : UID;
    gid = (argc > 4) ? atoi(argv[4]) : GID;

    unlink(testpath);

    fd = open(testpath, O_RDWR | O_CREAT, 0);
    if (fd == -1) errExit("open");

    if (fchown(fd, uid, gid) == -1) errExit("fchown");
    if (fchmod(fd, perm) == -1) errExit("fchmod");
    close(fd);

    snprintf(cmd, sizeof(cmd), "ls -l %s", testpath);
    system(cmd);

    if (seteuid(uid) == -1) errExit("seteuid");

    accessTest(testpath, 0, "0");
    accessTest(testpath, R_OK, "R_OK");
    accessTest(testpath, W_OK, "W_OK");
    accessTest(testpath, X_OK, "X_OK");
    accessTest(testpath, R_OK | W_OK, "R_OK | W_OK");
    accessTest(testpath, R_OK | X_OK, "R_OK | X_OK");
    accessTest(testpath, W_OK | X_OK, "W_OK | X_OK");
    accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK");

    exit(EXIT_SUCCESS);
} /* main */

This can be run against an Ext3 filesystem as well as against an XFS
filesystem.  If successful, it will show:

	[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
	---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx
	access(/tmp/xxx, 0) returns 0
	access(/tmp/xxx, R_OK) returns 0
	access(/tmp/xxx, W_OK) returns 0
	access(/tmp/xxx, X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK) returns 0
	access(/tmp/xxx, R_OK | X_OK) returns -1
	access(/tmp/xxx, W_OK | X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1

If unsuccessful, it will show:

	[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
	---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx
	access(/tmp/xxx, 0) returns 0
	access(/tmp/xxx, R_OK) returns -1
	access(/tmp/xxx, W_OK) returns -1
	access(/tmp/xxx, X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK) returns -1
	access(/tmp/xxx, R_OK | X_OK) returns -1
	access(/tmp/xxx, W_OK | X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1

I've also tested the fix with the SELinux and syscalls LTP testsuites.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-01-07 09:38:48 +11:00
James Morris 29881c4502 Revert "CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]"
This reverts commit 14eaddc967.

David has a better version to come.
2009-01-07 09:21:54 +11:00
Oliver Hartkopp 1fa17d4ba4 can: omit unneeded skb_clone() calls
The AF_CAN core delivered always cloned sk_buffs to the AF_CAN
protocols, although this was _only_ needed by the can-raw protocol.
With this (additionally documented) change, the AF_CAN core calls the
callback functions of the registered AF_CAN protocols with the original
(uncloned) sk_buff pointer and let's the can-raw protocol do the
skb_clone() itself which omits all unneeded skb_clone() calls for other
AF_CAN protocols.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 11:07:54 -08:00
Herbert Xu e1c096e251 vlan: Add GRO interfaces
This patch adds GRO interfaces for hardware-assisted VLAN reception.
With this in place we're now at parity with LRO as far as the
interface is concerned.  That is, you can now take any LRO driver
and convert it over to GRO.

As the CB memory clashes with GRO's use of CB, I've removed it
entirely by storing dev in skb->dev.  This is OK because VLAN
gets called first thing in netif_receive_skb and skb->dev is
not used in between us calling netif_rx and netif_receive_skb
getting called.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:50:09 -08:00
Herbert Xu 96e93eab20 gro: Add internal interfaces for VLAN
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags.  These interfaces are for
device drivers.

This patch rearranges things to provide a new set of interfaces
for VLANs.  These interfaces are for internal use only.  The
VLAN code itself can then provide a set of entry points for
device drivers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 10:49:34 -08:00
Stephen Rothwell b8ac9fc0e8 uio: make uio_info's name and version const
These are only ever assigned constant strings and never modified.

This was noticed because Wolfram Sang needed to cast the result of
of_get_property() in order to assign it to the name field of a struct
uio_info.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:44 -08:00
Hans J. Koch e70c412ee4 UIO: Pass information about ioports to userspace (V2)
Devices sometimes have memory where all or parts of it can not be mapped to
userspace. But it might still be possible to access this memory from
userspace by other means. An example are PCI cards that advertise not only
mappable memory but also ioport ranges. On x86 architectures, these can be
accessed with ioperm, iopl, inb, outb, and friends. Mike Frysinger (CCed)
reported a similar problem on Blackfin arch where it doesn't seem to be easy
to mmap non-cached memory but it can still be accessed from userspace.

This patch allows kernel drivers to pass information about such ports to
userspace. Similar to the existing mem[] array, it adds a port[] array to
struct uio_info. Each port range is described by start, size, and porttype.

If a driver fills in at least one such port range, the UIO core will simply
pass this information to userspace by creating a new directory "portio"
underneath /sys/class/uio/uioN/. Similar to the "mem" directory, it will
contain a subdirectory (portX) for each port range given.

Note that UIO simply passes this information to userspace, it performs no
action whatsoever with this data. It's userspace's responsibility to obtain
access to these ports and to solve arch dependent issues. The "porttype"
attribute tells userspace what kind of port it is dealing with.

This mechanism could also be used to give userspace information about GPIOs
related to a device. You frequently find such hardware in embedded devices,
so I added a UIO_PORT_GPIO definition. I'm not really sure if this is a good
idea since there are other solutions to this problem, but it won't hurt much
anyway.

Signed-off-by: Hans J. Koch <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:44 -08:00
Kay Sievers 475b44c199 mtd: struct device - replace bus_id with dev_name(), dev_set_name()
CC: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:38 -08:00
Mark McLoughlin 0aa0dc41bf driver core: add root_device_register()
Add support for allocating root device objects which group
device objects under /sys/devices directories.

Also add a sysfs 'module' symlink which points to the owner
of the root device object. This symlink will be used in virtio
to allow userspace to determine which virtio bus implementation
a given device is associated with.

[Includes suggestions from Cornelia Huck]

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:33 -08:00
Cornelia Huck d0d85ff989 Make DEBUG take precedence over DYNAMIC_PRINTK_DEBUG
Statically defined DEBUG should take precedence over
dynamically enabled debugging; otherwise adding DEBUG
(like, for example, via CONFIG_DEBUG_KOBJECT) does not
have the expected result of printing pr_debug() and dev_dbg()
messages unconditionally.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:33 -08:00
Greg Kroah-Hartman b9daa99ee5 driver core: move knode_bus into private structure
Nothing outside of the driver core should ever touch knode_bus, so
move it out of the public eye.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:33 -08:00
Greg Kroah-Hartman 93e746db18 driver core: move knode_driver into private structure
Nothing outside of the driver core should ever touch knode_driver, so
move it out of the public eye.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:32 -08:00
Greg Kroah-Hartman 11c3b5c3e0 driver core: move klist_children into private structure
Nothing outside of the driver core should ever touch klist_children, or
knode_parent, so move them out of the public eye.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:32 -08:00
Greg Kroah-Hartman 2831fe6f9c driver core: create a private portion of struct device
This is to be used to move things out of struct device that no code
outside of the driver core should ever touch.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:32 -08:00
Matthew Wilcox 210272a284 driver core: Remove completion from struct klist_node
Removing the completion from klist_node reduces its size from 64 bytes
to 28 on x86-64.  To maintain the semantics of klist_remove(), we add
a single list of klist nodes which are pending deletion and scan them.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:30 -08:00
Matthew Wilcox 929d2fa595 driver core: Rearrange struct device for better packing
This minor rearrangement saves 16 bytes from sizeof(struct device)
according to pahole.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:30 -08:00
Alan Stern 7f4f5d4516 Fix misspellings in pm.h macros
This patch (as1167) fixes some misspellings in various recently-added
macros in pm.h.  Fortunately these macros are not yet used anywhere.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-01-06 10:44:30 -08:00
Rafael J. Wysocki adf094931f PM: Simplify the new suspend/hibernation framework for devices
PM: Simplify the new suspend/hibernation framework for devices

Following the discussion at the Kernel Summit, simplify the new
device PM framework by merging 'struct pm_ops' and
'struct pm_ext_ops' and removing pointers to 'struct pm_ext_ops'
from 'struct platform_driver' and 'struct pci_driver'.

After this change, the suspend/hibernation callbacks will only
reside in 'struct device_driver' as well as at the bus type/
device class/device type level.  Accordingly, PCI and platform
device drivers are now expected to put their suspend/hibernation
callbacks into the 'struct device_driver' embedded in
'struct pci_driver' or 'struct platform_driver', respectively.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:29 -08:00
Dan Williams 864498aaa9 dmaengine: use idr for registering dma device numbers
This brings some predictability to dma device numbers, i.e. an rmmod/insmod
cycle may now result in /sys/class/dma/dma0chan0 being restored rather than
/sys/class/dma/dma1chan0 appearing.

Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:21 -07:00
Dan Williams 41d5e59c12 dmaengine: add a release for dma class devices and dependent infrastructure
Resolves:
WARNING: at drivers/base/core.c:122 device_release+0x4d/0x52()
Device 'dma0chan0' does not have a release() function, it is broken and must be fixed.

The dma_chan_dev object is introduced to gear-match sysfs kobject and
dmaengine channel lifetimes.  When a channel is removed access to the
sysfs entries return -ENODEV until the kobject can be released.

The bulk of the change is updates to existing code to handle the extra
layer of indirection between a dma_chan and its struct device.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:21 -07:00
Dan Williams 7dd6025101 dmaengine: kill enum dma_state_client
DMA_NAK is now useless.  We can just use a bool instead.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:19 -07:00
Dan Williams f27c580c36 dmaengine: remove 'bigref' infrastructure
Reference counting is done at the module level so clients need not worry
that a channel will leave while they are actively using dmaengine.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:18 -07:00
Dan Williams aa1e6f1a38 dmaengine: kill struct dma_client and supporting infrastructure
All users have been converted to either the general-purpose allocator,
dma_find_channel, or dma_request_channel.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:17 -07:00
Dan Williams 209b84a88f dmaengine: replace dma_async_client_register with dmaengine_get
Now that clients no longer need to be notified of channel arrival
dma_async_client_register can simply increment the dmaengine_ref_count.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:17 -07:00
Dan Williams 74465b4ff9 atmel-mci: convert to dma_request_channel and down-level dma_slave
dma_request_channel provides an exclusive channel, so we no longer need to
pass slave data through dmaengine.

Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:16 -07:00
Dan Williams 33df8ca068 dmatest: convert to dma_request_channel
Replace the client registration infrastructure with a custom loop to
poll for channels.  Once dma_request_channel returns NULL stop asking
for channels.  A userspace side effect of this change if that loading
the dmatest module before loading a dma driver will result in no
channels being found, previously dmatest would get a callback.  To
facilitate testing in the built-in case dmatest_init is marked as a
late_initcall.  Another side effect is that channels under test can not
be used for any other purpose.

Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:15 -07:00
Dan Williams 59b5ec2144 dmaengine: introduce dma_request_channel and private channels
This interface is primarily for device-to-memory clients which need to
search for dma channels with platform-specific characteristics.  The
prototype is:

struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
                                     dma_filter_fn filter_fn,
                                     void *filter_param);

When the optional 'filter_fn' parameter is set to NULL
dma_request_channel simply returns the first channel that satisfies the
capability mask.  Otherwise, when the mask parameter is insufficient for
specifying the necessary channel, the filter_fn routine can be used to
disposition the available channels in the system. The filter_fn routine
is called once for each free channel in the system.  Upon seeing a
suitable channel filter_fn returns DMA_ACK which flags that channel to
be the return value from dma_request_channel.  A channel allocated via
this interface is exclusive to the caller, until dma_release_channel()
is called.

To ensure that all channels are not consumed by the general-purpose
allocator the DMA_PRIVATE capability is provided to exclude a dma_device
from general-purpose (memory-to-memory) consideration.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:15 -07:00
Dan Williams f67b459992 net_dma: convert to dma_find_channel
Use the general-purpose channel allocation provided by dmaengine.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:15 -07:00
Dan Williams 2ba05622b8 dmaengine: provide a common 'issue_pending_all' implementation
async_tx and net_dma each have open-coded versions of issue_pending_all,
so provide a common routine in dmaengine.

The implementation needs to walk the global device list, so implement
rcu to allow dma_issue_pending_all to run lockless.  Clients protect
themselves from channel removal events by holding a dmaengine reference.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:14 -07:00
Dan Williams bec085134e dmaengine: centralize channel allocation, introduce dma_find_channel
Allowing multiple clients to each define their own channel allocation
scheme quickly leads to a pathological situation.  For memory-to-memory
offload all clients can share a central allocator.

This simply moves the existing async_tx allocator to dmaengine with
minimal fixups:
* async_tx.c:get_chan_ref_by_cap --> dmaengine.c:nth_chan
* async_tx.c:async_tx_rebalance --> dmaengine.c:dma_channel_rebalance
* split out common code from async_tx.c:__async_tx_find_channel -->
  dma_find_channel

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:14 -07:00
Dan Williams 6f49a57aa5 dmaengine: up-level reference counting to the module level
Simply, if a client wants any dmaengine channel then prevent all dmaengine
modules from being removed.  Once the clients are done re-enable module
removal.

Why?, beyond reducing complication:
1/ Tracking reference counts per-transaction in an efficient manner, as
   is currently done, requires a complicated scheme to avoid cache-line
   bouncing effects.
2/ Per-transaction ref-counting gives the false impression that a
   dma-driver can be gracefully removed ahead of its user (net, md, or
   dma-slave)
3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but
   if such an engine were built one day we still would not need to notify
   clients of remove events.  The driver can simply return NULL to a
   ->prep() request, something that is much easier for a client to handle.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-06 11:38:14 -07:00
Chuck Lever 57ef692588 NLM: Rewrite IPv4 privileged requester's check
Clean up.

For consistency, rewrite the IPv4 check to match the same style as the
new IPv6 check.  Note that ipv4_is_loopback() is somewhat broader in
its interpretation of what is a loopback address than simply
"127.0.0.1".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:56 -05:00
Chuck Lever d1208f7073 NLM: nlm_privileged_requester() doesn't recognize mapped loopback address
Commit b85e4676 added the nlm_privileged_requester() helper to check
whether an RPC request was sent from a local privileged caller.  It
recognizes IPv4 privileged callers (from "127.0.0.1"), and IPv6
privileged callers (from "::1").

However, IPV6_ADDR_LOOPBACK is not set for the mapped IPv4 loopback
address (::ffff:7f00:0001), so the test breaks when the kernel's RPC
service is IPv6-enabled but user space is calling via the IPv4
loopback address.  This is actually the most common case for IPv6-
enabled RPC services on Linux.

Rewrite the IPv6 check to handle the mapped IPv4 loopback address as
well as a normal IPv6 loopback address.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:56 -05:00
Chuck Lever 8529bc51d3 NSM: Move nsm_addr() to fs/lockd/mon.c
Clean up: nsm_addr_in() is no longer used, and nsm_addr() is used only in
fs/lockd/mon.c, so move it there.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:55 -05:00
Chuck Lever e6765b8397 NSM: Remove include/linux/lockd/sm_inter.h
Clean up: The include/linux/lockd/sm_inter.h header is nearly empty
now.  Remove it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:55 -05:00
Chuck Lever 92fd91b998 NLM: Remove "create" argument from nsm_find()
Clean up: nsm_find() now has only one caller, and that caller
unconditionally sets the @create argument. Thus the @create
argument is no longer needed.

Since nsm_find() now has a more specific purpose, pick a more
appropriate name for it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:54 -05:00
Chuck Lever 3420a8c435 NSM: Add nsm_lookup() function
Introduce a new API to fs/lockd/mon.c that allows nlm_host_rebooted()
to lookup up nsm_handles via the contents of an nlm_reboot struct.

The new function is equivalent to calling nsm_find() with @create set
to zero, but it takes a struct nlm_reboot instead of separate
arguments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:54 -05:00
Chuck Lever 576df4634e NLM: Decode "priv" argument of NLMPROC_SM_NOTIFY as an opaque
The NLM XDR decoders for the NLMPROC_SM_NOTIFY procedure should treat
their "priv" argument truly as an opaque, as defined by the protocol,
and let the upper layers figure out what is in it.

This will make it easier to modify the contents and interpretation of
the "priv" argument, and keep knowledge about what's in "priv" local
to fs/lockd/mon.c.

For now, the NLM and NSM implementations should behave exactly as they
did before.

The formation of the address of the rebooted host in
nlm_host_rebooted() may look a little strange, but it is the inverse
of how nsm_init_private() forms the private cookie.  Plus, it's
going away soon anyway.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:54 -05:00
Chuck Lever 7fefc9cb9d NLM: Change nlm_host_rebooted() to take a single nlm_reboot argument
Pass the nlm_reboot data structure directly from the NLMPROC_SM_NOTIFY
XDR decoders to nlm_host_rebooted().  This eliminates some packing and
unpacking of the NLMPROC_SM_NOTIFY results, and prepares for passing
these results, including the "priv" cookie, directly to a lookup
routine in fs/lockd/mon.c.

This patch changes code organization but should not cause any
behavioral change.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:54 -05:00
Chuck Lever 7e44d3bea2 NSM: Generate NSMPROC_MON's "priv" argument when nsm_handle is created
Introduce a new data type, used by both the in-kernel NLM and NSM
implementations, that is used to manage the opaque "priv" argument
for the NSMPROC_MON and NLMPROC_SM_NOTIFY calls.

Construct the "priv" cookie when the nsm_handle is created.

The nsm_init_private() function may look a little strange, but it is
roughly equivalent to how the XDR encoder formed the "priv" argument.
It's going to go away soon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:53 -05:00
Chuck Lever 67c6d107a6 NSM: Move nsm_find() to fs/lockd/mon.c
The nsm_find() function sets up fresh nsm_handle entries.  This is
where we will store the "priv" cookie used to lookup nsm_handles during
reboot recovery.  The cookie will be constructed when nsm_find()
creates a new nsm_handle.

As much as possible, I would like to keep everything that handles a
"priv" cookie in fs/lockd/mon.c so that all the smarts are in one
source file.  That organization should make it pretty simple to see how
all this works.

To me, it makes more sense than the current arrangement to keep
nsm_find() with nsm_monitor() and nsm_unmonitor().

So, start reorganizing by moving nsm_find() into fs/lockd/mon.c.  The
nsm_release() function comes along too, since it shares the nsm_lock
global variable.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:53 -05:00
Chuck Lever 36e8e668d3 NSM: Move NSM program and procedure numbers to fs/lockd/mon.c
Clean up: Move the RPC program and procedure numbers for NSM into the
one source file that needs them: fs/lockd/mon.c.

And, as with NLM, NFS, and rpcbind calls, use NSMPROC_FOO instead of
SM_FOO for NSM procedure numbers.

Finally, make a couple of comments more precise: what is referred to
here as SM_NOTIFY is really the NLM (lockd) NLMPROC_SM_NOTIFY downcall,
not NSMPROC_NOTIFY.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:52 -05:00
Chuck Lever 9c1bfd037f NSM: Move NSM-related XDR data structures to lockd's xdr.h
Clean up: NSM's XDR data structures are used only in fs/lockd/mon.c,
so move them there.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:52 -05:00
Chuck Lever 356c3eb466 NLM: Move the public declaration of nsm_unmonitor() to lockd.h
Clean up.

Make the nlm_host argument "const," and move the public declaration to
lockd.h.  Add a documenting comment.

Bruce observed that nsm_unmonitor()'s only caller doesn't care about
its return code, so make nsm_unmonitor() return void.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:52 -05:00
Chuck Lever c8c23c423d NSM: Release nsmhandle in nlm_destroy_host
The nsm_handle's reference count is bumped in nlm_lookup_host().  It
should be decremented in nlm_destroy_host() to make it easier to see
the balance of these two operations.

Move the nsm_release() call to fs/lockd/host.c.

The h_nsmhandle pointer is set in nlm_lookup_host(), and never cleared.
The nlm_destroy_host() function is never called for the same nlm_host
twice, so h_nsmhandle won't ever be NULL when nsm_unmonitor() is
called.

All references to the nlm_host are gone before it is freed.  We can
skip making h_nsmhandle NULL just before the nlm_host is deallocated.

It's also likely we can remove the h_nsmhandle NULL check in
nlmsvc_is_client() as well, but we can do that later when rearchitect-
ing the nlm_host cache.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:52 -05:00
Chuck Lever 1e49323c4a NLM: Move the public declaration of nsm_monitor() to lockd.h
Clean up.

Make the nlm_host argument "const," and move the public declaration to
lockd.h with other NSM public function (nsm_release, eg) and global
variable declarations.

Add a documenting comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:52 -05:00
Chuck Lever 29ed1407ed NSM: Support IPv6 version of mon_name
The "mon_name" argument of the NSMPROC_MON and NSMPROC_UNMON upcalls
is a string that contains the hostname or IP address of the remote peer
to be notified when this host has rebooted.  The sm-notify command uses
this identifier to contact the peer when we reboot, so it must be
either a well-qualified DNS hostname or a presentation format IP
address string.

When the "nsm_use_hostnames" sysctl is set to zero, the kernel's NSM
provides a presentation format IP address in the "mon_name" argument.
Otherwise, the "caller_name" argument from NLM requests is used,
which is usually just the DNS hostname of the peer.

To support IPv6 addresses for the mon_name argument, we use the
nsm_handle's address eye-catcher, which already contains an appropriate
presentation format address string.  Using the eye-catcher string
obviates the need to use a large buffer on the stack to form the
presentation address string for the upcall.

This patch also addresses a subtle bug.

An NSMPROC_MON request and the subsequent NSMPROC_UNMON request for the
same peer are required to use the same value for the "mon_name"
argument.  Otherwise, rpc.statd's NSMPROC_UNMON processing cannot
locate the database entry for that peer and remove it.

If the setting of nsm_use_hostnames is changed between the time the
kernel sends an NSMPROC_MON request and the time it sends the
NSMPROC_UNMON request for the same peer, the "mon_name" argument for
these two requests may not be the same.  This is because the value of
"mon_name" is currently chosen at the moment the call is made based on
the setting of nsm_use_hostnames

To ensure both requests pass identical contents in the "mon_name"
argument, we now select which string to use for the argument in the
nsm_monitor() function.  A pointer to this string is saved in the
nsm_handle so it can be used for a subsequent NSMPROC_UNMON upcall.

NB: There are other potential problems, such as how nlm_host_rebooted()
might behave if nsm_use_hostnames were changed while hosts are still
being monitored.  This patch does not attempt to address those
problems.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:51 -05:00
Chuck Lever f47534f7f0 NSM: Use modern style for sm_name field in nsm_handle
Clean up: I'm about to add another "char *" field to the nsm_handle
structure.  The sm_name field uses an older style of declaring a
"char *" field.  If I match that style for the new field, checkpatch.pl
will complain.

So, fix the sm_name field to use the new style.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:50 -05:00
Chuck Lever bc995801a0 NLM: Support IPv6 scope IDs in nlm_display_address()
Scope ID support is needed since the kernel's NSM implementation is
about to use these displayed addresses as a mon_name in some cases.

When nsm_use_hostnames is zero, without scope ID support NSM will fail
to handle peers that contact us via a link-local address.  Link-local
addresses do not work without an interface ID, which is stored in the
sockaddr's sin6_scope_id field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:49 -05:00
Chuck Lever 1df40b609a NLM: Remove address eye-catcher buffers from nlm_host
The h_name field in struct nlm_host is a just copy of
h_nsmhandle->sm_name.  Likewise, the contents of the h_addrbuf field
should be identical to the sm_addrbuf field.

The h_srcaddrbuf field is used only in one place for debugging.  We can
live without this until we get %pI formatting for printk().

Currently these buffers are 48 bytes, but we need to support scope IDs
in IPv6 presentation addresses, which means making the buffers even
larger.  Instead, let's find ways to eliminate them to save space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:49 -05:00
Chuck Lever 7538ce1eb6 NLM: Use modern style for pointer fields in nlm_host
Clean up: I'm about to add another "char *" field to the nlm_host
structure.  The h_name field, for example, uses an older style of
declaring a "char *" field.  If I match that style for the new field,
checkpatch.pl will complain.

So, fix pointer fields to use the new style.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:48 -05:00
Jeff Layton c9233eb7b0 sunrpc: add sv_maxconn field to svc_serv (try #3)
svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:

	(serv->sv_nrthreads + 3) * 20

Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.

Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.

This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:47 -05:00
J. Bruce Fields 548eaca46b nfsd: document new filehandle fsid types
Descriptions taken from mountd code (in nfs-utils/utils/mountd/cache.c).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06 11:53:47 -05:00
Sergei Shtylyov 592b531521 ide: move read_sff_dma_status() method to 'struct ide_dma_ops'
Move apparently misplaced read_sff_dma_status() method from 'struct ide_tp_ops'
to 'struct ide_dma_ops', renaming it to dma_sff_read_status() and making only
required for SFF-8038i compatible IDE controller drivers (greatly cutting down
the number of initializers) as its only user (outside ide-dma-sff.c and such
drivers) appears to be ide_pci_check_simplex() which is only called for such
controllers...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:21:02 +01:00
Shane McDonald 391ad1908a Resurrect IT8172 IDE controller driver
Support for the IT8172 IDE controller was removed from the kernel
sometime after 2.6.18.  Support for the only boards that used the IT8172
was removed from the kernel after 2.6.18, as they had never compiled
since 2.6.0.  However, there are a couple of platforms that use this
chip: the PMC-Sierra Xiao Hu thin-client computer, which is no longer
in production, and the Linksys NSS4000 Network Attached Storage box,
which is based on the Xiao Hu board.  I am attempting to add support
for the Xiao Hu to the kernel, and this IT8172 IDE controller is the
first bit of code in this effort.

This patch resurrects the IT8172 IDE controller code.  I began with
the 2.6.18 version of the it8172.c file, and have moved it forward so
that it works with the latest version of the kernel.  I have run this
driver on a PMC-Sierra Xiao Hu board with the 2.6.28 kernel, and
I have had no problems with it in my configuration.  The attached patch
applies cleanly against 2.6.28.

Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: alan@lxorguk.ukuu.org.uk
[bart: s/HWIF(drive)/drive->hwif/]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:21:01 +01:00
Bartlomiej Zolnierkiewicz 94c96445f3 ide: remove unused ide_hwif_t.sg_mapped field
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:59 +01:00
Bartlomiej Zolnierkiewicz 906ef986a7 ide: struct ide_atapi_pc - remove unused fields and update documentation
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:59 +01:00
Borislav Petkov d6251d4488 ide-cd: convert to ide-atapi facilities
... and remove no longer needed cdrom_start_packet_command and
cdrom_transfer_packet_command.

Tested lightly with ide-cd and ide-floppy.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:58 +01:00
Bartlomiej Zolnierkiewicz 2bd24a1cfc ide: add port and host iterators
Add ide_port_for_each_dev() / ide_host_for_each_port() iterators
and update IDE code to use them.

While at it:
- s/unit/i/ variable in ide_port_wait_ready(), ide_probe_port(),
  ide_port_tune_devices(), ide_port_init_devices_data(), do_reset1(),
  ide_acpi_set_state() and scc_dma_end()
- s/d/i/ variable in ide_proc_port_register_devices()

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:56 +01:00
Bartlomiej Zolnierkiewicz 5e7f3a4669 ide: dynamic allocation of device structures
Allocate device structures dynamically instead of having them embedded
in ide_hwif_t:

* Remove needless zeroing of port structure from ide_init_port_data().

* Add ide_hwif_t.devices[MAX_DRIVES] (table of pointers to the devices).

* Add ide_port_{alloc,free}_devices() helpers and use them respectively
  in ide_{host,free}_alloc().

* Convert all users of ->drives[] to use ->devices[] instead.

While at it:

* Use drive->dn for the slave device check in scc_pata.c.

As a nice side-effect this patch cuts ~1kB (x86-32) from the resulting
code size:

   text    data     bss     dec     hex filename
  53963    1244     237   55444    d894 drivers/ide/ide-core.o.before
  52981    1244     237   54462    d4be drivers/ide/ide-core.o.after

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:56 +01:00
Bartlomiej Zolnierkiewicz 627e05daa1 ide: remove ->error method from struct ide_driver
* Remove (now superfluous) ->error method from struct ide_driver.

* Unexport __ide_error() and make it static.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:54 +01:00
Bartlomiej Zolnierkiewicz 7f3c868ba7 ide: remove ide_driver_t typedef
While at it:
- s/struct ide_driver_s/struct ide_driver/
- use to_ide_driver() macro in ide-proc.c

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:53 +01:00
Bartlomiej Zolnierkiewicz 9892ec5497 ide: remove 'byte' typedef
Just use u8 instead, also s/__u8/u8/ in ide-cd.h while at it.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:53 +01:00
Bartlomiej Zolnierkiewicz c0ae502347 ide: remove ide_pci_enablebit_t typedef
Remove needless parens while at it.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:52 +01:00
Bartlomiej Zolnierkiewicz 54cc1428cf ide: remove local_irq_set() macro
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:52 +01:00
Bartlomiej Zolnierkiewicz 898ec223fe ide: remove HWIF() macro
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:52 +01:00
Bartlomiej Zolnierkiewicz b40d1b88f1 ide: move ide_init_port_data() and friends to ide-probe.c
* Move IDE_DEFAULT_MAX_FAILURES to <linux/ide.h>.

* Move ide_cfg_mtx, ide_hwif_to_major[], ide_port_init_devices_data(),
  ide_init_port_data(), ide_init_port_hw() and ide_unregister() to
  ide-probe.c from ide.c.

* Make ide_unregister(), ide_init_port_data(), ide_init_port_hw()
  and ide_cfg_mtx static.

While at it:

* Remove stale ide_init_port_data() documentation and ide_lock extern.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:51 +01:00
Bartlomiej Zolnierkiewicz b65fac32cf ide: merge ide_hwgroup_t with ide_hwif_t (v2)
* Merge ide_hwgroup_t with ide_hwif_t.

* Cleanup init_irq() accordingly, then remove no longer needed
  ide_remove_port_from_hwgroup() and ide_ports[].

* Remove now unused HWGROUP() macro.

While at it:

* ide_dump_ata_error() fixups

v2:
* Fix ->quirk_list check in do_ide_request()
  (s/hwif->cur_dev/prev_port->cur_dev).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:50 +01:00
Bartlomiej Zolnierkiewicz 5b31f855f1 ide: use lock bitops for ports serialization (v2)
* Add ->host_busy field to struct ide_host and use it's first bit
  together with lock bitops to provide new ports serialization method.

* Convert core IDE code to use new ide_[un]lock_host() helpers.

  This removes the need for taking hwgroup->lock if host is already
  busy on serialized hosts and makes it possible to merge ide_hwgroup_t
  into ide_hwif_t (done in the later patch).

* Remove no longer needed ide_hwgroup_t.busy and ide_[un]lock_hwgroup().

* Update do_ide_request() documentation.

v2:
* ide_release_lock() should be called inside IDE_HFLAG_SERIALIZE check.

* Add ide_hwif_t.busy flag and ide_[un]lock_port() for serializing
  devices on a port.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:49 +01:00
Bartlomiej Zolnierkiewicz efe0397eef ide: remove hwgroup->hwif and {drive,hwif}->next
* Add 'int port_count' field to ide_hwgroup_t to keep the track
  of the number of ports in the hwgroup.  Then update init_irq()
  and ide_remove_port_from_hwgroup() to use it.

* Remove no longer needed hwgroup->hwif, {drive,hwif}->next,
  ide_add_drive_to_hwgroup() and ide_remove_drive_from_hwgroup()
  (hwgroup->drive now only denotes the currently active device
   in the hwgroup).

* Update locking documentation in <linux/ide.h>.

While at it:

* Rename ->drive field in ide_hwgroup_t to ->cur_dev.

* Use __func__ in ide_timer_expiry().

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:49 +01:00
Bartlomiej Zolnierkiewicz ae86afaee6 ide: use per-port IRQ handlers
Use hwif instead of hwgroup as {request,free}_irq()'s cookie,
teach ide_intr() to return early for non-active serialized ports,
modify unexpected_intr() accordingly and then use per-port IRQ
handlers instead of per-hwgroup ones.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:48 +01:00
Bartlomiej Zolnierkiewicz bd53cbcce5 ide: add ->cur_port to struct ide_host and use it for serialized hosts
* Pass 'ide_hwif_t *' instead of 'ide_hwgroup_t *' to unexpected_intr().

* Cache pointer to the port currently being serviced in ->cur_port
  and use it instead of hwif->hwgroup on serialized hosts.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-06 17:20:48 +01:00
James Smart 4be98c0ca3 [SCSI] fc transport: restore missing dev_loss_tmo callback to LLDD
When we reworked the transport for the rport lifetimes, in cases where the
rport was reused as a container for tgt id bindings, we inadvertantly
removed the callback to the driver indicating that dev_loss_tmo had fired.

This patch restores that functionality.

Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-01-06 09:43:33 -06:00
Frederik Schwarzer 0211a9c850 trivial: fix an -> a typos in documentation and comments
It is always "an" if there is a vowel _spoken_ (not written).
So it is:
"an hour" (spoken vowel)
but
"a uniform" (spoken 'j')

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:07 +01:00
Frederik Schwarzer 025dfdafe7 trivial: fix then -> than typos in comments and documentation
- (better, more, bigger ...) then -> (...) than

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:06 +01:00
Takashi Iwai 5439e726b5 Merge branch 'topic/asoc' into for-linus 2009-01-06 09:48:51 +01:00
Ingo Molnar d9be28ea91 Merge branches 'sched/clock', 'sched/cleanups' and 'linus' into sched/urgent 2009-01-06 09:33:57 +01:00
Ingo Molnar fdbc0450df Merge branches 'core/futexes', 'core/locking', 'core/rcu' and 'linus' into core/urgent 2009-01-06 09:32:11 +01:00
Rusty Russell 835481d9bc cpumask: convert struct cpufreq_policy to cpumask_var_t
Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS.  cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06 09:05:31 +01:00
Theodore Ts'o b3881f74b3 ext4: Add mount option to set kjournald's I/O priority
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>
2009-01-05 22:46:26 -05:00
Linus Torvalds 238c6d5483 Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm snapshot: extend exception store functions
  dm snapshot: split out exception store implementations
  dm snapshot: rename struct exception_store
  dm snapshot: separate out exception store interface
  dm mpath: move trigger_event to system workqueue
  dm: add name and uuid to sysfs
  dm table: rework reference counting
  dm: support barriers on simple devices
  dm request: extend target interface
  dm request: add caches
  dm ioctl: allow dm_copy_name_and_uuid to return only one field
  dm log: ensure log bitmap fits on log device
  dm log: move region_size validation
  dm log: avoid reinitialising io_req on every operation
  dm: consolidate target deregistration error handling
  dm raid1: fix error count
  dm log: fix dm_io_client leak on error paths
  dm snapshot: change yield to msleep
  dm table: drop reference at unbind
2009-01-05 19:20:59 -08:00
Andi Kleen ab4c142488 dm: support barriers on simple devices
Implement barrier support for single device DM devices

This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.

 NB. Any DM device might cease to support barriers if it gets
     reconfigured so code must continue to allow for a possible
     -EOPNOTSUPP on every barrier bio submitted.  - agk

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:09 +00:00
Kiyoshi Ueda 7d76345da6 dm request: extend target interface
This patch adds the following target interfaces for request-based dm.

  map_rq    : for mapping a request

  rq_end_io : for finishing a request

  busy      : for avoiding performance regression from bio-based dm.
              Target can tell dm core not to map requests now, and
              that may help requests in the block layer queue to be
              bigger by I/O merging.
              In bio-based dm, this behavior is done by device
              drivers managing the block layer queue.
              But in request-based dm, dm core has to do that
              since dm core manages the block layer queue.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:07 +00:00
Mikulas Patocka 10d3bd09a3 dm: consolidate target deregistration error handling
Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:58 +00:00
Linus Torvalds 8e128ce331 Merge branch 'for-next' of git://git.o-hand.com/linux-mfd
* 'for-next' of git://git.o-hand.com/linux-mfd: (30 commits)
  mfd: Fix section mismatch in da903x
  mfd: move drivers/i2c/chips/menelaus.c to drivers/mfd
  mfd: move drivers/i2c/chips/tps65010.c to drivers/mfd
  mfd: dm355evm msp430 driver
  mfd: Add missing break from wm3850-core
  mfd: Add WM8351 support
  mfd: Support configurable numbers of DCDCs and ISINKs on WM8350
  mfd: Handle missing WM8350 platform data
  mfd: Add WM8352 support
  mfd: Use irq_to_desc in twl4030 code
  power_supply: Add Dialog DA9030 battery charger driver
  mfd: Dialog DA9030 battery charger MFD driver
  mfd: Register WM8400 codec device
  mfd: Pass driver_data onto child devices
  mfd: Fix twl4030-core.c build error
  mfd: twl4030 regulator bug fixes
  mfd: twl4030: create some regulator devices
  mfd: twl4030: cleanup symbols and OMAP dependency
  mfd: twl4030: simplified child creation code
  power_supply: Add battery health reporting for WM8350
  ...
2009-01-05 19:04:09 -08:00
Linus Torvalds 0bbb275358 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: convert to stop_machine_create/destroy.
  stop_machine: introduce stop_machine_create/destroy.
  parisc: fix module loading failure of large kernel modules
  module: fix module loading failure of large kernel modules for parisc
  module: fix warning of unused function when !CONFIG_PROC_FS
  kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms.
  remove CONFIG_KMOD
2009-01-05 19:03:39 -08:00
Linus Torvalds 0578c3b4d4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c
  intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
  swiotlb: add missing __init annotations
2009-01-05 19:03:11 -08:00
Linus Torvalds 8606ab6d30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (22 commits)
  HID: fix error condition propagation in hid-sony driver
  HID: fix reference count leak hidraw
  HID: add proper support for pensketch 12x9 tablet
  HID: don't allow DealExtreme usb-radio be handled by usb hid driver
  HID: fix default Kconfig setting for TopSpeed driver
  HID: driver for TopSeed Cyberlink quirky remote
  HID: make boot protocol drivers depend on EMBEDDED
  HID: avoid sparse warning in HID_COMPAT_LOAD_DRIVER
  HID: hiddev cleanup -- handle all error conditions properly
  HID: force feedback driver for GreenAsia 0x12 PID
  HID: switch specialized drivers from "default y" to !EMBEDDED
  HID: set proper dev.parent in hidraw
  HID: add dynids facility
  HID: use GFP_KERNEL in hid_alloc_buffers
  HID: usbhid, use usb_endpoint_xfer_int
  HID: move usbhid flags to usbhid.h
  HID: add n-trig digitizer support
  HID: add phys and name ioctls to hidraw
  HID: struct device - replace bus_id with dev_name(), dev_set_name()
  HID: automatically call usbhid_set_leds in usbhid driver
  ...
2009-01-05 18:53:34 -08:00
Linus Torvalds c54febae99 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits)
  GFS2: Use DEFINE_SPINLOCK
  GFS2: Fix use-after-free bug on umount (try #2)
  Revert "GFS2: Fix use-after-free bug on umount"
  GFS2: Streamline alloc calculations for writes
  GFS2: Send useful information with uevent messages
  GFS2: Fix use-after-free bug on umount
  GFS2: Remove ancient, unused code
  GFS2: Move four functions from super.c
  GFS2: Fix bug in gfs2_lock_fs_check_clean()
  GFS2: Send some sensible sysfs stuff
  GFS2: Kill two daemons with one patch
  GFS2: Move gfs2_recoverd into recovery.c
  GFS2: Fix "truncate in progress" hang
  GFS2: Clean up & move gfs2_quotad
  GFS2: Add more detail to debugfs glock dumps
  GFS2: Banish struct gfs2_rgrpd_host
  GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd
  GFS2: Move rg_igeneration into struct gfs2_rgrpd
  GFS2: Banish struct gfs2_dinode_host
  GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize
  ...
2009-01-05 18:52:54 -08:00
Linus Torvalds 15b0669072 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
  qlge: Fix sparse warnings for tx ring indexes.
  qlge: Fix sparse warning regarding rx buffer queues.
  qlge: Fix sparse endian warning in ql_hw_csum_setup().
  qlge: Fix sparse endian warning for inbound packet control block flags.
  qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
  myri10ge: print MAC and serial number on probe failure
  pkt_sched: cls_u32: Fix locking in u32_change()
  iucv: fix cpu hotplug
  af_iucv: Free iucv path/socket in path_pending callback
  af_iucv: avoid left over IUCV connections from failing connects
  af_iucv: New error return codes for connect()
  net/ehea: bitops work on unsigned longs
  Revert "net: Fix for initial link state in 2.6.28"
  tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
  tcp: don't mask EOF and socket errors on nonblocking splice receive
  dccp: Integrate the TFRC library with DCCP
  dccp: Clean up ccid.c after integration of CCID plugins
  dccp: Lockless integration of CCID congestion-control plugins
  qeth: get rid of extra argument after printk to dev_* conversion
  qeth: No large send using EDDP for HiperSockets.
  ...
2009-01-05 18:44:59 -08:00
Linus Torvalds e9af797d75 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix on resume, now preserves user policy min/max.
  [CPUFREQ] Add Celeron Core support to p4-clockmod.
  [CPUFREQ] add to speedstep-lib additional fsb values for core processors
  [CPUFREQ] Disable sysfs ui for p4-clockmod.
  [CPUFREQ] p4-clockmod: reduce noise
  [CPUFREQ] clean up speedstep-centrino and reduce cpumask_t usage
2009-01-05 18:33:38 -08:00
Linus Torvalds 10cc04f5a0 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits)
  ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.
  ocfs2: use min_t in ocfs2_quota_read()
  ocfs2: remove unneeded lvb casts
  ocfs2: Add xattr support checking in init_security
  ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle
  ocfs2: calculate and reserve credits for xattr value in mknod
  ocfs2/xattr: fix credits calculation during index create
  ocfs2/xattr: Always updating ctime during xattr set.
  ocfs2/xattr: Remove extend_trans call and add its credits from the beginning
  ocfs2/dlm: Fix race during lockres mastery
  ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list
  ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating
  ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()
  ocfs2/dlm: Fix a race between migrate request and exit domain
  ocfs2: One more hamming code optimization.
  ocfs2: Another hamming code optimization.
  ocfs2: Don't hand-code xor in ocfs2_hamming_encode().
  ocfs2: Enable metadata checksums.
  ocfs2: Validate superblock with checksum and ecc.
  ocfs2: Checksum and ECC for directory blocks.
  ...
2009-01-05 18:32:43 -08:00
Linus Torvalds 520c853466 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  inotify: fix type errors in interfaces
  fix breakage in reiserfs_new_inode()
  fix the treatment of jfs special inodes
  vfs: remove duplicate code in get_fs_type()
  add a vfs_fsync helper
  sys_execve and sys_uselib do not call into fsnotify
  zero i_uid/i_gid on inode allocation
  inode->i_op is never NULL
  ntfs: don't NULL i_op
  isofs check for NULL ->i_op in root directory is dead code
  affs: do not zero ->i_op
  kill suid bit only for regular files
  vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05 18:32:06 -08:00
Nick Piggin e8c82c2e23 mm lockless pagecache barrier fix
An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.

This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.

Linus points out that ACCESS_ONCE is also required in that pointer load, even
if it's absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.

Assembly of find_get_pages,
before:
.L220:
        movq    (%rbx), %rax    #* ivtmp.1162, tmp82
        movq    (%rax), %rdi    #, prephitmp.1149
.L218:
        testb   $1, %dil        #, prephitmp.1149
        jne     .L217   #,
        testq   %rdi, %rdi      # prephitmp.1149
        je      .L203   #,
        cmpq    $-1, %rdi       #, prephitmp.1149
        je      .L217   #,
        movl    8(%rdi), %esi   # <variable>._count.counter, c
        testl   %esi, %esi      # c
        je      .L218   #,

after:
.L212:
        movq    (%rbx), %rax    #* ivtmp.1109, tmp81
        movq    (%rax), %rdi    #, ret
        testb   $1, %dil        #, ret
        jne     .L211   #,
        testq   %rdi, %rdi      # ret
        je      .L197   #,
        cmpq    $-1, %rdi       #, ret
        je      .L211   #,
        movl    8(%rdi), %esi   # <variable>._count.counter, c
        testl   %esi, %esi      # c
        je      .L212   #,

(notice the obvious infinite loop in the first example, if page->count remains 0)

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05 18:31:12 -08:00
Dan Williams 07f2211e4f dmaengine: remove dependency on async_tx
async_tx.ko is a consumer of dma channels.  A circular dependency arises
if modules in drivers/dma rely on common code in async_tx.ko.  It
prevents either module from being unloaded.

Move dma_wait_for_async_tx and async_tx_run_dependencies to dmaeninge.o
where they should have been from the beginning.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-01-05 18:10:19 -07:00
Peter Ujfalusi 2e72f8e371 ASoC: New enum type: value_enum
This patch introduces a new enum type.
In this enum type each enumerated items referred with a value.

This new enum type can handle enums encoded in bitfield, or any other
weird ways. twl4030 codec has several mux selection register, where the
input/output mux is coded in a bitfield. With the normal enum type this type
of mux can not be handled in a clean way.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-01-05 17:47:17 +00:00
Michael Kerrisk 4ae8978cf9 inotify: fix type errors in interfaces
The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.

For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
currently 'u32', it should be '__s32' .  That is Robert's suggestion, and
is consistent with the other declarations of watch descriptors in the
kernel source, in particular, the inotify_event structure in
include/linux/inotify.h:

struct inotify_event {
        __s32           wd;             /* watch descriptor */
        __u32           mask;           /* watch mask */
        __u32           cookie;         /* cookie to synchronize two events */
        __u32           len;            /* length (including nulls) of name */
        char            name[0];        /* stub for possible name */
};

The patch makes the changes needed for inotify_rm_watch().

Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Robert Love <rlove@google.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:29 -05:00
Christoph Hellwig 4c728ef583 add a vfs_fsync helper
Fsync currently has a fdatawrite/fdatawait pair around the method call,
and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
to duplicate this, but we have a few and most of them don't quite get
it right.  This patch adds a new vfs_fsync that takes care of this.
It's a little more complicated as usual as ->fsync might get a NULL file
pointer and just a dentry from nfsd, but otherwise gets afile and we
want to take the mapping and file operations from it when it is there.

Notes on the fsync callers:

 - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
   	lower file
 - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
	file, and returning 0 when ->fsync was missing
 - shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor
   taking i_mutex.  Now given that shared memory doesn't have disk
   backing not doing anything in fsync seems fine and I left it out of
   the vfs_fsync conversion for now, but in that case we might just
   not pass it through to the lower file at all but just call the no-op
   simple_sync_file directly.

[and now actually export vfs_fsync]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:28 -05:00
Joel Becker e06c8227fd jbd2: Add buffer triggers
Filesystems often to do compute intensive operation on some
metadata.  If this operation is repeated many times, it can be very
expensive.  It would be much nicer if the operation could be performed
once before a buffer goes to disk.

This adds triggers to jbd2 buffer heads.  Just before writing a metadata
buffer to the journal, jbd2 will optionally call a commit trigger associated
with the buffer.  If the journal is aborted, an abort trigger will be
called on any dirty buffers as they are dropped from pending
transactions.

ocfs2 will use this feature.

Initially I tried to come up with a more generic trigger that could be
used for non-buffer-related events like transaction completion.  It
doesn't tie nicely, because the information a buffer trigger needs
(specific to a journal_head) isn't the same as what a transaction
trigger needs (specific to a tranaction_t or perhaps journal_t).  So I
implemented a buffer set, with the understanding that
journal/transaction wide triggers should be implemented separately.

There is only one trigger set allowed per buffer.  I can't think of any
reason to attach more than one set.  Contrast this with a journal or
transaction in which multiple places may want to watch the entire
transaction separately.

The trigger sets are considered static allocation from the jbd2
perspective.  ocfs2 will just have one trigger set per block type,
setting the same set on every bh of the same type.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:30 -08:00
Jan Kara 7d9056ba20 quota: Export dquot_alloc() and dquot_destroy() functions
These are default functions for creating and destroying quota structures
and they should be used from filesystems.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:25 -08:00
Jan Kara 5cd9d5bb86 quota: Unexport dqblk_v1.h and dqblk_v2.h
Unexport header files dqblk_v[12].h since except for quota format ID they
don't contain information userspace should be interested in. Move ID
definitions to quota.h.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:25 -08:00
Mark Fasheh e97fcd95a4 jbd2: Add BH_JBDPrivateStart
Add this so that file systems using JBD2 can safely allocate unused b_state
bits.

In this case, we add it so that Ocfs2 can define a single bit for tracking
the validation state of a buffer.

Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:24 -08:00
Jan Kara 12c77527e4 quota: Implement function for scanning active dquots
OCFS2 needs to scan all active dquots once in a while and sync quota
information among cluster nodes. Provide a helper function for it so
that it does not have to reimplement internally a list which VFS
already has. Moreover this function is probably going to be useful
for other clustered filesystems if they decide to use VFS quotas.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:23 -08:00
Jan Kara 3d9ea253a0 quota: Add helpers to allow ocfs2 specific quota initialization, freeing and recovery
OCFS2 needs to peek whether quota structure is already in memory so
that it can avoid expensive cluster locking in that case. Similarly
when freeing dquots, it checks whether it is the last quota structure
user or not. Finally, it needs to get reference to dquot structure for
specified id and quota type when recovering quota file after crash.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:22 -08:00
Jan Kara 571b46e40b quota: Update version number
Increase reported version number of quota support since quota core has changed
significantly. Also remove __DQUOT_NUM_VERSION__ since nobody uses it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:22 -08:00
Jan Kara 4d59bce4f9 quota: Keep which entries were set by SETQUOTA quotactl
Quota in a clustered environment needs to synchronize quota information
among cluster nodes. This means we have to occasionally update some
information in dquot from disk / network. On the other hand we have to
be careful not to overwrite changes administrator did via SETQUOTA.
So indicate in dquot->dq_flags which entries have been set by SETQUOTA
and quota format can clear these flags when it properly propagated
the changes.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:22 -08:00
Jan Kara db49d2df48 quota: Allow negative usage of space and inodes
For clustered filesystems, it can happen that space / inode usage goes
negative temporarily (because some node is allocating another node
is freeing and they are not completely in sync). So let quota code
allow this and change qsize_t so a signed type so that we don't
underflow the variables.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:21 -08:00
Jan Kara e3d4d56b97 quota: Convert union in mem_dqinfo to a pointer
Coming quota support for OCFS2 is going to need quite a bit
of additional per-sb quota information. Moreover having fs.h
include all the types needed for this structure would be a
pain in the a**. So remove the union from mem_dqinfo and add
a private pointer for filesystem's use.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:21 -08:00
Jan Kara 1ccd14b9c2 quota: Split off quota tree handling into a separate file
There is going to be a new version of quota format having 64-bit
quota limits and a new quota format for OCFS2. They are both
going to use the same tree structure as VFSv0 quota format. So
split out tree handling into a separate file and make size of
leaf blocks, amount of space usable in each block (needed for
checksumming) and structures contained in them configurable
so that the code can be shared.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:21 -08:00
Jan Kara cf770c1371 quota: Move quotaio_v[12].h from include/linux/ to fs/
Since these include files are used only by implementation of quota formats,
there's no need to have them in include/linux/.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:58 -08:00
Jan Kara ca785ec66b quota: Introduce DQUOT_QUOTA_SYS_FILE flag
If filesystem can handle quota files as system files hidden from users, we can
skip a lot of cache invalidation, syncing, inode flags setting etc. when
turning quotas on, off and quota_sync. Allow filesystem to indicate that it is
hiding quota files from users by DQUOT_QUOTA_SYS_FILE flag.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:57 -08:00
Jan Kara dcb30695f2 quota: Remove compatibility function sb_any_quota_enabled()
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:56 -08:00
Jan Kara f55abc0fb9 quota: Allow to separately enable quota accounting and enforcing limits
Split DQUOT_USR_ENABLED (and DQUOT_GRP_ENABLED) into DQUOT_USR_USAGE_ENABLED
and DQUOT_USR_LIMITS_ENABLED. This way we are able to separately enable /
disable whether we should:
1) ignore quotas completely
2) just keep uptodate information about usage
3) actually enforce quota limits

This is going to be useful when quota is treated as filesystem metadata - we
then want to keep quota information uptodate all the time and just enable /
disable limits enforcement.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:56 -08:00
Jan Kara e4bc7b4b7f quota: Make _SUSPENDED just a flag
Upto now, DQUOT_USR_SUSPENDED behaved like a state - i.e., either quota
was enabled or suspended or none. Now allowed states are 0, ENABLED,
ENABLED | SUSPENDED. This will be useful later when we implement separate
enabling of quota usage tracking and limits enforcement because we need to
keep track of a state which has been suspended.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:56 -08:00
Jan Kara 12095460f7 quota: Increase size of variables for limits and inode usage
So far quota was fine with quota block limits and inode limits/numbers in
a 32-bit type. Now with rapid increase in storage sizes there are coming
requests to be able to handle quota limits above 4TB / more that 2^32 inodes.
So bump up sizes of types in mem_dqblk structure to 64-bits to be able to
handle this. Also update inode allocation / checking functions to use qsize_t
and make global structure keep quota limits in bytes so that things are
consistent.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Jan Kara 74f783af95 quota: Add callbacks for allocating and destroying dquot structures
Some filesystems would like to keep private information together with each
dquot. Add callbacks alloc_dquot and destroy_dquot allowing filesystem to
allocate larger dquots from their private slab in a similar fashion we
currently allocate inodes.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Haavard Skinnemoen 0e49005090 Merge branch 'move-atmel-mci-h' into boards 2009-01-05 16:36:07 +01:00
Nicolas Ferre c42aa775cc atmel-mci: move atmel-mci.h file to include/linux
Needed to use the atmel-mci driver in an architecture
independant maner.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
2009-01-05 16:35:31 +01:00
Haavard Skinnemoen bc08969fe6 Merge branch 'cleanups' into boards 2009-01-05 15:51:52 +01:00
Ingo Molnar be92d7af38 genirq: provide irq_to_desc() to non-genirq architectures too
Impact: build fix on non-genirq architectures

Sam Ravnborg reported this build failure on sparc32 allmodconfig,
the GPIO drivers assume the presence of irq_to_desc():

 drivers/gpio/gpiolib.c: In function `gpiolib_dbg_show':
 drivers/gpio/gpiolib.c:1146: error: implicit declaration of function 'irq_to_desc'

Add it in the !genirq case too.

Reported-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Sam Ravnborg <sam@ravnborg.org>
2009-01-05 14:53:30 +01:00
Ingo Molnar 46483d10e5 Merge branch 'core/iommu' into core/urgent
Conflicts:
	lib/swiotlb.c
2009-01-05 14:17:24 +01:00
Alexey Korolev d81408304b [MTD] LPDDR extended physmap driver to support LPDDR flash
Physmap is a generic map driver for different platforms and flash types.
We added support of LPDDR to physmap.
All changes here are related to introduction of new pfow_base parameter.
This parameter is valid in case of LPDDR chips only.

Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Acked-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-05 13:57:28 +01:00
Alexey Korolev d13e51e747 [MTD] LPDDR added new pfow_base parameter
We need to supply additional parameter to mapping driver and tell
LPDDR drivers where PFOW window is in chip mapping.
It leads to necessity of map_info structure extendoing.

Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Acked-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-05 13:56:08 +01:00
Alexey Korolev eb3db27507 [MTD] LPDDR PFOW definition
LPDDR chips use PFOW window for sending commands, reading status and
capabilites requesting.
This pfow.h - contains definitions for PFOW window fileds, possible commands,
error flags and some common macro function to avoid code duplications.

Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Acked-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-05 13:55:58 +01:00
Alexey Korolev 922ab535bb [MTD] LPDDR QINFO records definitions
There are declaraton of structures and macros definitions
necessary for operations with QINFO in this patch.

Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Acked-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-05 13:55:54 +01:00
Li Zefan c70f22d203 sched: clean up arch_reinit_sched_domains()
- Make arch_reinit_sched_domains() static. It was exported to be used in
  s390, but now rebuild_sched_domains() is used instead.

- Make it return void.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-05 13:54:16 +01:00
Peter Zijlstra a6037b61c2 hrtimer: fix recursion deadlock by re-introducing the softirq
Impact: fix rare runtime deadlock

There are a few sites that do:

  spin_lock_irq(&foo)
  hrtimer_start(&bar)
    __run_hrtimer(&bar)
      func()
        spin_lock(&foo)

which obviously deadlocks. In order to avoid this, never call __run_hrtimer()
from hrtimer_start*() context, but instead defer this to softirq context.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-05 13:14:33 +01:00
David Woodhouse 353816f43d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	arch/arm/mach-pxa/corgi.c
	arch/arm/mach-pxa/poodle.c
	arch/arm/mach-pxa/spitz.c
2009-01-05 10:50:33 +01:00
Paul E. McKenney ea7d3fef42 rcu: eliminate synchronize_rcu_xxx macro
Impact: cleanup

Expand macro into two files.

The synchronize_rcu_xxx macro is quite ugly and it's only used by two
callers, so expand it instead.  This makes this code easier to change.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-05 10:18:08 +01:00
Steven Whitehouse e9079cce20 GFS2: Support for FIEMAP ioctl
This patch implements the FIEMAP ioctl for GFS2. We can use the generic
code (aside from a lock order issue, solved as per Ted Tso's suggestion)
for which I've introduced a new variant of the generic function. We also
have one exception to deal with, namely stuffed files, so we do that
"by hand", setting all the required flags.

This has been tested with a modified (I could only find an old version) of
Eric's test program, and appears to work correctly.

This patch does not currently support FIEMAP of xattrs, but the plan is to add
that feature at some future point.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-01-05 07:38:46 +00:00
Linus Torvalds fe0bdec68b Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  audit: validate comparison operations, store them in sane form
  clean up audit_rule_{add,del} a bit
  make sure that filterkey of task,always rules is reported
  audit rules ordering, part 2
  fixing audit rule ordering mess, part 1
  audit_update_lsm_rules() misses the audit_inode_hash[] ones
  sanitize audit_log_capset()
  sanitize audit_fd_pair()
  sanitize audit_mq_open()
  sanitize AUDIT_MQ_SENDRECV
  sanitize audit_mq_notify()
  sanitize audit_mq_getsetattr()
  sanitize audit_ipc_set_perm()
  sanitize audit_ipc_obj()
  sanitize audit_socketcall
  don't reallocate buffer in every audit_sockaddr()
2009-01-04 16:32:11 -08:00
David Howells 14eaddc967 CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]
Fix a regression in cap_capable() due to:

	commit 5ff7711e635b32f0a1e558227d030c7e45b4a465
	Author: David Howells <dhowells@redhat.com>
	Date:   Wed Dec 31 02:52:28 2008 +0000

	    CRED: Differentiate objective and effective subjective credentials on a task

The problem is that the above patch allows a process to have two sets of
credentials, and for the most part uses the subjective credentials when
accessing current's creds.

There is, however, one exception: cap_capable(), and thus capable(), uses the
real/objective credentials of the target task, whether or not it is the current
task.

Ordinarily this doesn't matter, since usually the two cred pointers in current
point to the same set of creds.  However, sys_faccessat() makes use of this
facility to override the credentials of the calling process to make its test,
without affecting the creds as seen from other processes.

One of the things sys_faccessat() does is to make an adjustment to the
effective capabilities mask, which cap_capable(), as it stands, then ignores.

The affected capability check is in generic_permission():

	if (!(mask & MAY_EXEC) || execute_ok(inode))
		if (capable(CAP_DAC_OVERRIDE))
			return 0;

This change splits capable() from has_capability() down into the commoncap and
SELinux code.  The capable() security op now only deals with the current
process, and uses the current process's subjective creds.  A new security op -
task_capable() - is introduced that can check any task's objective creds.

strictly the capable() security op is superfluous with the presence of the
task_capable() op, however it should be faster to call the capable() op since
two fewer arguments need be passed down through the various layers.

This can be tested by compiling the following program from the XFS testsuite:

/*
 *  t_access_root.c - trivial test program to show permission bug.
 *
 *  Written by Michael Kerrisk - copyright ownership not pursued.
 *  Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html
 */
#include <limits.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>

#define UID 500
#define GID 100
#define PERM 0
#define TESTPATH "/tmp/t_access"

static void
errExit(char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
} /* errExit */

static void
accessTest(char *file, int mask, char *mstr)
{
    printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask));
} /* accessTest */

int
main(int argc, char *argv[])
{
    int fd, perm, uid, gid;
    char *testpath;
    char cmd[PATH_MAX + 20];

    testpath = (argc > 1) ? argv[1] : TESTPATH;
    perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM;
    uid = (argc > 3) ? atoi(argv[3]) : UID;
    gid = (argc > 4) ? atoi(argv[4]) : GID;

    unlink(testpath);

    fd = open(testpath, O_RDWR | O_CREAT, 0);
    if (fd == -1) errExit("open");

    if (fchown(fd, uid, gid) == -1) errExit("fchown");
    if (fchmod(fd, perm) == -1) errExit("fchmod");
    close(fd);

    snprintf(cmd, sizeof(cmd), "ls -l %s", testpath);
    system(cmd);

    if (seteuid(uid) == -1) errExit("seteuid");

    accessTest(testpath, 0, "0");
    accessTest(testpath, R_OK, "R_OK");
    accessTest(testpath, W_OK, "W_OK");
    accessTest(testpath, X_OK, "X_OK");
    accessTest(testpath, R_OK | W_OK, "R_OK | W_OK");
    accessTest(testpath, R_OK | X_OK, "R_OK | X_OK");
    accessTest(testpath, W_OK | X_OK, "W_OK | X_OK");
    accessTest(testpath, R_OK | W_OK | X_OK, "R_OK | W_OK | X_OK");

    exit(EXIT_SUCCESS);
} /* main */

This can be run against an Ext3 filesystem as well as against an XFS
filesystem.  If successful, it will show:

	[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
	---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx
	access(/tmp/xxx, 0) returns 0
	access(/tmp/xxx, R_OK) returns 0
	access(/tmp/xxx, W_OK) returns 0
	access(/tmp/xxx, X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK) returns 0
	access(/tmp/xxx, R_OK | X_OK) returns -1
	access(/tmp/xxx, W_OK | X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1

If unsuccessful, it will show:

	[root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043
	---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx
	access(/tmp/xxx, 0) returns 0
	access(/tmp/xxx, R_OK) returns -1
	access(/tmp/xxx, W_OK) returns -1
	access(/tmp/xxx, X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK) returns -1
	access(/tmp/xxx, R_OK | X_OK) returns -1
	access(/tmp/xxx, W_OK | X_OK) returns -1
	access(/tmp/xxx, R_OK | W_OK | X_OK) returns -1

I've also tested the fix with the SELinux and syscalls LTP testsuites.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-01-05 11:17:04 +11:00
Herbert Xu 5d38a079ce gro: Add page frag support
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.

It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb.  This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.

The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.

The merging itself is rather simple.  We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry.  Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.

If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:13:40 -08:00
David S. Miller 14deae4156 ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups.
Thanks to excellent diagnosis by Eduard Guzovsky.

The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up.  If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.

IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).

So simply mimick this on the ipv6 side.

Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04 16:04:39 -08:00
Heiko Carstens 9ea09af3bd stop_machine: introduce stop_machine_create/destroy.
Introduce stop_machine_create/destroy. With this interface subsystems
that need a non-failing stop_machine environment can create the
stop_machine machine threads before actually calling stop_machine.
When the threads aren't needed anymore they can be killed with
stop_machine_destroy again.

When stop_machine gets called and the threads aren't present they
will be created and destroyed automatically. This restores the old
behaviour of stop_machine.

This patch also converts cpu hotplug to the new interface since it
is special: cpu_down calls __stop_machine instead of stop_machine.
However the kstop threads will only be created when stop_machine
gets called.

Changing the code so that the threads would be created automatically
on __stop_machine is currently not possible: when __stop_machine gets
called we hold cpu_add_remove_lock, which is the same lock that
create_rt_workqueue would take. So the workqueue needs to be created
before the cpu hotplug code locks cpu_add_remove_lock.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05 08:40:14 +10:30
Helge Deller 088af9a6e0 module: fix module loading failure of large kernel modules for parisc
When creating the final layout of a kernel module in memory, allow the
module loader to reserve some additional memory in front of a given section.
This is currently only needed for the parisc port which needs to put the
stub entries there to fulfill the 17/22bit PCREL relocations with large
kernel modules like xfs.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (renamed fn)
2009-01-05 08:40:13 +10:30
Alessandro Zummo 099e657625 rtc: add alarm/update irq interfaces
Add standard interfaces for alarm/update irqs enabling.  Drivers are no
more required to implement equivalent ioctl code as rtc-dev will provide
it.

UIE emulation should now be handled correctly and will work even for those
RTC drivers who cannot be configured to do both UIE and AIE.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Nick Piggin 54566b2c15 fs: symlink write_begin allocation context fix
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Pekka Enberg c644f0e4b5 fs: introduce bgl_lock_ptr()
As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
<linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to
filesystem specific header files to break the hidden dependency to
struct ext[234]_sb_info.

Also, while at it, convert the macros to static inlines to try make up
for all the times I broke Andrew Morton's tree.

Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Randy Dunlap 0a30c5cefa spi.h uses/needs device.h
Include header files as used/needed:

  In file included from drivers/leds/leds-dac124s085.c:16:
  include/linux/spi/spi.h:66: error: field 'dev' has incomplete type
  include/linux/spi/spi.h: In function 'to_spi_device':
  include/linux/spi/spi.h💯 warning: type defaults to 'int' in declaration of '__mptr'
  ...

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Al Viro 5af75d8d58 audit: validate comparison operations, store them in sane form
Don't store the field->op in the messy (and very inconvenient for e.g.
audit_comparator()) form; translate to dense set of values and do full
validation of userland-submitted value while we are at it.

->audit_init_rule() and ->audit_match_rule() get new values now; in-tree
instances updated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:42 -05:00
Al Viro e45aa212ea audit rules ordering, part 2
Fix the actual rule listing; add per-type lists _not_ used for matching,
with all exit,... sitting on one such list.  Simplifies "do something
for all rules" logics, while we are at it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:42 -05:00
Al Viro 0590b9335a fixing audit rule ordering mess, part 1
Problem: ordering between the rules on exit chain is currently lost;
all watch and inode rules are listed after everything else _and_
exit,never on one kind doesn't stop exit,always on another from
being matched.

Solution: assign priorities to rules, keep track of the current
highest-priority matching rule and its result (always/never).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:41 -05:00
Al Viro 57f71a0af4 sanitize audit_log_capset()
* no allocations
* return void
* don't duplicate checked for dummy context

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:41 -05:00
Al Viro 157cf649a7 sanitize audit_fd_pair()
* no allocations
* return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:41 -05:00
Al Viro 564f6993ff sanitize audit_mq_open()
* don't bother with allocations
* don't do double copy_from_user()
* don't duplicate parts of check for audit_dummy_context()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:41 -05:00
Al Viro c32c8af43b sanitize AUDIT_MQ_SENDRECV
* logging the original value of *msg_prio in mq_timedreceive(2)
  is insane - the argument is write-only (i.e. syscall always
  ignores the original value and only overwrites it).
* merge __audit_mq_timed{send,receive}
* don't do copy_from_user() twice
* don't mess with allocations in auditsc part
* ... and don't bother checking !audit_enabled and !context in there -
  we'd already checked for audit_dummy_context().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:40 -05:00
Al Viro 20114f71b2 sanitize audit_mq_notify()
* don't copy_from_user() twice
* don't bother with allocations
* don't duplicate parts of audit_dummy_context()
* make it return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:40 -05:00
Al Viro 7392906ea9 sanitize audit_mq_getsetattr()
* get rid of allocations
* make it return void
* don't duplicate parts of audit_dummy_context()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:40 -05:00
Al Viro e816f370cb sanitize audit_ipc_set_perm()
* get rid of allocations
* make it return void
* simplify callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:40 -05:00
Al Viro a33e675100 sanitize audit_ipc_obj()
* get rid of allocations
* make it return void
* simplify callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:39 -05:00
Al Viro f3298dc4f2 sanitize audit_socketcall
* don't bother with allocations
* now that it can't fail, make it return void

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04 15:14:39 -05:00
David Brownell 0931a4c6db mfd: dm355evm msp430 driver
Basic MFD framework for the MSP430 microcontroller firmware used
on the dm355evm board:

 - Provides an interface for other drivers: register read/write
   utilities, and register declarations.

 - Directly exports:
     * Many signals through the GPIO framework
         + LEDs
         + SW6 through gpio sysfs
	 + NTSC/nPAL jumper through gpio sysfs
	 + ... more could be added later, e.g. MMC signals
     * Child devices:
	+ LEDs, via leds-gpio child (and default triggers)
	+ RTC, via rtc-dm355evm child device
	+ Buttons and IR control, via dm355evm_keys

 - Supports power-off system call.  Use the reset button to power
   the board back up; the power supply LED will be on, but the
   MSP430 waits to re-activate the regulators.

 - On probe() this:
     * Announces firmware revision
     * Turns off the banked LEDs
     * Exports the resources noted above
     * Hooks the power-off support
     * Muxes tvp5146 -or- imager for video input

Unless the new tvp514x driver (tracked for mainline) is configured,
this assumes that some custom imager driver handles video-in.

This completely ignores the registers reporting the output voltages
on the various power supplies.  Someone could add a hwmon interface
if that seems useful.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:43 +01:00
Mark Brown ca23f8c1b0 mfd: Add WM8351 support
The WM8351 is a WM8350 variant. As well as register default changes the
WM8351 has fewer voltage and current regulators than the WM8350.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:42 +01:00
Mark Brown 645524a9c6 mfd: Support configurable numbers of DCDCs and ISINKs on WM8350
Some WM8350 variants have fewer DCDCs and ISINKs. Identify these at
probe and refuse to use the absent DCDCs when running on these chips.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:42 +01:00
Mark Brown 9692063062 mfd: Add WM8352 support
The WM8352 is a variant of the WM8350. Aside from the register defaults
there are no software visible differences to the WM8350.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:42 +01:00
Mike Rapoport 856f6fd119 mfd: Dialog DA9030 battery charger MFD driver
This patch amends DA903x MFD driver with definitions and methods
needed for battery charger driver.

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:41 +01:00
David Brownell b73eac7871 mfd: twl4030 regulator bug fixes
This contains two bugfixes to the initial twl4030 regulator
support patch related to USB:

 (a) always overwrite the old list of consumers ... else
     the regulator handles all use the same "usb1v5" name;
 (b) don't set up the "usbcp" regulator, which turns out
     to be managed through separate controls, usually ULPI
     directly from the OTG controller.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:40 +01:00
David Brownell dad759ff8b mfd: twl4030: create some regulator devices
Initial code to create twl4030 voltage regulator devices, using
the new regulator framework.  Note that this now starts to care
what name is used to declare the TWL chip:

 - TWL4030 is the "old" chip; newer ones have a bigger variety
   of VAUX2 voltages.

 - TWL5030 is the core "new" chip; TPS65950 is its catalog version.

 - The TPS65930 and TPS65920 are cost-reduced catalog versions of
   TWL5030 parts ... fewer regulators, no battery charger, etc.

Board-specific regulator configuration should be provided, listing
which regulators are used and their constraints (e.g. 1.8V only).

Code that could ("should"?) leverage the regulator stuff includes
TWL4030 USB transceiver support and MMC glue, LCD support for the
3430SDP and Labrador boards, and S-Video output.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:39 +01:00
David Brownell 67460a7c26 mfd: twl4030: cleanup symbols and OMAP dependency
Finish removing dependency of TWL driver stack on platform-specific
IRQ definitions ... and remove the build dependency on OMAP.

This lets the TWL4030 code be included in test builds for most
platforms, and will make it easier for non-OMAP folk to update
most of this code for new APIs etc.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:39 +01:00
Mark Brown 4008e879e1 power_supply: Add battery health reporting for WM8350
Implement support for reporting battery health in the WM8350 battery
interface. Since we are now able to report this via the classs remove
the diagnostics from the interrupt handler.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:39 +01:00
Mark Brown 7e386e6e0e power_supply: Add cold to the POWER_SUPPLY_HEALTH report values
Some systems are able to report problems with batteries being under
temperature.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:39 +01:00
Mark Brown b797a55519 mfd: Refactor WM8350 chip identification
Since the WM8350 driver was originally written the semantics for the
identification registers of the chip have been clarified, allowing
us to do an exact match on all the fields. This avoids mistakenly
running on unsupported hardware.

Also change to using the datasheet names more consistently for
legibility and fix a printk() that should be dev_err().

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:39 +01:00
Mark Brown d756f4a444 mfd: Switch WM8350 revision detection to a feature based model
Rather than check for chip revisions in the WM8350 drivers have the core
code set flags for relevant differences.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:38 +01:00
Mark Brown 14431aa0c5 power_supply: Add support for WM8350 PMU
This patch adds support for the PMU provided by the WM8350 which
implements battery, line and USB supplies including a battery charger.
The hardware functions largely autonomously, with minimal software
control required to initiate fast charging.

Support for configuration of the USB supply is not yet implemented.
This means that the hardware will remain in the mode configured at
startup, by default limiting the current drawn from USB to 100mA.

This driver was originally written by Liam Girdwood with subsequent
updates for submission by Mark Brown.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:38 +01:00
David Brownell 3fba19ec1a mfd: allow reading entire register banks on twl4030
Minor change to the TWL4030 utility interface:  support reads
of all 256 bytes in each register bank (vs just 255).  This
can help when debugging, but is otherwise a NOP.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:38 +01:00
Mark Brown 6748852634 mfd: Add AUXADC support for WM8350
The auxiliary ADC in the WM8350 is shared between several subdevices
so access to it needs to be arbitrated by the core driver.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:38 +01:00
Mark Brown 0c8a601678 mfd: Add WM8350 revision H support
No other software changes are required.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
2009-01-04 12:17:38 +01:00
Ingo Molnar c66b9906f8 intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
dmar.o can be built in the CONFIG_INTR_REMAP=y case but
iommu_calculate_agaw() is only available if VT-d is built as well.

So create an inline version of iommu_calculate_agaw() for the
!CONFIG_DMAR case. The iommu->agaw value wont be used in this
case, but the code is cleaner (has less #ifdefs) if we have it around
unconditionally.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-04 11:00:05 +01:00
Theodore Ts'o c319106723 ext4: Remove code to create the journal inode
This code has been obsolete in quite some time, since the supported
method for adding a journal inode is to use tune2fs (or to creating
new filesystem with a journal via mke2fs or mkfs.ext4).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06 11:14:25 -05:00
Hannes Eder 725cf0f47d HID: avoid sparse warning in HID_COMPAT_LOAD_DRIVER
Impact: include a prototype for the exported function in the macro

Fix about 20 of this warnings:

  drivers/hid/hid-a4tech.c:162:1: warning: symbol 'hid_compat_a4tech' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-04 01:00:53 +01:00
Jiri Slaby 3a6f82f7a2 HID: add dynids facility
Allow adding new devices to the hid drivers on the fly without
a need of kernel recompilation.

Now, one can test a driver e.g. by:
echo 0003:045E:00F0.0003 > ../generic-usb/unbind
echo 0003 045E 00F0 > new_id
from some driver subdir.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-04 01:00:52 +01:00
Jiri Slaby 0ed94b3342 HID: move usbhid flags to usbhid.h
Move usbhid specific flags from global hid.h into local usbhid.h.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-04 01:00:51 +01:00
Jiri Kosina 9188e79ec3 HID: add phys and name ioctls to hidraw
The hiddev interface provides ioctl() calls which can be used
to obtain phys and raw name of the underlying device.

Add the corresponding support also into hidraw.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-04 01:00:51 +01:00
Linus Torvalds 7d3b56ba37 Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
  x86: setup_per_cpu_areas() cleanup
  cpumask: fix compile error when CONFIG_NR_CPUS is not defined
  cpumask: use alloc_cpumask_var_node where appropriate
  cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
  x86: use cpumask_var_t in acpi/boot.c
  x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
  sched: put back some stack hog changes that were undone in kernel/sched.c
  x86: enable cpus display of kernel_max and offlined cpus
  ia64: cpumask fix for is_affinity_mask_valid()
  cpumask: convert RCU implementations, fix
  xtensa: define __fls
  mn10300: define __fls
  m32r: define __fls
  h8300: define __fls
  frv: define __fls
  cris: define __fls
  cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
  cpumask: zero extra bits in alloc_cpumask_var_node
  cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
  cpumask: convert mm/
  ...
2009-01-03 12:04:39 -08:00
Linus Torvalds 269b012321 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu: (89 commits)
  AMD IOMMU: remove now unnecessary #ifdefs
  AMD IOMMU: prealloc_protection_domains should be static
  kvm/iommu: fix compile warning
  AMD IOMMU: add statistics about total number of map requests
  AMD IOMMU: add statistics about allocated io memory
  AMD IOMMU: add stats counter for domain tlb flushes
  AMD IOMMU: add stats counter for single iommu domain tlb flushes
  AMD IOMMU: add stats counter for cross-page request
  AMD IOMMU: add stats counter for free_coherent requests
  AMD IOMMU: add stats counter for alloc_coherent requests
  AMD IOMMU: add stats counter for unmap_sg requests
  AMD IOMMU: add stats counter for map_sg requests
  AMD IOMMU: add stats counter for unmap_single requests
  AMD IOMMU: add stats counter for map_single requests
  AMD IOMMU: add stats counter for completion wait events
  AMD IOMMU: add init code for statistic collection
  AMD IOMMU: add necessary header defines for stats counting
  AMD IOMMU: add Kconfig entry for statistic collection code
  AMD IOMMU: use dev_name in iommu_enable function
  AMD IOMMU: use calc_devid in prealloc_protection_domains
  ...
2009-01-03 12:03:52 -08:00
Linus Torvalds f60a0a7984 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (34 commits)
  V4L/DVB (10173): Missing v4l2_prio_close in radio_release
  V4L/DVB (10172): add DVB_DEVICE_TYPE= to uevent
  V4L/DVB (10171): Use usb_set_intfdata
  V4L/DVB (10170): tuner-simple: prevent possible OOPS caused by divide by zero error
  V4L/DVB (10168): sms1xxx: fix inverted gpio for lna control on tiger r2
  V4L/DVB (10167): sms1xxx: add support for inverted gpio
  V4L/DVB (10166): dvb frontend: stop using non-C99 compliant comments
  V4L/DVB (10165): Add FE_CAN_2G_MODULATION flag to frontends that support DVB-S2
  V4L/DVB (10164): Add missing S2 caps flag to S2API
  V4L/DVB (10163): em28xx: allocate adev together with struct em28xx dev
  V4L/DVB (10162): tuner-simple: Fix tuner type set message
  V4L/DVB (10161): saa7134: fix autodetection for AVer TV GO 007 FM Plus
  V4L/DVB (10160): em28xx: update chip id for em2710
  V4L/DVB (10157): Add USB ID for the Sil4701 radio from DealExtreme
  V4L/DVB (10156): saa7134: Add support for Avermedia AVer TV GO 007 FM Plus
  V4L/DVB (10155): Add TEA5764 radio driver
  V4L/DVB (10154): saa7134: fix a merge conflict on Behold H6 board
  V4L/DVB (10153): Add the Beholder H6 card to DVB-T part of sources.
  V4L/DVB (10152): Change configuration of the Beholder H6 card
  V4L/DVB (10151): Fix I2C bridge error in zl10353
  ...
2009-01-03 12:02:18 -08:00
Yinghai Lu 2f98357001 sparseirq: move set/get_timer_rand_state back to .c
those two functions only used in that C file

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-03 12:01:23 -08:00
Linus Torvalds e9e67a8b57 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  mmc: warn about voltage mismatches
  mmc_spi: Add support for OpenFirmware bindings
  pxamci: fix dma_unmap_sg length
  mmc_block: ensure all sectors that do not have errors are read
  drivers/mmc: Move a dereference below a NULL test
  sdhci: handle built-in sdhci with modular leds class
  mmc: balanc pci_iomap with pci_iounmap
  mmc_block: print better error messages
  mmc: Add mmc_vddrange_to_ocrmask() helper function
  ricoh_mmc: Handle newer models of Ricoh controllers
  mmc: Add 8-bit bus width support
  sdhci: activate led support also when module
  mmc: trivial annotation of 'blocks'
  pci: use pci_ioremap_bar() in drivers/mmc
  sdricoh_cs: Add support for Bay Controller devices
  mmc: at91_mci: reorder timer setup and mmc_add_host() call
2009-01-03 12:00:07 -08:00
Linus Torvalds 61420f59a5 Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID
  [PATCH] improve idle cputime accounting
  [PATCH] improve precision of idle time detection.
  [PATCH] improve precision of process accounting.
  [PATCH] idle cputime accounting
  [PATCH] fix scaled & unscaled cputime accounting
2009-01-03 11:56:24 -08:00
Rusty Russell 2fdf66b491 cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
Impact: Reduce memory usage, use new API.

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS.  cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

(Changes to powernow-k* by <travis>.)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-03 19:15:40 +01:00
Ingo Molnar e465b535ce Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into cpus4096-v2 2009-01-03 18:54:51 +01:00
Mike Travis 7eb1955336 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask
Conflicts:
	arch/x86/kernel/io_apic.c
	kernel/rcuclassic.c
	kernel/sched.c
	kernel/time/tick-sched.c

Signed-off-by: Mike Travis <travis@sgi.com>
[ mingo@elte.hu: backmerged typo fix for io_apic.c ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-03 18:53:31 +01:00
Theodore Ts'o 87d8fe1ee6 add releasepage hooks to block devices which can be used by file systems
Implement blkdev_releasepage() to release the buffer_heads and pages
after we release private data belonging to a mounted filesystem.

Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-03 09:47:09 -05:00
Joerg Roedel e4754c96cf VT-d: remove now unused intel_iommu_found function
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:11:08 +01:00
Joerg Roedel d14d65777c VT-d: adapt domain iova_to_phys function for IOMMU API
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:11:08 +01:00
Joerg Roedel dde57a210d VT-d: adapt domain map and unmap functions for IOMMU API
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:11:08 +01:00
Joerg Roedel 4c5478c94e VT-d: adapt device attach and detach functions for IOMMU API
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:11:08 +01:00
Joerg Roedel 5d450806eb VT-d: adapt domain init and destroy functions for IOMMU API
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:11:07 +01:00
Joerg Roedel 19de40a847 KVM: change KVM to use IOMMU API
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:11:07 +01:00
Joerg Roedel 4a77a6cf6d introcude linux/iommu.h for an iommu api
This patch introduces the API to abstract the exported VT-d functions
for KVM into a generic API. This way the AMD IOMMU implementation can
plug into this API later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:10:09 +01:00
Weidong Han b653574a7d Deassign device in kvm_free_assgined_device
In kvm_iommu_unmap_memslots(), assigned_dev_head is already empty.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:10:08 +01:00
Weidong Han 0a92035674 KVM: support device deassignment
Support device deassignment, it can be used in device hotplug.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:02:19 +01:00
Weidong Han 260782bcfd KVM: use the new intel iommu APIs
intel iommu APIs are updated, use the new APIs.

In addition, change kvm_iommu_map_guest() to just create the domain, let kvm_iommu_assign_device() assign device.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:02:19 +01:00
Weidong Han faa3d6f5ff Change intel iommu APIs of virtual machine domain
These APIs are used by KVM to use VT-d

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:02:18 +01:00
Weidong Han 1b5736839a calculate agaw for each iommu
"SAGAW" capability may be different across iommus. Use a default agaw, but if default agaw is not supported in some iommus, choose a less supported agaw.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:02:18 +01:00
Mark McLoughlin 2abd7e167c intel-iommu: move iommu_prepare_gfx_mapping() out of dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin 58fa7304a2 intel-iommu: kill off duplicate def of dmar_disabled
This is only used in dmar.c and intel-iommu.h, so dma_remapping.h
seems like the appropriate place for it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin a647dacbb1 intel-iommu: move struct device_domain_info out of dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin 99126f7ce1 intel-iommu: move struct dmar_domain def out dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin 622ba12a4c intel-iommu: move DMA PTE defs out of dma_remapping.h
DMA_PTE_READ/WRITE are needed by kvm.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin 7a8fc25e0c intel-iommu: move context entry defs out from dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin 46b08e1a76 intel-iommu: move root entry defs from dma_remapping.h
We keep the struct root_entry forward declaration for the
pointer in struct intel_iommu.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin f27be03b27 intel-iommu: move DMA_32/64BIT_PFN into intel-iommu.c
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:34 +01:00
Mark McLoughlin 519a054915 intel-iommu: make init_dmars() static
init_dmars() is not used outside of drivers/pci/intel-iommu.c

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:34 +01:00
Mark McLoughlin 015ab17dc2 intel-iommu: remove some unused struct intel_iommu fields
The seg, saved_msg and sysdev fields appear to be unused since
before the code was first merged.

linux/msi.h is not needed in linux/intel-iommu.h anymore since
there is no longer a reference to struct msi_msg. The MSI code
in drivers/pci/intel-iommu.c still has linux/msi.h included
via linux/dmar.h.

linux/sysdev.h isn't needed because there is no reference to
struct sys_device.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:34 +01:00
Rusty Russell 5ece5c5192 xtensa: define __fls
Like fls, but can't be handed 0 and returns the bit number.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-03 16:21:08 +10:30
Rusty Russell 5c134dad43 mn10300: define __fls
Like fls, but can't be handed 0 and returns the bit number.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-03 16:19:03 +10:30
Rusty Russell 16a206260e m32r: define __fls
Like fls, but can't be handed 0 and returns the bit number.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-03 16:16:54 +10:30
Rusty Russell ee38e5140b frv: define __fls
Like fls, but can't be handed 0 and returns the bit number.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-03 16:14:05 +10:30
Ingo Molnar 6680598b44 Disallow gcc versions 3.{0,1}
GCC 3.0 and 3.1 are too old to build a working kernel.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ This check got dropped as obsolete when I simplified the gcc header
  inclusion mess in f153b82121, but Willy
  Tarreau reports actually having those old versions still..  -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 12:19:34 -08:00
Linus Torvalds b840d79631 Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
  x86: export vector_used_by_percpu_irq
  x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
  sched: nominate preferred wakeup cpu, fix
  x86: fix lguest used_vectors breakage, -v2
  x86: fix warning in arch/x86/kernel/io_apic.c
  sched: fix warning in kernel/sched.c
  sched: move test_sd_parent() to an SMP section of sched.h
  sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  sched: activate active load balancing in new idle cpus
  sched: bias task wakeups to preferred semi-idle packages
  sched: nominate preferred wakeup cpu
  sched: favour lower logical cpu number for sched_mc balance
  sched: framework for sched_mc/smt_power_savings=N
  sched: convert BALANCE_FOR_xx_POWER to inline functions
  x86: use possible_cpus=NUM to extend the possible cpus allowed
  x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  x86: update io_apic.c to the new cpumask code
  x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
  x86: xen: use smp_call_function_many()
  x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
  ...

Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02 11:44:09 -08:00
Linus Torvalds 597b0d2162 Merge branch 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
  KVM: MMU: handle large host sptes on invlpg/resync
  KVM: Add locking to virtual i8259 interrupt controller
  KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
  MAINTAINERS: Maintainership changes for kvm/ia64
  KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
  KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
  KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
  KVM: fix handling of ACK from shared guest IRQ
  KVM: MMU: check for present pdptr shadow page in walk_shadow
  KVM: Consolidate userspace memory capability reporting into common code
  KVM: Advertise the bug in memory region destruction as fixed
  KVM: use cpumask_var_t for cpus_hardware_enabled
  KVM: use modern cpumask primitives, no cpumask_t on stack
  KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
  KVM: set owner of cpu and vm file operations
  anon_inodes: use fops->owner for module refcount
  x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
  KVM: MMU: prepopulate the shadow on invlpg
  KVM: MMU: skip global pgtables on sync due to cr3 switch
  KVM: MMU: collapse remote TLB flushes on root sync
  ...
2009-01-02 11:41:11 -08:00
Mauro Carvalho Chehab e4cda3e072 V4L/DVB (10166): dvb frontend: stop using non-C99 compliant comments
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-01-02 17:15:07 -02:00
Klaus Schmidinger cb889a2f35 V4L/DVB (10164): Add missing S2 caps flag to S2API
The attached patch adds a capability flag that allows an application
to determine whether a particular device can handle "second generation
modulation" transponders. This is necessary in order for applications
to be able to decide which device to use for a given channel in
a multi device environment, where DVB-S and DVB-S2 devices are mixed.

It is assumed that a device capable of handling "second generation
modulation" can implicitly handle "first generation modulation".
The flag is not named anything with DVBS2 in order to allow its
use with future DVBT2 devices as well (should they ever come).

Signed-off by: Klaus Schmidinger <Klaus.Schmidinger@cadsoft.de>

Acked-by: Steven Toth <stoth@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-01-02 17:15:01 -02:00
Hans Verkuil aecde8b53b V4L/DVB (10141): v4l2: debugging API changed to match against driver name instead of ID.
Since the i2c driver ID will be removed in the near future we have to
modify the v4l2 debugging API to use the driver name instead of driver ID.

Note that this API is not used in applications other than v4l2-dbg.cpp
as it is for debugging and testing only.

Should anyone use the old VIDIOC_G_CHIP_IDENT, then this will be logged
with a warning that it is deprecated and will be removed in 2.6.30.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-01-02 17:11:52 -02:00
Hans Verkuil 9bb7cde793 V4L/DVB (10139): v4l: rename v4l_compat_ioctl32 to v4l2_compat_ioctl32
This rename prevents conflicts with the older compat_ioctl32 module.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-01-02 17:11:39 -02:00
Hans Verkuil 069b747931 V4L/DVB (10138): v4l2-ioctl: change to long return type to match unlocked_ioctl.
Since internal to v4l2 the ioctl prototype is the same regardless of it
being called through .ioctl or .unlocked_ioctl, we need to convert it all
to the long return type of unlocked_ioctl.

Thanks to Jean-Francois Moine for posting an initial patch for this and
thus bringing it to our attention.

Cc: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-01-02 17:11:34 -02:00
Hans Verkuil bec43661b1 V4L/DVB (10135): v4l2: introduce v4l2_file_operations.
Introduce a struct v4l2_file_operations for v4l2 drivers.

Remove the unnecessary inode argument.

Move compat32 handling (and llseek) into the v4l2-dev core: this is now
handled in the v4l2 core and no longer in the drivers themselves.

Note that this changeset reverts an earlier patch that changed the return
type of__video_ioctl2 from int to long. This change will be reinstated
later in a much improved version.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2009-01-02 17:11:12 -02:00
Linus Torvalds 2640c9a90f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (32 commits)
  ide-atapi: start dma in a drive-specific way
  ide-atapi: put the rest of non-ide-cd code into the else-clause of ide_transfer_pc
  ide-atapi: remove timeout arg to ide_issue_pc
  ide-cd: remove handler wrappers
  ide-cd: remove xferlen arg to cdrom_start_packet_command
  ide-atapi: split drive-specific functionality in ide_issue_pc
  ide-atapi: assign expiry and timeout based on device type
  ide-atapi: compute cmd_len based on device type in ide_transfer_pc
  ide: remove the last ide-scsi remnants
  ide-atapi: remove ide-scsi remnants from ide_pc_intr()
  ide-atapi: remove ide-scsi remnants from ide_transfer_pc()
  ide-atapi: remove ide-scsi remnants from ide_issue_pc
  ide-cd: move cdrom_timer_expiry to ide-atapi.c
  ide-atapi: teach ide atapi about drive->waiting_for_dma
  ide-atapi: accomodate transfer length calculation for ide-cd
  ide-atapi: setup dma for ide-cd
  ide-atapi: combine drive-specific assignments
  ide-atapi: add a dev_is_idecd-inline
  remove ide-scsi
  ide-floppy: allocate only toplevel packet commands
  ...
2009-01-02 10:32:18 -08:00
Linus Torvalds 80618fa83a Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb
* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb: (31 commits)
  uwb: remove beacon cache entry after calling uwb_notify()
  uwb: remove unused include/linux/uwb/debug.h
  uwb: use print_hex_dump()
  uwb: use dev_dbg() for debug messages
  uwb: fix memory leak in uwb_rc_notif()
  wusb: fix oops when terminating a non-existant reservation
  uwb: fix oops when terminating an already terminated reservation
  uwb: improved MAS allocator and reservation conflict handling
  wusb: add debug files for ASL, PZL and DI to the whci-hcd driver
  uwb: fix oops in debug PAL's reservation callback
  uwb: clean up whci_wait_for() timeout error message
  wusb: whci-hcd shouldn't do ASL/PZL updates while channel is inactive
  uwb: remove unused beacon group join/leave events
  wlp: start/stop radio on network interface up/down
  uwb: add basic radio manager
  uwb: add pal parameter to new reservation callback
  uwb: fix races between events and neh timers
  uwb: don't unbind the radio controller driver when resetting
  uwb: per-radio controller event thread and beacon cache
  uwb: add commands to add/remove IEs to the debug interface
  ...
2009-01-02 10:31:04 -08:00
Linus Torvalds d2fde28ce7 Merge branch 'tty-updates' from Alan
* tty-updates: (75 commits)
  serial_8250: support for Sealevel Systems Model 7803 COMM+8
  hso maintainers update patch
  hso modem detect fix patch against Alan Cox'es tty tree
  tty: Fix an ircomm warning and note another bug
  drivers/char/cyclades.c: cy_pci_probe: fix error path
  Serial: UART driver changes for Cavium OCTEON.
  Serial: Allow port type to be specified when calling serial8250_register_port.
  8250: Serial driver changes to support future Cavium OCTEON serial patches.
  8250: Don't clobber spinlocks.
  fix for tty-serial-move-port
  tty: We want the port object to be persistent
  __FUNCTION__ is gcc-specific, use __func__
  serial: RS485 ioctl structure uses __u32 include linux/types.h
  tty: Drop the lock_kernel in the private ioctl hook
  synclink_cs: Convert to tty_port
  tty: use port methods for the rocket driver
  tty: kref the rocket driver
  tty: make rocketport use standard port->flags
  tty: Redo the rocket driver locking
  tty: Make epca use the port helpers
  ...
2009-01-02 10:26:01 -08:00
Flavio Leitner e65f0f8271 serial_8250: support for Sealevel Systems Model 7803 COMM+8
Add support for Sealevel Systems Model 7803 COMM+8

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:44 -08:00
David Daney 6b06f19151 Serial: UART driver changes for Cavium OCTEON.
Cavium UART implementation is not covered by existing uart_configS.
Define a new uart_config (PORT_OCTEON) which is specified by OCTEON
platform device registration code.

Signed-off-by: Tomaso Paoletti <tpaoletti@caviumnetworks.com>
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:43 -08:00
David Daney 8e23fcc89c Serial: Allow port type to be specified when calling serial8250_register_port.
Add flag value UPF_FIXED_TYPE which specifies that the UART type is
known and should not be probed.  For this case the UARTs properties
are just copied out of the uart_config entry.

This allows us to keep SOC specific 8250 probe code out of 8250.c.  In
this case we know the serial hardware will not be changing as it is on
the same silicon as the CPU, and we can specify it with certainty in
the board/cpu setup code.

The alternative is to load up 8250.c with a bunch of OCTEON specific
special cases in the probing code.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:43 -08:00
David Daney 7d6a07d123 8250: Serial driver changes to support future Cavium OCTEON serial patches.
In order to use Cavium OCTEON specific serial i/o drivers, we first
patch the 8250 driver to use replaceable I/O functions.  Compatible
I/O functions are added for existing iotypeS.

An added benefit of this change is that it makes it easy to factor
some of the existing special cases out to board/SOC specific support
code.

The alternative is to load up 8250.c with a bunch of OCTEON specific
iotype code and bug work-arounds.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Tomaso Paoletti <tpaoletti@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:43 -08:00
Alan Cox f751928e0d tty: We want the port object to be persistent
Move the tty_port and uart_info bits around a little. By embedding the uart_info
into the uart_port we get rid of lots of corner case testing and also get the
ability to go port<->state<->info which is a bit more elegant than the current
data structures.

Downsides - we allocate a tiny bit more memory for unused ports, upside we've
removed as much code as it saved for most users..

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:42 -08:00
Andy Whitcroft 60c20fb8c0 serial: RS485 ioctl structure uses __u32 include linux/types.h
In the commit below a new struct serial_rs485 was introduced for a new
ioctl:

    commit c26c56c0f4
    Author: Alan Cox <alan@redhat.com>
    Date:   Mon Oct 13 10:37:48 2008 +0100

	tty: Cris has a nice RS485 ioctl so we should steal it

This structure uses the __u32 types for some of its members, which leads
to the following compile error:

    $ cc -I.../include -c X.c
    In file included from X.c:2: .../include/linux/serial.h:185:
		error: expected specifier-qualifier-list before ‘__u32’
    $

It seems that these types are appropriate for this structure as it is
to be exposed to userspace.  These types are available via linux/types.h
so move the include of that outside the __KERNEL__ section.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:42 -08:00
Niels de Vos 39aced68d6 serial: set correct baud_base for Oxford Semiconductor Ltd EXSYS EX-41092 Dual 16950 Serial adapter
The PCI-card identified as "Oxford Semiconductor Ltd EXSYS EX-41092 Dual
16950 Serial adapter" is only usable with other devices (i.e. not the same
card) after doing a "setserial /dev/ttyS<n> baud_base 115200".  This
baud_base should be default for this card.

Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:40 -08:00
Alan Cox a6614999e8 tty: Introduce some close helpers for ports
Again this is a lot of common code we can unify

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:40 -08:00
Alan Cox 2a6eadbd5a tty: Rework istallion to use the tty port changes
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:39 -08:00
Alan Cox 36c621d82b tty: Introduce a tty_port generic block_til_ready
Start sucking more commonality out of the drivers into a single piece of
core code.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:39 -08:00
Alan Cox 3e61696bdc isicom: redo locking to use tty port locks
This helps set the basis for moving block_til_ready into common code. We also
introduce a tty_port_hangup helper as this will also be generally needed.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:38 -08:00
Alan Cox 5d951fb458 tty: Pull the dtr raise into tty port
This moves another per device special out of what should be shared open
wait paths into private methods

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:38 -08:00
Alan Cox 31f35939d1 tty_port: Add a port level carrier detect operation
This is the first step to generalising the various pieces of waiting logic
duplicated in all sorts of serial drivers.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:38 -08:00
Alan Cox c9b3976e3f tty: Fix PPP hang under load
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:38 -08:00
Russell King 975a1a7d88 And here's a patch (to be applied on top of the last) which prevents
this happening again by making use of 'const'.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:37 -08:00
Alan Cox fc6f623822 pty: simplify resize
We have special case logic for resizing pty/tty pairs. We also have a per
driver resize method so for the pty case we should use it.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:36 -08:00
Joe Peterson a88a69c912 n_tty: Fix loss of echoed characters and remove bkl from n_tty
Fixes the loss of echoed (and other ldisc-generated characters) when
the tty is stopped or when the driver output buffer is full (happens
frequently for input during continuous program output, such as ^C)
and removes the Big Kernel Lock from the N_TTY line discipline.

Adds an "echo buffer" to the N_TTY line discipline that handles all
ldisc-generated output (including echoed characters).  Along with the
loss of characters, this also fixes the associated loss of sync between
tty output and the ldisc state when characters cannot be immediately
written to the tty driver.

The echo buffer stores (in addition to characters) state operations that need
to be done at the time of character output (like management of the column
position).  This allows echo to cooperate correctly with program output,
since the ldisc state remains consistent with actual characters written.

Since the echo buffer code now isolates the tty column state code
to the process_out* and process_echoes functions, we can remove the
Big Kernel Lock (BKL) and replace it with mutex locks.

Highlights are:

* Handles echo (and other ldisc output) when tty driver buffer is full
  - continuous program output can block echo
* Saves echo when tty is in stopped state (e.g. ^S)
  - (e.g.: ^Q will correctly cause held characters to be released for output)
* Control character pairs (e.g. "^C") are treated atomically and not
  split up by interleaved program output
* Line discipline state is kept consistent with characters sent to
  the tty driver
* Remove the big kernel lock (BKL) from N_TTY line discipline

Signed-off-by: Joe Peterson <joe@skyrush.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:35 -08:00
Linus Torvalds f9d1425007 Disallow gcc versions 4.1.{0,1}
These compiler versions are known to miscompile __weak functions and
thus generate kernels that don't necessarily work correctly.  If a weak
function is int he same compilation unit as a caller, gcc may end up
inlining it, and thus binding the weak function too early.

See

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27781

for details.

Cc: Adrian Bunk <bunk@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 09:29:43 -08:00
Linus Torvalds f153b82121 Sanitize gcc version header includes
- include the gcc version-dependent header files from the generic gcc
   header file, rather than the other way around (iow: don't make the
   non-gcc header file have to know about gcc versions)

 - don't include compiler-gcc4.h for gcc 5 (for whenever it gets
   released).  That's just confusing and made us do odd things in the
   gcc4 header file (testing that we really had version 4!)

 - generate the name from the __GNUC__ version directly, rather than
   having a mess of #if conditionals.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 09:23:03 -08:00
FUJITA Tomonori 97ae77a1cd [SCSI] block: make blk_rq_map_user take a NULL user-space buffer for WRITE
The commit 818827669d (block: make
blk_rq_map_user take a NULL user-space buffer) extended
blk_rq_map_user to accept a NULL user-space buffer with a READ
command. It was necessary to convert sg to use the block layer mapping
API.

This patch extends blk_rq_map_user again for a WRITE command. It is
necessary to convert st and osst drivers to use the block layer
apping API.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-01-02 11:10:35 -06:00
FUJITA Tomonori 56c451f4b5 [SCSI] block: fix the partial mappings with struct rq_map_data
This fixes bio_copy_user_iov to properly handle the partial mappings
with struct rq_map_data (which only sg uses for now but st and osst
will shortly). It adds the offset member to struct rq_map_data and
changes blk_rq_map_user to update it so that bio_copy_user_iov can add
an appropriate page frame via bio_add_pc_page().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-01-02 11:10:08 -06:00
Borislav Petkov 28ad91db77 ide-atapi: remove timeout arg to ide_issue_pc
There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:56 +01:00
Borislav Petkov 5317464dcc ide: remove the last ide-scsi remnants
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:54 +01:00
Borislav Petkov 5d655a03b8 ide-atapi: remove ide-scsi remnants from ide_pc_intr()
As a result, remove now unused ide_scsi_get_timeout and ide_scsi_expiry.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:54 +01:00
Borislav Petkov 4cad085efb ide-cd: move cdrom_timer_expiry to ide-atapi.c
- cdrom_timer_expiry -> ide_cd_expiry
- remove expiry-arg to ide_issue_pc as it is redundant now
- ide_debug_log -> debug_log

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:53 +01:00
Borislav Petkov 392de1d53d ide-atapi: accomodate transfer length calculation for ide-cd
... by factoring it out of ide_cd_do_request() into a helper, as suggested by
Bart.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
[bart: BLK_DEV_IDECD needs to select IDE_ATAPI now]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:52 +01:00
Borislav Petkov bf64741fe8 ide: make IDE_AFLAG_.. numbering continuous again
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:50 +01:00
Bartlomiej Zolnierkiewicz 201bffa464 ide: use per-device request queue locks (v2)
* Move hack for flush requests from choose_drive() to do_ide_request().

* Add ide_plug_device() helper and convert core IDE code from using
  per-hwgroup lock as a request lock to use the ->queue_lock instead.

* Remove no longer needed:
  - choose_drive() function
  - WAKEUP() macro
  - 'sleeping' flag from ide_hwif_t
  - 'service_{start,time}' fields from ide_drive_t

This patch results in much simpler and more maintainable code
(besides being a scalability improvement).

v2:
* Fixes/improvements based on review from Elias:
  - take as many requests off the queue as possible
  - remove now redundant BUG_ON()

Cc: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:50 +01:00
Bartlomiej Zolnierkiewicz 631de3708d ide: add ide_[un]lock_hwgroup() helpers
Add ide_[un]lock_hwgroup() inline helpers for obtaining exclusive
access to the given hwgroup and update the core code accordingly.

[ This change besides making code saner results in more efficient
  use of ide_{get,release}_lock(). ]

Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:50 +01:00
Bartlomiej Zolnierkiewicz 295f00042a ide: don't execute the next queued command from the hard-IRQ context (v2)
* Tell the block layer that we are not done handling requests by using
  blk_plug_device() in ide_do_request() (request handling function)
  and ide_timer_expiry() (timeout handler) if the queue is not empty.

* Remove optimization which directly calls ide_do_request() for the next
  queued command from the ide_intr() (IRQ handler) and ide_timer_expiry().

* Remove no longer needed IRQ masking from ide_do_request() - in case of
  IDE ports needing serialization disable_irq_nosync()/enable_irq() was
  used for the (possibly shared) IRQ of the other IDE port.

* Put the misplaced comment in the right place in ide_do_request().

* Drop no longer needed 'int masked_irq' argument from ide_do_request().

* Merge ide_do_request() into do_ide_request().

* Remove no longer needed IDE_NO_IRQ define.

While at it:

* Don't use HWGROUP() macro in do_ide_request().

* Use __func__ in ide_intr().

This patch reduces IRQ hadling latency for IDE and improves the system-wide
handling of shared IRQs (which should result in more timeout resistant and
stable IDE systems).  It also makes it possible to do some further changes
later (i.e. replace some busy-waiting delays with sleeping equivalents).

v2:
Changes per review from Elias Oltmanns:
- fix wrong goto statement in 'if (startstop == ide_stopped)' block
- use spin_unlock_irq()
- don't use obsolete HWIF() macro

Cc: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:48 +01:00
Bartlomiej Zolnierkiewicz ebdab07dad ide: move sysfs support to ide-sysfs.c
While at it:
- media_string() -> ide_media_string()

There should be no functional changes caused by this patch.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-02 16:12:48 +01:00
David Vrabel b21a207141 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-upstream
Conflicts:

	drivers/uwb/wlp/eda.c
2009-01-02 13:17:13 +00:00
Linus Torvalds b58602a4ba Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (34 commits)
  nfsd race fixes: jfs
  nfsd race fixes: reiserfs
  nfsd race fixes: ext4
  nfsd race fixes: ext3
  nfsd race fixes: ext2
  nfsd/create race fixes, infrastructure
  filesystem notification: create fs/notify to contain all fs notification
  fs/block_dev.c: __read_mostly improvement and sb_is_blkdev_sb utilization
  kill ->dir_notify()
  filp_cachep can be static in fs/file_table.c
  fix f_count description in Documentation/filesystems/files.txt
  make INIT_FS use the __RW_LOCK_UNLOCKED initialization
  take init_fs to saner place
  kill vfs_permission
  pass a struct path * to may_open
  kill walk_init_root
  remove incorrect comment in inode_permission
  expand some comments (d_path / seq_path)
  correct wrong function name of d_put in kernel document and source comment
  fix switch_names() breakage in short-to-short case
  ...
2008-12-31 15:57:56 -08:00
Rusty Russell 8c384cdee3 cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
Impact: new debug CONFIG options

This helps find unconverted code.  It currently breaks compile horribly,
but we never wanted a flag day so that's expected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-01 10:12:30 +10:30
Rusty Russell 41c7bb9588 cpumask: convert rest of files in kernel/
Impact: Reduce stack usage, use new cpumask API.

Mainly changing cpumask_t to 'struct cpumask' and similar simple API
conversion.  Two conversions worth mentioning:

1) we use cpumask_any_but to avoid a temporary in kernel/softlockup.c,
2) Use cpumask_var_t in taskstats_user_cmd().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
2009-01-01 10:12:28 +10:30
Rusty Russell bd232f97b3 cpumask: convert RCU implementations
Impact: use new cpumask API.

rcu_ctrlblk contains a cpumask, and it's highly optimized so I don't want
a cpumask_var_t (ie. a pointer) for the CONFIG_CPUMASK_OFFSTACK case.  It
could use a dangling bitmap, and be allocated in __rcu_init to save memory,
but for the moment we use a bitmap.

(Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK,
so we use a bitmap here to show we really mean it).

We remove on-stack cpumasks, using cpumask_var_t for
rcu_torture_shuffle_tasks() and for_each_cpu_and in force_quiescent_state().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-01 10:12:26 +10:30
Rusty Russell d036e67b40 cpumask: convert kernel/irq
Impact: Reduce stack usage, use new cpumask API.  ALPHA mod!

Main change is that irq_default_affinity becomes a cpumask_var_t, so
treat it as a pointer (this effects alpha).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-01 10:12:26 +10:30
Rusty Russell 6b954823c2 cpumask: convert kernel time functions
Impact: Use new APIs

Convert kernel/time functions to use struct cpumask *.

Note the ugly bitmap declarations in tick-broadcast.c.  These should
be cpumask_var_t, but there was no obvious initialization function to
put the alloc_cpumask_var() calls in.  This was safe.

(Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK,
so we use a bitmap here to show we really mean it).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-01 10:12:25 +10:30
Rusty Russell ab53d472e7 bitmap: find_last_bit()
Impact: New API

As the name suggests.  For the moment everyone uses the generic one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-01 10:12:19 +10:30
Rusty Russell 434ae514c2 m68k: define __fls
Like fls, but can't be handed 0 and returns the bit number.

(I broke this arch in linux-next by using __fls in generic code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-01 10:12:18 +10:30
Al Viro 261bca86ed nfsd/create race fixes, infrastructure
new helpers - insert_inode_locked() and insert_inode_locked4().
Hash new inode, making sure that there's no such inode in icache
already.  If there is and it does not end up unhashed (as would
happen if we have nfsd trying to resolve a bogus fhandle), fail.
Otherwise insert our inode into hash and succeed.

In either case have i_state set to new+locked; cleanup ends up
being simpler with such calling conventions.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:43 -05:00
Al Viro 6badd79bd0 kill ->dir_notify()
Remove the hopelessly misguided ->dir_notify().  The only instance (cifs)
has been broken by design from the very beginning; the objects it creates
are never destroyed, keep references to struct file they can outlive, nothing
that could possibly evict them exists on close(2) path *and* no locking
whatsoever is done to prevent races with close(), should the previous, er,
deficiencies someday be dealt with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:43 -05:00
Eric Dumazet b6b3fdead2 filp_cachep can be static in fs/file_table.c
Instead of creating the "filp" kmem_cache in vfs_caches_init(),
we can do it a litle be later in files_init(), so that filp_cachep
is static to fs/file_table.c

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:42 -05:00
Al Viro 18d8fda7c3 take init_fs to saner place
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:42 -05:00
Christoph Hellwig cb23beb551 kill vfs_permission
With all the nameidata removal there's no point anymore for this helper.
Of the three callers left two will go away with the next lookup series
anyway.

Also add proper kerneldoc to inode_permission as this is the main
permission check routine now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:41 -05:00
Christoph Hellwig 3fb64190aa pass a struct path * to may_open
No need for the nameidata in may_open - a struct path is enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:41 -05:00
Duane Griffin 035146851c vfs: introduce helper function to safely NUL-terminate symlinks
A number of filesystems were potentially triggering kernel bugs due to
corrupted symlink names on disk. This function helps safely terminate
the names.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:38 -05:00
Jan Engelhardt dded4f4d50 include: linux/fs.h: put declarations in __KERNEL__
include/linux/fs.h contains externs for a bunch of variables.  That obviously
belongs under ifdef __KERNEL__.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:38 -05:00
Nick Piggin c2452f3278 shrink struct dentry
struct dentry is one of the most critical structures in the kernel. So it's
sad to see it going neglected.

With CONFIG_PROFILING turned on (which is probably the common case at least
for distros and kernel developers), sizeof(struct dcache) == 208 here
(64-bit). This gives 19 objects per slab.

I packed d_mounted into a hole, and took another 4 bytes off the inline
name length to take the padding out from the end of the structure. This
shinks it to 200 bytes. I could have gone the other way and increased the
length to 40, but I'm aiming for a magic number, read on...

I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
why was this ever a good idea? The cookie system should increase its hash
size or use a tree or something if lookups are a problem. Also the "fast
dcookie lookups" in oprofile should be moved into the dcookie code -- how
can oprofile possibly care about the dcookie_mutex? It gets dropped after
get_dcookie() returns so it can't be providing any sort of protection.

At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
will now require 3 cachelines to touch all fields wheras previously it
would require 4.

I know the inline name size was chosen quite carefully, however with the
reduction in cacheline footprint, it should actually be just about as fast
to do a name lookup for a 36 character name as it was before the patch (and
faster for other sizes). The memory footprint savings for names which are
<= 32 or > 36 bytes long should more than make up for the memory cost for
33-36 byte names.

Performance is a feature...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:38 -05:00
Kentaro Takeda be6d3e56a6 introduce new LSM hooks where vfsmount is available.
Add new LSM hooks for path-based checks.  Call them on directory-modifying
operations at the points where we still know the vfsmount involved.

Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:37 -05:00
Anton Vorontsov 9c43df5791 mmc_spi: Add support for OpenFirmware bindings
The support is implemented via platform data accessors, new module
(of_mmc_spi) will be created automatically when the driver compiles
on OpenFirmware platforms. Link-time dependency will load the module
automatically.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-12-31 19:01:55 +01:00
Paul Moore 6c2e8ac095 netlabel: Update kernel configuration API
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.

Signed-off-by: Paul Moore <paul.moore@hp.com>
2008-12-31 12:54:11 -05:00
Anton Vorontsov 86e8286a0e mmc: Add mmc_vddrange_to_ocrmask() helper function
This function sets the OCR mask bits according to provided voltage
ranges. Will be used by the mmc_spi OpenFirmware bindings.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-12-31 18:18:13 +01:00
Jarkko Lavinen b30f8af335 mmc: Add 8-bit bus width support
Signed-off-by: Jarkko Lavinen <jarkko.lavinen@nokia.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-12-31 18:18:12 +01:00
Linus Torvalds db200df0b3 Merge branch 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sparseirq: move __weak symbols into separate compilation unit
  sparseirq: work around __weak alias bug
  sparseirq: fix hang with !SPARSE_IRQ
  sparseirq: set lock_class for legacy irq when sparse_irq is selected
  sparseirq: work around compiler optimizing away __weak functions
  sparseirq: fix desc->lock init
  sparseirq: do not printk when migrating IRQ descriptors
  sparseirq: remove duplicated arch_early_irq_init()
  irq: simplify for_each_irq_desc() usage
  proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
  irq: for_each_irq_desc() move to irqnr.h
  hrtimer: remove #include <linux/irq.h>
2008-12-31 09:00:59 -08:00
Jan Kiszka 4531220b71 KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
There is no point in doing the ready_for_nmi_injection/
request_nmi_window dance with user space. First, we don't do this for
in-kernel irqchip anyway, while the code path is the same as for user
space irqchip mode. And second, there is nothing to loose if a pending
NMI is overwritten by another one (in contrast to IRQs where we have to
save the number). Actually, there is even the risk of raising spurious
NMIs this way because the reason for the held-back NMI might already be
handled while processing the first one.

Therefore this patch creates a simplified user space NMI injection
interface, exporting it under KVM_CAP_USER_NMI and dropping the old
KVM_CAP_NMI capability. And this time we also take care to provide the
interface only on archs supporting NMIs via KVM (right now only x86).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:47 +02:00
Mark McLoughlin defaf1587c KVM: fix handling of ACK from shared guest IRQ
If an assigned device shares a guest irq with an emulated
device then we currently interpret an ack generated by the
emulated device as originating from the assigned device
leading to e.g. "Unbalanced enable for IRQ 4347" from the
enable_irq() in kvm_assigned_dev_ack_irq().

The fix is fairly simple - don't enable the physical device
irq unless it was previously disabled.

Of course, this can still lead to a situation where a
non-assigned device ACK can cause the physical device irq to
be reenabled before the device was serviced. However, being
level sensitive, the interrupt will merely be regenerated.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:47 +02:00
Avi Kivity 1a811b6167 KVM: Advertise the bug in memory region destruction as fixed
Userspace might need to act differently.

Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:46 +02:00
Sheng Yang 6b9cc7fd46 KVM: Enable MSI for device assignment
We enable guest MSI and host MSI support in this patch. The userspace want to
enable MSI should set KVM_DEV_IRQ_ASSIGN_ENABLE_MSI in the assigned_irq's flag.
Function would return -ENOTTY if can't enable MSI, userspace shouldn't set MSI
Enable bit when KVM_ASSIGN_IRQ return -ENOTTY with
KVM_DEV_IRQ_ASSIGN_ENABLE_MSI.

Userspace can tell the support of MSI device from #ifdef KVM_CAP_DEVICE_MSI.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:02 +02:00
Sheng Yang 0937c48d07 KVM: Add fields for MSI device assignment
Prepared for kvm_arch_assigned_device_msi_dispatch().

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:01 +02:00
Sheng Yang 4f906c19ae KVM: Replace irq_requested with more generic irq_requested_type
Separate guest irq type and host irq type, for we can support guest using INTx
with host using MSI (but not opposite combination).

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:01 +02:00
Sheng Yang e19e30effa KVM: IRQ ACK notifier should be used with in-kernel irqchip
Also remove unnecessary parameter of unregister irq ack notifier.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:47 +02:00
Jan Kiszka c4abb7c9cd KVM: x86: Support for user space injected NMIs
Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
injecting NMIs from user space and also extends the statistic report
accordingly.

Based on the original patch by Sheng Yang.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:42 +02:00
Martin Schwidefsky 79741dd357 [PATCH] idle cputime accounting
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.

In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.

This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-31 15:11:46 +01:00
Martin Schwidefsky 457533a7d3 [PATCH] fix scaled & unscaled cputime accounting
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-31 15:11:46 +01:00
Rusty Russell 2ca1a61583 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	arch/x86/kernel/io_apic.c
2008-12-31 23:05:57 +10:30
Thomas Gleixner 1c5745aa38 sched_clock: prevent scd->clock from moving backwards, take #2
Redo:

  5b7dba4: sched_clock: prevent scd->clock from moving backwards

which had to be reverted due to s2ram hangs:

  ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards"

... this time with resume restoring GTOD later in the sequence
taken into account as well.

The "timekeeping_suspended" flag is not very nice but we cannot call into
GTOD before it has been properly resumed and the scheduler will run very
early in the resume sequence.

Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-31 09:53:21 +01:00
robert.moore@intel.com e8443c358c ACPICA: Update version to 20081204.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:18:26 -05:00
robert.moore@intel.com 4b67a0e467 ACPICA: FADT: set acpi_gbl_use_default_register_widths to TRUE by default
This returns the FADT support to the original behavior, which is
to use default register widths. However, now check each register
definition and report a warning if it differs from the default.
This is a first step to moving away from the default widths,
rather than outright believing the widths in all FADTs for all
machines, considered rather dangerous until more data is obtained.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:17:56 -05:00
Bob Moore 06f5541960 ACPICA: FADT parsing changes and fixes
1) Update the register lengths for the PM1 event blocks. The
length must be divided by two in order to use these to access
the status registers.
2) Add run-time option to use default register lengths to override a
faulty FADT.
3) Add warning message if any of the X64 address structures contain a length
that does not match the legacy length earlier in the FADT.
4) Move all FADT warning messages into the ValidateFadt function.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:17:09 -05:00
Bob Moore 1685bd404d ACPICA: Add ACPI_MUTEX_TYPE configuration option
Used to specify whether the OSL mutex interfaces should be used,
or binary semaphores instead.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:16:39 -05:00
Bob Moore 7488c8d511 ACPICA: Fixes for various ACPI data tables
Eliminate extraneous "zero length subtable" messages.
Fix subtable output for ERST, MCFG, EINJ tables.
Implement all subtables for HEST.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:16:09 -05:00
Bob Moore 50df4d8b0f ACPICA: Restructure includes into public/private
acpi.h now includes only the "public" acpica headers. All other
acpica headers are "private" and should not be included by acpica
users. One new file, accommon.h is used to include the commonly
used private headers for acpica code generation. Future plans
are to move all private headers to a new subdirectory.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:15:40 -05:00
Lin Ming ea7e96e0f2 ACPI: remove private acpica headers from driver files
External driver files should not include any private acpica headers.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:15:22 -05:00
Bob Moore d3fd902d1e ACPICA: New: acpi_reset interface - write to reset register
Uses the FADT-defined reset register and reset value. Checks the
FADT flags for the reset register supported bit. Supports reset
register in memory or I/O space, but not in PCI config space
since the host has the information to do it.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:14:32 -05:00
Bob Moore ecfbbc7b46 ACPICA: New: acpi_read and acpi_write public interfaces
Changed the acpi_hw_low_level_read and acpi_hw_low_level_write functions to
the public acpi_read and acpi_write to allow direct access to
ACPI registers.  Removed the "width" parameter since the width
can be obtained from the input GAS structure. Updated the FADT
initialization to setup the GAS structures with the proper
widths. Some widths are still hardcoded because many FADTs have
incorrect register lengths.

Signed-off-by: Bob Moore <robert.moore@intel.com
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:12:56 -05:00
Bob Moore 08ac07b826 ACPICA: New: Public GPE group enable/disable interfaces
Added acpi_disable_all_gpes and acpi_enable_all_runtime_gpes for
public use.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:10:46 -05:00
Bob Moore e97d6bf1a0 ACPICA: New: acpi_get_gpe_device interface
This function maps an input GPE index to a GPE block device. Also
Added acpi_current_gpe_count to track the current number of GPEs
that are being managed by the ACPICA core (both FADT-based GPEs
and the GPEs contained in GPE block devices.)

Modify drivers/acpi/system.c to use these 2 new interfaces

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:10:24 -05:00
Lin Ming 889c78be9e ACPI: osl.c: replace return_ACPI_STATUS with return
return_ACPI_STATUS is an internal acpica function, replace it with return.
acpi_gbl_permanent_mmap moved from acglobal.h to acpixf.h for external use

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-31 01:02:05 -05:00
Len Brown 087da3b4e2 ACPI: simplify buffer management for acpi_pci_bind() etc.
use ACPI_ALLOCATE_BUFFER to remove the allocations
within acpi_pci_bind(), acpi_pci_unbind() and acpi_pci_bind_root().
While there, delete some unnecessary param inits from those routines.

Delete concept of ACPI_PATHNAME_MAX, since this was the last use.

Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-30 22:52:26 -05:00
Bjorn Helgaas f748bafa3c ACPI: PCI: move struct acpi_prt_entry declaration out of public header file
The struct acpi_prt_entry is used only in pci_irq.c, so there's no
need for the declaration to be public.  This patch moves it into
pci_irq.c.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-30 21:20:23 -05:00
Linus Torvalds 6a94cb7306 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (184 commits)
  [XFS] Fix race in xfs_write() between direct and buffered I/O with DMAPI
  [XFS] handle unaligned data in xfs_bmbt_disk_get_all
  [XFS] avoid memory allocations in xfs_fs_vcmn_err
  [XFS] Fix speculative allocation beyond eof
  [XFS] Remove XFS_BUF_SHUT() and friends
  [XFS] Use the incore inode size in xfs_file_readdir()
  [XFS] set b_error from bio error in xfs_buf_bio_end_io
  [XFS] use inode_change_ok for setattr permission checking
  [XFS] add a FMODE flag to make XFS invisible I/O less hacky
  [XFS] resync headers with libxfs
  [XFS] simplify projid check in xfs_rename
  [XFS] replace b_fspriv with b_mount
  [XFS] Remove unused tracing code
  [XFS] Remove unnecessary assertion
  [XFS] Remove unused variable in ktrace_free()
  [XFS] Check return value of xfs_buf_get_noaddr()
  [XFS] Fix hang after disallowed rename across directory quota domains
  [XFS] Fix compile with CONFIG_COMPAT enabled
  move inode tracing out of xfs_vnode.
  move vn_iowait / vn_iowake into xfs_aops.c
  ...
2008-12-30 17:48:25 -08:00
Linus Torvalds f57fa1d6a6 Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (70 commits)
  fs/nfs/nfs4proc.c: make nfs4_map_errors() static
  rpc: add service field to new upcall
  rpc: add target field to new upcall
  nfsd: support callbacks with gss flavors
  rpc: allow gss callbacks to client
  rpc: pass target name down to rpc level on callbacks
  nfsd: pass client principal name in rsc downcall
  rpc: implement new upcall
  rpc: store pointer to pipe inode in gss upcall message
  rpc: use count of pipe openers to wait for first open
  rpc: track number of users of the gss upcall pipe
  rpc: call release_pipe only on last close
  rpc: add an rpc_pipe_open method
  rpc: minor gss_alloc_msg cleanup
  rpc: factor out warning code from gss_pipe_destroy_msg
  rpc: remove unnecessary assignment
  NFS: remove unused status from encode routines
  NFS: increment number of operations in each encode routine
  NFS: fix comment placement in nfs4xdr.c
  NFS: fix tabs in nfs4xdr.c
  ...
2008-12-30 17:45:45 -08:00
Linus Torvalds 590cf28580 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (104 commits)
  [SCSI] fcoe: fix configuration problems
  [SCSI] cxgb3i: fix select/depend problem
  [SCSI] fcoe: fix incorrect use of struct module
  [SCSI] cxgb3i: remove use of skb->sp
  [SCSI] cxgb3i: Add cxgb3i iSCSI driver.
  [SCSI] zfcp: Remove unnecessary warning message
  [SCSI] zfcp: Add support for unchained FSF requests
  [SCSI] zfcp: Remove busid macro
  [SCSI] zfcp: remove DID_DID flag
  [SCSI] zfcp: Simplify mask lookups for incoming RSCNs
  [SCSI] zfcp: Remove initial device data from zfcp_data
  [SCSI] zfcp: fix compile warning
  [SCSI] zfcp: Remove adapter list
  [SCSI] zfcp: Simplify SBAL allocation to fix sparse warnings
  [SCSI] zfcp: register with SCSI layer on ccw registration
  [SCSI] zfcp: Fix message line break
  [SCSI] qla2xxx: changes in multiq code
  [SCSI] eata: fix the data buffer accessors conversion regression
  [SCSI] ibmvfc: Improve async event handling
  [SCSI] lpfc : correct printk types on PPC compiles
  ...
2008-12-30 17:43:10 -08:00
Linus Torvalds f54a6ec0fd Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (583 commits)
  V4L/DVB (10130): use USB API functions rather than constants
  V4L/DVB (10129): dvb: remove deprecated use of RW_LOCK_UNLOCKED in frontends
  V4L/DVB (10128): modify V4L documentation to be a valid XHTML
  V4L/DVB (10127): stv06xx: Avoid having y unitialized
  V4L/DVB (10125): em28xx: Don't do AC97 vendor detection for i2s audio devices
  V4L/DVB (10124): em28xx: expand output formats available
  V4L/DVB (10123): em28xx: fix reversed definitions of I2S audio modes
  V4L/DVB (10122): em28xx: don't load em28xx-alsa for em2870 based devices
  V4L/DVB (10121): em28xx: remove worthless Pinnacle PCTV HD Mini 80e device profile
  V4L/DVB (10120): em28xx: remove redundant Pinnacle Dazzle DVC 100 profile
  V4L/DVB (10119): em28xx: fix corrupted XCLK value
  V4L/DVB (10118): zoran: fix warning for a variable not used
  V4L/DVB (10116): af9013: Fix gcc false warnings
  V4L/DVB (10111a): usbvideo.h: remove an useless blank line
  V4L/DVB (10111): quickcam_messenger.c: fix a warning
  V4L/DVB (10110): v4l2-ioctl: Fix warnings when using .unlocked_ioctl = __video_ioctl2
  V4L/DVB (10109): anysee: Fix usage of an unitialized function
  V4L/DVB (10104): uvcvideo: Add support for video output devices
  V4L/DVB (10102): uvcvideo: Ignore interrupt endpoint for built-in iSight webcams.
  V4L/DVB (10101): uvcvideo: Fix bulk URB processing when the header is erroneous
  ...
2008-12-30 17:41:32 -08:00
Linus Torvalds ab70537c32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: struct device - replace bus_id with dev_name()
  lguest: move the initial guest page table creation code to the host
  kvm-s390: implement config_changed for virtio on s390
  virtio_console: support console resizing
  virtio: add PCI device release() function
  virtio_blk: fix type warning
  virtio: block: dynamic maximum segments
  virtio: set max_segment_size and max_sectors to infinite.
  virtio: avoid implicit use of Linux page size in balloon interface
  virtio: hand virtio ring alignment as argument to vring_new_virtqueue
  virtio: use KVM_S390_VIRTIO_RING_ALIGN instead of relying on pagesize
  virtio: use LGUEST_VRING_ALIGN instead of relying on pagesize
  virtio: Don't use PAGE_SIZE for vring alignment in virtio_pci.
  virtio: rename 'pagesize' arg to vring_init/vring_size
  virtio: Don't use PAGE_SIZE in virtio_pci.c
  virtio: struct device - replace bus_id with dev_name(), dev_set_name()
  virtio-pci queue allocation not page-aligned
2008-12-30 17:37:25 -08:00
Linus Torvalds 14a3c4ab0e Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (407 commits)
  [ARM] pxafb: add support for overlay1 and overlay2 as framebuffer devices
  [ARM] pxafb: cleanup of the timing checking code
  [ARM] pxafb: cleanup of the color format manipulation code
  [ARM] pxafb: add palette format support for LCCR4_PAL_FOR_3
  [ARM] pxafb: add support for FBIOPAN_DISPLAY by dma braching
  [ARM] pxafb: allow pxafb_set_par() to start from arbitrary yoffset
  [ARM] pxafb: allow video memory size to be configurable
  [ARM] pxa: add document on the MFP design and how to use it
  [ARM] sa1100_wdt: don't assume CLOCK_TICK_RATE to be a constant
  [ARM] rtc-sa1100: don't assume CLOCK_TICK_RATE to be a constant
  [ARM] pxa/tavorevb: update board support (smartpanel LCD + keypad)
  [ARM] pxa: Update eseries defconfig
  [ARM] 5352/1: add w90p910-plat config file
  [ARM] s3c: S3C options should depend on PLAT_S3C
  [ARM] mv78xx0: implement GPIO and GPIO interrupt support
  [ARM] Kirkwood: implement GPIO and GPIO interrupt support
  [ARM] Orion: share GPIO IRQ handling code
  [ARM] Orion: share GPIO handling code
  [ARM] s3c: define __io using the typesafe version
  [ARM] S3C64XX: Ensure CPU_V6 is selected
  ...
2008-12-30 17:36:49 -08:00
Linus Torvalds 74a6d0f064 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (33 commits)
  ide-cd: remove dead dsc_overlap setting
  ide: push local_irq_{save,restore}() to do_identify()
  ide: remove superfluous local_irq_{save,restore}() from ide_dump_status()
  ide: move legacy ISA/VLB ports handling to ide-legacy.c (v2)
  ide: move Power Management support to ide-pm.c
  ide: use ATA_DMA_* defines in ide-dma-sff.c
  ide: checkpatch.pl fixes for ide-lib.c
  ide: remove inline tags from ide-probe.c
  ide: remove redundant code from ide_end_drive_cmd()
  ide: struct device - replace bus_id with dev_name(), dev_set_name()
  ide: rework handling of serialized ports (v2)
  cy82c693: remove superfluous ide_cy82c693 chipset type
  trm290: add IDE_HFLAG_TRM290 host flag
  ide: add ->max_sectors field to struct ide_port_info
  rz1000: apply chipset quirks early (v2)
  ide: always set nIEN on idle devices
  ide: fix ->quirk_list checking in ide_do_request()
  gayle: set IDE_HFLAG_SERIALIZE explictly
  cmd64x: set IDE_HFLAG_SERIALIZE explictly for CMD646
  ali14xx: doesn't use shared IRQs
  ...
2008-12-30 17:34:37 -08:00