Commit Graph

20866 Commits

Author SHA1 Message Date
Wu Zhangjin b70e4f0529 tracing: Fix undeclared ENOSYS in include/linux/tracepoint.h
The header file include/linux/tracepoint.h may be included without
include/linux/errno.h and then the compiler will fail on building for
undelcared ENOSYS. This patch fixes this problem via including <linux/errno.h>
to include/linux/tracepoint.h.

Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
LKML-Reference: <1277118549-622-1-git-send-email-wuzhangjin@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-21 12:23:36 -04:00
Sjur Braendeland 69ad78208e caif: Add debug connection type for CAIF.
Added new CAIF protocol type CAIFPROTO_DEBUG for accessing
CAIF debug on the ST Ericsson modems.

There are two debug servers on the modem, one for radio related
debug (CAIF_RADIO_DEBUG_SERVICE) and the other for
communication/application related debug (CAIF_COM_DEBUG_SERVICE).

The debug connection can contain trace debug printouts or
interactive debug used for debugging and test.

Debug connections can be of type STREAM or SEQPACKET.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-20 19:46:07 -07:00
Herbert Xu d29c0c5c33 udp: Add UFO to NETIF_F_SOFTWARE_GSO
This patch adds UFO to the list of GSO features with a software
fallback.  This allows UFO to be used even if the hardware does
not support it.

In particular, this allows us to test the UFO fallback, as it
has been reported to not work in some cases.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 18:07:52 -07:00
Eric W. Biederman 3f551f9436 sock: Introduce cred_to_ucred
To keep the coming code clear and to allow both the sock
code and the scm code to share the logic introduce a
fuction to translate from struct cred to struct ucred.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:35 -07:00
Eric W. Biederman 5c1469de75 user_ns: Introduce user_nsmap_uid and user_ns_map_gid.
Define what happens when a we view a uid from one user_namespace
in another user_namepece.

- If the user namespaces are the same no mapping is necessary.

- For most cases of difference use overflowuid and overflowgid,
  the uid and gid currently used for 16bit apis when we have a 32bit uid
  that does fit in 16bits.  Effectively the situation is the same,
  we want to return a uid or gid that is not assigned to any user.

- For the case when we happen to be mapping the uid or gid of the
  creator of the target user namespace use uid 0 and gid as confusing
  that user with root is not a problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16 14:55:34 -07:00
Jiri Kosina f1bbbb6912 Merge branch 'master' into for-next 2010-06-16 18:08:13 +02:00
Uwe Kleine-König 65155b3708 fix typos concerning "management"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-06-16 18:03:16 +02:00
Uwe Kleine-König 46cd09a7de fix typos concerning "acquire"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-06-16 18:03:15 +02:00
Herbert Xu d5f31fbfd8 netpoll: Use correct primitives for RCU dereferencing
Now that RCU debugging checks for matching rcu_dereference calls
and rcu_read_lock, we need to use the correct primitives or face
nasty warnings.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:44:29 -07:00
Eric Dumazet 5933dd2f02 net: NET_SKB_PAD should depend on L1_CACHE_BYTES
In old kernels, NET_SKB_PAD was defined to 16.

Then commit d6301d3dd1 (net: Increase default NET_SKB_PAD to 32), and
commit 18e8c134f4 (net: Increase NET_SKB_PAD to 64 bytes) increased it
to 64.

While first patch was governed by network stack needs, second was more
driven by performance issues on current hardware. Real intent was to
align data on a cache line boundary.

So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic.

Remove microblaze and powerpc own NET_SKB_PAD definitions.

Thanks to Alexander Duyck and David Miller for their comments.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:16:43 -07:00
Ben Hutchings 82695d9b18 net: Fix error in comment on net_device_ops::ndo_get_stats
ndo_get_stats still returns struct net_device_stats *; there is
no struct net_device_stats64.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 15:08:48 -07:00
David S. Miller 16fb62b6b4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-06-15 13:49:24 -07:00
Jiri Pirko f350a0a873 bridge: use rx_handler_data pointer to store net_bridge_port pointer
Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:48:58 -07:00
Jiri Pirko a35e2c1b6d macvlan: use rx_handler_data pointer to store macvlan_port pointer V2
Register macvlan_port pointer as rx_handler data pointer. As macvlan_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a macvlan port.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:47:12 -07:00
Jiri Pirko 93e2c32b5c net: add rx_handler data pointer
Add possibility to register rx_handler data pointer along with a rx_handler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:47:11 -07:00
Herbert Xu c18370f5b2 netpoll: Add netpoll_tx_running
This patch adds the helper netpoll_tx_running for use within
ndo_start_xmit.  It returns non-zero if ndo_start_xmit is being
invoked by netpoll, and zero otherwise.

This is currently implemented by simply looking at the hardirq
count.  This is because for all non-netpoll uses of ndo_start_xmit,
IRQs must be enabled while netpoll always disables IRQs before
calling ndo_start_xmit.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:40 -07:00
Herbert Xu 8fdd95ec16 netpoll: Allow netpoll_setup/cleanup recursion
This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.

They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:40 -07:00
Herbert Xu 4247e161b1 netpoll: Add ndo_netpoll_setup
This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:39 -07:00
Herbert Xu de85d99eb7 netpoll: Fix RCU usage
The use of RCU in netpoll is incorrect in a number of places:

1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.

This patch fixes those issues.  As most users are in BH context,
this also converts the RCU usage to the BH variant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:38 -07:00
Patrick McHardy f9181f4ffc Merge branch 'master' of /repos/git/net-next-2.6
Conflicts:
	include/net/netfilter/xt_rateest.h
	net/bridge/br_netfilter.c
	net/netfilter/nf_conntrack_core.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-15 17:31:06 +02:00
Luciano Coelho 0902b469bd netfilter: xtables: idletimer target implementation
This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.

Timers are identified by labels and are created when a rule is set with a new
label.  The rules also take a timeout value (in seconds) as an option.  If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.

One entry for each timer is created in sysfs.  This attribute contains the
timer remaining for the timer to expire.  The attributes are located under
the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification to the
userspace, which can then decide what to do (eg. disconnect to save power).

Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-15 15:04:00 +02:00
Ben Hutchings 1be3b5fe9d ethtool: Revert incorrect indentation changes
commit 97f8aefbbf "net: fix ethtool
coding style errors and warnings" changed the indentation of several
macro definitions in ethtool.h.  These definitions line up in the diff
where there is an extra character at the start of each line, but not
in the resulting file.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-14 23:13:06 -07:00
Dave Airlie da931a931d agp: drop vmalloc flag.
Since the code that was too ugly to live is upstream, we can use it now,
instead of rolling our own.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-06-15 09:56:01 +10:00
Eric Dumazet f5c5440d40 netfilter: nfnetlink_log: RCU conversion, part 2
- must use atomic_inc_not_zero() in instance_lookup_get()

- must use hlist_add_head_rcu() instead of hlist_add_head()

- must use hlist_del_rcu() instead of hlist_del()

- Introduce NFULNL_COPY_DISABLED to stop lockless reader from using an
instance, before we do final instance_put() on it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-14 16:15:23 +02:00
Jens Axboe 575f552012 Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-linus 2010-06-14 12:54:57 +02:00
Philipp Reisner dc66c74de6 drbd: Fixed a race between disk-attach and unexpected state changes
This was a very hard to trigger race condition.

If we got a state packet from the peer, after drbd_nl_disk() has
already changed the disk state to D_NEGOTIATING but
after_state_ch() was not yet run by the worker, then receive_state()
might called drbd_sync_handshake(), which in turn crashed
when accessing p_uuid.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-06-14 12:19:41 +02:00
Ben Hutchings be1f3c2c02 net: Enable 64-bit net device statistics on 32-bit architectures
Use struct rtnl_link_stats64 as the statistics structure.

On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64.  Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.

Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64.  Drivers that implement
this operation must not update the structure asynchronously.

Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64.  Change callers of
dev_get_stats() accordingly.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-12 15:51:22 -07:00
Len Brown 42de5532f4 Merge branch 'bugzilla-13931-sleep-nvs' into release
Conflicts:
	drivers/acpi/sleep.c

Signed-off-by: Len Brown <len.brown@intel.com>
2010-06-12 01:15:40 -04:00
Linus Torvalds 4cea8706c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  wimax/i2400m: fix missing endian correction read in fw loader
  net8139: fix a race at the end of NAPI
  pktgen: Fix accuracy of inter-packet delay.
  pkt_sched: gen_estimator: add a new lock
  net: deliver skbs on inactive slaves to exact matches
  ipv6: fix ICMP6_MIB_OUTERRORS
  r8169: fix mdio_read and update mdio_write according to hw specs
  gianfar: Revive the driver for eTSEC devices (disable timestamping)
  caif: fix a couple range checks
  phylib: Add support for the LXT973 phy.
  net: Print num_rx_queues imbalance warning only when there are allocated queues
2010-06-11 14:20:03 -07:00
David S. Miller 62522d36d7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-06-11 13:32:31 -07:00
David S. Miller 14599f1e34 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/wl12xx/wl1271.h
	drivers/net/wireless/wl12xx/wl1271_cmd.h
2010-06-11 11:34:06 -07:00
Christoph Hellwig c5444198ca writeback: simplify and split bdi_start_writeback
bdi_start_writeback now never gets a superblock passed, so we can just remove
that case.  And to further untangle the code and flatten the call stack
split it into two trivial helpers for it's two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-11 12:58:08 +02:00
John Fastabend 597a264b1a net: deliver skbs on inactive slaves to exact matches
Currently, the accelerated receive path for VLAN's will
drop packets if the real device is an inactive slave and
is not one of the special pkts tested for in
skb_bond_should_drop().  This behavior is different then
the non-accelerated path and for pkts over a bonded vlan.

For example,

vlanx -> bond0 -> ethx

will be dropped in the vlan path and not delivered to any
packet handlers at all.  However,

bond0 -> vlanx -> ethx

and

bond0 -> ethx

will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev which is not a
slave and netif_recv_skb() doesn't drop frames but only
delivers them to exact matches.

This patch adds a sk_buff flag which is used for tagging
skbs that would previously been dropped and allows the
skb to continue to skb_netif_recv().  Here we add
logic to check for the deliver_no_wcard flag and if it
is set only deliver to handlers that match exactly.  This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped out right in the vlan path.

I have tested the following 4 configurations in failover modes
and load balancing modes.

# bond0 -> ethx

# vlanx -> bond0 -> ethx

# bond0 -> vlanx -> ethx

# bond0 -> ethx
            |
  vlanx -> --

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 22:23:34 -07:00
Luotao Fu fb76dd10b9 Input: matrix_keypad - add support for clustered irq
This one adds support of a combined irq source for the whole matrix keypad.
This can be useful if all rows and columns of the keypad are e.g. connected
to a GPIO expander, which only has one interrupt line for all events on
every single GPIO.

Signed-off-by: Luotao Fu <l.fu@pengutronix.de>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-06-10 12:33:59 -07:00
Matthew Garrett dd4c4f17d7 suspend: Move NVS save/restore code to generic suspend functionality
Saving platform non-volatile state may be required for suspend to RAM as
well as hibernation. Move it to more generic code.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-06-10 11:02:34 -04:00
Alan Cox 79907d89c3 misc: Fix allocation 'borrowed' by vhost_net
10, 233 is allocated officially to /dev/kmview which is shipping in
Ubuntu and Debian distributions.  vhost_net seem to have borrowed it
without making a proper request and this causes regressions in the other
distributions.

vhost_net can use a dynamic minor so use that instead.  Also update the
file with a comment to try and avoid future misunderstandings.

cc: stable@kernel.org
Signed-off-by: Alan Cox <device@lanana.org>
[ We should have caught this before 2.6.34 got released.  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-09 08:50:31 -07:00
Dave Chinner 0b5649278e writeback: pay attention to wbc->nr_to_write in write_cache_pages
If a filesystem writes more than one page in ->writepage, write_cache_pages
fails to notice this and continues to attempt writeback when wbc->nr_to_write
has gone negative - this trace was captured from XFS:

    wbc_writeback_start: towrt=1024
    wbc_writepage: towrt=1024
    wbc_writepage: towrt=0
    wbc_writepage: towrt=-1
    wbc_writepage: towrt=-5
    wbc_writepage: towrt=-21
    wbc_writepage: towrt=-85

This has adverse effects on filesystem writeback behaviour. write_cache_pages()
needs to terminate after a certain number of pages are written, not after a
certain number of calls to ->writepage are made.  This is a regression
introduced by 17bc6c30cf ("vfs: Add
no_nrwrite_index_update writeback control flag"), but cannot be reverted
directly due to subsequent bug fixes that have gone in on top of it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-08 18:12:44 -07:00
Peter Zijlstra dc61b1d65e sched: Fix PROVE_RCU vs cpu_cgroup
PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
typically holds rq->lock around the css rcu derefs but the generic
cgroup code doesn't (and can't) know about that lock.

Provide means to add extra checks to the css dereference and use that
in the scheduler to annotate its users.

The addition of rq->lock to these checks is correct because the
cgroup_subsys::attach() method takes the rq->lock for each task it
moves, therefore by holding that lock, we ensure the task is pinned to
the current cgroup and the RCU derefence is valid.

That leaves one genuine race in __sched_setscheduler() where we used
task_group() without holding any of the required locks and thus raced
with the cgroup code. Solve this by moving the check under the
appropriate lock.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-08 18:44:04 +02:00
Eric Dumazet 5bfddbd46a netfilter: nf_conntrack: IPS_UNTRACKED bit
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet. This is bad for performance.
__read_mostly annotation is also a bad choice.

This patch introduces IPS_UNTRACKED bit so that we can use later a
per_cpu untrack structure more easily.

A new helper, nf_ct_untracked_get() returns a pointer to
nf_conntrack_untracked.

Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add
IPS_NAT_DONE_MASK bits to untracked status.

nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-08 16:09:52 +02:00
Eric Dumazet bb69ae049f anycast: Some RCU conversions
- dev_get_by_flags() changed to dev_get_by_flags_rcu()

- ipv6_sock_ac_join() dont touch dev & idev refcounts
- ipv6_sock_ac_drop() dont touch dev & idev refcounts
- ipv6_sock_ac_close() dont touch dev & idev refcounts
- ipv6_dev_ac_dec() dount touch idev refcount
- ipv6_chk_acast_addr() dont touch idev refcount

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07 22:49:25 -07:00
Tejun Heo 4daedcfe8c ahci: add pci quirk for JMB362
JMB362 is a new variant of jmicron controller which is similar to
JMB360 but has two SATA ports instead of one.  As there is no PATA
port, single function AHCI mode can be used as in JMB360.  Add pci
quirk for JMB362.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Aries Lee <arieslee@jmicron.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-06-07 16:03:10 -04:00
Konrad Rzeszutek Wilk d7ef1533a9 swiotlb: Make swiotlb bookkeeping functions visible in the header file.
We put the functions dealing with the operations on
the SWIOTLB buffer in the header and make those functions non-static.
And also make the functions exported via EXPORT_SYMBOL_GPL.

See "swiotlb: swiotlb: add swiotlb_tbl_map_single library function" for
full description of patchset.

[v2: swiotlb_sync_single_range_for_* no more. Remove usage.]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Albert Herranz <albert_herranz@yahoo.es>
2010-06-07 11:59:27 -04:00
Konrad Rzeszutek Wilk 22d4826998 swiotlb: search and replace "int dir" with "enum dma_data_direction dir"
.. to catch anybody doing something funky.

See "swiotlb: swiotlb: add swiotlb_tbl_map_single library function" for
full description of patchset.

[v2: swiotlb_sync_single_range_* no more - removed usage]
[v3: enum dma_data_direction direction -> enum dma_data_direction dir]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Albert Herranz <albert_herranz@yahoo.es>
2010-06-07 11:59:26 -04:00
FUJITA Tomonori abbceff7d7 swiotlb: add the swiotlb initialization function with iotlb memory
This enables the caller to initialize swiotlb with its own iotlb
memory.

See "swiotlb: swiotlb: add swiotlb_tbl_map_single library function" for
full description of patchset.

[v2: changed ..with_tlb to ..with_tbl]

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Albert Herranz <albert_herranz@yahoo.es>
2010-06-07 11:59:25 -04:00
David S. Miller eedc765ca4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sfc/net_driver.h
	drivers/net/sfc/siena.c
2010-06-06 17:42:02 -07:00
Andy Gospodarek bb1d912323 bonding: allow user-controlled output slave selection
v2: changed bonding module version, modified to apply on top of changes
from previous patch in series, and updated documentation to elaborate on
multiqueue awareness that now exists in bonding driver.

This patch give the user the ability to control the output slave for
round-robin and active-backup bonding.  Similar functionality was
discussed in the past, but Jay Vosburgh indicated he would rather see a
feature like this added to existing modes rather than creating a
completely new mode.  Jay's thoughts as well as Neil's input surrounding
some of the issues with the first implementation pushed us toward a
design that relied on the queue_mapping rather than skb marks.
Round-robin and active-backup modes were chosen as the first users of
this slave selection as they seemed like the most logical choices when
considering a multi-switch environment.

Round-robin mode works without any modification, but active-backup does
require inclusion of the first patch in this series and setting
the 'all_slaves_active' flag.  This will allow reception of unicast traffic on
any of the backup interfaces.

This was tested with IPv4-based filters as well as VLAN-based filters
with good results.

More information as well as a configuration example is available in the
patch to Documentation/networking/bonding.txt.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05 02:23:17 -07:00
Alexander Duyck b78462ebc6 skbuff: add check for non-linear to warn_if_lro and needs_linearize
We can avoid an unecessary cache miss by checking if the skb is non-linear
before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be
done to avoid a cache miss on nr_frags if data_len is 0.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05 02:23:16 -07:00
Linus Torvalds 7f0d384caf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Minix: Clean up left over label
  fix truncate inode time modification breakage
  fix setattr error handling in sysfs, configfs
  fcntl: return -EFAULT if copy_to_user fails
  wrong type for 'magic' argument in simple_fill_super()
  fix the deadlock in qib_fs
  mqueue doesn't need make_bad_inode()
2010-06-04 21:12:39 -07:00
Linus Torvalds 90ec781973 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: fix bne2 "gave up waiting for init of module libcrc32c"
  module: verify_export_symbols under the lock
  module: move find_module check to end
  module: make locking more fine-grained.
  module: Make module sysfs functions private.
  module: move sysfs exposure to end of load_module
  module: fix kdb's illicit use of struct module_use.
  module: Make the 'usage' lists be two-way
2010-06-04 21:09:48 -07:00
Rusty Russell 6407ebb271 module: Make module sysfs functions private.
These were placed in the header in ef665c1a06 to get the various
SYSFS/MODULE config combintations to compile.

That may have been necessary then, but it's not now.  These functions
are all local to module.c.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
2010-06-05 11:17:36 +09:30
Rusty Russell c8e21ced08 module: fix kdb's illicit use of struct module_use.
Linus changed the structure, and luckily this didn't compile any more.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Martin Hicks <mort@sgi.com>
2010-06-05 11:17:36 +09:30
Linus Torvalds 2c02dfe7fe module: Make the 'usage' lists be two-way
When adding a module that depends on another one, we used to create a
one-way list of "modules_which_use_me", so that module unloading could
see who needs a module.

It's actually quite simple to make that list go both ways: so that we
not only can see "who uses me", but also see a list of modules that are
"used by me".

In fact, we always wanted that list in "module_unload_free()": when we
unload a module, we want to also release all the other modules that are
used by that module.  But because we didn't have that list, we used to
first iterate over all modules, and then iterate over each "used by me"
list of that module.

By making the list two-way, we simplify module_unload_free(), and it
allows for some trivial fixes later too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned & rebased)
2010-06-05 11:17:35 +09:30
Linus Torvalds 999fd1ab34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (23 commits)
  sh: Make intc messages consistent via pr_fmt.
  sh: make sure static declaration on ms7724se
  sh: make sure static declaration on mach-migor
  sh: make sure static declaration on mach-ecovec24
  sh: make sure static declaration on mach-ap325rxa
  clocksource: sh_cmt: compute mult and shift before registration
  clocksource: sh_tmu: compute mult and shift before registration
  sh: PIO disabling for x3proto and urquell.
  sh: mach-sdk7786: conditionally disable PIO support.
  sh: support for platforms without PIO.
  usb: r8a66597-hcd pio to mmio accessor conversion.
  usb: gadget: r8a66597-udc pio to mmio accessor conversion.
  usb: gadget: m66592-udc pio to mmio accessor conversion.
  sh: add romImage MMCIF boot for sh7724 and Ecovec V2
  sh: add boot code to MMCIF driver header
  sh: prepare MMCIF driver header file
  sh: allow romImage data between head.S and the zero page
  sh: Add support MMCIF for ecovec
  sh: remove duplicated #include
  input: serio: disable i8042 for non-cayman sh platforms.
  ...
2010-06-04 15:42:09 -07:00
Linus Torvalds 9a9620db07 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits)
  i7core_edac: Better describe the supported devices
  Add support for Westmere to i7core_edac driver
  i7core_edac: don't free on success
  i7core_edac: Add support for X5670
  Always call i7core_[ur]dimm_check_mc_ecc_err
  i7core_edac: fix memory leak of i7core_dev
  EDAC: add __init to i7core_xeon_pci_fixup
  i7core_edac: Fix wrong device id for channel 1 devices
  i7core: add support for Lynnfield alternate address
  i7core_edac: Add initial support for Lynnfield
  i7core_edac: do not export static functions
  edac: fix i7core build
  edac: i7core_edac produces undefined behaviour on 32bit
  i7core_edac: Use a more generic approach for probing PCI devices
  i7core_edac: PCI device is called NONCORE, instead of NOCORE
  i7core_edac: Fix ringbuffer maxsize
  i7core_edac: First store, then increment
  i7core_edac: Better parse "any" addrmask
  i7core_edac: Use a lockless ringbuffer
  edac: Create an unique instance for each kobj
  ...
2010-06-04 15:39:54 -07:00
Linus Torvalds d2dd328b7f Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits)
  block: make blk_init_free_list and elevator_init idempotent
  block: avoid unconditionally freeing previously allocated request_queue
  pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
  pipe: change the privilege required for growing a pipe beyond system max
  pipe: adjust minimum pipe size to 1 page
  block: disable preemption before using sched_clock()
  cciss: call BUG() earlier
  Preparing 8.3.8rc2
  drbd: Reduce verbosity
  drbd: use drbd specific ratelimit instead of global printk_ratelimit
  drbd: fix hang on local read errors while disconnected
  drbd: Removed the now empty w_io_error() function
  drbd: removed duplicated #includes
  drbd: improve usage of MSG_MORE
  drbd: need to set socket bufsize early to take effect
  drbd: improve network latency, TCP_QUICKACK
  drbd: Revert "drbd: Create new current UUID as late as possible"
  brd: support discard
  Revert "writeback: fix WB_SYNC_NONE writeback from umount"
  Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync"
  ...
2010-06-04 15:37:44 -07:00
Oleg Nesterov 485d527686 sys_personality: change sys_personality() to accept "unsigned int" instead of u_long
task_struct->pesonality is "unsigned int", but sys_personality() paths use
"unsigned long pesonality".  This means that every assignment or
comparison is not right.  In particular, if this argument does not fit
into "unsigned int" __set_personality() changes the caller's personality
and then sys_personality() returns -EINVAL.

Turn this argument into "unsigned int" and avoid overflows.  Obviously,
this is the user-visible change, we just ignore the upper bits.  But this
can't break the sane application.

There is another thing which can confuse the poorly written applications.
User-space thinks that this syscall returns int, not long.  This means
that the returned value can be negative and look like the error code.  But
note that libc won't be confused and thus errno won't be set, and with
this patch the user-space can never get -1 unless sys_personality() really
fails.  And, most importantly, the negative RET != -1 is only possible if
that app previously called personality(RET).

Pointed-out-by: Wenming Zhang <wezhang@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-04 15:21:45 -07:00
Roberto Sassu 7d683a0999 wrong type for 'magic' argument in simple_fill_super()
It's used to superblock ->s_magic, which is unsigned long.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reviewed-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
CC: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04 17:16:28 -04:00
FUJITA Tomonori 467429b475 ssb: remove the ssb DMA API
Now they are unnecessary.  We can use the generic DMA API with any bus.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Gary Zambrano <zambrano@broadcom.com>
Cc: Stefano Brivio <stefano.brivio@polimi.it>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-04 16:00:42 -04:00
FUJITA Tomonori 14f92952bf ssb: add dma_dev to ssb_device structure
Add dma_dev, a pointer to struct device, to struct ssb_device.  We pass it
to the generic DMA API with SSB_BUSTYPE_PCI and SSB_BUSTYPE_SSB.
ssb_devices_register() sets up it properly.

This is preparation for replacing the ssb bus specific DMA API (ssb_dma_*)
with the generic DMA API.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Michael Buesch <mb@bu3sch.de>
Cc: Gary Zambrano <zambrano@broadcom.com>
Cc: Stefano Brivio <stefano.brivio@polimi.it>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: John W. Linville <linville@tuxdriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-04 16:00:41 -04:00
Linus Torvalds bc23416cd4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda-intel - fix wallclk variable update and condition
  ALSA: asihpi - Fix uninitialized variable
  ALSA: hda: Use LPIB for ASUS M2V
  usb/gadget: Replace the old USB audio FU definitions in f_audio.c
  ASoC: MX31ads sound support should depend on MACH_MX31ADS_WM1133_EV1
  ASoC: Add missing Kconfig entry for Phytec boards
  ALSA: usb-audio: export UAC2 clock selectors as mixer controls
  ALSA: usb-audio: clean up find_audio_control_unit()
  ALSA: usb-audio: add UAC2 sepecific Feature Unit controls
  ALSA: usb-audio: unify constants from specification
  ALSA: usb-audio: parse clock topology of UAC2 devices
  ALSA: usb-audio: fix selector unit string index accessor
  include/linux/usb/audio-v2.h: add more UAC2 details
  ALSA: usb-audio: support partially write-protected UAC2 controls
  ALSA: usb-audio: UAC2: clean up parsing of bmaControls
  ALSA: hda: Use LPIB for another mainboard
  ALSA: hda: Use mb31 quirk for an iMac model
  ALSA: hda: Use LPIB for an ASUS device
2010-06-04 09:48:03 -07:00
Justin P. Mattock 724df61592 fix comment typo in netdevice.h
Fix missing "of" in comment.

 Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-06-04 16:03:34 +02:00
Linus Torvalds ad8456361f Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: implement on-demand HPA unlocking
  libata: use the enlarged capacity after late HPA unlock
  SCSI: implement sd_unlock_native_capacity()
  libata-sff: trivial corrections to Kconfig help text
  sata_nv: don't diddle with nIEN on mcp55
  sata_via: magic vt6421 fix for transmission problems w/ WD drives
2010-06-03 15:48:15 -07:00
Linus Torvalds f150dba6d4 Merge branch 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix crash in swevents
  perf buildid-list: Fix --with-hits event processing
  perf scripts python: Give field dict to unhandled callback
  perf hist: fix objdump output parsing
  perf-record: Check correct pid when forking
  perf: Do the comm inheritance per thread in event__process_task
  perf: Use event__process_task from perf sched
  perf: Process comm events by tid
  blktrace: Fix new kernel-doc warnings
  perf_events: Fix unincremented buffer base on partial copy
  perf_events: Fix event scheduling issues introduced by transactional API
  perf_events, trace: Fix perf_trace_destroy(), mutex went missing
  perf_events, trace: Fix probe unregister race
  perf_events: Fix races in group composition
  perf_events: Fix races and clean up perf_event and perf_mmap_data interaction
2010-06-03 15:45:26 -07:00
Linus Torvalds 1067b6c2be Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (41 commits)
  drm/radeon/kms: make sure display hw is disabled when suspending
  drm/vmwgfx: Allow userspace to change default layout. Bump minor.
  drm/vmwgfx: Fix framebuffer modesetting
  drm/vmwgfx: Fix vga save / restore with display topology.
  vgaarb: use MIT license
  vgaarb: convert pr_devel() to pr_debug()
  drm: fix typos in Linux DRM Developer's Guide
  drm/radeon/kms/pm: voltage fixes
  drm/radeon/kms/pm: radeon_set_power_state fixes
  drm/radeon/kms/pm: patch default power state with default clocks/voltages on r6xx+
  drm/radeon/kms/pm: enable SetVoltage on r7xx/evergreen
  drm/radeon/kms/pm: add support for SetVoltage cmd table (V2)
  drm/radeon/kms/evergreen: add initial CS parser
  drm/kms: disable/enable poll around switcheroo on/off
  drm/nouveau: fixup confusion over which handle the DSM is hanging off.
  drm/nouveau: attempt to get bios from ACPI v3
  drm/nv50: cast IGP memory location to u64 before shifting
  drm/ttm: Fix ttm_page_alloc.c
  drm/ttm: Fix cached TTM page allocation.
  drm/vmwgfx: Remove some leftover debug messages.
  ...
2010-06-03 07:19:45 -07:00
Jens Axboe ff9da691c0 pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface
This changes the interface to be based on bytes instead. The API
matches that of F_SETPIPE_SZ in that it rounds up the passed in
size so that the resulting page array is a power-of-2 in size.

The proc file is renamed to /proc/sys/fs/pipe-max-size to
reflect this change.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-03 14:54:39 +02:00
Eric Dumazet bc10502dba net: use __packed annotation
cleanup patch.

Use new __packed annotation in net/ and include/
(except netfilter)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-03 03:21:52 -07:00
Eric Dumazet b5f7e75547 ipv4: add LINUX_MIB_IPRPFILTER snmp counter
Christoph Lameter mentioned that packets could be dropped in input path
because of rp_filter settings, without any SNMP counter being
incremented. System administrator can have a hard time to track the
problem.

This patch introduces a new counter, LINUX_MIB_IPRPFILTER, incremented
each time we drop a packet because Reverse Path Filter triggers.

(We receive an IPv4 datagram on a given interface, and find the route to
send an answer would use another interface)

netstat -s | grep IPReversePathFilter
    IPReversePathFilter: 21714

Reported-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-03 03:18:19 -07:00
Tiago Vignatti c0db9cbc73 vgaarb: use MIT license
Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>
Cc: Henry Zhao <Henry.Zhao@Sun.COM>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-06-03 13:13:34 +10:00
Walter Goldens 77c2061d10 wireless: fix several minor description typos
Signed-off-by: Walter Goldens <goldenstranger@yahoo.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-02 16:13:18 -04:00
Tejun Heo d8d9129ea2 libata: implement on-demand HPA unlocking
Implement ata_scsi_unlock_native_capacity() which will be called
through SCSI layer when block layer notices that partitions on a
device extend beyond the end of the device.  It requests EH to unlock
HPA, waits for completion and returns the current device capacity.

This allows libata to unlock HPA on demand instead of having to decide
whether to unlock upfront.  Unlocking on demand is safer than
unlocking by upfront because some BIOSes write private data to the
area beyond HPA limit.  This was suggested by Ben Hutchings.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-06-02 13:50:10 -04:00
Jiri Pirko ab95bfe01f net: replace hooks in __netif_receive_skb V5
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 07:11:15 -07:00
Scott McMillan 614f60fa9d packet_mmap: expose hw packet timestamps to network packet capture utilities
This patch adds a setting, PACKET_TIMESTAMP, to specify the packet
timestamp source that is exported to capture utilities like tcpdump by
packet_mmap.

PACKET_TIMESTAMP accepts the same integer bit field as
SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE and
SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by
PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.

If PACKET_TIMESTAMP is not set, a software timestamp generated inside
the networking stack is used (the behavior before this setting was
added).

Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 05:53:56 -07:00
Takashi Iwai c7a441bba9 Merge branch 'fix/hda' into for-linus 2010-06-02 14:18:06 +02:00
Eric Dumazet c2d9ba9bce net: CONFIG_NET_NS reduction
Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 05:16:23 -07:00
Linus Torvalds 1f73897861 Merge branch 'for-35' of git://repo.or.cz/linux-kbuild
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
  kbuild: Revert part of e8d400a to resolve a conflict
  kbuild: Fix checking of scm-identifier variable
  gconfig: add support to show hidden options that have prompts
  menuconfig: add support to show hidden options which have prompts
  gconfig: remove show_debug option
  gconfig: remove dbg_print_ptype() and dbg_print_stype()
  kconfig: fix zconfdump()
  kconfig: some small fixes
  add random binaries to .gitignore
  kbuild: Include gen_initramfs_list.sh and the file list in the .d file
  kconfig: recalc symbol value before showing search results
  .gitignore: ignore *.lzo files
  headerdep: perlcritic warning
  scripts/Makefile.lib: Align the output of LZO
  kbuild: Generate modules.builtin in make modules_install
  Revert "kbuild: specify absolute paths for cscope"
  kbuild: Do not unnecessarily regenerate modules.builtin
  headers_install: use local file handles
  headers_check: fix perl warnings
  export_report: fix perl warnings
  ...
2010-06-01 08:55:52 -07:00
Jens Axboe b4ca761577 Merge branch 'master' into for-linus
Conflicts:
	fs/pipe.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 12:42:12 +02:00
Jens Axboe 28f4197e5d block: disable preemption before using sched_clock()
Commit 9195291e5f added calls to
sched_clock() from preemptible code. sched_clock() is both the
wrong interface AND cannot be called without preempt disabled.

Apply a temporary fix to get rid of the warnings, a real patch
is in the works.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 12:23:18 +02:00
Philipp Reisner 099c5c310e Preparing 8.3.8rc2
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:28 +02:00
Jens Axboe 0e3c9a2284 Revert "writeback: fix WB_SYNC_NONE writeback from umount"
This reverts commit e913fc825d.

We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.

Conflicts:

	fs/fs-writeback.c
	mm/page-writeback.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:08:43 +02:00
David S. Miller 5953a30347 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-05-31 23:44:57 -07:00
Daniel Mack 65f25da44b ALSA: usb-audio: unify constants from specification
Move more definitions from private enums to appropriate header files.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-05-31 18:17:22 +02:00
Daniel Mack 7176d37a28 ALSA: usb-audio: fix selector unit string index accessor
This is another regression from the UAC2 code refactoring.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-05-31 18:16:31 +02:00
Daniel Mack 5dd360ebd8 include/linux/usb/audio-v2.h: add more UAC2 details
Also, remove the 'bmControl' field from uac_clock_selector_descriptor,
which was at the wrong offset. This struct is currently unused.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-05-31 18:16:14 +02:00
Daniel Mack dcbe7bcfa3 ALSA: usb-audio: UAC2: clean up parsing of bmaControls
Introduce two new static inline functions for a more readable parsing
of UAC2 bmaControls.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-05-31 18:15:45 +02:00
Eric Dumazet 7489aec8ee netfilter: xtables: stackptr should be percpu
commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant)
introduced a performance regression, because stackptr array is shared by
all cpus, adding cache line ping pongs. (16 cpus share a 64 bytes cache
line)

Fix this using alloc_percpu()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-By: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-31 16:41:35 +02:00
David S. Miller 64960848ab Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-05-31 05:46:45 -07:00
David S. Miller 38117d1495 net: Fix NETDEV_NOTIFY_PEERS to not conflict with NETDEV_BONDING_DESLAVE.
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:28:35 -07:00
Ian Campbell 06c4648d46 arp_notify: allow drivers to explicitly request a notification event.
Currently such notifications are only generated when the device comes up or the
address changes. However one use case for these notifications is to enable
faster network recovery after a virtual machine migration (by causing switches
to relearn their MAC tables). A migration appears to the network stack as a
temporary loss of carrier and therefore does not trigger either of the current
conditions. Rather than adding carrier up as a trigger (which can cause issues
when interfaces a flapping) simply add an interface which the driver can use
to explicitly trigger the notification.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:27:44 -07:00
David S. Miller 92b4522f72 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-05-31 00:10:35 -07:00
Peter Zijlstra 8a49542c05 perf_events: Fix races in group composition
Group siblings don't pin each-other or the parent, so when we destroy
events we must make sure to clean up all cross referencing pointers.

In particular, for destruction of a group leader we must be able to
find all its siblings and remove their reference to it.

This means that detaching an event from its context must not detach it
from the group, otherwise we can end up failing to clear all pointers.

Solve this by clearly separating the attachment to a context and
attachment to a group, and keep the group composed until we destroy
the events.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-31 08:46:09 +02:00
Peter Zijlstra ac9721f3f5 perf_events: Fix races and clean up perf_event and perf_mmap_data interaction
In order to move toward separate buffer objects, rework the whole
perf_mmap_data construct to be a more self-sufficient entity, one
with its own lifetime rules.

This greatly sanitizes the whole output redirection code, which
was riddled with bugs and races.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-31 08:46:08 +02:00
Magnus Damm 8a768952ca sh: add boot code to MMCIF driver header
This patch adds a set of MMCIF functions for the romImage
boot loader that allows the kernel to be booted directly
from an MMC card.

Thanks to Jeremy Baker for the initial prototype.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-05-31 13:11:47 +09:00
Magnus Damm 487d9fc501 sh: prepare MMCIF driver header file
Update the MMCIF driver to include register information
and register access functions in the header file.
The MMCIF boot code builds on top of this.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-05-31 13:11:41 +09:00
Paul Mundt 8fa76f7e61 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2010-05-31 12:59:19 +09:00
Linus Torvalds 3b03117c5c Merge branch 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  SLUB: Allow full duplication of kmalloc array for 390
  slub: move kmem_cache_node into it's own cacheline
2010-05-30 12:46:17 -07:00
Linus Torvalds 003386fff3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  mm: export generic_pipe_buf_*() to modules
  fuse: support splice() reading from fuse device
  fuse: allow splice to move pages
  mm: export remove_from_page_cache() to modules
  mm: export lru_cache_add_*() to modules
  fuse: support splice() writing to fuse device
  fuse: get page reference for readpages
  fuse: use get_user_pages_fast()
  fuse: remove unneeded variable
2010-05-30 09:16:14 -07:00
Linus Torvalds 17d30ac077 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (47 commits)
  mfd: Rename twl5031 sih modules
  mfd: Storage class for timberdale should be before const qualifier
  mfd: Remove unneeded and dangerous clearing of clientdata
  mfd: New AB8500 driver
  gpio: Fix inverted rdc321x gpio data out registers
  mfd: Change rdc321x resources flags to IORESOURCE_IO
  mfd: Move pcf50633 irq related functions to its own file.
  mfd: Use threaded irq for pcf50633
  mfd: pcf50633-adc: Fix potential race in pcf50633_adc_sync_read
  mfd: Fix pcf50633 bitfield logic in interrupt handler
  gpio: rdc321x needs to select MFD_CORE
  mfd: Use menuconfig for quicker config editing
  ARM: AB3550 board configuration and irq for U300
  mfd: AB3550 core driver
  mfd: AB3100 register access change to abx500 API
  mfd: Renamed ab3100.h to abx500.h
  gpio: Add TC35892 GPIO driver
  mfd: Add Toshiba's TC35892 MFD core
  mfd: Delay to mask tsc irq in max8925
  mfd: Remove incorrect wm8350 kfree
  ...
2010-05-30 09:13:08 -07:00
Linus Torvalds e38c1e54ce Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  DMAENGINE: DMA40 U8500 platform configuration
  DMA: PL330: Add dma api driver
2010-05-30 09:12:43 -07:00
Linus Torvalds d28619f156 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Convert quota statistics to generic percpu_counter
  ext3 uses rb_node = NULL; to zero rb_root.
  quota: Fixup dquot_transfer
  reiserfs: Fix resuming of quotas on remount read-write
  pohmelfs: Remove dead quota code
  ufs: Remove dead quota code
  udf: Remove dead quota code
  quota: rename default quotactl methods to dquot_
  quota: explicitly set ->dq_op and ->s_qcop
  quota: drop remount argument to ->quota_on and ->quota_off
  quota: move unmount handling into the filesystem
  quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
  quota: move remount handling into the filesystem
  ocfs2: Fix use after free on remount read-only

Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c
2010-05-30 09:11:11 -07:00
Randy Dunlap 97ef6f7449 rapidio: fix new kernel-doc warnings
Fix a bunch of new rapidio kernel-doc warnings:

Warning(include/linux/rio.h:123): No description found for parameter 'comp_tag'
Warning(include/linux/rio.h:123): No description found for parameter 'phys_efptr'
Warning(include/linux/rio.h:123): No description found for parameter 'em_efptr'
Warning(include/linux/rio.h:123): No description found for parameter 'pwcback'
Warning(include/linux/rio.h:247): No description found for parameter 'set_domain'
Warning(include/linux/rio.h:247): No description found for parameter 'get_domain'
Warning(drivers/rapidio/rio-scan.c:1133): No description found for parameter 'rdev'
Warning(drivers/rapidio/rio-scan.c:1133): Excess function parameter 'port' description in 'rio_init_em'
Warning(drivers/rapidio/rio.c:349): No description found for parameter 'rdev'
Warning(drivers/rapidio/rio.c:349): Excess function parameter 'mport' description in 'rio_request_inb_pwrite'
Warning(drivers/rapidio/rio.c:393): No description found for parameter 'port'
Warning(drivers/rapidio/rio.c:393): No description found for parameter 'local'
Warning(drivers/rapidio/rio.c:393): No description found for parameter 'destid'
Warning(drivers/rapidio/rio.c:393): No description found for parameter 'hopcount'
Warning(drivers/rapidio/rio.c:393): Excess function parameter 'rdev' description in 'rio_mport_get_physefb'
Warning(drivers/rapidio/rio.c:845): Excess function parameter 'local' description in 'rio_std_route_clr_table'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-30 09:02:47 -07:00
Linus Torvalds 35926ff5fb Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"
This reverts commit 0ac0c0d0f8, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.

Requested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-30 09:00:03 -07:00
Linus Torvalds b612a05537 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: clean up on forwarded aborted mds request
  ceph: fix leak of osd authorizer
  ceph: close out mds, osd connections before stopping auth
  ceph: make lease code DN specific
  fs/ceph: Use ERR_CAST
  ceph: renew auth tickets before they expire
  ceph: do not resend mon requests on auth ticket renewal
  ceph: removed duplicated #includes
  ceph: avoid possible null dereference
  ceph: make mds requests killable, not interruptible
  sched: add wait_for_completion_killable_timeout
2010-05-30 08:56:39 -07:00
Christoph Lameter 0f1f694260 SLUB: Allow full duplication of kmalloc array for 390
Commit 756dee7587 ("SLUB: Get rid of dynamic DMA
kmalloc cache allocation") makes S390 run out of kmalloc caches.  Increase the
number of kmalloc caches to a safe size.

Cc: <stable@kernel.org> [ .33 and .34 ]
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-05-30 13:02:08 +03:00
Linus Torvalds 52b0ace7df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (26 commits)
  ALSA: snd-usb-caiaq: Bump version number to 1.3.21
  ALSA: Revert "ALSA: snd-usb-caiaq: Set default input mode of A4DJ"
  ALSA: snd-usb-caiaq: Simplify single case to an 'if'
  ALSA: snd-usb-caiaq: Restore 'Control vinyl' input mode on A4DJ
  ALSA: hda: Use LPIB for a Shuttle device
  ALSA: hda: Add support for another Lenovo ThinkPad Edge in conexant codec
  ALSA: hda: Use LPIB for Sony VPCS11V9E
  ALSA: usb-audio: fix feature unit parser for UAC2
  ALSA: asihpi - Minor code cleanup
  ALSA: asihpi - Add support for new ASI8800 family
  ALSA: asihpi - Fix bug preventing outstream_write preload from happening
  ALSA: asihpi - Fix imbalanced lock path in hw_message
  ALSA: asihpi - Remove support for old ASI8800 family
  ALSA: asihpi - Add hd radio blend functions
  ALSA: asihpi - Remove unused io map functions
  ALSA: usb-audio: add support for UAC2 pitch control
  ALSA: usb-audio: parse UAC2 endpoint descriptors correctly
  ALSA: usb-audio: fix return values
  ALSA: usb-audio: parse more format descriptors with structs
  sound: Add missing spin_unlock
  ...
2010-05-29 15:31:57 -07:00
Sage Weil 0aa12fb439 sched: add wait_for_completion_killable_timeout
Add missing _killable_timeout variant for wait_for_completion that will
return when a timeout expires or the task is killed.

CC: Ingo Molnar <mingo@elte.hu>
CC: Andreas Herrmann <andreas.herrmann3@amd.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 09:12:30 -07:00
Changli Gao 5b0daa3474 skb: make skb_recycle_check() return a bool value
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-29 00:12:13 -07:00
Linus Torvalds e4f2e5eaac Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: native hardware cpuidle driver for latest Intel processors
  ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
  acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING
  sched: clarify commment for TS_POLLING
  ACPI: allow a native cpuidle driver to displace ACPI
  cpuidle: make cpuidle_curr_driver static
  cpuidle: add cpuidle_unregister_driver() error check
  cpuidle: fail to register if !CONFIG_CPU_IDLE
2010-05-28 16:14:17 -07:00
Linus Torvalds 9a90e09854 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits)
  ACPI: Don't let acpi_pad needlessly mark TSC unstable
  drivers/acpi/sleep.h: Checkpatch cleanup
  ACPI: Minor cleanup eliminating redundant PMTIMER_TICKS to NS conversion
  ACPI: delete unused c-state promotion/demotion data strucutures
  ACPI: video: fix acpi_backlight=video
  ACPI: EC: Use kmemdup
  drivers/acpi: use kasprintf
  ACPI, APEI, EINJ injection parameters support
  Add x64 support to debugfs
  ACPI, APEI, Use ERST for persistent storage of MCE
  ACPI, APEI, Error Record Serialization Table (ERST) support
  ACPI, APEI, Generic Hardware Error Source memory error support
  ACPI, APEI, UEFI Common Platform Error Record (CPER) header
  Unified UUID/GUID definition
  ACPI Hardware Error Device (PNP0C33) support
  ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup
  ACPI, APEI, Document for APEI
  ACPI, APEI, EINJ support
  ACPI, APEI, HEST table parsing
  ACPI, APEI, APEI supporting infrastructure
  ...
2010-05-28 14:42:18 -07:00
Len Brown 91dd696439 Merge branch 'acpi_enable' into release 2010-05-28 16:17:27 -04:00
Linus Torvalds 89ad6a6173 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  remove detritus left by "mm: make read_cache_page synchronous"
  fix fs/sysv s_dirt handling
  fat: convert to use the new truncate convention.
  ext2: convert to use the new truncate convention.
  tmpfs: convert to use the new truncate convention
  fs: convert simple fs to new truncate
  kill spurious reference to vmtruncate
  fs: introduce new truncate sequence
  fs/super: fix kernel-doc warning
  fs/minix: bugfix, number of indirect block ptrs per block depends on block size
  rename the generic fsync implementations
  drop unused dentry argument to ->fsync
  fs: Add missing mutex_unlock
  Fix racy use of anon_inode_getfd() in perf_event.c
  get rid of the magic around f_count in aio
  VFS: fix recent breakage of FS_REVAL_DOT
  Revert "anon_inode: set S_IFREG on the anon_inode"
2010-05-28 10:07:48 -07:00
npiggin@suse.de 7bb46a6734 fs: introduce new truncate sequence
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:15:33 -04:00
Christoph Hellwig 1b061d9247 rename the generic fsync implementations
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:06:06 -04:00
Christoph Hellwig 7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Al Viro d7065da038 get rid of the magic around f_count in aio
__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req->ki_file *always* contributes to refcount
now.  And __fput() became static.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:03:07 -04:00
Linus Torvalds aa36c7bf98 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: implement dump_id force param
  libata: disable ATAPI AN by default
  libata-sff: make BMDMA optional
  libata-sff: kill dummy BMDMA ops from sata_qstor and pata_octeon_cf
  libata-sff: separate out BMDMA init
  libata-sff: separate out BMDMA irq handler
  libata-sff: ata_sff_irq_clear() is BMDMA specific
  sata_mv: drop unncessary EH callback resetting
2010-05-27 18:34:58 -07:00
Len Brown 752138df0d cpuidle: make cpuidle_curr_driver static
cpuidle_register_driver() sets cpuidle_curr_driver
cpuidle_unregister_driver() clears cpuidle_curr_driver

We should't expose cpuidle_curr_driver to
potential modification except via these interfaces.
So make it static and create cpuidle_get_driver() to observe it.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-27 21:06:58 -04:00
Rabin Vincent 62579266cf mfd: New AB8500 driver
Add a new driver to support the AB8500 Power Management chip, replacing
the current AB4500.  The new driver replaces the old one, instead of an
incremental modification, because this is a substantial overhaul
including:

 - Split of the driver into -core and -spi portions, to allow another
   interface layer to be added

 - Addition of interrupt support

 - Switch to MFD core API for handling subdevices

 - Simplification of the APIs to remove a redundant block parameter

 - Rename of the APIs and macros from ab4500_* to ab8500_*

 - Rename of the files from ab4500* to ab8500*

 - Change of the driver name from ab4500 to ab8500

Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:38:00 +02:00
Mattias Wallin fa661258a2 mfd: AB3100 register access change to abx500 API
The interface for the AB3100 is changed to make way for the
ABX500 family of chips: AB3550, AB5500 and future ST-Ericsson
Analog Baseband chips. The register access functions are moved
out to a separate struct abx500_ops. In this way the interface
is moved from the implementation and the sub functionality drivers
can keep their interface intact when chip infrastructure and
communication mechanisms changes. We also define the AB3550
device IDs and the AB3550 platform data struct and convert
the catenated 32bit event to an array of 3 x 8bits.

Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:45 +02:00
Linus Walleij 812f9e9d42 mfd: Renamed ab3100.h to abx500.h
The goal here is to make way for a more general interface for the
analog baseband chips ab3100 ab3550 ab550 and future chips.

This patch have been divided into two parts since both changing name
and content of a file is not recommended in git.

Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:44 +02:00
Rabin Vincent b4ecd326b7 mfd: Add Toshiba's TC35892 MFD core
The TC35892 I/O Expander provides 24 GPIOs, a keypad controller, timers,
and a rotator wheel interface.  This patch adds the MFD core.

Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:42 +02:00
Mark Brown b03b4d7cdd mfd: Ensure WM831x charger interrupts are acknowledged when suspending
The charger interrupts on the WM831x are unconditionally a wake source
for the system. If the power driver is not able to monitor them (for
example, due to the IRQ line not having been wired up on the system)
then any charger interrupt will prevent the system suspending for any
meaningful amount of time since nothing will ack them.

Avoid this issue by manually acknowledging these interrupts when we
suspend the WM831x core device if they are masked. If software is
actually using the interrupts then they will be unmasked and this
change will have no effect.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:39 +02:00
Todd Fischer 7525996670 input: Touchscreen driver for TPS6507x
Add touch screen input driver for TPS6507x family of multi-function
chips.  Uses the TPS6507x MFD driver.  No interrupt support due to
testing limitations of current hardware.

Signed-off-by: Todd Fischer <todd.fischer@ridgerun.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:38 +02:00
Todd Fischer 31dd6a2672 mfd: Add TPS6507x support
TPS6507x are multi function (PM, touchscreen) chipsets from TI.
This commit also changes the corresponding regulator driver from being
standalone to an MFD subdevice.

Signed-off-by: Todd Fischer <todd.fischer@ridgerun.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:38 +02:00
Todd Fischer 0bc20bba35 mfd: Add tps6507x board data structure
Add mfd structure which refrences sub-driver initialization data. For example,
for a giving hardware implementation, the voltage regulator sub-driver
initialization data provides the mapping betten a voltage regulator and what
the output voltage is being used for.

Signed-off-by: Todd Fischer <todd.fischer@ridgerun.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:37 +02:00
Todd Fischer d183fcc975 mfd: Move TPS6507x register definition to header file.
Other sub-drivers for the TPS6507x chip will need to use register
definition so move it out of the source file and into a header file.

Signed-off-by: Todd Fischer <todd.fischer@ridgerun.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:36 +02:00
Ira W. Snyder bd3581323c mfd: Janz CMOD-IO PCI MODULbus Carrier Board support
The Janz CMOD-IO PCI MODULbus carrier board is a PCI to MODULbus bridge,
which may host many different types of MODULbus daughterboards, including
CAN and GPIO controllers.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Reviewed-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:32 +02:00
Henrik Kretzschmar 872c1b14e7 mfd: Section cleanup of 88pm860x driver
This patch fixes three section mismatches.

WARNING: drivers/mfd/88pm860x.o(.text+0x12): Section mismatch in
reference from the function pm860x_device_exit() to the function
.devexit.text:device_irq_exit()
The function pm860x_device_exit() references a function in an exit
section.
Often the function device_irq_exit() has valid usage outside the exit
section
and the fix is to remove the __devexit annotation of device_irq_exit.

WARNING: drivers/mfd/88pm860x.o(.text+0xb0): Section mismatch in
reference from the function pm860x_device_init() to the function
.devinit.text:device_8606_init()
The function pm860x_device_init() references
the function __devinit device_8606_init().
This is often because pm860x_device_init lacks a __devinit
annotation or the annotation of device_8606_init is wrong.

WARNING: drivers/mfd/88pm860x.o(.text+0xbe): Section mismatch in
reference from the function pm860x_device_init() to the function
.devinit.text:device_8607_init()
The function pm860x_device_init() references
the function __devinit device_8607_init().
This is often because pm860x_device_init lacks a __devinit
annotation or the annotation of device_8607_init is wrong.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:31 +02:00
Florian Fainelli e090d506c3 mfd: Add support for the RDC321x southbridge
This patch adds a new MFD driver for the RDC321x southbridge. This southbridge
is always present in the RDC321x System-on-a-Chip and provides access to some
GPIOs as well as a watchdog. Access to these two functions is done using the
southbridge PCI device configuration space.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-05-28 01:37:28 +02:00
Linus Torvalds c5617b200a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
  tracing: Add __used annotation to event variable
  perf, trace: Fix !x86 build bug
  perf report: Support multiple events on the TUI
  perf annotate: Fix up usage of the build id cache
  x86/mmiotrace: Remove redundant instruction prefix checks
  perf annotate: Add TUI interface
  perf tui: Remove annotate from popup menu after failure
  perf report: Don't start the TUI if -D is used
  perf: Fix getline undeclared
  perf: Optimize perf_tp_event_match()
  perf: Remove more code from the fastpath
  perf: Optimize the !vmalloc backed buffer
  perf: Optimize perf_output_copy()
  perf: Fix wakeup storm for RO mmap()s
  perf-record: Share per-cpu buffers
  perf-record: Remove -M
  perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
  perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
  perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
  perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
  ...
2010-05-27 15:23:47 -07:00
Linus Torvalds cad719d86e Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
  gta02: Use pcf50633 backlight driver instead of platform backlight driver.
  backlight: pcf50633: Register a pcf50633-backlight device in pcf50633 core driver.
  backlight: Add pcf50633 backlight driver
  backlight: 88pm860x_bl: fix error handling in pm860x_backlight_probe
  backlight: max8925_bl: Fix error handling path
  backlight: l4f00242t03: fix error handling in l4f00242t03_probe
  backlight: add S6E63M0 AMOLED LCD Panel driver
  backlight: adp8860: add support for ADP8861 & ADP8863
  backlight: mbp_nvidia_bl - Fix DMI_SYS_VENDOR for MacBook1,1
  backlight: Add Cirrus EP93xx backlight driver
  backlight: l4f00242t03: Fix regulators handling code in remove function
  backlight: fix adp8860_bl build errors
  backlight: new driver for the ADP8860 backlight parts
  backlight: 88pm860x_bl - potential memory leak
  backlight: mbp_nvidia_bl - add support for older MacBookPro and MacBook 6,1.
  backlight: Kconfig cleanup
  backlight: backlight_device_register() return ERR_PTR()
2010-05-27 11:34:55 -07:00
Linus Torvalds 3ddab4788d Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
  leds: Add mx31moboard MC13783 led support
  leds: Add mc13783 LED support
  leds: leds-ss4200: fix led_classdev_unregister twice in error handling
  leds: leds-lp3944: properly handle lp3944_configure fail in lp3944_probe
  leds: led-class: set permissions on max_brightness file to 0444
  leds: leds-gpio: Change blink_set callback to be able to turn off blinking
  leds: Add LED driver for the Soekris net5501 board
  leds: 88pm860x - fix checking in probe function
2010-05-27 11:34:20 -07:00
Linus Torvalds d1e0fe252e Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: (23 commits)
  hwmon: (lm75) Add support for the Texas Instruments TMP105
  hwmon: (ltc4245) Read only one GPIO pin
  hwmon: (dme1737) Add SCH5127 support
  hwmon: (tmp102) Don't always stop chip at exit
  hwmon: (tmp102) Fix suspend and resume functions
  hwmon: (tmp102) Various fixes
  hwmon: Driver for TI TMP102 temperature sensor
  hwmon: EMC1403 thermal sensor support
  hwmon: (applesmc) Add temperature sensor labels to sysfs interface
  hwmon: (applesmc) Add generic support for MacBook Pro 7
  hwmon: (applesmc) Add generic support for MacBook Pro 6
  hwmon: (applesmc) Add support for MacBook Pro 5,3 and 5,4
  hwmon: (tmp401) Reorganize code to get rid of static forward declarations
  hwmon: (tmp401) Use constants for sysfs file permissions
  hwmon: (adm1031) Allow setting update rate
  hwmon: Add description of the update_rate sysfs attribute
  hwmon: (lm90) Use programmed update rate
  hwmon: (f71882fg) Acquire I/O regions while we're working with them
  hwmon: (f71882fg) Code cleanup
  hwmon: (f71882fg) Use strict_stro(l|ul) instead of simple_strto$1
  ...
2010-05-27 11:33:46 -07:00
Jean Delvare 70dd6beac0 hwmon: (asus_atk0110) Don't load if ACPI resources aren't enforced
When the user passes the kernel parameter acpi_enforce_resources=lax,
the ACPI resources are no longer protected, so a native driver can
make use of them. In that case, we do not want the asus_atk0110 to be
loaded. Unfortunately, this driver loads automatically due to its
MODULE_DEVICE_TABLE, so the user ends up with two drivers loaded for
the same device - this is bad.

So I suggest that we prevent the asus_atk0110 driver from loading if
acpi_enforce_resources=lax.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Luca Tettamanti <kronos.it@gmail.com>
Cc: Len Brown <lenb@kernel.org>
2010-05-27 19:58:37 +02:00
Linus Torvalds 4e455c6782 Merge branch 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6
* 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6:
  SFI: add sysfs interface for SFI tables.
  SFI: add support for v0.81 spec
2010-05-27 10:47:41 -07:00
Linus Torvalds 105a048a4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits)
  Btrfs: add more error checking to btrfs_dirty_inode
  Btrfs: allow unaligned DIO
  Btrfs: drop verbose enospc printk
  Btrfs: Fix block generation verification race
  Btrfs: fix preallocation and nodatacow checks in O_DIRECT
  Btrfs: avoid ENOSPC errors in btrfs_dirty_inode
  Btrfs: move O_DIRECT space reservation to btrfs_direct_IO
  Btrfs: rework O_DIRECT enospc handling
  Btrfs: use async helpers for DIO write checksumming
  Btrfs: don't walk around with task->state != TASK_RUNNING
  Btrfs: do aio_write instead of write
  Btrfs: add basic DIO read/write support
  direct-io: do not merge logically non-contiguous requests
  direct-io: add a hook for the fs to provide its own submit_bio function
  fs: allow short direct-io reads to be completed via buffered IO
  Btrfs: Metadata ENOSPC handling for balance
  Btrfs: Pre-allocate space for data relocation
  Btrfs: Metadata ENOSPC handling for tree log
  Btrfs: Metadata reservation for orphan inodes
  Btrfs: Introduce global metadata reservation
  ...
2010-05-27 10:43:44 -07:00
Linus Torvalds e4ce30f377 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
  ext4: Make fsync sync new parent directories in no-journal mode
  ext4: Drop whitespace at end of lines
  ext4: Fix compat EXT4_IOC_ADD_GROUP
  ext4: Conditionally define compat ioctl numbers
  tracing: Convert more ext4 events to DEFINE_EVENT
  ext4: Add new tracepoints to track mballoc's buddy bitmap loads
  ext4: Add a missing trace hook
  ext4: restart ext4_ext_remove_space() after transaction restart
  ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
  ext4: Avoid crashing on NULL ptr dereference on a filesystem error
  ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
  ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
  ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
  ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
  ext4: Use our own write_cache_pages()
  ext4: Show journal_checksum option
  ext4: Fix for ext4_mb_collect_stats()
  ext4: check for a good block group before loading buddy pages
  ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
  ext4: Remove extraneous newlines in ext4_msg() calls
  ...

Fixed up trivial conflict in fs/ext4/fsync.c
2010-05-27 10:26:37 -07:00
Linus Torvalds 55ddf14b04 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  ieee1394: schedule for removal
  firewire: core: use separate timeout for each transaction
  firewire: core: Fix tlabel exhaustion problem
  firewire: core: make transaction label allocation more robust
  firewire: core: clean up config ROM related defined constants
  ieee1394: mark char device files as not seekable
  firewire: cdev: mark char device files as not seekable
  firewire: ohci: cleanups and fix for nonstandard build without debug facility
  firewire: ohci: wait for PHY register accesses to complete
  firewire: ohci: fix up configuration of TI chips
  firewire: ohci: enable 1394a enhancements
  firewire: ohci: do not clear PHY interrupt status inadvertently
  firewire: ohci: add a function for reading PHY registers

Trivial conflicts in Documentation/feature-removal-schedule.txt
2010-05-27 10:22:06 -07:00
Dmitry Monakhov f32764bd2b quota: Convert quota statistics to generic percpu_counter
Generic per-cpu counter has some memory overhead but it is negligible for
modern systems and embedded systems compile without quota support.  And code
reuse is a good thing. This patch should fix complain from preemptive kernels
which was introduced by dde9588853.

[Jan Kara: Fixed patch to work on 32-bit archs as well]

Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-27 18:56:27 +02:00
Linus Torvalds 7eb1053fd0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: usbtouchscreen - support bigger iNexio touchscreens
  Input: ads7846 - return error on regulator_get() failure
  Input: twl4030-vibra - correct the power down sequence
  Input: enable onkey driver of max8925
  Input: use ABS_CNT rather than (ABS_MAX + 1)
2010-05-27 09:19:55 -07:00
Lee Schermerhorn 7aac789885 numa: introduce numa_mem_id()- effective local memory node id
Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "nearest node with memory" for archs that support memoryless
nodes.

Define API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES
defined, else stubs.  Architectures will define HAVE_MEMORYLESS_NODES
if/when they support them.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpus'] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist
rebuild.  Archs that support memoryless nodes will need to initialize
'numa_mem' for secondary cpus as they're brought on-line.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:57 -07:00
Lee Schermerhorn 7281201922 numa: add generic percpu var numa_node_id() implementation
Rework the generic version of the numa_node_id() function to use the new
generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implemention will default this option to 'y'
when NUMA is configured.  This config option could be removed if/when all
archs switch over to the generic percpu implementation of numa_node_id().
Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
     when NUMA is configured.  This is required because I have
     retained the old implementation by default to allow archs to
     be modified incrementally, as desired.

Subsequent patches will convert x86_64 and ia64 to use this implemenation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:57 -07:00
jan Blunck ae6afc3f5c vfs: introduce noop_llseek()
This is an implementation of ->llseek useable for the rare special case
when userspace expects the seek to succeed but the (device) file is
actually not able to perform the seek.  In this case you use noop_llseek()
instead of falling back to the default implementation of ->llseek.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:56 -07:00
Jeff Moyer 9d85cba718 aio: fix the compat vectored operations
The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O.  This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.

A variant of this was tested by Michael Tokarev.  I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system).  All seems happy, but extra eyes on this would be
welcome.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org>		[2.6.35.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:53 -07:00
Jeff Moyer b83733639a compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev
It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and
writev AIO operations were not functioning properly.  It turns out that
the code to convert the 32bit io vectors to 64 bits was never written.
The results of that can be pretty bad, but in my testing, it mostly ended
up in generating EFAULT as we walked off the list of I/O vectors provided.

This patch set fixes the problem in my environment.  are greatly
appreciated.

This patch:

Factor out code that will be used by both compat_do_readv_writev and the
compat aio submission code paths.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org>		[2.6.35.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:53 -07:00
FUJITA Tomonori 99d1bd2c13 dma-mapping: remove deprecated dma_sync_single and dma_sync_sg API
Since 2.6.5, it had been commented, 'for backwards compatibility,
removed in 2.7.x'. Since 2.6.31, it have been marked as __deprecated.

I think that we can remove the API safely now.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:53 -07:00
FUJITA Tomonori 5fd75a7850 dma-mapping: remove unnecessary sync_single_range_* in dma_map_ops
sync_single_range_for_cpu and sync_single_range_for_device hooks are
unnecessary because sync_single_for_cpu and sync_single_for_device can
be used instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
FUJITA Tomonori 38388301b7 swiotlb: remove unnecessary swiotlb_sync_single_range_*
swiotlb_sync_single_range_for_cpu and swiotlb_sync_single_range_for_device
are unnecessary because swiotlb_sync_single_for_cpu and
swiotlb_sync_single_for_device can be used instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Joe Eykholt 5960164fde lib/random32: export pseudo-random number generator for modules
This patch moves the definition of struct rnd_state and the inline
__seed() function to linux/random.h.  It renames the static __random32()
function to prandom32() and exports it for use in modules.

prandom32() is useful as a privately-seeded pseudo random number generator
that can give the same result every time it is initialized.

For FCoE FC-BB-6 VN2VN mode self-selected unique FC address generation, we
need an pseudo-random number generator seeded with the 64-bit world-wide
port name.  A truly random generator or one seeded with randomness won't
do because the same sequence of numbers should be generated each time we
boot or the link comes up.

A prandom32_seed() inline function is added to the header file.  It is
inlined not for speed, but so the function won't be expanded in the base
kernel, but only in the module that uses it.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Oleg Nesterov 0a14a130ca INIT_SIGHAND: use SIG_DFL instead of NULL
Cosmetic, no changes in the compiled code. Just s/NULL/SIG_DFL/ to make
it more readable and grep-friendly.

Note: probably SIG_IGN makes more sense, we could kill ignore_signals().
But then kernel_init() should do flush_signal_handlers() before exec().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Oleg Nesterov f20011457f pids: init_struct_pid.tasks should never see the swapper process
"statically initialize struct pid for swapper" commit 820e45db says:

	Statically initialize a struct pid for the swapper process (pid_t == 0)
	and attach it to init_task.  This is needed so task_pid(), task_pgrp()
	and task_session() interfaces work on the swapper process also.

OK, but:

	- it doesn't make sense to add init_task.pids[].node into
	  init_struct_pid.tasks[], and in fact this just wrong.

	  idle threads are special, they shouldn't be visible on any
	  global list. In particular do_each_pid_task(init_struct_pid)
	  shouldn't see swapper.

	  This is the actual reason why kill(0, SIGKILL) from /sbin/init
	  (which starts with 0,0 special pids) crashes the kernel. The
	  signal sent to pgid/sid == 0 must never see idle threads, even
	  if the previous patch fixed the crash itself.

	- we have other idle threads running on the non-boot CPUs, see
	  the next patch.

Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed
hlist_head/hlist_node. Like any other idle thread swapper can never exit,
so detach_pid()->__hlist_del() is not possible, but we could change
INIT_PID_LINK() to set pprev = &next if needed.

All we need is the valid swapper->pids[].pid == &init_struct_pid.

Reported-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Oleg Nesterov fa2755e20a INIT_TASK() should initialize ->thread_group list
The trivial /sbin/init doing

	int main(void)
	{
		kill(0, SIGKILL)
	}

crashes the kernel.

This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL
to the swapper process which runs with the uninitialized ->thread_group.

Change INIT_TASK() to initialize ->thread_group properly.

Note: the real problem is that the swapper process must not be visible to
signals, see the next patch. But this change is right anyway and fixes
the crash.

Reported-and-tested-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:51 -07:00
Hedi Berriche 72680a191b pids: increase pid_max based on num_possible_cpus
On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough.  A system with 1664 CPU's, there are
25163 processes started before the login prompt.  It's estimated that with
2048 CPU's we will pass the 32k limit.  With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.

This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: John Stoffel <john@stoffel.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:51 -07:00
Alexandre Bounine 7a88d62862 rapidio: add switch domain routines
Add switch specific domain routines required for 16-bit routing support in
switches with hierarchical implementation of routing tables.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:51 -07:00
Alexandre Bounine 058f88d672 rapidio: modify initialization of switch operations
Modify the way how RapidIO switch operations are declared.  Multiple
assignments through the linker script replaced by single initialization
call.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:51 -07:00
Thomas Moll 933af4a6c4 rapidio: add enabling SRIO port RX and TX
Add the functionality to enable Input receiver and Output transmitter of
every port, to allow non-maintenance traffic.

Signed-off-by: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Alexandre Bounine <abounine@tundra.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:51 -07:00
Alexandre Bounine e5cabeb3d6 rapidio: add Port-Write handling for EM
Add RapidIO Port-Write message handling in the context of Error
   Management Extensions Specification Rev.1.3.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:50 -07:00
Alexandre Bounine 07590ff039 rapidio: add IDT CPS/TSI switches
Extentions to RapidIO switch support:

1. modify switch route operation declarations to allow using single
   switch-specific file for family of switches that share the same route
   table operations.

2. add standard route table operations for switches that that support
   route table manipulation registers as defined in the Rev.1.3 of RapidIO
   specification.

3. add clear-route-table operation for switches

4. add CPSxx and TSIxxx families of RapidIO switches

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:50 -07:00
Manfred Spraul 31a7c4746e ipc/sem.c: cacheline align the ipc spinlock for semaphores
Cacheline align the spinlock for sysv semaphores.  Without the patch, the
spinlock and sem_otime [written by every semop that modified the array]
and sem_base [read in the hot path of try_atomic_semop()] can be in the
same cacheline.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:49 -07:00
Akinobu Mita b957e043ee notifier: change notifier_from_errno(0) to return NOTIFY_OK
This changes notifier_from_errno(0) to be NOTIFY_OK instead of
NOTIFY_STOP_MASK | NOTIFY_OK.

Currently, the notifiers which return encapsulated errno value have to
do something like this:

	err = do_something(); // returns -errno
	if (err)
		return notifier_from_errno(err);
	else
		return NOTIFY_OK;

This change makes the above code simple:

	err = do_something(); // returns -errno

	return return notifier_from_errno(err);

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:47 -07:00
Oleg Nesterov b3ac022cb9 proc: turn signal_struct->count into "int nr_threads"
No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:47 -07:00
Oleg Nesterov 7e49827cc9 proc: get_nr_threads() doesn't need ->siglock any longer
Now that task->signal can't go away get_nr_threads() doesn't need
->siglock to read signal->count.

Also, make it inline, move into sched.h, and convert 2 other proc users of
signal->count to use this (now trivial) helper.

Henceforth get_nr_threads() is the only valid user of signal->count, we
are ready to turn it into "int nr_threads" or, perhaps, kill it.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:47 -07:00
Oleg Nesterov a705be6b5e kill the obsolete thread_group_cputime_free() helper
Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:46 -07:00
Oleg Nesterov b7b8ff6373 signals: kill the awful task_rq_unlock_wait() hack
Now that task->signal can't go away we can revert the horrible hack added
by ad474caca3 ("fix for
account_group_exec_runtime(), make sure ->signal can't be freed under
rq->lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:46 -07:00
Oleg Nesterov ea6d290ca3 signals: make task_struct->signal immutable/refcountable
We have a lot of problems with accessing task_struct->signal, it can
"disappear" at any moment.  Even current can't use its ->signal safely
after exit_notify().  ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free ->signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal->tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct->count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:46 -07:00
Oleg Nesterov 09faef11df exit: change zap_other_threads() to count sub-threads
Change zap_other_threads() to return the number of other sub-threads found
on ->thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:46 -07:00
Oleg Nesterov c70a626d3e umh: creds: kill subprocess_info->cred logic
Now that nobody ever changes subprocess_info->cred we can kill this member
and related code.  ____call_usermodehelper() always runs in the context of
freshly forked kernel thread, it has the proper ->cred copied from its
parent kthread, keventd.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:45 -07:00
Oleg Nesterov 685bfd2c48 umh: creds: convert call_usermodehelper_keys() to use subprocess_info->init()
call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
subprocess_info->cred in advance.  Now that we have info->init() we can
change this code to set tgcred->session_keyring in context of execing
kernel thread.

Note: since currently call_usermodehelper_keys() is never called with
UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
are not really needed, we could rely on install_session_keyring_to_cred()
which does key_get() on success.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:45 -07:00
Neil Horman 898b374af6 exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit
The first patch in this series introduced an init function to the
call_usermodehelper api so that processes could be customized by caller.
This patch takes advantage of that fact, by customizing the helper in
do_coredump to create the pipe and set its core limit to one (for our
recusrsion check).  This lets us clean up the previous uglyness in the
usermodehelper internals and factor call_usermodehelper out entirely.
While I'm at it, we can also modify the helper setup to look for a core
limit value of 1 rather than zero for our recursion check

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Neil Horman a06a4dc3a0 kmod: add init function to usermodehelper
About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe
feature in the kernel works.  We had reports of several races, including
some reports of apps bypassing our recursion check so that a process that
was forked as part of a core_pattern setup could infinitely crash and
refork until the system crashed.

We fixed those by improving our recursion checks.  The new check basically
refuses to fork a process if its core limit is zero, which works well.

Unfortunately, I've been getting grief from maintainer of user space
programs that are inserted as the forked process of core_pattern.  They
contend that in order for their programs (such as abrt and apport) to
work, all the running processes in a system must have their core limits
set to a non-zero value, to which I say 'yes'.  I did this by design, and
think thats the right way to do things.

But I've been asked to ease this burden on user space enough times that I
thought I would take a look at it.  The first suggestion was to make the
recursion check fail on a non-zero 'special' number, like one.  That way
the core collector process could set its core size ulimit to 1, and enable
the kernel's recursion detection.  This isn't a bad idea on the surface,
but I don't like it since its opt-in, in that if a program like abrt or
apport has a bug and fails to set such a core limit, we're left with a
recursively crashing system again.

So I've come up with this.  What I've done is modify the
call_usermodehelper api such that an extra parameter is added, a function
pointer which will be called by the user helper task, after it forks, but
before it exec's the required process.  This will give the caller the
opportunity to get a call back in the processes context, allowing it to do
whatever it needs to to the process in the kernel prior to exec-ing the
user space code.  In the case of do_coredump, this callback is ues to set
the core ulimit of the helper process to 1.  This elimnates the opt-in
problem that I had above, as it allows the ulimit for core sizes to be set
to the value of 1, which is what the recursion check looks for in
do_coredump.

This patch:

Create new function call_usermodehelper_fns() and allow it to assign both
an init and cleanup function, as we'll as arbitrary data.

The init function is called from the context of the forked process and
allows for customization of the helper process prior to calling exec.  Its
return code gates the continuation of the process, or causes its exit.
Also add an arbitrary data pointer to the subprocess_info struct allowing
for data to be passed from the caller to the new process, and the
subsequent cleanup process

Also, use this patch to cleanup the cleanup function.  It currently takes
an argp and envp pointer for freeing, which is ugly.  Lets instead just
make the subprocess_info structure public, and pass that to the cleanup
and init routines

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Jack Steiner 0ac0c0d0f8 cpusets: randomize node rotor used in cpuset_mem_spread_node()
Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Jack Steiner 6adef3ebe5 cpusets: new round-robin rotor for SLAB allocations
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes & skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages & slab
pages.  Writing the file allocates both a data page & a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 >/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Kirill A. Shutemov 907860ed38 cgroups: make cftype.unregister_event() void-returning
Since we are unable to handle an error returned by
cftype.unregister_event() properly, let's make the callback
void-returning.

mem_cgroup_unregister_event() has been rewritten to be a "never fail"
function.  On mem_cgroup_usage_register_event() we save old buffer for
thresholds array and reuse it in mem_cgroup_usage_unregister_event() to
avoid allocation.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
akpm@linux-foundation.org ac39cf8cb8 memcg: fix mis-accounting of file mapped racy with migration
FILE_MAPPED per memcg of migrated file cache is not properly updated,
because our hook in page_add_file_rmap() can't know to which memcg
FILE_MAPPED should be counted.

Basically, this patch is for fixing the bug but includes some big changes
to fix up other messes.

Now, at migrating mapped file, events happen in following sequence.

 1. allocate a new page.
 2. get memcg of an old page.
 3. charge ageinst a new page before migration. But at this point,
    no changes to new page's page_cgroup, no commit for the charge.
    (IOW, PCG_USED bit is not set.)
 4. page migration replaces radix-tree, old-page and new-page.
 5. page migration remaps the new page if the old page was mapped.
 6. Here, the new page is unlocked.
 7. memcg commits the charge for newpage, Mark the new page's page_cgroup
    as PCG_USED.

Because "commit" happens after page-remap, we can count FILE_MAPPED
at "5", because we should avoid to trust page_cgroup->mem_cgroup.
if PCG_USED bit is unset.
(Note: memcg's LRU removal code does that but LRU-isolation logic is used
 for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is
 not on LRU or page_cgroup->mem_cgroup is NULL.)

We can lose file_mapped accounting information at 5 because FILE_MAPPED
is updated only when mapcount changes 0->1. So we should catch it.

BTW, historically, above implemntation comes from migration-failure
of anonymous page. Because we charge both of old page and new page
with mapcount=0, we can't catch
  - the page is really freed before remap.
  - migration fails but it's freed before remap
or .....corner cases.

New migration sequence with memcg is:

 1. allocate a new page.
 2. mark PageCgroupMigration to the old page.
 3. charge against a new page onto the old page's memcg. (here, new page's pc
    is marked as PageCgroupUsed.)
 4. page migration replaces radix-tree, page table, etc...
 5. At remapping, new page's page_cgroup is now makrked as "USED"
    We can catch 0->1 event and FILE_MAPPED will be properly updated.

    And we can catch SWAPOUT event after unlock this and freeing this
    page by unmap() can be caught.

 7. Clear PageCgroupMigration of the old page.

So, FILE_MAPPED will be correctly updated.

Then, for what MIGRATION flag is ?
  Without it, at migration failure, we may have to charge old page again
  because it may be fully unmapped. "charge" means that we have to dive into
  memory reclaim or something complated. So, it's better to avoid
  charge it again. Before this patch, __commit_charge() was working for
  both of the old/new page and fixed up all. But this technique has some
  racy condtion around FILE_MAPPED and SWAPOUT etc...
  Now, the kernel use MIGRATION flag and don't uncharge old page until
  the end of migration.

I hope this change will make memcg's page migration much simpler.  This
page migration has caused several troubles.  Worth to add a flag for
simplification.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
Daisuke Nishimura 87946a7228 memcg: move charge of file pages
This patch adds support for moving charge of file pages, which include
normal file, tmpfs file and swaps of tmpfs file.  It's enabled by setting
bit 1 of <target cgroup>/memory.move_charge_at_immigrate.

Unlike the case of anonymous pages, file pages(and swaps) in the range
mmapped by the task will be moved even if the task hasn't done page fault,
i.e.  they might not be the task's "RSS", but other task's "RSS" that maps
the same file.  And mapcount of the page is ignored(the page can be moved
even if page_mapcount(page) > 1).  So, conditions that the page/swap
should be met to be moved is that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:43 -07:00
Felipe Balbi c4b5be98fe gpiolib: introduce set_debounce method
A few architectures, like OMAP, allow you to set a debouncing time for the
gpio before generating the IRQ.  Teach gpiolib about that.

Mark said:
: This would be generally useful for embedded systems, especially where
: the interrupt concerned is a wake source.  It allows drivers to avoid
: spurious interrupts from noisy sources so if the hardware supports it
: the driver can avoid having to explicitly wait for the signal to become
: stable and software has to cope with fewer events.  We've lived without
: it for quite some time, though.

David said:
: I looked at adding debounce support to the generic GPIO calls (and thus
: gpiolib) some time back, but decided against it.  I forget why at this
: time (check list archives) but it wasn't because of lack of utility in
: certain contexts.
:
: One thing to watch out for is just how variable the hardware capabilities
: are.  Atmel GPIOs have something like a fixed number of 32K clock cycles
: for debounce, twl4030 had something odd, OMAPs were more like the Atmel
: chips but with a different clock.  In some cases debouncing had to be
: ganged, not per-GPIO.  And so forth.

Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: David Brownell <david-b@pacbell.net>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:42 -07:00
Uwe Kleine-König 62154991a8 gpiolib: make names array and its values const
gpiolib doesn't need to modify the names and I assume most initializers
use string constants that shouldn't be modified anyhow.

[akpm@linux-foundation.org: fix drivers/gpio/cs5535-gpio.c]
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Kevin Wells <kevin.wells@nxp.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:41 -07:00
Marc Zyngier a80a0bbee4 gpio: add interrupt handling capability to max732x
Most of the GPIO expanders supported by the max732x driver have interrupt
generation capability by reporting changes on input pins through an INT#
pin.  This patch implements the irq_chip functionnality (edge detection
only).

Signed-off-by: Marc Zyngier <maz@misterjones.org>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jebediah Huang <jebediah.huang@gmail.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:41 -07:00
Viresh KUMAR c63b3cba4f sdhci-spear: ST SPEAr based SDHCI controller glue
Add a glue layer to support the sdhci driver on the ST SPEAr platform.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: <shiraz.hashim@st.com>
Cc: Linus Walleij <linus.ml.walleij@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:40 -07:00
Grazvydas Ignotas 6c1f716e81 sdio: add new function for RAW (Read after Write) operation
SDIO specification allows RAW (Read after Write) operation using
IO_RW_DIRECT command (CMD52) by setting the RAW bit.  This operation is
similar to ordinary read/write commands, except that both write and read
are performed using single command/response pair.  The Linux SDIO layer
already supports this internaly, only external function is missing for
drivers to make use, which is added by this patch.

This type of command is required to implement proper power save mode
support in wl1251 wifi driver.

Android has similar patch for G1 in it's tree for the same reason:

http://android.git.kernel.org/?p=kernel/common.git;a=commitdiff;h=74a47786f6ecbe6c1cf9fb15efe6a968451deb52

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Kalle Valo <kalle.valo@iki.fi>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:40 -07:00
Matt Fleming 1a13f8fa76 mmc: remove the "state" argument to mmc_suspend_host()
Even though many mmc host drivers pass a pm_message_t argument to
mmc_suspend_host() that argument isn't used the by MMC core.  As host
drivers are converted to dev_pm_ops they'll have to construct
pm_message_t's (as they won't be passed by the PM subsystem any more) just
to appease the mmc suspend interface.

We might as well just delete the unused paramter.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>ZZ
Acked-by: Sascha Sommer <saschasommer@freenet.de>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:40 -07:00
Yusuke Goda fdc50a9444 mmc: add support MMCIF for SuperH
MMCIF is the MMC Host Interface in SuperH.

Signed-off-by: Yusuke Goda <yusuke.goda.sx@renesas.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:39 -07:00
Anton Vorontsov a7626b7a5d sdhci-pltfm: implement platform data passing
This includes platform ops, quirks and (de)initialization callbacks.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: Richard Röjfors <richard.rojfors@pelagicore.com>
Cc: David Vrabel <david.vrabel@csr.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Ben Dooks <ben@simtec.co.uk>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:39 -07:00
Daniel Mack 43b8e3bc4a ALSA: usb-audio: parse UAC2 endpoint descriptors correctly
UAC2 devices have their information about pitch control stored in a
different field. Parse it, and emulate the bits for a v1 device.

A new struct uac2_iso_endpoint_descriptor is added.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-05-27 09:49:22 +02:00
Len Brown 6b2c676bf3 cpuidle: fail to register if !CONFIG_CPU_IDLE
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-27 01:56:24 -04:00
Lars-Peter Clausen f5bf403a9d backlight: pcf50633: Register a pcf50633-backlight device in pcf50633 core driver.
Register a device newly added pcf50633-backlight driver as a child device in
the pcf50633 core driver.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 17:34:39 +01:00
Lars-Peter Clausen 2ddfd12f35 backlight: Add pcf50633 backlight driver
This patch adds a backlight driver for controlling the pcf50633 LED module.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 17:34:38 +01:00
InKi Dae ee378a5c65 backlight: add S6E63M0 AMOLED LCD Panel driver
This is S6E63M0 AMOLED LCD Panel(480x800) driver using 3-wired SPI
interface also almost features for lcd panel driver has been implemented
in here.  and I added new structure common for all the lcd panel drivers
to include/linux/lcd.h file.

LCD Panel driver needs interfaces for controlling device power such as
power on/off and reset.  these interfaces are device specific so it should
be implemented to machine code at this time, we should create new
structure for registering these functions as callbacks and also a header
file for that structure and finally registered callback functions would be
called by lcd panel driver.  such header file(including new structure for
lcd panel) would be added for all the lcd panel drivers.

If anyone provides common structure for registering such callback
functions then we could reduce unnecessary header files for lcd panel.  I
thought that suitable anyone could be include/linux/lcd.h so a new
lcd_platform_data structure was added there.

[akpm@linux-foundation.org: coding-style fixes]
[randy.dunlap@oracle.com: fix s6e63m0 kconfig]
[randy.dunlap@oracle.com: fix device attribute functions return types]
Signed-off-by: InKi Dae <inki.dae@samsung.com>
Reviewed-by: KyungMin Park <kyungmin.park.samsung.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 17:34:16 +01:00
Linus Torvalds 13da9e200f Revert "endian: #define __BYTE_ORDER"
This reverts commit b3b77c8cae, which was
also totally broken (see commit 0d2daf5cc8 that reverted the crc32
version of it).  As reported by Stephen Rothwell, it causes problems on
big-endian machines:

> In file included from fs/jfs/jfs_types.h:33,
>                  from fs/jfs/jfs_incore.h:26,
>                  from fs/jfs/file.c:22:
> fs/jfs/endian24.h:36:101: warning: "__LITTLE_ENDIAN" is not defined

The kernel has never had that crazy "__BYTE_ORDER == __LITTLE_ENDIAN"
model.  It's not how we do things, and it isn't how we _should_ do
things.  So don't go there.

Requested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-26 08:30:15 -07:00
Michael Hennerich c7c06d8a95 backlight: adp8860: add support for ADP8861 & ADP8863
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 13:08:33 +01:00
Michael Hennerich 82fd53b7f7 backlight: new driver for the ADP8860 backlight parts
The ADP8860 combines a programmable backlight LED charge pump driver with
automatic phototransistor control.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 13:08:31 +01:00
Philippe Rétornaz 7fdcef8a41 leds: Add mc13783 LED support
This add basic led support for Freescale MC13783 PMIC.

Signed-off-by: Philippe Rétornaz <philippe.retornaz@epfl.ch>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 13:07:56 +01:00
Benjamin Herrenschmidt 2146325df2 leds: leds-gpio: Change blink_set callback to be able to turn off blinking
The leds-gpio blink_set() callback follows the same prototype as the
main leds subsystem blink_set() one.

The problem is that to stop blink, normally, a leds driver does it
in the brightness_set() callback when asked to set a new fixed value.

However, with leds-gpio, the platform has no hook to do so, as this
later callback results in a standard GPIO manipulation.

This changes the leds-gpio specific callback to take a new argument
that indicates whether the LED should be blinking or not and in what
state it should be set if not. We also update the dns323 platform
which seems to be the only user of this so far.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2010-05-26 13:07:55 +01:00
Linus Torvalds b1cdc4670b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits)
  drivers/net/usb/asix.c: Fix pointer cast.
  be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
  proc_dointvec: write a single value
  hso: add support for new products
  Phonet: fix potential use-after-free in pep_sock_close()
  ath9k: remove VEOL support for ad-hoc
  ath9k: change beacon allocation to prefer the first beacon slot
  sock.h: fix kernel-doc warning
  cls_cgroup: Fix build error when built-in
  macvlan: do proper cleanup in macvlan_common_newlink() V2
  be2net: Bug fix in init code in probe
  net/dccp: expansion of error code size
  ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
  wireless: fix sta_info.h kernel-doc warnings
  wireless: fix mac80211.h kernel-doc warnings
  iwlwifi: testing the wrong variable in iwl_add_bssid_station()
  ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
  ath9k_htc: dereferencing before check in hif_usb_tx_cb()
  rt2x00: Fix rt2800usb TX descriptor writing.
  rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
  ...
2010-05-25 16:59:51 -07:00
Tejun Heo 43c9c59185 libata: implement dump_id force param
Add dump_id libata.force parameter.  If specified, libata dumps full
IDENTIFY data during device configuration.  This is to aid debugging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Larry Baker <baker@usgs.gov>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-25 19:41:19 -04:00
Tejun Heo 9a7780c9ac libata-sff: make BMDMA optional
Make BMDMA optional depending on new config variable CONFIG_ATA_BMDMA.
In Kconfig, drivers are grouped into five groups - non-SFF native, SFF
w/ custom DMA interface, SFF w/ BMDMA, PIO-only SFF, and generic
fallback / legacy ones.  Kconfig and Makefile are reorganized
according to the groups and ordered alphabetically inside each group.

ata_ioports.bmdma_addr and ata_port.bmdma_prd[_dma] are put into
CONFIG_ATA_BMDMA, as are all bmdma related ops, variables and
functions.

This increase the binary size slightly when BMDMA is enabled but on
both native-only and PIO-only configurations the size is slightly
reduced.  Either way, the size difference is insignificant.  This
change is more meaningful to signify the separation between SFF and
BMDMA and as a tool to verify the separation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-25 19:41:12 -04:00
Tejun Heo 1c5afdf7a6 libata-sff: separate out BMDMA init
Separate out ata_pci_bmdma_prepare_host() and ata_pci_bmdma_init_one()
from their SFF counterparts.  SFF ones no longer try to initialize
BMDMA or set PCI master.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-25 19:40:30 -04:00
Tejun Heo c3b2889424 libata-sff: separate out BMDMA irq handler
Separate out BMDMA irq handler from SFF irq handler.  The misnamed
host_intr() functions are renamed to ata_sff_port_intr() and
ata_bmdma_port_intr().  Common parts are factored into
__ata_sff_port_intr() and __ata_sff_interrupt() and used by sff and
bmdma interrupt routines.

All BMDMA drivers now use ata_bmdma_interrupt() or
ata_bmdma_port_intr() while all non-BMDMA SFF ones use
ata_sff_interrupt() or ata_sff_port_intr().

For now, ata_pci_sff_init_one() uses ata_bmdma_interrupt() as it's
used by both SFF and BMDMA drivers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-25 19:40:24 -04:00
Tejun Heo 37f65b8bc2 libata-sff: ata_sff_irq_clear() is BMDMA specific
ata_sff_irq_clear() is BMDMA specific.  Rename it to
ata_bmdma_irq_clear(), move it to ata_bmdma_port_ops and make
->sff_irq_clear() optional.

Note: ata_bmdma_irq_clear() is actually only needed by ata_piix and
      possibly by sata_sil.  This should be moved to respective low
      level drivers later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-25 19:40:19 -04:00
Kay Sievers 578454ff7e driver core: add devname module aliases to allow module on-demand auto-loading
This adds:
  alias: devname:<name>
to some common kernel modules, which will allow the on-demand loading
of the kernel module when the device node is accessed.

Ideally all these modules would be compiled-in, but distros seems too
much in love with their modularization that we need to cover the common
cases with this new facility. It will allow us to remove a bunch of pretty
useless init scripts and modprobes from init scripts.

The static device node aliases will be carried in the module itself. The
program depmod will extract this information to a file in the module directory:
  $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
  # Device nodes to trigger on-demand module loading.
  microcode cpu/microcode c10:184
  fuse fuse c10:229
  ppp_generic ppp c108:0
  tun net/tun c10:200
  dm_mod mapper/control c10:235

Udev will pick up the depmod created file on startup and create all the
static device nodes which the kernel modules specify, so that these modules
get automatically loaded when the device node is accessed:
  $ /sbin/udevd --debug
  ...
  static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
  static_dev_create_from_modules: mknod '/dev/fuse' c10:229
  static_dev_create_from_modules: mknod '/dev/ppp' c108:0
  static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
  static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
  udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
  udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666

A few device nodes are switched to statically allocated numbers, to allow
the static nodes to work. This might also useful for systems which still run
a plain static /dev, which is completely unsafe to use with any dynamic minor
numbers.

Note:
The devname aliases must be limited to the *common* and *single*instance*
device nodes, like the misc devices, and never be used for conceptually limited
systems like the loop devices, which should rather get fixed properly and get a
control node for losetup to talk to, instead of creating a random number of
device nodes in advance, regardless if they are ever used.

This facility is to hide the mess distros are creating with too modualized
kernels, and just to hide that these modules are not compiled-in, and not to
paper-over broken concepts. Thanks! :)

Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Ian Kent <raven@themaw.net>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-25 15:08:26 -07:00
Linus Torvalds ec96e2fe95 Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (103 commits)
  ARM: 6141/1: Add audio support part in arch/arm/mach-w90x900
  ARM: 5939/1: ARM: Add option CMDLINE_FORCE to force usage of the in-kernel cmdline
  ARM: 6140/1: silence a bogus sparse warning in unwind.c
  ARM: mach-at91: duplicated include
  ARM: arch/arm/nwfpe/fpsr.h: Checkpatch cleanup
  ARM: arch/arm/mach-shark/pci.c: Checkpatch cleanup
  ARM: arch/arm/nwfpe/ChangeLog: Checkpatch cleanup
  ARM: arch/arm/mach-sa1100/leds.c: Checkpatch cleanup
  ARM: arch/arm/mach-h720x/common.h: Checkpatch cleanup
  ARM: arch/arm/mach-footbridge/ebsa285-pci.c: Checkpatch cleanup
  ARM: arch/arm/mach-clps711x/Makefile.boot: Checkpatch cleanup
  ARM: arch/arm/boot/bootp/bootp.lds: Checkpatch cleanup
  ARM: SPEAR6xx: remove duplicated #include
  ARM: s3c6400_defconfig: Add NAND driver
  ARM: s3c6400_defconfig: enable sound as modules
  ARM: s3c6400_defconfig: enable power management
  ARM: s5pv210_defconfig: Update s5pv210_defconfig to v2.6.34
  ARM: s5pc110_defconfig: Update s5pc110_defconfig to v2.6.34
  ARM: s5p6442_defconfig: Update s5p6442_defconfig to v2.6.34
  ARM: s5p6440_defconfig: Update s5p6440_defconfig to v2.6.34
  ...
2010-05-25 12:06:33 -07:00
Linus Torvalds 702c0b0497 Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6:
  spi/xilinx: Fix compile error
  spi/davinci: Fix clock prescale factor computation
  spi: move bitbang txrx utility functions to private header
  spi/mpc5121: Add SPI master driver for MPC5121 PSC
  powerpc/mpc5121: move PSC FIFO memory init to platform code
  spi/ep93xx: implemented driver for Cirrus EP93xx SPI controller
  Documentation/spi/* compile warning fix
  spi/omap2_mcspi: Check params before dereference or use
  spi/omap2_mcspi: add turbo mode support
  spi/omap2_mcspi: change default DMA_MIN_BYTES value to 160
  spi/pl022: fix stop queue procedure
  spi/pl022: add support for the PL023 derivate
  spi/pl022: fix up differences between ARM and ST versions
  spi/spi_mpc8xxx: Do not use map_tx_dma to unmap rx_dma
  spi/spi_mpc8xxx: Fix QE mode Litte Endian
  spi/spi_mpc8xxx: fix potential memory corruption.
2010-05-25 12:04:17 -07:00
Linus Torvalds 99765cc7e3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  regulator: return set_mode is same mode is requested
  Regulators: ab3100/bq24022: add a missing .owner field in regulator_desc
  twl6030: regulator: Remove vsel tables and use formula for calculation
  mc13783-regulator: fix vaild voltage range checking for mc13783_fixed_regulator_set_voltage
  regulator: use voltage number array in 88pm860x
  regulator: make 88pm860x sharing one driver structure
  regulator: simplify regulator_register() error handling
  regulator: fix unset_regulator_supplies() to remove all matches
  regulator: prevent registration of matching regulator consumer supplies
  regulator: Allow regulator-regulator supplies to be specified by name
2010-05-25 11:49:41 -07:00
Feng Tang 5487ab4a5a SFI: add support for v0.81 spec
There are 2 major changes from v0.81 to v0.7:
1. Consolidating the SPIB/I2CB tables into a new DEVS table,
   which is more expandable and can support other bus types
   than spi/i2c.
2. Creating a new GPIO table, which list all the GPIO pins
   used in the platform.

However, to avoid breaking current platforms who use SFI v0.7
version firmware, the definitions for SPIB/I2CB will still
be kept for a while

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-25 11:41:43 -04:00
Grazvydas Ignotas 49c39b4953 fbdev: move FBIO_WAITFORVSYNC to linux/fb.h
FBIO_WAITFORVSYNC is currently implemented by matroxfb, atyfb, intelfb and
more.  All of them keep redefining the same FBIO_WAITFORVSYNC macro over
and over again, so move it to linux/fb.h and clean up those duplicate
defines.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Maik Broemme <mbroemme@plusserver.de>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: "Hiremath, Vaibhav" <hvaibhav@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:09 -07:00
Samu Onkalo 6d94d40810 lis3: interrupt handlers for 8bit wakeup and click events
Content for the 8bit device threaded interrupt handlers.  Depending on the
interrupt line and chip configuration, either click or wakeup / freefall
handler is called.  In case of click, BTN_ event is sent via input device.
 In case of wakeup or freefall, input device ABS_ events are updated
immediatelly.

It is still possible to configure interrupt line 1 for fast freefall
detection and use the second line either for click or threshold based
interrupts.  Or both lines can be used for click / threshold interrupts.

Polled input device can be set to stopped state and still get coordinate
updates via input device using interrupt based method.  Polled mode and
interrupt mode can also be used parallel.

BTN_ events are remapped based on existing axis remapping information.

Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Daniel Mack <daniel@caiaq.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:07 -07:00
Samu Onkalo 92ba4fe4b5 lis3: add skeletons for interrupt handlers
Original lis3 driver didn't provide interrupt handler(s) for click or
threshold event handling.  This patch adds threaded handlers for one or
two interrupt lines for 8 bit device.  Actual content for interrupt
handling is provided in the separate patch.

Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Tested-by: Daniel Mack <daniel@caiaq.de>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:06 -07:00
Samu Onkalo 342c5f1281 lis3: introduce platform data for second ff / wu unit
8 bit device has two wakeup / free fall units.  It was not possible to
configure the second unit.  This patch introduces configuration entry to
the platform data and also corresponding changes to the 8 bit setup
function.

High pass filters were enabled by default.  Patch introduces configuration
option for high pass filter cut off frequency and also possibility to
disable or enable the filter via platform data.  Since the control is a
new one and default state was filter enabled, new option is used to
disable the filter.  This way old platform data is still compatible with
the change.

Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Daniel Mack <daniel@caiaq.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:06 -07:00
Andy Shevchenko 903788892e lib: introduce common method to convert hex digits
hex_to_bin() is a little method which converts hex digit to its actual
value.  There are plenty of places where such functionality is needed.

[akpm@linux-foundation.org: use tolower(), saving 3 bytes, test the more common case first - it's quicker]
[akpm@linux-foundation.org: relocate tolower to make it even faster! (Joe)]
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Duncan Sands <duncan.sands@free.fr>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:05 -07:00
Florian Ragwitz 2b2f68b538 DYNAMIC_DEBUG: fix documentation errors
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Florian Ragwitz <rafl@debian.org>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:05 -07:00
OGAWA Hirofumi f40c396a9a ratelimit: add ratelimit_state_init()
For now, all users of ratelimit_state allocates it statically, so
DEFINE_RATELIMIT_STATE() is enough.  But, I want to use ratelimit_state
for fs, i.e.  per super_block to suppress too many error reports.

So, this adds ratelimit_state_init() to initialize ratelimite_state
which is dynamically allocated, instead of opencoding.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:03 -07:00
OGAWA Hirofumi d8521fcc5e printk_ratelimited(): fix uninitialized spinlock
ratelimit_state initialization of printk_ratelimited() seems broken.  This
fixes it by using DEFINE_RATELIMIT_STATE() to initialize spinlock
properly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:03 -07:00
Joe Perches fc62f2f19e kernel.h: add pr_warn for symmetry to dev_warn, netdev_warn
The current logging macros are
pr_<level>, dev_<level>, netdev_<level>, and netif_<level>.
pr_ uses warning, the other use warn.

Standardize these logging macros a bit more by adding pr_warn and
pr_warn_ratelimited.

Right now, there are:

$ for level in emerg alert crit err warn warning notice info ; do \
    for prefix in pr dev netdev netif ; do \
      echo -n "${prefix}_${level}:	`git grep -w "${prefix}_${level}" | wc -l`	" ; \
    done ; \
    echo ; \
  done
pr_emerg: 	45	dev_emerg: 	4	netdev_emerg: 	1	netif_emerg: 	4
pr_alert: 	24	dev_alert: 	36	netdev_alert: 	1	netif_alert: 	6
pr_crit: 	24	dev_crit: 	22	netdev_crit: 	1	netif_crit: 	4
pr_err:  	2013	dev_err: 	8467	netdev_err: 	267	netif_err: 	240
pr_warn: 	0	dev_warn: 	1818	netdev_warn: 	126	netif_warn: 	23
pr_warning: 	773	dev_warning: 	0	netdev_warning:	0	netif_warning: 	0
pr_notice: 	148	dev_notice: 	111	netdev_notice: 	9	netif_notice: 	3
pr_info: 	1717	dev_info: 	3007	netdev_info: 	101	netif_info: 	85

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:03 -07:00
Alexey Dobriyan 4be929be34 kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Joakim Tjernlund b3b77c8cae endian: #define __BYTE_ORDER
Linux does not define __BYTE_ORDER in its endian header files which makes
some header files bend backwards to get at the current endian.  Lets
#define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for
header files that are used in user space too.

In userspace the convention is that

  1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined,
  2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Jani Nikula e47103b1af err.h: add __must_check to error pointer handlers
Add __must_check to error pointer handlers to have the compiler warn about
mistakes like:

	if (err)
		ERR_PTR(err);

It found two bugs:

Mar 12 Nikula Jani [PATCH] enclosure: fix error path - actually return ERR_PTR() on error
Mar 12 Nikula Jani [PATCH] sunrpc: fix error path - actually return ERR_PTR() on error

Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Chris Metcalf c6f6b596a5 mm: make lowmem_page_address() use PFN_PHYS() for improved portability
This ensures that platforms with lowmem PAs above 32 bits work correctly
by avoiding truncating the PA during a left shift.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Haicheng Li 4eaf3f6439 mem-hotplug: fix potential race while building zonelist for new populated zone
Add global mutex zonelists_mutex to fix the possible race:

     CPU0                                  CPU1                    CPU2
(1) zone->present_pages += online_pages;
(2)                                       build_all_zonelists();
(3)                                                               alloc_page();
(4)                                                               free_page();
(5) build_all_zonelists();
(6)   __build_all_zonelists();
(7)     zone->pageset = alloc_percpu();

In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).

Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
since there is a new fresh memory block added in step(6).

[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Haicheng Li 1f522509c7 mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset
For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
       at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
       shared in onlined_pages()

Otherwises, multiple zones of different nodes would share same boot strapping
boot_pageset for same CPU, which will finally cause below kernel panic:

  ------------[ cut here ]------------
  kernel BUG at mm/page_alloc.c:1239!
  invalid opcode: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
   [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
   [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
   [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
   [<ffffffff81132751>] ra_submit+0x21/0x30
   [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
   [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
   [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
   [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
   [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
   [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
   [<ffffffff8117c151>] sys_read+0x51/0x80
   [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
  RIP  [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
   RSP <ffff88000d1e78a8>
  ---[ end trace 4bda28328b9990db ]

[akpm@linux-foundation.org: merge fix]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:01 -07:00
Marcelo Roberto Jimenez 0faa56389c mm: fix NR_SECTION_ROOTS == 0 when using using sparsemem extreme.
Got this while compiling for ARM/SA1100:

mm/sparse.c: In function '__section_nr':
mm/sparse.c:135: warning: 'root' is used uninitialized in this function

This patch follows Russell King's suggestion for a new calculation for
NR_SECTION_ROOTS.  Thanks also to Sergei Shtylyov for pointing out the
existence of the macro DIV_ROUND_UP.

Atsushi Nemoto observed:
: This fix doesn't just silence the warning - it fixes a real problem.
:
: Without this fix, mem_section[] might have 0 size so mem_section[0]
: will share other variable area.  For example, I got:
:
: c030c700 b __warned.16478
: c030c700 B mem_section
: c030c701 b __warned.16483
:
: This might cause very strange behavior.  Your patch actually fixes it.

Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:01 -07:00
Akinobu Mita ff3d58c22b highmem: remove unneeded #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT for debug_kmap_atomic()
In f4112de6b6 ("mm: introduce
debug_kmap_atomic") I said that debug_kmap_atomic() needs
CONFIG_TRACE_IRQFLAGS_SUPPORT.

It was wrong.  (I thought irqs_disabled() is only available when the
architecture has CONFIG_TRACE_IRQFLAGS_SUPPORT)

Remove the #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT check to enable
kmap_atomic() debugging for the architectures which do not have
CONFIG_TRACE_IRQFLAGS_SUPPORT.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:01 -07:00
matt mooney fd23855e38 include/linux/gfp.h: fix coding style
Add parenthesis in a define.  This doesn't change functionality.

checkpatch errors:
1) white space fixes
2) add spaces after comas

Signed-off-by: matt mooney <mfm@muteddisk.com>
Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:01 -07:00
matt mooney 263ff5d8e8 include/linux/gfp.h: spelling fixes
Fix minor spelling errors in a few comments; no code changes.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:00 -07:00
minskey guo cf23422b9d cpu/mem hotplug: enable CPUs online before local memory online
Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online.  For a
numa node without onlined local memory, its zonelists are not initialized
at present.  As a result, any memory allocation operations executed by
CPUs within this node will fail.  In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo<chaohong.guo@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:00 -07:00
Johannes Weiner 8b25c6d223 vmscan: remove isolate_pages callback scan control
For now, we have global isolation vs.  memory control group isolation, do
not allow the reclaim entry function to set an arbitrary page isolation
callback, we do not need that flexibility.

And since we already pass around the group descriptor for the memory
control group isolation case, just use it to decide which one of the two
isolator functions to use.

The decisions can be merged into nearby branches, so no extra cost there.
In fact, we save the indirect calls.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:00 -07:00
Mel Gorman 4f92e2586b mm: compaction: defer compaction using an exponential backoff when compaction fails
The fragmentation index may indicate that a failure is due to external
fragmentation but after a compaction run completes, it is still possible
for an allocation to fail.  There are two obvious reasons as to why

  o Page migration cannot move all pages so fragmentation remains
  o A suitable page may exist but watermarks are not met

In the event of compaction followed by an allocation failure, this patch
defers further compaction in the zone (1 << compact_defer_shift) times.
If the next compaction attempt also fails, compact_defer_shift is
increased up to a maximum of 6.  If compaction succeeds, the defer
counters are reset again.

The zone that is deferred is the first zone in the zonelist - i.e.  the
preferred zone.  To defer compaction in the other zones, the information
would need to be stored in the zonelist or implemented similar to the
zonelist_cache.  This would impact the fast-paths and is not justified at
this time.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:00 -07:00
Mel Gorman 5e77190580 mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed
The kernel applies some heuristics when deciding if memory should be
compacted or reclaimed to satisfy a high-order allocation.  One of these
is based on the fragmentation.  If the index is below 500, memory will not
be compacted.  This choice is arbitrary and not based on data.  To help
optimise the system and set a sensible default for this value, this patch
adds a sysctl extfrag_threshold.  The kernel will only compact memory if
the fragmentation index is above the extfrag_threshold.

[randy.dunlap@oracle.com: Fix build errors when proc fs is not configured]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman 56de7263fc mm: compaction: direct compact when a high-order allocation fails
Ordinarily when a high-order allocation fails, direct reclaim is entered
to free pages to satisfy the allocation.  With this patch, it is
determined if an allocation failed due to external fragmentation instead
of low memory and if so, the calling process will compact until a suitable
page is freed.  Compaction by moving pages in memory is considerably
cheaper than paging out to disk and works where there are locked pages or
no swap.  If compaction fails to free a page of a suitable size, then
reclaim will still occur.

Direct compaction returns as soon as possible.  As each block is
compacted, it is checked if a suitable page has been freed and if so, it
returns.

[akpm@linux-foundation.org: Fix build errors]
[aarcange@redhat.com: fix count_vm_event preempt in memory compaction direct reclaim]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman ed4a6d7f06 mm: compaction: add /sys trigger for per-node memory compaction
Add a per-node sysfs file called compact.  When the file is written to,
each zone in that node is compacted.  The intention that this would be
used by something like a job scheduler in a batch system before a job
starts so that the job can allocate the maximum number of hugepages
without significant start-up cost.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman 76ab0f530e mm: compaction: add /proc trigger for memory compaction
Add a proc file /proc/sys/vm/compact_memory.  When an arbitrary value is
written to the file, all zones are compacted.  The expected user of such a
trigger is a job scheduler that prepares the system before the target
application runs.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman 748446bb6b mm: compaction: memory compaction core
This patch is the core of a mechanism which compacts memory in a zone by
relocating movable pages towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone.  The migration
scanner starts at the bottom of the zone and searches for all movable
pages within each area, isolating them onto a private list called
migratelist.  The free scanner starts at the top of the zone and searches
for suitable areas and consumes the free pages within making them
available for the migration scanner.  The pages isolated for migration are
then migrated to the newly isolated free pages.

[aarcange@redhat.com: Fix unsafe optimisation]
[mel@csn.ul.ie: do not schedule work on other CPUs for compaction]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman c175a0ce75 mm: move definition for LRU isolation modes to a header
Currently, vmscan.c defines the isolation modes for __isolate_lru_page().
Memory compaction needs access to these modes for isolating pages for
migration.  This patch exports them.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman a8bef8ff6e mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks
Page migration requires rmap to be able to find all ptes mapping a page
at all times, otherwise the migration entry can be instantiated, but it
is possible to leave one behind if the second rmap_walk fails to find
the page.  If this page is later faulted, migration_entry_to_page() will
call BUG because the page is locked indicating the page was migrated by
the migration PTE not cleaned up. For example

  kernel BUG at include/linux/swapops.h:105!
  invalid opcode: 0000 [#1] PREEMPT SMP
  ...
  Call Trace:
   [<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a
   [<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e
   [<ffffffff813099b5>] page_fault+0x25/0x30
   [<ffffffff8114de33>] load_elf_binary+0x152a/0x192b
   [<ffffffff8111329b>] search_binary_handler+0x173/0x313
   [<ffffffff81114896>] do_execve+0x219/0x30a
   [<ffffffff8100a5c6>] sys_execve+0x43/0x5e
   [<ffffffff8100320a>] stub_execve+0x6a/0xc0
  RIP  [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129

There is a race between shift_arg_pages and migration that triggers this
bug.  A temporary stack is setup during exec and later moved.  If
migration moves a page in the temporary stack and the VMA is then removed
before migration completes, the migration PTE may not be found leading to
a BUG when the stack is faulted.

This patch causes pages within the temporary stack during exec to be
skipped by migration.  It does this by marking the VMA covering the
temporary stack with an otherwise impossible combination of VMA flags.
These flags are cleared when the temporary stack is moved to its final
location.

[kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman 7f60c214fd mm: migration: share the anon_vma ref counts between KSM and page migration
For clarity of review, KSM and page migration have separate refcounts on
the anon_vma.  While clear, this is a waste of memory.  This patch gets
KSM and page migration to share their toys in a spirit of harmony.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:58 -07:00
Mel Gorman 3f6c82728f mm: migration: take a reference to the anon_vma before migrating
This patchset is a memory compaction mechanism that reduces external
fragmentation memory by moving GFP_MOVABLE pages to a fewer number of
pageblocks.  The term "compaction" was chosen as there are is a number of
mechanisms that are not mutually exclusive that can be used to defragment
memory.  For example, lumpy reclaim is a form of defragmentation as was
slub "defragmentation" (really a form of targeted reclaim).  Hence, this
is called "compaction" to distinguish it from other forms of
defragmentation.

In this implementation, a full compaction run involves two scanners
operating within a zone - a migration and a free scanner.  The migration
scanner starts at the beginning of a zone and finds all movable pages
within one pageblock_nr_pages-sized area and isolates them on a
migratepages list.  The free scanner begins at the end of the zone and
searches on a per-area basis for enough free pages to migrate all the
pages on the migratepages list.  As each area is respectively migrated or
exhausted of free pages, the scanners are advanced one area.  A compaction
run completes within a zone when the two scanners meet.

This method is a bit primitive but is easy to understand and greater
sophistication would require maintenance of counters on a per-pageblock
basis.  This would have a big impact on allocator fast-paths to improve
compaction which is a poor trade-off.

It also does not try relocate virtually contiguous pages to be physically
contiguous.  However, assuming transparent hugepages were in use, a
hypothetical khugepaged might reuse compaction code to isolate free pages,
split them and relocate userspace pages for promotion.

Memory compaction can be triggered in one of three ways.  It may be
triggered explicitly by writing any value to /proc/sys/vm/compact_memory
and compacting all of memory.  It can be triggered on a per-node basis by
writing any value to /sys/devices/system/node/nodeN/compact where N is the
node ID to be compacted.  When a process fails to allocate a high-order
page, it may compact memory in an attempt to satisfy the allocation
instead of entering direct reclaim.  Explicit compaction does not finish
until the two scanners meet and direct compaction ends if a suitable page
becomes available that would meet watermarks.

The series is in 14 patches.  The first three are not "core" to the series
but are important pre-requisites.

Patch 1 reference counts anon_vma for rmap_walk_anon(). Without this
	patch, it's possible to use anon_vma after free if the caller is
	not holding a VMA or mmap_sem for the pages in question. While
	there should be no existing user that causes this problem,
	it's a requirement for memory compaction to be stable. The patch
	is at the start of the series for bisection reasons.
Patch 2 merges the KSM and migrate counts. It could be merged with patch 1
	but would be slightly harder to review.
Patch 3 skips over unmapped anon pages during migration as there are no
	guarantees about the anon_vma existing. There is a window between
	when a page was isolated and migration started during which anon_vma
	could disappear.
Patch 4 notes that PageSwapCache pages can still be migrated even if they
	are unmapped.
Patch 5 allows CONFIG_MIGRATION to be set without CONFIG_NUMA
Patch 6 exports a "unusable free space index" via debugfs. It's
	a measure of external fragmentation that takes the size of the
	allocation request into account. It can also be calculated from
	userspace so can be dropped if requested
Patch 7 exports a "fragmentation index" which only has meaning when an
	allocation request fails. It determines if an allocation failure
	would be due to a lack of memory or external fragmentation.
Patch 8 moves the definition for LRU isolation modes for use by compaction
Patch 9 is the compaction mechanism although it's unreachable at this point
Patch 10 adds a means of compacting all of memory with a proc trgger
Patch 11 adds a means of compacting a specific node with a sysfs trigger
Patch 12 adds "direct compaction" before "direct reclaim" if it is
	determined there is a good chance of success.
Patch 13 adds a sysctl that allows tuning of the threshold at which the
	kernel will compact or direct reclaim
Patch 14 temporarily disables compaction if an allocation failure occurs
	after compaction.

Testing of compaction was in three stages.  For the test, debugging,
preempt, the sleep watchdog and lockdep were all enabled but nothing nasty
popped out.  min_free_kbytes was tuned as recommended by hugeadm to help
fragmentation avoidance and high-order allocations.  It was tested on X86,
X86-64 and PPC64.

Ths first test represents one of the easiest cases that can be faced for
lumpy reclaim or memory compaction.

1. Machine freshly booted and configured for hugepage usage with
	a) hugeadm --create-global-mounts
	b) hugeadm --pool-pages-max DEFAULT:8G
	c) hugeadm --set-recommended-min_free_kbytes
	d) hugeadm --set-recommended-shmmax

	The min_free_kbytes here is important. Anti-fragmentation works best
	when pageblocks don't mix. hugeadm knows how to calculate a value that
	will significantly reduce the worst of external-fragmentation-related
	events as reported by the mm_page_alloc_extfrag tracepoint.

2. Load up memory
	a) Start updatedb
	b) Create in parallel a X files of pagesize*128 in size. Wait
	   until files are created. By parallel, I mean that 4096 instances
	   of dd were launched, one after the other using &. The crude
	   objective being to mix filesystem metadata allocations with
	   the buffer cache.
	c) Delete every second file so that pageblocks are likely to
	   have holes
	d) kill updatedb if it's still running

	At this point, the system is quiet, memory is full but it's full with
	clean filesystem metadata and clean buffer cache that is unmapped.
	This is readily migrated or discarded so you'd expect lumpy reclaim
	to have no significant advantage over compaction but this is at
	the POC stage.

3. In increments, attempt to allocate 5% of memory as hugepages.
	   Measure how long it took, how successful it was, how many
	   direct reclaims took place and how how many compactions. Note
	   the compaction figures might not fully add up as compactions
	   can take place for orders other than the hugepage size

X86				vanilla		compaction
Final page count                    913                916 (attempted 1002)
pages reclaimed                   68296               9791

X86-64				vanilla		compaction
Final page count:                   901                902 (attempted 1002)
Total pages reclaimed:           112599              53234

PPC64				vanilla		compaction
Final page count:                    93                 94 (attempted 110)
Total pages reclaimed:           103216              61838

There was not a dramatic improvement in success rates but it wouldn't be
expected in this case either.  What was important is that fewer pages were
reclaimed in all cases reducing the amount of IO required to satisfy a
huge page allocation.

The second tests were all performance related - kernbench, netperf, iozone
and sysbench.  None showed anything too remarkable.

The last test was a high-order allocation stress test.  Many kernel
compiles are started to fill memory with a pressured mix of unmovable and
movable allocations.  During this, an attempt is made to allocate 90% of
memory as huge pages - one at a time with small delays between attempts to
avoid flooding the IO queue.

                                             vanilla   compaction
Percentage of request allocated X86               98           99
Percentage of request allocated X86-64            95           98
Percentage of request allocated PPC64             55           70

This patch:

rmap_walk_anon() does not use page_lock_anon_vma() for looking up and
locking an anon_vma and it does not appear to have sufficient locking to
ensure the anon_vma does not disappear from under it.

This patch copies an approach used by KSM to take a reference on the
anon_vma while pages are being migrated.  This should prevent rmap_walk()
running into nasty surprises later because anon_vma has been freed.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:58 -07:00
Miao Xie c0ff7453bb cpuset,mm: fix no node to alloc memory when changing cpuset's mems
Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later.  But in the way, the allocator may find that
there is no node to alloc memory.

The reason is that cpuset rebinds the task's mempolicy, it cleans the
nodes which the allocater can alloc pages on, for example:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

This patch fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits).  So we
use a variable to tell the write-side task that read-side task is reading
nodemask, and the write-side task clears newly disallowed nodes after
read-side task ends the current memory allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:57 -07:00
Miao Xie 708c1bbc9d mempolicy: restructure rebinding-mempolicy functions
Nick Piggin reported that the allocator may see an empty nodemask when
changing cpuset's mems[1].  It happens only on the kernel that do not do
atomic nodemask_t stores.  (MAX_NUMNODES > BITS_PER_LONG)

But I found that there is also a problem on the kernel that can do atomic
nodemask_t stores.  The problem is that the allocator can't find a node to
alloc page when changing cpuset's mems though there is a lot of free
memory.  The reason is like this:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

I can use the attached program reproduce it by the following step:

# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
   <nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh

several hours later, oom will happen though there is a lot of free memory.

This patchset fixes this problem by expanding the nodes range first(set
newly allowed bits) and shrink it lazily(clear newly disallowed bits).  So
we use a variable to tell the write-side task that read-side task is
reading nodemask, and the write-side task clears newly disallowed nodes
after read-side task ends the current memory allocation.

This patch:

In order to fix no node to alloc memory, when we want to update mempolicy
and mems_allowed, we expand the set of nodes first (set all the newly
nodes) and shrink the set of nodes lazily(clean disallowed nodes), But the
mempolicy's rebind functions may breaks the expanding.

So we restructure the mempolicy's rebind functions and split the rebind
work to two steps, just like the update of cpuset's mems: The 1st step:
expand the set of the mempolicy's nodes.  The 2nd step: shrink the set of
the mempolicy's nodes.  It is used when there is no real lock to protect
the mempolicy in the read-side.  Otherwise we can do rebind work at once.

In order to implement it, we define

	enum mpol_rebind_step {
		MPOL_REBIND_ONCE,
		MPOL_REBIND_STEP1,
		MPOL_REBIND_STEP2,
		MPOL_REBIND_NSTEP,
	};

If the mempolicy needn't be updated by two steps, we can pass
MPOL_REBIND_ONCE to the rebind functions.  Or we can pass
MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
MPOL_REBIND_STEP2 to do the second step work.

Besides that, it maybe long time between these two step and we have to
release the lock that protects mempolicy and mems_allowed.  If we hold the
lock once again, we must check whether the current mempolicy is under the
rebinding (the first step has been done) or not, because the task may
alloc a new mempolicy when we don't hold the lock.  So we defined the
following flag to identify it:

#define MPOL_F_REBINDING (1 << 2)

The new functions will be used in the next patch.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:57 -07:00
Minchan Kim e13861d822 mm: remove return value of putback_lru_pages()
putback_lru_page() never can fail.  So it doesn't matter count of "the
number of pages put back".

In addition, users of this functions don't use return value.

Let's remove unnecessary code.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:57 -07:00
KOSAKI Motohiro e9d6c15738 tmpfs: insert tmpfs cache pages to inactive list at first
Shaohua Li reported parallel file copy on tmpfs can lead to OOM killer.
This is regression of caused by commit 9ff473b9a7 ("vmscan: evict
streaming IO first").  Wow, It is 2 years old patch!

Currently, tmpfs file cache is inserted active list at first.  This means
that the insertion doesn't only increase numbers of pages in anon LRU, but
it also reduces anon scanning ratio.  Therefore, vmscan will get totally
confused.  It scans almost only file LRU even though the system has plenty
unused tmpfs pages.

Historically, lru_cache_add_active_anon() was used for two reasons.
1) Intend to priotize shmem page rather than regular file cache.
2) Intend to avoid reclaim priority inversion of used once pages.

But we've lost both motivation because (1) Now we have separate anon and
file LRU list.  then, to insert active list doesn't help such priotize.
(2) In past, one pte access bit will cause page activation.  then to
insert inactive list with pte access bit mean higher priority than to
insert active list.  Its priority inversion may lead to uninteded lru
chun.  but it was already solved by commit 645747462 (vmscan: detect
mapped file pages used only once).  (Thanks Hannes, you are great!)

Thus, now we can use lru_cache_add_anon() instead.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:56 -07:00
Josef Bacik facd07b07d direct-io: add a hook for the fs to provide its own submit_bio function
Because BTRFS can do RAID and such, we need our own submit hook so we can setup
the bio's in the correct fashion, and handle checksum errors properly.  So there
are a few changes here

1) The submit_io hook.  This is straightforward, just call this instead of
submit_bio.

2) Allow the fs to return -ENOTBLK for reads.  Usually this has only worked for
writes, since writes can fallback onto buffered IO.  But BTRFS needs the option
of falling back on buffered IO if it encounters a compressed extent, since we
need to read the entire extent in and decompress it.  So if we get -ENOTBLK back
from get_block we'll return back and fallback on buffered just like the write
case.

I've tested these changes with fsx and everything seems to work.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25 10:34:55 -04:00
Miklos Szeredi dd3bb14f44 fuse: support splice() writing to fuse device
Allow userspace filesystem implementation to use splice() to write to
the fuse device.  The semantics of using splice() are:

 1) buffer the message header and data in a temporary pipe
 2) with a *single* splice() call move the message from the temporary pipe
    to the fuse device

The READ reply message has the most interesting use for this, since
now the data from an arbitrary file descriptor (which could be a
regular file, a block device or a socket) can be tranferred into the
fuse device without having to go through a userspace buffer.  It will
also allow zero copy moving of pages.

One caveat is that the protocol on the fuse device requires the length
of the whole message to be written into the header.  But the length of
the data transferred into the temporary pipe may not be known in
advance.  The current library implementation works around this by
using vmplice to write the header and modifying the header after
splicing the data into the pipe (error handling omitted):

	struct fuse_out_header out;

	iov.iov_base = &out;
	iov.iov_len = sizeof(struct fuse_out_header);
	vmsplice(pip[1], &iov, 1, 0);
	len = splice(input_fd, input_offset, pip[1], NULL, len, 0);
	/* retrospectively modify the header: */
	out.len = len + sizeof(struct fuse_out_header);
	splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags);

This works since vmsplice only saves a pointer to the data, it does
not copy the data itself.

Since pipes are currently limited to 16 pages and messages need to be
spliced atomically, the length of the data is limited to 15 pages (or
60kB for 4k pages).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-05-25 15:06:06 +02:00
Haojian Zhuang 9f79e9db2e regulator: use voltage number array in 88pm860x
A lot of condition comparision statements are used in original driver. These
statements are used to check the boundary of voltage numbers since voltage
number isn't linear.

Now use array of voltage numbers instead. Clean code with simpler way.

Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2010-05-25 10:16:02 +01:00
Mark Brown 0178f3e28e regulator: Allow regulator-regulator supplies to be specified by name
When one regulator supplies another allow the relationship to be specified
using names rather than struct regulators, in a similar manner to that
allowed for consumer supplies. This allows static configuration at compile
time, reducing the need for dynamic init code.

Also change the references to LINE supply to be system supply since line
is sometimes used for actual supplies and therefore potentially confusing.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2010-05-25 10:16:01 +01:00
Grant Likely b1e50ebcf2 Merge remote branch 'origin' into secretlab/next-spi 2010-05-25 00:38:26 -06:00
hartleys 41c4221ca6 spi: move bitbang txrx utility functions to private header
A number of files in drivers/spi fail checkincludes.pl due to the double
include of <linux/spi/spi_bitbang.h>.

The first include is needed to get the struct spi_bitbang definition and
the spi_bitbang_* function prototypes.

The second include happens after defining EXPAND_BITBANG_TXRX to get the
inlined bitbang_txrx_* utility functions.

The <linux/spi/spi_bitbang.h> header is also included by a number of other
spi drivers, as well as some arch/ code, in order to use struct spi_bitbang
and the associated functions.

To fix the double include, and remove any potential confusion about it, move
the inlined bitbang_txrx_* functions to a new private header in drivers/spi
and also remove the need to define EXPAND_BITBANG_TXRX.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-25 00:23:17 -06:00
Linus Walleij 781c7b129b spi/pl022: add support for the PL023 derivate
This adds support for a further ST variant of the PL022 called
PL023. Some differences in the control registers due to being
stripped down to SPI mode only, and a new clock feedback sample
delay config setting is available.

Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-25 00:23:14 -06:00
Linus Walleij 556f4aeb7d spi/pl022: fix up differences between ARM and ST versions
The PL022 SPI driver did not cleanly separate between the
original unmodified ARM version and the ST Microelectronics
versions. Split this more cleanly and fix some whitespace
moaning from checkpatch at the same time.

Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-25 00:23:13 -06:00
Paul Mundt 14baf9d7f2 serial: sh-sci: fix up serial DMA build.
asm/dmaengine.h no longer exists, update for the shared linux/sh_dma.h
header.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-05-25 12:22:33 +09:00
Alexander Duyck 73367bd8ee slub: move kmem_cache_node into it's own cacheline
This patch is meant to improve the performance of SLUB by moving the local
kmem_cache_node lock into it's own cacheline separate from kmem_cache.
This is accomplished by simply removing the local_node when NUMA is enabled.

On my system with 2 nodes I saw around a 5% performance increase w/
hackbench times dropping from 6.2 seconds to 5.9 seconds on average.  I
suspect the performance gain would increase as the number of nodes
increases, but I do not have the data to currently back that up.

Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15713
Cc: <stable@kernel.org>
Reported-by: Alex Shi <alex.shi@intel.com>
Tested-by: Alex Shi <alex.shi@intel.com>
Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-05-24 21:11:29 +03:00
Linus Torvalds 7e125f7b9c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
  cmd640: fix kernel oops in test_irq() method
  pdc202xx_old: ignore "FIFO empty" bit in test_irq() method
  pdc202xx_old: wire test_irq() method for PDC2026x
  IDE: pass IRQ flags to the IDE core
  ide: fix comment typo in ide.h
2010-05-24 08:05:29 -07:00
Linus Torvalds f13771187b Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  uml: Pushdown the bkl from harddog_kern ioctl
  sunrpc: Pushdown the bkl from sunrpc cache ioctl
  sunrpc: Pushdown the bkl from ioctl
  autofs4: Pushdown the bkl from ioctl
  uml: Convert to unlocked_ioctls to remove implicit BKL
  ncpfs: BKL ioctl pushdown
  coda: Clean-up whitespace problems in pioctl.c
  coda: BKL ioctl pushdown
  drivers: Push down BKL into various drivers
  isdn: Push down BKL into ioctl functions
  scsi: Push down BKL into ioctl functions
  dvb: Push down BKL into ioctl functions
  smbfs: Push down BKL into ioctl function
  coda/psdev: Remove BKL from ioctl function
  um/mmapper: Remove BKL usage
  sn_hwperf: Kill BKL usage
  hfsplus: Push down BKL into ioctl function
2010-05-24 08:01:10 -07:00