Commit Graph

348172 Commits

Author SHA1 Message Date
Stanislav Kinsbursky 73fb847a44 SUNRPC: introduce cache_detail->cache_request callback
This callback will allow to simplify upcalls in further patches in this
series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:45 -05:00
Stanislav Kinsbursky 462b8f6bf1 NFS: simplify and clean cache library
This is a cleanup patch.
Such helpers like nfs_cache_init() and nfs_cache_destroy() are redundant,
because they are just a wrappers around sunrpc_init_cache_detail() and
sunrpc_destroy_cache_detail() respectively.
So let's remove them completely and move corresponding logic to
nfs_cache_register_net() and nfs_cache_unregister_net() respectively (since
they are called together anyway).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:36 -05:00
Stanislav Kinsbursky 483479c26a NFS: use SUNRPC cache creation and destruction helper for DNS cache
This cache was the first containerized and doesn't use net-aware cache
creation and destruction helpers.
This is a cleanup patch which just makes code looks clearer and reduce amount
of lines of code.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:12 -05:00
Fengguang Wu e56a316214 nfsd4: free_stid can be static
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
2013-02-11 16:22:50 -05:00
Jeff Layton 01a7decf75 nfsd: keep a checksum of the first 256 bytes of request
Now that we're allowing more DRC entries, it becomes a lot easier to hit
problems with XID collisions. In order to mitigate those, calculate a
checksum of up to the first 256 bytes of each request coming in and store
that in the cache entry, along with the total length of the request.

This initially used crc32, but Chuck Lever and Jim Rees pointed out that
crc32 is probably more heavyweight than we really need for generating
these checksums, and recommended looking at using the same routines that
are used to generate checksums for IP packets.

On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
csum_partial vs ~1750ns for crc32.  The difference probably isn't
terribly significant, but for now we may as well use csum_partial.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Stones-thrown-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-08 16:02:26 -05:00
Jeff Layton 4c190e2f91 sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
When GSSAPI integrity signatures are in use, or when we're using GSSAPI
privacy with the v2 token format, there is a trailing checksum on the
xdr_buf that is returned.

It's checked during the authentication stage, and afterward nothing
cares about it. Ordinarily, it's not a problem since the XDR code
generally ignores it, but it will be when we try to compute a checksum
over the buffer to help prevent XID collisions in the duplicate reply
cache.

Fix the code to trim off the checksums after verifying them. Note that
in unwrap_integ_data, we must avoid trying to reverify the checksum if
the request was deferred since it will no longer be present when it's
revisited.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2013-02-08 15:19:10 -05:00
Jeff Layton de0b65ca55 sunrpc: fix comment in struct xdr_buf definition
...these pages aren't necessarily contiguous.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:14 -05:00
Jeff Layton 5976687a2b sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h
These routines are used by server and client code, so having them in a
separate header would be best.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:14 -05:00
Jeff Layton 155a345a52 sunrpc: copy scope ID in __rpc_copy_addr6
When copying an address, we should also copy the scopeid in the event
that this is a link-local address and the scope matters.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:13 -05:00
J. Bruce Fields 3abdb60712 nfsd4: simplify idr allocation
We don't really need to preallocate at all; just allocate and initialize
everything at once, but leave the sc_type field initially 0 to prevent
finding the stateid till it's fully initialized.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:12 -05:00
majianpeng 2d32b29a1c nfsd: Fix memleak
When free nfs-client, it must free the ->cl_stateids.

Cc: stable@kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:40:47 -05:00
Jeff Layton b4e7f2c945 nfsd: register a shrinker for DRC cache entries
Since we dynamically allocate them now, allow the system to call us up
to release them if it gets low on memory. Since these entries aren't
replaceable, only free ones that are expired or that are over the cap.
The the seeks value is set to '1' however to indicate that freeing the
these entries is low-cost.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:13 -05:00
Jeff Layton aca8a23de6 nfsd: add recurring workqueue job to clean the cache
It's not sufficient to only clean the cache when requests come in. What
if we have a flurry of activity and then the server goes idle? Add a
workqueue job that will clean the cache every RC_EXPIRE period.

Care is taken to only run this when we expect to have entries expiring.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:12 -05:00
Jeff Layton 2c6b691c05 nfsd: when updating an entry with RC_NOCACHE, just free it
There's no need to keep entries around that we're declaring RC_NOCACHE.
Ditto if there's a problem with the entry.

With this change too, there's no need to test for RC_UNUSED in the
search function. If the entry's in the hash table then it's either
INPROG or DONE.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:11 -05:00
Jeff Layton 13cc8a78e8 nfsd: remove the cache_disabled flag
With the change to dynamically allocate entries, the cache is never
disabled on the fly. Remove this flag.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:11 -05:00
Jeff Layton 0338dd1572 nfsd: dynamically allocate DRC entries
The existing code keeps a fixed-size cache of 1024 entries. This is much
too small for a busy server, and wastes memory on an idle one.  This
patch changes the code to dynamically allocate and free these cache
entries.

A cap on the number of entries is retained, but it's much larger than
the existing value and now scales with the amount of low memory in the
machine.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:10 -05:00
Jeff Layton 0ee0bf7ee5 nfsd: track the number of DRC entries in the cache
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:09 -05:00
Jeff Layton 56c2548b2d nfsd: always move DRC entries to the end of LRU list when updating timestamp
...otherwise, we end up with the list ordering wrong. Currently, it's
not a problem since we skip RC_INPROG entries, but keeping the ordering
strict will be necessary for a later patch that adds a cache cleaner.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:09 -05:00
Jeff Layton 2eeb9b2abc nfsd: initialize the exp->ex_uuid field in svc_export_init
commit 885c91f746 in Bruce's tree was causing oopses for me:

general protection fault: 0000 [#1] SMP
Modules linked in: nfsd(OF) nfs_acl(OF) auth_rpcgss(OF) lockd(OF) sunrpc(OF) kvm_amd kvm microcode i2c_piix4 virtio_net virtio_balloon cirrus drm_kms_helper ttm drm virtio_blk i2c_core
CPU 0
Pid: 564, comm: exportfs Tainted: GF          O 3.8.0-0.rc5.git2.1.fc19.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffff811b1509>]  [<ffffffff811b1509>] kfree+0x49/0x280
RSP: 0018:ffff88007a3d7c50  EFLAGS: 00010203
RAX: 01adaf8dadadad80 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
RDX: ffffffff7fffffff RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
RBP: ffff88007a3d7c80 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000000
R10: 0000000000000018 R11: 0000000000000000 R12: ffff88006a117b50
R13: ffffffffa01a589c R14: ffff8800631b0f50 R15: 01ad998dadadad80
FS:  00007fcaa3616740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f5d84b6fdd8 CR3: 0000000064db4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process exportfs (pid: 564, threadinfo ffff88007a3d6000, task ffff88006af28000)
Stack:
 ffff88007a3d7c80 ffff88006a117b68 ffff88006a117b50 0000000000000000
 ffff8800631b0f50 ffff88006a117b50 ffff88007a3d7ca0 ffffffffa01a589c
 ffff880036be1148 ffff88007a3d7cf8 ffff88007a3d7e28 ffffffffa01a6a98
Call Trace:
 [<ffffffffa01a589c>] svc_export_put+0x5c/0x70 [nfsd]
 [<ffffffffa01a6a98>] svc_export_parse+0x328/0x7e0 [nfsd]
 [<ffffffffa016f1c7>] cache_do_downcall+0x57/0x70 [sunrpc]
 [<ffffffffa016f25e>] cache_downcall+0x7e/0x100 [sunrpc]
 [<ffffffffa016f338>] cache_write_procfs+0x58/0x90 [sunrpc]
 [<ffffffffa016f2e0>] ? cache_downcall+0x100/0x100 [sunrpc]
 [<ffffffff8123b0e5>] proc_reg_write+0x75/0xb0
 [<ffffffff811ccecf>] vfs_write+0x9f/0x170
 [<ffffffff811cd089>] sys_write+0x49/0xa0
 [<ffffffff816e0919>] system_call_fastpath+0x16/0x1b
Code: 66 66 66 90 48 83 fb 10 0f 86 c3 00 00 00 48 89 df 49 bf 00 00 00 00 00 ea ff ff e8 f2 12 ea ff 48 c1 e8 0c 48 c1 e0 06 49 01 c7 <49> 8b 07 f6 c4 80 0f 85 1d 02 00 00 49 8b 07 a8 80 0f 84 ee 01
RIP  [<ffffffff811b1509>] kfree+0x49/0x280
 RSP <ffff88007a3d7c50>

I think Majianpeng's patch is correct, but incomplete. In order for it
to be safe to free the ex_uuid unconditionally in svc_export_put, we
need to make sure it's initialized to NULL in the init routine.

Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:24 -05:00
Jeff Layton a4a3ec3291 nfsd: break out hashtable search into separate function
Later, we'll need more than one call site for this, so break it out
into a new function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:24 -05:00
Jeff Layton d1a0774de6 nfsd: clean up and clarify the cache expiration code
Add a preprocessor constant for the expiry time of cache entries, and
move the test for an expired entry into a function. Note that the current
code does not test for RC_INPROG. It just assumes that it won't take more
than 2 minutes to fill out an in-progress entry.

I'm not sure how valid that assumption is though, so let's just ensure
that we never consider an RC_INPROG entry to be expired.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:23 -05:00
Jeff Layton 25e6b8b0e1 nfsd: remove redundant test from nfsd_reply_cache_free
Entries can only get a c_type of RC_REPLBUFF iff they are
RC_DONE. Therefore the test for RC_DONE isn't necessary here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:22 -05:00
Jeff Layton f09841fdfa nfsd: add alloc and free functions for DRC entries
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:22 -05:00
Jeff Layton 8a8bc40d9b nfsd: create a dedicated slabcache for DRC entries
Currently we use kmalloc() which wastes a little bit of memory on each
allocation since it's a power of 2 allocator. Since we're allocating a
1024 of these now, and may need even more later, let's create a new
slabcache for them.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:21 -05:00
Jeff Layton 09662d58d5 nfsd: get rid of RC_INTR
The reply cache code never returns this status.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:20 -05:00
Jeff Layton 6dc8889589 nfsd: remove unneeded spinlock in nfsd_cache_update
The locking rules for cache entries say that locking the cache_lock
isn't needed if you're just touching the current entry. Earlier
in this function we set rp->c_state to RC_UNUSED without any locking,
so I believe it's ok to do the same here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:19 -05:00
Jeff Layton 7b9e8522a6 nfsd: fix IPv6 address handling in the DRC
Currently, it only stores the first 16 bytes of any address. struct
sockaddr_in6 is 28 bytes however, so we're currently ignoring the last
12 bytes of the address.

Expand the c_addr field to a sockaddr_in6, and cast it to a sockaddr_in
as necessary. Also fix the comparitor to use the existing RPC
helpers for this.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:19 -05:00
majianpeng 885c91f746 nfsd: Fix memleak in svc_export_put
In func svc_export_parse, the uuid which used kmemdup to alloc will be
changed in func export_update.So the later kfree don't free this memory.
And it can't be free in func svc_export_parse because other place still
used.So put this operation in func svc_export_put.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-29 16:50:03 -05:00
J. Bruce Fields ff89be87c7 nfsd4: require version 4 when enabling or disabling minorversion
The current code will allow silly things like:

	echo "+2 +3 +4 +7.1">/proc/fs/nfsd/versions

Reported-by: Fan Chaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:25:01 -05:00
Stanislav Kinsbursky bca0ec6511 nfsd: fix unused "nn" variable warning in free_client()
If CONFIG_LOCKDEP is disabled, then there would be a warning like this:

  CC [M]  fs/nfsd/nfs4state.o
fs/nfsd/nfs4state.c: In function ‘free_client’:
fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]

So, let's add "maybe_unused" tag to this variable.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:40 -05:00
Andriy Skulysh 35525b7978 sunrpc: Fix lockd sleeping until timeout
There is a race in enqueueing thread to a pool and
waking up a thread.
lockd doesn't wake up on reception of lock granted callback
if svc_wake_up() is called before lockd's thread is added
to a pool.

Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:39 -05:00
J. Bruce Fields 624ab46448 svcrpc: silence "unused variable" warning in !RPC_DEBUG case
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:38 -05:00
Yanchuan Nian ec16867617 nfsd: Remove write permission from file content
The write function doesn't be implemented in file content, and it's meaningless
to write data into this file directly. Remove write permission from it.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:38 -05:00
Yanchuan Nian 266533c6df nfsd: Don't unlock the state while it's not locked
In the procedure of CREATE_SESSION, the state is locked after
alloc_conn_from_crses(). If the allocation fails, the function
goes to "out_free_session", and then "out" where there is an
unlock function.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:37 -05:00
Yanchuan Nian 74b70dded3 nfsd: Pass correct slot number to nfsd4_put_drc_mem()
In alloc_session(), numslots is the correct slot number used by the session.
But the slot number passed to nfsd4_put_drc_mem() is the one from nfs client.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:36 -05:00
J. Bruce Fields 84822d0b3b nfsd4: simplify nfsd4_encode_fattr interface slightly
It seems slightly simpler to make nfsd4_encode_fattr rather than its
callers responsible for advancing the write pointer on success.

(Also: the count == 0 check in the verify case looks superfluous.
Running out of buffer space is really the only reason fattr encoding
should fail with eresource.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:35 -05:00
Linus Torvalds d1c3ed669a Linux 3.8-rc2 2013-01-02 18:13:21 -08:00
Linus Torvalds d50403dcc5 Merge branch 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
Pull LED fix from Bryan Wu.

* 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
  leds: leds-gpio: set devm_gpio_request_one() flags param correctly
2013-01-02 18:12:35 -08:00
Javier Martinez Canillas 2d7c22f67d leds: leds-gpio: set devm_gpio_request_one() flags param correctly
commit a99d76f leds: leds-gpio: use gpio_request_one

changed the leds-gpio driver to use gpio_request_one() instead
of gpio_request() + gpio_direction_output()

Unfortunately, it also made a semantic change that breaks the
leds-gpio driver.

The gpio_request_one() flags parameter was set to:

GPIOF_DIR_OUT | (led_dat->active_low ^ state)

Since GPIOF_DIR_OUT is 0, the final flags value will just be the
XOR'ed value of led_dat->active_low and state.

This value were used to distinguish between HIGH/LOW output initial
level and call gpio_direction_output() accordingly.

With this new semantic gpio_request_one() will take the flags value
of 1 as a configuration of input direction (GPIOF_DIR_IN) and will
call gpio_direction_input() instead of gpio_direction_output().

int gpio_request_one(unsigned gpio, unsigned long flags, const char *label)
{
..
	if (flags & GPIOF_DIR_IN)
		err = gpio_direction_input(gpio);
	else
		err = gpio_direction_output(gpio,
				(flags & GPIOF_INIT_HIGH) ? 1 : 0);
..
}

The right semantic is to evaluate led_dat->active_low ^ state and
set the output initial level explicitly.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reported-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
2013-01-02 17:58:41 -08:00
Linus Torvalds ef05e9b960 Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
 "This fixes some small errors in the new da9055 driver, eliminates a
  compiler warning and adds DT support for the twl4030_wdt driver (so
  that we can have multiple watchdogs with DT on the omap platforms)."

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: twl4030_wdt: add DT support
  watchdog: omap_wdt: eliminate unused variable and a compiler warning
  watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path
  watchdog: da9055: Fix invalid free of devm_ allocated data
2013-01-02 17:46:14 -08:00
Linus Torvalds 080a62e2ce PCI updates for v3.8:
PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz
   PCI: Remove spurious error for sriov_numvfs store and simplify flow
   PCI: Add PCIe Link Capability link speed and width names
   PCI/PM: Do not suspend port if any subordinate device needs PME polling
   PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ5HnVAAoJEPGMOI97Hn6zprgP/1we4RhVaXLRnmLyc9OpS97Z
 9KZUU/rsMQ/RYstdNFV30JOypMFL1BlK7jpLR14gjSgCulKK9etvjBTwiV26Rfor
 n/LWru4CWUtGUH/2c4IwuN0FKxfU7W4GxuVfKi3uACh7yJRwKgxZhFKLLb4OZ/T0
 A1CiIktdpZhH5A8+WdoSkZSsfQPuUA6UVKQleEQh/qJl9qgxwEDLsdj5fIZLsFUB
 Fo3bbusq2X+pHU0uuBIzrheSUeSmxXzeZcte8JxTEEwB/Gdsn24lJ39MK5PHAaOE
 gSVC7HDi+vNCICZhi7H93musPczL1TqeyMZQWSa/rj7KV836kG+Phz61SmsXTxyR
 VpfnEZOx7GreErpBuLKrOVslXJl1TBc/ZiiLd5SBUlO4ZClAssPcevtUexCR3xr6
 eHoSYMtwblW/vgJ3rn/PD8SgksZVJsd6+JAlVbC53XAdeJuEheCdEU7HnQZ3ZQRF
 6wpWOBfIxdSQM4AukncNjUSTQpVjNFoEXNcPBCamazDz9NgRIcrnBAd/94+AVD0t
 WQpoU0HDP6h00pK8Ls3Fsv23qbfPDPP9i6zhSGlv5Q9Sz5T8b178j8h7tkUgtPy/
 vxAtwUgFwz5cxE053lrht0JEQUikv99VcUJKrQc17g6GIMenh4duXrxF1I1EERXD
 fcVZNas3SrnLSKfsVwws
 =Ehs+
 -----END PGP SIGNATURE-----

Merge tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Some fixes for v3.8.  They include a fix for the new SR-IOV sysfs
  management support, an expanded quirk for Ricoh SD card readers, a
  Stratus DMI quirk fix, and a PME polling fix."

* tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz
  PCI/PM: Do not suspend port if any subordinate device needs PME polling
  PCI: Add PCIe Link Capability link speed and width names
  PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
  PCI: Remove spurious error for sriov_numvfs store and simplify flow
2013-01-02 17:44:29 -08:00
David Howells 8a7eab2b54 UAPI: Strip _UAPI prefix on header install no matter the whitespace
Commit 56c176c9ca ("UAPI: strip the _UAPI prefix from header guards
during header installation") strips the _UAPI prefix from header guards,
but only if there's a single space between the cpp directive and the
label.

Make it more flexible and able to handle tabs and multiple white space
characters.

Signed-off-by: David Howells <dhowell@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02 17:36:10 -08:00
David Howells 3d33fcc11b UAPI: Remove empty Kbuild files
Empty files can get deleted by the patch program, so remove empty Kbuild
files and their links from the parent Kbuilds.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02 17:36:10 -08:00
Linus Torvalds 007f6c3a63 Two self-explanatory fixes and a third patch which improves performance. When
overwriting a full page in the eCryptfs page cache, skip reading in and
 decrypting the corresponding lower page.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCgAGBQJQ5HR+AAoJENaSAD2qAscKg/gQAJSGpz9Frh3QqV30smvbKASI
 vBcHpbEBMhpExzkcLF3Gqdj7KqcwpN3Nh+oAD1vNyvermeczazEebr5wFfNTv4eE
 TetUfa2e92RS0c0yxgS+9k1Fhxi8BCovNxmFfiq5iPFHSNwjixPBHLLZVFPCdp9N
 il/dV8Y7wg1exDikZQc8lqiVULZxvkBc+R/dgXFhAnwFxDMT2jiInXbBU4Onct0P
 +YX4FwrKnDCOg7bk8Mk/lW6mwAuhoelnuF3dy9v/soBeclOeTfmUmO44dv0D3IPY
 iGpGofhs+cDSKxOZ0XXocAdFdmY7fbcijppoF00XyZiuqcd59zc0l+LDRuCBcXD7
 SFSTzR0uFf8C0rM4Mjfz6WGbwW7Ae0KqLbFIVg03MJDCquOtDBr0Xdpviy1GYNo3
 H0Z3400olyGqp/3ZoEjefOoz9DbzqHtzhcMtGBN/ihyaolPJzS81pLTYCsja2SJM
 pHUjId3abWOVRgtrAk+XUO9Sn6W8Or5bug4+idYwD6LfUILz9OpHin/mplnHoF9F
 8lEjhzNHyvU3HQPyR4v/TidExyx7IBeP0tOLk4X2N+fmH45ukl/pPDNfpF/2lxpd
 mN7HK2H2cYtGrYSwSmwuG0q9W365vmk8mvu2Xz5aIMe9r5SeucgPjzZ3zg+kHgRE
 OqJljwln6TaSB/7o0MQ5
 =JNeQ
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
 "Two self-explanatory fixes and a third patch which improves
  performance: when overwriting a full page in the eCryptfs page cache,
  skip reading in and decrypting the corresponding lower page."

* tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() static
  eCryptfs: fix to use list_for_each_entry_safe() when delete items
  eCryptfs: Avoid unnecessary disk read and data decryption during writing
2013-01-02 17:33:50 -08:00
Linus Torvalds 58890c0669 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "Two of Alex's patches deal with a race when reseting server
  connections for open RBD images, one demotes some non-fatal BUGs to
  WARNs, and my patch fixes a protocol feature bit failure path."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix protocol feature mismatch failure path
  libceph: WARN, don't BUG on unexpected connection states
  libceph: always reset osds when kicking
  libceph: move linger requests sooner in kick_requests()
2013-01-02 17:32:49 -08:00
Mel Gorman 42288fe366 mm: mempolicy: Convert shared_policy mutex to spinlock
Sasha was fuzzing with trinity and reported the following problem:

  BUG: sleeping function called from invalid context at kernel/mutex.c:269
  in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main
  2 locks held by trinity-main/6361:
   #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0
   #1:  (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0
  Pid: 6361, comm: trinity-main Tainted: G        W
  3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74
  Call Trace:
    __might_sleep+0x1c3/0x1e0
    mutex_lock_nested+0x29/0x50
    mpol_shared_policy_lookup+0x2e/0x90
    shmem_get_policy+0x2e/0x30
    get_vma_policy+0x5a/0xa0
    mpol_misplaced+0x41/0x1d0
    handle_pte_fault+0x465/0x6a0

This was triggered by a different version of automatic NUMA balancing
but in theory the current version is vunerable to the same problem.

do_numa_page
  -> numa_migrate_prep
    -> mpol_misplaced
      -> get_vma_policy
        -> shmem_get_policy

It's very unlikely this will happen as shared pages are not marked
pte_numa -- see the page_mapcount() check in change_pte_range() -- but
it is possible.

To address this, this patch restores sp->lock as originally implemented
by Kosaki Motohiro.  In the path where get_vma_policy() is called, it
should not be calling sp_alloc() so it is not necessary to treat the PTL
specially.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02 17:32:13 -08:00
Linus Torvalds 5439ca6b8f Various bug fixes for ext4. Perhaps the most serious bug fixed is one
which could cause file system corruptions when performing file punch
 operations.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQ374OAAoJENNvdpvBGATwEGAP/jKUwjQhBZiF0k9dg1kQ5eTz
 bdli4fy1vxrEMIOym8IZa4nBQJVCkArwRgjc28gCBD6k9u6X3GPa26vUydsoPfP6
 odPdc9c9HtsbYQGuaq1SohID5HfjxHewTcUmCs4X4SpGcSurUcT7eQYWqSuIxFHR
 0nKk8NO4EcWh2uqIoGPrc8QpSdor0DXXYYjZmHCeVLH1n6PyoMsnrFMfO9KqMLUL
 vNR54CX9n1GRTfAfJNkNzcwfs8IfNkDUyv5hFpDh15tLltogU0TqnlAl3vSeZGSx
 vVfhwHmQTK/bJyC3YaoRZqq9CQJVk2f/OTBpJDFY/USaapuitJd6vqbmh7NiRNAN
 LaKmFt99MPfwyjEhIA7+J0LCTraAxc536q43oWWK5dAJhWI7DW0lbHARVeQTixNy
 KJ1Lp0pmmz1mX8/lugOnK1SPBF525kTaoiz2bWqg4oQgn7mBzUlgj+EV22/6Rq83
 TpKOKstl4BiZi8t5AhmFiwqtknCDiT5vUKQNy2kuM/oXtPJID/lM/TJbR5viYD3l
 AH3Ef7xj61CynFZ0oBeraGwtXc2BHJpJdWz+8uj0/VhFfC+uNUYapSLFwyiAVZKO
 xxaItT3ylfKpa0AWK6HBc2SLuL72SCHAPks06YKFtSyHtr5C8SCcafxU2DSOSi7K
 VrhkcH6STa77Br7a1ORt
 =9R/D
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Ted Ts'o:
 "Various bug fixes for ext4.  Perhaps the most serious bug fixed is one
  which could cause file system corruptions when performing file punch
  operations."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: avoid hang when mounting non-journal filesystems with orphan list
  ext4: lock i_mutex when truncating orphan inodes
  ext4: do not try to write superblock on ro remount w/o journal
  ext4: include journal blocks in df overhead calcs
  ext4: remove unaligned AIO warning printk
  ext4: fix an incorrect comment about i_mutex
  ext4: fix deadlock in journal_unmap_buffer()
  ext4: split off ext4_journalled_invalidatepage()
  jbd2: fix assertion failure in jbd2_journal_flush()
  ext4: check dioread_nolock on remount
  ext4: fix extent tree corruption caused by hole punch
2013-01-02 09:57:34 -08:00
Hugh Dickins a7a88b2373 mempolicy: remove arg from mpol_parse_str, mpol_to_str
Remove the unused argument (formerly no_context) from mpol_parse_str()
and from mpol_to_str().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02 09:27:10 -08:00
Hugh Dickins f2a07f40db tmpfs mempolicy: fix /proc/mounts corrupting memory
Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.

Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af35 "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.

mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect.  Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).

I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02 09:27:10 -08:00
Eric Wong 128dd1759d epoll: prevent missed events on EPOLL_CTL_MOD
EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed.  Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occured before we modified the interest mask.  So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
Cc: netdev@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02 09:16:43 -08:00