Commit Graph

45713 Commits

Author SHA1 Message Date
Trond Myklebust fb10fb67ad NFSv4: Cleanup the setting of the nfs4 lease period
Make a helper function nfs4_set_lease_period() and have
nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 19:13:08 -04:00
Chris Mason 1083881654 Merge branch 'integration-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.8 2016-08-05 12:25:05 -07:00
Hiraku Toyooka e976e56423 ramoops: use persistent_ram_free() instead of kfree() for freeing prz
persistent_ram_zone(=prz) structures are allocated by persistent_ram_new(),
which includes vmap() or ioremap(). But they are currently freed by
kfree(). This uses persistent_ram_free() for correct this asymmetry usage.

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-08-05 11:21:46 -07:00
Kees Cook 529182e204 ramoops: use DT reserved-memory bindings
Instead of a ramoops-specific node, use a child node of /reserved-memory.
This requires that of_platform_device_create() be explicitly called
for the node, though, since "/reserved-memory" does not have its own
"compatible" property.

Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rob Herring <robh@kernel.org>
2016-08-05 11:21:36 -07:00
Trond Myklebust 206b3bb574 NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
We should handle those errors in the same way we handle the other
stateid errors: by invalidating the faulty layout stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05 12:18:10 -04:00
Linus Torvalds a71e36045e Highlights:
Trond made a change to the server's tcp logic that allows a fast
 	client to better take advantage of high bandwidth networks, but
 	may increase the risk that a single client could starve other
 	clients; a new sunrpc.svc_rpc_per_connection_limit parameter
 	should help mitigate this in the (hopefully unlikely) event this
 	becomes a problem in practice.
 
 	Tom Haynes added a minimal flex-layout pnfs server, which is of
 	no use in production for now--don't build it unless you're doing
 	client testing or further server development.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXo7HNAAoJECebzXlCjuG+zqUP/RxO5jZjBhNI8/ayGdDW/Jnq
 s0Fu6B+aNRV3GnugmIeI4tWNGnPyERNzFtjLKlnwaasz/oW4qBLqGbNUWC5xKARS
 erODs0hM/1aCYWwNBEc5qXP2u23HrWVuQ+B5fg42ACyliKFGq5faDRmf6XGU/1kB
 8unXGWPAiLiNZD/bWP91fYhThlLgpfHBFZ7M3G2IqmzWZTSELPzwp1bpRWt7yWQQ
 z1oYtXToycbwz3yPVk3cXtaoqpjDUVZf2Guqgqi1BwEyEtYOSaYo1VHNsKDf4OId
 QXQh64AqIK4uszpvtNhvsEaAECN7IiB+N4n2laFiQVmAf8Hfl3AnV/gKeD4lKmTj
 TY6knnjZO/X88wn80MB7JR1H1WXvvzNIHwNR95qfub/lVKX+C+0AORRtYhi5F9ec
 ixNs/z1ImLpYxAjiP/T5anD5xcX2S+LcSv7kRjhEufqNFtRAIqBZO9ZWbCdXAAyE
 tcH9Cru4jeIlFO/y6O61EVrn9FFj2+0uu+7urefNRQ2Y9pmKeculJrLF6WO8WHms
 4IzXMmjZK+358RVdX2Ji5Hw6rBDvfgP+LjB8Jn8CeIiNRONEjT+2/AYQcfk61aLb
 INUbk6G6Vfd8iMO4aaRI9tmW+vKCOZa0IbnrNE1oHKp/AKBDr25i5YPSCsnl3r4Q
 iR7rRe9FIkfqBpbfjVFv
 =mo54
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Highlights:

   - Trond made a change to the server's tcp logic that allows a fast
     client to better take advantage of high bandwidth networks, but may
     increase the risk that a single client could starve other clients;
     a new sunrpc.svc_rpc_per_connection_limit parameter should help
     mitigate this in the (hopefully unlikely) event this becomes a
     problem in practice.

   - Tom Haynes added a minimal flex-layout pnfs server, which is of no
     use in production for now--don't build it unless you're doing
     client testing or further server development"

* tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
  nfsd: remove some dead code in nfsd_create_locked()
  nfsd: drop unnecessary MAY_EXEC check from create
  nfsd: clean up bad-type check in nfsd_create_locked
  nfsd: remove unnecessary positive-dentry check
  nfsd: reorganize nfsd_create
  nfsd: check d_can_lookup in fh_verify of directories
  nfsd: remove redundant zero-length check from create
  nfsd: Make creates return EEXIST instead of EACCES
  SUNRPC: Detect immediate closure of accepted sockets
  SUNRPC: accept() may return sockets that are still in SYN_RECV
  nfsd: allow nfsd to advertise multiple layout types
  nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
  nfsd/blocklayout: Make sure calculate signature/designator length aligned
  xfs: abstract block export operations from nfsd layouts
  SUNRPC: Remove unused callback xpo_adjust_wspace()
  SUNRPC: Change TCP socket space reservation
  SUNRPC: Add a server side per-connection limit
  SUNRPC: Micro optimisation for svc_data_ready
  SUNRPC: Call the default socket callbacks instead of open coding
  SUNRPC: lock the socket while detaching it
  ...
2016-08-04 19:59:06 -04:00
Linus Torvalds d58b0d980f Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
 "This is part two of my btrfs pull, which is some cleanups and a batch
  of fixes.

  Most of the code here is from Jeff Mahoney, making the pointers we
  pass around internally more consistent and less confusing overall.  I
  noticed a small problem right before I sent this out yesterday, so I
  fixed it up and re-tested overnight"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits)
  Btrfs: fix __MAX_CSUM_ITEMS
  btrfs: btrfs_abort_transaction, drop root parameter
  btrfs: add btrfs_trans_handle->fs_info pointer
  btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
  btrfs: convert nodesize macros to static inlines
  btrfs: introduce BTRFS_MAX_ITEM_SIZE
  btrfs: cleanup, remove prototype for btrfs_find_root_ref
  btrfs: copy_to_sk drop unused root parameter
  btrfs: simpilify btrfs_subvol_inherit_props
  btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root
  btrfs: tests, require fs_info for root
  btrfs: tests, move initialization into tests/
  btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
  btrfs: prefix fsid to all trace events
  btrfs: plumb fs_info into btrfs_work
  btrfs: remove obsolete part of comment in statfs
  btrfs: hide test-only member under ifdef
  btrfs: Ratelimit "no csum found" info message
  btrfs: Add ratelimit to btrfs printing
  Btrfs: fix unexpected balance crash due to BUG_ON
  ...
2016-08-04 19:56:16 -04:00
Linus Torvalds 3a303258ef This pull request contains mostly cleanups and minor
improvements of UBI and UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJXoiyMAAoJEEtJtSqsAOnWoXoP/1Q192UXeI18eezK//Y1kgv/
 Q3gFoqtOWBnw9kcY9aTHdAtPJcgsjRzCMPVbd1TBEe071xWCyKziyGalNUFKLKOR
 IZxym3uf65jhXkcch7ZtoUdMH7XcGOavPg8X47RWs5u72uTiIt6t/RRUwM1zDeaW
 YZx3FnCGwyzPygrogTbVfH132o1pzO587wrxFeaZQ30sWCLqQOk3qVyROgz2J9zm
 00TjNQEvUgfhBf2PiUvX0S5Lan/AX1aB3iEGg05fIDDsZqui698DRDx+isFEJEHf
 NWBHDBnhOObwKgutDfCk1gsfIKxzxBCxlLQG/ZaCwG4XKke8ylRc1wNffJbKrIIQ
 AYywLol3n3/WR4VvPK+4/TX/s4UOZOvSZYiaVJiSmxOCUNydNtwIewNp+aVghV/u
 qMfWsWRIPy7OXOdm3fTxzRsFtUxZaqglQ/dK24i1d8kktM0rkb1mgfKq9P0uctWq
 0ejnNHQmJyuGKYvemjBtTXUFmFktelolDOfsAl10MbYZ+OwPOYpI9FbGY/POYWuT
 Gpn/x/r2lGtP94kGYxBzSX8xTCC4SEFaMjE2sRvhWoxA8YgIydTDhz9SxCO1wz8E
 a7nPnRQ0iZfo5JW0MkLZim+YDNyBjY5ASeBXXdJH/uXlCaFjmDCDCLz5/e08DuM3
 lmmkepYwimHJIClr6d+0
 =hOxy
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This contains mostly cleanups and minor improvements of UBI and UBIFS"

* tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifs:
  ubi: Use bitmaps in Fastmap self-check code
  ubi: Be more paranoid while seaching for the most recent Fastmap
  ubi: Check whether the Fastmap anchor matches the super block
  ubi: Rework Fastmap attach base code
  ubi: Fix whitespace issue in count_fastmap_pebs()
  ubi: Introduce vol_ignored()
  ubi: Fix scan_fast() comment
  ubifs: switch_gc_head: Remove redondant sync of wbuf
  ubi: Make volume resize power cut aware
  ubi: Fix early logging
  ubi: gluebi: Fix double refcounting
  ubifs: Silence early error messages if MS_SILENT is set
  ubi: Fix race condition between ubi device creation and udev
  ubifs: Update comment for ubifs_errc
  ubi: Only read necessary size when reading the VID header
  ubifs: Make xattr structures static
  ubifs: Silence error output if MS_SILENT is set
2016-08-04 19:51:49 -04:00
Linus Torvalds 9e0243db61 Merge branch 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 "Beside of various fixes this also contains patches to enable features
  such was Kcov, kmemleak and TRACE_IRQFLAGS_SUPPORT on UML"

* 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
  um: Support kcov
  um: Enable TRACE_IRQFLAGS_SUPPORT
  um: Use asm-generic/irqflags.h
  um: Fix possible deadlock in sig_handler_common()
  um: Select HAVE_DEBUG_KMEMLEAK
  um: Setup physical memory in setup_arch()
  um: Eliminate null test after alloc_bootmem
2016-08-04 19:37:59 -04:00
Linus Torvalds 8e7106a607 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
 "This series is all about Nicolas flat format support for MMU systems.

  Traditional m68k no-MMU flat format binaries can now be run on m68k
  MMU enabled systems too.  The series includes some nice cleanups of
  the binfmt_flat code and converts it to using proper user space
  accessor functions.

  With all this in place you can boot and run a complete no-MMU flat
  format based user space on an MMU enabled system"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: enable binfmt_flat on systems with an MMU
  binfmt_flat: allow compressed flat binary format to work on MMU systems
  binfmt_flat: add MMU-specific support
  binfmt_flat: update libraries' data segment pointer with userspace accessors
  binfmt_flat: use clear_user() rather than memset() to clear .bss
  binfmt_flat: use proper user space accessors with old relocs code
  binfmt_flat: use proper user space accessors with relocs processing code
  binfmt_flat: clean up create_flat_tables() and stack accesses
  binfmt_flat: use generic transfer_args_to_stack()
  elf_fdpic_transfer_args_to_stack(): make it generic
  binfmt_flat: prevent kernel dammage from corrupted executable headers
  binfmt_flat: convert printk invocations to their modern form
  binfmt_flat: assorted cleanups
  m68k: use same start_thread() on MMU and no-MMU
  m68k: fix file path comment
  m68k: fix bFLT executable running on MMU enabled systems
2016-08-04 18:04:44 -04:00
Dan Carpenter 2b11885921 nfsd: remove some dead code in nfsd_create_locked()
We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
so "dchild" can't be an error pointer any more.  Also, dchild can't be
NULL here (and dput would already handle this even if it was).

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:53 -04:00
J. Bruce Fields fa08139d5e nfsd: drop unnecessary MAY_EXEC check from create
We need an fh_verify to make sure we at least have a dentry, but actual
permission checks happen later.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:52 -04:00
J. Bruce Fields 7142327449 nfsd: clean up bad-type check in nfsd_create_locked
Minor cleanup, no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:51 -04:00
J. Bruce Fields d03d9fe476 nfsd: remove unnecessary positive-dentry check
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
returns EEXIST if the object already exists.

This check is therefore unnecessary.

(In the NFSv2 case, nfsd_proc_create also has such a check.  Contrary to
RFC 1094, our code seems to believe that a CREATE of an existing file
should succeed.  I'm leaving that behavior alone.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:50 -04:00
J. Bruce Fields b44061d0b9 nfsd: reorganize nfsd_create
There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked.  The only already-locked
caller is NFSv2's nfsd_proc_create().  It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.

Also fix some comments while we're here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:49 -04:00
J. Bruce Fields e75b23f9e3 nfsd: check d_can_lookup in fh_verify of directories
Create and other nfsd ops generally assume we can call lookup_one_len on
inodes with S_IFDIR set.  Al says that this assumption isn't true in
general, though it should be for the filesystem objects nfsd sees.

Add a check just to make sure our assumption isn't violated.

Remove a couple checks for i_op->lookup in create code.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:48 -04:00
J. Bruce Fields 12391d0723 nfsd: remove redundant zero-length check from create
lookup_one_len already has this check.

The only effect of this patch is to return access instead of perm in the
0-length-filename case.  I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.

The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:47 -04:00
Oleg Drokin 7eed34f18d nfsd: Make creates return EEXIST instead of EACCES
When doing a create (mkdir/mknod) on a name, it's worth
checking the name exists first before returning EACCES in case
the directory is not writeable by the user.
This makes return values on the client more consistent
regardless of whenever the entry there is cached in the local
cache or not.
Another positive side effect is certain programs only expect
EEXIST in that case even despite POSIX allowing any valid
error to be returned.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04 17:11:46 -04:00
Mike Christie abf545484d mm/block: convert rw_page users to bio op use
The rw_page users were not converted to use bio/req ops. As a result
bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
be sent down as reads.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixes: 4e1b2d52a8 ("block, fs, drivers: remove REQ_OP compat defs and related code")

Modified by me to:

1) Drop op_flags passing into ->rw_page(), as we don't use it.
2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:25:33 -06:00
Shaun Tancheff b571bc606e Fixup direct bi_rw modifiers
bi_rw should be using bio_set_op_attrs to set bi_rw.

Signed-off-by: Shaun Tancheff <shaun@tancheff.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:19:16 -06:00
Jens Axboe 1aee6b9a7d f2fs: drop bio->bi_rw manual assignment
Merge 4fc29c1aa3 included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:19:16 -06:00
Paolo Valente 20bd723ec6 block: add missing group association in bio-cloning functions
When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.

Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.

This commit adds the missing association in bio-cloning functions.

Fixes: da2f0f74cf ("Btrfs: add support for blkio controllers")
Cc: stable@vger.kernel.org # v4.3+

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:19:16 -06:00
Jan Kara dc5ff2b1d6 writeback: Write dirty times for WB_SYNC_ALL writeback
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.

  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
  Modules linked in:
  CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-11:0)
  task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
  RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
  locked_inode_to_wb_and_lock_list+0xa2/0x750
  RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
  RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
  RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
  R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
  R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
  FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
  DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Stack:
   ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
   ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
   ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
  Call Trace:
   [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
   [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
   [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
   [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
   [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
   [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
   [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
   [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
   [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
   [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
  Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
  00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
  80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
  RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
  RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
  fs/fs-writeback.c:281
   RSP <ffff88006cdaf7d0>
  ---[ end trace 986a4d314dcb2694 ]---

Fix the problem by making sure __writeback_single_inode() writes inode
only with dirty times in WB_SYNC_ALL mode.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04 14:19:16 -06:00
Ross Zwisler 99a01cdf9d block: remove BLK_DEV_DAX config option
The functionality for block device DAX was already removed with commit
acc93d30d7 ("Revert "block: enable dax for raw block devices"")

However, we still had a config option hanging around that was always
disabled because it depended on CONFIG_BROKEN.  This config option was
introduced in commit 03cdadb040 ("block: disable block device DAX by
default")

This change reverts that commit, removing the dead config option.

Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Dan Carpenter 8a545f1851 hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
We can't pass error pointers to kfree() or it causes an oops.

Fixes: 52b209f7b8 ('get rid of hostfs_read_inode()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-08-04 00:18:10 +02:00
Chris Mason 42049bf60d Btrfs: fix __MAX_CSUM_ITEMS
Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on
machines where the pagesize >= metadata blocksize.

This just reverts the relevant hunks to bring the old math back.

Signed-off-by: Chris Mason <clm@fb.com>
2016-08-03 14:08:37 -07:00
David Howells db20a8925b cachefiles: Fix race between inactivating and culling a cache object
There's a race between cachefiles_mark_object_inactive() and
cachefiles_cull():

 (1) cachefiles_cull() can't delete a backing file until the cache object
     is marked inactive, but as soon as that's the case it's fair game.

 (2) cachefiles_mark_object_inactive() marks the object as being inactive
     and *only then* reads the i_blocks on the backing inode - but
     cachefiles_cull() might've managed to delete it by this point.

Fix this by making sure cachefiles_mark_object_inactive() gets any data it
needs from the backing inode before deactivating the object.

Without this, the following oops may occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
 [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
 [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
 [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff810a605b>] process_one_work+0x17b/0x470
 [<ffffffff810a6e96>] worker_thread+0x126/0x410
 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
 [<ffffffff810ae64f>] kthread+0xcf/0xe0
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff81695418>] ret_from_fork+0x58/0x90
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140

The oopsing code shows:

	callq  0xffffffff810af6a0 <wake_up_bit>
	mov    0xf8(%r12),%rax
	mov    0x30(%rax),%rax
	mov    0x98(%rax),%rax   <---- oops here
	lock add %rax,0x130(%rbx)

where this is:

	d_backing_inode(object->dentry)->i_blocks

Fixes: a5b3a80b89 (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-03 13:33:26 -04:00
Al Viro 8ecfb75216 Merge branch 'for-viro' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-linus 2016-08-03 13:31:51 -04:00
Geert Uytterhoeven 4b2e0162e4 fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2
With gcc < 4.2 (e.g. 4.1.2):

      CC      fs/proc/task_mmu.o
    cc1: error: unrecognized command line option "-Wno-override-init"

To fix this, only enable the compiler option when it is actually
supported by the compiler.

Fixes: ca52953f5f ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03 12:45:23 -04:00
Al Viro 7d50a29fe4 9p: use clone_fid()
in a bunch of places it cleans the things up

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-03 11:12:12 -04:00
Al Viro 797fc16d8f 9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
In v9fs_vfs_rename() we need to clone the parents' fids, not just
find them.

Spotted-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-03 11:02:48 -04:00
Miklos Szeredi f0fce87c36 vfs: make dentry_needs_remove_privs() internal
Only used by the vfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-08-03 13:57:57 +02:00
Miklos Szeredi c1892c3776 vfs: fix deadlock in file_remove_privs() on overlayfs
file_remove_privs() is called with inode lock on file_inode(), which
proceeds to calling notify_change() on file->f_path.dentry.  Which triggers
the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later
when ovl_setattr tries to lock the underlying inode again.

Fix this mess by not mixing the layers, but doing everything on underlying
dentry/inode.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 07a2daab49 ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
Cc: <stable@vger.kernel.org>
2016-08-03 13:57:56 +02:00
Filipe Manana e657149933 Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve()
No longer used as of commit 5846a3c268 ("btrfs: qgroup: Fix a race in
delayed_ref which leads to abort trans").

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-08-03 11:02:51 +01:00
Darrick J. Wong 3481b68285 xfs: move (and rename) the deferred bmap-free tracepoints
Rename the deferred bmap-free to extent_free and make them only
trigger when we're really running deferred ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:31:07 +10:00
Darrick J. Wong 51ce9d000c xfs: collapse single use static functions
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:30:31 +10:00
Darrick J. Wong e127fafd1d xfs: remove unnecessary parentheses from log redo item recovery functions
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:29:32 +10:00
Darrick J. Wong 722e251770 xfs: remove the extents array from the rmap update done log item
Nothing ever uses the extent array in the rmap update done redo
item, so remove it before it is fixed in the on-disk log format.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:28:43 +10:00
Darrick J. Wong c1d22ae89c xfs: in btree_lshift, only allocate temporary cursor when needed
We only need the temporary cursor in _btree_lshift if we're shifting
in an overlapped btree.  Therefore, factor that into a single block
of code so we avoid unnecessary cursor duplication.

Also fix use of the wrong cursor when checking for corruption in
xfs_btree_rshift().

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:26:22 +10:00
Darrick J. Wong 1f704b2b47 xfs: remove unnecesary lshift/rshift key initialization
In the lshift/rshift functions we don't use the key variable for
anything now, so remove the variable and its initializer.  The
update_keys functions figure out the key for a block on their own.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:22:45 +10:00
Darrick J. Wong 973b83194b xfs: remove the get*keys and update_keys btree ops pointers
These are internal btree functions; we don't need them to be
dispatched via function pointers.  Make them static again and
just check the overlapped flag to figure out what we need to
do.  The strategy behind this patch was suggested by Christoph.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:22:12 +10:00
Darrick J. Wong 1c0607ace9 xfs: enable the rmap btree functionality
Originally-From: Dave Chinner <dchinner@redhat.com>

Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: move the experimental tag]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:20:57 +10:00
Darrick J. Wong 04f130605f xfs: don't update rmapbt when fixing agfl
Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist.  xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:19:53 +10:00
Darrick J. Wong 2b0eeb5e74 xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will need to be done before the
rmap btree code can have the experimental tag removed.

This functionality will be provided in a (much) later patch, using
some of the reflink deferred block remapping functionality to
accomlish extent swapping with rmap updates.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:18:07 +10:00
Darrick J. Wong a650e8f98e xfs: add rmap btree block detection to log recovery
Originally-From: Dave Chinner <dchinner@redhat.com>

So such blocks can be correctly identified and have their operations
structures attached to validate recovery has not resulted in a
correct block.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:17:11 +10:00
Darrick J. Wong 5d650e90a1 xfs: add rmap btree geometry feature flag
Originally-From: Dave Chinner <dchinner@redhat.com>

So xfs_info and other userspace utilities know the filesystem is
using this feature.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:16:44 +10:00
Darrick J. Wong 9c19464469 xfs: propagate bmap updates to rmapbt
When we map, unmap, or convert an extent in a file's data or attr
fork, schedule a respective update in the rmapbt.  Previous versions
of this patch required a 1:1 correspondence between bmap and rmap,
but this is no longer true as we now have ability to make interval
queries against the rmapbt.

We use the deferred operations code to handle redo operations
atomically and deadlock free.  This plumbs in all five rmap actions
(map, unmap, convert extent, alloc, free); we'll use the first three
now for file data, and reflink will want the last two.  We also add
an error injection site to test log recovery.

Finally, we need to fix the bmap shift extent code to adjust the
rmaps correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:16:05 +10:00
Darrick J. Wong f8dbebef98 xfs: enable the xfs_defer mechanism to process rmaps to update
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred rmap updates.  We'll wire up the existing code to
our new deferred mechanism later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:11:01 +10:00
Darrick J. Wong 9e88b5d867 xfs: log rmap intent items
Provide a mechanism for higher levels to create RUI/RUD items, submit
them to the log, and a stub function to deal with recovered RUI items.
These parts will be connected to the rmapbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:09:48 +10:00
Darrick J. Wong 5880f2d78f xfs: create rmap update intent log items
Create rmap update intent/done log items to record redo information in
the log.  Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction.  This mechanism enables log recovery to finish what
was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:04:45 +10:00
Darrick J. Wong abf0923381 xfs: add rmap btree insert and delete helpers
Add a couple of helper functions to encapsulate rmap btree insert and
delete operations.  Add tracepoints to the update function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:03:58 +10:00
Darrick J. Wong fb7d926769 xfs: convert unwritten status of reverse mappings
Provide a function to convert an unwritten rmap extent to a real one
and vice versa.

[ dchinner: Note that this algorithm and code was derived from the
  existing bmapbt unwritten extent conversion code in
  xfs_bmap_add_extent_unwritten_real(). ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:03:19 +10:00
Darrick J. Wong f922cd90b8 xfs: remove an extent from the rmap btree
Originally-From: Dave Chinner <dchinner@redhat.com>

Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.

[darrick.wong@oracle.com: make rmap routines handle the enlarged keyspace]
[dchinner: remove remaining unused debug printks]
[darrick: fix a bug when growfs in an AG with an rmap ending at EOFS]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:45:12 +10:00
Darrick J. Wong 0a1b0b3855 xfs: add an extent to the rmap btree
Originally-From: Dave Chinner <dchinner@redhat.com>

Now all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings to the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.

[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:44:21 +10:00
Darrick J. Wong aa966d84aa xfs: add tracepoints for the rmap functions
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:43:24 +10:00
Darrick J. Wong c543838a1e xfs: teach rmapbt to support interval queries
Now that the generic btree code supports querying all records within a
range of keys, use that functionality to allow us to ask for all the
extents mapped to a range of physical blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:42:39 +10:00
Darrick J. Wong cfed56ae5f xfs: support overlapping intervals in the rmap btree
Now that the generic btree code supports overlapping intervals, plug
in the rmap btree to this functionality.  We will need it to find
potential left neighbors in xfs_rmap_{alloc,free} later in the patch
set.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:40:56 +10:00
Darrick J. Wong 4b8ed67794 xfs: add rmap btree operations
Originally-From: Dave Chinner <dchinner@redhat.com>

Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.

Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being redefined as the tuple
[agblk, owner, offset].  The expansion of the primary key is crucial
to allowing multiple owners per extent.

[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:39:05 +10:00
Darrick J. Wong 525488520a xfs: rmap btree requires more reserved free space
Originally-From: Dave Chinner <dchinner@redhat.com>

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.

Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere
and document the calculations better.

[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:38:24 +10:00
Darrick J. Wong fa30f03cda xfs: rmap btree transaction reservations
The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.

Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.

[darrick: use rmap_maxlevels when calculating log block resv]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:37:10 +10:00
Darrick J. Wong e70d829f8d xfs: add rmap btree growfs support
Originally-From: Dave Chinner <dchinner@redhat.com>

Now we can read and write rmap btree blocks, we can add support to
the growfs code to initialise new rmap btree blocks.

[darrick.wong@oracle.com: fill out the rmap offset fields]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:36:08 +10:00
Darrick J. Wong 035e00acb5 xfs: define the on-disk rmap btree format
Originally-From: Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:36:07 +10:00
Darrick J. Wong 673930c34a xfs: introduce rmap extent operation stubs
Originally-From: Dave Chinner <dchinner@redhat.com>

Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.

[darrick.wong@oracle.com: Extend the stubs to take full owner info.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:33:43 +10:00
Darrick J. Wong 340785cca1 xfs: add owner field to extent allocation and freeing
For the rmap btree to work, we have to feed the extent owner
information to the the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Extend the owner field to include both the owner type and some sort
of index within the owner.  The index field will be used to support
reverse mappings when reflink is enabled.

When we're freeing extents from an EFI, we don't have the owner
information available (rmap updates have their own redo items).
xfs_free_extent therefore doesn't need to do an rmap update. Make
sure that the log replay code signals this correctly.

This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).

[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:33:42 +10:00
Darrick J. Wong 8018026ef2 xfs: rmap btree add more reserved blocks
Originally-From: Dave Chinner <dchinner@redhat.com>

XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmbt).

Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
just to use the xfs-mount variable directly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:31:47 +10:00
Darrick J. Wong 00f4e4f907 xfs: add rmap btree stats infrastructure
Originally-From: Dave Chinner <dchinner@redhat.com>

The rmap btree will require the same stats as all the other generic
btrees, so add all the code for that now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:31:11 +10:00
Darrick J. Wong b87049444a xfs: introduce rmap btree definitions
Originally-From: Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:30:32 +10:00
Darrick J. Wong df3954ff72 xfs: increase XFS_BTREE_MAXLEVELS to fit the rmapbt
By my calculations, a 1,073,741,824 block AG with a 1k block size
can attain a maximum height of 9.  Assuming a record size of 24
bytes, a key/ptr size of 44 bytes, and half-full btree nodes, we'd
need 53,687,092 blocks for the records and ~6 million blocks for the
keys.  That requires a btree of height 9 based on the following
derivation:

Block size = 1024b
sblock CRC header = 56b
== 1024-56 = 968 bytes for tree data

rmapbt record = 24b
== 40 records per leaf block

rmapbt ptr/key = 44b
== 22 ptr/keys per block

Worst case, each block is half full, so 20 records and 11 ptrs per block.

1073741824 rmap records / 20 records per block
== 53687092 leaf blocks

53687092 leaves / 11 ptrs per block
== 4880645 level 1 blocks
== 443695 level 2 blocks
== 40336 level 3 blocks
== 3667 level 4 blocks
== 334 level 5 blocks
== 31 level 6 blocks
== 3 level 7 blocks
== 1 level 8 block

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:29:42 +10:00
Darrick J. Wong ba9e780246 xfs: add tracepoints and error injection for deferred extent freeing
Add a couple of tracepoints for the deferred extent free operation and
a site for injecting errors while finishing the operation.  This makes
it easier to debug deferred ops and test log redo.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:26:33 +10:00
Darrick J. Wong dc42375d5f xfs: refactor redo intent item processing
Refactor the EFI intent item recovery (and cancellation) functions
into a general function that scans the AIL and an intent item type
specific handler.  Move the function that recovers a single EFI item
into the extent free item code.  We'll want the generalized function
when we start wiring up more redo item types.

Furthermore, ensure that log recovery only replays the redo items
that were in the AIL prior to recovery by checking the item LSN
against the largest LSN seen during log scanning.  As written this
should never happen, but we can be defensive anyway.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:23:49 +10:00
Darrick J. Wong 2c3234d1ef xfs: rename flist/free_list to dfops
Mechanical change of flist/free_list to dfops, since they're now
deferred ops, not just a freeing list.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:19:29 +10:00
Darrick J. Wong 310a75a3c6 xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*
Drop the compatibility shims that we were using to integrate the new
deferred operation mechanism into the existing code.  No new code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:18:10 +10:00
Darrick J. Wong 3ab78df2a5 xfs: rework xfs_bmap_free callers to use xfs_defer_ops
Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead.  For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:15:38 +10:00
Darrick J. Wong 9749fee83f xfs: enable the xfs_defer mechanism to process extents to free
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred extent freeing.  We'll wire up the existing code to
our new deferred mechanism later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:14:35 +10:00
Darrick J. Wong bba61cbf30 xfs: clean up typedef usage in the EFI/EFD handling code
Replace structure typedefs with struct xfs_foo_* in the EFI/EFD
handling code in preparation to move it over to deferred ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:13:47 +10:00
Darrick J. Wong 3cd48abcc1 xfs: add tracepoints for the deferred ops mechanism
Add tracepoints for the internals of the deferred ops mechanism
and tracepoint classes for clients of the dops, to make debugging
easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:13:02 +10:00
Darrick J. Wong 4e0cc29b91 xfs: move deferred operations into a separate file
All the code around struct xfs_bmap_free basically implements a
deferred operation framework through which we can roll transactions
(to unlock buffers and avoid violating lock order rules) while
managing all the necessary log redo items.  Previously we only used
this code to free extents after some sort of mapping operation, but
with the advent of rmap and reflink, we suddenly need to do more than
that.

With that in mind, xfs_bmap_free really becomes a deferred ops control
structure.  Rename the structure and move the deferred ops into their
own file to avoid further bloating of the bmap code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:12:25 +10:00
Darrick J. Wong 28a89567b8 xfs: refactor btree owner change into a separate visit-blocks function
Refactor the btree_change_owner function into a more generic apparatus
which visits all blocks in a btree.  We'll use this in a subsequent
patch for counting btree blocks for AG reservations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:10:55 +10:00
Darrick J. Wong 105f7d83db xfs: introduce interval queries on btrees
Create a function to enable querying of btree records mapping to a
range of keys.  This will be used in subsequent patches to allow
querying the reverse mapping btree to find the extents mapped to a
range of physical blocks, though the generic code can be used for
any range query.

The overlapped query range function needs to use the btree get_block
helper because the root block could be an inode, in which case
bc_bufs[nlevels-1] will be NULL.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:10:21 +10:00
Darrick J. Wong 2c813ad66a xfs: support btrees with overlapping intervals for keys
On a filesystem with both reflink and reverse mapping enabled, it's
possible to have multiple rmap records referring to the same blocks on
disk.  When overlapping intervals are possible, querying a classic
btree to find all records intersecting a given interval is inefficient
because we cannot use the left side of the search interval to filter
out non-matching records the same way that we can use the existing
btree key to filter out records coming after the right side of the
search interval.  This will become important once we want to use the
rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl.

(For the non-overlapping case, we can perform such queries trivially
by starting at the left side of the interval and walking the tree
until we pass the right side.)

Therefore, extend the btree code to come closer to supporting
intervals as a first-class record attribute.  This involves widening
the btree node's key space to store both the lowest key reachable via
the node pointer (as the btree does now) and the highest key reachable
via the same pointer and teaching the btree modifying functions to
keep the highest-key records up to date.

This behavior can be turned on via a new btree ops flag so that btrees
that cannot store overlapping intervals don't pay the overhead costs
in terms of extra code and disk format changes.

When we're deleting a record in a btree that supports overlapped
interval records and the deletion results in two btree blocks being
joined, we defer updating the high/low keys until after all possible
joining (at higher levels in the tree) have finished.  At this point,
the btree pointers at all levels have been updated to remove the empty
blocks and we can update the low and high keys.

When we're doing this, we must be careful to update the keys of all
node pointers up to the root instead of stopping at the first set of
keys that don't need updating.  This is because it's possible for a
single deletion to cause joining of multiple levels of tree, and so
we need to update everything going back to the root.

The diff_two_keys functions return < 0, 0, or > 0 if key1 is less than,
equal to, or greater than key2, respectively.  This is consistent
with the rest of the kernel and the C library.

In btree_updkeys(), we need to evaluate the force_all parameter before
running the key diff to avoid reading uninitialized memory when we're
forcing a key update.  This happens when we've allocated an empty slot
at level N + 1 to point to a new block at level N and we're in the
process of filling out the new keys.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:08:36 +10:00
Linus Torvalds d52bd54db8 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - the rest of ocfs2

 - various hotfixes, mainly MM

 - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

 - printk updates

 - firmware

 - checkpatch

 - nilfs2

 - more kexec stuff than usual

 - rapidio updates

 - w1 things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
  ipc: delete "nr_ipc_ns"
  kcov: allow more fine-grained coverage instrumentation
  init/Kconfig: add clarification for out-of-tree modules
  config: add android config fragments
  init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
  relay: add global mode support for buffer-only channels
  init: allow blacklisting of module_init functions
  w1:omap_hdq: fix regression
  w1: add helper macro module_w1_family
  w1: remove need for ida and use PLATFORM_DEVID_AUTO
  rapidio/switches: add driver for IDT gen3 switches
  powerpc/fsl_rio: apply changes for RIO spec rev 3
  rapidio: modify for rev.3 specification changes
  rapidio: change inbound window size type to u64
  rapidio/idt_gen2: fix locking warning
  rapidio: fix error handling in mbox request/release functions
  rapidio/tsi721_dma: advance queue processing from transfer submit call
  rapidio/tsi721: add messaging mbox selector parameter
  rapidio/tsi721: add PCIe MRRS override parameter
  rapidio/tsi721_dma: add channel mask and queue size parameters
  ...
2016-08-02 21:08:07 -04:00
Darrick J. Wong 70b2265935 xfs: add function pointers for get/update keys to the btree
Add some function pointers to bc_ops to get the btree keys for
leaf and node blocks, and to update parent keys of a block.
Convert the _btree_updkey calls to use our new pointer, and
modify the tree shape changing code to call the appropriate
get_*_keys pointer instead of _btree_copy_keys because the
overlapping btree has to calculate high key values.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:03:38 +10:00
Darrick J. Wong e5821e57af xfs: during btree split, save new block key & ptr for future insertion
When a btree block has to be split, we pass the new block's ptr from
xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
however, we pass the block's key through the cursor's record.  It is a
little weird to "initialize" a record from a key since the non-key
attributes will have garbage values.

When we go to add support for interval queries, we have to be able to
pass the lowest and highest keys accessible via a pointer.  There's no
clean way to pass this back through the cursor's record field.
Therefore, pass the key directly back to xfs_btree_insert() the same
way that we pass the btree_ptr.

As a bonus, we no longer need init_rec_from_key and can drop it from the
codebase.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:02:39 +10:00
Darrick J. Wong 0d309791bd xfs: set *stat=1 after iroot realloc
If we make the inode root block of a btree unfull by expanding the
root, we must set *stat to 1 to signal success, rather than leaving
it uninitialized.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:01:25 +10:00
Darrick J. Wong f4a0660de3 xfs: fix locking of the rt bitmap/summary inodes
When we're deleting realtime extents, we need to lock the summary
inode in case we need to update the summary info to prevent an assert
on the rsumip inode lock on a debug kernel.  While we're at it, fix
the locking annotations so that we avoid triggering lockdep warnings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:00:42 +10:00
Darrick J. Wong 3dadf901dd xfs: fix attr shortform structure alignment on cris
Apparently cris doesn't require structure stride to align with the
largest type in the struct, so list[0] isn't at offset 4 like it is
everywhere else.  Fix this... insofar as existing XFSes on cris are
screwed.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 10:59:42 +10:00
Darrick J. Wong 0facef7fb0 xfs: in _attrlist_by_handle, copy the cursor back to userspace
When we're iterating inode xattrs by handle, we have to copy the
cursor back to userspace so that a subsequent invocation actually
retrieves subsequent contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 10:58:53 +10:00
Linus Torvalds 8cbdd85bda orangefs: kernel side caching and executable bugfix
This allows OrangeFS to utilize the dcache and adds an in kernel
 attribute cache. We previously used the user side client for this
 purpose.
 
 We see a modest performance increase on small file operations. For
 example, without the cache, compiling coreutils takes about 17 minutes.
 With the patch and a 50 millisecond timeout for dcache_timeout_msecs and
 getattr_timeout_msecs (the default), compiling coreutils takes about
 6 minutes 20 seconds. On the same hardware, compiling coreutils on an
 xfs filesystem takes 90 seconds. We see similar improvements with mdtest
 and a test involving writing, reading, and deleting a large number of
 small files.
 
 Interested parties can review more data at the following URL.
 
 https://docs.google.com/spreadsheets/d/1v4aUeppKexIbRMz_Yn9k4eaM3uy2KCaPoe_93YKWOtA/pubhtml
 
 The eventual goal of this is to allow getdents to turn into a
 readdirplus to the OrangeFS server. The cache will be filled then, which
 should provide a performance benefit to the common case of readdir
 followed by getattr on each entry (i.e. ls -l).
 
 This also fixes a bug. When orangefs_inode_permission was added, it did
 not collect i_size from the OrangeFS server, since this presses an
 unnecessary load on the OrangeFS server. However, it left a case where
 i_size is never initialized. Then running an executable could fail.
 
 With this patch, size is always collected to be inserted into the cache.
 Thus the bug disappears. If this patch is not accepted during this merge
 window, we will send a one-line band-aid for this bug instead.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIxBAABCAAbBQJXoPhPFBxtYXJ0aW5Ab21uaWJvbmQuY29tAAoJEPVzxHxs4+kh
 wCsQALUKnyoJzhHAmEoxYZGUPchgBS2yyWQJGP3ViqE8GbVubVG2NsLbluO1u5en
 /pdOPDXeij7pPGzdWk6wt0tXvM3oGJ3UPRi9ofEtU3XHnb4srX6XHBeG3ZHHZH0A
 91NPnMsmlBQvivBbVbjYrgXMKXz/UCQot7Y5iP7o9Gmick5tQqhRB21GcSCMeD7k
 ycrl61EA+GYDZOlzVspF2LJ52MhIXuT1T9ev66dLQWv8p6pMmpA4kda3Dwvqn/cE
 GGTeElq2PBGdhGapK4axGfRAW55997j9k6gcxLvFdA99ayAQ3+0hzXw4rNzcdabA
 ESUOe4riaYEaGEd686Mtd2w9hxvr1bOqkyRCKNnko90JJnqfGsgLfetpasG8CgUo
 n8VGxjimuCamBDf1+0ZzUs0Pj8q+U1QNQtHJi9QR/sNnNds/52k9OXV2r4MG+suU
 MAie5eD0Py6GzP9pOrAmuFbBkgd7Ag3EbiTjR1lKRpBR626inL/jM60XFfaF4P5g
 YOXC+VtJuVR88emIxqJ9ebdEy9+2yfkyinrLH9xZNctoz7KIoMhsmWb2bONKJDnx
 ngoqVKyH5opw6dKRkbTCM1A2mq8NntDvU6yeyHYJ2NXPXgARf9rSUIJ0RvR3oxdh
 Fqt5QyYHYDPZBuQn9XUV7t+VhAOFCbAPUDMMlifZUNx7icbj
 =rGmf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux

Pull orangefs update from Martin Brandenburg:
 "Kernel side caching and executable bugfix

  This allows OrangeFS to utilize the dcache and adds an in kernel
  attribute cache.  We previously used the user side client for this
  purpose.

  We see a modest performance increase on small file operations.  For
  example, without the cache, compiling coreutils takes about 17
  minutes.  With the patch and a 50 millisecond timeout for
  dcache_timeout_msecs and getattr_timeout_msecs (the default),
  compiling coreutils takes about 6 minutes 20 seconds.  On the same
  hardware, compiling coreutils on an xfs filesystem takes 90 seconds.
  We see similar improvements with mdtest and a test involving writing,
  reading, and deleting a large number of small files.

  Interested parties can review more data at the following URL.

    https://docs.google.com/spreadsheets/d/1v4aUeppKexIbRMz_Yn9k4eaM3uy2KCaPoe_93YKWOtA/pubhtml

  The eventual goal of this is to allow getdents to turn into a
  readdirplus to the OrangeFS server.  The cache will be filled then,
  which should provide a performance benefit to the common case of
  readdir followed by getattr on each entry (i.e.  ls -l).

  This also fixes a bug.  When orangefs_inode_permission was added, it
  did not collect i_size from the OrangeFS server, since this presses an
  unnecessary load on the OrangeFS server.  However, it left a case
  where i_size is never initialized.  Then running an executable could
  fail.

  With this patch, size is always collected to be inserted into the
  cache.  Thus the bug disappears.  If this patch is not accepted during
  this merge window, we will send a one-line band-aid for this bug
  instead"

* tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux:
  Orangefs: update orangefs.txt
  orangefs: Account for jiffies wraparound.
  orangefs: Change default dcache and getattr timeout to 50 msec.
  orangefs: Allow dcache and getattr cache time to be configured.
  orangefs: Cache getattr results.
  orangefs: Use d_time to avoid excessive lookups
2016-08-02 19:47:06 -04:00
Linus Torvalds 72b5ac54d6 The highlights are:
* RADOS namespace support in libceph and CephFS (Zheng Yan and myself).
    The stopgaps added in 4.5 to deny access to inodes in namespaces are
    removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature bit is now fully
    supported.
 
  * A large rework of the MDS cap flushing code (Zheng Yan).
 
  * Handle some of ->d_revalidate() in RCU mode (Jeff Layton).  We were
    overly pessimistic before, bailing at the first sight of LOOKUP_RCU.
 
 On top of that we've got a few CephFS bug fixes, a couple of cleanups
 and Arnd's workaround for a weird genksyms issue.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXoKLJAAoJEEp/3jgCEfOLDTUIAIcctpKUiNBokc95mQaXYl34
 j7lPIaD0/Ur7JPt4nMdtlywYJYSVV2c+SglHztj/+fv0G4bWbLVEFRruh9SwKIci
 PzttcmycIAqSn1f5gBZwyQbGuffd/F0EnBj7fFjcukt01i3s1ZQ7t4XtLGtAV0Ts
 aIfFtx9SqWig57Z1OZqNgnhnOoh6IqNbic3FL5Hvdl5N5pFbBcQho6Vzoa5O1osH
 URG6RmCcO4nykfSoxiivE7UZ+CImsXHkRD7rupBuIjqjZ8wvmZqQF5qxnkb9Dw2F
 IkNhrHkTSIiv4EsNPLAETTnFSozrL1nEykKr2FBW+ti8nxNcav+8FgVapqLvFIw=
 =gQ0/
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client

Pull Ceph updates from Ilya Dryomov:
 "The highlights are:

   - RADOS namespace support in libceph and CephFS (Zheng Yan and
     myself).  The stopgaps added in 4.5 to deny access to inodes in
     namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
     bit is now fully supported

   - A large rework of the MDS cap flushing code (Zheng Yan)

   - Handle some of ->d_revalidate() in RCU mode (Jeff Layton).  We were
     overly pessimistic before, bailing at the first sight of LOOKUP_RCU

  On top of that we've got a few CephFS bug fixes, a couple of cleanups
  and Arnd's workaround for a weird genksyms issue"

* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
  ceph: fix symbol versioning for ceph_monc_do_statfs
  ceph: Correctly return NXIO errors from ceph_llseek
  ceph: Mark the file cache as unreclaimable
  ceph: optimize cap flush waiting
  ceph: cleanup ceph_flush_snaps()
  ceph: kick cap flushes before sending other cap message
  ceph: introduce an inode flag to indicates if snapflush is needed
  ceph: avoid sending duplicated cap flush message
  ceph: unify cap flush and snapcap flush
  ceph: use list instead of rbtree to track cap flushes
  ceph: update types of some local varibles
  ceph: include 'follows' of pending snapflush in cap reconnect message
  ceph: update cap reconnect message to version 3
  ceph: mount non-default filesystem by name
  libceph: fsmap.user subscription support
  ceph: handle LOOKUP_RCU in ceph_d_revalidate
  ceph: allow dentry_lease_is_valid to work under RCU walk
  ceph: clear d_fsinfo pointer under d_lock
  ceph: remove ceph_mdsc_lease_release
  ceph: don't use ->d_time
  ...
2016-08-02 19:39:09 -04:00
Jeff Mahoney 0a11b9aae4 reiserfs: fix "new_insert_key may be used uninitialized ..."
new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key.  We can key off of
that to avoid the uninitialized warning.

Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:22 -04:00
Ryusuke Konishi e63e88bc53 nilfs2: move ioctl interface and disk layout to uapi separately
The header file "include/linux/nilfs2_fs.h" is composed of parts for
ioctl and disk format, and both are intended to be shared with user
space programs.

This moves them to the uapi directory "include/uapi/linux" splitting the
file to "nilfs2_api.h" and "nilfs2_ondisk.h".  The following minor
changes are accompanied by this migration:

 - nilfs_direct_node struct in nilfs2/direct.h is converged to
   nilfs2_ondisk.h because it's an on-disk structure.
 - inline functions nilfs_rec_len_from_disk() and
   nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.

Link: http://lkml.kernel.org/r/1465825507-3407-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:21 -04:00
Ryusuke Konishi 4ce5c3426c nilfs2: use BIT() macro
Replace bit shifts by BIT macro for clarity.

Link: http://lkml.kernel.org/r/1465825507-3407-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:21 -04:00
Ryusuke Konishi ad980c9ab7 nilfs2: fix misuse of a semaphore in sysfs code
Variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks, are protected by
ns_segctor_sem, but ns_sem is wrongly used by the nilfs sysfs code when
reading these variables.  This fixes the misuse and clarifies which
semaphore protects them in the comment of the_nilfs struct.

Link: http://lkml.kernel.org/r/1465825507-3407-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:20 -04:00
Ryusuke Konishi a7d3f104da nilfs2: refactor parser of snapshot mount option
Move parser of snapshot mount option to a separate function
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid checkpatch.pl warning "WARNING: simple_strtoull is
obsolete, use kstrtoull instead", and refine the error message of the
parser.

Link: http://lkml.kernel.org/r/1464875891-5443-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:20 -04:00
Ryusuke Konishi aceb4170bb nilfs2: do not use yield()
Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock() since the usage corresponds to the "be nice for
others" case that the comment of yield() says.

This removes the following checkpatch.pl warning:

 "WARNING: Using yield() is generally wrong. See yield() kernel-doc
  (sched/core.c)"

Link: http://lkml.kernel.org/r/1464875891-5443-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:19 -04:00
Ryusuke Konishi 39a9dcca61 nilfs2: emit error message when I/O error is detected
When nilfs returned -EIO as an error code, it's not always clear if it
came from the underlying block device or not.  This will mend the issue
by having low level I/O routines of nilfs output an error message when
they detected an I/O error.

Link: http://lkml.kernel.org/r/1464875891-5443-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:19 -04:00
Ryusuke Konishi d6517deb01 nilfs2: replace nilfs_warning() with nilfs_msg()
Use nilfs_msg() to output warning messages and get rid of
nilfs_warning() function.  This also removes function names from the
messages unless we embed them explicitly in format strings.  Instead,
some messages are revised to clarify the context.

[arnd@arndb.de: avoid warning about unused variables]
  Link: http://lkml.kernel.org/r/20160615201945.3348205-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1464875891-5443-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:18 -04:00
Ryusuke Konishi feee880fa5 nilfs2: reduce bare use of printk() with nilfs_msg()
Replace most use of printk() in nilfs2 implementation with nilfs_msg(),
and reduce the following checkpatch.pl warning:

  "WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
   then dev_crit(dev, ... then pr_crit(...  to printk(KERN_CRIT ..."

This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.

Link: http://lkml.kernel.org/r/1464875891-5443-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:17 -04:00
Ryusuke Konishi 6625689e15 nilfs2: embed a back pointer to super block instance in nilfs object
Insert a back pointer to super block instance in nilfs object so that
functions of nilfs2 easily refer to the super block instance.  This
simplifies replacement of printk() in the successive change.

Link: http://lkml.kernel.org/r/1464875891-5443-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:17 -04:00
Ryusuke Konishi a66dfb0a91 nilfs2: add nilfs_msg() message interface
Define an own output routine to replace bare use of printk() function.
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.

__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline.  The "device-name" is optional to make it available in early
stages; it will be omitted if a NULL pointer is passed to super block
instance argument.  nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.

Link: http://lkml.kernel.org/r/1464875891-5443-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:16 -04:00