linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	24c3047095	Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) pnfsblock: write_pagelist handle zero invalid extents pnfsblock: note written INVAL areas for layoutcommit pnfsblock: bl_write_pagelist pnfsblock: bl_read_pagelist pnfsblock: cleanup_layoutcommit pnfsblock: encode_layoutcommit pnfsblock: merge rw extents pnfsblock: add extent manipulation functions pnfsblock: bl_find_get_extent pnfsblock: xdr decode pnfs_block_layout4 pnfsblock: call and parse getdevicelist pnfsblock: merge extents pnfsblock: lseg alloc and free pnfsblock: remove device operations pnfsblock: add device operations pnfsblock: basic extent code pnfsblock: use pageio_ops api pnfsblock: add blocklayout Kconfig option, Makefile, and stubs pnfs: cleanup_layoutcommit pnfs: ask for layout_blksize and save it in nfs_server ...	2011-07-31 06:26:50 -10:00
Peng Tao	71cdd40fd4	pnfsblock: write_pagelist handle zero invalid extents For invalid extents, find other pages in the same fsblock and write them out. [pnfsblock: write_begin] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	31e6306a40	pnfsblock: note written INVAL areas for layoutcommit Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	650e2d39bd	pnfsblock: bl_write_pagelist Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: bl_write_pagelist support functions] [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> [SQUASHME: pnfsblock: mds_offset is set in the generic layer] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fixup blksize alignment in bl_setup_layoutcommit] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	9549ec01b0	pnfsblock: bl_read_pagelist Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: read path error handling] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new read_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	b2be7811dd	pnfsblock: cleanup_layoutcommit In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: Jim Rees <rees@umich.edu> [pnfsblock: introduce bl_committing list] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> [pnfsblock: cleanup_layoutcommit wants a status parameter] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	90ace12ac4	pnfsblock: encode_layoutcommit In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> [pnfsblock: prevent commit list corruption] [pnfsblock: fix layoutcommit with an empty opaque] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	9f3770422c	pnfsblock: merge rw extents Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	c1c2a4cd35	pnfsblock: add extent manipulation functions Adds working implementations of various support functions to handle INVAL extents, needed by writes, such as bl_mark_sectors_init and bl_is_sector_init. [pnfsblock: fix 64-bit compiler warnings for extent manipulation] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [Implement release_inval_marks] Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:17 -04:00
Fred Isaman	6d742ba538	pnfsblock: bl_find_get_extent Implement bl_find_get_extent(), one of the core extent manipulation routines. [pnfsblock: Lookup list entry of layouts and tags in reverse order] Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> pnfsblock: fix print format warnings for sector_t and size_t gcc spews warnings about these on x86_64, e.g.: fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’ fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	e9437ccef9	pnfsblock: xdr decode pnfs_block_layout4 XDR decodes the block layout payload sent in LAYOUTGET result, storing the result in an extent list. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix bug getting pnfs_layout_type in translate_devid().] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	2f9fd18260	pnfsblock: call and parse getdevicelist Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO for each device returned. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> [pnfsblock: fix pnfs_deviceid references] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix print format warnings for sector_t and size_t] [pnfs-block: #include <linux/vmalloc.h>] [pnfsblock: no PNFS_NFS_SERVER] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfsblock: fix bug determining size of striped volume] [pnfsblock: fix oops when using multiple devices] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: get rid of vmap and deviceid->area structure] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	03341d2cc9	pnfsblock: merge extents Replace a stub, so that extents underlying the layouts are properly added, merged, or ignored as necessary. Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: delete the new node before put it] Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	a60d2ebd93	pnfsblock: lseg alloc and free Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix bug getting pnfs_layout_type in translate_devid().] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Jim Rees	025a70ed65	pnfsblock: remove device operations Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [upcall bugfixes] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Jim Rees	fe0a9b7408	pnfsblock: add device operations Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [upcall bugfixes] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Fred Isaman	9e69296999	pnfsblock: basic extent code Adds structures and basic create/delete code for extents. Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:16 -04:00
Benny Halevy	e9643fe80d	pnfsblock: use pageio_ops api [pnfsblock: use pnfs_generic_pg_init_read/write] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Fred Isaman	155e7524f2	pnfsblock: add blocklayout Kconfig option, Makefile, and stubs Define a configuration variable to enable/disable compilation of the block driver code. Add the minimal structure for a pnfs block layout driver, and empty list-heads that will hold the extent data [pnfsblock: make NFS_V4_1 select PNFS_BLOCK] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs-block: fix CONFIG_PNFS_BLOCK dependencies] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfs: move pnfs_layout_type inline in nfs_inode] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: layout alloc and free] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfs: move pnfs_layout_type inline in nfs_inode] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: define module alias] Signed-off-by: Peng Tao <peng_tao@emc.com> [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Andy Adamson	db29c08909	pnfs: cleanup_layoutcommit This gives layout driver a chance to cleanup structures they put in at encode_layoutcommit. Signed-off-by: Andy Adamson <andros@netapp.com> [fixup layout header pointer for layoutcommit] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Fred Isaman	dae100c2b1	pnfs: ask for layout_blksize and save it in nfs_server Block layout needs it to determine IO size. Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Tao Guo <glorioustao@gmail.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Benny Halevy	738fd0f360	pnfs: add set-clear layoutdriver interface To allow layout driver to issue getdevicelist at mount time, and clean up at umount time. [fixup non NFS_V4_1 set_pnfs_layoutdriver definition] [pnfs: pass mntfh down the init_pnfs path] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Andy Adamson	7f11d8d38d	pnfs: GETDEVICELIST The block driver uses GETDEVICELIST Signed-off-by: Andy Adamson <andros@netapp.com> [pass struct nfs_server * to getdevicelist] [get machince creds for getdevicelist] [fix getdevicelist decode sizing] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Peng Tao	3557c6c3be	pnfs: use lwb as layoutcommit length Using NFS4_MAX_UINT64 will break current protocol. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Peng Tao	a9bae5666d	pnfs: let layoutcommit handle a list of lseg There can be multiple lseg per file, so layoutcommit should be able to handle it. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:15 -04:00
Peng Tao	9fa4075878	pnfs: save layoutcommit cred at layout header init No need to save it for every lseg. No need to save it at every pnfs_set_layoutcommit. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:14 -04:00
Peng Tao	acff588053	pnfs: save layoutcommit lwb at layout header No need to save it for every lseg. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-31 12:18:14 -04:00
Wu Fengguang	0e995816f4	don't busy retry the inode on failed grab_super_passive() This fixes a soft lockup on conditions a) the flusher is working on a work by __bdi_start_writeback(), while b) someone else calls writeback_inodes_sb() or sync_inodes_sb(), which grab sb->s_umount and enqueue a new work for the flusher to execute The s_umount grabbed by (b) will fail the grab_super_passive() in (a). Then if the inode is requeued, wb_writeback() will busy retry on it. As a result, wb_writeback() loops for ever without releasing wb->list_lock, which further blocks other tasks. Fix the busy loop by redirtying the inode. This may undesirably delay the writeback of the inode, however most likely it will be picked up soon by the queued work by writeback_inodes_sb(), sync_inodes_sb() or even writeback_inodes_wb(). bug url: http://www.spinics.net/lists/linux-fsdevel/msg47292.html Reported-by: Christoph Hellwig <hch@infradead.org> Tested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-31 22:52:08 +08:00
Bryan Schumaker	374e4e3ec3	Additional readdir cookie loop information Print out the name of the file that triggers the cookie loop message to make it slightly easier to track down the cause. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-30 14:37:14 -04:00
Trond Myklebust	0c0308066c	NFS: Fix spurious readdir cookie loop messages If the directory contents change, then we have to accept that the file->f_pos value may shrink if we do a 'search-by-cookie'. In that case, we should turn off the loop detection and let the NFS client try to recover. The patch also fixes a second loop detection bug by ensuring that after turning on the ctx->duped flag, we read at least one new cookie into ctx->dir_cookie before attempting to match with ctx->dup_cookie. Reported-by: Petr Vandrovec <petr@vandrovec.name> Cc: stable@kernel.org [2.6.39+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-30 14:34:50 -04:00
Dan Carpenter	c49bafa384	ext4: add missing kfree() on error return path in add_new_gdb() We added some more error handling in `b40971426a` "ext4: add error checking to calls to ext4_handle_dirty_metadata()". But we need to call kfree() as well to avoid a memory leak. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-30 12:58:41 -04:00
Theodore Ts'o	d59729f4e7	ext4: fix races in ext4_sync_parent() Fix problems if fsync() races against a rename of a parent directory as pointed out by Al Viro in his own inimitable way: >While we are at it, could somebody please explain what the hell is ext4 >doing in >static int ext4_sync_parent(struct inode *inode) >{ > struct writeback_control wbc; > struct dentry *dentry = NULL; > int ret = 0; > > while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) { > ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY); > dentry = list_entry(inode->i_dentry.next, > struct dentry, d_alias); > if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode) > break; > inode = dentry->d_parent->d_inode; > ret = sync_mapping_buffers(inode->i_mapping); > ... >Note that dentry obviously can't be NULL there. dentry->d_parent is never >NULL. And dentry->d_parent would better not be negative, for crying out >loud! What's worse, there's no guarantees that dentry->d_parent will >remain our parent over that sync_mapping_buffers() and that inode won't >just be freed under us (after rename() and memory pressure leading to >eviction of what used to be our dentry->d_parent)...... Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-30 12:34:19 -04:00
Linus Torvalds	983236b574	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set	2011-07-29 23:45:06 -07:00
Linus Torvalds	74aec4e0dd	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: ecryptfs: Make inode bdi consistent with superblock bdi eCryptfs: Unlock keys needed by ecryptfsd	2011-07-29 23:43:50 -07:00
Linus Torvalds	59ed2bb274	ext2: remove duplicate 'ext2_get_acl()' define When commit `4e34e719e4` ("fs: take the ACL checks to common code") changed the xyz_check_acl() functions into the more natural xyz_get_acl() interface, we grew two copies of the #define ext2_get_acl NULL define for the non-acl case. Remove the extra one. Reported-by: Marco Stornelli <marco.stornelli@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-29 23:21:50 -07:00
Markus Trippelsdorf	a5a7bbcc01	xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set commit `4e34e719e4`, that takes the ACL checks to common code, accidentely broke the build when CONFIG_FS_POSIX_ACL is not set: CC fs/xfs/linux-2.6/xfs_iops.o fs/xfs/linux-2.6/xfs_iops.c:1025:14: error: ‘xfs_get_acl’ undeclared here (not in a function) Fix this by declaring xfs_get_acl a static inline function. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-29 12:26:14 -05:00
Thieu Le	985ca0e626	ecryptfs: Make inode bdi consistent with superblock bdi Make the inode mapping bdi consistent with the superblock bdi so that dirty pages are flushed properly. Signed-off-by: Thieu Le <thieule@chromium.org> Cc: <stable@kernel.org> [2.6.39+] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-07-28 23:48:26 -05:00
Tyler Hicks	b2987a5e05	eCryptfs: Unlock keys needed by ecryptfsd Fixes a regression caused by `b5695d0463` Kernel keyring keys containing eCryptfs authentication tokens should not be write locked when calling out to ecryptfsd to wrap and unwrap file encryption keys. The eCryptfs kernel code can not hold the key's write lock because ecryptfsd needs to request the key after receiving such a request from the kernel. Without this fix, all file opens and creates will timeout and fail when using the eCryptfs PKI infrastructure. This is not an issue when using passphrase-based mount keys, which is the most widely deployed eCryptfs configuration. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: Roberto Sassu <roberto.sassu@polito.it> Tested-by: Roberto Sassu <roberto.sassu@polito.it> Tested-by: Alexis Hafner1 <haf@zurich.ibm.com> Cc: <stable@kernel.org> [2.6.39+]	2011-07-28 23:30:09 -05:00
Linus Torvalds	95b6886526	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (54 commits) tpm_nsc: Fix bug when loading multiple TPM drivers tpm: Move tpm_tis_reenable_interrupts out of CONFIG_PNP block tpm: Fix compilation warning when CONFIG_PNP is not defined TOMOYO: Update kernel-doc. tpm: Fix a typo tpm_tis: Probing function for Intel iTPM bug tpm_tis: Fix the probing for interrupts tpm_tis: Delay ACPI S3 suspend while the TPM is busy tpm_tis: Re-enable interrupts upon (S3) resume tpm: Fix display of data in pubek sysfs entry tpm_tis: Add timeouts sysfs entry tpm: Adjust interface timeouts if they are too small tpm: Use interface timeouts returned from the TPM tpm_tis: Introduce durations sysfs entry tpm: Adjust the durations if they are too small tpm: Use durations returned from TPM TOMOYO: Enable conditional ACL. TOMOYO: Allow using argv[]/envp[] of execve() as conditions. TOMOYO: Allow using executable's realpath and symlink's target as conditions. TOMOYO: Allow using owner/group etc. of file objects as conditions. ... Fix up trivial conflict in security/tomoyo/realpath.c	2011-07-27 19:26:38 -07:00
Al Viro	d6b722aa38	hppfs: missing include Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-27 22:21:58 -04:00
Utako Kusaka	29ae07b702	ext4: Fix overflow caused by missing cast in ext4_fallocate() The logical block number in map.l_blk is a __u32, and so before we shift it left, by the block size, we neeed cast it to a 64-bit size. Otherwise i_size can be corrupted on an ENOSPC. # df -T /mnt/mp1 Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/sda6 ext4 9843276 153056 9190200 2% /mnt/mp1 # fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device # stat /mnt/mp1/testfile File: `/mnt/mp1/testfile' Size: 4293656576 Blocks: 19380440 IO Block: 4096 regular file Device: 806h/2054d Inode: 12 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2011-07-25 13:01:31.414490496 +0900 Modify: 2011-07-25 13:01:31.414490496 +0900 Change: 2011-07-25 13:01:31.454490495 +0900 Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> -- fs/ext4/extents.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)	2011-07-27 22:11:20 -04:00
Robin Dong	0e1147b001	ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole The old function ext4_ext_rm_idx is used only for truncate case because it just remove last index in extent-index-block. When punching hole, it usually needed to remove "middle" index, therefore we must move indexes which after it forward. (I create a file with 1 depth extent tree and punch hole in the middle of it, the last index in index-block strangly gone, so I find out this bug) Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-27 21:29:33 -04:00
Yongqiang Yang	668f4dc559	ext4: simplify parameters of reserve_backup_gdb() The reserve_backup_gdb() function only needs the block group number; there's no need to pass a pointer to struct ext4_new_group_data to it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>	2011-07-27 21:23:13 -04:00
Yongqiang Yang	2f91971014	ext4: simplify parameters of add_new_gdb() add_new_gdb() only needs the block group number; there is no need to pass a pointer to struct ext4_new_group_data to add_new_gdb(). Instead of filling in a pointer the struct buffer_head in add_new_gdb(), it's simpler to have the caller fetch it from the s_group_desc[] array. [Fixed error path to handle the case where struct buffer_head *primary hasn't been set yet. -- Ted] Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-27 21:16:33 -04:00
Yongqiang Yang	e6075e984d	ext4: remove lock_buffer in bclean() and setup_new_group_blocks() There is no need to lock the buffers since no one else should be touching these buffers besides the file system. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-27 20:40:18 -04:00
Linus Torvalds	22712200e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors Btrfs: use the commit_root for reading free_space_inode crcs Btrfs: reduce extent_state lock contention for metadata Btrfs: remove lockdep magic from btrfs_next_leaf Btrfs: make a lockdep class for each root Btrfs: switch the btrfs tree locks to reader/writer Btrfs: fix deadlock when throttling transactions Btrfs: stop using highmem for extent_buffers Btrfs: fix BUG_ON() caused by ENOSPC when relocating space Btrfs: tag pages for writeback in sync Btrfs: fix enospc problems with delalloc Btrfs: don't flush delalloc arbitrarily Btrfs: use find_or_create_page instead of grab_cache_page Btrfs: use a worker thread to do caching Btrfs: fix how we merge extent states and deal with cached states Btrfs: use the normal checksumming infrastructure for free space cache Btrfs: serialize flushers in reserve_metadata_bytes Btrfs: do transaction space reservation before joining the transaction Btrfs: try to only do one btrfs_search_slot in do_setxattr	2011-07-27 16:43:52 -07:00
Linus Torvalds	597a67e0ba	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: optimize the negative xattr caching xfs: prevent against ioend livelocks in xfs_file_fsync xfs: flag all buffers as metadata xfs: encapsulate a block of debug code	2011-07-27 13:41:51 -07:00
Linus Torvalds	28890d3598	Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs * 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits) NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation() nfs: don't use d_move in nfs_async_rename_done RDMA: Increasing RPCRDMA_MAX_DATA_SEGS SUNRPC: Replace xprt->resend and xprt->sending with a priority queue SUNRPC: Allow caller of rpc_sleep_on() to select priority levels SUNRPC: Support dynamic slot allocation for TCP connections SUNRPC: Clean up the slot table allocation SUNRPC: Initalise the struct xprt upon allocation SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot pnfs: simplify pnfs files module autoloading nfs: document nfsv4 sillyrename issues NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL SUNRPC: sunrpc should not explicitly depend on NFS config options NFS: Clean up - simplify the switch to read/write-through-MDS NFS: Move the pnfs write code into pnfs.c NFS: Move the pnfs read code into pnfs.c NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request NFS: Cache rpc_ops in struct nfs_pageio_descriptor ...	2011-07-27 13:23:02 -07:00
Chris Mason	ff95acb673	Merge branch 'integration' into for-linus	2011-07-27 16:18:13 -04:00
Chris Mason	75c195a2ca	Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors The btrfs transaction code will return any errors that come from reserve_metadata_bytes. We need to make sure we don't return funny things like 1 or EAGAIN. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 16:11:41 -04:00
David Howells	09570f9149	proc: make struct proc_dir_entry::name a terminal array rather than a pointer Since __proc_create() appends the name it is given to the end of the PDE structure that it allocates, there isn't a need to store a name pointer. Instead we can just replace the name pointer with a terminal char array of _unspecified_ length. The compiler will simply append the string to statically defined variables of PDE type overlapping any hole at the end of the structure and, unlike specifying an explicitly _zero_ length array, won't give a warning if you try to statically initialise it with a string of more than zero length. Also, whilst we're at it: (1) Move namelen to end just prior to name and reduce it to a single byte (name shouldn't be longer than NAME_MAX). (2) Move pde_unload_lock two places further on so that if it's four bytes in size on a 64-bit machine, it won't cause an unused hole in the PDE struct. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-27 12:50:45 -07:00
Chris Mason	2cf8572dac	Btrfs: use the commit_root for reading free_space_inode crcs Now that we are using regular file crcs for the free space cache, we can deadlock if we try to read the free_space_inode while we are updating the crc tree. This commit fixes things by using the commit_root to read the crcs. This is safe because the free space cache file would already be loaded if that block group had been changed in the current transaction. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:48 -04:00
Chris Mason	19b6caf4ac	Btrfs: reduce extent_state lock contention for metadata For metadata buffers that don't straddle pages (all of them), btrfs can safely use the page uptodate bits and extent_buffer uptodate bit instead of needing to use the extent_state tree. This greatly reduces contention on the state tree lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:47 -04:00
Chris Mason	31533fb263	Btrfs: remove lockdep magic from btrfs_next_leaf Before the reader/writer locks, btrfs_next_leaf needed to keep the path blocking to avoid making lockdep upset. Now that btrfs_next_leaf only takes read locks, this isn't required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:47 -04:00
Chris Mason	85d4e46111	Btrfs: make a lockdep class for each root This patch was originally from Tejun Heo. lockdep complains about the btrfs locking because we sometimes take btree locks from two different trees at the same time. The current classes are based only on level in the btree, which isn't enough information for lockdep to figure out if the lock is safe. This patch makes a class for each type of tree, and lumps all the FS trees that actually have files and directories into the same class. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:46 -04:00
Chris Mason	bd681513fa	Btrfs: switch the btrfs tree locks to reader/writer The btrfs metadata btree is the source of significant lock contention, especially in the root node. This commit changes our locking to use a reader/writer lock. The lock is built on top of rw spinlocks, and it extends the lock tracking to remember if we have a read lock or a write lock when we go to blocking. Atomics count the number of blocking readers or writers at any given time. It removes all of the adaptive spinning from the old code and uses only the spinning/blocking hints inside of btrfs to decide when it should continue spinning. In read heavy workloads this is dramatically faster. In write heavy workloads we're still faster because of less contention on the root node lock. We suffer slightly in dbench because we schedule more often during write locks, but all other benchmarks so far are improved. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:46 -04:00
Josef Bacik	81317fdedd	Btrfs: fix deadlock when throttling transactions Hit this nice little deadlock. What happens is this __btrfs_end_transaction with throttle set, --use_count so it equals 0 btrfs_commit_transaction <somebody else actually manages to start the commit> btrfs_end_transaction --use_count so now its -1 <== BAD we just return and wait on the transaction This is bad because we just return after our use_count is -1 and don't let go of our num_writer count on the transaction, so the guy committing the transaction just sits there forever. Fix this by inc'ing our use_count if we're going to call commit_transaction so that if we call btrfs_end_transaction it's valid. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:46 -04:00
Chris Mason	a65917156e	Btrfs: stop using highmem for extent_buffers The extent_buffers have a very complex interface where we use HIGHMEM for metadata and try to cache a kmap mapping to access the memory. The next commit adds reader/writer locks, and concurrent use of this kmap cache would make it even more complex. This commit drops the ability to use HIGHMEM with extent buffers, and rips out all of the related code. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:45 -04:00
Miao Xie	199c36eaa9	Btrfs: fix BUG_ON() caused by ENOSPC when relocating space When we balanced the chunks across the devices, BUG_ON() in __finish_chunk_alloc() was triggered. ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:2568! [SNIP] Call Trace: [<ffffffffa049525e>] btrfs_alloc_chunk+0x8e/0xa0 [btrfs] [<ffffffffa04546b0>] do_chunk_alloc+0x330/0x3a0 [btrfs] [<ffffffffa045c654>] btrfs_reserve_extent+0xb4/0x1f0 [btrfs] [<ffffffffa045c86b>] btrfs_alloc_free_block+0xdb/0x350 [btrfs] [<ffffffffa048a8d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa04476fd>] __btrfs_cow_block+0x14d/0x5e0 [btrfs] [<ffffffffa044660d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0447c9b>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa044dd5e>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa044f07d>] btrfs_insert_empty_items+0x8d/0xf0 [btrfs] [<ffffffffa045e973>] insert_with_overflow+0x43/0x110 [btrfs] [<ffffffffa045eb0d>] btrfs_insert_dir_item+0xcd/0x1f0 [btrfs] [<ffffffffa0489bd0>] ? map_extent_buffer+0xb0/0xc0 [btrfs] [<ffffffff812276ad>] ? rb_insert_color+0x9d/0x160 [<ffffffffa046cc40>] ? inode_tree_add+0xf0/0x150 [btrfs] [<ffffffffa0474801>] btrfs_add_link+0xc1/0x1c0 [btrfs] [<ffffffff811dacac>] ? security_inode_init_security+0x1c/0x30 [<ffffffffa04a28aa>] ? btrfs_init_acl+0x4a/0x180 [btrfs] [<ffffffffa047492f>] btrfs_add_nondir+0x2f/0x70 [btrfs] [<ffffffffa046af16>] ? btrfs_init_inode_security+0x46/0x60 [btrfs] [<ffffffffa0474ac0>] btrfs_create+0x150/0x1d0 [btrfs] [<ffffffff81159c63>] ? generic_permission+0x23/0xb0 [<ffffffff8115b415>] vfs_create+0xa5/0xc0 [<ffffffff8115ce6e>] do_last+0x5fe/0x880 [<ffffffff8115dc0d>] path_openat+0xcd/0x3d0 [<ffffffff8115e029>] do_filp_open+0x49/0xa0 [<ffffffff8116a965>] ? alloc_fd+0x95/0x160 [<ffffffff8114f0c7>] do_sys_open+0x107/0x1e0 [<ffffffff810bcc3f>] ? audit_syscall_entry+0x1bf/0x1f0 [<ffffffff8114f1e0>] sys_open+0x20/0x30 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa049444a>] __finish_chunk_alloc+0x20a/0x220 [btrfs] The reason is: Task1 Space balance task do_chunk_alloc() __finish_chunk_alloc() update device info in the chunk tree alloc system metadata block relocate system metadata block group set system metadata block group readonly, This block group is the only one that can allocate space. So there is no free space that can be allocated now. find no space and don't try to alloc new chunk, and then return ENOSPC BUG_ON() in __finish_chunk_alloc() was triggered. Fix this bug by allocating a new system metadata chunk before relocating the old one if we find there is no free space which can be allocated after setting the old block group to be read-only. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:45 -04:00
Josef Bacik	f7aaa06bff	Btrfs: tag pages for writeback in sync Everybody else does this, we need to do it too. If we're syncing, we need to tag the pages we're going to write for writeback so we don't end up writing the same stuff over and over again if somebody is constantly redirtying our file. This will keep us from having latencies with heavy sync workloads. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:44 -04:00
Josef Bacik	9e0baf60de	Btrfs: fix enospc problems with delalloc So I had this brilliant idea to use atomic counters for outstanding and reserved extents, but this turned out to be a bad idea. Consider this where we have 1 outstanding extent and 1 reserved extent Reserver Releaser atomic_dec(outstanding) now 0 atomic_read(outstanding)+1 get 1 atomic_read(reserved) get 1 don't actually reserve anything because they are the same atomic_cmpxchg(reserved, 1, 0) atomic_inc(outstanding) atomic_add(0, reserved) free reserved space for 1 extent Then the reserver now has no actual space reserved for it, and when it goes to finish the ordered IO it won't have enough space to do its allocation and you get those lovely warnings. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:44 -04:00
Josef Bacik	a599142806	Btrfs: don't flush delalloc arbitrarily Kill the check to see if we have 512mb of reserved space in delalloc and shrink_delalloc if we do. This causes unexpected latencies and we have other logic to see if we need to throttle. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-27 12:46:43 -04:00
Josef Bacik	a94733d0bc	Btrfs: use find_or_create_page instead of grab_cache_page grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-07-27 12:46:43 -04:00
Josef Bacik	bab39bf998	Btrfs: use a worker thread to do caching A user reported a deadlock when copying a bunch of files. This is because they were low on memory and kthreadd got hung up trying to migrate pages for an allocation when starting the caching kthread. The page was locked by the person starting the caching kthread. To fix this we just need to use the async thread stuff so that the threads are already created and we don't have to worry about deadlocks. Thanks, Reported-by: Roman Mamedov <rm@romanrm.ru> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-07-27 12:46:25 -04:00
Linus Torvalds	5fd00b0315	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: clean up some compiler warnings	2011-07-27 09:26:39 -07:00
Linus Torvalds	333c066bb7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix mount hang caused by certain access pattern to sysfs files	2011-07-27 09:26:22 -07:00
Christoph Hellwig	510792ee29	xfs: optimize the negative xattr caching Since the addition of file capabilities every write needs to read xattrs to check if we have any capabilities to clear. In Linux 3.0 Andi Kleen added a flag to cache the fact that we do not have any attributes on an inode. Make sure to already mark a file as not having any attributes when reading it from disk in case it doesn't even have an attribute fork. Based on an earlier patch from Andi Kleen. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-26 22:06:50 -05:00
Christoph Hellwig	d1166ec792	xfs: prevent against ioend livelocks in xfs_file_fsync We need to take some locks to prevent new ioends from coming in when we wait for all existing ones to go away. Up to Linux 3.0 that was done using the i_mutex held by the VFS fsync code, but now that we are called without it we need to take care of it ourselves. Use the I/O lock instead of i_mutex just like we do in other places. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-26 22:06:39 -05:00
Christoph Hellwig	34951f5cb7	xfs: flag all buffers as metadata Now that REQ_META bios aren't treated specially in the CFQ I/O schedule anymore, we can tag all buffers as metadata to make blktrace traces more meaningful. Note that we use buffers also to zero out partial blocks in the preallocation / hole punching code, and while they operate on data blocks the zeros written certainly aren't data. I think this case is borderline metadata enough to not bother special casing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-26 22:05:48 -05:00
Alex Elder	1c4f33296e	xfs: encapsulate a block of debug code Pull into a helper function some debug-only code that validates a xfs_da_blkinfo structure that's been read from disk. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>	2011-07-26 22:05:34 -05:00
Yongqiang Yang	6d40bc5a7e	ext4: simplify journal handling in setup_new_group_blocks() This patch simplifies journal handling in setup_new_group_blocks(). In previous code, block bitmap is modified everywhere in setup_new_group_blocks(), ext4_get_write_access() in extend_or_restart_transaction() is used to guarantee that the block bitmap stays in the new handle, this makes things complicated. The previous commit changed things so that the modifications on the block bitmap are batched and done by ext4_set_bits() at the end of the for loop. This allows us to simplify things. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 22:24:41 -04:00
Yongqiang Yang	c3e94d1df9	ext4: let setup_new_group_blocks() set multiple bits at a time Rename mb_set_bits() to ext4_set_bits() and make it a global function so that setup_new_group_blocks() can use it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 22:05:53 -04:00
Yongqiang Yang	2b79b09d13	ext4: fix a typo in ext4_group_extend() Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 21:53:35 -04:00
Yongqiang Yang	4740b830ed	ext4: let ext4_group_add_blocks() handle 0 blocks quickly If ext4_group_add_blocks() is called with 0 block, make it return 0 without doing any extra work. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 21:51:08 -04:00
Yongqiang Yang	cc7365dfe4	ext4: let ext4_group_add_blocks() return an error code This patch lets ext4_group_add_blocks() return an error code if it fails, so that upper functions can handle error correctly. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 21:46:07 -04:00
Yongqiang Yang	0529155e8a	ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 21:43:56 -04:00
Yongqiang Yang	ce723c31b5	ext4: prevent a fs with errors from being resized A filesystem with errors must not be resized; otherwise, it is easy to destroy the filesystem. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 21:39:09 -04:00
Yongqiang Yang	8f82f840ec	ext4: prevent parallel resizers by atomic bit ops Before this patch, parallel resizers were allowed and protected by a mutex lock. There is actually no need to support parallel resizers, so this patch prevents them with atomic bit ops, like lock_page() and unlock_page() do. To do this, the patch removes the mutex lock s_resize_lock from struct ext4_sb_info and adds an unsigned long field named s_resize_flags which indicates whether there is a resizer. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 21:35:44 -04:00
Linus Torvalds	e371d46ae4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: merge fchmod() and fchmodat() guts, kill ancient broken kludge xfs: fix misspelled S_IS...() xfs: get rid of open-coded S_ISREG(), etc. vfs: document locking requirements for d_move, __d_move and d_materialise_unique omfs: fix (mode & S_IFDIR) abuse btrfs: S_ISREG(mode) is not mode & S_IFREG... ima: fmode_t misspelled as mode_t... pci-label.c: size_t misspelled as mode_t jffs2: S_ISLNK(mode & S_IFMT) is pointless snd_msnd ->mode is fmode_t, not mode_t v9fs_iop_get_acl: get rid of unused variable vfs: dont chain pipe/anon/socket on superblock s_inodes list Documentation: Exporting: update description of d_splice_alias fs: add missing unlock in default_llseek()	2011-07-26 18:30:20 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Oleg Nesterov	32e107f71e	fs/exec.c:acct_arg_size(): ptl is no longer needed for add_mm_counter() acct_arg_size() takes ->page_table_lock around add_mm_counter() if !SPLIT_RSS_COUNTING. This is not needed after commit `172703b08c` ("mm: delete non-atomic mm counter implementation"). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Tetsuo Handa	b4edf8bd06	exec: do not retry load_binary method if CONFIG_MODULES=n If CONFIG_MODULES=n, it makes no sense to retry the list of binary formats handler because the list will not be modified by request_module(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Richard Weinberger <richard@nod.at> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Tetsuo Handa	912193521b	exec: do not call request_module() twice from search_binary_handler() Currently, search_binary_handler() tries to load binary loader module using request_module() if a loader for the requested program is not yet loaded. But second attempt of request_module() does not affect the result of search_binary_handler(). If request_module() triggered recursion, calling request_module() twice causes 2 to the power of MAX_KMOD_CONCURRENT (= 50) repetitions. It is not an infinite loop but is sufficient for users to consider as a hang up. Therefore, this patch changes not to call request_module() twice, making 1 to the power of MAX_KMOD_CONCURRENT repetitions in case of recursion. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Michal Hocko	aacb3d17a7	fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP Commit `a8bef8ff6e` ("mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks") introduced a BUG_ON() to ensure that VM_STACK_FLAGS and VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile time one, so BUILD_BUG_ON is more appropriate. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:44 -07:00
Vasiliy Kulikov	293eb1e777	proc: fix a race in do_io_accounting() If an inode's mode permits opening /proc/PID/io and the resulting file descriptor is kept across execve() of a setuid or similar binary, the ptrace_may_access() check tries to prevent using this fd against the task with escalated privileges. Unfortunately, there is a race in the check against execve(). If execve() is processed after the ptrace check, but before the actual io information gathering, io statistics will be gathered from the privileged process. At least in theory this might lead to gathering sensitive information (like ssh/ftp password length) that wouldn't be available otherwise. Holding task->signal->cred_guard_mutex while gathering the io information should protect against the race. The order of locking is similar to the one inside of ptrace_attach(): first goes cred_guard_mutex, then lock_task_sighand(). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:43 -07:00
Daisuke Ogino	d2857e79a2	procfs: return ENOENT on opening a being-removed proc entry Change the return value to ENOENT. This value is then returned when opening a proc entry that has been removed. For example, open("/proc/bus/pci/XX/YY") when the corresponding device is being hot-removed. Signed-off-by: Daisuke Ogino <ogino.daisuke@jp.fujitsu.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:43 -07:00
Oleg Nesterov	99b6456748	do_coredump: fix the "ispipe" error check do_coredump() assumes that if format_corename() fails it should return -ENOMEM. This is not true, for example cn_print_exe_file() can propagate the error from d_path. Even if it was true, this is too fragile. Change the code to check "ispipe < 0". Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:43 -07:00
Jiri Slaby	2c563731fe	coredump: escape / in hostname and comm Change every occurrence of / in comm and hostname to !. If the process changes its name to contain /, the core is not dumped (if the directory tree doesn't exist like that). The same with hostname being something like myhost/3. Fix this behaviour by using the escape loop used in %E. (We extract it to a separate function.) Now both with comm == myprocess/1 and hostname == myhost/1, the core is dumped like (kernel.core_pattern='core.%p.%e.%h'): core.2349.myprocess!1.myhost!1 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:43 -07:00
Jiri Slaby	3141c8b165	coredump: use task comm instead of (unknown) If we don't know the file corresponding to the binary (i.e. exe_file is unknown), use "task->comm (path unknown)" instead of simple "(unknown)" as suggested by ak. The fallback is the same as %e except it will append "(path unknown)". Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:43 -07:00
Linus Torvalds	ba5b56cb3e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: document unlocked d_parent accesses ceph: explicitly reference rename old_dentry parent dir in request ceph: document locking for ceph_set_dentry_offset ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug ceph: protect d_parent access in ceph_d_revalidate ceph: protect access to d_parent ceph: handle racing calls to ceph_init_dentry ceph: set dir complete frag after adding capability rbd: set blk_queue request sizes to object size ceph: set up readahead size when rsize is not passed rbd: cancel watch request when releasing the device ceph: ignore lease mask ceph: fix ceph_lookup_open intent usage ceph: only link open operations to directory unsafe list if O_CREAT\|O_TRUNC ceph: fix bad parent_inode calc in ceph_lookup_open ceph: avoid carrying Fw cap during write into page cache libceph: don't time out osd requests that haven't been received ceph: report f_bfree based on kb_avail rather than diffing. ceph: only queue capsnap if caps are dirty ceph: fix snap writeback when racing with writes ...	2011-07-26 13:38:50 -07:00
Al Viro	e57712ebeb	merge fchmod() and fchmodat() guts, kill ancient broken kludge The kludge in question is undocumented and doesn't work for 32bit binaries on amd64, sparc64 and s390. Passing (mode_t)-1 as mode had (since 0.99.14v and contrary to behaviour of any other Unix, prescriptions of POSIX, SuS and our own manpages) was kinda-sorta no-op. Note that any software relying on that (and looking for examples shows none) would be visibly broken on sparc64, where practically all userland is built 32bit. No such complaints noticed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 15:07:43 -04:00
Al Viro	03209378b4	xfs: fix misspelled S_IS...() mode_t is not a bitmap... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 15:05:30 -04:00
Al Viro	abbede1b3a	xfs: get rid of open-coded S_ISREG(), etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 15:05:16 -04:00
Linus Torvalds	2ac232f37f	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t ext3.txt: update the links in the section "useful links" to the latest ones ext3: Fix data corruption in inodes with journalled data ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get ext3: Fix compilation with -DDX_DEBUG quota: Remove unused declaration jbd: Use WRITE_SYNC in journal checkpoint. jbd: Fix oops in journal_remove_journal_head() ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs() ext3/ioctl.c: silence sparse warnings about different address spaces ext3/ext4 Documentation: remove bh/nobh since it has been deprecated ext3: Improve truncate error handling ext3: use proper little-endian bitops ext2: include fs.h into ext2_fs.h ext3: Fix oops in ext3_try_to_allocate_with_rsv() jbd: fix a bug of leaking jh->b_jcount jbd: remove dependency on __GFP_NOFAIL ext3: Convert ext3 to new truncate calling convention jbd: Add fixed tracepoints ext3: Add fixed tracepoints Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and new fixed tracepoints.	2011-07-26 11:34:40 -07:00
Sage Weil	d79698da32	ceph: document unlocked d_parent accesses For the most part we don't care about racing with rename when directing MDS requests; either the old or new parent is fine. Document that, and do some minor cleanup. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:31:26 -07:00
Sage Weil	41b02e1f9b	ceph: explicitly reference rename old_dentry parent dir in request We carry a pin on the parent directory for the rename source and dest dentries. For the source it's r_locked_dir; we need to explicitly reference the old_dentry parent as well, since the dentry's d_parent may change between when the request was created and pinned and when it is freed. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:31:14 -07:00
Sage Weil	4f17726452	ceph: document locking for ceph_set_dentry_offset Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:31:08 -07:00
Sage Weil	e5f86dc377	ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug Have caller pass in a safely-obtained reference to the parent directory for calculating a dentry's hash value. While we're here, simplify the flow through ceph_encode_fh() so that there is a single exit point and cleanup. Also fix a bug with the dentry hash calculation: calculate the hash for the dentry we were given, not its parent. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:30:55 -07:00
Sage Weil	bf1c6aca96	ceph: protect d_parent access in ceph_d_revalidate Protect d_parent with d_lock. Carry a reference. Simplify the flow so that there is a single exit point and cleanup. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:30:43 -07:00
Sage Weil	5f21c96dd5	ceph: protect access to d_parent d_parent is protected by d_lock: use it when looking up a dentry's parent directory inode. Also take a reference and drop it in the caller to avoid a use-after-free. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:30:29 -07:00
Sage Weil	48d0cbd124	ceph: handle racing calls to ceph_init_dentry The ->lookup() and prepopulate_readdir() callers are working with unhashed dentries, so we don't have to worry. The export.c callers, though, need to initialize something they got back from d_obtain_alias() and are potentially racing with other callers. Make sure we don't return unless the dentry is properly initialized (by us or someone else). Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:30:15 -07:00
Sage Weil	dfabbed6fd	ceph: set dir complete frag after adding capability Currently ceph_add_cap clears the complete bit if we are newly issued the FILE_SHARED cap, which is normally the case for a newly issued cap on a new directory. That means we clear the just-set bit. Move the check that sets the flag to after the cap is added/updated. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:30:02 -07:00
Yehuda Sadeh	e985222743	ceph: set up readahead size when rsize is not passed This should improve the default read performance, as without it readahead is practically disabled. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2011-07-26 11:29:14 -07:00
Sage Weil	2f90b852e3	ceph: ignore lease mask The lease mask is no longer used (and it changed a while back). Instead, use a non-zero duration to indicate that there is a lease being issued. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:28:25 -07:00
Sage Weil	468640e32c	ceph: fix ceph_lookup_open intent usage We weren't properly calling lookup_instantiate_filp when setting up the lookup intent, which could lead to file leakage on errors. So: - use separate helper for the hidden snapdir translation, immediately following the mds request - use ceph_finish_lookup for the final dentry/return value dance in the exit path - lookup_instantiate_filp on success Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:28:11 -07:00
Sage Weil	9bae113a08	ceph: only link open operations to directory unsafe list if O_CREAT\|O_TRUNC We only need to put these on the directory unsafe list if they have side effects that fsync(2) should flush out. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:27:59 -07:00
Sage Weil	acda765788	ceph: fix bad parent_inode calc in ceph_lookup_open We were always getting NULL here because the intent file f_dentry is always NULL at this point, which means we were always passing NULL to ceph_mdsc_do_request. In reality, this was fine, since this isn't currently ever a write operation that needs to get strung on the dir's unsafe list. Use the dir explicitly, and only pass it if this open has side-effects that a dir fsync should flush. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:27:48 -07:00
Sage Weil	d8de9ab63a	ceph: avoid carrying Fw cap during write into page cache The generic_file_aio_write call may block on balance_dirty_pages while we flush data to the OSDs. If we hold a reference to the FILE_WR cap during that interval revocation by the MDS (e.g., to do a stat(2)) may be very slow. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:27:34 -07:00
Greg Farnum	8f04d42276	ceph: report f_bfree based on kb_avail rather than diffing. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>	2011-07-26 11:27:06 -07:00
Sage Weil	e77dc3e9c0	ceph: only queue capsnap if caps are dirty We used to go into this branch if i_wrbuffer_ref_head was non-zero. This was an ancient check from before we were careful about dealing with all kinds of caps (and not just dirty pages). It is cleaner to only queue a capsnap if there is an actual dirty cap. If we are racing with... something...we will end up here with ci->i_wrbuffer_refs but no dirty caps. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:41 -07:00
Sage Weil	af0ed569d7	ceph: fix snap writeback when racing with writes There are two problems that come up when we try to queue a capsnap while a write is in progress: - The FILE_WR cap is held, but not yet dirty, so we may queue a capsnap with dirty == 0. That will crash later in __ceph_flush_snaps(). Or on the FILE_WR cap if a write is in progress. - We may not have i_head_snapc set, which causes problems pretty quickly. Look to the snaprealm in this case. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:31 -07:00
Sage Weil	9cfa1098dc	ceph: use flag bit for at_end readdir flag This saves us a word of memory per file. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:18 -07:00
Sage Weil	4918b6d140	ceph: add F_SYNC file flag to force sync (non-O_DIRECT) io This allows us to force IO through the sync path which you normally only get when multiple clients are reading/writing to the same file or by mounting with -o sync. Among other things, this lets test programs verify correctness with a single mount. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:26:07 -07:00
Sage Weil	252c6728de	ceph: add flags field to file_info Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:25:27 -07:00
Linus Torvalds	1d87c28e68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: Cleanup: check return codes of crypto api calls CIFS: Fix oops while mounting with prefixpath [CIFS] Redundant null check after dereference cifs: use cifs_dirent in cifs_save_resume_key cifs: use cifs_dirent to replace cifs_get_name_from_search_buf cifs: introduce cifs_dirent cifs: cleanup cifs_filldir	2011-07-26 11:11:28 -07:00
Jeff Layton	c46c887744	vfs: document locking requirements for d_move, __d_move and d_materialise_unique Adding a comment to d_materialise_unique per Al's request... d_move and __d_move have some pretty substantial locking requirements, but they are not clearly documented. Add some comments spelling them out. Also, document the requirement for the i_mutex of the parent in d_materialise_unique. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:41:14 -04:00
Linus Torvalds	f01ef569cd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits) mm: properly reflect task dirty limits in dirty_exceeded logic writeback: don't busy retry writeback on new/freeing inodes writeback: scale IO chunk size up to half device bandwidth writeback: trace global_dirty_state writeback: introduce max-pause and pass-good dirty limits writeback: introduce smoothed global dirty limit writeback: consolidate variable names in balance_dirty_pages() writeback: show bdi write bandwidth in debugfs writeback: bdi write bandwidth estimation writeback: account per-bdi accumulated written pages writeback: make writeback_control.nr_to_write straight writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr() writeback: trace event writeback_queue_io writeback: trace event writeback_single_inode writeback: remove .nonblocking and .encountered_congestion writeback: remove writeback_control.more_io writeback: skip balance_dirty_pages() for in-memory fs writeback: add bdi_dirty_limit() kernel-doc writeback: avoid extra sync work at enqueue time writeback: elevate queue_io() into wb_writeback() ... Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c	2011-07-26 10:39:54 -07:00
Al Viro	41c96486f2	omfs: fix (mode & S_IFDIR) abuse granted, on a filesystem that has only regular files and directories it happens to work, but really should be S_ISDIR(mode)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:05:28 -04:00
Al Viro	569254b0cc	btrfs: S_ISREG(mode) is not mode & S_IFREG... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:05:05 -04:00
Al Viro	61effb519c	jffs2: S_ISLNK(mode & S_IFMT) is pointless it's S_ISLNK(mode), TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 13:00:35 -04:00
Al Viro	24a01d4ee4	v9fs_iop_get_acl: get rid of unused variable Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:42 -04:00
Eric Dumazet	a209dfc7b0	vfs: dont chain pipe/anon/socket on superblock s_inodes list Workloads using pipes and sockets hit inode_sb_list_lock contention. superblock s_inodes list is needed for quota, dirty, pagecache and fsnotify management. pipe/anon/socket fs are clearly not candidates for these. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:09 -04:00
Dan Carpenter	bacb2d816c	fs: add missing unlock in default_llseek() A recent change in linux-next, `982d816581` "fs: add SEEK_HOLE and SEEK_DATA flags" added some direct returns on error, but it should have been a goto out. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 12:57:09 -04:00
Jan Kara	2d859db3e4	ext4: fix data corruption in inodes with journalled data When journalling data for an inode (either because it is a symlink or because the filesystem is mounted in data=journal mode), ext4_evict_inode() can discard unwritten data by calling truncate_inode_pages(). This is because we don't mark the buffer / page dirty when journalling data but only add the buffer to the running transaction and thus mm does not know there are still unwritten data. Fix the problem by carefully tracking transaction containing inode's data, committing this transaction, and writing uncheckpointed buffers when inode should be reaped. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-26 09:07:11 -04:00
Steven Whitehouse	1923703991	GFS2: Fix mount hang caused by certain access pattern to sysfs files Depending upon the order of userspace/kernel during the mount process, this can result in a hang without the _all version of the completion. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-07-26 10:18:37 +01:00
Linus Torvalds	e08dc1325f	p9: avoid unused variable warning Commit `4e34e719e4` ("fs: take the ACL checks to common code") removed the use of the 'acl' variable in v9fs_iop_get_acl(), but left the variable definition around. Remove it to get rid of the warning: fs/9p/acl.c: In function ‘v9fs_iop_get_acl’: fs/9p/acl.c:101:20: warning: unused variable ‘acl’ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 23:43:53 -07:00
Linus Torvalds	91d44d9999	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: Make ZLIB compression support optional Squashfs: Update documentation for XZ and add squashfs-tools devel tree	2011-07-25 22:50:35 -07:00
Linus Torvalds	2dad3206db	Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linux * 'for-3.1' of git://linux-nfs.org/~bfields/linux: nfsd: don't break lease on CLAIM_DELEGATE_CUR locks: rename lock-manager ops nfsd4: update nfsv4.1 implementation notes nfsd: turn on reply cache for NFSv4 nfsd4: call nfsd4_release_compoundargs from pc_release nfsd41: Deny new lock before RECLAIM_COMPLETE done fs: locks: remove init_once nfsd41: check the size of request nfsd41: error out when client sets maxreq_sz or maxresp_sz too small nfsd4: fix file leak on open_downgrade nfsd4: remember to put RW access on stateid destruction NFSD: Added TEST_STATEID operation NFSD: added FREE_STATEID operation svcrpc: fix list-corrupting race on nfsd shutdown rpc: allow autoloading of gss mechanisms svcauth_unix.c: quiet sparse noise svcsock.c: include sunrpc.h to quiet sparse noise nfsd: Remove deprecated nfsctl system call and related code. NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND Fix up trivial conflicts in Documentation/feature-removal-schedule.txt	2011-07-25 22:49:19 -07:00
Linus Torvalds	84635d68be	vfs: fix check_acl compile error when CONFIG_FS_POSIX_ACL is not set Commit `e77819e57f` ("vfs: move ACL cache lookup into generic code") didn't take the FS_POSIX_ACL config variable into account - when that is not set, ACL's go away, and the cache helper functions do not exist, causing compile errors like fs/namei.c: In function 'check_acl': fs/namei.c:191:10: error: implicit declaration of function 'negative_cached_acl' fs/namei.c:196:2: error: implicit declaration of function 'get_cached_acl' fs/namei.c:196:6: warning: assignment makes pointer from integer without a cast fs/namei.c:212:11: error: implicit declaration of function 'set_cached_acl' Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 22:47:03 -07:00
Linus Torvalds	45b583b10a	Merge 'akpm' patch series * Merge akpm patch series: (122 commits) drivers/connector/cn_proc.c: remove unused local Documentation/SubmitChecklist: add RCU debug config options reiserfs: use hweight_long() reiserfs: use proper little-endian bitops pnpacpi: register disabled resources drivers/rtc/rtc-tegra.c: properly initialize spinlock drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time() drivers/rtc: add support for Qualcomm PMIC8xxx RTC drivers/rtc/rtc-s3c.c: support clock gating drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200 init: skip calibration delay if previously done misc/eeprom: add eeprom access driver for digsy_mtc board misc/eeprom: add driver for microwire 93xx46 EEPROMs checkpatch.pl: update $logFunctions checkpatch: make utf-8 test --strict checkpatch.pl: add ability to ignore various messages checkpatch: add a "prefer __aligned" check checkpatch: validate signature styles and To: and Cc: lines checkpatch: add __rcu as a sparse modifier checkpatch: suggest using min_t or max_t ... Did this as a merge because of (trivial) conflicts in - Documentation/feature-removal-schedule.txt - arch/xtensa/include/asm/uaccess.h that were just easier to fix up in the merge than in the patch series.	2011-07-25 21:00:19 -07:00
Akinobu Mita	9d6bf5aa17	reiserfs: use hweight_long() Use hweight_long() to count free bits in the bitmap. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:17 -07:00
Akinobu Mita	0c2fd1bfb1	reiserfs: use proper little-endian bitops Using __test_and_{set,clear}_bit_le() with ignoring its return value can be replaced with __{set,clear}_bit_le(). This introduces reiserfs_{set,clear}_le_bit for __{set,clear}_bit_le and does the above change with them. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:17 -07:00
Hugh Dickins	708e3508c2	tmpfs: clone shmem_file_splice_read() Copy __generic_file_splice_read() and generic_file_splice_read() from fs/splice.c to shmem_file_splice_read() in mm/shmem.c. Make page_cache_pipe_buf_ops and spd_release_page() accessible to it. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:11 -07:00
David Rientjes	be8f684d73	oom: make deprecated use of oom_adj more verbose /proc/pid/oom_adj is deprecated and scheduled for removal in August 2012 according to Documentation/feature-removal-schedule.txt. This patch makes the warning more verbose by making it appear as a more serious problem (the presence of a stack trace and being multiline should attract more attention) so that applications still using the old interface can get fixed. Very popular users of the old interface have been converted since the oom killer rewrite has been introduced. udevd switched to the /proc/pid/oom_score_adj interface for v162, kde switched in 4.6.1, and opensshd switched in 5.7p1. At the start of 2012, this should be changed into a WARN() to emit all such incidents and then finally remove the tunable in August 2012 as scheduled. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:09 -07:00
Becky Bruce	2b37c35e65	fs/hugetlbfs/inode.c: fix pgoff alignment checking on 32-bit This: vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT) is incorrect on 32-bit. It causes us to & the pgoff with something that looks like this (for a 4m hugepage): 0xfff003ff. The mask should be flipped and then shifted, to give you 0x0000_03ff. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 20:57:07 -07:00
Linus Torvalds	14067ff536	vfs: make gcc generate more obvious code for acl permission checking The "fsuid is the inode owner" case is not necessarily always the likely case, but it's the case that doesn't do anything odd and that we want in straight-line code. Make gcc not generate random "jump around for the fun of it" code. This just helps me read profiles. That thing is one of the hottest parts of the whole pathname lookup. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-25 19:55:52 -07:00
Shirish Pargaonkar	14cae3243b	cifs: Cleanup: check return codes of crypto api calls Check return codes of crypto api calls and either log an error or log an error and return from the calling function with error. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 22:12:10 +00:00
Pavel Shilovsky	f5bc1e755d	CIFS: Fix oops while mounting with prefixpath commit `fec11dd9a0` caused a regression when we have already mounted //server/share/a and want to mount //server/share/a/b. The problem is that lookup_one_len calls __lookup_hash with nd pointer as NULL. Then __lookup_hash calls do_revalidate in the case when dentry exists and we end up with NULL pointer dereference in cifs_d_revalidate: if (nd->flags & LOOKUP_RCU) return -ECHILD; Fix this by checking nd for NULL. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 22:06:40 +00:00
Steve French	e010a5ef95	[CIFS] Redundant null check after dereference Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 22:04:32 +00:00
Christoph Hellwig	eaf35b1ea8	cifs: use cifs_dirent in cifs_save_resume_key Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:43:14 +00:00
Christoph Hellwig	f16d59b417	cifs: use cifs_dirent to replace cifs_get_name_from_search_buf This allows us to parse the on the wire structures only once in cifs_filldir. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:40:53 +00:00
Christoph Hellwig	cda0ec6a86	cifs: introduce cifs_dirent Introduce a generic directory entry structure, and factor the parsing of the various on the wire structures that can represent one into a common helper. Switch cifs_entry_is_dot over to use it as a start. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:36:44 +00:00
Mark Fasheh	38a1a91953	btrfs: don't BUG_ON allocation errors in btrfs_drop_snapshot In addition to properly handling allocation failure from btrfs_alloc_path, I also fixed up the kzalloc error handling code immediately below it. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-25 14:35:15 -07:00
Mark Fasheh	92b8e897f6	btrfs: Don't BUG_ON alloc_path errors in find_next_chunk I also removed the BUG_ON from error return of find_next_chunk in init_first_rw_device(). It turns out that the only caller of init_first_rw_device() also BUGS on any nonzero return so no actual behavior change has occurred here. do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk() which can now return -ENOMEM. Instead of setting space_info->full on any error from btrfs_alloc_chunk() I catch and return every error value _except_ -ENOSPC. Thanks goes to Tsutomu Itoh for pointing that issue out. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2011-07-25 14:34:54 -07:00
Christoph Hellwig	9feed6f8fb	cifs: cleanup cifs_filldir Use sensible variable names and formatting and remove some superfluous checks on entry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-25 21:05:10 +00:00
Linus Torvalds	d3ec4844d4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) fs: Merge split strings treewide: fix potentially dangerous trailing ';' in #defined values/expressions uwb: Fix misspelling of neighbourhood in comment net, netfilter: Remove redundant goto in ebt_ulog_packet trivial: don't touch files that are removed in the staging tree lib/vsprintf: replace link to Draft by final RFC number doc: Kconfig: `to be' -> `be' doc: Kconfig: Typo: square -> squared doc: Konfig: Documentation/power/{pm => apm-acpi}.txt drivers/net: static should be at beginning of declaration drivers/media: static should be at beginning of declaration drivers/i2c: static should be at beginning of declaration XTENSA: static should be at beginning of declaration SH: static should be at beginning of declaration MIPS: static should be at beginning of declaration ARM: static should be at beginning of declaration rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check Update my e-mail address PCIe ASPM: forcedly -> forcibly gma500: push through device driver tree ... Fix up trivial conflicts: - arch/arm/mach-ep93xx/dma-m2p.c (deleted) - drivers/gpio/gpio-ep93xx.c (renamed and context nearby) - drivers/net/r8169.c (just context changes)	2011-07-25 13:56:39 -07:00
Chandra Seetharaman	c35a549c8b	xfs: Remove the macro XFS_BUFTARG_NAME Remove the definition and usages of the macro XFS_BUFTARG_NAME. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:37 -05:00
Chandra Seetharaman	49074c069c	xfs: Remove the macro XFS_BUF_TARGET Remove the definition and usages of the macro XFS_BUF_TARGET Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:31 -05:00
Chandra Seetharaman	e38c9b87e5	xfs: Remove the macro XFS_BUF_SET_TARGET Remove the macro XFS_BUF_SET_TARGET. hch: As all the buffer allocator already set ->b_target it should be safe to simply remove these calls. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:26 -05:00
Chandra Seetharaman	811e64c716	Replace the macro XFS_BUF_ISPINNED with helper xfs_buf_ispinned Replace the macro XFS_BUF_ISPINNED with an inline helper function xfs_buf_ispinned, and change all its usages. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:22 -05:00
Chandra Seetharaman	02fe03d909	xfs: Remove the macro XFS_BUF_SET_PTR Remove the definition and usages of the macro XFS_BUF_SET_PTR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:17 -05:00
Chandra Seetharaman	6292604447	xfs: Remove the macro XFS_BUF_PTR Remove the definition and usages of the macro XFS_BUF_PTR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:13 -05:00
Chandra Seetharaman	0095a21eb6	xfs: Remove macro XFS_BUF_SET_START Remove the definition and usage of the macro XFS_BUF_SET_START. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:09 -05:00
Chandra Seetharaman	72790aa119	xfs: Remove macro XFS_BUF_HOLD Remove the definition and usage of the macro XFS_BUF_HOLD Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:06 -05:00
Chandra Seetharaman	b75e40a419	xfs: Remove macro XFS_BUF_BUSY and family Remove the definitions and uses of the macros XFS_BUF_BUSY, XFS_BUF_UNBUSY, and XFS_BUF_ISBUSY. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 15:03:00 -05:00
Chandra Seetharaman	5a52c2a581	xfs: Remove the macro XFS_BUF_ERROR and family Remove the definitions and usage of the macros XFS_BUF_ERROR, XFS_BUF_GETERROR and XFS_BUF_ISERROR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 14:57:46 -05:00
Chandra Seetharaman	ed43233be9	xfs: Remove the macro XFS_BUF_BFLAGS Remove the definition of the macro XFS_BUF_BFLAGS and its usage. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-25 14:57:36 -05:00
Linus Torvalds	0003230e82	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fs: take the ACL checks to common code bury posix_acl_..._masq() variants kill boilerplates around posix_acl_create_masq() generic_acl: no need to clone acl just to push it to set_cached_acl() kill boilerplate around posix_acl_chmod_masq() reiserfs: cache negative ACLs for v1 stat format xfs: cache negative ACLs if there is no attribute fork 9p: do no return 0 from ->check_acl without actually checking vfs: move ACL cache lookup into generic code CIFS: Fix oops while mounting with prefixpath xfs: Fix wrong return value of xfs_file_aio_write fix devtmpfs race caam: don't pass bogus S_IFCHR to debugfs_create_...() get rid of create_proc_entry() abuses - proc_mkdir() is there for purpose asus-wmi: ->is_visible() can't return negative fix jffs2 ACLs on big-endian with 16bit mode_t 9p: close ACL leaks ocfs2_init_acl(): fix a leak VFS : mount lock scalability for internal mounts	2011-07-25 12:53:15 -07:00
Trond Myklebust	ed1e6211a0	NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation() nfs_mark_return_delegation() is usually called without any locking, and so it is not safe to dereference delegation->inode. Since the inode is only used to discover the nfs_client anyway, it makes more sense to have the callers pass a valid pointer to the nfs_server as a parameter. Reported-by: Ian Kent <raven@themaw.net> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-25 15:37:29 -04:00
Jeff Layton	73ca1001ed	nfs: don't use d_move in nfs_async_rename_done If the task that initiated the sillyrename ends up being killed by a fatal signal, then it will eventually return back to userspace and end up releasing the i_mutex. d_move however needs to be done while holding the i_mutex. Instead of using d_move here, just unhash the old and new dentries to prevent them from being found by lookups. With this change though, the dentries are now incorrect post-rename and do not reflect the actual name of the file on the server. I'm proceeding under the assumption that since they are unhashed that this isn't really a problem. In order for the sillydelete to still work though, the dname must be copied earlier when setting up the sillydelete info, and the name must be recopied if the sillydelete info has to be moved to a new dentry. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-25 15:00:21 -04:00
Stephen Rothwell	5f00bcb38e	Merge branch 'master' into devel and apply fixup from Stephen Rothwell: vfs/nfs: fixup for nfs_open_context change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-07-25 14:53:52 -04:00
Christoph Hellwig	4e34e719e4	fs: take the ACL checks to common code Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:30:23 -04:00
Al Viro	edde854e8b	bury posix_acl_..._masq() variants made static; no callers left outside of posix_acl.c. posix_acl_clone() also has lost all external callers and became static... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:32 -04:00
Al Viro	826cae2f2b	kill boilerplates around posix_acl_create_masq() new helper: posix_acl_create(&acl, gfp, mode_p). Replaces acl with modified clone, on failure releases acl and replaces with NULL. Returns 0 or -ve on error. All callers of posix_acl_create_masq() switched. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:32 -04:00
Al Viro	95203befa8	generic_acl: no need to clone acl just to push it to set_cached_acl() In-core acls are copy-on-write, so the reference taken by set_cached_acl() will do just fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:31 -04:00
Al Viro	bc26ab5f65	kill boilerplate around posix_acl_chmod_masq() new helper: posix_acl_chmod(&acl, gfp, mode). Replaces acl with modified clone or with NULL if that has failed; returns 0 or -ve on error. All callers of posix_acl_chmod_masq() switched to that - they'd been doing exactly the same thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:27:30 -04:00
Christoph Hellwig	4482a087d4	reiserfs: cache negative ACLs for v1 stat format Always set up a negative ACL cache entry if the inode can't have ACLs. That behaves much better than doing this check inside ->check_acl. Also remove the left over MAY_NOT_BLOCK check. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:25:38 -04:00
Christoph Hellwig	6311b10800	xfs: cache negative ACLs if there is no attribute fork Always set up a negative ACL cache entry if the inode doesn't have an attribute fork. That behaves much better than doing this check inside ->check_acl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:25:38 -04:00
Christoph Hellwig	ebbb0ef287	9p: do not return 0 from ->check_acl without actually checking If we do not want to use ACLs we at least need to perform normal Unix permission checks. From the comment I'm not quite sure that's what is intended, but if 9p wants to do permission checks entirely on the server it needs to do so in ->permission, not in ->check_acl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:25:38 -04:00
Linus Torvalds	e77819e57f	vfs: move ACL cache lookup into generic code This moves logic for checking the cached ACL values from low-level filesystems into generic code. The end result is a streamlined ACL check that doesn't need to load the inode->i_op->check_acl pointer at all for the common cached case. The filesystems also don't need to check for a non-blocking RCU walk case in their acl_check() functions, because that is all handled at a VFS layer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:23:39 -04:00
Pavel Shilovsky	3ca30d40a9	CIFS: Fix oops while mounting with prefixpath commit `fec11dd9a0` caused a regression when we have already mounted //server/share/a and want to mount //server/share/a/b. The problem is that lookup_one_len calls __lookup_hash with nd pointer as NULL. Then __lookup_hash calls do_revalidate in the case when dentry exists and we end up with NULL pointer dereference in cifs_d_revalidate: if (nd->flags & LOOKUP_RCU) return -ECHILD; Fix this by checking nd for NULL. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:23:21 -04:00
Markus Trippelsdorf	340a0a01b9	xfs: Fix wrong return value of xfs_file_aio_write The fsync prototype change commit `02c24a8218` accidentally overwrote the ssize_t return value of xfs_file_aio_write with 0 for SYNC type writes. Fix this by checking if an error occurred when calling xfs_file_fsync and only change the return value in this case. In addition xfs_file_fsync actually returns a normal negative error, so fix this, too. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:23:21 -04:00
Linus Torvalds	096a705bbc	Merge branch 'for-3.1/core' of git://git.kernel.dk/linux-block * 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits) block: strict rq_affinity backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu block: fix patch import error in max_discard_sectors check block: reorder request_queue to remove 64 bit alignment padding CFQ: add think time check for group CFQ: add think time check for service tree CFQ: move think time check variables to a separate struct fixlet: Remove fs_excl from struct task. cfq: Remove special treatment for metadata rqs. block: document blk_plug list access block: avoid building too big plug list compat_ioctl: fix make headers_check regression block: eliminate potential for infinite loop in blkdev_issue_discard compat_ioctl: fix warning caused by qemu block: flush MEDIA_CHANGE from drivers on close(2) blk-throttle: Make total_nr_queued unsigned block: Add __attribute__((format(printf...) and fix fallout fs/partitions/check.c: make local symbols static block:remove some spare spaces in genhd.c block:fix the comment error in blkdev.h ...	2011-07-25 10:33:36 -07:00
Dave Kleikamp	3c2c226285	jfs: clean up some compiler warnings jfs has a few variables being set but never used. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-07-25 11:01:12 -05:00
Al Viro	963945bf93	fix jffs2 ACLs on big-endian with 16bit mode_t casting int * to mode_t * is not a good thing - on a lot of big-endian architectures mode_t happens to be smaller than int and there it breaks quite spectacularly... Fucked-up-by: commit `cfc8dc6f6f` Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:12:01 -04:00
Al Viro	1ec95bf34d	9p: close ACL leaks Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:10:18 -04:00
Al Viro	c0d960f038	ocfs2_init_acl(): fix a leak Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:10:09 -04:00
Tim Chen	423e0ab086	VFS : mount lock scalability for internal mounts For a number of file systems that don't have a mount point (e.g. sockfs and pipefs), they are not marked as long term. Therefore in mntput_no_expire, all locks in vfs_mount lock are taken instead of just local cpu's lock to aggregate reference counts when we release reference to file objects. In fact, only local lock need to have been taken to update ref counts as these file systems are in no danger of going away until we are ready to unregister them. The attached patch marks file systems using kern_mount without mount point as long term. The contentions of vfs_mount lock is now eliminated. Before un-registering such file system, kern_unmount should be called to remove the long term flag and make the mount point ready to be freed. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-24 10:08:32 -04:00
Wu Fengguang	fcc5c22218	writeback: don't busy retry writeback on new/freeing inodes Fix a system hang bug introduced by commit `b7a2441f99` ("writeback: remove writeback_control.more_io") and `e8dfc3058` ("writeback: elevate queue_io() into wb_writeback()") easily reproducible with high memory pressure and lots of file creation/deletions, for example, a kernel build in limited memory. It hangs when some inode is in the I_NEW, I_FREEING or I_WILL_FREE state, the flusher will get stuck busy retrying that inode, never releasing wb->list_lock. The lock in turn blocks all kinds of other tasks when they are trying to grab it. As put by Jan, it's a safe change regarding data integrity. I_FREEING or I_WILL_FREE inodes are written back by iput_final() and it is reclaim code that is responsible for eventually removing them. So writeback code can safely ignore them. I_NEW inodes should move out of this state when they are fully set up and in the writeback round following that, we will consider them for writeback. So the change makes sense. CC: Jan Kara <jack@suse.cz> Reported-by: Hugh Dickins <hughd@google.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-24 10:46:51 +08:00
Robin Dong	b7ca1e8ec5	ext4: correct comment for ext4_ext_check_cache The comment for ext4_ext_check_cache has a little mistake. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 21:53:25 -04:00
Robin Dong	0737964bc9	ext4: correct the debug message in ext4_ext_insert_extent The debug message in ext4_ext_insert_extent before moving extent is incorrect (the "from xx to xx"). Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 21:51:07 -04:00
Robin Dong	5718789da5	ext4: remove unused argument in ext4_ext_next_leaf_block The argument "inode" in function ext4_ext_next_allocated_block looks useless, so clean it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 21:49:07 -04:00
Tao Ma	6a0fe49308	ext4: remove ac_repeats from ext4_allocation_context ac_repeats isn't referenced in the mballoc code. So remove it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 16:18:55 -04:00
Tao Ma	ced156e464	ext4: don't increment s_mb_buddies_generated in ext4_mb_release In ext4_mb_release, we use s_mb_buddies_generated++. Although the output is OK, but I don't think we need this extra ++. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 16:18:05 -04:00
Tao Ma	529da704ad	ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy ext4_mb_load_buddy() calls ext4_get_group_info() for setting both "grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp". Reported-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-23 16:07:26 -04:00
Casey Bodley	0c12eaffdf	nfsd: don't break lease on CLAIM_DELEGATE_CUR CLAIM_DELEGATE_CUR is used in response to a broken lease; allowing it to break the lease and return EAGAIN leaves the client unable to make progress in returning the delegation nfs4_get_vfs_file() now takes struct nfsd4_open for access to the claim type, and calls nfsd_open() with NFSD_MAY_NOT_BREAK_LEASE when claim type is CLAIM_DELEGATE_CUR Signed-off-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-07-23 14:58:17 -04:00
Aneesh Kumar K.V	48e370ff93	fs/9p: add 9P2000.L unlinkat operation unlinkat - Remove a directory entry size[4] Tunlinkat tag[2] dirfid[4] name[s] flag[4] size[4] Runlinkat tag[2] older Tremove have the below request format size[4] Tremove tag[2] fid[4] The remove message is used to remove a directory entry either file or directory The remove operation is actually a directory operation and should ideally have dirfid, if not we cannot represent the fid on server with anything other than name. We will have to derive the directory name from fid in the Tremove request. NOTE: The operation doesn't clunk the unlink fid. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:52 -05:00
Aneesh Kumar K.V	9e8fb38e7d	fs/9p: add 9P2000.L renameat operation renameat - change name of file or directory size[4] Trenameat tag[2] olddirfid[4] oldname[s] newdirfid[4] newname[s] size[4] Rrenameat tag[2] older Trename have the below request format size[4] Trename tag[2] fid[4] newdirfid[4] name[s] The rename message is used to change the name of a file, possibly moving it to a new directory. The rename operation is actually a directory operation and should ideally have olddirfid, if not we cannot represent the fid on server with anything other than name. We will have to derive the old directory name from fid in the Trename request. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:51 -05:00
Aneesh Kumar K.V	ed80fcfac2	fs/9p: Always ask new inode in create This make sure we don't end up reusing the unlinked inode object. The ideal way is to use inode i_generation. But i_generation is not available in userspace always. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:50 -05:00
Prem Karat	a2dd43bb0d	fs/9p: Fix invalid mount options/args Without this fix, if any invalid mount options/args are passed while mounting the 9p fs, no error (-EINVAL) is returned and default arg value is assigned. This fix returns -EINVAL when an invalid argument is found while parsing mount options. Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V	fd2421f544	fs/9p: When doing inode lookup compare qid details and inode mode bits. This make sure we don't use wrong inode from the inode hash. The inode number of the file deleted is reused by the next file system object created and if we only use inode number for inode hash lookup we could end up with wrong struct inode. Also compare inode generation number. Not all Linux file system provide st_gen in userspace. So it could be 0; Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V	2053d67c54	fs/9p: remove rename work around in 9p Now that VFS does the right thing remove the work around. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-07-23 09:32:47 -05:00
Linus Torvalds	bbd9d6f7fb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits) vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp isofs: Remove global fs lock jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. mm/truncate.c: fix build for CONFIG_BLOCK not enabled fs:update the NOTE of the file_operations structure Remove dead code in dget_parent() AFS: Fix silly characters in a comment switch d_add_ci() to d_splice_alias() in "found negative" case as well simplify gfs2_lookup() jfs_lookup(): don't bother with . or .. get rid of useless dget_parent() in btrfs rename() and link() get rid of useless dget_parent() in fs/btrfs/ioctl.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers drivers: fix up various ->llseek() implementations fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek Ext4: handle SEEK_HOLE/SEEK_DATA generically Btrfs: implement our own ->llseek fs: add SEEK_HOLE and SEEK_DATA flags reiserfs: make reiserfs default to barrier=flush ... Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new shrinker callout for the inode cache, that clashed with the xfs code to start the periodic workers later.	2011-07-22 19:02:39 -07:00
Jan Kara	b22570d9ab	ext3: Fix data corruption in inodes with journalled data When journalling data for an inode (either because it is a symlink or because the filesystem is mounted in data=journal mode), ext3_evict_inode() can discard unwritten data by calling truncate_inode_pages(). This is because we don't mark the buffer / page dirty when journalling data but only add the buffer to the running transaction and thus mm does not know there are still unwritten data. Fix the problem by carefully tracking transaction containing inode's data, committing this transaction, and writing uncheckpointed buffers when inode should be reaped. Signed-off-by: Jan Kara <jack@suse.cz>	2011-07-23 01:49:00 +02:00
Konstantin Khlebnikov	5a9a43646c	vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp Replace unclear (struct dentry *) to (struct file *) typecast with ERR_CAST() macro. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:13 -04:00
Jan Kara	d769b3c2ab	isofs: Remove global fs lock sbi->s_mutex isn't needed for isofs at all so we can just remove it. Generally, since isofs is always mounted read-only, filesystem structure cannot change under us. So buffer_head contents stays constant after it's filled in. That leaves us with possible changes of global data structures. Superblock changes only during filesystem mount (even remount does not change it), inodes are only filled in during reading from disk. So there are no changes of these structures to bother about. Arguments why sbi->s_mutex can be removed at each place: isofs_readdir: Accesses sb, inode, filp, local variables => s_mutex not needed isofs_lookup: Protected by directory's i_mutex. Accesses sb, inode, dentry, local variables => s_mutex not needed rock_ridge_symlink_readpage: Protected by page lock. Accesses sb, inode, local variables => s_mutex not needed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:12 -04:00
Al Viro	22ba747f66	jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory We don't generate IN_DELETE_SELF on victim of overwriting rename() if it happens to be a directory. Trivially fixed by doing to ->i_nlink what we do ->pino_nlink a couple of lines later in jffs2_rename(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:11 -04:00
Al Viro	841590ce16	fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. On ramfs and other simple_rename() users IN_DELETE_SELF is not generated for victim of overwriting rename() if it is a directory. Works on most of the local filesystems and really trivial to fix... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-22 19:42:11 -04:00
Matthew Garrett	dee28e72b6	pstore: Allow the user to explicitly choose a backend pstore only allows one backend to be registered at present, but the system may provide several. Add a parameter to allow the user to choose which backend will be used rather than just relying on load order. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:39 -07:00
Matthew Garrett	b94fdd077e	pstore: Make "part" unsigned We'll never have a negative part, so just make this an unsigned int. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:29 -07:00
Matthew Garrett	56280682ce	pstore: Add extra context for writes and erases EFI only provides small amounts of individual storage, and conventionally puts metadata in the storage variable name. Rather than add a metadata header to the (already limited) variable storage, it's easier for us to modify pstore to pass all the information we need to construct a unique variable name to the appropriate functions. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:20 -07:00
Matthew Garrett	638c1fd303	pstore: Extend API for more flexibility in new backends Some pstore implementations may not have a static context, so extend the API to pass the pstore_info struct to all calls and allow for a context pointer. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-07-22 16:14:06 -07:00
Linus Torvalds	8209f53d79	Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits) ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction connector: add an event for monitoring process tracers ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task() ptrace_init_task: initialize child->jobctl explicitly has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/ ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/ ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented() ptrace: ptrace_reparented() should check same_thread_group() redefine thread_group_leader() as exit_signal >= 0 do not change dead_task->exit_signal kill task_detached() reparent_leader: check EXIT_DEAD instead of task_detached() make do_notify_parent() __must_check, update the callers __ptrace_detach: avoid task_detached(), check do_notify_parent() kill tracehook_notify_death() make do_notify_parent() return bool ptrace: s/tracehook_tracer_task()/ptrace_parent()/ ...	2011-07-22 15:06:50 -07:00
Linus Torvalds	c1f792a5bf	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits) xfs: add size update tracepoint to IO completion xfs: convert AIL cursors to use struct list_head xfs: remove confusing ail cursor wrapper xfs: use a cursor for bulk AIL insertion xfs: failure mapping nfs fh to inode should return ESTALE xfs: Remove the second parameter to xfs_sb_count() xfs: remove the dead XFS_DABUF_DEBUG code xfs: remove leftovers of the old btree tracing code xfs: remove the dead QUOTADEBUG code xfs: remove the unused xfs_buf_delwri_sort function xfs: remove wrappers around b_iodone xfs: remove wrappers around b_fspriv xfs: add a proper transaction pointer to struct xfs_buf xfs: factor out xfs_da_grow_inode_int xfs: factor out xfs_dir2_leaf_find_stale xfs: cleanup struct xfs_dir2_free xfs: reshuffle dir2 headers xfs: start periodic workers later Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc" xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact() ...	2011-07-22 13:16:33 -07:00
Linus Torvalds	6aaf4404ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: don't limit active work items dlm: use workqueue for callbacks dlm: remove deadlock debug print dlm: improve rsb searches dlm: keep lkbs in idr dlm: fix kmalloc args dlm: don't do pointless NULL check, use kzalloc and fix order of arguments dlm: dump address of unknown node dlm: use vmalloc for hash tables dlm: show addresses in configfs	2011-07-22 13:16:07 -07:00
Linus Torvalds	ba1f9db908	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: ensure bio requests are not smaller than the hardware sectors hfsplus: Add additional range check to handle on-disk corruptions hfsplus: Add error propagation for hfsplus_ext_write_extent_locked hfsplus: add error checking for hfs_find_init() hfsplus: lift the 2TB size limit hfsplus: fix overflow in hfsplus_read_wrapper hfsplus: fix overflow in hfsplus_get_block hfsplus: assignments inside `if' condition clean-up	2011-07-22 13:12:17 -07:00
Linus Torvalds	49302baa64	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: combine duplicated block freeing routines GFS2: Add S_NOSEC support GFS2: Automatically adjust glock min hold time GFS2: Cache dir hash table in a contiguous buffer	2011-07-22 13:10:41 -07:00
Linus Torvalds	59a7ac1211	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: (32 commits) MAINTAINERS: change e-mail of Adrian Hunter UBIFS: fix master node recovery UBIFS: improve power cut emulation testing UBIFS: rename recovery testing variables UBIFS: remove custom list of superblocks UBIFS: stop re-defining UBI operations UBIFS: switch to I/O helpers UBIFS: switch to ubifs_leb_write UBIFS: switch to ubifs_leb_read UBIFS: introduce more I/O helpers UBIFS: always print stacktrace when switching to R/O mode UBIFS: remove unused and unneeded debugging function UBIFS: add global debugfs knobs UBIFS: introduce debugfs helpers UBIFS: re-arrange debugging code a bit UBIFS: be more informative in failure mode UBIFS: switch self-check knobs to debugfs UBIFS: lessen amount of debugging check types UBIFS: introduce helper functions for debugging checks and tests UBIFS: amend debugging inode size check function prototype ...	2011-07-22 13:09:35 -07:00
Wang Sheng-Hui	03b5bb3429	ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get In ext2_xattr_get(), the code will acquire xattr_sem first, later checks the length of xattr name_len > 255. It's unnecessarily time consuming and also ext2_xattr_set() checks the length before other checks. So move the check before acquiring xattr_sem to make these two functions consistent. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-07-22 19:41:16 +02:00
Jean Delvare	df2e301fee	fs: Merge split strings No idea why these were split in the first place... Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-07-22 16:47:15 +02:00
Seth Forshee	6596528e39	hfsplus: ensure bio requests are not smaller than the hardware sectors Currently all bio requests are 512 bytes, which may fail for media whose physical sector size is larger than this. Ensure these requests are not smaller than the block device logical block size. BugLink: http://bugs.launchpad.net/bugs/734883 Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-22 16:37:44 +02:00
Naohiro Aota	aac4e4198e	hfsplus: Add additional range check to handle on-disk corruptions 'recoff' is read from disk and used for an argument to memcpy, so if the value read from disk is larger than the page size, it results in a "general protection fault". This patch adds an additional range check for the value, so that disk fuzz won't cause such fault. Signed-off-by: Naohiro Aota <naota@elisp.net> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-22 16:36:56 +02:00
Oleg Nesterov	eac1b5e57d	ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever Test-case: void *tfunc(void *arg) { execvp("true", NULL); return NULL; } int main(void) { int pid; if (fork()) { pthread_t t; kill(getpid(), SIGSTOP); pthread_create(&t, NULL, tfunc, NULL); for (;;) pause(); } pid = getppid(); assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0); while (wait(NULL) > 0) ptrace(PTRACE_CONT, pid, 0,0); return 0; } It is racy, exit_notify() does __wake_up_parent() too. But in the likely case it triggers the problem: de_thread() does release_task() and the old leader goes away without the notification, the tracer sleeps in do_wait() without children/tracees. Change de_thread() to do __wake_up_parent(traced_leader->parent). Since it is already EXIT_DEAD we can do this without ptrace_unlink(), EXIT_DEAD threads do not exist from do_wait's pov. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2011-07-22 15:10:49 +02:00
Phillip Lougher	cc6d349714	Squashfs: Make ZLIB compression support optional Squashfs now supports XZ and LZO compression in addition to ZLIB. As such it no longer makes sense to always include ZLIB support. In particular embedded systems may only use LZO or XZ compression, and the ability to exclude ZLIB support will reduce kernel size. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>	2011-07-22 03:01:28 +01:00
Linus Torvalds	2bafc7a275	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: CIFS: Fix wrong length in cifs_iovec_read	2011-07-21 14:28:01 -07:00
Linus Torvalds	b91da88fed	vfs: drop conditional inode prefetch in __do_lookup_rcu It seems to hurt performance in real life. Yes, the inode will be used later, but the conditional doesn't seem to predict all that well (negative dentries are not uncommon) and it looks like the cost of prefetching is simply higher than depending on the cache doing the right thing. As usual. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 11:01:42 -07:00
Jan Beulich	b307d4655a	FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop The compiler, at least for ix86 and m68k, validly warns that the comparison: next <= (loff_t)-1 is always true (and it's always true also for x86-64 and probably all other arches - as long as pgoff_t isn't wider than loff_t). The intention appears to be to avoid wrapping of "next", so rather than eliminating the pointless comparison, fix the loop to indeed get exited when "next" would otherwise wrap. On m68k the following warning is observed: fs/fscache/page.c: In function '__fscache_uncache_all_inode_pages': fs/fscache/page.c:979: warning: comparison is always false due to limited range of data type Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-21 10:59:16 -07:00
Pavel Shilovsky	2cebaa58b7	CIFS: Fix wrong length in cifs_iovec_read Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-21 00:48:05 +00:00
Al Viro	86c98e8cdb	Remove dead code in dget_parent() ->d_parent is never NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:04 -04:00
David Howells	e4b9f00581	AFS: Fix silly characters in a comment Fix silly characters in a comment in AFS code (some weird characters replaced the word 'flag' some point way back). Reported-by: viro@ZenIV.linux.org.uk Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:03 -04:00
Al Viro	4513d899c4	switch d_add_ci() to d_splice_alias() in "found negative" case as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:02 -04:00
Al Viro	6c673ab393	simplify gfs2_lookup() d_splice_alias() will DTRT when given NULL or ERR_PTR Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:02 -04:00
Al Viro	79ac5a46c5	jfs_lookup(): don't bother with . or .. they'll never be passed to ->lookup() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:01 -04:00
Al Viro	10d9f309d8	get rid of useless dget_parent() in btrfs rename() and link() ->d_parent is locked and stable there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:00 -04:00
Al Viro	2fbe8c8ad1	get rid of useless dget_parent() in fs/btrfs/ioctl.c both callers there have dentry->d_parent stabilized by the fact that their caller had obtained dentry from lookup_one_len() and had not dropped ->i_mutex on parent since then. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:48:00 -04:00
Josef Bacik	02c24a8218	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:59 -04:00
Josef Bacik	06222e491e	fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases we just return -EINVAL, in others we do the normal generic thing, and in others we're simply making sure that the proper due diligence is done. For example in NFS/CIFS we need to make sure the file size is updated properly for the SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself that is all we have to do. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:58 -04:00
Josef Bacik	c334b1138b	Ext4: handle SEEK_HOLE/SEEK_DATA generically Since Ext4 has its own lseek we need to make sure it handles SEEK_HOLE/SEEK_DATA. For now just do the same thing that is done in the generic case, somebody else can come along and make it do fancy things later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:57 -04:00
Josef Bacik	b26751575a	Btrfs: implement our own ->llseek In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek. Basically for the normal SEEK_*'s we will just defer to the generic helper, and for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest hole or data. Currently this helper doesn't check for delalloc bytes for prealloc space, so for now treat prealloc as data until that is fixed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:56 -04:00
Josef Bacik	982d816581	fs: add SEEK_HOLE and SEEK_DATA flags This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out using fiemap in things like cp cause more problems than it solves, so lets try and give userspace an interface that doesn't suck. We need to match solaris here, and the definitions are o If /whence/ is SEEK_HOLE, the offset of the start of the next hole greater than or equal to the supplied offset is returned. The definition of a hole is provided near the end of the DESCRIPTION. o If /whence/ is SEEK_DATA, the file pointer is set to the start of the next non-hole file region greater than or equal to the supplied offset. So in the generic case the entire file is data and there is a virtual hole at the end. That means we will just return i_size for SEEK_HOLE and will return the same offset for SEEK_DATA. This is how Solaris does it so we have to do it the same way. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:56 -04:00
Christoph Hellwig	b4d5b10fb2	reiserfs: make reiserfs default to barrier=flush Change the default reiserfs mount option to barrier=flush. Based on a patch from Jeff Mahoney in the SuSE tree. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:55 -04:00
Christoph Hellwig	00eacd66cd	ext3: make ext3 mount default to barrier=1 This patch turns on barriers by default for ext3. mount -o barrier=0 will turn them off. Based on a patch from Chris Mason in the SuSE tree. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Ted Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:54 -04:00
Al Viro	b85fd6bdc9	don't open-code parent_ino() in assorted ->readdir() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:54 -04:00
Al Viro	2def9e4ec7	minix_getattr(): don't bother with ->d_parent we can find superblock easier, TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:53 -04:00
Al Viro	ee60498f3e	coda_venus_readdir(): use offsetof() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:52 -04:00
Kay Sievers	f15146380d	fs: seq_file - add event counter to simplify poll() support Moving the event counter into the dynamically allocated 'struct seq_file' allows poll() support without the need to allocate its own tracking structure. All current users are switched over to use the new counter. Requested-by: Andrew Morton akpm@linux-foundation.org Acked-by: NeilBrown <neilb@suse.de> Tested-by: Lucas De Marchi lucas.demarchi@profusion.mobi Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:50 -04:00
Christoph Hellwig	72c5052ddc	fs: move inode_dio_done to the end_io handler For filesystems that delay their end_io processing we should keep our i_dio_count until the processing is done. Enable this by moving the inode_dio_done call to the end_io handler if one exists. Note that the actual move to the workqueue for ext4 and XFS is not done in this patch yet, but left to the filesystem maintainers. At least for XFS it's not needed yet either as XFS has an internal equivalent to i_dio_count. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:50 -04:00
Christoph Hellwig	aacfc19c62	fs: simplify the blockdev_direct_IO prototype Simple filesystems always pass inode->i_sb_bdev as the block device argument, and never need an end_io handler. Let's simplify things for them and for my grepping activity by dropping these arguments. The only thing not falling into that scheme is ext4, which passes an end_io handler without needing special flags (yet), but given how messy the direct I/O code there is use of __blockdev_direct_IO in one instead of two out of three cases isn't going to make a large difference anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:49 -04:00
Christoph Hellwig	df2d6f2658	fs: always maintain i_dio_count Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING. This allows these filesystems to also protect truncate against direct I/O requests by using common code. Right now the only non-DIO_LOCKING filesystem that appears to do so is XFS, which uses an opencoded variant of the i_dio_count scheme. Behaviour doesn't change for filesystems never calling inode_dio_wait. For ext4 behaviour changes when using the dioread_nonlock option, which previously was missing any protection between truncate and direct I/O reads. For ocfs2 the handcrafted i_dio_count manipulations are replaced with the common code now enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:48 -04:00
Christoph Hellwig	562c72aa57	fs: move inode_dio_wait calls into ->setattr Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio references from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:47 -04:00
Christoph Hellwig	bd5fe6c5eb	fs: kill i_alloc_sem i_alloc_sem is a rather special rw_semaphore. It's the last one that may be released by a non-owner, and its write side is always mirrored by real exclusion. Its intended use is to wait for all pending direct I/O requests to finish before starting a truncate. Replace it with a hand-grown construct: - exclusion for truncates is already guaranteed by i_mutex, so it can simply fall away - the reader side is replaced by an i_dio_count member in struct inode that counts the number of pending direct I/O requests. Truncate can't proceed as long as it's non-zero - when i_dio_count reaches zero we wake up a pending truncate using wake_up_bit on a new bit in i_flags - new references to i_dio_count can't appear while we are waiting for it to read zero because the direct I/O count always needs i_mutex (or an equivalent like XFS's i_iolock) for starting a new operation. This scheme is much simpler, and saves the space of a spinlock_t and a struct list_head in struct inode (typically 160 bits on a non-debug 64-bit system). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:46 -04:00
Christoph Hellwig	f9b5570d7f	fs: simplify handling of zero sized reads in __blockdev_direct_IO Reject zero sized reads as soon as we know our I/O length, and don't bother with locks or allocations that might have to be cleaned up otherwise. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:45 -04:00
Jan Kara	9ea7df534e	ext4: Rewrite ext4_page_mkwrite() to use generic helpers Rewrite ext4_page_mkwrite() to use __block_page_mkwrite() helper. This removes the need of using i_alloc_sem to avoid races with truncate which seems to be the wrong locking order according to lock ordering documented in mm/rmap.c. Also calling ext4_da_write_begin() as used by the old code seems to be problematic because we can decide to flush delay-allocated blocks which will acquire s_umount semaphore - again creating unpleasant lock dependency if not directly a deadlock. Also add a check for frozen filesystem so that we don't busyloop in page fault when the filesystem is frozen. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:45 -04:00
Christoph Hellwig	5826869158	fat: remove i_alloc_sem abuse Add a new rw_semaphore to protect bmap against truncate. Previous i_alloc_sem was abused for this, but it's going away in this series. Note that we can't simply use i_mutex, given that the swapon code calls ->bmap under it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:44 -04:00
Tobias Klauser	8c5dc70aae	VFS: Fixup kerneldoc for generic_permission() The flags parameter went away in d749519b444db985e40b897f73ce1898b11f997e Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:43 -04:00
Dave Chinner	8daaa83145	xfs: make use of new shrinker callout for the inode cache Convert the inode reclaim shrinker to use the new per-sb shrinker operations. This allows much bigger reclaim batches to be used, and allows the XFS inode cache to be shrunk in proportion with the VFS dentry and inode caches. This avoids the problem of the VFS caches being shrunk significantly before the XFS inode cache is shrunk resulting in imbalances in the caches during reclaim. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:42 -04:00
Dave Chinner	8ab47664d5	vfs: increase shrinker batch size Now that the per-sb shrinker is responsible for shrinking 2 or more caches, increase the batch size to keep economies of scale for shrinking each cache. Increase the shrinker batch size to 1024 objects. To allow for a large increase in batch size, add a conditional reschedule to prune_icache_sb() so that we don't hold the LRU spin lock for too long. This mirrors the behaviour of the __shrink_dcache_sb(), and allows us to increase the batch size without needing to worry about problems caused by long lock hold times. To ensure that filesystems using the per-sb shrinker callouts don't cause problems, document that the object freeing method must reschedule appropriately inside loops. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:41 -04:00
Dave Chinner	0e1fdafd93	superblock: add filesystem shrinker operations Now we have a per-superblock shrinker implementation, we can add a filesystem specific callout to it to allow filesystem internal caches to be shrunk by the superblock shrinker. Rather than perpetuate the multipurpose shrinker callback API (i.e. nr_to_scan == 0 meaning "tell me how many objects are freeable in the cache"), two operations will be added. The first will return the number of objects that are freeable, the second is the actual shrinker call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:41 -04:00
Dave Chinner	4f8c19fdf3	inode: remove iprune_sem Now that we have per-sb shrinkers with a lifecycle that is a subset of the superblock lifecycle and can reliably detect a filesystem being unmounted, there is no longer any race condition for the iprune_sem to protect against. Hence we can remove it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:40 -04:00
Dave Chinner	b0d40c92ad	superblock: introduce per-sb cache shrinker infrastructure With context based shrinkers, we can implement a per-superblock shrinker that shrinks the caches attached to the superblock. We currently have global shrinkers for the inode and dentry caches that split up into per-superblock operations via a coarse proportioning method that does not batch very well. The global shrinkers also have a dependency - dentries pin inodes - so we have to be very careful about how we register the global shrinkers so that the implicit call order is always correct. With a per-sb shrinker callout, we can encode this dependency directly into the per-sb shrinker, hence avoiding the need for strictly ordering shrinker registrations. We also have no need for any proportioning code for the shrinker subsystem already provides this functionality across all shrinkers. Allowing the shrinker to operate on a single superblock at a time means that we do less superblock list traversals and locking and reclaim should batch more effectively. This should result in less CPU overhead for reclaim and potentially faster reclaim of items from each filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:10 -04:00
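
The generic SEEK_HOLE/SEEK_DATA behaviour described in commit 982d816581 (the whole file is data, with one virtual hole at the end) can be modelled in a few lines. This is only an illustrative sketch, not kernel code: the function name, the string constants, and the range check on the offset are assumptions made for the example, and the commit message does not say what happens for offsets at or beyond the file size.

```python
# Illustrative model of the generic fallback described in commit 982d816581.
# All names here are hypothetical; they are not kernel identifiers.
SEEK_DATA = "SEEK_DATA"
SEEK_HOLE = "SEEK_HOLE"


def generic_seek(i_size: int, offset: int, whence: str) -> int:
    """Model a file that is entirely data with a virtual hole at end of file."""
    # Assumption of this sketch: only offsets inside the file are modelled.
    if not 0 <= offset < i_size:
        raise ValueError("offset outside the file is not modelled here")
    if whence == SEEK_DATA:
        # Every byte is data, so the next data region starts at the offset itself.
        return offset
    if whence == SEEK_HOLE:
        # The only hole is the virtual one at the end of the file.
        return i_size
    raise ValueError("unsupported whence value")


if __name__ == "__main__":
    print(generic_seek(4096, 100, SEEK_DATA))
    print(generic_seek(4096, 100, SEEK_HOLE))
```

A filesystem with its own llseek, such as the Btrfs change in b26751575a, would instead consult its extent information to report real holes.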

24030 Commits