linux

Commit Graph

Author	SHA1	Message	Date
Peter Zijlstra	d08b3851da	[PATCH] mm: tracking shared dirty pages Tracking of dirty pages in shared writeable mmap()s. The idea is simple: write protect clean shared writeable pages, catch the write-fault, make writeable and set dirty. On page write-back clean all the PTE dirty bits and write protect them once again. The implementation is a tad harder, mainly because the default backing_dev_info capabilities were too loosely maintained. Hence it is not enough to test the backing_dev_info for cap_account_dirty. The current heuristic is as follows, a VMA is eligible when: - it is shared writeable (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED) - it is not a 'special' mapping (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0 - the backing_dev_info is cap_account_dirty mapping_cap_account_dirty(vma->vm_file->f_mapping) - f_op->mmap() didn't change the default page protection Pages from remap_pfn_range() are explicitly excluded because their COW semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and because they don't have a backing store anyway. mprotect() is taught about the new behaviour as well. However it overrides the last condition. Cleaning the pages on write-back is done with page_mkclean(), a new rmap call. It can be called on any page, but is currently only implemented for mapped pages, if the page is found to be of a VMA that accounts dirty pages it will also wrprotect the PTE. Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from under ->private_lock. This seems to be safe, since ->private_lock is used to serialize access to the buffers, not the page itself. This is needed because clear_page_dirty() will call into page_mkclean() and would thereby violate locking order. [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Jan Kara	3998b9301d	[PATCH] jbd: fix commit of ordered data buffers Original commit code assumes that when a buffer on BJ_SyncData list is locked, it is being written to disk. But this is not true and hence it can lead to a potential data loss on crash. Also the code didn't account for the fact that journal_dirty_data() can steal buffers from committing transaction and hence could write buffers that no longer belong to the committing transaction. Finally it could possibly happen that we tried writing out one buffer several times. The patch below tries to solve these problems by a complete rewrite of the data commit code. We go through buffers on t_sync_datalist, lock buffers needing write out and store them in an array. Buffers are also immediately refiled to BJ_Locked list or unfiled (if the write out is completed). When the array is full or we have to block on buffer lock, we submit all accumulated buffers for IO. [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-26 08:48:44 -07:00
Ian Kent	c0ba7e5147	[PATCH] autofs4: zero timeout prevents shutdown If the timeout of an autofs mount is set to zero then umounts are disabled. This works fine, however the kernel module checks the expire timeout and goes no further if it is zero. This is not the right thing to do at shutdown as the module is passed an option to expire mounts regardless of their timeout setting. This patch allows autofs to honor the force expire option. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-25 17:38:35 -07:00
Mark Fasheh	0d5dc6c2dd	ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback With this, we don't need to pass an additional struct with function pointer. Now that the callbacks are fully used, comment the remaining API. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	b5e500e23e	ocfs2: Remove ->unblock lockres operation Have ocfs2_process_blocked_lock() call ocfs2_generic_unblock_lock(), which gets to be ocfs2_unblock_lock() now that it's the only possible unblock function. Remove the ->unblock() callback from the structure, and all lock type specific unblock functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	cc567d89b3	ocfs2: move downconvert worker to lockres ops This way lock types don't have to manually pass it to ocfs2_generic_unblock_lock(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	08280f11de	ocfs2: Remove unused dlmglue functions The meta data unblocking code no longer needs ocfs2_do_unblock_meta() or ocfs2_can_downconvert_meta_lock(), so remove them. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:48 -07:00
Mark Fasheh	810d5aeba1	ocfs2: Have the metadata lock use generic dlmglue functions Fill in the ->check_downconvert and ->set_lvb callbacks with meta data specific operations and switch ocfs2_unblock_meta() to call ocfs2_generic_unblock_lock() Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	5ef0d4ea08	ocfs2: Add ->set_lvb callback in dlmglue This allows a lock type to set the value block before downconvert. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	16d5b9567a	ocfs2: Add ->check_downconvert callback in dlmglue This will allow lock types to force a requeue of a lock downconvert. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	f7fbfdd1fc	ocfs2: Check for refreshing locks in generic unblock function Tidy up the exit path a bit too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	b80fc012e0	ocfs2: don't unconditionally pass LVB flags Allow a lock type to specify whether it makes use of the LVB. The only type which does this right now is the meta data lock. This should save us some space on network messages since they won't have to needlessly transmit value blocks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:47 -07:00
Mark Fasheh	aa2623ad80	ocfs2: combine inode and generic blocking AST functions There is extremely little difference between the two now. We can remove the callback from ocfs2_lock_res_ops as well. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	54a7e7552e	ocfs2: Add ->get_osb() dlmglue locking operation Will be used to find the ocfs2_super structure from a given lockres. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	2a45f2d13e	ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops This was always defined to the same function in all locks, so clean things up by removing and passing ocfs2_unlock_ast() directly to the DLM. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	e92d57df27	ocfs2: combine inode and generic AST functions There is extremely little difference between the two now. We can remove the callback from ocfs2_lock_res_ops as well. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	f625c9793b	ocfs2: Clean up lock resource refresh flags Use of the refresh mechanism is lock-type wide, so move knowledge of that to the ocfs2_lock_res_ops structure. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	24c19ef404	ocfs2: Remove i_generation from inode lock names OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:46 -07:00
Mark Fasheh	f9e2d82e63	ocfs2: Encode i_generation in the meta data lvb When i_generation is removed from the lockname, this will help us determine whether a meta data lvb has information that is in sync with the local struct inode. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	4d3b83f736	ocfs2: Free up some space in the lvb lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to free up some space. This should be backwards compatible until we use one of the fields, in which case we'd bump the lvb version anyway. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	0027dd5bc2	ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock() We can't use LKM_LOCAL for new dentry locks because an unlink and subsequent re-create of a name/inode pair may result in the lock still being mastered somewhere in the cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	1ba9da2ffa	ocfs2: manually d_move() during ocfs2_rename() Make use of FS_RENAME_DOES_D_MOVE to avoid a race condition that can occur during ->rename() if we d_move() outside of the parent directory cluster locks, and another node discovers the new name (created during the rename) and unlinks it. d_move() will unconditionally rehash a dentry - which will leave stale data in the system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:45 -07:00
Mark Fasheh	349457ccf2	[PATCH] Allow file systems to manually d_move() inside of ->rename() Some file systems want to manually d_move() the dentries involved in a rename. We can do this by making use of the FS_ODD_RENAME flag if we just have nfs_rename() unconditionally do the d_move(). While there, we rename the flag to be more descriptive. OCFS2 uses this to protect that part of the rename operation with a cluster lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-09-24 13:50:45 -07:00
Mark Fasheh	1390334b4c	ocfs2: Remove the dentry vote This is unused now. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	379dfe9d0d	ocfs2: Hook rest of the file system into dentry locking API Actually replace the vote calls with the new dentry operations. Make any necessary adjustments to get the scheme to work. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	80c05846f6	ocfs2: Add dentry tracking API Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the higher level API and the dentry manipulation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:43 -07:00
Mark Fasheh	d680efe9d8	ocfs2: Add new cluster lock type Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the cluster lock type which OCFS2 can attach to dentries. A small number of fs/ocfs2/dcache.c functions are stubbed out so that this change can compile. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	f0681062b8	ocfs2: Update dlmglue for new dlmlock() API File system lock names are very regular right now, so we really only need to pass an extra parameter to dlmlock(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	ea5b3a187e	ocfs2: Update dlmfs for new dlmlock() API We just need to add a namelen field to the user_lock_res structure, and update a few debug prints. Instead of updating all debug prints, I took the opportunity to remove a few that are likely unnecessary these days. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	3384f3df5e	ocfs2: Allow binary names in the DLM The OCFS2 DLM uses strlen() to determine lock name length, which excludes the possibility of putting binary values in the name string. Fix this by requiring that string length be passed in as a parameter. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:42 -07:00
Mark Fasheh	e2c73698af	ocfs2: Silence dlm error print An AST can be delivered via the network after a lock has been removed, so no need to print an error when we see that. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2006-09-24 13:50:41 -07:00
Jeff Garzik	e18fa700c9	Move several *_SUPER_MAGIC symbols to include/linux/magic.h. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-09-24 11:13:19 -04:00
Chuck Lever	026ed5c918	NFS: unmark NFS direct I/O as experimental Remove the EXPERIMENTAL flag from the NFS_DIRECTIO option. Test plan: Unset the EXPERIMENTAL kernel build option and check to see that the NFS direct I/O option is still available. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:06 -04:00
Chuck Lever	f551e44ff1	NFS: add comments clarifying the use of nfs_post_op_update() Comments-only change to clarify a detail of the NFS protocol and how it is implemented in Linux. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:05 -04:00
Josef 'Jeff' Sipek	aec5e17528	NFS: Use SEEK_END instead of hardcoded value Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:04 -04:00
Trond Myklebust	51b6ded4d9	NFSv4: When mounting with a port=0 argument, substitute port=2049 RFC3530 states that the registered port 2049 for the NFS protocol should be the default configuration in order to allow clients not to use the RPC binding protocols. If the mount program sends us a port=0, we therefore substitute port=2049. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:04 -04:00
Trond Myklebust	2066fe89b4	NFSv4: Poll more aggressively when handling NFS4ERR_DELAY Change the initial retry delay from 1s to 0.1s (and then back off exponentially). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:04 -04:00
Trond Myklebust	c514983d8d	NFSv4: Handle the condition NFS4ERR_FILE_OPEN Retry a few times before we give up: the error is usually due to ordering issues with asynchronous RPC calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:03 -04:00
Trond Myklebust	6b30954ebb	NFSv4: Retry lease recovery if it failed during a synchronous operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:03 -04:00
Trond Myklebust	97db8f4179	NFS: Don't invalidate the symlink we just stuffed into the cache And slight optimisation of nfs_end_data_update(): directories never have delegations anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:03 -04:00
Trond Myklebust	5f004cf2aa	NFS: Make read() return an ESTALE if the file has been deleted Currently, a read() request will return EIO even if the file has been deleted on the server, simply because that is what the VM will return if the call to readpage() fails to update the page. Ensure that readpage() marks the inode as stale if it receives an ESTALE. Then return that error to userland. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:02 -04:00
J. Bruce Fields	2dec51466a	NFSv4: It's perfectly legal for clp to be NULL here.... Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:02 -04:00
Trond Myklebust	fd6840714d	NFS: nfs_lookup - don't hash dentry when optimising away the lookup If the open intents tell us that a given lookup is going to result in an exclusive create, we currently optimize away the lookup call itself. The reason is that the lookup would not be atomic with the create RPC call, so why do it in the first place? A problem occurs, however, if the VFS aborts the exclusive create operation after the lookup, but before the call to create the file/directory: in this case we will end up with a hashed negative dentry in the dcache that has never been looked up. Fix this by only actually hashing the dentry once the create operation has been successfully completed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:25:01 -04:00
andros@citi.umich.edu	297de4f656	Fix a referral error Oops Fix an oops when the referral server is not responding. Check the error return from nfs4_set_client() in nfs4_create_referral_server. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:56 -04:00
Chuck Lever	058ad9cbf1	NFS: NFS_ROOT should use the new rpc_create API Teach NFS_ROOT to use the new rpc_create API instead of the old two-call API for creating an RPC transport. Test plan: Compile the kernel with the NFS client built-in, and set CONFIG_NFS_ROOT. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:55 -04:00
David Howells	6daabf1b04	NFS: Fix up compiler warnings on 64-bit platforms in client.c Fix up warnings from compiling on ppc64. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:55 -04:00
Trond Myklebust	158998b6fe	SUNRPC: Make rpc_mkpipe() take the parent dentry as an argument Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Trond Myklebust	5dd3177ae5	NFSv4: Fix a use-after-free issue with the nfs server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Trond Myklebust	275a082fe9	Add a real API for dealing with blk_congestion_wait() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:54 -04:00
Chuck Lever	94a6d75320	NFS: Use cached page as buffer for NFS symlink requests Now that we have a copy of the symlink path in the page cache, we can pass a struct page down to the XDR routines instead of a string buffer. Test plan: Connectathon, all NFS versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2006-09-22 23:24:53 -04:00
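
Note on d08b3851da: the VMA eligibility heuristic listed in that commit message can be read as a single boolean test. The sketch below is a Python model of the four conditions, not the kernel code. The flag bit values and the helper name vma_accounts_dirty are hypothetical, and the two boolean parameters stand in for mapping_cap_account_dirty() and for "f_op->mmap() didn't change the default page protection".

```python
# Illustrative flag bits only; these are NOT the kernel's actual values.
VM_WRITE = 0x1
VM_SHARED = 0x2
VM_PFNMAP = 0x4
VM_INSERTPAGE = 0x8


def vma_accounts_dirty(vm_flags, cap_account_dirty, default_page_prot):
    """Hypothetical helper: True when all four conditions from the commit hold."""
    # 1. The mapping is both shared and writeable.
    shared_writeable = (vm_flags & (VM_WRITE | VM_SHARED)) == (VM_WRITE | VM_SHARED)
    # 2. The mapping is not a 'special' one.
    not_special = (vm_flags & (VM_PFNMAP | VM_INSERTPAGE)) == 0
    # 3. The backing device accounts dirty pages (stand-in boolean).
    # 4. mmap() left the default page protection alone (stand-in boolean).
    return bool(shared_writeable and not_special
                and cap_account_dirty and default_page_prot)
```

Per the commit message, mprotect() overrides the last condition; in this model that would correspond to passing default_page_prot=True unconditionally.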

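Note on 2066fe89b4: the commit message only says the initial NFS4ERR_DELAY retry delay drops from 1s to 0.1s and then backs off exponentially. The sketch below shows what such a schedule could look like. The doubling factor, the 15s cap, and the function name retry_delays_ms are assumptions for illustration and are not taken from the commit.

```python
def retry_delays_ms(attempts, initial_ms=100, factor=2, cap_ms=15000):
    """Return a list of retry delays in milliseconds.

    initial_ms=100 reflects the 0.1s starting delay named in the commit;
    factor and cap_ms are assumed values, not the kernel's.
    """
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(delay)
        # Grow the delay geometrically, but never beyond the assumed cap.
        delay = min(delay * factor, cap_ms)
    return delays
```

Under these assumptions the first second of waiting covers four retries instead of one, which is the "poll more aggressively" effect the commit subject describes.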

3356 Commits