Commit Graph

1470 Commits

Author SHA1 Message Date
J. Bruce Fields 64a318ee2a NLM: Further cancel fixes
If the server receives an NLM cancel call and finds no waiting lock to
 cancel, then chances are the lock has already been applied, and the client
 just hadn't yet processed the NLM granted callback before it sent the
 cancel.

 The Open Group text, for example, perimts a server to return either success
 (LCK_GRANTED) or failure (LCK_DENIED) in this case.  But returning an error
 seems more helpful; the client may be able to use it to recognize that a
 race has occurred and to recover from the race.

 So, modify the relevant functions to return an error in this case.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:54 -05:00
J. Bruce Fields 2c5acd2e1a NLM: clean up nlmsvc_delete_block
The fl_next check here is superfluous (and possibly a layering violation).

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:54 -05:00
J. Bruce Fields 5996a298da NLM: don't unlock on cancel requests
Currently when lockd gets an NLM_CANCEL request, it also does an unlock for
 the same range.  This is incorrect.

 The Open Group documentation says that "This procedure cancels an
 *outstanding* blocked lock request."  (Emphasis mine.)

 Also, consider a client that holds a lock on the first byte of a file, and
 requests a lock on the entire file.  If the client cancels that request
 (perhaps because the requesting process is signalled), the server shouldn't
 apply perform an unlock on the entire file, since that will also remove the
 previous lock that the client was already granted.

 Or consider a lock request that actually *downgraded* an exclusive lock to
 a shared lock.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:53 -05:00
J. Bruce Fields f232142cc2 NLM: Clean up nlmsvc_grant_reply locking
Slightly simpler logic here makes it more trivial to verify that the up's
 and down's are balanced here.  Break out an assignment from a conditional
 while we're at it.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:53 -05:00
Trond Myklebust a72b44222d NFSv4: Allow user to set the port used by the NFSv4 callback channel
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:52 -05:00
Trond Myklebust a895b4a198 NFS: Clean up weak cache consistency code
...and ensure that nfs_update_inode() respects wcc

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:52 -05:00
Trond Myklebust fa178f29c0 NFSv4: Ensure DELEGRETURN returns attributes
Upon return of a write delegation, the server will almost always bump the
 change attribute. Ensure that we pick up that change so that we don't
 invalidate our data cache unnecessarily.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:51 -05:00
Trond Myklebust beb2a5ec38 NFSv4: Ensure change attribute returned by GETATTR callback conforms to spec
According to RFC3530 we're supposed to cache the change attribute
 at the time the client receives a write delegation.
 If the inode is clean, a CB_GETATTR callback by the server to the
 client is supposed to return the cached change attribute.
 If, OTOH, the inode is dirty, the client should bump the cached
 change attribute by 1.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:51 -05:00
Trond Myklebust 566dd6064e NFS: Make directIO aware of compound pages...
...and avoid calling set_page_dirty on them

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Trond Myklebust 70b9ecbdb9 NFS: Make stat() return updated mtimes after a write()
The SuS states that a call to write() will cause mtime to be updated on
 the file. In order to satisfy that requirement, we need to flush out
 any cached writes in nfs_getattr().
 Speed things up slightly by not committing the writes.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Trond Myklebust 24174119c7 NFSv4: Ensure that we return the delegation on the target of a rename too.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Chuck Lever 40859d7ee6 NFS: support large reads and writes on the wire
Most NFS server implementations allow up to 64KB reads and writes on the
 wire.  The Solaris NFS server allows up to a megabyte, for instance.

 Now the Linux NFS client supports transfer sizes up to 1MB, too.  This will
 help reduce protocol and context switch overhead on read/write intensive NFS
 workloads, and support larger atomic read and write operations on servers
 that support them.

 Test-plan:
 Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
 Tests with NFS over UDP to verify the maximum RPC payload size cap.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:49 -05:00
Chuck Lever 325cfed9ae NFS: make "inode number mismatch" message more useful
To help NFS users and server developers, make the "inode number mismatch"
 message display more useful information.

 Test-plan:
 None.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:49 -05:00
Chuck Lever dc20f80390 NFS: get rid of useless kernel log message
nfs_statfs() generates a log message when GETATTR returns an error.  This
 is usually a useless message.  Make it a dprintk.

 Test plan:
 None

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:48 -05:00
Chuck Lever 6b59a75460 NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs()
Red Hat found a problem in the error recovery logic in __init_nfs.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:48 -05:00
Chuck Lever ce1a8e6796 NFS: use generic_write_checks() to sanity check direct writes
Replace ad hoc write parameter sanity checking in nfs_file_direct_write()
 with a call to generic_write_checks().  This should make the proper checks
 modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by
 virtue of i_sb->s_maxbytes.

 Test plan:
 Posix compliance testing with both NFSv2 and NFSv3.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust 286d7d6a0c NFSv4: Remove requirement for machine creds for the "setclientid" operation
Use a cred from the nfs4_client->cl_state_owners list.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust b4454fe1a7 NFSv4: Remove requirement for machine creds for the "renew" operation
In RFC3530, the RENEW operation is allowed to use either

 the same principal, RPC security flavour and (if RPCSEC_GSS), the same
  mechanism and service that was used for SETCLIENTID_CONFIRM

 OR

 Any principal, RPC security flavour and service combination that
 currently has an OPEN file on the server.

 Choose the latter since that doesn't require us to keep credentials for
 the same principal for the entire duration of the mount.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust 58d9714a44 NFSv4: Send RENEW requests to the server only when we're holding state
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:46 -05:00
Trond Myklebust 5043e900f5 NFS: Convert instances of kernel_thread() to kthread()
Convert private implementations in NFSv4 state recovery and delegation
 code to use kthreads.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:46 -05:00
Trond Myklebust 433fbe4c88 NFSv4: State recovery cleanup
Use wait_on_bit() when waiting for state recovery to complete.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust 26e976a884 NFSv4: OPEN/LOCK/LOCKU/CLOSE will automatically renew the NFSv4 lease
Cut down on the number of unnecessary RENEW requests on the wire.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust 2bd615797e SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call.
...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to
 interrupt RPC calls too (as per the Solaris implementation).

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust fe650407a8 NFSv4: Make DELEGRETURN an interruptible operation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:44 -05:00
Trond Myklebust a5d16a4d09 NFSv4: Convert LOCK rpc call into an asynchronous RPC call
In order to allow users to interrupt/cancel it.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:44 -05:00
Trond Myklebust 911d1aaf26 NFSv4: locking XDR cleanup
Get rid of some unnecessary intermediate structures

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:44 -05:00
Trond Myklebust 864472e9b8 NFSv4: Make open recovery track O_RDWR, O_RDONLY and O_WRONLY correctly
When recovering from a delegation recall or a network partition, we need
 to replay open(O_RDWR), open(O_RDONLY) and open(O_WRONLY) separately.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:43 -05:00
Trond Myklebust e761692381 NFSv4: Make nfs4_state track O_RDWR, O_RDONLY and O_WRONLY separately
A closer reading of RFC3530 reveals that OPEN_DOWNGRADE must always
 specify a access modes that have been the argument of a previous OPEN
 operation.
 IOW: doing OPEN(O_RDWR) and then OPEN_DOWNGRADE(O_WRONLY) is forbidden
 unless the user called OPEN(O_WRONLY)

 In order to fix that, we really need to track the three possible open
 states separately.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:43 -05:00
Trond Myklebust cdd4e68b5f NFSv4: Make open_confirm() asynchronous too
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:42 -05:00
Trond Myklebust 24ac23ab88 NFSv4: Convert open() into an asynchronous RPC call
OPEN is a stateful operation, so we must ensure that it always
 completes. In order to allow users to interrupt the operation,
 we need to make the RPC call asynchronous, and then wait on
 completion (or cancel).

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:42 -05:00
Trond Myklebust e56e0b78eb NFSv4: Allocate OPEN call RPC arguments using kmalloc()
Cleanup in preparation for making OPEN calls interruptible by the user.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:41 -05:00
Trond Myklebust 06f814a3ad NFSv4: Make locku use the new RPC "wait on completion" interface.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust 44c288732f NFSv4: stateful NFSv4 RPC call interface
The NFSv4 model requires us to complete all RPC calls that might
 establish state on the server whether or not the user wants to
 interrupt it. We may also need to schedule new work (including
 new RPC calls) in order to cancel the new state.

 The asynchronous RPC model will allow us to ensure that RPC calls
 always complete, but in order to allow for "synchronous" RPC, we
 want to add the ability to wait for completion.
 The waits are, of course, interruptible.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust 4ce70ada1f SUNRPC: Further cleanups
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust 963d8fe533 RPC: Clean up RPC task structure
Shrink the RPC task structure. Instead of storing separate pointers
 for task->tk_exit and task->tk_release, put them in a structure.

 Also pass the user data pointer as a parameter instead of passing it via
 task->tk_calldata. This enables us to nest callbacks.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:39 -05:00
Trond Myklebust abd3e641d5 NFS: Work correctly with single-page ->writepage() calls
Ensure that we always initiate flushing of data before we exit
 a single-page ->writepage() call.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:39 -05:00
Linus Torvalds d99cf9d679 Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-block
Manual fixup for merge with Jens' "Suspend support for libata", commit
ID 9b84754866.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 09:01:25 -08:00
Neil Brown 9f708e40fe [PATCH] knfsd: reduce stack consumption
A typical nfsd call trace is
 nfsd -> svc_process -> nfsd_dispatch -> nfsd3_proc_write ->
   nfsd_write ->nfsd_vfs_write -> vfs_writev

These add up to over 300 bytes on the stack.
Looking at each of these, I see that nfsd_write (which includes
 nfsd_vfs_write) contributes 0x8c to stack usage itself!!

It turns out this is because it puts a 'struct iattr' on the stack so
it can kill suid if needed.  The following patch saves about 50 bytes
off the stack in this call path.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
David Shaw a334de2866 [PATCH] knfsd: check error status from vfs_getattr and i_op->fsync
Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring.  This as noticed when exporting directories using fuse.

This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.

There is still a called to vfs_gettattr (related to the ACL code) where the
status is ignored, and called to nfsd_sync_dir don't check return status
either.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
Jan Kara f93ea411b7 [PATCH] jbd: split checkpoint lists
Split the checkpoint list of the transaction into two lists.  In the first
list we keep the buffers that need to be submitted for IO.  In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete.  This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
Miklos Szeredi 39ee059aff [PATCH] fuse: check file type in lookup
Previously invalid types were quietly changed to regular files, but at
revalidation the inode was changed to bad.  This was rather inconsistent
behavior.

Now check if the type is valid on initial lookup, and return -EIO if not.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi 6ad84acab9 [PATCH] fuse: ensure progress in read and write
In direct_io mode, send at least one page per reqest.  Previously it was
possible that reqests with zero data were sent, and hence the read/write
didn't make any progress, resulting in an infinite (though interruptible)
loop.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi 3ec870d524 [PATCH] fuse: make maximum write data configurable
Make the maximum size of write data configurable by the filesystem.  The
previous fixed 4096 limit only worked on architectures where the page size is
less or equal to this.  This change make writing work on other architectures
too, and also lets the filesystem receive bigger write requests in direct_io
mode.

Normal writes which go through the page cache are still limited to a page
sized chunk per request.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi 1d3d752b47 [PATCH] fuse: clean up request size limit checking
Change the way a too large request is handled.  Until now in this case the
device read returned -EINVAL and the operation returned -EIO.

Make it more flexibible by not returning -EINVAL from the read, but restarting
it instead.

Also remove the fixed limit on setxattr data and let the filesystem provide as
large a read buffer as it needs to handle the extended attribute data.

The symbolic link length is already checked by VFS to be less than PATH_MAX,
so the extra check against FUSE_SYMLINK_MAX is not needed.

The check in fuse_create_open() against FUSE_NAME_MAX is not needed, since the
dentry has already been looked up, and hence the name already checked.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi 248d86e87d [PATCH] fuse: fail file operations on bad inode
Make file operations on a bad inode fail.  This just makes things a
bit more consistent.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi 6f9f11806a [PATCH] fuse: add code documentation
Document some not-so-trivial functions.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi 8cbdf1e6f6 [PATCH] fuse: support caching negative dentries
Add support for caching negative dentries.

Up till now, ->d_revalidate() always forced a new lookup on these.  Now let
the lookup method return a zero node ID (not used for anything else) meaning a
negative entry, but with a positive cache timeout.  The old way of signaling
negative entry (replying ENOENT) still works.

Userspace should check the ABI minor version to see whether sending a zero ID
is allowed by the kernel or not.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi de5f120255 [PATCH] fuse: add frsize to statfs reply
Add 'frsize' member to the statfs reply.

I'm not sure if sending f_fsid will ever be needed, but just in case leave
some space at the end of the structure, so less compatibility mess would be
required.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi 45714d6561 [PATCH] fuse: bump interface version
Change interface version to 7.4.

Following changes will need backward compatibility support, so store the minor
version returned by userspace.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi 4633a22e7a [PATCH] fuse: clean up page offset calculation
Use page_offset() instead of doing page offset calculation by hand.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00