Commit Graph

5851 Commits

Author SHA1 Message Date
Trond Myklebust a50f7951a3 NFS: Fix an Oops in the nfs_access_cache_shrinker()
The nfs_access_cache_shrinker may race with nfs_access_zap_cache().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust e2f032e9ef NFS: nfs3_proc_create() should use nfs_post_op_update_inode()
Also get rid of a redundant call to nfs_setattr_update_inode(). The call to
nfs3_proc_setattr() already takes care of that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton aa53ed541a NFS4: on a O_EXCL OPEN make sure SETATTR sets the fields holding the verifier
The Linux NFS4 client simply skips over the bitmask in an O_EXCL open
call and so it doesn't bother to reset any fields that may be holding
the verifier. This patch has us save the first two words of the bitmask
(which is all the current client has #defines for). The client then
later checks this bitmask and turns on the appropriate flags in the
sattr->ia_verify field for the following SETATTR call.

This patch only currently checks to see if the server used the atime
and mtime slots for the verifier (which is what the Linux server uses
for this). I'm not sure of what other fields the server could
reasonably use, but adding checks for others should be trivial.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust fc6ae3cf48 NFS: Re-enable forced umounts
They disappeared some time around 2.6.18.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton 83d93f2229 NFS: Use GFP_HIGHUSER for page allocation in nfs_symlink()
nfs_symlink() allocates a GFP_KERNEL page for the pagecache. Most
pagecache pages are allocated using GFP_HIGHUSER, and there's no reason
not to do that in nfs_symlink() as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust a0356862bc NFS: Fix nfs_reval_fsid()
We don't need to revalidate the fsid on the root directory. It suffices to
revalidate it on the current directory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust b39e625b6e NFSv4: Clean up nfs4_call_async()
Use rpc_run_task() instead of doing it ourselves.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust 4a35bd41af NFSv4: Ensure that nfs4_do_close() doesn't race with umount
nfs4_do_close() does not currently have any way to ensure that the user
won't attempt to unmount the partition while the asynchronous RPC call
is completing. This again may cause Oopses in nfs_update_inode().

Add a vfsmount argument to nfs4_close_state to ensure that the partition
remains mounted while we're closing the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust ad389da79f NFSv4: Ensure asynchronous open() calls always pin the mountpoint
A number of race conditions may currently ensue if the user presses ^C
and then unmounts the partition while an asynchronous open() is in
progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust 539cd03a57 NFSv4: Cleanup: pass the nfs_open_context to open recovery code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust 88be9f990f NFS: Replace vfsmount and dentry in nfs_open_context with struct path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust de05a0cc2a NFS: Minor read optimisation...
Since PG_uptodate may now end up getting set during the call to
nfs_wb_page(), we can avoid putting a read request on the wire in those
situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust 44dd151d5c NFS: Don't mark a written page as uptodate until it is on disk
The write may fail, so we should not mark the page as uptodate until we are
certain that the data has been accepted and written to disk by the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust d9df8d6b38 NFS: Don't fail an O_DIRECT read/write if get_user_pages() returns pages
There is no need to fail the entire O_DIRECT read/write just because
get_user_pages() returned fewer pages than we requested.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Chuck Lever 070ea60214 NFS: Clean ups in fs/nfs/direct.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Pavel Emelianov bcf67e1625 Make common helpers for seq_files that work with list_heads
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.

This makes code about 300 lines smaller:

The first version of this patch made the helper functions static inline
in the seq_file.h header. This patch moves them to the fs/seq_file.c as
Andrew proposed. The vmlinux .text section sizes are as follows:

2.6.22-rc1-mm1:              0x001794d5
with the previous version:   0x00179505
with this patch:             0x00179135

The config file used was make allnoconfig with the "y" inclusion of all
the possible options to make the files modified by the patch compile plus
drivers I have on the test node.

This patch:

Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-10 17:51:13 -07:00
Linus Torvalds 9f9d763216 Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] vmlogrdr function annotation.
  [S390] s390: rename CPU_IDLE to S390_CPU_IDLE
  [S390] cio: Remove prototype for non-existing function cmf_reset().
  [S390] zcrypt: fix request timeout handling
  [S390] system call optimization.
  [S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE
  [S390] Remove volatile from atomic_t
  [S390] Program check in diag 210 under 31 bit
  [S390] Bogomips calculation for 64 bit.
  [S390] smp: Merge smp_count_cpus() and smp_get_save_areas().
  [S390] zcore: Fix __user annotation.
  [S390] fixed cdl-format detection.
  [S390] sclp: Test facility list before executing a service call.
  [S390] sclp: introduce some new interfaces.
  [S390] Fixed comment typo.
  [S390] vmcp cleanup
2007-07-10 14:46:09 -07:00
Linus Torvalds 1b21f458dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits)
  [GFS2] Accept old format NFS filehandles
  [GFS2] Small fixes to logging code
  [DLM] dump more lock values
  [GFS2] Remove i_mode passing from NFS File Handle
  [GFS2] Obtaining no_formal_ino from directory entry
  [GFS2] git-gfs2-nmw-build-fix
  [GFS2] System won't suspend with GFS2 file system mounted
  [GFS2] remounting w/o acl option leaves acls enabled
  [GFS2] inode size inconsistency
  [DLM] Telnet to port 21064 can stop all lockspaces
  [GFS2] Fix gfs2_block_truncate_page err return
  [GFS2] Addendum to the journaled file/unmount patch
  [GFS2] Simplify multiple glock aquisition
  [GFS2] assertion failure after writing to journaled file, umount
  [GFS2] Use zero_user_page() in stuffed_readpage()
  [GFS2] Remove bogus '\0' in rgrp.c
  [GFS2] Journaled file write/unstuff bug
  [DLM] don't require FS flag on all nodes
  [GFS2] Fix deallocation issues
  [GFS2] return conflicts for GETLK
  ...
2007-07-10 13:56:13 -07:00
Linus Torvalds 01370f0603 Merge branch 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block
* 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block:
  pipe: add documentation and comments
  pipe: change the ->pin() operation to ->confirm()
  Remove remnants of sendfile()
  xip sendfile removal
  splice: completely document external interface with kerneldoc
  sendfile: remove bad_sendfile() from bad_file_ops
  shmem: convert to using splice instead of sendfile()
  relay: use splice_to_pipe() instead of open-coding the pipe loop
  pipe: allow passing around of ops private pointer
  splice: divorce the splice structure/function definitions from the pipe header
  splice: relay support
  sendfile: convert nfsd to splice_direct_to_actor()
  sendfile: convert nfs to using splice_read()
  loop: convert to using splice_direct_to_actor() instead of sendfile()
  splice: add void cookie to the actor data
  sendfile: kill generic_file_sendfile()
  sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
  sys_sendfile: switch to using ->splice_read, if available
  vmsplice: add vmsplice-to-user support
  splice: abstract out actor data
2007-07-10 13:51:06 -07:00
Steven Whitehouse 3ebf44902f [GFS2] Accept old format NFS filehandles
On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote:
> > -#define GFS2_LARGE_FH_SIZE 10
> > -
> > -struct gfs2_fh_obj {
> > -   struct gfs2_inum_host this;
> > -   u32 imode;
> > -};
> > +#define GFS2_LARGE_FH_SIZE 8
>
> Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE
> or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older
> gfs version anymore.  Stale filehandles because of a new kernel version
> are a big no-no, so please add back code to handle the old filehandles
> on the decode side.
>

This should fix that problem I think since its only relating to end of
the fh we can just ignore that field in order to accept the older
format.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wendy Cheng <wcheng@redhat.com>
2007-07-10 12:28:27 +01:00
Stefan Haberland bf1a95a225 [S390] fixed cdl-format detection.
CDL formated DASDs are now detected correctly even if no VOL1 label is
on the disk. This prevents possible loss of data.

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-07-10 11:24:44 +02:00
Jens Axboe 0845718daf pipe: add documentation and comments
As per Andrew Mortons request, here's a set of documentation for
the generic pipe_buf_operations hooks, the pipe, and pipe_buffer
structures.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:16 +02:00
Jens Axboe cac36bb06e pipe: change the ->pin() operation to ->confirm()
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe d96e6e7164 Remove remnants of sendfile()
There are now zero users of .sendfile() in the kernel, so kill
it from the file_operations structure and in do_sendfile().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Carsten Otte d054fe3d10 xip sendfile removal
This patch removes xip_file_sendfile, the sendfile implementation for
xip without replacement. Those customers that use xip on s390 are not
using sendfile() as far as we know, and so far s390 is the only platform
this could potentially be used on so far.
Having sendfile is not a popular feature for execute in place file
systems, however we have a working implementation of splice_read() based
on fs/splice.c if anyone asks for it.
At this point in time, it does not seem preferable to merge
splice_read() for xip because it causes extra maintenence effort due to
code duplication and it requires struct page behind the xip memory
segment. We'd like to get rid of that in favor of supporting flash based
embedded platforms (Monta Vista work) soon.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe 932cc6d4f7 splice: completely document external interface with kerneldoc
Also add fs/splice.c as a kerneldoc target with a smaller blurb that
should be expanded to better explain the overview of splice.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe d6f517568f sendfile: remove bad_sendfile() from bad_file_ops
do_sendfile() prefers splice over sendfile, so it should not trigger
(directly, at least).

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00
Jens Axboe 497f9625c2 pipe: allow passing around of ops private pointer
relay needs this for proper consumption handling, and the network
receive support needs it as well to lookup the sk_buff on pipe
release.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe d6b29d7cee splice: divorce the splice structure/function definitions from the pipe header
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe cf8208d0ea sendfile: convert nfsd to splice_direct_to_actor()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe f0930fffa9 sendfile: convert nfs to using splice_read()
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Jens Axboe 5ffc4ef45b sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:13 +02:00
Jens Axboe 534f2aaa6a sys_sendfile: switch to using ->splice_read, if available
This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:12 +02:00
Jens Axboe 6a14b90bb6 vmsplice: add vmsplice-to-user support
A bit of a cheat, it actually just copies the data to userspace. But
this makes the interface nice and symmetric and enables people to build
on splice, with room for future improvement in performance.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:12 +02:00
Jens Axboe c66ab6fa70 splice: abstract out actor data
For direct splicing (or private splicing), the output may not be a file.
So abstract out the handling into a specified actor function and put
the data in the splice_desc structure earlier, so we can build on top
of that.

This is the first step in better splice handling for drivers, and also
for implementing vmsplice _to_ user memory.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:12 +02:00
Adrian Bunk 72d3a38ee0 unexport bio_{,un}map_user
bio_{,un}map_user no longer have any modular users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:03:34 +02:00
Linus Torvalds 71d441ddb5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: Update print_hex_dump() syntax
  JFS: use print_hex_dump() rather than private dump_mem() function
  JFS: Whitespace cleanup and remove some dead code
2007-07-09 13:09:16 -07:00
Ingo Molnar 43ae34cb4c sched: scheduler debugging, core
scheduler debugging core: implement /proc/sched_debug and
/proc/<PID>/sched files for scheduler debugging.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09 18:52:00 +02:00
Balbir Singh 172ba844a8 sched: update delay-accounting to use CFS's precise stats
update delay-accounting to use CFS's precise stats.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09 18:52:00 +02:00
Ingo Molnar b27f03d4bd sched: make use of precise accounting for /proc task stats
make use of CFS's precise accounting to drive /proc/<pid>/stat statistics.

this code was co-authored by:

 Balbir Singh <balbir@linux.vnet.ibm.com>
 Dmitry Adamushko <dmitry.adamushko@gmail.com>
 Ingo Molnar <mingo@elte.hu>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
2007-07-09 18:51:59 +02:00
Ingo Molnar 62480d13d5 sched: remove the SleepAVG field
remove the SleepAVG field from /proc/<pid>/status, as
with the removal of the sleep-average code this value
no longer makes sense.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-09 18:51:59 +02:00
Steven Whitehouse a0a24741ca [GFS2] Small fixes to logging code
This reverts part of an earlier patch which tried to reclaim
gfs2_bufdata structures too early and resulted in a "use after free"
case (this bit from me). Also a change to not write out log headers
unless we really need to (in the case of flushing nothing we don't need
a header) from Bob.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2007-07-09 15:43:07 +01:00
David Teigland ac90a25525 [DLM] dump more lock values
Add two more output fields (lkb_flags and rsb nodeid) to the new debugfs
file that dumps one lock per line.  Also, dump all locks instead of just
mastered locks.  Accordingly, use a suffix of _locks instead of _master.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:13 +01:00
Wendy Cheng 35dcc52e3a [GFS2] Remove i_mode passing from NFS File Handle
GFS2 has been passing i_mode within NFS File Handle. Other than the
wrong assumption that there is always room for this extra 16 bit value,
the current gfs2_get_dentry doesn't really need the i_mode to work
correctly. Note that GFS2 NFS code does go thru the same lookup code
path as direct file access route (where the mode is obtained from name
lookup) but gfs2_get_dentry() is coded for different purpose. It is not
used during lookup time. It is part of the file access procedure call.
When the call is invoked, if on-disk inode is not in-memory, it has to
be read-in. This makes i_mode passing a useless overhead.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:11 +01:00
Wendy Cheng bb9bcf0616 [GFS2] Obtaining no_formal_ino from directory entry
GFS2 lookup code doesn't ask for inode shared glock. This implies during
in-memory inode creation for existing file, GFS2 will not disk-read in
the inode contents. This leaves no_formal_ino un-initialized during
lookup time. The un-initialized no_formal_ino is subsequently encoded
into file handle. Clients will get ESTALE error whenever it tries to
access these files.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:08 +01:00
akpm@linux-foundation.org f4fadb23ca [GFS2] git-gfs2-nmw-build-fix
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:06 +01:00
Abhijith Das b365762924 [GFS2] System won't suspend with GFS2 file system mounted
The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad,
gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend
mechanism was trying to freeze them.

I put in calls to refrigerator() in the loops for all the daemons and
suspend works as expected.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:04 +01:00
Bob Peterson 569a7b6c2e [GFS2] remounting w/o acl option leaves acls enabled
This patch is for bugzilla bug #245663.  This crosswrites a fix from
gfs1 (bz #210369) so that the mount options are reset properly upon
remount.  This was tested on system trin-10.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:24:01 +01:00
Wendy Cheng 090ffaa55d [GFS2] inode size inconsistency
This should have been part of the NFS patch #1 but somehow I missed it
when packaging the patches. It is not a critical issue as the others (I
hope). RHEL 5.1 31.el5 kernel runs fine without this change.

Our truncate code is chopped into two parts, one for vfs inode changes
(in vmtruncate()) and one of gfs inode (in gfs2_truncatei()). These two
operatons are, unfortunately, not atomic. So it could happens that
vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei
fails (say kernel temporarily out of memory). This would leave gfs inode
i_di.di_size out of sync with vfs inode i_size. It will later confuse
gfs2_commit_write() if a write is issued. Last time I checked, it will
cause file corruption.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:59 +01:00
Patrick Caulfield 97d848365e [DLM] Telnet to port 21064 can stop all lockspaces
This patch fixes Red Hat bz#245892

Opening a tcp connection from a cluster member to another cluster member
targeting the dlm port it is enough to stop every dlm operation in the cluster.
This means that GFS and rgmanager will hang.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:57 +01:00
S. Wendy Cheng 1875f2f31b [GFS2] Fix gfs2_block_truncate_page err return
Code segment inside gfs2_block_truncate_page() doesn't set the return
code correctly. This causes NFSD erroneously returns EIO back to client
with setattr procedure call (truncate error).

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:54 +01:00
Robert Peterson 773ed1a044 [GFS2] Addendum to the journaled file/unmount patch
This patch is an addendum to the previous journaled file/unmount patch.
It fixes a problem discovered during testing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:52 +01:00
Steven Whitehouse eaf5bd3cac [GFS2] Simplify multiple glock aquisition
There is a bug in the code which acquires multiple glocks where if the
initial out-of-order attempt fails part way though we can land up trying
to acquire the wrong number of glocks. This is part of the fix for red
hat bz #239737. The other part of the bz doesn't apply to upstream
kernels since it was fixed by:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6

Since the out-of-order code doesn't appear to add anything to the
performance of GFS2, this patch just removed it rather than trying to
fix it. It should be much easier to see whats going on here now. In
addition, we don't allocate any memory unless we are using a lot of
glocks (which is a relatively uncommon case).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:50 +01:00
Robert Peterson 2332c4435b [GFS2] assertion failure after writing to journaled file, umount
This patch passes all my nasty tests that were causing the code to
fail under one circumstance or another.  Here is a complete summary
of all changes from today's git tree, in order of appearance:

1. There are now separate variables for metadata buffer accounting.
2. Variable sd_log_num_hdrs is no longer needed, since the header
   accounting is taken care of by the reserve/refund sequence.
3. Fixed a tiny grammatical problem in a comment.
4. Added a new function "calc_reserved" to calculate the reserved
   log space.  This isn't entirely necessary, but it has two benefits:
   First, it simplifies the gfs2_log_refund function greatly.
   Second, it allows for easier debugging because I could sprinkle the
   code with calls to this function to make sure the accounting is
   proper (by adding asserts and printks) at strategic point of the code.
5. In log_pull_tail there apparently was a kludge to fix up the
   accounting based on a "pull" parameter.  The buffer accounting is
   now done properly, so the kludge was removed.
6. File sync operations were making a call to gfs2_log_flush that
   writes another journal header.  Since that header was unplanned
   for (reserved) by the reserve/refund sequence, the free space had
   to be decremented so that when log_pull_tail gets called, the free
   space is be adjusted properly.  (Did I hear you call that a kludge?
   well, maybe, but a lot more justifiable than the one I removed).
7. In the gfs2_log_shutdown code, it optionally syncs the log by
   specifying the PULL parameter to log_write_header.  I'm not sure
   this is necessary anymore.  It just seems to me there could be
   cases where shutdown is called while there are outstanding log
   buffers.
8. In the (data)buf_lo_before_commit functions, I changed some offset
   values from being calculated on the fly to being constants.	That
   simplified some code and we might as well let the compiler do the
   calculation once rather than redoing those cycles at run time.
9. This version has my rewritten databuf_lo_add function.
   This version is much more like its predecessor, buf_lo_add, which
   makes it easier to understand.  Again, this might not be necessary,
   but it seems as if this one works as well as the previous one,
   maybe even better, so I decided to leave it in.
10. In databuf_lo_before_commit, a previous data corruption problem
   was caused by going off the end of the buffer.  The proper solution
   is to have the proper limit in place, rather than stopping earlier.
   (Thus my previous attempt to fix it is wrong).
   If you don't wrap the buffer, you're stopping too early and that
   causes more log buffer accounting problems.
11. In lops.h there are two new (previously mentioned) constants for
   figuring out the data offset for the journal buffers.
12. There are also two new functions, buf_limit and databuf_limit to
   calculate how many entries will fit in the buffer.
13. In function gfs2_meta_wipe, it needs to distinguish between pinned
   metadata buffers and journaled data buffers for proper journal buffer
   accounting.	It can't use the JDATA gfs2_inode flag because it's
   sometimes passed the "real" inode and sometimes the "metadata
   inode" and the inode flags will be random bits in a metadata
   gfs2_inode.	It needs to base its decision on which was passed in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:47 +01:00
Steven Whitehouse 2840501ac8 [GFS2] Use zero_user_page() in stuffed_readpage()
As suggested by Robert P. J. Day <rpjday@mindspring.com>

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert P. J. Day <rpjday@mindspring.com>
2007-07-09 08:23:45 +01:00
Steven Whitehouse c4201214cb [GFS2] Remove bogus '\0' in rgrp.c
Not sure how it slipped in, but we don't want it anyway.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:43 +01:00
Robert Peterson 8fb68595d5 [GFS2] Journaled file write/unstuff bug
This patch is for bugzilla bug 283162, which uncovered a number of
bugs pertaining to writing to files that have the journaled bit on.
These bugs happen most often when writing to the meta_fs because
the files are always journaled.  So operations like gfs2_grow were
particularly vulnerable, although many of the problems could be
recreated with normal files after setting the journaled bit on.
The problems fixed are:

-GFS2 wasn't ever writing unstuffed journaled data blocks to their
 in-place location on disk. Now it does.

-If you unmounted too quickly after doing IO to a journaled file,
 GFS2 was crashing because you would discard a buffer whose bufdata
 was still on the active items list.  GFS2 now deals with this
 gracefully.

-GFS2 was losing track of the bufdata for journaled data blocks,
 and it wasn't getting freed, causing an error when you tried to
 unmount the module.  GFS2 now frees all the bufdata structures.

-There was a memory corruption occurring because GFS2 wrote
 twice as many log entries for journaled buffers.

-It was occasionally trying to write journal headers in buffers
 that weren't currently mapped.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:40 +01:00
David Teigland fad59c1390 [DLM] don't require FS flag on all nodes
Mask off the recently added DLM_LSFL_FS flag when setting the exflags.
This way all the nodes in the lockspace aren't required to have the FS
flag set, since we later check that exflags matches among all nodes.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:38 +01:00
Abhijith Das d93cfa9884 [GFS2] Fix deallocation issues
There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:36 +01:00
David Teigland a7a2ff8a95 [GFS2] return conflicts for GETLK
We weren't returning the correct result when GETLK found a conflict,
which is indicated by userspace passing back a 1.

Signed-off-by: Abhijith Das <adas redhat com>
Signed-off-by: David Teigland <teigland redhat com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:33 +01:00
David Teigland d88101d4d8 [GFS2] set plock owner in GETLK info
Set the owner field in the plock info sent to userspace for GETLK.
Without this, gfs_controld won't correctly see when the GETLK from a
process matches one of the process's existing locks.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:31 +01:00
akpm@linux-foundation.org 037bcbb756 [GFS2] gfs2_lookupi() uninitialised var fix
fs/gfs2/inode.c: In function 'gfs2_lookupi':
fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function

Looks like a real bug to me.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:29 +01:00
Steven Whitehouse c8cdf47937 [GFS2] Recovery for lost unlinked inodes
Under certain circumstances its possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.

This patch adds the recovery code which allows automatic deallocation of
the inode if its found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).

Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:26 +01:00
Robert Peterson b35997d448 [GFS2] Can't mount GFS2 file system on AoE device
This patch fixes bug 243131: Can't mount GFS2 file system on AoE device.
When using AoE devices with lock_nolock, there is no locking table, so
gfs2 (and gfs1) uses the superblock s_id.  This turns out to be the device
name in some cases.  In the case of AoE, the device contains a slash,
(e.g. "etherd/e1.1p2") which is an invalid character when we try to
register the table in sysfs.  This patch replaces the "/" with underscore.
Rather than add a new variable to the stack, I'm just reusing a (char *)
variable that's no longer used: table.

This code has been tested on the failing system using a RHEL5 patch.
The upstream code was tested by using gfs2_tool sb to interject a "/"
into the table name of a clustered gfs2 file system.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:24 +01:00
Steven Whitehouse e1cc86037b [GFS2] Fix bug in error path of inode
This fixes a bug in the ordering of operations in the error path of
createi. Its not valid to do an iput() when holding the inode's glock
since the iput() will (in this case) result in delete_inode() being
called which needs to grab the lock itself. This was causing the
recursive lock checking code to trigger.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:22 +01:00
Steven Whitehouse ffed8ab342 [GFS2] Fix typo in rename of directories
A typo caused us to pass a NULL pointer when renaming directories. It
was accidentally introduced in: [GFS2] Clean up inode number handling

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:19 +01:00
Patrick Caulfield 44f487a553 [DLM] variable allocation
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
(This updated version of the patch uses gfp_t for ls_allocation.)

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:17 +01:00
Josef Bacik 292e539e93 [DLM] fix reference counting
This is a fix for the patch

021d2ff3a08019260a1dc002793c92d6bf18afb6

I left off a dlm_hold_rsb which causes the box to panic if you try to use
debugfs.  This patch fixes the problem.  Sorry about that,

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:15 +01:00
Steven Whitehouse 4bd91ba181 [GFS2] Add nanosecond timestamp feature
This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
to the way that the on-disk format works, older filesystems will just
appear to have this field set to zero. When mounted by an older version
of GFS2, the filesystem will simply ignore the extra fields so that
it will again appear to have whole second resolution, so that its
trivially backward compatible.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:12 +01:00
Steven Whitehouse bb8d8a6f54 [GFS2] Fix sign problem in quota/statfs and cleanup _host structures
This patch fixes some sign issues which were accidentally introduced
into the quota & statfs code during the endianess annotation process.
Also included is a general clean up which moves all of the _host
structures out of gfs2_ondisk.h (where they should not have been to
start with) and into the places where they are actually used (often only
one place). Also those _host structures which are not required any more
are removed entirely (which is the eventual plan for all of them).

The conversion routines from ondisk.c are also moved into the places
where they are actually used, which for almost every one, was just one
single place, so all those are now static functions. This also cleans up
the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

The net result is a reduction of about 100 lines of code, many functions
now marked static plus the bug fixes as mentioned above. For good
measure I ran the code through sparse after making these changes to
check that there are no warnings generated.

This fixes Red Hat bz #239686

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:10 +01:00
Benjamin Marzinski ddf4b426aa [GFS2] fix jdata issues
This is a patch for the first three issues of RHBZ #238162

The first issue is that when you allocate a new page for a file, it will not
start off uptodate. This makes sense, since you haven't written anything to that
part of the file yet.  Unfortunately, gfs2_pin() checks to make sure that the
buffers are uptodate.  The solution to this is to mark the buffers uptodate in
gfs2_commit_write(), after they have been zeroed out and have the data written
into them.  I'm pretty confident with this fix, although it's not completely
obvious that there is no problem with marking the buffers uptodate here.

The second issue is simply that you can try to pin a data buffer that is already
on the incore log, and thus, already pinned. This patch checks to see if this
buffer is already on the log, and exits databuf_lo_add() if it is, just like
buf_lo_add() does.

The third issue is that gfs2_log_flush() doesn't do it's block accounting
correctly.  Both metadata and journaled data are logged, but gfs2_log_flush()
only compares the number of metadata blocks with the number of blocks to commit
to the ondisk journal.  This patch also counts the journaled data blocks.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:08 +01:00
Patrick Caulfield afb853fb4e [DLM] fix socket shutdown
This patch clears the user_data of active sockets as part of cleanup.
This prevents any late-arriving data from trying to add jobs to the work
queue while we are tidying up.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:05 +01:00
Steven Whitehouse 89918647a4 [GFS2] Make the log reserved blocks depend on block size
The number of blocks which we reserve in the log at the start of each
transaction needs to depends upon the block size since the overhead is
related to the number of "pointers" which can be fitted into a single
block.

This relates to Red Hat bz #240435

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:03 +01:00
Abhijith Das 1990e91765 [GFS2] Quotas non-functional - fix another bug
This patch fixes a bug where gfs2 was writing update quota usage
information to the wrong location in the quota file.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:23:01 +01:00
David Teigland 0b7cac0fb0 [DLM] show default protocol
Display the initial value of the "protocol" config value in configfs.
The default value has always been 0 in the past anyway, so it's always
appeared to be correct.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:59 +01:00
David Teigland 9dd592d70b [DLM] dumping master locks
Add a new debugfs file that dumps a compact list of mastered locks.
This will be used by a userland daemon to collect state for deadlock
detection.

Also, for the existing function that prints all lock state, lock the rsb
before going through the lock lists since they can be changing in the
course of normal dlm activity.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:56 +01:00
David Teigland 8b4021fa43 [DLM] canceling deadlocked lock
Add a function that can be used through libdlm by a system daemon to cancel
another process's deadlocked lock.  A completion ast with EDEADLK is returned
to the process waiting for the lock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:54 +01:00
David Teigland 84d8cd69a8 [DLM] timeout fixes
Various fixes related to the new timeout feature:
- add_timeout() missed setting TIMEWARN flag on lkb's when the
  TIMEOUT flag was already set
- clear_proc_locks should remove a dead process's locks from the
  timeout list
- the end-of-life calculation for user locks needs to consider that
  ETIMEDOUT is equivalent to -DLM_ECANCEL
- make initial default timewarn_cs config value visible in configfs
- change bit position of TIMEOUT_CANCEL flag so it's not copied to
  a remote master node
- set timestamp on remote lkb's so a lock dump will display the time
  they've been waiting

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:52 +01:00
Steven Whitehouse b3cab7b9a3 [DLM] Compile fix
A one liner fix which got missed from the earlier patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Cc: David Teigland <teigland@redhat.com>
2007-07-09 08:22:49 +01:00
David Teigland 639aca417d [DLM] fix compile breakage
In the rush to get the previous patch set sent, a compilation bug I fixed
shortly before sending somehow got clobbered, probably by a missed quilt
refresh or something.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:45 +01:00
David Teigland 8b0e7b2cf3 [DLM] wait for config check during join [6/6]
Joining the lockspace should wait for the initial round of inter-node
config checks to complete before returning.  This way, if there's a
configuration mismatch between the joining node and the existing nodes,
the join can fail and return an error to the application.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:42 +01:00
David Teigland 79d72b5448 [DLM] fix new_lockspace error exit [5/6]
Fix the error path when exiting new_lockspace().  It was kfree'ing the
lockspace struct at the end, but that's only valid if it exits before
kobject_register occured.  After kobject_register we have to let the
kobject do the freeing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:40 +01:00
David Teigland c85d65e914 [DLM] cancel in conversion deadlock [4/6]
When conversion deadlock is detected, cancel the conversion and return
EDEADLK to the application.  This is a new default behavior where before
the dlm would allow the deadlock to exist indefinately.

The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the
dlm from performing conversion deadlock detection/cancelation on it.
The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the
dlm to demote the granted mode of the lock being converted if it gets into
a conversion deadlock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:38 +01:00
David Teigland d7db923ea4 [DLM] dlm_device interface changes [3/6]
Change the user/kernel device interface used by libdlm:
- Add ability for userspace to check the version of the interface.  libdlm
  can now adapt to different versions of the kernel interface.
- Increase the size of the flags passed in a lock request so all possible
  flags can be used from userspace.
- Add an opaque "xid" value for each lock.  This "transaction id" will be
  used later to associate locks with each other during deadlock detection.
- Add a "timeout" value for each lock.  This is used along with the
  DLM_LKF_TIMEOUT flag.

Also, remove a fragment of unused code in device_read().

This patch requires updating libdlm which is backward compatible with
older kernels.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:36 +01:00
David Teigland 3ae1acf93a [DLM] add lock timeouts and warnings [2/6]
New features: lock timeouts and time warnings.  If the DLM_LKF_TIMEOUT
flag is set, then the request/conversion will be canceled after waiting
the specified number of centiseconds (specified per lock).  This feature
is only available for locks requested through libdlm (can be enabled for
kernel dlm users if there's a use for it.)

If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
a warning message will be sent to userspace (using genetlink) after a
request/conversion has been waiting for a given number of centiseconds
(configurable per node).  The time warnings will be used in the future
to do deadlock detection in userspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:33 +01:00
David Teigland 85e86edf95 [DLM] block scand during recovery [1/6]
Don't let dlm_scand run during recovery since it may try to do a resource
directory removal while the directory nodes are changing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:31 +01:00
Josef Bacik 916297aad5 [DLM] keep dlm from panicing when traversing rsb list in debugfs
This problem was originally reported against GFS6.1, but the same issue exists
in upstream DLM.  This patch keeps the rsb iterator assigning under the rsbtbl
list lock.  Each time we process an rsb we grab a reference to it to make sure
it is not freed out from underneath us, and then put it when we get the next rsb
in the list or move onto another list.

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:29 +01:00
Abhijith Das 2a87ab0806 [GFS2] Quotas non-functional - fix bug
This patch fixes an error in the quota code where a 'struct
gfs2_quota_lvb*' was being passed to gfs2_adjust_quota() instead of a
'struct gfs2_quota_data*'. Also moved 'struct gfs2_quota_lvb' from
fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:26 +01:00
Steven Whitehouse dbb7cae2a3 [GFS2] Clean up inode number handling
This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:24 +01:00
Steven Whitehouse 41d7db0ab4 [GFS2] Reduce size of struct gdlm_lock
This patch removes the completion (which is rather large) from struct
gdlm_lock in favour of using the wait_on_bit() functions. We don't need
to add any extra fields to the structure to do this, so we save 32 bytes
(on x86_64) per structure. This adds up to quite a lot when we may
potentially have millions of these lock structures,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: David Teigland <teigland@redhat.com>
2007-07-09 08:22:21 +01:00
Robert Peterson cd81a4bac6 [GFS2] Addendum patch 2 for gfs2_grow
This addendum patch 2 corrects three things:

1. It fixes a stupid mistake in the previous addendum that broke gfs2.
   Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
2. It fixes a problem that Dave Teigland pointed out regarding the
   external declarations in ops_address.h being in the wrong place.
3. It recasts a couple more %llu printks to (unsigned long long)
   as requested by Steve Whitehouse.

I would have loved to put this all in one revised patch, but there was
a rush to get some patches for RHEL5.	Therefore, the previous patches
were applied to the git tree "as is" and therefore, I'm posting another
addendum.  Sorry.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:19 +01:00
Nate Diller 0507ecf50f [GFS2] use zero_user_page
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-07-09 08:22:17 +01:00
Robert Peterson 6c53267f05 [GFS2] Kernel changes to support new gfs2_grow command (part 2)
To avoid code redundancy, I separated out the operational "guts" into
a new function called read_rindex_entry.  Then I made two functions:
the closer-to-original gfs2_ri_update (without the special condition
checks) and gfs2_ri_update_special that's designed with that condition
in mind.  (I don't like the name, but if you have a suggestion, I'm
all ears).

Oh, and there's an added benefit:  we don't need all the ugly gotos
anymore.  ;)

This patch has been tested with gfs2_fsck_hellfire (which runs for
three and a half hours, btw).

Signed-off-By: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:14 +01:00
Robert Peterson 7ae8fa8451 [GFS2] kernel changes to support new gfs2_grow command
This is another revision of my gfs2 kernel patch that allows
gfs2_grow to function properly.

Steve Whitehouse expressed some concerns about the previous
patch and I restructured it based on his comments.
The previous patch was doing the statfs_change at file close time,
under its own transaction.  The current patch does the statfs_change
inside the gfs2_commit_write function, which keeps it under the
umbrella of the inode transaction.

I can't call ri_update to re-read the rindex file during the
transaction because the transaction may have outstanding unwritten
buffers attached to the rgrps that would be otherwise blown away.
So instead, I created a new function, gfs2_ri_total, that will
re-read the rindex file just to total the file system space
for the sake of the statfs_change.  The ri_update will happen
later, when gfs2 realizes the version number has changed, as it
happened before my patch.

Since the statfs_change is happening at write_commit time and there
may be multiple writes to the rindex file for one grow operation.
So one consequence of this restructuring is that instead of getting
one kernel message to indicate the change, you may see several.
For example, before when you did a gfs2_grow, you'd get a single
message like:

GFS2: File system extended by 247876 blocks (968MB)

Now you get something like:

GFS2: File system extended by 207896 blocks (812MB)
GFS2: File system extended by 39980 blocks (156MB)

This version has also been successfully run against the hours-long
"gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck
while interjecting file system damage.  It does this repeatedly
under a variety Resource Group conditions.

Signed-off-By: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:12 +01:00
Satyam Sharma 3168b0780d [DLM] fix a couple of races
Fix two races in fs/dlm/config.c:

(1) Grab the configfs subsystem semaphore before calling
config_group_find_obj() in get_space(). This solves a potential race
between get_space() and concurrent mkdir(2) or rmdir(2).

(2) Grab a reference on the found config_item _while_ holding the configfs
subsystem semaphore in get_comm(), and not after it. This solves a
potential race between get_comm() and concurrent rmdir(2).

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:10 +01:00
Benjamin Marzinski b524fe646c [GFS2] flush the glock completely in inode_go_sync
Fix for bz #231910
When filemap_fdatawrite() is called on the inode mapping in data=ordered mode,
it will add the glock to the log. In inode_go_sync(), if you do the
gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock
and its associated data buffers will be on the log again. This means you can
demote a lock from exclusive, without having it flushed from the log. The
attached patch simply moves the gfs2_log_flush up to after the
filemap_fdatawrite() call.

Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but
that caused me to trip the following assert.

GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed
GFS2: fsid=cypher-36:test.0:   function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61

It appears that gfs2_log_flush() puts some of the glocks buffers in the busy
state and the filemap_fdatawrite() call is necessary to flush them. This makes
me worry slightly that a related problem could happen because of moving the
gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that
gfs2_ail_empty_gl() would catch that case as well.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:07 +01:00
Linus Torvalds 1e5de2837c Fix permission checking for the new utimensat() system call
Commit 1c710c896e added the utimensat()
system call, but didn't handle the case of checking for the writability
of the target right, when the target was a file descriptor, not a
filename.

We cannot use vfs_permission(MAY_WRITE) for that case, and need to
simply check whether the file descriptor is writable.  The oops from
using the wrong function was noticed and narrowed down by Markus
Trippelsdorf.

Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-08 12:02:55 -07:00
Adrian Bunk 95511ad434 DLM must depend on SYSFS
The dependency of DLM on SYSFS got lost in
commit 6ed7257b46 resulting in the
following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n:

<--  snip  -->

...
  LD      .tmp_vmlinux1
fs/built-in.o: In function `dlm_lockspace_init':
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys'
fs/built-in.o: In function `configfs_init':
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys'
make[1]: *** [.tmp_vmlinux1] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-07 14:17:43 -07:00
Michael Ellerman ef7320edb1 Fix elf_core_dump() when writing arch specific notes (spu coredumps)
elf_core_dump() supports dumping arch specific ELF notes, via the #define
ELF_CORE_WRITE_EXTRA_NOTES.  Currently the only user of this is the powerpc
spu coredump code.

There is a bug in the handling of foffset WRT the arch notes, which causes
us to erroneously increment foffset by the size of the arch notes, leaving
a block of zeroes in the file, and causing all subsequent data in the file
to be at <supposed position> + <arch note size>.  eg:

  LOAD  0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000

Tells us we should have a chunk of data at 0x50000.  The truth is the data
is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes).

This bug prevents gdb from reading the core file correctly.

The simplest fix is to simply remember the size of the arch notes, and add
it to foffset after we've written the arch notes.  The only drawback is
that if the arch code doesn't write as many bytes as it said it would, we
end up with a broken core dump again.  For now I think that's a reasonable
requirement.

Tested on a Cell blade, gdb no longer complains about the core file being
bogus.

While I'm here I should point out that the spu coredump code does not work
if we're dumping to a pipe - we'll have to wait for 23 to fix that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-06 10:23:43 -07:00
David Woodhouse e2baf4ed16 [JFFS2] Fix readinode failure when read_dnode() detects CRC failure.
We should have stopped returning 1 from read_dnode() to indicate
failure. We can just mark the damn thing obsolete immediately. But I
missed a case where we don't.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-07-04 10:24:29 -04:00
Zach Brown fcb82f8835 dio: remove bogus refcounting BUG_ON
Badari Pulavarty reported a case of this BUG_ON is triggering during
testing.  It's completely bogus and should be removed.

It's trying to notice if we left references to the dio hanging around in
the sync case.  They should have been dropped as IO completed while this
path was in dio_await_completion().  This condition will also be
checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
lines lower.  So to start this BUG_ON() is redundant.

More fatally, it's dereferencing dio-> after having dropped its
reference.  It's only safe to dereference the dio after releasing the
lock if the final reference was just dropped.  Another CPU might free
the dio in bio completion and reuse the memory after this path drops the
dio lock but before the BUG_ON() is evaluated.

This patch passed aio+dio regression unit tests and aio-stress on ext3.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-03 18:23:23 -07:00
David Woodhouse edd5cd4a94 Introduce fixed sys_sync_file_range2() syscall, implement on PowerPC and ARM
Not all the world is an i386.  Many architectures need 64-bit arguments to be
aligned in suitable pairs of registers, and the original
sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an
argument register for padding after the first integer.  Since we don't
normally have more than 6 arguments for system calls, that left no room for
the final argument on some architectures.

Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which
all fits nicely.  In fact, ARM already had that, but called it
sys_arm_sync_file_range.  Move it to fs/sync.c and rename it, then implement
the needed compatibility routine.  And stop the missing syscall check from
bitching about the absence of sys_sync_file_range() if we've implemented
sys_sync_file_range2() instead.

Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:38:30 -07:00
Andrew Morton ddc80bd781 ext2: fix return of uninitialised variable
gcc correctly says

fs/ext2/super.c: In function 'ext2_remount':
fs/ext2/super.c:1055: warning: 'err' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:38:29 -07:00
Davide Libenzi f8738c5c52 avoid spurious POLLIN returns in signalfd
The new code in kernel/signal.c does not allow fetching private signals
from another task.  This patch avoid spurious POLLIN returns from a
signalfd poll(2) operation.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:34:54 -07:00
Michael Halcrow d4c5cdb3e0 zero out last page for llseek/write
When one llseek's past the end of the file and then writes, every page past
the previous end of the file should be cleared.  Trevor found that the code,
as is, does not assure that the very last page is always cleared.  This patch
takes care of that.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:34:53 -07:00
Michael Halcrow e10f281bca eCryptfs: initialize crypt_stat in setattr
Recent changes in eCryptfs have made it possible to get to ecryptfs_setattr()
with an uninitialized crypt_stat struct.  This results in a wide and colorful
variety of unpleasantries.  This patch properly initializes the crypt_stat
structure in ecryptfs_setattr() when it is necessary to do so.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:34:53 -07:00
Michael Halcrow 240e2df5c7 eCryptfs: fix write zeros behavior
This patch fixes the processes involved in wiping regions of the data during
truncate and write events, fixing a kernel hang in 2.6.22-rc4 while assuring
that zero values are written out to the appropriate locations during events in
which the i_size will change.

The range passed to ecryptfs_truncate() from ecryptfs_prepare_write() includes
the page that is the object of ecryptfs_prepare_write().  This leads to a
kernel hang as read_cache_page() is executed on the same page in the
ecryptfs_truncate() execution path.  This patch remedies this by limiting the
range passed to ecryptfs_truncate() so as to exclude the page that is the
object of ecryptfs_prepare_write(); it also adds code to
ecryptfs_prepare_write() to zero out the region of its own page when writing
past the i_size position.  This patch also modifies ecryptfs_truncate() so
that when a file is truncated to a smaller size, eCryptfs will zero out the
contents of the new last page from the new size through to the end of the last
page.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-28 11:34:53 -07:00
Kirill Korotaev e5d2861f31 ext4: lost brelse in ext4_read_inode()
One of error path in ext4_read_inode() leaks bh since brelse is forgoten.

Signed-off-by: Kirill Korotaev <dev@openvz.org>
Acked-by: Vasily Averin <vvs@sw.ru>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-24 08:59:12 -07:00
Kirill Korotaev e4a10a362c ext3: lost brelse in ext3_read_inode()
One of error path in ext3_read_inode() leaks bh since brelse is forgoten.

Signed-off-by: Kirill Korotaev <dev@openvz.org>
Acked-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-24 08:59:12 -07:00
Carsten Otte 266f5aa097 ext2: disallow setting xip on remount
Yan Zheng pointed out that ext2_remount lacks checking if -o xip should be
enabled or not.  This patch checks for presence of direct_access on the
backing block device and if the blocksize meets the requirements.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Yan Zheng <yanzheng@21cn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-24 08:59:12 -07:00
Christoph Hellwig 700716c846 [XFS] s/memclear_highpage_flush/zero_user_page/
SGI-PV: 957103
SGI-Modid: xfs-linux-melb:xfs-kern:28678a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-06-19 15:20:31 +10:00
Eric W. Biederman 9d66586f77 shm: fix the filename of hugetlb sysv shared memory
Some user space tools need to identify SYSV shared memory when examining
/proc/<pid>/maps.  To do so they look for a block device with major zero, a
dentry named SYSV<sysv key>, and having the minor of the internal sysv
shared memory kernel mount.

To help these tools and to make it easier for people just browsing
/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the
SYSV<key> dentry naming convention.

User space tools will still have to be aware that hugetlb sysv shared
memory lives on a different internal kernel mount and so has a different
block device minor number from the rest of sysv shared memory.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:16 -07:00
Jan Kara 74584ae509 udf: fix possible leakage of blocks
We have to take care that when we call udf_discard_prealloc() from
udf_clear_inode() we have to write inode ourselves afterwards (otherwise,
some changes might be lost leading to leakage of blocks, use of free blocks
or improperly aligned extents).

Also udf_discard_prealloc() does two different things - it removes
preallocated blocks and truncates the last extent to exactly match i_size.
We move the latter functionality to udf_truncate_tail_extent(), call
udf_discard_prealloc() when last reference to a file is dropped and call
udf_truncate_tail_extent() when inode is being removed from inode cache
(udf_clear_inode() call).

We cannot call udf_truncate_tail_extent() earlier as subsequent open+write
would find the last block of the file mapped and happily write to the end
of it, although the last extent says it's shorter.

[akpm@linux-foundation.org: Make checkpatch.pl happier]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:16 -07:00
Alexey Dobriyan edad01e2a1 fuse: ->fs_flags fixlet
fs/fuse/inode.c:658:3: error: Initializer entry defined twice
fs/fuse/inode.c:661:3:   also defined here

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:15 -07:00
Jens Axboe 02676e5aee splice: only check do_wakeup in splice_to_pipe() for a real pipe
We only ever set do_wakeup to non-zero if the pipe has an inode
backing, so it's pointless to check outside the pipe->inode
check.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-15 13:16:13 +02:00
Jens Axboe 00de00bdad splice: fix leak of pages on short splice to pipe
If the destination pipe is full and we already transferred
data, we break out instead of waiting for more pipe room.
The exit logic looks at spd->nr_pages to see if we moved
everything inside the spd container, but we decrement that
variable in the loop to decide when spd has emptied.

Instead we want to compare to the original page count in
the spd, so cache that in a local variable.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-15 13:14:22 +02:00
Jens Axboe 17ee4f49ab splice: adjust balance_dirty_pages_ratelimited() call
As we have potentially dirtied more than 1 page, we should indicate as
such to the dirty page balancing. So call
balance_dirty_pages_ratelimited_nr() and pass in the approximate number
of pages we dirtied.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-15 13:10:37 +02:00
Dave Kleikamp 288e4d838d JFS: Update print_hex_dump() syntax
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2007-06-13 10:17:50 -05:00
Tejun Heo dd14cbc994 sysfs: fix race condition around sd->s_dentry, take#2
Allowing attribute and symlink dentries to be reclaimed means
sd->s_dentry can change dynamically.  However, updates to the field
are unsynchronized leading to race conditions.  This patch adds
sysfs_lock and use it to synchronize updates to sd->s_dentry.

Due to the locking around ->d_iput, the check in sysfs_drop_dentry()
is complex.  sysfs_lock only protect sd->s_dentry pointer itself.  The
validity of the dentry is protected by dcache_lock, so whether dentry
is alive or not can only be tested while holding both locks.

This is minimal backport of sysfs_drop_dentry() rewrite in devel
branch.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-06-12 16:08:47 -07:00
Tejun Heo 6aa054aadf sysfs: fix condition check in sysfs_drop_dentry()
The condition check doesn't make much sense as it basically always
succeeds.  This causes NULL dereferencing on certain cases.  It seems
that parentheses are put in the wrong place.  Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-06-12 16:08:46 -07:00
Eric Sandeen dc351252b3 sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses
Backport of
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/broken-out/gregkh-driver-sysfs-allocate-inode-number-using-ida.patch

For regular files in sysfs, sysfs_readdir wants to traverse
sysfs_dirent->s_dentry->d_inode->i_ino to get to the inode number.
But, the dentry can be reclaimed under memory pressure, and there is
no synchronization with readdir.  This patch follows Tejun's scheme of
allocating and storing an inode number in the new s_ino member of a
sysfs_dirent, when dirents are created, and retrieving it from there
for readdir, so that the pointer chain doesn't have to be traversed.

Tejun's upstream patch uses a new-ish "ida" allocator which brings
along some extra complexity; this -stable patch has a brain-dead
incrementing counter which does not guarantee uniqueness, but because
sysfs doesn't hash inodes as iunique expects, uniqueness wasn't
guaranteed today anyway.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-06-12 16:08:46 -07:00
Linus Torvalds 3e2ce4dae9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] CIFS should honour umask
  [CIFS] Missing flag on negprot needed for some servers to force packet signing
  [CIFS] whitespace cleanup part 2
  [CIFS] whitespace cleanup
  [CIFS] fix mempool destroy done in wrong order in cifs error path
  [CIFS] typo in previous patch
  [CIFS] Fix oops on failed cifs mount (in kthread_stop)
2007-06-11 11:39:25 -07:00
Linus Torvalds 5212c555be Merge branch 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block
* 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
  splice: __generic_file_splice_read: fix read/truncate race
  splice: __generic_file_splice_read: fix i_size_read() length checks
  splice: move balance_dirty_pages_ratelimited() outside of splice actor
  pipe: move pipe_inode_info structure decleration up before it's used
  splice: remove do_splice_direct() symbol export
  splice: move inode size check into generic_file_splice_read()
2007-06-11 11:31:05 -07:00
Linus Torvalds 845a2fdcbd Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Fix invalid assertion during write on 64k pages
  ocfs2: Fix masklog breakage
2007-06-08 19:44:16 -07:00
Greg Ungerer c287ef1ff9 nommu: report correct errno in message
Report the correct errno for out of memory debug output in binfmt_flat.c

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-08 17:23:32 -07:00
Steve French 3ce53fc4c5 [CIFS] CIFS should honour umask
This patch makes CIFS honour a process' umask like other filesystems.
Of course the server is still free to munge the permissions if it wants
to; but the client will send the "right" permissions to begin with.

A few caveats:

1) It only applies to filesystems that have CAP_UNIX (aka support unix
extensions)
2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms()
after remote creation

When mode to CIFS/NTFS ACL mapping is complete we can do the
same thing for that case for servers which do not
support the Unix Extensions.

Signed-off-by: Matt Keenen <matt@opcode-solutions.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-06-08 14:55:14 +00:00
Jens Axboe 620a324b74 splice: __generic_file_splice_read: fix read/truncate race
Original patch and description from Neil Brown <neilb@suse.de>,
merged and adapted to splice branch by me. Neils text follows:

__generic_file_splice_read() currently samples the i_size at the start
and doesn't do so again unless it needs to call ->readpage to load
a page.  After ->readpage it has to re-sample i_size as a truncate
may have caused that page to be filled with zeros, and the read()
call should not see these.

However there are other activities that might cause ->readpage to be
called on a page between the time that __generic_file_splice_read()
samples i_size and when it finds that it has an uptodate page. These
include at least read-ahead and possibly another thread performing a
read

So we must sample i_size *after* it has an uptodate page.  Thus the
current sampling at the start and after a read can be replaced with a
sampling before page addition into spd.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-08 08:34:11 +02:00
Hugh Dickins 475ecade68 splice: __generic_file_splice_read: fix i_size_read() length checks
__generic_file_splice_read's partial page check, at eof after readpage,
not only got its calculations wrong, but also reused the loff variable:
causing data corruption when splicing from a non-0 offset in the file's
last page (revealed by ext2 -b 1024 testing on a loop of a tmpfs file).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-08 08:34:05 +02:00
Jens Axboe 20d698db67 splice: move balance_dirty_pages_ratelimited() outside of splice actor
I've seen inode related deadlocks, so move this call outside of the
actor itself, which may hold the inode lock.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-08 08:33:59 +02:00
Jens Axboe 267adc3e66 splice: remove do_splice_direct() symbol export
It's only supposed to be used by do_sendfile(), which is never
modular. So kill the export.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-08 08:33:41 +02:00
Jens Axboe d366d39885 splice: move inode size check into generic_file_splice_read()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-06-08 08:32:38 +02:00
Bryan Wu 85f6038f21 RAMFS NOMMU: missed POSIX UID/GID inode attribute checking
This bug was caught by LTP testcase fchmod06 on Blackfin platform.

In the manpage of fchmod, "EPERM: The effective UID does not match the
owner of the file, and the process is not privileged (Linux: it does not
have the CAP_FOWNER capability)."

But the ramfs nommu code missed the inode_change_ok POSIX UID/GID
verification. This patch fixed this.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-07 17:11:13 -07:00
Mark Fasheh eeb47d1234 ocfs2: Fix invalid assertion during write on 64k pages
The write path code intends to bug if a math error (or unhandled case)
results in a write outside of the current cluster boundaries. The actual
BUG_ON() statements however are incorrect, leading to a crash on kernels
with 64k page size. Fix those by checking against the right variables.

Also, move the assertions higher up within the functions so that they trip
*before* the code starts to mark buffers.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-06-06 16:42:03 -07:00
Tiger Yang 59be7dc97b ocfs2: Fix masklog breakage
Some of the sysfs changes inadvertantly broke the simple runtime debug log
filtering employed in ocfs2. Fix this by properly exporting the masklog
category filter names.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-06-06 16:41:08 -07:00
Dave Kleikamp 209e101bf4 JFS: use print_hex_dump() rather than private dump_mem() function
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2007-06-06 16:30:17 -05:00
Dave Kleikamp f720e3ba55 JFS: Whitespace cleanup and remove some dead code
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2007-06-06 15:28:35 -05:00
Yehuda Sadeh Weinraub 100c1ddc98 [CIFS] Missing flag on negprot needed for some servers to force packet signing
A related signature issue that I came across.
There's a bug in win2k that when NT error codes are not negotiated, the
server doesn't response that signatures are mandatory. Since there's
(currently) no way turn on signatures in such case, I had to force NT
error codes, so that this bug will not occur

Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-06-05 21:31:16 +00:00
Steve French 221601c3d1 [CIFS] whitespace cleanup part 2
Various coding style problems found by running the new
   checkpatch.pl script against fs/cifs.  3 more files
   fixed up.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-06-05 20:35:06 +00:00
Steve French 5fdae1f681 [CIFS] whitespace cleanup
Various coding style problems found by running fs/cifs
against the new checkpatch.pl script.  Since there
were too many to fit in one patch.  Updated the first
four files.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-06-05 18:30:44 +00:00
Linus Torvalds ec4883b015 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree()
  [MTD] Fix error checking after get_mtd_device() in get_sb_mtd functions
  [JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes()
  [JFFS2] Fix potential memory leak of dead xattrs on unmount.
  [JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files.
  [MTD] generalise the handling of MTD-specific superblocks
  [MTD] [MAPS] don't force uclinux mtd map to be root dev
2007-06-04 17:54:09 -07:00
Andrew Morton 78ae87c3cd vanishing ioctl handler debugging
We've had several reoprts of the CPU jumping to 0x00000000 is do_ioctl().  I
assume that there's a race and someone is zeroing out the ioctl handler while
this CPU waits for the lock_kernel().

The patch adds code to detect this, then emits stuff which will hopefuly lead
us to the culprit.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-04 13:25:10 -07:00
Akinobu Mita e6985c7f68 [CIFS] fix mempool destroy done in wrong order in cifs error path
Slab cache used as memory pool can not be destroyed before the memory
pool destruction. Because the memory pool still holds some objects and
kmem_cache_destroy() says "Can't free all objects".

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-06-04 16:14:59 +00:00
David Woodhouse 0477d24e2a [JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree()
We should keep the mdata node with higher version number, not just the
one we happen to find latest. Doh.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-06-01 20:04:43 +01:00
Yoann Padioleau f834368564 parse errors in ifdefs
Fix various bits of obviously-busted code which we're not happening to
compile, due to ifdefs.

Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:28 -07:00
Jan Kara 85d71244f0 Fix possible UDF data corruption
update_next_aext() could possibly rewrite values in elen and eloc, possibly
leading to data corruption when rewriting a file.  Use temporary variables
instead.  Also advance cur_epos as it can also point to an indirect extent
pointer.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:27 -07:00
Artem Bityutskiy ea55d30798 [JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes()
If we have already read enough bytes, no need to call read_more().

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-06-01 13:20:29 +01:00
Alex Tomas 315054f023 When ext4_ext_insert_extent() fails to insert new blocks
we should free just the allocated blocks.

Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-05-31 16:20:15 -04:00
Amit Arora 25d14f983f ext4: Extent overlap bugfix
This patch adds a check for overlap of extents and cuts short the
new extent to be inserted, if there is a chance of overlap.

Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-05-31 16:20:15 -04:00
Mingming Cao 8a9dc94498 Remove unnecessary exported symbols.
Signed-Off-By: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-05-31 16:20:15 -04:00
Dave Kleikamp 8c55e20411 EXT4: Fix whitespace
Replace a lot of spaces with tabs

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-05-31 16:20:14 -04:00
Andrew Morton 00c541eae7 afs: needs sched.h
mips:

fs/afs/flock.c: In function `afs_lock_may_be_available':
fs/afs/flock.c:55: error: dereferencing pointer to incomplete type
fs/afs/flock.c: In function `afs_lock_work':
fs/afs/flock.c:84: error: dereferencing pointer to incomplete type
fs/afs/flock.c:89: error: dereferencing pointer to incomplete type
fs/afs/flock.c:109: error: dereferencing pointer to incomplete type
fs/afs/flock.c:135: error: dereferencing pointer to incomplete type
fs/afs/flock.c:143: error: dereferencing pointer to incomplete type
fs/afs/flock.c:158: error: dereferencing pointer to incomplete type
fs/afs/flock.c:161: error: dereferencing pointer to incomplete type
fs/afs/flock.c:179: error: `TASK_UNINTERRUPTIBLE' undeclared (first use in this function)
fs/afs/flock.c:179: error: (Each undeclared identifier is reported only once
fs/afs/flock.c:179: error: for each function it appears in.)
fs/afs/flock.c:179: error: `TASK_INTERRUPTIBLE' undeclared (first use in this function)
fs/afs/flock.c:182: error: dereferencing pointer to incomplete type

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-31 07:58:14 -07:00
Andrew Morton 1fc799e1b4 ntfs_init_locked_inode(): fix array indexing
Local variable `i' is a byte-counter.  Don't use it as an index into an array
of le32's.

Reported-by: "young dave" <hidave.darkstar@gmail.com>
Cc: "Christoph Lameter" <clameter@sgi.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-31 07:58:13 -07:00
Bryan Wu 3f0a6766e0 a bug in ramfs_nommu_resize function, passing old size to vmtruncate
It should be pass "newsize" to vmtruncate function to modify the
inode->i_size, while the old size is passed to vmtruncate.

This bug was caught by LTP truncate test case on Blackfin platform.
After it was fixed, the LTP truncate test case passed.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-30 20:54:07 -07:00
Trond Myklebust b4946ffb18 NFS: Fix a refcount leakage in O_DIRECT
The current code is leaking a reference to dreq->kref when the calls to
nfs_direct_read_schedule() and nfs_direct_write_schedule() return an
error.
This patch moves the call to kref_put() from nfs_direct_wait() back into
nfs_direct_read() and nfs_direct_write() (which are the functions that
actually took the reference in the first place) fixing the leak.

Thanks to Denis V. Lunev for spotting the bug and proposing the original
fix.

Acked-by: Denis V. Lunev <dlunev@gmail.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-30 16:26:01 -04:00
David Chinner df3c724426 [XFS] Write at EOF may not update filesize correctly.
The recent fix for preventing NULL files from being left around does not
update the file size corectly in all cases. The missing case is a write
extending the file that does not need to allocate a block.

In that case we used a read mapping of the extent which forced the use of
the read I/O completion handler instead of the write I/O completion
handle. Hence the file size was not updated on I/O completion.

SGI-PV: 965068
SGI-Modid: xfs-linux-melb:xfs-kern:28657a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nscott@aconex.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-29 18:15:17 +10:00
Hugh Dickins f4d43bd579 fix compat console unimap regression
Why is it that since the 2f1a2ccb9c console
UTF-8 fixes went into 2.6.22-rc1, the PowerMac G5 shows only inverse video
question marks for the text on tty2-6? whereas tty1 is fine, and so is x86.

No fault of that patch: by removing the old fallback behaviour, it reveals
that 32-bit setfont running on 64-bit kernels has only really worked on
the current console, the rest getting faked by that inadequate fallback.

Bring the compat do_unimap_ioctl into line with the main one: PIO_UNIMAP
and GIO_UNIMAP apply to the specified tty, not redirected to fg_console.
Use the same checks, and most particularly, remember to check access_ok:
con_set_unimap and con_get_unimap are using __get_user and __put_user.

And the compat vt_check should ask for the same capability as the main
one, CAP_SYS_TTY_CONFIG rather than CAP_SYS_ADMIN.  Added in vt_ioctl's
vc_cons_allocated check for safety, though failure may well be impossible.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-25 17:37:46 -07:00
Christoph Hellwig d9b08b9efe [PATCH] ocfs2: use generic_segment_checks
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:06:37 -07:00
Mark Fasheh 8fccfc829a ocfs2: fix inode leak
We weren't cleaning up our inode reference on error in
ocfs2_reserve_local_alloc_bits(). Add a check for error return and iput() if
need be. Move the code to set the alloc context inode info to the end of the
function so we don't have any possibility of passing back a bad pointer.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:46 -07:00
Nate Diller 5c3c6bb770 [PATCH] ocfs2: use zero_user_page
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:39 -07:00
Mark Fasheh 1024c902ab ocfs2: unmap_mapping_range() in ocfs2_truncate()
We weren't calling this before, but since ocfs2 handles the entire truncate
operation, we should.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:31 -07:00
Mark Fasheh e9dfc0b2bc ocfs2: trylock in ocfs2_readpage()
Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we
can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just
return AOP_TRUNCATED_PAGE if the operation fails.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25 11:00:23 -07:00
Linus Torvalds d333fc8d30 Merge branch 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6
* 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6:
  NFS: Fix nfs_direct_dirty_pages()
  NFS: Fix handful of compiler warnings in direct.c
  NFS: Avoid a deadlock situation on write
2007-05-24 09:17:12 -07:00
Trond Myklebust d4a8f3677f NFS: Fix nfs_direct_dirty_pages()
We only need to dirty the pages that were actually read in.

Also convert nfs_direct_dirty_pages() to call set_page_dirty() instead of
set_page_dirty_lock(). A call to lock_page() is unacceptable in an rpciod
callback function.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24 11:18:18 -04:00
Chuck Lever 749e146e01 NFS: Fix handful of compiler warnings in direct.c
This patch fixes a couple of signage issues that were causing an Oops
when running the LTP diotest4 test. get_user_pages() returns a signed
error, hence we need to be careful when comparing with the unsigned
number of pages from data->npages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24 10:44:20 -04:00
Trond Myklebust 7fe7f8487a NFS: Avoid a deadlock situation on write
When processes are allowed to attempt to lock a non-contiguous range of nfs
write requests, it is possible for generic_writepages to 'wrap round' the
address space, and call writepage() on a request that is already locked by
the same process.

We avoid the deadlock by checking if the page index is contiguous with the
list of nfs write requests that is already held in our
nfs_pageio_descriptor prior to attempting to lock a new request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24 10:44:20 -04:00
Michael Halcrow 53a2731f93 eCryptfs: delay writing 0's after llseek until write
Delay writing 0's out in eCryptfs after a seek past the end of the file
until data is actually written.

http://www.opengroup.org/onlinepubs/009695399/functions/lseek.html

``The lseek() function shall not, by itself, extend the size of a
file.''

Without this fix, applications that lseek() past the end of the file without
writing will experience unexpected behavior.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:15 -07:00
Davi Arnaut b3762bfc8d signalfd: retrieve multiple signals with one read() call
Gathering signals in bulk enables server applications to drain a signal
queue (almost full of realtime signals) more efficiently by reducing the
syscall and file look-up overhead.

Very similar to the sigtimedwait4() call described by Niels Provos, Chuck
Lever, and Stephen Tweedie in a paper entitled "Analyzing the Overload
Behavior of a Simple Web Server".  The paper lists more details and
advantages.

Signed-off-by: Davi E. M. Arnaut <davi@haxent.com.br>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:14 -07:00
Miklos Szeredi ead5f0b5fa fuse: delete inode on drop
When inode is dropped (no more references) delete it from cache.

There's not much point in keeping it cached, when a new lookup will refresh
the attributes anyway.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:13 -07:00
Miklos Szeredi 889f784831 fuse: generic_write_checks() for direct_io
This fixes O_APPEND in direct IO mode.  Also checks writes against file size
limits, notably rlimits.

Reported by Greg Bruno.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:13 -07:00
Christoph Hellwig 492c8b332e uselib: add missing MNT_NOEXEC check
We don't allow loading ELF shared library from noexec points so the
same should apply to sys_uselib aswell.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ulrich Drepper <drepper@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:13 -07:00
David Woodhouse 5a1b639148 Missing 'const' from reiserfs MIN_KEY declaration.
In stree.c, MIN_KEY is declared const. The extern declaration in dir.c
doesn't match...

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:13 -07:00
Badari Pulavarty 6087b2dab2 optimize compat_core_sys_select() by a using stack space for small fd sets
Optimize select by a using stack space for small fd sets.
core_sys_select() already has this optimization.  This is for compat
version.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:12 -07:00
Miklos Szeredi b9ba347f27 fuse: fix mknod of regular file
The wrong lookup flag was tested in ->create() causing havoc (error or
Oops) when a regular file was created with mknod() in a fuse filesystem.

Thanks to J. Cameijo Cerdeira for the report.

Kernels 2.6.18 onward are affected.  Please apply to -stable as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:11 -07:00
Steve French f7f7c31c98 [CIFS] typo in previous patch
(also fixed missing space after if)

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-24 02:29:51 +00:00
Steve French 28356a1679 [CIFS] Fix oops on failed cifs mount (in kthread_stop)
If the cifs demultiplex thread wakes up and exits
(zeroing server->tsk) before kthread_stop is called, the
cifs_mount code could pass a null pointer to kthread_stop

Thanks to akpm, Dave Young and Shaggy for suggesting
earlier versions of this patch.

CC: akpm@linux-foundatior.org
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-23 14:45:36 +00:00
Linus Torvalds cdb7532f7b Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Fix dreamcast build for IRQ changes.
  sh: Fix clock multiplier on SH7722.
  sh: Wire up kdump crash kernel exec in die().
  sh: sr.bl toggling around idle sleep.
  sh: disable genrtc support.
  fs: Kill sh dependency for binfmt_flat.
  sh: Disable psw support for R7785RP.
  sh: Fix page size alignment in __copy_user_page().
  sh: Fix up various compile warnings for SE boards.
  sh: Wire up signalfd/timerfd/eventfd syscalls.
  sh: revert addition of page fault notifiers
  spelling fixes: arch/sh/
  input: hp680_ts compile fixes.
  sh: landisk: Header cleanups.
  sh: landisk: rtc-rs5c313 support.
  sh: Kill off pmb slab cache destructor.
  sh: Fix up psw build rules for r7780rp.
  sh: Shut up compiler warnings in __do_page_fault().
2007-05-22 17:26:18 -07:00
Jeff Garzik 72dd9ca599 partitions/LDM: build fix
This from a "tested" patch...

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 21:38:17 -07:00
Anton Altaparmakov dde33348e5 LDM: Fix for Windows Vista dynamic disks
This fixes the LDM driver so that it works with Windows Vista dynamic
disks which are subtly different to Windows 2000/XP ones.

The patch was needed to get a Vista formatted dynamic disk to be
recognized and parsed successfully.

Thanks go to Chris Teachworth for the report and testing.

Cc: Richard Russon <ldm@flatcap.org>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:58:40 -07:00
Alexey Dobriyan e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
OGAWA Hirofumi ff1be9ad61 Fix "fs: convert core functions to zero_user_page"
The bug was introduced by 01f2705daf.
It misses to convert the first argument, it should be "new_page".

This became a cause of fatfs corruption.

Cc: Nate Diller <nate.diller@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:15:32 -07:00
Paul Mundt 1d4be747a8 fs: Kill sh dependency for binfmt_flat.
Not really sure where this bogosity came from, but there's certainly
nothing special about sh that lets us use flat files with the MMU on.

Kill the dependency, and leave it as !MMU, like it is for all of the
other nommu-wielding ports.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-21 14:34:00 +09:00
David Woodhouse 2ad8ee7135 [JFFS2] Fix potential memory leak of dead xattrs on unmount.
An xattr_datum which ends up orphaned should be freed by the GC 
thread. But if we umount before the GC thread is finished, or if we 
mount read-only and the GC thread never runs, they might never be 
freed. Clean them up during unmount, if there are any left.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-20 11:30:38 -04:00
David Woodhouse 8ae5d31263 [JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files.
When we cannot mark nodes as obsolete, such as on NAND flash, we end up 
having to delete inodes with !nlink in jffs2_build_remove_unlinked_inode().
However, jffs2_build_xattr_subsystem() runs later than this, and will
attach an xref to the dead inode. Then later when the last nodes of that
dead inode are erased we hit a BUG() in jffs2_del_ino_cache() 
because we're not supposed to get there with an xattr still attached to 
the inode which is being killed.

The simple fix is to refrain from attaching xattrs to inodes with zero 
nlink, in jffs2_build_xattr_subsystem(). It's it's OK to trust nlink 
here because the file system isn't actually mounted yet, so there's no 
chance that a zero-nlink file could actually be alive still because 
it's open.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-20 11:28:22 -04:00
Davide Libenzi 18963c01b8 timerfd use waitqueue lock ...
The timerfd was using the unlocked waitqueue operations, but it was
using a different lock, so poll_wait() would race with it.

This makes timerfd directly use the waitqueue lock.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-18 13:09:34 -07:00
Davide Libenzi d48eb23315 eventfd use waitqueue lock ...
The eventfd was using the unlocked waitqueue operations, but it was
using a different lock, so poll_wait() would race with it.

This makes eventfd directly use the waitqueue lock.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-18 13:09:34 -07:00
Trond Myklebust dd504ea16f Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/ 2007-05-17 11:36:59 -04:00
Christoph Lameter ea125892a1 Fix page allocation flags in grow_dev_page()
grow_dev_page() simply passes GFP_NOFS to find_or_create_page.  This means
the allocation of radix tree nodes is done with GFP_NOFS and the allocation
of a new page is done using GFP_NOFS.

The mapping has a flags field that contains the necessary allocation flags
for the page cache allocation.  These need to be consulted in order to get
DMA and HIGHMEM allocations etc right.  And yes a blockdev could be
allowing Highmem allocations if its a ramdisk.

Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:06 -07:00
Jan Kara 7925409e20 circular locking dependency found in QUOTA OFF
i_mutex on quota files is special.  Unlike i_mutexes for other inodes it is
acquired under dqonoff_mutex.  Tell lockdep about this lock ranking.  Also
comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
file can be acquired under dqonoff_mutex).  Move truncate_inode_pages()
call under dqonoff_mutex and save some problems with races...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:05 -07:00
Nate Diller c9f2875b79 ecryptfs: use zero_user_page
Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:05 -07:00
Dan Aloni 71ce92f3fa make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size
Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
filename size and change it to 128, so that extensive patterns such as
'/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
generation.

Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:05 -07:00
Trond Myklebust 5cf4cf65a8 Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/ 2007-05-17 08:23:04 -04:00
Heiko Carstens 8317f14b60 simplify compat_sys_timerfd
Just thought this is easier to read.

Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:04 -07:00
Christoph Lameter a35afb830f Remove SLAB_CTOR_CONSTRUCTOR
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:04 -07:00
David Howells bb33ed6345 AFS: Fix afs_prepare_write()
afs_prepare_write() should not mark a page up to date if it only partially
fills it in, in expectation of the caller filling in the rest prior to calling
commit_write().  commit_write(), however, should mark the page up to date.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-16 21:19:15 -07:00
David Howells faab83bbcd AFS: write back dirty data on unmount
Fix AFS to write back dirty on unmounting.  This didn't happen because
afs_super_ops.drop_inode was pointing to generic_delete_inode.  Now this
pointer is left set to NULL so that the default behaviour occurs instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-16 21:19:15 -07:00
Trond Myklebust 6684e323a2 Merge branch 'origin' 2007-05-15 16:11:17 -04:00
Davide Libenzi f0ee9aabb0 epoll: move kfree inside ep_free
Move the kfree() call inside the ep_free() function.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 08:54:00 -07:00
Davide Libenzi 67647d0fb8 epoll: fix some comments
Fixes some epoll code comments.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 08:54:00 -07:00
Davide Libenzi c7ea763025 epoll locks changes and cleanups
Changes the rwlock to a spinlock, and drops the use-count variable.
Operations are always bound by the mutex now, so the use-count is no more
needed.  For the same reason, the rwlock can become a simple spinlock.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 08:53:59 -07:00
Davide Libenzi d47de16c72 fix epoll single pass code and add wait-exclusive flag
Fixes the epoll single pass code.  During the unlocked event delivery (to
userspace) code, the poll callback can re-issue new events, and we must
receive them correctly.  Since we loop in a lockless fashion, we want to be
O(nready), and we don't want to flash on/off the spinlock for every event, we
have the poll callback to use a secondary list to queue events while we're
inside the event delivery loop.  The rw_semaphore has been turned into a
mutex.  This patch also adds the wait-exclusive flag, as suggested by Davi
Arnaut.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-15 08:53:59 -07:00