Commit Graph

3240 Commits

Author SHA1 Message Date
Andrew Morton 7256d819e4 [PATCH] ufs: printk() fix
fs/ufs/inode.c: In function `ufs_frag_map':
fs/ufs/inode.c:101: warning: long long unsigned int format, u64 arg (arg 4)
fs/ufs/inode.c: In function `ufs_getfrag_block':
fs/ufs/inode.c:432: warning: long long unsigned int format, u64 arg (arg 2)

Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:20 -07:00
KaiGai Kohei c6e8c6ccf9 [JFFS2][XATTR] Fix xd->refcnt race condition
When xd->refcnt is checked whether this xdatum should be released
or not, atomic_dec_and_lock() is used to ensure holding the
c->erase_completion_lock.

This fix change a specification of delete_xattr_datum().
Previously, it's only called when xd->refcnt equals zero.
(calling it with positive xd->refcnt cause a BUG())
If you applied this patch, the function checks whether
xd->refcnt is zero or not under the spinlock if necessary.
Then, it marks xd DEAD flahs and links with xattr_dead_list
or releases it immediately when xd->refcnt become zero.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-29 15:33:02 +01:00
Trond Myklebust 9f2fa46638 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2006-06-28 23:27:48 -04:00
Latchesar Ionkov 9d7fa40098 [PATCH] v9fs: fix fid check in v9fs_create
Fix an incorrect check whether a fid was allocated in v9fs_create and if it
should be freed on error.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28 14:59:05 -07:00
Latchesar Ionkov 94374e7cc3 [PATCH] v9fs: return the correct error when interrupted by signal
If a signal interrupts the user process, v9fs sends a flush request to the
file server and waits for its response.  It error code is incorrectly set
to the error code of the flush message instead of ERESTARTSYS.  The patch
sets the error code to the correct value.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28 14:59:05 -07:00
Christoph Hellwig f5e54d6e53 [PATCH] mark address_space_operations const
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28 14:59:04 -07:00
Trond Myklebust 607f31e80b Revert "Merge branch 'odirect'"
This reverts ccf01ef7aa commit.

No idea how git managed this one: when I asked it to merge the odirect
topic branch it actually generated a patch which reverted the change.

Reverting the 'merge' will once again reveal Chuck's recent NFS/O_DIRECT
work to the world.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-28 16:52:45 -04:00
Linus Torvalds 936813a880 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [MTD] NAND: Select chip before checking write protect status
  [MTD] CORE mtdchar.c: fix off-by-one error in lseek()
  [MTD] NAND: Fix typo in mtd/nand/ts7250.c
  [JFFS2][XATTR] coexistence between xattr and write buffering support.
  [JFFS2][XATTR] Fix wrong copyright
  [JFFS2][XATTR] Re-define xd->refcnt as atomic_t
  [JFFS2][XATTR] Fix memory leak with jffs2_xattr_ref
  [JFFS2][XATTR] rid unnecessary writing of delete marker.
  [JFFS2][XATTR] Fix ACL bug when updating null xattr by null ACL.
  [JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion
  [MTD] Fix off-by-one error in physmap.c
  [MTD] Remove unused 'nr_banks' variable from ixp2000 map driver
  [MTD NAND] s3c2412 support in s3c2410.c
  [MTD] Initialize 'writesize'
  [MTD] NAND: ndfc fix address offset thinko
  [MTD] NAND: S3C2410 convert prinks to dev_*()s
  [MTD] NAND: Missing fixups
2006-06-27 19:13:56 -07:00
Linus Torvalds 73a0e405dc Merge git://oss.sgi.com:8090/nathans/xfs-2.6
* git://oss.sgi.com:8090/nathans/xfs-2.6:
  [XFS] Fixup whitespace damage in log_write, remove final warning.
  [XFS] Rework code snippets slightly to remove remaining recent-gcc
  [XFS] Fix realtime subvolume expansion, a porting bug b0rked it.  Coverity
  [XFS] Remove a race condition where a linked inode could BUG_ON in
  [XFS] Remove redundant directory checks from inode link operation.
  [XFS] Remove a couple of no-longer-used macros.
  [XFS] Reduce size of xfs_trans_t structure. * remove ->t_forw, ->t_back --
  [XFS] remove unused behaviour lock - shrink XFS vnode as a side effect.
  [XFS] * There is trivial "inode => vnode => inode" conversion, but only
  [XFS] link(2) on directory is banned in VFS.
2006-06-27 19:09:16 -07:00
Linus Torvalds f17a2686b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (25 commits)
  [CIFS] Fix authentication choice so we do not force NTLMv2 unless the
  [CIFS] Fix alignment of unicode strings in previous patch
  [CIFS] Fix allocation of buffers for new session setup routine to allow
  [CIFS] Remove calls to to take f_owner.lock
  [CIFS] remove some redundant null pointer checks
  [CIFS] Fix compile warning when CONFIG_CIFS_EXPERIMENTAL is off
  [CIFS] Enable sec flags on mount for cifs (part one)
  [CIFS] Fix suspend/resume problem which causes EIO on subsequent access to
  [CIFS] fix minor compile warning when config_cifs_weak_security is off
  [CIFS] NTLMv2 support part 5
  [CIFS] Add support for readdir to legacy servers
  [CIFS] NTLMv2 support part 4
  [CIFS] NTLMv2 support part 3
  [CIFS] NTLMv2 support part 2
  [CIFS] Fix mask so can set new cifs security flags properly
  CIFS] Support for older servers which require plaintext passwords - part 2
  [CIFS] Support for older servers which require plaintext passwords
  [CIFS] Fix mapping of old SMB return code Invalid Net Name so it is
  [CIFS] Missing brace
  [CIFS] Do not overwrite aops
  ...
2006-06-27 18:31:57 -07:00
Nathan Scott 5493a0fcba [XFS] Fixup whitespace damage in log_write, remove final warning.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26366a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-28 11:17:28 +10:00
Jesper Juhl 4ad98457aa [PATCH] Remove redundant NULL checks before [kv]free - in fs/
Remove redundant NULL checks before kfree for fs/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:48 -07:00
Chandra Seetharaman 5a67e4c5b6 [PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate places
Make use the of newly defined hotplug version of cpu_notifier functionality
wherever appropriate.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:41 -07:00
Evgeniy Dushistov 05f225dc87 [PATCH] ufs: ufs_read_inode cleanup
Add missed ufsi->i_dir_start_lookup initialization in ufs_read_inode in
UFS2 case.  Also it cleans ufs_read_inode function to prevent such kind of
situation in the future: it move depend on UFS type parts of code into
separate functions and leaves in ufs_read_inode only generic code.  It
cleans code and avoids duplication.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:39 -07:00
Ingo Molnar 34af946a22 [PATCH] spin/rwlock init cleanups
locking init cleanups:

 - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
 - convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

 - cleanliness
 - lockdep needs control of lock initialization, which the open-coded
   variants do not give
 - it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:39 -07:00
Adrian Bunk b6cd0b772d [PATCH] fs/buffer.c: cleanups
- add a proper prototype for the following global function:
  - buffer_init()

- make the following needlessly global function static:
  - end_buffer_async_write()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Randy Dunlap c9cf55285e [PATCH] add poison.h and patch primary users
Localize poison values into one header file for better documentation and
easier/quicker debugging and so that the same values won't be used for
multiple purposes.

Use these constants in core arch., mm, driver, and fs code.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Ingo Molnar e6e5494cb2 [PATCH] vdso: randomize the i386 vDSO by moving it into a vma
Move the i386 VDSO down into a vma and thus randomize it.

Besides the security implications, this feature also helps debuggers, which
can COW a vma-backed VDSO just like a normal DSO and can thus do
single-stepping and other debugging features.

It's good for hypervisors (Xen, VMWare) too, which typically live in the same
high-mapped address space as the VDSO, hence whenever the VDSO is used, they
get lots of guest pagefaults and have to fix such guest accesses up - which
slows things down instead of speeding things up (the primary purpose of the
VDSO).

There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
it off is also recommended for security reasons: attackers cannot use the
predictable high-mapped VDSO page as syscall trampoline anymore.

There is a new vdso=[0|1] boot option as well, and a runtime
/proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
on/off.

(This version of the VDSO-randomization patch also has working ELF
coredumping, the previous patch crashed in the coredumping code.)

This code is a combined work of the exec-shield VDSO randomization
code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
started this patch and i completed it.

[akpm@osdl.org: cleanups]
[akpm@osdl.org: compile fix]
[akpm@osdl.org: compile fix 2]
[akpm@osdl.org: compile fix 3]
[akpm@osdl.org: revernt MAXMEM change]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Cc: Gerd Hoffmann <kraxel@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Nathan Scott 6fdf8ccc09 [XFS] Rework code snippets slightly to remove remaining recent-gcc
warnings.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26364a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-28 10:13:52 +10:00
Steve French f40c562855 [CIFS] Fix authentication choice so we do not force NTLMv2 unless the
user specifies it is required or turns of ntlm

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-28 00:13:38 +00:00
Eric Sesterhenn 73024cf115 [XFS] Fix realtime subvolume expansion, a porting bug b0rked it. Coverity
made me look at this code (bug id #344). We only return with
XFS_ERROR(EINVAL) if mp->m_rtdev_targp is valid and pass it otherwise to
xfs_read_buf() where some function calls later it gets dereferenced by an
assert.

SGI-PV: 954266
SGI-Modid: xfs-linux-melb:xfs-kern:26363a

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-28 08:42:26 +10:00
David Brownell 266bee8869 [PATCH] fix static linking of NFS
Builds on ARM report link problems with common configurations like
statically linked NFS (for nfsroot).  The symptom is that __init
section code references __exit section code; that won't work since
the exit sections are discarded (since they can never be called).

The best fix for these particular cases would be an "__init_or_exit"
section annotation.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 14:07:19 -07:00
Steve French 0223cf0b10 [CIFS] Fix alignment of unicode strings in previous patch
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-27 19:50:57 +00:00
KaiGai Kohei 04510dee3c [JFFS2][XATTR] coexistence between xattr and write buffering support.
Drop '&& !JFFS2_FS_WRITEBUFFER' from fs/Kconfig.
The series of previous patches enables to use those
functionality at same time.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:20:06 +01:00
KaiGai Kohei 332a6b99c5 [JFFS2][XATTR] Fix wrong copyright
summary.c was modified at 2006.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:19:39 +01:00
KaiGai Kohei 2c887e2359 [JFFS2][XATTR] Re-define xd->refcnt as atomic_t
In jffs2_release_xattr_datum(), it refers xd->refcnt to ensure
whether releasing xd is allowed or not.
But we can't hold xattr_sem since this function is called under
spin_lock(&c->erase_completion_lock). Thus we have to refer it
without any locking.

This patch redefine xd->refcnt as atomic_t. It enables to refer
xd->refcnt without any locking.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:19:06 +01:00
KaiGai Kohei 355ed4e141 [JFFS2][XATTR] Fix memory leak with jffs2_xattr_ref
If xattr_ref is associated with an orphan inode_cache
on filesystem mounting, those xattr_refs are not
released even if this inode_cache is released.

This patch enables to call jffs2_xattr_delete_inode()
for such a irregular inode_cachde too.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:18:30 +01:00
KaiGai Kohei 8a13695cbe [JFFS2][XATTR] rid unnecessary writing of delete marker.
In the followinf situation, an explicit delete marker is not
necessary, because we can certainlly detect those obsolete
xattr_datum or xattr_ref on next mounting.

- When to delete xattr_datum node.
- When to delete xattr_ref node on removing inode.
- When to delete xattr_ref node on updating xattr.

This patch rids writing delete marker in those situations.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:18:02 +01:00
KaiGai Kohei a1ae76e96a [JFFS2][XATTR] Fix ACL bug when updating null xattr by null ACL.
This patch enable to handle the case when updating null xattr
by null ACL.

When we try to set NULL into NULL xattr, xattr subsystem returns
-ENODATA. This patch enables to handle this error code.

[2/3] jffs2-xattr-v6-02-fix_posixacl_bug.patch

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:17:04 +01:00
KaiGai Kohei c9f700f840 [JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion
- When xdatum is removed, a new xdatum with 'delete marker' is
  written. (version==0xffffffff means 'delete marker')
- When xref is removed, a new xref with 'delete marker' is written.
  (odd-numbered xseqno means 'delete marker')

- delete_xattr_(datum/xref)_delay() are new deletion functions
  are added. We can only use them if we can detect the target
  obsolete xdatum/xref as a orphan or errir one.
  (e.g when inode deletion, or detecting crc error)

[1/3] jffs2-xattr-v6-01-delete_marker.patch

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-27 16:16:26 +01:00
Steve French 750d1151a6 [CIFS] Fix allocation of buffers for new session setup routine to allow
longer user and domain names and allow passing sec options on mount

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-27 06:28:30 +00:00
Nathan Scott 97dfd70c89 [XFS] Remove a race condition where a linked inode could BUG_ON in
d_instantiate, due to fast transaction committal removing the last
remaining reference before we were all done.

SGI-PV: 953287
SGI-Modid: xfs-linux-melb:xfs-kern:26347a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 16:13:46 +10:00
Alexey Dobriyan 05a3332885 [XFS] Remove redundant directory checks from inode link operation.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26343a

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 16:13:29 +10:00
Nathan Scott ebe1090549 [XFS] Remove a couple of no-longer-used macros.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26339a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 16:13:02 +10:00
Alexey Dobriyan 1998764e5a [XFS] Reduce size of xfs_trans_t structure. * remove ->t_forw, ->t_back --
unused * ->t_ag_freeblks_delta, ->t_ag_flist_delta, ->t_ag_btree_delta
are debugging aid -- wrap them in everyone's favourite way.    As a
result, cut "xfs_trans" slab object size from 592 to 572 bytes here.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26319a

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 16:12:40 +10:00
Alexey Dobriyan b3bbed1d08 [XFS] remove unused behaviour lock - shrink XFS vnode as a side effect.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26299a

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 16:12:15 +10:00
Alexey Dobriyan 71306f3b88 [XFS] * There is trivial "inode => vnode => inode" conversion, but only
flags and   mode of final inode are looked at. Pass original inode
instead. * Two occurences of bhv_vnode_t go out.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26298a

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 14:10:29 +10:00
Alexey Dobriyan b71d300c8b [XFS] link(2) on directory is banned in VFS.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26293a

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
2006-06-27 12:45:17 +10:00
Adrian Bunk 3fb5a9891d [PATCH] fs/ocfs2/dlm/: cleanups
This patch #if 0's the no longer used dlm_dump_lock_resources().

Since this makes dlmdebug.h empty, this patch also removes this header.

Additionally, the needlessly global dlm_is_node_recovered() is made
static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:21 -07:00
Mark Fasheh 43dee336c9 ocfs2: fix compiler warnings in dlm_convert_lock_handler()
We need to cast to unsigned long long.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:20 -07:00
Mark Fasheh 8a9343fa24 ocfs2: dlm_print_one_mle() needs to be defined
Fixes compile breakage.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:19 -07:00
Kurt Hackel 0032abd674 ocfs2: remove whitespace in dlmunlock.c
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:19 -07:00
Kurt Hackel 3156d26701 ocfs2: move dlm work to a private work queue
The work that is done can block for long periods of time and so is not
appropriate for keventd.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:18 -07:00
Kurt Hackel 495ac96e63 ocfs2: fix incorrect error returns
Use DLM_REJECTED instead of DLM_RECOVERING.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:17 -07:00
Kurt Hackel 3b3b84a892 ocfs2: tune down some noisy messages during dlm recovery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:16 -07:00
Kurt Hackel 56a7c104bc ocfs2: display message before waiting for recovery to complete
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:15 -07:00
Kurt Hackel 44a7f1d063 ocfs2: mlog in dlm_convert_lock_handler() should be ML_ERROR
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:15 -07:00
Kurt Hackel b220532a71 ocfs2: retry operations when a lock is marked in recovery
Before checking for a nonexistent lock, make sure the lockres is not marked
RECOVERING. The caller will just retry and the state should be fixed up when
recovery completes.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:14 -07:00
Kurt Hackel f85cd47a58 ocfs2: use cond_resched() in dlm_thread()
yield() does not yield.  cond_resched() does.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:13 -07:00
Kurt Hackel ad8100e0d2 ocfs2: use GFP_NOFS in some dlm operations
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:12 -07:00
Kurt Hackel b7084ab538 ocfs2: wait for recovery when starting lock mastery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:12 -07:00
Kurt Hackel c27069e6cf ocfs2: continue recovery when a dead node is encountered
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:11 -07:00
Kurt Hackel 67a187412b ocfs2: remove unneccesary spin_unlock() in dlm_remaster_locks()
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:10 -07:00
Kurt Hackel 6a41321121 ocfs2: dlm_remaster_locks() should never exit without completing
We cannot restart recovery. Once we begin to recover a node, keep the state
of the recovery intact and follow through, regardless of any other node
deaths that may occur.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:09 -07:00
Kurt Hackel c8df412e1c ocfs2: special case recovery lock in dlmlock_remote()
If the previous master of the recovery lock dies, let calc_usage take it
down completely and let the caller completely redo the dlmlock() call.
Otherwise, there will never be an opportunity to re-master the lockres and
recovery wont be able to progress.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:08 -07:00
Kurt Hackel 36407488b1 ocfs2: pending mastery asserts and migrations should block each other
Use the existing structure for blocking migrations when ASTs are pending to
achieve the same result. If we can catch the assert before it goes on the
wire, just cancel it and let the migration continue.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:08 -07:00
Kurt Hackel c87a9ae705 ocfs2: temporarily disable automatic lock migration
Now we never change the owner of a lock resource until unmount or node
death. This will be re-enabled once some issues in the algorithm used have
been resolved.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:07 -07:00
Kurt Hackel 2abaf97e62 ocfs2: do not unconditionally purge the lockres in dlmlock_remote()
In dlmlock_remote(), do not call purge_lockres until the lock resource
actually changes. otherwise, the mastery info on the lockres will go away
underneath the caller.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:06 -07:00
Kurt Hackel aa087b8497 ocfs2: increase backoff before waiting for recovery
When mastering non-recovery lock resources, additional time was frequently
needed to allow the disk heartbeat to catch up with the network timeout. the
recovery lock resource is time critical and avoids this path.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:05 -07:00
Kurt Hackel f42a100b22 ocfs2: have dlm_pre_master_reco_lockres() ignore dead nodes
Recovery will spin in dlm_pre_master_reco_lockres if we do not ignore
timed-out network responses from dead nodes.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:05 -07:00
Kurt Hackel 6ff06a9391 ocfs2: give the dlm dirty list a reference on the lockres
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:04 -07:00
Kurt Hackel e7e69eb389 ocfs2: teach dlm_restart_lock_mastery() to wait on recovery
Change behavior of dlm_restart_lock_mastery() when a node goes down.  Dump
all responses that have been collected and start over.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:03 -07:00
Kurt Hackel e4eb03681a ocfs2: gracefully handle stale create_lock messages.
This is an error on the sending side, so gracefully error out on the
receiving end.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:02 -07:00
Kurt Hackel ccd8b1f916 ocfs2: update lvb immediately during recovery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:02 -07:00
Kurt Hackel 588e00902b ocfs2: do not send master requests to localhost
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:01 -07:00
Kurt Hackel 8b2198097a ocfs2: purge lockres' sooner
Immediately purge a lockress that the local node is not the master of.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:43:00 -07:00
Kurt Hackel 343e26a400 ocfs2: dump mismatching migrated lvbs before BUG()
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:59 -07:00
Kurt Hackel 466d1a4591 ocfs2: make dlm recovery finalization 2 stage
Makes it easier for the recovery process to deal with node death.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:58 -07:00
Kurt Hackel 69d72b066c ocfs2: dlm recovery / lockres reference count fix
Take a reference on lockres structures while they are on the recovery list.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:58 -07:00
Kurt Hackel a9ee4c8a67 ocfs2: better error handling during assert master message
handle errors during lock assert master by either killing self or other node

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:57 -07:00
Kurt Hackel a7f90d83ea ocfs2: dump lockres info before we BUG() on a bad reference
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:56 -07:00
Mark Fasheh c0a8520c73 ocfs2: do LVB puts in place
Don't wait until the AST will be fired to do the LVB copy into the lock
resource.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:55 -07:00
Kurt Hackel aa85235427 ocfs2: mle ref count debugging
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:55 -07:00
Kurt Hackel dc2ed195dd ocfs2: allow for an assert message during lock mastery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:54 -07:00
Kurt Hackel 2d1a868c56 ocfs2: take mle reference during migration
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:53 -07:00
Kurt Hackel 41b8c8a101 ocfs2: properly initialize the mle structure
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:52 -07:00
Kurt Hackel da01ad0552 ocfs2: detach mle from heartbeat events
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:52 -07:00
Kurt Hackel a2bf04774b ocfs2: mle ref counting fixes
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:51 -07:00
Kurt Hackel 958837197e ocfs2: better mle debugging
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:50 -07:00
Kurt Hackel d6dea6e973 ocfs2: clean up recovery related messages
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:49 -07:00
Kurt Hackel 29c0fa0f56 ocfs2: handle network errors during recovery
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:49 -07:00
Kurt Hackel c3187ce5e3 ocfs2: only recover one dead node at a time
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:48 -07:00
Kurt Hackel ab27eb6f47 ocfs2: Better tracking for recovery state changes
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:47 -07:00
Kurt Hackel 8bc674cb48 ocfs2: Fix empty lvb check
The check for an empty lvb should check the entire buffer not just the first
byte.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:46 -07:00
Kurt Hackel aba9aac788 ocfs2: fix inverted logic in dlm_is_node_dead
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:45 -07:00
Kurt Hackel 2580a580e0 ocfs2: recheck lockres master before sending an unlock request.
Recovery may have happened and it may now be mastered locally.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:45 -07:00
Kurt Hackel 8d79d088e8 ocfs2: add a small delay after a failed migration
Otherwise we risk starving other threads.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:44 -07:00
Mark Fasheh 685f1adb38 ocfs2: silence a compile warning in dlm_alloc_pagevec()
Reported by Andrew Morton.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:43 -07:00
Joel Becker c8f33b6e86 [PATCH] ocfs2: Alloc at least a page for the DLM hash
The OCFS2 DLM allocates a number of pages for a hash to lookup locks.
There was a bug where a PAGE_SIZE bigger than the hash size (eg, 64K
pages) would result in zero pages allocated.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:42 -07:00
Daniel Phillips 03d864c02c ocfs2: allocate lockres hash pages in an array
This allows us to have a hash table greater than a single page which greatly
improves dlm performance on some tests.

Signed-off-by: Daniel Phillips <phillips@google.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:42 -07:00
Mark Fasheh 95c4f581d6 ocfs2: inline dlm_lockres_get()
It's called on every lookup so this might help performance a bit.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:41 -07:00
Daniel Phillips 4198985f7a [PATCH] Clean up ocfs2 hash probe and make it faster
Signed-Off-By: Daniel Phillips <phillips@google.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:40 -07:00
Mark Fasheh a3d3329159 ocfs2: calculate lockid hash values outside of the spinlock
Fixes a performance bug - pointed out by Andrew.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:39 -07:00
Mark Fasheh 65c491d833 ocfs2: move lockres qstr next to hlist_node structure
Gains us a bit of performance on loads which heavily hit the lockres hash.
Patch suggested by Daniel Phillips <phillips@google.com>.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26 14:42:39 -07:00
Linus Torvalds da206c9e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  typo fixes
  Clean up 'inline is not at beginning' warnings for usb storage
  Storage class should be first
  i386: Trivial typo fixes
  ixj: make ixj_set_tone_off() static
  spelling fixes
  fix paniced->panicked typos
  Spelling fixes for Documentation/atomic_ops.txt
  move acknowledgment for Mark Adler to CREDITS
  remove the bouncing email address of David Campbell
2006-06-26 13:33:14 -07:00
Greg Kroah-Hartman ff23eca3e8 [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman 8ab5e4c15b [PATCH] devfs: Remove devfs_remove() function from the kernel tree
Removes the devfs_remove() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:07 -07:00
Greg Kroah-Hartman 7c69ef7974 [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
Removes the devfs_mk_cdev() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:07 -07:00
Greg Kroah-Hartman 1a715c5cf9 [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
Removes the devfs_mk_bdev() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:06 -07:00
Greg Kroah-Hartman 95dc112a57 [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
Removes the devfs_mk_dir() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:06 -07:00
Greg Kroah-Hartman a29641883f [PATCH] devfs: Remove devfs from the partition code
This patch removes the devfs code from the fs/partitions/ directory.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:05 -07:00
Greg Kroah-Hartman d8deac5094 [PATCH] devfs: Remove devfs from the kernel tree
This is the first patch in a series of patches that removes devfs
support from the kernel.  This patch removes the core devfs code, and
its private header file.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:05 -07:00
Linus Torvalds 81a07d7588 Merge branch 'x86-64'
* x86-64: (83 commits)
  [PATCH] x86_64: x86_64 stack usage debugging
  [PATCH] x86_64: (resend) x86_64 stack overflow debugging
  [PATCH] x86_64: msi_apic.c build fix
  [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
  [PATCH] x86_64: Avoid broadcasting NMI IPIs
  [PATCH] x86_64: fix apic error on bootup
  [PATCH] x86_64: enlarge window for stack growth
  [PATCH] x86_64: Minor string functions optimizations
  [PATCH] x86_64: Move export symbols to their C functions
  [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
  [PATCH] x86_64: Fix modular pc speaker
  [PATCH] x86_64: remove sys32_ni_syscall()
  [PATCH] x86_64: Do not use -ffunction-sections for modules
  [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
  [PATCH] x86_64: adjust kstack_depth_to_print default
  [PATCH] i386/x86-64: adjust /proc/interrupts column headings
  [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
  [PATCH] x86_64: Fix fast check in safe_smp_processor_id
  [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
  [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
  ...

Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26 10:51:09 -07:00
Andi Kleen bebfa1013e [PATCH] x86_64: Add compat_printk and sysctl to turn off compat layer warnings
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:16 -07:00
Linus Torvalds 8871e73fdb Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC]: Add iomap interfaces.
  [OPENPROM]: Rewrite driver to use in-kernel device tree.
  [OPENPROMFS]: Rewrite using in-kernel device tree and seq_file.
  [SPARC]: Add unique device_node IDs and a ".node" property.
  [SPARC]: Add of_set_property() interface.
  [SPARC64]: Export auxio_register to modules.
  [SPARC64]: Add missing interfaces to dma-mapping.h
  [SPARC64]: Export _PAGE_IE to modules.
  [SPARC64]: Allow floppy driver to build modular.
  [SPARC]: Export x_bus_type to modules.
  [RIOWATCHDOG]: Fix the build.
  [CPWATCHDOG]: Fix the build.
  [PARPORT] sunbpp: Fix typo.
  [MTD] sun_uflash: Port to new EBUS device layer.
2006-06-26 10:08:32 -07:00
Oleg Nesterov 5debfa6da5 [PATCH] coredump: shutdown current process first
This patch optimizes zap_threads() for the case when there are no ->mm
users except the current's thread group.  In that case we can avoid
'for_each_process()' loop.

It also adds a useful invariant: SIGNAL_GROUP_EXIT (if checked under
->siglock) always implies that all threads (except may be current) have
pending SIGKILL.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:27 -07:00
Oleg Nesterov dcf560c593 [PATCH] coredump: some code relocations
This is a preparation for the next patch.  No functional changes.
Basically, this patch moves '->flags & SIGNAL_GROUP_EXIT' check into
zap_threads(), and 'complete(vfork_done)' into coredump_wait outside of
->mmap_sem protected area.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:27 -07:00
Oleg Nesterov 7b1c6154fa [PATCH] coredump: don't take tasklist_lock
This patch removes tasklist_lock from zap_threads().
This is safe wrt:

	do_exit:
		The caller holds mm->mmap_sem. This means that task which
		shares the same ->mm can't pass exit_mm(), so it can't be
		unhashed from init_task.tasks or ->thread_group lists.

	fork:
		None of sub-threads can fork after zap_process(leader). All
		processes which were created before this point should be
		visible to zap_threads() because copy_process() adds the new
		process to the tail of init_task.tasks list, and ->siglock
		lock/unlock provides a memory barrier.

	de_thread:
		It does list_replace_rcu(&leader->tasks, &current->tasks).
		So zap_threads() will see either old or new leader, it does
		not matter. However, it can change p->sighand, so we should
		use lock_task_sighand() in zap_process().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:27 -07:00
Oleg Nesterov d5f70c00ad [PATCH] coredump: kill ptrace related stuff
With this patch zap_process() sets SIGNAL_GROUP_EXIT while sending SIGKILL to
the thread group.  This means that a TASK_TRACED task

	1. Will be awakened by signal_wake_up(1)

	2. Can't sleep again via ptrace_notify()

	3. Can't go to do_signal_stop() after return
	   from ptrace_stop() in get_signal_to_deliver()

So we can remove all ptrace related stuff from coredump path.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:27 -07:00
Oleg Nesterov 281de339ce [PATCH] coredump: speedup SIGKILL sending
With this patch a thread group is killed atomically under ->siglock.  This is
faster because we can use sigaddset() instead of force_sig_info() and this is
used in further patches.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:27 -07:00
Oleg Nesterov aceecc0412 [PATCH] coredump: optimize ->mm users traversal
zap_threads() iterates over all threads to find those ones which share
current->mm.  All threads in the thread group share the same ->mm, so we can
skip entire thread group if it has another ->mm.

This patch shifts the killing of thread group into the newly added
zap_process() function.  This looks as unnecessary complication, but it is
used in further patches.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Oleg Nesterov 2ceb8693ef [PATCH] de_thread: fix lockless do_each_thread
We should keep the value of old_leader->tasks.next in de_thread, otherwise
we can't do for_each_process/do_each_thread without tasklist_lock held.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric Paris 42c3e03ef6 [PATCH] SELinux: Add sockcreate node to procattr API
Below is a patch to add a new /proc/self/attr/sockcreate A process may write a
context into this interface and all subsequent sockets created will be labeled
with that context.  This is the same idea as the fscreate interface where a
process can specify the label of a file about to be created.  At this time one
envisioned user of this will be xinetd.  It will be able to better label
sockets for the actual services.  At this time all sockets take the label of
the creating process, so all xinitd sockets would just be labeled the same.

I tested this by creating a tcp sender and listener.  The sender was able to
write to this new proc file and then create sockets with the specified label.
I am able to be sure the new label was used since the avc denial messages
kicked out by the kernel included both the new security permission
setsockcreate and all the socket denials were for the new label, not the label
of the running process.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Oleg Nesterov c1df7fb88a [PATCH] cleanup next_tid()
Try to make next_tid() a bit more readable and deletes unnecessary
"pid_alive(pos)" check.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Oleg Nesterov a872ff0cb2 [PATCH] simplify/fix first_tid()
first_tid:

	/* If nr exceeds the number of threads there is nothing todo */
	if (nr) {
		if (nr >= get_nr_threads(leader))
			goto done;
	}

This is not reliable: sub-threads can exit after this check, so the
'for' loop below can overlap and proc_task_readdir() can return an
already filldir'ed dirents.

	for (; pos && pid_alive(pos); pos = next_thread(pos)) {
		if (--nr > 0)
			continue;

Off-by-one error, will return 'leader' when nr == 1.

This patch tries to fix these problems and simplify the code.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman cc288738c9 [PATCH] proc: Remove tasklist_lock from proc_task_readdir.
This is just like my previous removal of tasklist_lock from first_tgid, and
next_tgid.  It simply had to wait until it was rcu safe to walk the thread
list.

This should be the last instance of the tasklist_lock in proc.  So user
processes should not be able to influence the tasklist lock hold times.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman df26c40e56 [PATCH] proc: Cleanup proc_fd_access_allowed
In process of getting proc_fd_access_allowed to work it has developed a few
warts.  In particular the special case that always allows introspection and
the special case to allow inspection of kernel threads.

The special case for introspection is needed for /proc/self/mem.

The special case for kernel threads really should be overridable
by security modules.

So consolidate these checks into ptrace.c:may_attach().

The check to always allow introspection is trivial.

The check to allow access to kernel threads, and zombies is a little
trickier.  mem_read and mem_write already verify an mm exists so it isn't
needed twice.  proc_fd_access_allowed only doesn't want a check to verify
task->mm exits, s it prevents all access to kernel threads.  So just move
the task->mm check into ptrace_attach where it is needed for practical
reasons.

I did a quick audit and none of the security modules in the kernel seem to
care if they are passed a task without an mm into security_ptrace.  So the
above move should be safe and it allows security modules to come up with
more restrictive policy.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 778c114477 [PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks
Since 2.2 we have been doing a chroot check to see if it is appropriate to
return a read or follow one of these magic symlinks.  The chroot check was
asking a question about the visibility of files to the calling process and
it was actually checking the destination process, and not the files
themselves.  That test was clearly bogus.

In my first pass through I simply fixed the test to check the visibility of
the files themselves.  That naive approach to fixing the permissions was
too strict and resulted in cases where a task could not even see all of
it's file descriptors.

What has disturbed me about relaxing this check is that file descriptors
are per-process private things, and they are occasionaly used a user space
capability tokens.  Looking a little farther into the symlink path on /proc
I did find userid checks and a check for capability (CAP_DAC_OVERRIDE) so
there were permissions checking this.

But I was still concerned about privacy.  Besides /proc there is only one
other way to find out this kind of information, and that is ptrace.  ptrace
has been around for a long time and it has a well established security
model.

So after thinking about it I finally realized that the permission checks
that make sense are the permission checks applied to ptrace_attach.  The
checks are simple per process, and won't cause nasty surprises for people
coming from less capable unices.

Unfortunately there is one case that the current ptrace_attach test does
not cover: Zombies and kernel threads.  Single stepping those kinds of
processes is impossible.  Being able to see which file descriptors are open
on these tasks is important to lsof, fuser and friends.  So for these
special processes I made the rule you can't find out unless you have
CAP_SYS_PTRACE.

These proc permission checks should now conform to the principle of least
surprise.  As well as using much less code to implement :)

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 5b0c1dd38b [PATCH] proc: optimize proc_check_dentry_visible
The code doesn't need to sleep to when making this check so I can just do the
comparison and not worry about the reference counts.

TODO: While looking at this I realized that my original cleanup did not push
the permission check far enough down into the stack.  The call of
proc_check_dentry_visible needs to move out of the generic proc
readlink/follow link code and into the individual get_link instances.
Otherwise the shared resources checks are not quite correct (shared
files_struct does not require a shared fs_struct), and there are races with
unshare.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 13b41b0949 [PATCH] proc: Use struct pid not struct task_ref
Incrementally update my proc-dont-lock-task_structs-indefinitely patches so
that they work with struct pid instead of struct task_ref.

Mostly this is a straight 1-1 substitution.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman 99f8955183 [PATCH] proc: don't lock task_structs indefinitely
Every inode in /proc holds a reference to a struct task_struct.  If a
directory or file is opened and remains open after the the task exits this
pinning continues.  With 8K stacks on a 32bit machine the amount pinned per
file descriptor is about 10K.

Normally I would figure a reasonable per user process limit is about 100
processes.  With 80 processes, with a 1000 file descriptors each I can trigger
the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless
data.

This patch replaces the struct task_struct pointer with a pointer to a struct
task_ref which has a struct task_struct pointer.  The so the pinning of dead
tasks does not happen.

The code now has to contend with the fact that the task may now exit at any
time.  Which is a little but not muh more complicated.

With this change it takes about 1000 processes each opening up 1000 file
descriptors before I can trigger the OOM killer.  Much better.

[mlp@google.com: task_mmu small fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Paul Jackson <pj@sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Prasanna Meda <mlp@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 8578cea750 [PATCH] proc: make PROC_NUMBUF the buffer size for holding integers as strings
Currently in /proc at several different places we define buffers to hold a
process id, or a file descriptor .  In most of them we use either a hard coded
number or a different define.  Modify them all to use PROC_NUMBUF, so the code
has a chance of being maintained.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 9cc8cbc7f8 [PATCH] simply fix first_tgid
Like the bug Oleg spotted in first_tid there was also a small off by one
error in first_tgid, when a seek was done on the /proc directory.  This
fixes that and changes the code structure to make it a little more obvious
what is going on.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman de7587343b [PATCH] proc: Remove tasklist_lock from proc_pid_lookup() and proc_task_lookup()
Since we no longer need the tasklist_lock for get_task_struct the lookup
methods no longer need the tasklist_lock.

This just depends on my previous patch that makes get_task_struct() rcu
safe.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 454cc105ef [PATCH] proc: Remove tasklist_lock from proc_pid_readdir
We don't need the tasklist_lock to safely iterate through processes
anymore.

This depends on my previous to task patches that make get_task_struct rcu
safe, and that make next_task() rcu safe.  I haven't gotten
first_tid/next_tid yet only because next_thread is missing an
rcu_dereference.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 0bc58a9102 [PATCH] proc: refactor reading directories of tasks
There are a couple of problems this patch addresses.
- /proc/<tgid>/task currently does not work correctly if you stop reading
  in the middle of a directory.

- /proc/ currently requires a full pass through the task list with
  the tasklist lock held, to determine there are no more processes to read.

- The hand rolled integer to string conversion does not properly running
  out of buffer space.

- We seem to be batching reading of pids from the tasklist without reason,
  and complicating the logic of the code.

This patch addresses that by changing how tasks are processed.  A
first_<task_type> function is built that handles restarts, and a
next_<task_type> function is built that just advances to the next task.

first_<task_type> when it detects a restart usually uses find_task_by_pid.  If
that doesn't work because there has been a seek on the directory, or we have
already given a complete directory listing, it first checks the number tasks
of that type, and only if we are under that count does it walk through all of
the tasks to find the one we are interested in.

The code that fills in the directory is simpler because there is only a single
for loop.

The hand rolled integer to string conversion is replaced by snprintf which
should handle the the out of buffer case correctly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman cd6a3ce9ec [PATCH] proc: Close the race of a process dying durning lookup
proc_lookup and task exiting are not synchronized, although some of the
previous code may have suggested that.  Every time before we reuse a dentry
namei.c calls d_op->derevalidate which prevents us from reusing a stale dcache
entry.  Unfortunately it does not prevent us from returning a stale dcache
entry.  This race has been explicitly plugged in proc_pid_lookup but there is
nothing to confine it to just that proc lookup function.

So to prevent the race I call revalidate explictily in all of the proc lookup
functions after I call d_add, and report an error if the revalidate does not
succeed.

Years ago Al Viro did something similar but those changes got lost in the
churn.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman 48e6484d49 [PATCH] proc: Rewrite the proc dentry flush on exit optimization
To keep the dcache from filling up with dead /proc entries we flush them on
process exit.  However over the years that code has gotten hairy with a
dentry_pointer and a lock in task_struct and misdocumented as a correctness
feature.

I have rewritten this code to look and see if we have a corresponding entry in
the dcache and if so flush it on process exit.  This removes the extra fields
in the task_struct and allows me to trivially handle the case of a
/proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 662795deb8 [PATCH] proc: Move proc_maps_operations into task_mmu.c
All of the functions for proc_maps_operations are already defined in
task_mmu.c so move the operations structure to keep the functionality
together.

Since task_nommu.c implements a dummy version of /proc/<pid>/maps give it a
simplified version of proc_maps_operations that it can modify to best suit its
needs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 6e66b52bf5 [PATCH] proc: Fix the link count for /proc/<pid>/task
Use getattr to get an accurate link count when needed.  This is cheaper and
more accurate than trying to derive it by walking the thread list of a
process.

Especially as it happens when needed stat instead of at readdir time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 0f2fe20f55 [PATCH] proc: Properly filter out files that are not visible to a process
Long ago and far away in 2.2 we started checking to ensure the files we
displayed in /proc were visible to the current process.  It was an
unsophisticated time and no one was worried about functions full of FIXMES in
a stable kernel.  As time passed the function became sacred and was enshrined
in the shrine of how things have always been.  The fixes came in but only to
keep the function working no one really remembering or documenting why we did
things that way.

The intent and the functionality make a lot of sense.  Don't let /proc be an
access point for files a process can see no other way.  The implementation
however is completely wrong.

We are currently checking the root directories of the two processes, we are
not checking the actual file descriptors themselves.

We are strangely checking with a permission method instead of just when we use
the data.

This patch fixes the logic to actually check the file descriptors and make a
note that implementing a permission method for this part of /proc almost
certainly indicates a bug in the reasoning.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 22c2c5d75e [PATCH] proc: Kill proc_mem_inode_operations
The inode operations only exist to support the proc_permission function.
Currently mem_read and mem_write have all the same permission checks as
ptrace.  The fs check makes no sense in this context, and we can trivially get
around it by calling ptrace.

So simply the code by killing the strange weird case.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 68602066c3 [PATCH] proc: Remove bogus proc_task_permission
First we can access every /proc/<tgid>/task/<pid> directory as /proc/<pid> so
proc_task_permission is not usefully limiting visibility.

Second having related filesystems information should have nothing to do with
process visibility.  kill does not implement any checks like that.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman aed7a6c476 [PATCH] proc: Replace proc_inode.type with proc_inode.fd
The sole renaming use of proc_inode.type is to discover the file descriptor
number, so just store the file descriptor number and don't wory about
processing this field.  This removes any /proc limits on the maximum number of
file descriptors, and clears the path to make the hard coded /proc inode
numbers go away.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman 87bfbf679f [PATCH] proc: Simplify the ownership rules for /proc
Currently in /proc if the task is dumpable all of files are owned by the tasks
effective users.  Otherwise the files are owned by root.  Unless it is the
/proc/<tgid>/ or /proc/<tgid>/task/<pid> directory in that case we always make
the directory owned by the effective user.

However the special case for directories is pointless except as a way to read
the effective user, because the permissions on both of those directories are
world readable, and executable.

/proc/<tgid>/status provides a much better way to read a processes effecitve
userid, so it is silly to try to provide that on the directory.

So this patch simplifies the code by removing a pointless special case and
gets us one step closer to being able to remove the hard coded /proc inode
numbers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Eric W. Biederman 1679654951 [PATCH] proc: Remove unnecessary and misleading assignments from proc_pid_make_inode
The removed fields are already set by proc_alloc_inode.  Initializing them in
proc_alloc_inode implies they need it for proper cleanup.  At least ei->pde
was not set on all paths making it look like proc_alloc_inode was buggy.  So
just remove the redundant assignments.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Eric W. Biederman ff9724a3f7 [PATCH] proc: Remove useless BKL in proc_pid_readlink
We already call everything except do_proc_readlink outside of the BKL in
proc_pid_followlink, and there appears to be nothing in do_proc_readlink that
needs any special protection.

So remove this leftover from one of the BKL cleanup efforts.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Eric W. Biederman 5634708b5f [PATCH] proc: Fix the .. inode number on /proc/<pid>/fd
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Herbert Xu f05e15b594 [PATCH] nfsd kconfig: select things at the closest tristate instead of bool
I noticed recently that my CONFIG_CRYPTO_MD5 turned into a y again instead
of m.  It turns out that CONFIG_NFSD_V4 is selecting it to be y even though
I've chosen to compile nfsd as a module.

In general when we have a bool sitting under a tristate it is better to
select things you need from the tristate rather than the bool since that
allows the things you select to be modules.

The following patch does it for nfsd.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Hansjoerg Lipp 5024ad4af6 [PATCH] i4l: Gigaset drivers: add IOCTLs to compat_ioctl.h
Add the IOCTLs of the Gigaset drivers to compat_ioctl.h in order to make
them available for 32 bit programs on 64 bit platforms.  Please merge.

Signed-off-by: Hansjoerg Lipp <hjlipp@web.de>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Badari Pulavarty ade1a29e16 [PATCH] ext3: Add "-o bh" option
This patch adds "-o bh" option to force use of buffer_heads.  This option
is needed when we make "nobh" as default - and if we run into problems.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:20 -07:00
Alexey Dobriyan 9637f28f8b [PATCH] reiserfs: remove reiserfs_aio_write()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:19 -07:00
Michael LeMay 4eb582cf1f [PATCH] keys: add a way to store the appropriate context for newly-created keys
Add a /proc/<pid>/attr/keycreate entry that stores the appropriate context for
newly-created keys.  Modify the selinux_key_alloc hook to make use of the new
entry.  Update the flask headers to include a new "setkeycreate" permission
for processes.  Update the flask headers to include a new "create" permission
for keys.  Use the create permission to restrict which SIDs each task can
assign to newly-created keys.  Add a new parameter to the security hook
"security_key_alloc" to indicate whether it is being invoked by the kernel, or
from userspace.  If it is being invoked by the kernel, the security hook
should never fail.  Update the documentation to reflect these changes.

Signed-off-by: Michael LeMay <mdlemay@epoch.ncsc.mil>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Akinobu Mita f116629d03 [PATCH] fs: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under fs/.

Cc: Ian Kent <raven@themaw.net>
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Hans Reiser <reiserfs-dev@namesys.com>
Cc: Urban Widmark <urban@teststation.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Akinobu Mita 1bfba4e8ea [PATCH] core: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B).

Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:17 -07:00
Akinobu Mita 8e13059a37 [PATCH] use list_add_tail() instead of list_add()
This patch converts list_add(A, B.prev) to list_add_tail(A, &B) for
readability.

Acked-by: Karsten Keil <kkeil@suse.de>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Acked-by: Jan Kara <jack@suse.cz>
AOLed-by: David Woodhouse <dwmw2@infradead.org>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:17 -07:00
Andreas Mohr d6e05edc59 spelling fixes
acquired (aquired)
contiguous (contigious)
successful (succesful, succesfull)
surprise (suprise)
whether (weather)
some other misspellings

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-26 18:35:02 +02:00
Ingo Molnar 124a27fe32 [CIFS] Remove calls to to take f_owner.lock
CIFS takes/releases f_owner.lock - why?  It does not change anything in the
fowner state.  Remove this locking.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-26 13:47:59 +00:00
David S. Miller 3d824a46b7 [OPENPROMFS]: Rewrite using in-kernel device tree and seq_file.
We lose property writing functionality for the time being, but
that will be easy to add back.  The code and framework is so
much simpler now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:19:14 -07:00
Steve French cd49b492fe [CIFS] remove some redundant null pointer checks
some of them pointed out by Dave Jones

Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-26 04:22:36 +00:00
Malcolm Parsons fcc18e83e1 [PATCH] uclinux: use PER_LINUX_32BIT in binfmt_flat
binfmt_flat.c calls set_personality with PER_LINUX as the personality.
On the arm architecture this results in the program running in 26bit
usermode.  PER_LINUX_32BIT should be used instead.  This doesn't affect
other architectures that use binfmt_flat.

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 21:04:24 -07:00
Alexey Dobriyan 1e788f8d1a [PATCH] xfs: update ->flush method proto
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 17:43:32 -07:00
Linus Torvalds f36f44de72 Fix NFS2 compile error
Trond had apparently merged the same patch twice, causing a duplicate
include of the "internal.h" file, with resulting obvious confusion.

Tssk.  I'm the only one allowed to send out trees that don't even
compile! Who does this Trond guy think he is?

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 12:30:33 -07:00
Linus Torvalds 1d77062b14 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
  nfs: remove nfs_put_link()
  nfs-build-fix-99
  git-nfs-build-fixes
  Merge branch 'odirect'
  NFS: alloc nfs_read/write_data as direct I/O is scheduled
  NFS: Eliminate nfs_get_user_pages()
  NFS: refactor nfs_direct_free_user_pages
  NFS: remove user_addr, user_count, and pos from nfs_direct_req
  NFS: "open code" the NFS direct write rescheduler
  NFS: Separate functions for counting outstanding NFS direct I/Os
  NLM: Fix reclaim races
  NLM: sem to mutex conversion
  locks.c: add the fl_owner to nlm_compare_locks
  NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
  NFS: Split fs/nfs/inode.c
  NFS: Fix typo in nfs_do_clone_mount()
  NFS: Fix compile errors introduced by referrals patches
  NFSv4: Ensure that referral mounts bind to a reserved port
  NFSv4: A root pathname is sent as a zero component4
  NFSv4: Follow a referral
  ...
2006-06-25 10:54:14 -07:00
Linus Torvalds 25581ad107 Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (244 commits)
  V4L/DVB (4210b): git-dvb: tea575x-tuner build fix
  V4L/DVB (4210a): git-dvb versus matroxfb
  V4L/DVB (4209): Added some BTTV PCI IDs for newer boards
  Fixes some sync issues between V4L/DVB development and GIT
  V4L/DVB (4206): Cx88-blackbird: always set encoder height based on tvnorm->id
  V4L/DVB (4205): Merge tda9887 module into tuner.
  V4L/DVB (4203): Explicitly set the enum values.
  V4L/DVB (4202): allow selecting CX2341x port mode
  V4L/DVB (4200): Disable bitrate_mode when encoding mpeg-1.
  V4L/DVB (4199): Add cx2341x-specific control array to cx2341x.c
  V4L/DVB (4198): Avoid newer usages of obsoleted experimental MPEGCOMP API
  V4L/DVB (4197): Port new MPEG API to saa7134-empress with saa6752hs
  V4L/DVB (4196): Port cx88-blackbird to the new MPEG API.
  V4L/DVB (4193): Update cx2341x fw encoding API doc.
  V4L/DVB (4192): Use control helpers for saa7115, cx25840, msp3400.
  V4L/DVB (4191): Add CX2341X MPEG encoder module.
  V4L/DVB (4190): Add helper functions for control processing to v4l2-common.
  V4L/DVB (4189): Add videodev support for VIDIOC_S/G/TRY_EXT_CTRLS.
  V4L/DVB (4188): Add new MPEG control/ioctl definitions to videodev2.h
  V4L/DVB (4186): Add support for the DNTV Live! mini DVB-T card.
  ...
2006-06-25 10:09:31 -07:00
Hua Zhong f58a1ebb22 [PATCH] remove unlikely(sb) in prune_dcache
likely profiling shows that the following is a miss.

After boot:
[+- ] Type | # True | # False | Function:Filename@Line
+unlikely |     1074|        0  prune_dcache()@:fs/dcache.c@409

After a bonnie++ run:
+unlikely |    66716|    19584  prune_dcache()@:fs/dcache.c@409

So remove it.

Signed-off-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:26 -07:00
Evgeniy Dushistov 7d93a1a53a [PATCH] ext2: cleanup: put_page and comment fix
Things which force me think a little: why so?

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:25 -07:00
Ulrich Drepper 45c9b11a1d [PATCH] Implement AT_SYMLINK_FOLLOW flag for linkat
When the linkat() syscall was added the flag parameter was added in the
last minute but it wasn't used so far.  The following patch should change
that.  My tests show that this is all that's needed.

If OLDNAME is a symlink setting the flag causes linkat to follow the
symlink and create a hardlink with the target.  This is actually the
behavior POSIX demands for link() as well but Linux wisely does not do
this.  With this flag (which will most likely be in the next POSIX
revision) the programmer can choose the behavior, defaulting to the safe
variant.  As a side effect it is now possible to implement a
POSIX-compliant link(2) function for those who are interested.

  touch file
  ln -s file symlink

  linkat(fd, "symlink", fd, "newlink", 0)
    -> newlink is hardlink of symlink

  linkat(fd, "symlink", fd, "newlink", AT_SYMLINK_FOLLOW)
    -> newlink is hardlink of file

The value of AT_SYMLINK_FOLLOW is determined by the definition we already
use in glibc.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:22 -07:00
Frode Isaksen 04a3446c90 [PATCH] fs: sys_poll with timeout -1 bug fix
If you do a poll() call with timeout -1, the wait will be a big number
(depending on HZ) instead of infinite wait, since -1 is passed to the
msecs_to_jiffies function.

Signed-off-by: Frode Isaksen <frode.isaksen@gmail.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:22 -07:00
Al Viro 2b943cf09d [PATCH] fix %s in affs_fill_super()
%s is only valid if array is known to contain NUL or precision is given and
does not exceed the size of array.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:22 -07:00
Serge E. Hallyn fa366ad5d7 [PATCH] kthread: convert smbiod
Update smbiod to use kthread instead of deprecated kernel_thread.

[akpm@osdl.org: cleanup]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:21 -07:00
Eric Sesterhenn 099a71d995 [PATCH] Remove needless checks in fs/9p/vfs_inode.c
coverity found two needless checks in vfs_inode.c (cid #1165 and #1164)
In both cases inode is always NULL when we goto error; either because it
is still initialized to NULL or is set to NULL explicitly. This patch
simply removes these checks to save some code.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@lanl.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:20 -07:00
Miklos Szeredi 9c8ef5614d [PATCH] fuse: scramble lock owner ID
VFS uses current->files pointer as lock owner ID, and it wouldn't be
prudent to expose this value to userspace.  So scramble it with XTEA using
a per connection random key, known only to the kernel.  Only one direction
needs to be implemented, since the ID is never sent in the reverse
direction.

The XTEA algorithm is implemented inline since it's simple enough to do so,
and this adds less complexity than if the crypto API were used.

Thanks to Jesper Juhl for the idea.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:20 -07:00
Miklos Szeredi a4d27e75ff [PATCH] fuse: add request interruption
Add synchronous request interruption.  This is needed for file locking
operations which have to be interruptible.  However filesystem may implement
interruptibility of other operations (e.g.  like NFS 'intr' mount option).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi f9a2842e56 [PATCH] fuse: rename the interrupted flag
Rename the 'interrupted' flag to 'aborted', since it indicates exactly that,
and next patch will introduce an 'interrupted' flag for a

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi 33649c91a3 [PATCH] fuse: ensure FLUSH reaches userspace
All POSIX locks owned by the current task are removed on close().  If the
FLUSH request resulting initiated by close() fails to reach userspace, there
might be locks remaining, which cannot be removed.

The only reason it could fail, is if allocating the request fails.  In this
case use the request reserved for RELEASE, or if that is currently used by
another FLUSH, wait for it to become available.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi 7142125937 [PATCH] fuse: add POSIX file locking support
This patch adds POSIX file locking support to the fuse interface.

This implementation doesn't keep any locking state in kernel.  Unlocking on
close() is handled by the FLUSH message, which now contains the lock owner id.

Mandatory locking is not supported.  The filesystem may enfoce mandatory
locking in userspace if needed.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi bafa96541b [PATCH] fuse: add control filesystem
Add a control filesystem to fuse, replacing the attributes currently exported
through sysfs.  An empty directory '/sys/fs/fuse/connections' is still created
in sysfs, and mounting the control filesystem here provides backward
compatibility.

Advantages of the control filesystem over the previous solution:

  - allows the object directory and the attributes to be owned by the
    filesystem owner, hence letting unpriviled users abort the
    filesystem connection

  - does not suffer from module unload race

[akpm@osdl.org: fix this fs for recent dhowells depredations]
[akpm@osdl.org: fix 64-bit printk warnings]
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi 51eb01e735 [PATCH] fuse: no backgrounding on interrupt
Don't put requests into the background when a fatal interrupt occurs while the
request is in userspace.  This removes a major wart from the implementation.

Backgrounding of requests was introduced to allow breaking of deadlocks.
However now the same can be achieved by aborting the filesystem through the
'abort' sysfs attribute.

This is a change in the interface, but should not cause problems, since these
kinds of deadlocks never happen during normal operation.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Ian Kent f9022f6633 [PATCH] autofs4: need to invalidate children on tree mount expire
I've found a case where invalid dentrys in a mount tree, waiting to be
cleaned up by d_invalidate, prevent the expected expire.

In this case dentrys created during a lookup for which a mount fails or has
no entry in the mount map contribute to the d_count of the parent dentry.
These dentrys may not be invalidated prior to comparing the interanl usage
count of valid autofs dentrys against the dentry d_count which makes a
mount tree appear busy so it doesn't expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:18 -07:00
Peter Staubach 6e656be899 [PATCH] ftruncate does not always update m/ctime
In the course of trying to track down a bug where a file mtime was not
being updated correctly, it was discovered that the m/ctime updates were
not quite being handled correctly for ftruncate() calls.

Quoth SUSv3:

open(2):

        If O_TRUNC is set and the file did previously exist, upon
        successful completion, open() shall mark for update the st_ctime
        and st_mtime fields of the file.

truncate(2):

        Upon successful completion, if the file size is changed, this
        function shall mark for update the st_ctime and st_mtime fields
        of the file, and the S_ISUID and S_ISGID bits of the file mode
        may be cleared.

ftruncate(2):

        Upon successful completion, if fildes refers to a regular file,
        the ftruncate() function shall mark for update the st_ctime and
        st_mtime fields of the file and the S_ISUID and S_ISGID bits of
        the file mode may be cleared. If the ftruncate() function is
        unsuccessful, the file is unaffected.

The open(O_TRUNC) and truncate cases were being handled correctly, but the
ftruncate case was being handled like the truncate case.  The semantics of
truncate and ftruncate don't quite match, so ftruncate needs to be handled
slightly differently.

The attached patch addresses this issue for ftruncate(2).

My thanx to Stephen Tweedie and Trond Myklebust for their help in
understanding the situation and semantics.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Johann Lombardi 92eeccd8ba [PATCH] ext3: cleanup dead code in ext3_add_entry()
The variables nlen and rlen are defined/initialized but not used in
ext3_add_entry().

Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Florin Malita 0710d36a0f [PATCH] 9pfs: missing result check in v9fs_vfs_readlink() and v9fs_vfs_link()
__getname() may fail and return NULL (as pointed out by Coverity 437 &
1220).

Signed-off-by: Florin Malita <fmalita@gmail.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: <rminnich@lanl.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Davide Libenzi 3419b23a91 [PATCH] epoll: use unlocked wqueue operations
A few days ago Arjan signaled a lockdep red flag on epoll locks, and
precisely between the epoll's device structure lock (->lock) and the wait
queue head lock (->lock).

Like I explained in another email, and directly to Arjan, this can't happen
in reality because of the explicit check at eventpoll.c:592, that does not
allow to drop an epoll fd inside the same epoll fd.  Since lockdep is
working on per-structure locks, it will never be able to know of policies
enforced in other parts of the code.

It was decided time ago of having the ability to drop epoll fds inside
other epoll fds, that triggers a very trick wakeup operations (due to
possibly reentrant callback-driven wakeups) handled by the
ep_poll_safewake() function.  While looking again at the code though, I
noticed that all the operations done on the epoll's main structure wait
queue head (->wq) are already protected by the epoll lock (->lock), so that
locked-style functions can be used to manipulate the ->wq member.  This
makes both a lock-acquire save, and lockdep happy.

Running totalmess on my dual opteron for a while did not reveal any problem
so far:

http://www.xmailserver.org/totalmess.c

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:13 -07:00
Valerie Henson 21730eed11 [PATCH] Make EXT2_DEBUG work again
This patch makes EXT2_DEBUG work again.  Due to lack of proper include
file, EXT2_DEBUG was undefined in bitmap.c and ext2_count_free() is left
out.  Moved to balloc.c and removed bitmap.c entirely.

Second, debug versions of ext2_count_free_{inodes/blocks} reacquires
superblock lock.  Moved lock into callers.

Signed-off-by: Val Henson <val_henson@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:12 -07:00
H. Peter Anvin 69755652c9 [PATCH] Make procfs obligatory except under CONFIG_EMBEDDED
Make procfs non-optional unless EMBEDDED is set, just like sysfs.  procfs
is already de facto required for a large subset of Linux functionality.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:11 -07:00
Mingming Cao 43d23f9039 [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversion
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t.  Convert the
rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
and replace the printk format string respondingly.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:10 -07:00
Mingming Cao 1c2bf374a4 [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixes
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
int type, thus limited ext3 filesystem to 8TB (4kblock size based).  While
trying to fix them, it seems quite confusing in the ext3 code where some
blocks are filesystem-wide blocks, some are group relative offsets that need
to be signed value (as -1 has special meaning).  So it seem saner to define
two types of physical blocks: one is filesystem wide blocks, another is
group-relative blocks.  The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
filesystem limit to 8TB.

With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend exts filesystem limit to 16TB.

This work is also a pre-request for the recent >32 bit ext3 work, and makes
the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem block corresponding.

Two RFC with a series patches have been posted to ext2-devel list and have
been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2

http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and
>8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39).  Tests
includes overnight fsx, tiobench, dbench and fsstress.

This patch:

Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.

This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
occurs in the same function where used to be confusing before.  Also include
kernel bug fixes for filesystem wide in-kernel block variables.  There are
some fileystem wide blocks are treated as int/unsigned int type in the kernel
currently, especially in ext3 block allocation and reservation code.  This
patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
long) type.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:10 -07:00
NeilBrown 01408c4939 [PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytes
The problem is that when we write to a file, the copy from userspace to
pagecache is first done with preemption disabled, so if the source address is
not immediately available the copy fails *and* *zeros* *the* *destination*.

This is a problem because a concurrent read (which admittedly is an odd thing
to do) might see zeros rather that was there before the write, or what was
there after, or some mixture of the two (any of these being a reasonable thing
to see).

If the copy did fail, it will immediately be retried with preemption
re-enabled so any transient problem with accessing the source won't cause an
error.

The first copying does not need to zero any uncopied bytes, and doing so
causes the problem.  It uses copy_from_user_atomic rather than copy_from_user
so the simple expedient is to change copy_from_user_atomic to *not* zero out
bytes on failure.

The first of these two patches prepares for the change by fixing two places
which assume copy_from_user_atomic does zero the tail.  The two usages are
very similar pieces of code which copy from a userspace iovec into one or more
page-cache pages.  These are changed to remove the assumption.

The second patch changes __copy_from_user_inatomic* to not zero the tail.
Once these are accepted, I will look at similar patches of other architectures
where this is important (ppc, mips and sparc being the ones I can find).

This patch:

There is a problem with __copy_from_user_inatomic zeroing the tail of the
buffer in the case of an error.  As it is called in atomic context, the error
may be transient, so it results in zeros being written where maybe they
shouldn't be.

In the usage in filemap, this opens a window for a well timed read to see data
(zeros) which is not consistent with any ordering of reads and writes.

Most cases where __copy_from_user_inatomic is called, a failure results in
__copy_from_user being called immediately.  As long as the latter zeros the
tail, the former doesn't need to.  However in *copy_from_user_iovec
implementations (in both filemap and ntfs/file), it is assumed that
copy_from_user_inatomic will zero the tail.

This patch removes that assumption, so that after this patch it will
be safe for copy_from_user_inatomic to not zero the tail.

This patch also adds some commentary to filemap.h and asm-i386/uaccess.h.

After this patch, all architectures that might disable preempt when
kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed".
This includes
 - powerpc
 - i386
 - mips
 - sparc

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:09 -07:00
Theodore Ts'o d2e5b13c4a [PATCH] ext3: remove inconsistent space before exclamation point in mount code
This was reported as Debian bug #336604.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Theodore Ts'o e8f1c6227a [PATCH] ext3: fix memory leak when the journal file is corrupted
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Theodore Ts'o f16fdadba2 [PATCH] ext2: clean up dead code from mount code
The variable i is guaranteed to be the same as db_count given the previous
for loop.  So get rid of it since it's dead code.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Mingming Cao fcd5df3588 [PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystem
If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
CONFIG_LBD not defined in the kernel), the calculation of the disk sector
will overflow.  Add check at ext3_fill_super() and ext3_group_extend() to
prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4
bytes.

Verified this patch on a 32 bit platform without CONFIG_LBD defined
(sector_t is 32 bits long), mount refuse to mount a 10TB ext3.

Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:07 -07:00
Jan Engelhardt a04ee14636 [PATCH] openpromfs: factorize out
"Move" "common code" out to PTR_NOD, which does the conversion from private
pointer to node number.  This is to reduce potential casting/conversion errors
due to redundancy.  (The naming PTR_NOD follows PTR_ERR, turning a pointer
into xyz.)

[akpm@osdl.org: cleanups]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:05 -07:00
Jan Engelhardt 515decdccf [PATCH] openpromfs: remove unnecessary casts
Remove unnecessary casts in fs/openpromfs/inode.c

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:05 -07:00
Jan Engelhardt 0928d68056 [PATCH] openpromfs: fix missing NUL
tchars is not '\0'-terminated so the strtoul may run into problems.  Fix that.
 Also make tchars as big as a long in hexadecimal form would take rather than
just 16.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:05 -07:00
Adrian Bunk 138bb68ac9 [PATCH] fs/ufs/inode.c: make 2 functions static
Make two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov 098d5af7be [PATCH] ufs: ubh_ll_rw_block cleanup
In ufs code there is function: ubh_ll_rw_block, it has parameter how many
ufs_buffer_head it should handle, but it always called with "1" on the place
of this parameter.  This patch removes unused parameter of "ubh_ll_wr_block".

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov ee3ffd6c12 [PATCH] ufs: make fsck -f happy
ufs super block contains some statistic about file systems, like amount of
directories, free blocks, inodes and so on.

UFS1 hold this information in one location and uses 32bit integers for such
information, UFS2 hold statistic in another location and uses 64bit integers.

There is transition variant, if UFS1 has type 44BSD and flags field in super
block has some special value this mean that we work with statistic like UFS2
does.  and this also means that nobody care about old(UFS1) statistic.

So if start fsck against such file system, after usage linux ufs driver, it
found error: at now only UFS1 like statistic is updated.

This patch should fix this.  Also it contains some minor cleanup: CodingSytle
and remove unused variables.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov 577a82752f [PATCH] ufs: fsync implementation
Presently ufs doesn't support "fsync", this make some applications unhappy,
for example vim.  This patch fixes this situation.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov 647b7e87b5 [PATCH] ufs: one way to access super block
Super block of UFS usually has size >512, because of fragment size may be 512,
this cause some problems.

Currently, there are two methods to work with ufs super block:

1) split structure which describes ufs super blocks into structures with
   size <=512

2) use one structure which describes ufs super block, and hope that array
   of "buffer_head" which holds "super block", has such construction:

	bh[n]->b_data + bh[n]->b_size == bh[n + 1]->b_data

The second variant may cause some problems in the future, and usage of two
variants cause unnecessary code duplication.

This patch remove the second variant.  Also patch contains some CodingStyle
fixes.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov f391475812 [PATCH] ufs: missed brelse and wrong baseblk
This patch fixes two bugs, which introduced by previous patches:

1) Missed "brelse"

2) Sometimes "baseblk" may be wrongly calculated, if i_size is equal to
   zero, which lead infinite cycle in "mpage_writepages".

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Andrew Morton 96710b29e0 [PATCH] ufs: printk warning fixes
fs/ufs/super.c: In function `ufs_print_super_stuff':
fs/ufs/super.c:103: warning: unsigned int format, different type arg (arg 2)    fs/ufs/super.c: In function `ufs2_print_super_stuff':                           fs/ufs/super.c:147: warning: unsigned int format, different type arg (arg 2)    fs/ufs/super.c: In function `ufs_print_cylinder_stuff':
fs/ufs/super.c:175: warning: unsigned int format, different type arg (arg 2)

Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:04 -07:00
Evgeniy Dushistov 022a6dc5f4 [PATCH] ufs: zero metadata
Presently if we allocate several "metadata" blocks (pointers to indirect
blocks for example), we fill with zeroes only the first block.  This cause
some problems in "truncate" function.  Also this patch remove some unused
arguments from several functions and add comments.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov 2e006393ba [PATCH] ufs: unlock_super without lock
ufs_free_blocks function looks now in so way:
if (err)
 goto failed;
 lock_super();
failed:
 unlock_super();

So if error happen we'll unlock not locked super.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov 50aa4eb0b9 [PATCH] ufs: i_blocks wrong count
At now UFS code uses DQUOT_* mechanism, but it also update inode->i_blocks
manually, this cause wrong i_blocks value.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov dd187a2603 [PATCH] ufs: little directory lookup optimization
This patch make little optimization of ufs_find_entry like "ext2" does.  Save
number of page and reuse it again in the next call.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov abf5d15fd2 [PATCH] ufs: easy debug
Currently to turn on debug mode "user" has to edit ~10 files, to turn off he
has to do it again.

This patch introduce such changes:
1)turn on(off) debug messages via ".config"
2)remove unnecessary duplication of code
3)make "UFSD" macros more similar to function
4)fix some compiler warnings

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov 5afb3145c9 [PATCH] ufs: Unmark CONFIG_UFS_FS_WRITE as BROKEN
To find new bugs, I suggest revert this patch:
http://lkml.org/lkml/2006/1/31/275 in -mm tree.

So others can test "write support" of UFS.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:03 -07:00
Evgeniy Dushistov 3e41f597b1 [PATCH] ufs: not usual amounts of fragments per block
The writing to UFS file system with block/fragment!=8 may cause bogus
behaviour.  The problem in "ufs_bitmap_search" function, which doesn't work
correctly in "block/fragment!=8" case.  The idea is stolen from BSD code.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov 9695ef16ed [PATCH] ufs: wrong type cast
There are two ugly macros in ufs code:
#define UCPI_UBH ((struct ufs_buffer_head *)ucpi)
#define USPI_UBH ((struct ufs_buffer_head *)uspi)
when uspi looks like
struct {
struct ufs_buffer_head ;
}
and USPI_UBH has some sence,
ucpi looks like
struct {
struct not_ufs_buffer_head;
}

To prevent bugs in future, this patch convert macros to inline function and
fix "ucpi" structure.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov b71034e5e6 [PATCH] ufs: directory and page cache: from blocks to pages
Change function in fs/ufs/dir.c and fs/ufs/namei.c to work with pages
instead of straight work with blocks.  It fixed such bugs:

* for i in `seq 1 1000`; do touch $i; done - crash system
* mkdir create directory without "." and ".." entries

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov 826843a347 [PATCH] ufs: directory and page cache: install aops
This series of patches finished "bugs fixing" mentioned
here http://lkml.org/lkml/2006/1/31/275 .

The main bugs:
* for i in `seq 1 1000`; do touch $i; done - crash system
* mkdir create directory without "." and ".." entries

The suggested solution is work with page cache instead of straight work
with blocks.  Such solution has following advantages

* reduce code size and its complexity
* some global locks go away
* fix bugs

The most part of code is stolen from ext2, because of it has similar
directory structure.

Patches testes with UFS1 and UFS2 file systems.

This patch installs i_mapping->a_ops for directory inodes and removes some
duplicated code.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:02 -07:00
Evgeniy Dushistov 6ef4d6bf86 [PATCH] ufs: change block number on the fly
First of all some necessary notes about UFS by it self: To avoid waste of disk
space the tail of file consists not from blocks (which is ordinary big enough,
16K usually), it consists from fragments(which is ordinary 2K).  When file is
growing its tail occupy 1 fragment, 2 fragments...  At some stage decision to
allocate whole block is made and all fragments are moved to one block.

How this situation was handled before:

  ufs_prepare_write
  ->block_prepare_write
    ->ufs_getfrag_block
      ->...
        ->ufs_new_fragments:

	bh = sb_bread
	bh->b_blocknr = result + i;
	mark_buffer_dirty (bh);

This is wrong solution, because:

- it didn't take into consideration that there is another cache: "inode page
  cache"

- because of sb_getblk uses not b_blocknr, (it uses page->index) to find
  certain block, this breaks sb_getblk.

How this situation is handled now: we go though all "page inode cache", if
there are no such page in cache we load it into cache, and change b_blocknr.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Evgeniy Dushistov c9a27b5dca [PATCH] ufs: right block allocation
* After block allocation, we map it on the same "address" as 8 others
  blocks

* We nullify block several times: once in ufs/block.c and once in
  block_*write_full_page, and use different "caches" for this.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Evgeniy Dushistov 2061df0f89 [PATCH] ufs: ufs_trunc_indirect: infinite cycle
Currently, ufs write support have two sets of problems: work with files and
work with directories.

This series of patches should solve the first problem.

This patch is similar to http://lkml.org/lkml/2006/1/17/61 this patch
complements it.

The situation the same: in ufs_trunc_(not direct), we read block, check if
count of links to it is equal to one, if so we finish cycle, if not
continue.  Because of "count of links" always >=2 this operation cause
infinite cycle and hang up the kernel.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Cliff Wickman 552c03483e [PATCH] fs/freevxfs: cleanup of spelling errors
Fix of some spelling errors in fs/freevxfs error messages and comments

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Steve French f90f00a358 [CIFS] Fix compile warning when CONFIG_CIFS_EXPERIMENTAL is off
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-25 15:59:32 +00:00
Steve French bbe5d235ee Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2006-06-25 15:57:32 +00:00
Alexey Dobriyan 9bf2aa129a nfs: remove nfs_put_link()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-25 06:39:35 -04:00
Andrew Morton 6ab86aa130 nfs-build-fix-99
fs/built-in.o:(__param+0x20): undefined reference to `nfs_idmap_cache_timeout'
fs/built-in.o:(__param+0x48): undefined reference to `nfs_callback_set_tcpport'

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: Chuck Lever <cel@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Manoj Naik <manoj@almaden.ibm.com>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-25 06:38:47 -04:00
Andrew Morton d75d54147d git-nfs-build-fixes
Fix various problems with nfs4 disabled.  And various other things.

In file included from fs/nfs/inode.c:50:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
fs/nfs/inode.c: In function 'init_once':
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'open_states'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation_state'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'rwsem'
distcc[26452] ERROR: compile fs/nfs/inode.c on g5/64 failed
make[1]: *** [fs/nfs/inode.o] Error 1
make: *** [fs/nfs/inode.o] Error 2
make: *** Waiting for unfinished jobs....
In file included from fs/nfs/nfs3xdr.c:26:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26486] ERROR: compile fs/nfs/nfs3xdr.c on g5/64 failed
make[1]: *** [fs/nfs/nfs3xdr.o] Error 1
make: *** [fs/nfs/nfs3xdr.o] Error 2
In file included from fs/nfs/nfs3proc.c:24:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26469] ERROR: compile fs/nfs/nfs3proc.c on bix/32 failed
make[1]: *** [fs/nfs/nfs3proc.o] Error 1
make: *** [fs/nfs/nfs3proc.o] Error 2
**FAILED**

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: Chuck Lever <cel@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Manoj Naik <manoj@almaden.ibm.com>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-25 06:38:11 -04:00
Trond Myklebust ccf01ef7aa Merge branch 'odirect' 2006-06-25 06:27:31 -04:00
Andrew Morton 1b77c54ee1 V4L/DVB (3809a): Remove compat stuff for DMX_GET_EVENT
The ioctl were removed by:
V4L/DVB (3727): Remove DMX_GET_EVENT and associated data structures
due to the ioctl DMX_GET_EVENT has never been implemented, and also
scrambling events can't be generated in a useful way by the hardware.

This patch removes the corresponding entry at fs/compat_ioctl.c

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-06-25 01:58:10 -03:00
Chuck Lever 82b145c5a5 NFS: alloc nfs_read/write_data as direct I/O is scheduled
Re-arrange the logic in the NFS direct I/O path so that nfs_read/write_data
structs are allocated just before they are scheduled, rather than
allocating them all at once before we start scheduling requests.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever 06cf6f2ed0 NFS: Eliminate nfs_get_user_pages()
Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
likely to fail if the I/O is large enough to require the allocation of more
than a single page to keep track of all the pinned pages in the user's
buffer.

Instead of tracking one large page array per dreq/iocb, track pages per
nfs_read/write_data, just like the cached I/O path does.  An array for
pages is already allocated for us by nfs_readdata_alloc() (and the write
and commit equivalents).

This is also required for adding support for vectored I/O to the NFS direct
I/O path.

The original reason to pin the user buffer and allocate all the NFS data
structures before trying to schedule I/O was to ensure all needed resources
are allocated on the client before starting to send requests.  This reduces
the chance that resource exhaustion on the client will cause a short read
or write.

On the other hand, for an application making very large application I/O
requests, this means that it will be nearly impossible for the application
to make forward progress on a resource-limited client.

Thus, moving the buffer pinning functionality into the I/O scheduling
loops should be good for scalability.  The next patch will do the same for
NFS data structure allocation.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever 9c93ab7dff NFS: refactor nfs_direct_free_user_pages
Clean-up and fix a minor bug: the logic was dirtying page cache pages on
both read and write operations.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever 51a7bc6cae NFS: remove user_addr, user_count, and pos from nfs_direct_req
Make the user_addr, user_count, and pos parameters explicit to the
scheduler routines, and remove the fields from nfs_direct_req.  The
iovec API will be passing in a series of these, not just one set.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:39 -04:00
Chuck Lever fedb595c66 NFS: "open code" the NFS direct write rescheduler
An NFSv3/v4 client must reschedule on-the-wire writes if the writes are
UNSTABLE, and the server reboots before the client can complete a
subsequent COMMIT request.

To support direct asynchronous scatter-gather writes, the write
rescheduler in fs/nfs/direct.c must not depend on the I/O parameters
in the controlling nfs_direct_req structure.  iovecs can be somewhat
arbitrarily complex, so there could be an unbounded amount of information
to save for a rarely encountered requirement.

Refactor the direct write rescheduler so it uses information from each
nfs_write_data structure to reschedule writes, instead of caching that
information in the controlling nfs_direct_req structure.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:38 -04:00
Chuck Lever b1c5921c5b NFS: Separate functions for counting outstanding NFS direct I/Os
Factor out the logic that increments and decrements the outstanding I/O
count.  This will be a commonly used bit of code in upcoming patches.
Also make this an atomic_t again, since it will be very often manipulated
outside dreq->spin lock.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-24 13:11:38 -04:00
Trond Myklebust 816724e65c Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	fs/nfs/inode.c
	fs/super.c

Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'
2006-06-24 13:07:53 -04:00
Jens Axboe 9e94cd4fd1 [PATCH] splice: retrieve mapping after locking the page
Otherwise we could be racing with truncate/mapping removal.

Problem found/fixed by Nick Piggin <npiggin@suse.de>, logic rewritten
by me.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Jens Axboe b31dc66a54 [PATCH] Kill PF_SYNCWRITE flag
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
Mike Miller 98bd34eaf1 [PATCH] make kernel warn about incorrectly sized partitions
Sometimes partitions claim to be larger than the reported capacity of a
disk device.  This patch makes the kernel warn about those partitions.

We still permit these patitions to be used.  Quoting Andries Brouwer
<Andries.Brouwer@cwi.nl>:

 Case 1: The kernel is mistaken about the size of the disk.  (There are
 commands to clip a disk to a certain capacity, there are jumpers to tell a
 disk that it should report a certain capacity etc.  Usually this is because
 of BIOS bugs.  In bad cases the machine will crash in the BIOS and hence fail
 to boot if the disk reports full capacity.) In such cases actually accessing
 the blocks of the partition may work fine, or may work fine after running an
 unclip utility.  I wrote "setmax" some years ago precisely for this reason.

 Case 2: There was a messy partition table (maybe just a rounding error) but
 the actual filesystem on the partition is contained in the physical disk.
 Now using the filesystem goes without problem.

 Case 3: Both partition and filesystem extend beyond the end of the disk.  In
 forensic or debugging situations one often uses a copy of the start of a
 disk.  Now access beyond the end gives an expected I/O error.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Stephen Cameron <steve.cameron@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:09 -07:00
Jan Kara 78ce89c92b [PATCH] JBD: split checkpoint lists
Split the checkpoint list of the transaction into two lists.  In the first
list we keep the buffers that need to be submitted for IO.  In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete.  This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:08 -07:00
Oleg Nesterov 626ab0e69d [PATCH] list: use list_replace_init() instead of list_splice_init()
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1.  We can use list_replace_init() instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:07 -07:00
Mingming Cao 0216bfcffe [PATCH] percpu counter data type changes to suppport more than 2**31 ext3 free blocks counter
The percpu counter data type are changed in this set of patches to support
more users like ext3 who need more than 32 bit to store the free blocks
total in the filesystem.

- Generic perpcu counters data type changes.  The size of the global counter
  and local counter were explictly specified using s64 and s32.  The global
  counter is changed from long to s64, while the local counter is changed from
  long to s32, so we could avoid doing 64 bit update in most cases.

- Users of the percpu counters are updated to make use of the new
  percpu_counter_init() routine now taking an additional parameter to allow
  users to pass the initial value of the global counter.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:06 -07:00
Jesper Juhl 785d55708c [PATCH] binflt_elf: remove more casts
Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Jesper Juhl f4e5cc2c44 [PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless casts
Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless
casts of kmalloc() return values in the same file.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Andrew Morton e6022603b9 [PATCH] ext3_clear_inode(): avoid kfree(NULL)
Steven Rostedt <rostedt@goodmis.org> points out that `rsv' here is usually
NULL, so we should avoid calling kfree().

Also, fix up some nearby whitespace damage.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Andrew Morton 304c4c841a [PATCH] jbd: avoid kfree(NULL)
There are a couple of places where JBD has to check to see whether an unneeded
memory allocation was performed.  Usually it _was_ needed, so we end up
calling kfree(NULL).  We can micro-optimise that by checking the pointer
before calling kfree().

Thanks to Steven Rostedt <rostedt@goodmis.org> for identifying this.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:05 -07:00
Jan Kara 9ada734098 [PATCH] jbd: fix BUG in journal_commit_transaction()
Fix possible assertion failure in journal_commit_transaction() on
jh->b_next_transaction == NULL (when we are processing BJ_Forget list and
buffer is not jbddirty).

!jbddirty buffers can be placed on BJ_Forget list for example by
journal_forget() or by __dispose_buffer() - generally such buffer means
that it has been freed by this transaction.

Freed buffers should not be reallocated until the transaction has committed
(that's why we have the assertion there) but they *can* be reallocated when
the transaction has already been committed to disk and we are just
processing the BJ_Forget list (as soon as we remove b_committed_data from
the bitmap bh, ext3 will be able to reallocate buffers freed by the
committing transaction).  So we have to also count with the case that the
buffer has been reallocated and b_next_transaction has been already set.

And one more subtle point: it can happen that we manage to reallocate the
buffer and also mark it jbddirty.  Then we also add the freed buffer to the
checkpoint list of the committing trasaction.  But that should do no harm.

Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata
list.  It can actually happen that we refile such buffers during the commit
phase when we reallocate in the running transaction blocks deleted in
committing transaction (and that can happen if the committing transaction
already wrote all the data and is just cleaning up BJ_Forget list).

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:04 -07:00
Vadim Lobanov 4a4b69f79b [PATCH] Poll cleanups/microoptimizations
The "count" and "pt" variables are declared and modified by do_poll(), as
well as accessed and written indirectly in the do_pollfd() subroutine.

This patch pulls all handling of these variables into the do_poll()
function, thereby eliminating the odd use of indirection in do_pollfd().
This is done by pulling the "struct pollfd" traversal loop from do_pollfd()
into its only caller do_poll().  As an added bonus, the patch saves a few
clock cycles, and also adds comments to make the code easier to follow.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:03 -07:00
Adrian Bunk 2da1326463 [PATCH] fs/fat/misc.c: unexport fat_sync_bhs
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:03 -07:00
Adrian Bunk b0904e147f [PATCH] fs/locks.c: make posix_locks_deadlock() static
We can now make posix_locks_deadlock() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:03 -07:00
Miklos Szeredi 75e1fcc0b1 [PATCH] vfs: add lock owner argument to flush operation
Pass the POSIX lock owner ID to the flush operation.

This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally.  FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.

Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Miklos Szeredi ff7b86b820 [PATCH] locks: clean up locks_remove_posix()
locks_remove_posix() can use posix_lock_file() instead of doing the lock
removal by hand.  posix_lock_file() now does exacly the same.

The comment about pids no longer applies, posix_lock_file() takes only the
owner into account.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Miklos Szeredi 39005d022a [PATCH] locks: don't do unnecessary allocations
posix_lock_file() always allocates new locks in advance, even if it's easy to
determine that no allocations will be needed.

Optimize these cases:

 - FL_ACCESS flag is set

 - Unlocking the whole range

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Miklos Szeredi 0d9a490abe [PATCH] locks: don't unnecessarily fail posix lock operations
posix_lock_file() was too cautious, failing operations on OOM, even if they
didn't actually require an allocation.

This has the disadvantage, that a failing unlock on process exit could lead to
a memory leak.  There are two possibilites for this:

- filesystem implements .lock() and calls back to posix_lock_file().  On
cleanup of files_struct locks_remove_posix() is called which should remove all
locks belonging to files_struct.  However if filesystem calls
posix_lock_file() which fails, then those locks will never be freed.

- if a file is closed while a lock is blocked, then after acquiring
fcntl_setlk() will undo the lock.  But this unlock itself might fail on OOM,
again possibly leaking the lock.

The solution is to move the checking of the allocations until after it is sure
that they will be needed.  This will solve the above problem since unlock will
always succeed unless it splits an existing region.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Pekka Enberg 090d2b185d [PATCH] read_mapping_page for address space
Add read_mapping_page() which is used for callers that pass
mapping->a_ops->readpage as the filler for read_cache_page.  This removes
some duplication from filesystem code.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:02 -07:00
Al Viro 0c426f26cc [PATCH] ext2 XIP won't build without MMU
Disable Ext2 XIP if the kernel is configured in no-MMU mode as the former
won't build.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:55 -07:00
Al Viro 530018bf3d [PATCH] frv: binfmt_elf_fdpic __user annotations
Add __user annotations to binfmt_elf_fdpic.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:54 -07:00
James Morris 03e6806063 [PATCH] lsm: add task_setioprio hook
Implement an LSM hook for setting a task's IO priority, similar to the hook
for setting a tasks's nice value.

A previous version of this LSM hook was included in an older version of
multiadm by Jan Engelhardt, although I don't recall it being submitted
upstream.

Also included is the corresponding SELinux hook, which re-uses the setsched
permission in the proccess class.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:53 -07:00
OGAWA Hirofumi 111ebb6e6f [PATCH] writeback: fix range handling
When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request.  Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".

To make all this sane, the patch changes range of writeback_control.

So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.

And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.

This patch does,

    - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
      -1 is usually ok for range_end (type is long long). But, if someone did,

		range_end += val;		range_end is "val - 1"
		u64val = range_end >> bits;	u64val is "~(0ULL)"

      or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
      things, and uses LLONG_MAX for range_end.

    - All callers of ->writepages() sets range_start/end or range_cyclic.

    - Fix updates of ->writeback_index. It seems already bit strange.
      If it starts at 0 and ended by check of nr_to_write, this last
      index may reduce chance to scan end of file.  So, this updates
      ->writeback_index only if range_cyclic is true or whole-file is
      scanned.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:49 -07:00
Chen, Kenneth W a43a8c39bb [PATCH] tightening hugetlb strict accounting
Current hugetlb strict accounting for shared mapping always assume mapping
starts at zero file offset and reserves pages between zero and size of the
file.  This assumption often reserves (or lock down) a lot more pages then
necessary if application maps at none zero file offset.  libhugetlbfs is
one example that requires proper reservation on shared mapping starts at
none zero offset.

This patch extends the reservation and hugetlb strict accounting to support
any arbitrary pair of (offset, len), resulting a much more robust and
accurate scheme.  More importantly, it won't lock down any hugetlb pages
outside file mapping.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
KAMEZAWA Hiroyuki 6f0419e06a [PATCH] for_each_possible_cpu: xfs
for_each_cpu() actually iterates across all possible CPUs.  We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs.  This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu.
in xfs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells d6938d1b27 [PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the results
Enable XFS to limit the statfs() results to the project quota covering the
dentry used as a base for call.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells 726c334223 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells 454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Steve French 189acaaef8 [CIFS] Enable sec flags on mount for cifs (part one)
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-06-23 02:33:48 +00:00