Commit Graph

11274 Commits

Author SHA1 Message Date
Rusty Russell cb78a0ce69 bitmap: fix seq_bitmap and seq_cpumask to take const pointer
Impact: cleanup

seq_bitmap just calls bitmap_scnprintf on the bits: that arg can be const.
Similarly, seq_cpumask just calls seq_bitmap.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:05:14 +10:30
Jens Axboe d3f761104b bio: get rid of bio_vec clearing
We don't need to clear the memory used for adding bio_vec entries,
since nobody should be looking at members unitialized. Any valid
use should be below bio->bi_vcnt, and that members up until that count
must be valid since they were added through bio_add_page().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:53 +01:00
Jens Axboe b3a6ffe16b Get rid of CONFIG_LSF
We have two seperate config entries for large devices/files. One
is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF
that handles large files. This doesn't make a lot of sense, you typically
want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording
to indicate that it covers both.

Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:51 +01:00
Jens Axboe abf137dd77 aio: make the lookup_ioctx() lockless
The mm->ioctx_list is currently protected by a reader-writer lock,
so we always grab that lock on the read side for doing ioctx
lookups. As the workload is extremely reader biased, turn this into
an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
the rwlock and use a spinlock for providing update side exclusion.

There's usually only 1 entry on this list, so it doesn't make sense
to look into fancier data structures.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:50 +01:00
Jens Axboe 392ddc3298 bio: add support for inlining a number of bio_vecs inside the bio
When we go and allocate a bio for IO, we actually do two allocations.
One for the bio itself, and one for the bi_io_vec that holds the
actual pages we are interested in.

This feature inlines a definable amount of io vecs inside the bio
itself, so we eliminate the bio_vec array allocation for IO's up
to a certain size. It defaults to 4 vecs, which is typically 16k
of IO.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:50 +01:00
Jens Axboe bb799ca020 bio: allow individual slabs in the bio_set
Instead of having a global bio slab cache, add a reference to one
in each bio_set that is created. This allows for personalized slabs
in each bio_set, so that they can have bios of different sizes.

This means we can personalize the bios we return. File systems may
want to embed the bio inside another structure, to avoid allocation
more items (and stuffing them in ->bi_private) after the get a bio.
Or we may want to embed a number of bio_vecs directly at the end
of a bio, to avoid doing two allocations to return a bio. This is now
possible.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:23 +01:00
Jens Axboe 1b43449869 bio: move the slab pointer inside the bio_set
In preparation for adding differently sized bios.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:28:46 +01:00
Jens Axboe 7ff9345ffa bio: only mempool back the largest bio_vec slab cache
We only very rarely need the mempool backing, so it makes sense to
get rid of all but one of the mempool in a bio_set. So keep the
largest bio_vec count mempool so we can always honor the largest
allocation, and "upgrade" callers that fail.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:28:46 +01:00
Keith Mannthey 08bafc0341 block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set
Allow the scsi request REQ_QUIET flag to be propagated to the buffer
file system layer. The basic ideas is to pass the flag from the scsi
request to the bio (block IO) and then to the buffer layer.  The buffer
layer can then suppress needless printks.

This patch declutters the kernel log by removed the 40-50 (per lun)
buffer io error messages seen during a boot in my multipath setup . It
is a good chance any real errors will be missed in the "noise" it the
logs without this patch.

During boot I see blocks of messages like
"
__ratelimit: 211 callbacks suppressed
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242847
Buffer I/O error on device sdm, logical block 1
Buffer I/O error on device sdm, logical block 5242878
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242879
Buffer I/O error on device sdm, logical block 5242872
"
in my logs.

My disk environment is multipath fiber channel using the SCSI_DH_RDAC
code and multipathd.  This topology includes an "active" and "ghost"
path for each lun. IO's to the "ghost" path will never complete and the
SCSI layer, via the scsi device handler rdac code, quick returns the IOs
to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi
layer messages.

 I am wanting to extend the QUIET behavior to include the buffer file
system layer to deal with these errors as well. I have been running this
patch for a while now on several boxes without issue.  A few runs of
bonnie++ show no noticeable difference in performance in my setup.

Thanks for John Stultz for the quiet_error finalization.

Submitted-by:  Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:28:44 +01:00
Lachlan McIlroy 0a8c5395f9 [XFS] Fix merge failures
Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

Conflicts:

	fs/xfs/linux-2.6/xfs_cred.h
	fs/xfs/linux-2.6/xfs_globals.h
	fs/xfs/linux-2.6/xfs_ioctl.c
	fs/xfs/xfs_vnodeops.h

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-29 16:47:18 +11:00
Linus Torvalds 3c92ec8ae9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
  powerpc/44x: Support 16K/64K base page sizes on 44x
  powerpc: Force memory size to be a multiple of PAGE_SIZE
  powerpc/32: Wire up the trampoline code for kdump
  powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
  powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
  powerpc/32: Setup OF properties for kdump
  powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
  powerpc: Prepare xmon_save_regs for use with kdump
  powerpc: Remove default kexec/crash_kernel ops assignments
  powerpc: Make default kexec/crash_kernel ops implicit
  powerpc: Setup OF properties for ppc32 kexec
  powerpc/pseries: Fix cpu hotplug
  powerpc: Fix KVM build on ppc440
  powerpc/cell: add QPACE as a separate Cell platform
  powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
  powerpc/mpc5200: fix error paths in PSC UART probe function
  powerpc/mpc5200: add rts/cts handling in PSC UART driver
  powerpc/mpc5200: Make PSC UART driver update serial errors counters
  powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
  powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
  ...

Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-28 16:54:33 -08:00
Stephen Rothwell bf66542bef cifs: update for new IP4/6 address printing
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-28 16:29:58 -08:00
Linus Torvalds 0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
Linus Torvalds 54a696bd07 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (31 commits)
  [CIFS] Remove redundant test
  [CIFS] make sure that DFS pathnames are properly formed
  Remove an already-checked error condition in SendReceiveBlockingLock
  Streamline SendReceiveBlockingLock: Use "goto out:" in an error condition
  Streamline SendReceiveBlockingLock: Use "goto out:" in an error condition
  [CIFS] Streamline SendReceive[2] by using "goto out:" in an error condition
  Slightly streamline SendReceive[2]
  Check the return value of cifs_sign_smb[2]
  [CIFS] Cleanup: Move the check for too large R/W requests
  [CIFS] Slightly simplify wait_for_free_request(), remove an unnecessary "else" branch
  Simplify allocate_mid() slightly: Remove some unnecessary "else" branches
  [CIFS] In SendReceive, move consistency check out of the mutexed region
  cifs: store password in tcon
  cifs: have calc_lanman_hash take more granular args
  cifs: zero out session password before freeing it
  cifs: fix wait_for_response to time out sleeping processes correctly
  [CIFS] Can not mount with prefixpath if root directory of share is inaccessible
  [CIFS] various minor cleanups pointed out by checkpatch script
  [CIFS] fix typo
  [CIFS] remove sparse warning
  ...

Fix trivial conflict in fs/cifs/cifs_fs_sb.h due to comment changes for
the CIFS_MOUNT_xyz bit definitions between cifs updates and security
updates.
2008-12-28 12:37:14 -08:00
Linus Torvalds 1db2a5c11e Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (85 commits)
  [S390] provide documentation for hvc_iucv kernel parameter.
  [S390] convert ctcm printks to dev_xxx and pr_xxx macros.
  [S390] convert zfcp printks to pr_xxx macros.
  [S390] convert vmlogrdr printks to pr_xxx macros.
  [S390] convert zfcp dumper printks to pr_xxx macros.
  [S390] convert cpu related printks to pr_xxx macros.
  [S390] convert qeth printks to dev_xxx and pr_xxx macros.
  [S390] convert sclp printks to pr_xxx macros.
  [S390] convert iucv printks to dev_xxx and pr_xxx macros.
  [S390] convert ap_bus printks to pr_xxx macros.
  [S390] convert dcssblk and extmem printks messages to pr_xxx macros.
  [S390] convert monwriter printks to pr_xxx macros.
  [S390] convert s390 debug feature printks to pr_xxx macros.
  [S390] convert monreader printks to pr_xxx macros.
  [S390] convert appldata printks to pr_xxx macros.
  [S390] convert setup printks to pr_xxx macros.
  [S390] convert hypfs printks to pr_xxx macros.
  [S390] convert time printks to pr_xxx macros.
  [S390] convert cpacf printks to pr_xxx macros.
  [S390] convert cio printks to pr_xxx macros.
  ...
2008-12-28 12:33:21 -08:00
Linus Torvalds a39b863342 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  sched: fix warning in fs/proc/base.c
  schedstat: consolidate per-task cpu runtime stats
  sched: use RCU variant of list traversal in for_each_leaf_rt_rq()
  sched, cpuacct: export percpu cpuacct cgroup stats
  sched, cpuacct: refactoring cpuusage_read / cpuusage_write
  sched: optimize update_curr()
  sched: fix wakeup preemption clock
  sched: add missing arch_update_cpu_topology() call
  sched: let arch_update_cpu_topology indicate if topology changed
  sched: idle_balance() does not call load_balance_newidle()
  sched: fix sd_parent_degenerate on non-numa smp machine
  sched: add uid information to sched_debug for CONFIG_USER_SCHED
  sched: move double_unlock_balance() higher
  sched: update comment for move_task_off_dead_cpu
  sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares
  sched/rt: removed unneeded defintion
  sched: add hierarchical accounting to cpu accounting controller
  sched: include group statistics in /proc/sched_debug
  sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER
  sched: clean up SCHED_CPUMASK_ALLOC
  ...
2008-12-28 12:27:58 -08:00
Linus Torvalds b0f4b285d7 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
  sched, trace: update trace_sched_wakeup()
  tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
  Revert "x86: disable X86_PTRACE_BTS"
  ring-buffer: prevent false positive warning
  ring-buffer: fix dangling commit race
  ftrace: enable format arguments checking
  x86, bts: memory accounting
  x86, bts: add fork and exit handling
  ftrace: introduce tracing_reset_online_cpus() helper
  tracing: fix warnings in kernel/trace/trace_sched_switch.c
  tracing: fix warning in kernel/trace/trace.c
  tracing/ring-buffer: remove unused ring_buffer size
  trace: fix task state printout
  ftrace: add not to regex on filtering functions
  trace: better use of stack_trace_enabled for boot up code
  trace: add a way to enable or disable the stack tracer
  x86: entry_64 - introduce FTRACE_ frame macro v2
  tracing/ftrace: add the printk-msg-only option
  tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
  x86, bts: correctly report invalid bts records
  ...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
2008-12-28 12:21:10 -08:00
KOSAKI Motohiro 26ddd8d5ca proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
Impact: cleanup

irq_desc can be NULL when CONFIG_SPARSE_IRQ=y only.
therefore, NULL checking can move into kstat_irqs_cpu() of SPARSE_IRQ version.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: "Yinghai Lu" <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26 09:48:18 +01:00
Julia Lawall 359d67d6ad [CIFS] Remove redundant test
In fs/cifs/cifssmb.c, pLockData is tested for being NULL at the beginning
of the function, and not reassigned subsequently.

A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:13 +00:00
Steve French c6fbba0546 [CIFS] make sure that DFS pathnames are properly formed
The paths in a DFS request are supposed to only have a single preceding
backslash, but we are sending them with a double backslash. This is
exposing a bug in Windows where it also sends a path in the response
that has a double backslash.

The existing code that builds the mount option string however expects a
double backslash prefix in a couple of places when it tries to use the
path returned by build_path_from_dentry. Fix compose_mount_options to
expect properly formed DFS paths (single backslash at front).

Also clean up error handling in that function. There was a possible
NULL pointer dereference and situations where a partially built option
string would be returned.

Tested against Samba 3.0.28-ish server and Samba 3.3 and Win2k8.

CC: Stable <stable@kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:13 +00:00
Volker Lendecke ac6a3ef405 Remove an already-checked error condition in SendReceiveBlockingLock
Remove an already-checked error condition in SendReceiveBlockingLock

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:13 +00:00
Volker Lendecke 698e96a826 Streamline SendReceiveBlockingLock: Use "goto out:" in an error condition
Streamline SendReceiveBlockingLock: Use "goto out:" in an error condition

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:13 +00:00
Volker Lendecke 17c8bfed8a Streamline SendReceiveBlockingLock: Use "goto out:" in an error condition
Streamline SendReceiveBlockingLock: Use "goto out:" in an error condition

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Steve French 2b2bdfba7a [CIFS] Streamline SendReceive[2] by using "goto out:" in an error condition
Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Volker Lendecke 8e4f2e8a1e Slightly streamline SendReceive[2]
Slightly streamline SendReceive[2]

Remove an else branch by naming the error condition what it is

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Volker Lendecke 829049cbb1 Check the return value of cifs_sign_smb[2]
Check the return value of cifs_sign_smb[2]

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Steve French 4c3130efda [CIFS] Cleanup: Move the check for too large R/W requests
This avoids an unnecessary else branch

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Volker Lendecke 27a97a613b [CIFS] Slightly simplify wait_for_free_request(), remove an unnecessary "else" branch
This is no functional change, because in the "if" branch we do an early
"return 0;".

Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Volker Lendecke 8fbbd365cc Simplify allocate_mid() slightly: Remove some unnecessary "else" branches
Simplify allocate_mid() slightly: Remove some unnecessary "else" branches

Signed-off-by: Volker Lendecke <vl@samba.org>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:12 +00:00
Volker Lendecke 6d9c6d5431 [CIFS] In SendReceive, move consistency check out of the mutexed region
inbuf->smb_buf_length does not change in in wait_for_free_request() or in
allocate_mid(), so we can check it early.

Signed-off-by: Volker Lendecke <vl@samba.org>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:11 +00:00
Jeff Layton 00e485b019 cifs: store password in tcon
cifs: store password in tcon

Each tcon has its own password for share-level security. Store it in
the tcon and wipe it clean and free it when freeing the tcon. When
doing the tree connect with share-level security, use the tcon password
instead of the session password.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:11 +00:00
Jeff Layton 4e53a3fb98 cifs: have calc_lanman_hash take more granular args
cifs: have calc_lanman_hash take more granular args

We need to use this routine to encrypt passwords associated with the
tcon too. Don't assume that the password will be attached to the
smb_session.

Also, make some of the values in the lower encryption functions
const since they aren't changed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:11 +00:00
Jeff Layton 55162dec93 cifs: zero out session password before freeing it
cifs: zero out session password before freeing it

...just to be on the safe side.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:11 +00:00
Jeff Layton 8570552425 cifs: fix wait_for_response to time out sleeping processes correctly
cifs: fix wait_for_response to time out sleeping processes correctly

The current scheme that CIFS uses to sleep and wait for a response is
not quite what we want. After sending a request, wait_for_response puts
the task to sleep with wait_event(). One of the conditions for
wait_event is a timeout (using time_after()).

The problem with this is that there is no guarantee that the process
will ever be woken back up. If the server stops sending data, then
cifs_demultiplex_thread will leave its response queue sleeping.

I think the only thing that saves us here is the fact that
cifs_dnotify_thread periodically (every 15s) wakes up sleeping processes
on all response_q's that have calls in flight.  This makes for
unnecessary wakeups of some processes. It also means large variability
in the timeouts since they're all woken up at once.

Instead of this, put the tasks to sleep with wait_event_timeout. This
makes them wake up on their own if they time out. With this change,
cifs_dnotify_thread should no longer be needed.

I've been testing this in conjunction with some other patches that I'm
working on. It doesn't seem to affect performance at all with with heavy
I/O. Identical iozone -ac runs complete in almost exactly the same time
(<1% difference in times).

Thanks to Wasrshi Nimara for initially pointing this out. Wasrshi, it
would be nice to know whether this patch also helps your testcase.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Wasrshi Nimara <warshinimara@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:11 +00:00
Steve French 8be0ed44c2 [CIFS] Can not mount with prefixpath if root directory of share is inaccessible
Windows allows you to deny access to the top of a share, but permit access to
a directory lower in the path.  With the prefixpath feature of cifs
(ie mounting \\server\share\directory\subdirectory\etc.) this should have
worked if the user specified a prefixpath which put the root of the mount
at a directory to which he had access, but we still were doing a lookup
on the root of the share (null path) when we should have been doing it on
the prefixpath subdirectory.

This fixes Samba bug # 5925

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:11 +00:00
Steve French 61e7480158 [CIFS] various minor cleanups pointed out by checkpatch script
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Steve French 3de2091ac7 [CIFS] fix typo
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Steve French acc18aa1e6 [CIFS] remove sparse warning
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Steve French 13a6e42af8 [CIFS] add mount option to send mandatory rather than advisory locks
Some applications/subsystems require mandatory byte range locks
(as is used for Windows/DOS/OS2 etc). Sending advisory (posix style)
byte range lock requests (instead of mandatory byte range locks) can
lead to problems for these applications (which expect that other
clients be prevented from writing to portions of the file which
they have locked and are updating).  This mount option allows
mounting cifs with the new mount option "forcemand" (or
"forcemandatorylock") in order to have the cifs client use mandatory
byte range locks (ie SMB/CIFS/Windows/NTFS style locks) rather than
posix byte range lock requests, even if the server would support
posix byte range lock requests.  This has no effect if the server
does not support the CIFS Unix Extensions (since posix style locks
require support for the CIFS Unix Extensions), but for mounts
to Samba servers this can be helpful for Wine and applications
that require mandatory byte range locks.

Acked-by: Jeff Layton <jlayton@redhat.com>
CC: Alexander Bokovoy <ab@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Jeff Layton d5c5605c27 cifs: make ipv6_connect take a TCP_Server_Info arg
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Jeff Layton bcf4b1063d cifs: make ipv4_connect take a TCP_Server_Info arg
In order to unify the smb_send routines, we need to reorganize the
routines that connect the sockets. Have ipv4_connect take a
TCP_Server_Info pointer and get the necessary fields from that.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Jeff Layton 7586b76585 cifs: don't declare smb_vol info on the stack
struct smb_vol is fairly large, it's probably best to kzalloc it...

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Jeff Layton 63c038c297 cifs: move allocation of new TCP_Server_Info into separate function
Clean up cifs_mount a bit by moving the code that creates new TCP
sessions into a separate function. Have that function search for an
existing socket and then create a new one if one isn't found.

Also reorganize the initializion of TCP_Server_Info a bit to prepare
for cleanup of the socket connection code.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:10 +00:00
Jeff Layton 8ecaf67a8e cifs: account for IPv6 in ses->serverName and clean up netbios name handling
The current code for setting the session serverName is IPv4-specific.
Allow it to be an IPv6 address as well. Use NIP* macros to set the
format.

This also entails increasing the length of the serverName field, so
declare a new macro for RFC1001 name length and use it in the
appropriate places.

Finally, drop the unicode_server_Name field from TCP_Server_Info since
it's not used. We can add it back later if needed, but for now it just
wastes memory.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:09 +00:00
Jeff Layton 954d7a1cf1 cifs: make dnotify thread experimental code
Now that tasks sleeping in wait_for_response will time out on their own,
we're not reliant on the dnotify thread to do this. Mark it as
experimental code for now.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:09 +00:00
Jeff Layton 72ca545b2d cifs: convert tcpSem to a mutex
Mutexes are preferred for single-holder semaphores...

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:09 +00:00
Jeff Layton 0468a2cf91 cifs: take module reference when starting cifsd
cifsd can outlive the last cifs mount. We need to hold a module
reference until it exits to prevent someone from unplugging
the module until we're ready.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:09 +00:00
Jeff Layton 80909022ce cifs: display addr and prefixpath options in /proc/mounts
Have cifs_show_options display the addr and prefixpath options in
/proc/mounts. Reduce struct dereferencing by adding some local
variables.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:09 +00:00
Jeff Layton 24b9b06ba7 cifs: remove unused SMB session pointer from struct mid_q_entry
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-12-26 02:29:09 +00:00
Ingo Molnar 32e8d18683 Merge branches 'timers/clocksource', 'timers/hpet', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/rtc' into timers/core 2008-12-25 18:02:25 +01:00
Ingo Molnar 860cf8894b Merge branches 'irq/sparseirq', 'irq/genirq' and 'irq/urgent'; commit 'v2.6.28' into irq/core 2008-12-25 16:27:54 +01:00
Ingo Molnar 4e202284e6 Merge branch 'sched/urgent'; commit 'v2.6.28' into sched/core 2008-12-25 13:42:23 +01:00
Martin Schwidefsky fc5243d98a [S390] arch_setup_additional_pages arguments
arch_setup_additional_pages currently gets two arguments, the binary
format descripton and an indication if the process uses an executable
stack or not. The second argument is not used by anybody, it could
be removed without replacement.

What actually does make sense is to pass an indication if the process
uses the elf interpreter or not. The glibc code will not use anything
from the vdso if the process does not use the dynamic linker, so for
statically linked binaries the architecture backend can choose not
to map the vdso.

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-25 13:38:54 +01:00
James Morris cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
Ingo Molnar db8862eafe Merge branch 'linus' into tracing/hw-branch-tracing 2008-12-24 21:08:26 +01:00
Lachlan McIlroy 25051158bb [XFS] Fix race in xfs_write() between direct and buffered I/O with DMAPI
The iolock is dropped and re-acquired around the call to XFS_SEND_NAMESP().
While the iolock is released the file can become cached.  We then
'goto retry' and - if we are doing direct I/O - mapping->nrpages may now be
non zero but need_i_mutex will be zero and we will hit the WARN_ON().

Since we have dropped the I/O lock then the file size may have also changed
so what we need to do here is 'goto start' like we do for the XFS_SEND_DATA()
DMAPI event.

We also need to update the filesize before releasing the iolock so that
needs to be done before the XFS_SEND_NAMESP event.  If we drop the iolock
before setting the filesize we could race with a truncate.

Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-24 14:07:32 +11:00
Olga Kornievskaia 61054b14d5 nfsd: support callbacks with gss flavors
This patch adds server-side support for callbacks other than AUTH_SYS.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:19:00 -05:00
Olga Kornievskaia 945b34a772 rpc: allow gss callbacks to client
This patch adds client-side support to allow for callbacks other than
AUTH_SYS.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:18:34 -05:00
Olga Kornievskaia 608207e888 rpc: pass target name down to rpc level on callbacks
The rpc client needs to know the principal that the setclientid was done
as, so it can tell gssd who to authenticate to.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:17:40 -05:00
Olga Kornievskaia 68e76ad0ba nfsd: pass client principal name in rsc downcall
Two principals are involved in krb5 authentication: the target, who we
authenticate *to* (normally the name of the server, like
nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we
authenticate *as* (normally a user, like bfields@UMICH.EDU)

In the case of NFSv4 callbacks, the target of the callback should be the
source of the client's setclientid call, and the source should be the
nfs server's own principal.

Therefore we allow svcgssd to pass down the name of the principal that
just authenticated, so that on setclientid we can store that principal
name with the new client, to be used later on callbacks.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:17:15 -05:00
Andy Adamson cf8cdbe5bd NFS: remove unused status from encode routines
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:18 -05:00
Andy Adamson d017931cff NFS: increment number of operations in each encode routine
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:17 -05:00
Benny Halevy 49c2559e29 NFS: fix comment placement in nfs4xdr.c
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:16 -05:00
Andy Adamson 05d564fe00 NFS: fix tabs in nfs4xdr.c
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:15 -05:00
Andy Adamson 6c0195a468 NFS: remove white space from nfs4xdr.c
Clean-up

Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:15 -05:00
Benny Halevy 374130770e nfs: remove incorrect usage of nfs4 compound response hdr.status
3 call sites look at hdr.status before returning success.
hdr.status must be zero in this case so there's no point in this.

Currently, hdr.status is correctly processed at decode_op_hdr time
if the op status cannot be decoded.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:14 -05:00
Benny Halevy aadf615211 nfs: return compound hdr.status when there are no op replies
When there are no op replies encoded in the compound reply
hdr.status still contains the overall status of the compound
rpc.  This can happen, e.g., when the server returns a
NFS4ERR_MINOR_VERS_MISMATCH error.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:13 -05:00
Trond Myklebust 027b6ca021 NFSv4: Fix an infinite loop in the NFS state recovery code
Marten Gajda <marten.gajda@fernuni-hagen.de> states:

I tracked the problem down to the function nfs4_do_open_expired.
Within this function _nfs4_open_expired is called and may return
-NFS4ERR_DELAY. When a further call to _nfs4_open_expired is
executed and does not return -NFS4ERR_DELAY the "exception.retry"
variable is not reset to 0, causing the loop to iterate again
(and as long as err != -NFS4ERR_DELAY, probably forever)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:04:13 -05:00
Peter Staubach 64672d55d9 optimize attribute timeouts for "noac" and "actimeo=0"
Hi.

I've been looking at a bugzilla which describes a problem where
a customer was advised to use either the "noac" or "actimeo=0"
mount options to solve a consistency problem that they were
seeing in the file attributes.  It turned out that this solution
did not work reliably for them because sometimes, the local
attribute cache was believed to be valid and not timed out.
(With an attribute cache timeout of 0, the cache should always
appear to be timed out.)

In looking at this situation, it appears to me that the problem
is that the attribute cache timeout code has an off-by-one
error in it.  It is assuming that the cache is valid in the
region, [read_cache_jiffies, read_cache_jiffies + attrtimeo].  The
cache should be considered valid only in the region,
[read_cache_jiffies, read_cache_jiffies + attrtimeo).  With this
change, the options, "noac" and "actimeo=0", work as originally
expected.

This problem was previously addressed by special casing the
attrtimeo == 0 case.  However, since the problem is only an off-
by-one error, the cleaner solution is address the off-by-one
error and thus, not require the special case.

    Thanx...

        ps

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:56 -05:00
Trond Myklebust dc0b027dfa NFSv4: Convert the open and close ops to use fmode
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:56 -05:00
Trond Myklebust 7a50c60e46 NFS: Use delegations to optimise ACCESS calls
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:55 -05:00
Trond Myklebust 15860ab1d7 NFSv4: Ensure that we set the verifier when revalidating delegated dentries
This ensures that we don't have to look up the dentry again after we return
the delegation if we know that the directory didn't change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:54 -05:00
Trond Myklebust 5584c30630 NFSv4: Clean up is_atomic_open()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:54 -05:00
Trond Myklebust bd7bf9d540 NFSv4: Convert delegation->type field to fmode_t
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:53 -05:00
Trond Myklebust 9082a5cc1e NFSv4: Fix up delegation callbacks
Currently, the callback server is listening on IPv6 if it is enabled. This
means that IPv4 addresses will always be mapped.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:53 -05:00
Trond Myklebust b7391f44f2 NFSv4: Return unreferenced delegations more promptly
If the client is not using a delegation, the right thing to do is to return
it as soon as possible. This helps reduce the amount of state the server
has to track, as well as reducing the potential for conflicts with other
clients.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:52 -05:00
Trond Myklebust 6411bd4a47 NFSv4: Clean up the asynchronous delegation return
Reuse the state management thread in order to return delegations when we
get a callback.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:51 -05:00
Trond Myklebust b0d3ded1a2 NFSv4: Clean up nfs_expire_all_delegations()
Let the actual delegreturn stuff be run in the state manager thread rather
than allocating a separate kthread.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:50 -05:00
Trond Myklebust 0d62f85a81 NFSv4: Fix a BAD_SEQUENCEID condition.
We really shouldn't be resetting the sequence ids when doing state
expiration recovery, since we don't know if the server still remembers our
previous state owners. There are servers out there that do attempt to
preserve client state even if the lease has expired. Such a server would
only release that state if a conflicting OPEN request occurs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:49 -05:00
Trond Myklebust f3c76491e7 NFSv4: Don't exit the state management if there are still tasks to do
Fix up a potential race...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:48 -05:00
Trond Myklebust e005e8041c NFSv4: Rename the state reclaimer thread
It is really a more general purpose state management thread at this point.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:48 -05:00
Trond Myklebust 707fb4b324 NFSv4: Clean up NFS4ERR_CB_PATH_DOWN error management...
Add a delegation cleanup phase to the state management loop, and do the
NFS4ERR_CB_PATH_DOWN recovery there.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:47 -05:00
Trond Myklebust 515d861177 NFSv4: Clean up the support for returning multiple delegations
Add a flag to mark delegations as requiring return, then run a garbage
collector. In the future, this will allow for more flexible delegation
management, where delegations may be marked for return if it turns out
that they are not being referenced.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:46 -05:00
Trond Myklebust 9e33bed552 NFSv4: Add recovery for individual stateids
NFSv4 defines a number of state errors which the client does not currently
handle. Among those we should worry about are:
  NFS4ERR_ADMIN_REVOKED - the server's administrator revoked our locks
  			  and/or delegations.
  NFS4ERR_BAD_STATEID - the client and server are out of sync, possibly
                        due to a delegation return racing with an OPEN
			request.
  NFS4ERR_OPENMODE - the client attempted to do something not sanctioned
  		     by the open mode of the stateid. Should normally just
		     occur as a result of a delegation return race.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:46 -05:00
Trond Myklebust 95d35cb4c4 NFSv4: Remove nfs_client->cl_sem
Now that we're using the flags to indicate state that needs to be
recovered, as well as having implemented proper refcounting and spinlocking
on the state and open_owners, we can get rid of nfs_client->cl_sem. The
only remaining case that was dubious was the file locking, and that case is
now covered by the nfsi->rwsem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:45 -05:00
Trond Myklebust 19e03c570e NFSv4: Ensure that file unlock requests don't conflict with state recovery
The unlock path is currently failing to take the nfs_client->cl_sem read
lock, and hence the recovery path may see locks disappear from underneath
it.
Also ensure that it takes the nfs_inode->rwsem read lock so that it there
is no conflict with delegation recalls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:44 -05:00
Trond Myklebust 65de872ed6 NFS: Remove the unnecessary argument to nfs4_wait_clnt_recover()
...and move some code around in order to clear out an unnecessary
forward declaration.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:44 -05:00
Trond Myklebust fe1d81952e NFSv4: Ensure that nfs4_reclaim_open_state() doesn't depend on cl_sem
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:43 -05:00
Trond Myklebust 7eff03aec9 NFSv4: Add a recovery marking scheme for state owners
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:43 -05:00
Trond Myklebust 0f605b5600 NFSv4: Don't tell server we rebooted when not necessary
Instead of doing a full setclientid, try doing a RENEW call first.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:42 -05:00
Trond Myklebust e598d843c0 NFSv4: Remove redundant RENEW calls if we know the lease has expired
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:42 -05:00
Trond Myklebust b79a4a1b45 NFSv4: Fix state recovery when the client runs over the grace period
If the client for some reason is not able to recover all its state within
the time allotted for the grace period, and the server reboots again, the
client is not allowed to recover the state that was 'lost' using reboot
recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:41 -05:00
Trond Myklebust 6dc9d57af9 NFSv4: Callers to nfs4_get_renew_cred() need to hold nfs_client->cl_lock
Ditto for nfs4_get_setclientid_cred().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:41 -05:00
Trond Myklebust 0286001430 NFSv4: Clean up for the state loss reclaimer
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:40 -05:00
Trond Myklebust 15c831bf1a NFS: Use atomic bitops when changing struct nfs_delegation->flags
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:39 -05:00
Trond Myklebust 86e8948998 NFSv4: Fix up the dereferencing of delegation->inode
Without an extra lock, we cannot just assume that the delegation->inode is
valid when we're traversing the rcu-protected nfs_client lists. Use the
delegation->lock to ensure that it is truly valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:39 -05:00
Trond Myklebust 343104308a NFSv4: Fix up another delegation related race
When we can update_open_stateid(), we need to be certain that we don't
race with a delegation return. While we could do this by grabbing the
nfs_client->cl_lock, a dedicated spin lock in the delegation structure
will scale better.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:38 -05:00
Chuck Lever 0cb2659b81 NLM: allow lockd requests from an unprivileged port
If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's lockd client should use an
unprivileged port in this case as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:38 -05:00
Chuck Lever 50a737f86d NFS: "[no]resvport" mount option changes mountd client too
If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's mountd client should use an
unprivileged port in this case as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:37 -05:00
Chuck Lever d740351bf0 NFS: add "[no]resvport" mount option
The standard default security setting for NFS is AUTH_SYS.  An NFS
client connects to NFS servers via a privileged source port and a
fixed standard destination port (2049).  The client sends raw uid and
gid numbers to identify users making NFS requests, and the server
assumes an appropriate authority on the client has vetted these
values because the source port is privileged.

On Linux, by default in-kernel RPC services use a privileged port in
the range between 650 and 1023 to avoid using source ports of well-
known IP services.  Using such a small range limits the number of NFS
mount points and the number of unique NFS servers to which a client
can connect concurrently.

An NFS client can use unprivileged source ports to expand the range of
source port numbers, allowing more concurrent server connections and
more NFS mount points.  Servers must explicitly allow NFS connections
from unprivileged ports for this to work.

In the past, bumping the value of the sunrpc.max_resvport sysctl on
the client would permit the NFS client to use unprivileged ports.
Bumping this setting also changes the maximum port number used by
other in-kernel RPC services, some of which still required a port
number less than 1023.

This is exacerbated by the way source port numbers are chosen by the
Linux RPC client, which starts at the top of the range and works
downwards.  It means that bumping the maximum means all RPC services
requesting a source port will likely get an unprivileged port instead
of a privileged one.

Changing this setting effects all NFS mount points on a client.  A
sysadmin could not selectively choose which mount points would use
non-privileged ports and which could not.

Lastly, this mechanism of expanding the limit on the number of NFS
mount points was entirely undocumented.

To address the need for the NFS client to use a large range of source
ports without interfering with the activity of other in-kernel RPC
services, we introduce a new NFS mount option.  This option explicitly
tells only the NFS client to use a non-privileged source port when
communicating with the NFS server for one specific mount point.

This new mount option is called "resvport," like the similar NFS mount
option on FreeBSD and Mac OS X.  A sister patch for nfs-utils will be
submitted that documents this new option in nfs(5).

The default setting for this new mount option requires the NFS client
to use a privileged port, as before.  Explicitly specifying the
"noresvport" mount option allows the NFS client to use an unprivileged
source port for this mount point when connecting to the NFS server
port.

This mount option is supported only for text-based NFS mounts.

[ Sidebar: it is widely known that security mechanisms based on the
  use of privileged source ports are ineffective.  However, the NFS
  client can combine the use of unprivileged ports with the use of
  secure authentication mechanisms, such as Kerberos.  This allows a
  large number of connections and mount points while ensuring a useful
  level of security.

  Eventually we may change the default setting for this option
  depending on the security flavor used for the mount.  For example,
  if the mount is using only AUTH_SYS, then the default setting will
  be "resvport;" if the mount is using a strong security flavor such
  as krb5, the default setting will be "noresvport." ]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client()
was being called with incorrect arguments.]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:37 -05:00
Chuck Lever 542fcc334a NFS: move nfs_server flag initialization
Make it possible for the NFSv4 mount set up logic to pass mount option
flags down the stack to nfs_create_rpc_client().

This is immediately useful if we want NFS mount options to modulate
settings of the underlying RPC transport, but it may be useful at some
later point if other parts of the NFSv4 mount initialization logic
want to know what the mount options are.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:36 -05:00
Chuck Lever 4a01b8a4ee NFS: expand flags passed to nfs_create_rpc_client()
The nfs_create_rpc_client() function sets up an RPC client for an NFS
mount point.  Add an option that allows it to set up an RPC transport
from an unprivileged port.

Instead of having nfs_create_rpc_client()'s callers retain local
knowledge about how to set up an RPC client, create a couple of flag
arguments to control the use of RPC_CLNT_CREATE flags.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:35 -05:00
Chuck Lever c5d120f8e8 NFS: introduce nfs_mount_info struct for calling nfs_mount()
Clean up: convert nfs_mount() to take a single data structure argument to make
it simpler to add more arguments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:35 -05:00
Chuck Lever 146ec944bb NFS: Move declaration of nfs_mount() to fs/nfs/internal.h
Clean up:  The nfs_mount() function is not to be used outside of the
NFS client.  Move its public declaration to fs/nfs/internal.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:34 -05:00
Chuck Lever 7b5d2b98e1 NFS: rename nfs_path variable
Clean up: I'm about to move the declaration of nfs_mount into
fs/nfs/internal.h and include it in fs/nfs/nfsroot.c.  There's a
conflicting definition of nfs_path in fs/nfs/internal.h and
fs/nfs/nfsroot.c, so rename the private one.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:34 -05:00
Jeff Layton df94f000c4 lockd: convert reclaimer thread to kthread interface
My understanding is that there is a push to turn the kernel_thread
interface into a non-exported symbol and move all kernel threads to use
the kthread API. This patch changes lockd to use kthread_run to spawn
the reclaimer thread.

I've made the assumption here that the extra module references taken
when we spawn this thread are unnecessary and removed them. I've also
added a KERN_ERR printk that pops if the thread can't be spawned to warn
the admin that the locks won't be reclaimed.

In the future, it would be nice to be able to notify userspace that
locks have been lost (probably by implementing SIGLOST), and adding some
good policies about how long we should reattempt to reclaim the locks.

Finally, I removed a comment about memory leaks that I believe is
obsolete and added a new one to clarify the result of sending a SIGKILL
to the reclaimer thread. As best I can tell, doing so doesn't actually
cause a memory leak.

I consider this patch 2.6.29 material.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:33 -05:00
Trond Myklebust 2de59872a7 LOCKD: Make lockd_up() and lockd_down() exported GPL-only
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:33 -05:00
Trond Myklebust d716f0b8a5 SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only
Again, this has never been intended as a public abi for out-of-tree
modules.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:32 -05:00
Wu Fengguang 136221fc32 nfs: remove redundant tests on reading new pages
aops->readpages() and its NFS helper readpage_async_filler() will only
be called to do readahead I/O for newly allocated pages. So it's not
necessary to test for the always 0 dirty/uptodate page flags.

The removal of nfs_wb_page() call also fixes a readahead bug: the NFS
readahead has been synchronous since 2.6.23, because that call will
clear PG_readahead, which is the reminder for asynchronous readahead.

More background: the PG_readahead page flag is shared with PG_reclaim,
one for read path and the other for write path. clear_page_dirty_for_io()
unconditionally clears PG_readahead to prevent possible readahead residuals,
assuming itself to be always called in the write path. However, NFS is one
and the only exception in that it _always_ calls clear_page_dirty_for_io()
in the read path, i.e. for readpages()/readpage().

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:30 -05:00
Artem Bityutskiy c8f915913a UBIFS: avoid unnecessary calculations
Do not calculate min_idx_lebs, because it is available in
c->min_idx_lebs

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:24:16 +02:00
Artem Bityutskiy 650ed50f42 UBIFS: re-calculate min_idx_size after the commit
When we commit, but before we try to write anything to the flash
media, @c->min_idx_size is inaccurate, because we do not re-calculate
it after the commit. Do not forget to do this.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:24:05 +02:00
Artem Bityutskiy 4d61db4f87 UBIFS: use nicer 64-bit math
Instead of using do_div(), use better primitives from
linux/math64.h.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:23:40 +02:00
Artem Bityutskiy af14a1ad79 UBIFS: fix available blocks count
Take into account that 2 eraseblocks are never available because
they are reserved for the index. This gives more realistic count
of FS blocks.

To avoid future confusions like this, introduce a constant.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:23:29 +02:00
Artem Bityutskiy d3cf502b6c UBIFS: various comment improvements and fixes
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:23:08 +02:00
Artem Bityutskiy 21a6025897 UBIFS: improve budgeting dump
Dump available space calculated by budgeting subsystem.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:22:58 +02:00
Artem Bityutskiy 24fa9e9438 UBIFS: fix tnc dumping
debugfs tnc dumping was broken because of an obvious typo.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:22:39 +02:00
Artem Bityutskiy 7bbe5b5aa6 UBIFS: use PAGE_CACHE_MASK correctly
It has high bits set, not low bits set as the UBIFS code
assumed.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-23 12:19:14 +02:00
Christoph Hellwig ad1ad968f4 [XFS] handle unaligned data in xfs_bmbt_disk_get_all
In libxfs xfs_bmbt_disk_get_all needs to handle unaligned data and thus
has been updated to use get_unaligned_be64.  In kernelspace we don't strictly
need it as the routine is only used for tracing and xfsidbg, but let's keep
the two implementations in sync.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-23 11:54:46 +11:00
Christoph Hellwig efc557570d [XFS] avoid memory allocations in xfs_fs_vcmn_err
xfs_fs_vcmn_err can be called under a spinlock, but does a sleeping memory
allocation to create buffer for it's internal sprintf.  Fortunately it's
the only caller of icmn_err, so we can merge the two and have one single
static buffer and spinlock protecting it.  While we're at it make sure
we proper __attribute__ format annotations so that the compiler can detect
mismatched format strings.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22 18:02:01 +11:00
Lachlan McIlroy 9f6c92b9cc [XFS] Fix speculative allocation beyond eof
Speculative allocation beyond eof doesn't work properly.  It was
broken some time ago after a code cleanup that moved what is now
xfs_iomap_eof_align_last_fsb() and xfs_iomap_eof_want_preallocate()
out of xfs_iomap_write_delay() into separate functions.  The code
used to use the current file size in various checks but got changed
to be max(file_size, i_new_size).  Since i_new_size is the result
of 'offset + count' then in xfs_iomap_eof_want_preallocate() the
check for '(offset + count) <= isize' will always be true.

ie if 'offset + count' is > ip->i_size then isize will be i_new_size
and equal to 'offset + count'.

This change fixes all the places that used to use the current file
size.

Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22 17:56:49 +11:00
Lachlan McIlroy 4fdc778179 [XFS] Remove XFS_BUF_SHUT() and friends
Code does nothing so remove it.

Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22 17:52:58 +11:00
Lachlan McIlroy d415867e0a [XFS] Use the incore inode size in xfs_file_readdir()
We should be using the incore inode size here not the linux inode
size.  The incore inode size is always up to date for directories
whereas the linux inode size is not updated for directories.

We've hit assertions in xfs_bmap() and traced it back to the linux
inode size being zero but the incore size being correct.

Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22 17:50:56 +11:00
Ingo Molnar 826e08b015 sched: fix warning in fs/proc/base.c
Stephen Rothwell reported this new (harmless) build warning on platforms that
define u64 to long:

 fs/proc/base.c: In function 'proc_pid_schedstat':
 fs/proc/base.c:352: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'

asm-generic/int-l64.h platforms strike again: that file should be eliminated.

Fix it by casting the parameters to long long.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-22 07:41:06 +01:00
Lachlan McIlroy 27a0464a6c [XFS] Fix merge conflict in fs/xfs/xfs_rename.c
Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

Conflicts:

	fs/xfs/xfs_rename.c

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-22 17:34:26 +11:00
Julia Lawall f1d9e4586e fs/9p: change simple_strtol to simple_strtoul
Since v9ses->uid is unsigned, it would seem better to use simple_strtoul that
simple_strtol.

A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r2@
long e;
position p;
@@

e = simple_strtol@p(...)

@@
position p != r2.p;
type T;
T e;
@@

e =
- simple_strtol@p
+ simple_strtoul
  (...)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-12-19 16:50:22 -06:00
Wu Fengguang 7dd0cdc51c 9p: convert d_iname references to d_name.name
d_iname is rubbish for long file names.
Use d_name.name in printks instead.

Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-12-19 16:47:40 -06:00
Duane Griffin 6ff232070a 9p: Remove potentially bad parameter from function entry debug print.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-12-19 16:45:21 -06:00
James Morris 12204e24b1 security: pass mount flags to security_sb_kern_mount()
Pass mount flags to security_sb_kern_mount(), so security modules
can determine if a mount operation is being performed by the kernel.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
2008-12-20 09:02:39 +11:00
Ingo Molnar 30cd324e97 Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core
Conflicts:
	include/linux/ftrace.h
2008-12-19 09:42:40 +01:00
Ken Chen 9c2c48020e schedstat: consolidate per-task cpu runtime stats
Impact: simplify code

When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated
twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time.
These two stats are exactly the same.

Given that task->se.sum_exec_runtime is always accumulated by the core
scheduler, sched_info can reuse that data instead of duplicate the accounting.

Signed-off-by: Ken Chen <kenchen@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 13:54:01 +01:00
Paul Mackerras c280266a32 Merge branch 'linux-2.6' into next 2008-12-18 11:06:12 +11:00
Linus Torvalds 0bc77ecbe4 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Add JBD2 compat feature bit.
  ocfs2: Always update xattr search when creating bucket.
2008-12-17 15:01:23 -08:00
Jeff Layton 331c313510 cifs: fix buffer overrun in parse_DFS_referrals
While testing a kernel with memory poisoning enabled, I saw some warnings
about the redzone getting clobbered when chasing DFS referrals. The
buffer allocation for the unicode converted version of the searchName is
too small and needs to take null termination into account.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-17 14:59:55 -08:00
Joel Becker a97721894a ocfs2: Add JBD2 compat feature bit.
Define the OCFS2_FEATURE_COMPAT_JBD2 bit in the filesystem header.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-16 18:26:16 -08:00
Tao Ma 83099bc647 ocfs2: Always update xattr search when creating bucket.
When we create xattr bucket during the process of xattr set, we always
need to update the ocfs2_xattr_search since even if the bucket size is
the same as block size, the offset will change because of the removal
of the ocfs2_xattr_block header.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-16 14:07:37 -08:00
Dave Kleikamp d69e83d99c jfs: ensure symlinks are NUL-terminated
This is an alternate fix for a bug reported and fixed by Duane Griffin.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Reported-by: Duane Griffin <duaneg@dghda.com>
2008-12-16 10:21:34 -06:00
KOSAKI Motohiro 13bd41bc22 proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
Impact: restructure code to fix compiler warning

commit 240d367b4e moved desc usage point
into #ifdef CONFIG_SPARSE_IRQ.

Eliminate the desc variable, otherwise following warning happens:

 fs/proc/stat.c: In function 'show_stat':
 fs/proc/stat.c:31: warning: unused variable 'desc'

[ akpm: cleaned up the patch to remove #ifdef ]

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 11:24:14 +01:00
Anton Vorontsov 6b82b3e4b5 powerpc: Remove `have_of' global variable
The `have_of' variable is a relic from the arch/ppc time, it isn't
useful nowadays.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:52:57 +11:00
David S. Miller eb14f01959 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/e1000e/ich8lan.c
2008-12-15 20:03:50 -08:00
Oleg Nesterov 8187926bda posix-timers: simplify de_thread()->exit_itimers() path
Impact: simplify code

de_thread() postpones release_task(leader) until after exit_itimers().
This was needed because !SIGEV_THREAD_ID timers could use ->group_leader
without get_task_struct(). With the recent changes we can release the
leader earlier and simplify the code.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-12-12 17:01:03 +01:00
Lachlan McIlroy 4d9d4ebf5d Merge branch 'master' of git+ssh://git.melbourne.sgi.com/git/xfs 2008-12-12 15:28:02 +11:00
Lachlan McIlroy cfbe52672f [XFS] set b_error from bio error in xfs_buf_bio_end_io
Preserve any error returned by the bio layer.

Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-12 15:27:25 +11:00
Christoph Hellwig c4cd747ee6 [XFS] use inode_change_ok for setattr permission checking
Instead of implementing our own checks use inode_change_ok to check for
necessary permission in setattr.  There is a slight change in behaviour
as inode_change_ok doesn't allow i_mode updates to add the suid or sgid
without superuser privilegues while the old XFS code just stripped away
those bits from the file mode.

(First sent on Semptember 29th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:15:10 +11:00
Christoph Hellwig 4d4be482a4 [XFS] add a FMODE flag to make XFS invisible I/O less hacky
XFS has a mode called invisble I/O that doesn't update any of the
timestamps.  It's used for HSM-style applications and exposed through
the nasty open by handle ioctl.

Instead of doing directly assignment of file operations that set an
internal flag for it add a new FMODE_NOCMTIME flag that we can check
in the normal file operations.

(addition of the generic VFS flag has been ACKed by Al as an interims
 solution)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:14:41 +11:00
Christoph Hellwig 6d73cf133c [XFS] resync headers with libxfs
- xfs_sb.h add the XFS_SB_VERSION2_PARENTBIT features2 that has been
   around in userspace for some time
 - xfs_inode.h: move a few things out of __KERNEL__ that are needed by
   userspace
 - xfs_mount.h: only include xfs_sync.h under __KERNEL__
 - xfs_inode.c: minor whitespace fixup.  I accidentaly changes this when
   importing this file for use by userspace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:14:17 +11:00
Christoph Hellwig 2175dd9574 [XFS] simplify projid check in xfs_rename
Check for the project ID after attaching all inodes to the transaction.
That way the unlock in the error case is done by the transaction subsystem,
which guaratees that is uses the right flags (which was wrong from day one
of this check), and avoids having special code unlocking an array of inodes
with potential duplicates.  Attaching the inode first is the method used
by xfs_rename and the other namespace methods all other error that require
multiple locked inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:13:52 +11:00
Christoph Hellwig 15ac08a8b2 [XFS] replace b_fspriv with b_mount
Replace the b_fspriv pointer and it's ugly accessors with a properly types
xfs_mount pointer.  Also switch log reocvery over to it instead of using
b_fspriv for the mount pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:13:33 +11:00
Linus Torvalds f4fd2c5b6f Merge branch 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  tracehook: exec double-reporting fix
2008-12-10 14:40:21 -08:00
Hugh Dickins 9c24624727 KSYM_SYMBOL_LEN fixes
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my 966c8c12dc sprint_symbol(): use
less stack exposing a bug in slub's list_locations() -
kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
beyond the end of page provided.

The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.

[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:54 -08:00
Dmitri Monakhov 6ee5a399d6 inotify: fix IN_ONESHOT unmount event watcher
On umount two event will be dispatched to watcher:

1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
2: remove_watch(watch, dev)
    ->inotify_dev_queue_event(.., IN_IGNORED, ..)

But if watcher has IN_ONESHOT bit set then the watcher will be released
inside first event.  Which result in accessing invalid object later.  IMHO
it is not pure regression.  This bug wasn't triggered while initial
inotify interface testing phase because of another bug in IN_ONESHOT
handling logic :)

  commit ac74c00e49
  Author: Ulisses Furquim <ulissesf@gmail.com>
  Date:   Fri Feb 8 04:18:16 2008 -0800
    inotify: fix check for one-shot watches before destroying them
    As the IN_ONESHOT bit is never set when an event is sent we must check it
    in the watch's mask and not in the event's mask.

TESTCASE:
mkdir mnt
mount -ttmpfs none mnt
mkdir mnt/d
./inotify mnt/d&
umount mnt ## << lockup or crash here

TESTSOURCE:
/* gcc -oinotify inotify.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
        char buf[1024];
        struct inotify_event *ie;
        char *p;
        int i;
        ssize_t l;

        p = argv[1];
        i = inotify_init();
        inotify_add_watch(i, p, ~0);

        l = read(i, buf, sizeof(buf));
        printf("read %d bytes\n", l);
        ie = (struct inotify_event *) buf;
        printf("event mask: %d\n", ie->mask);
	return 0;
}

Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Ulisses Furquim <ulissesf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:53 -08:00
Matt Mackall 49c50342c7 pagemap: fix 32-bit pagemap regression
The large pages fix from bcf8039ed4 broke 32-bit pagemap by pulling the
pagemap entry code out into a function with the wrong return type.
Pagemap entries are 64 bits on all systems and unsigned long is only 32
bits on 32-bit systems.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Reported-by: Doug Graham <dgraham@nortel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:53 -08:00
Andrew Morton 02d2116887 revert "percpu_counter: new function percpu_counter_sum_and_set"
Revert

    commit e8ced39d5e
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 11 19:27:31 2008 -0400

        percpu_counter: new function percpu_counter_sum_and_set

As described in

	revert "percpu counter: clean up percpu_counter_sum_and_set()"

the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs.  Revert that change.

This means that ext4 will be slow again.  But correct.

Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>		[2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:52 -08:00
Andrew Morton 71c5576fbd revert "percpu counter: clean up percpu_counter_sum_and_set()"
Revert

    commit 1f7c14c62c
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Thu Oct 9 12:50:59 2008 -0400

        percpu counter: clean up percpu_counter_sum_and_set()

Before this patch we had the following:

percpu_counter_sum(): return the percpu_counter's value

percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.

After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.

Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong.  It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.

This patch reverts 1f7c14c62c.  This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.

Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.

Note that this revert patch changes percpu_counter_sum() semantics.

Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.

After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.

If there is any code in the tree which was introduced after
e8ced39d5e was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.

Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:52 -08:00
Roland McGrath 85f334666a tracehook: exec double-reporting fix
The patch 6341c39 "tracehook: exec" introduced a small regression in
2.6.27 regarding binfmt_misc exec event reporting.  Since the reporting
is now done in the common search_binary_handler() function, an exec
of a misc binary will result in two (or possibly multiple) exec events
being reported, instead of just a single one, because the misc handler
contains a recursive call to search_binary_handler.

To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple
SIGTRAP signals will in fact cause only a single ptrace intercept, as the
signals are not queued.  However, if PTRACE_O_TRACEEXEC is on, the debugger
will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC).

The test program included below demonstrates the problem.

This change fixes the bug by calling tracehook_report_exec() only in the
outermost search_binary_handler() call (bprm->recursion_depth == 0).

The additional change to restore bprm->recursion_depth after each binfmt
load_binary call is actually superfluous for this bug, since we test the
value saved on entry to search_binary_handler().  But it keeps the use of
of the depth count to its most obvious expected meaning.  Depending on what
binfmt handlers do in certain cases, there could have been false-positive
tests for recursion limits before this change.

    /* Test program using PTRACE_O_TRACEEXEC.
       This forks and exec's the first argument with the rest of the arguments,
       while ptrace'ing.  It expects to see one PTRACE_EVENT_EXEC stop and
       then a successful exit, with no other signals or events in between.

       Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec:

       $ gcc -g traceexec.c -o traceexec
       $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register'
       $ echo 'foobar test' > ./foobar
       $ chmod +x ./foobar
       $ ./traceexec ./foobar; echo $?
       ==> good <==
       foobar test
       0
       $
       ==> bad <==
       foobar test
       unexpected status 0x4057f != 0
       3
       $

    */

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>

    static void
    wait_for (pid_t child, int expect)
    {
      int status;
      pid_t p = wait (&status);
      if (p != child)
	{
	  perror ("wait");
	  exit (2);
	}
      if (status != expect)
	{
	  fprintf (stderr, "unexpected status %#x != %#x\n", status, expect);
	  exit (3);
	}
    }

    int
    main (int argc, char **argv)
    {
      pid_t child = fork ();

      if (child < 0)
	{
	  perror ("fork");
	  return 127;
	}
      else if (child == 0)
	{
	  ptrace (PTRACE_TRACEME);
	  raise (SIGUSR1);
	  execv (argv[1], &argv[1]);
	  perror ("execve");
	  _exit (127);
	}

      wait_for (child, W_STOPCODE (SIGUSR1));

      if (ptrace (PTRACE_SETOPTIONS, child,
		  0L, (void *) (long) PTRACE_O_TRACEEXEC) != 0)
	{
	  perror ("PTRACE_SETOPTIONS");
	  return 4;
	}

      if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
	{
	  perror ("PTRACE_CONT");
	  return 5;
	}

      wait_for (child, W_STOPCODE (SIGTRAP | (PTRACE_EVENT_EXEC << 8)));

      if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
	{
	  perror ("PTRACE_CONT");
	  return 6;
	}

      wait_for (child, W_EXITCODE (0, 0));

      return 0;
    }

Reported-by: Arnd Bergmann <arnd@arndb.de>
CC: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
2008-12-09 19:36:38 -08:00
Lachlan McIlroy e055f13a6d [XFS] Remove unused tracing code
None of this code appears to be used anywhere so remove it.

Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-10 11:51:54 +11:00
J. Bruce Fields a4f4d6df53 EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent()
While 440037287c "[PATCH] switch all filesystems over to
d_obtain_alias" removed some cases where fh_to_dentry() and
fh_to_parent() could return NULL, there are still a few NULL returns
left in individual filesystems.  Thus it was a mistake for that commit
to remove the handling of NULL returns in the callers.

Revert those parts of 440037287c which removed the NULL handling.

(We could, alternatively, modify all implementations to return -ESTALE
instead of NULL, but that proves to require fixing a number of
filesystems, and in some cases it's arguably more natural to return
NULL.)

Thanks to David for original patch and Linus, Christoph, and Hugh for
review.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-08 19:49:32 -08:00
Yinghai Lu 240d367b4e sparseirq: fix Alpha build failure
Impact: build fix on Alpha

-tip testing found this build failure on the Alpha defconfig:

/home/mingo/tip/fs/proc/stat.c: In function 'show_stat':
/home/mingo/tip/fs/proc/stat.c:48: error: implicit declaration of function 'for_each_irq_desc'
/home/mingo/tip/fs/proc/stat.c:48: error: expected ';' before '{' token

can not use irq_desc() in stat.c on older architectures.

Signed-off-by: Yinghai Lu <yinghai@kernel.orgg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-09 04:16:54 +01:00
Yinghai Lu 0b8f1efad3 sparse irq_desc[] array: core kernel and x86 changes
Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:51 +01:00
Jonathan Corbet 218d11a8b0 Fix a race condition in FASYNC handling
Changeset a238b790d5 (Call fasync()
functions without the BKL) introduced a race which could leave
file->f_flags in a state inconsistent with what the underlying
driver/filesystem believes.  Revert that change, and also fix the same
races in ioctl_fioasync() and ioctl_fionbio().

This is a minimal, short-term fix; the real fix will not involve the
BKL.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-05 15:35:10 -08:00
Ingo Molnar 970987beb9 Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core 2008-12-05 14:45:22 +01:00
Linus Torvalds bbeba4c35c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev:
  [PATCH] fix bogus argument of blkdev_put() in pktcdvd
  [PATCH 2/2] documnt FMODE_ constants
  [PATCH 1/2] kill FMODE_NDELAY_NOW
  [PATCH] clean up blkdev_get a little bit
  [PATCH] Fix block dev compat ioctl handling
  [PATCH] kill obsolete temporary comment in swsusp_close()
2008-12-04 21:45:44 -08:00
Dave Chinner 576a488a27 [XFS] Fix hang after disallowed rename across directory quota domains
When project quota is active and is being used for directory tree
quota control, we disallow rename outside the current directory
tree. This requires a check to be made after all the inodes
involved in the rename are locked. We fail to unlock the inodes
correctly if we disallow the rename when the target is outside the
current directory tree. This results in a hang on the next access
to the inodes involved in failed rename.

Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-05 15:39:13 +11:00
Lachlan McIlroy 14d676f56f Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-05 15:27:43 +11:00
Lachlan McIlroy 797eaed40e [XFS] Remove unnecessary assertion
Hit this assert because an inode was tagged with XFS_ICI_RECLAIM_TAG but
not XFS_IRECLAIMABLE|XFS_IRECLAIM.  This is because xfs_iget_cache_hit()
first clears XFS_IRECLAIMABLE and then calls __xfs_inode_clear_reclaim_tag()
while only holding the pag_ici_lock in read mode so we can race with
xfs_reclaim_inodes_ag().  Looks like xfs_reclaim_inodes_ag() will do the
right thing anyway so just remove the assert.

Thanks to Christoph for pointing out where the problem was.

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
2008-12-05 14:15:49 +11:00
Lachlan McIlroy a5b429d41f [XFS] Remove unused variable in ktrace_free()
entries_size is probably left over from when we used to pass the
size to kmem_free().

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2008-12-05 13:31:51 +11:00
Lachlan McIlroy c6422617a1 [XFS] Check return value of xfs_buf_get_noaddr()
We check the return value of all other calls to xfs_buf_get_noaddr().
Make sense to do it here too.

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2008-12-05 13:16:15 +11:00
Dave Chinner 6a0775a991 [XFS] Fix hang after disallowed rename across directory quota domains
When project quota is active and is being used for directory tree
quota control, we disallow rename outside the current directory
tree. This requires a check to be made after all the inodes
involved in the rename are locked. We fail to unlock the inodes
correctly if we disallow the rename when the target is outside the
current directory tree. This results in a hang on the next access
to the inodes involved in failed rename.

Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-05 12:50:04 +11:00
Christoph Hellwig 8bb57320f3 [XFS] Fix compile with CONFIG_COMPAT enabled
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-05 11:23:10 +11:00
Alexey Dobriyan 995be04548 UBIFS: fix section mismatch
This patch fixes the following section mismatch:

WARNING: fs/ubifs/ubifs.o(.init.text+0xec): Section mismatch in reference from the function init_module() to the function .exit.text:ubifs_compressors_exit()

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-04 17:53:10 +02:00
Christoph Hellwig fd4ce1acd0 [PATCH 1/2] kill FMODE_NDELAY_NOW
Update FMODE_NDELAY before each ioctl call so that we can kill the
magic FMODE_NDELAY_NOW.  It would be even better to do this directly
in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
not just block special files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04 04:22:57 -05:00
Christoph Hellwig ebbefc011e [PATCH] clean up blkdev_get a little bit
The way the bd_claim for the FMODE_EXCL case is implemented is rather
confusing.  Clean it up to the most logical style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04 04:22:56 -05:00
Ingo Molnar b8307db247 Merge commit 'v2.6.28-rc7' into tracing/core 2008-12-04 09:07:19 +01:00
James Morris ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Christoph Hellwig 5a8d0f3c7a move inode tracing out of xfs_vnode.
Move the inode tracing into xfs_iget.c / xfs_inode.h and kill xfs_vnode.c
now that it's empty.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:25 +11:00
Christoph Hellwig 25e41b3d52 move vn_iowait / vn_iowake into xfs_aops.c
The whole machinery to wait on I/O completion is related to the I/O path
and should be there instead of in xfs_vnode.c.  Also give the functions
more descriptive names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:24 +11:00
Christoph Hellwig 583fa586f0 kill vn_ioerror
There's just one caller of this helper, and it's much cleaner to just merge
the xfs_do_force_shutdown call into it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:24 +11:00
Christoph Hellwig f95099ba5a kill xfs_unmount_flush
There's almost nothing left in this function, instead remove the IRELE
on the real times inodes and the call to XFS_QM_UNMOUNT into xfs_unmountfs.

For the regular unmount case that means it now also happenes after dmapi
notification, but otherwise there is no difference in behaviour.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:24 +11:00
Christoph Hellwig e57481dc26 no explicit xfs_iflush for special inodes during unmount
Currently we explicitly call xfs_iflush on the quota, real-time and root
inodes from xfs_unmount_flush.  But we just called xfs_sync_inodes with
SYNC_ATTR and do an XFS_bflush aka xfs_flush_buftarg to make sure all inodes
are on disk already, so there is no need for these special cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:23 +11:00
Christoph Hellwig 070c4616ec use xfs_trans_ijoin in xfs_trans_iget
Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into
a transaction instead of opencoding it.  Based on a discussion with and an
incomplete patch from Niv Sardi.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:23 +11:00
Christoph Hellwig b56757becf remove leftovers of shared read-only support
We never supported shared read-only filesystems, so remove the dead
code left over from IRIX for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:23 +11:00
Christoph Hellwig e88f11abe0 remove unused m_inode_quiesce member from struct xfs_mount
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig 6bd16ff270 kill dead inode flags
There are a few inode flags around that aren't used anywhere, so remove
them.  Also update xfsidbg to display all used inode flags correctly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig 5efcbb853b cleanup xfs_sb.h feature flag helpers
The various inlines in xfs_sb.h that deal with the superblock version
and fature flags were converted from macros a while ago, and this
show by the odd coding style full of useless braces and backslashes
and the avoidance of conditionals.

Clean these up to look like normal C code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig df6771bde1 kill dead quota flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig 63ad2a5c4c remove dead code from sv_t implementation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:21 +11:00
Christoph Hellwig 39e2defe73 reduce l_icloglock roundtrips
All but one caller of xlog_state_want_sync drop and re-acquire
l_icloglock around the call to it, just so that xlog_state_want_sync can
acquire and drop it.

Move all lock operation out of l_icloglock and assert that the lock is
held when it is called.

Note that it would make sense to extende this scheme to
xlog_state_release_iclog, but the locking in there is more complicated
and we'd like to keep the atomic_dec_and_lock optmization for those
callers not having l_icloglock yet.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:21 +11:00
Christoph Hellwig d9424b3c4a stop using igrab in xfs_vn_link
->link is guranteed to get an already reference inode passed so we
can do a simple increment of i_count instead of using igrab and thus
avoid banging on the global inode_lock.  This is what most filesystems
already do.

Also move the increment after the call to xfs_link to simplify error
handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:21 +11:00
Christoph Hellwig 5d765b976c kill xfs_buf_iostart
xfs_buf_iostart is a "shared" helper for xfs_buf_read_flags,
xfs_bawrite, and xfs_bdwrite - except that there isn't much shared
code but rather special cases for each caller.

So remove this function and move the functionality to the caller.
xfs_bawrite and xfs_bdwrite are now big enough to be moved out of
line and the xfs_buf_read_flags is moved into a new helper called
_xfs_buf_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:20 +11:00
Christoph Hellwig 5cafdeb289 cleanup the inode reclaim path
Merge xfs_iextract and xfs_idestroy into xfs_ireclaim as they are never
called individually.  Also rewrite most comments in this area as they
were severly out of date.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:20 +11:00
Christoph Hellwig ccd0be6cfc remove unused prototypes for xfs_ihash_init / xfs_ihash_free
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:20 +11:00
Christoph Hellwig 73e6335c14 remove unused behvavior cruft in xfs_super.h
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:19 +11:00
Christoph Hellwig 2234d54d3d remove useless mnt_want_write call in xfs_write
When mnt_want_write was introduced a call to it was added around
xfs_ichgtime, but there is no need for this because a file can't be open
read/write on a r/o mount, and a mount can't degrade r/o while we still
have files open for writing.  As the mnt_want_write changes were never
merged into the CVS tree this patch is for mainline only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:19 +11:00
Christoph Hellwig ddcd856d81 [XFS] fix compile on 32 bit systems
The recent compat patches make xfs_file.c include xfs_ioctl32.h unconditional,
which breaks the build on 32 bit systems which don't have the various compat
defintions.

Remove the include and move the defintion of xfs_file_compat_ioctl to
xfs_ioctl.h so that we can avoid including all the compat defintions in
xfs_file.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-04 13:07:29 +11:00
Linus Torvalds 2433c41789 Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  NLM: client-side nlm_lookup_host() should avoid matching on srcaddr
  nfsd: use of unitialized list head on error exit in nfs4recover.c
  Add a reference to sunrpc in svc_addsock
  nfsd: clean up grace period on early exit
2008-12-03 16:40:37 -08:00
Artem Bityutskiy 2ba5f7ae81 UBIFS: introduce LPT dump function
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:34 +02:00
Artem Bityutskiy 787845bdea UBIFS: dump stack in LPT check functions
It is useful to know how we got to the checking function when
hunting the bugs.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:34 +02:00
Artem Bityutskiy 45e12d901f UBIFS: run debugging checks only if they are enabled
Do not forget to check whether lpt debugging is enabled before
running the check functions. This commit also makes some spelling
fixes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:34 +02:00
Artem Bityutskiy 552ff3179d UBIFS: add debugfs support
We need to have a possibility to see various UBIFS variables
and ask UBIFS to dump various information. Debugfs is what
we need.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:33 +02:00
Artem Bityutskiy 17c2f9f85c UBIFS: separate debugging fields out
Introduce a new data structure which contains all debugging
stuff inside. This is cleaner than having debugging stuff
directly in 'c'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:33 +02:00
Kukkonen Mika 5dd7cbc083 UBIFS: avoid unnecessary checks
I have a habit of compiling kernel with
EXTRA_CFLAGS="-Wextra -Wno-unused -Wno-sign-compare -Wno-missing-field-initializers"
and so fs/ubifs/key.h give lots (~10) of these every time:

CC      fs/ubifs/tnc_misc.o
In file included from fs/ubifs/ubifs.h:1725,
from fs/ubifs/tnc_misc.c:30:
fs/ubifs/key.h: In function 'key_r5_hash':
fs/ubifs/key.h:64: warning: comparison of unsigned expression >= 0 is always true
fs/ubifs/key.h: In function 'key_test_hash':
fs/ubifs/key.h:81: warning: comparison of unsigned expression >= 0 is always true

This patch fixes the warnings.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:11 +02:00