linux

Commit Graph

Author	SHA1	Message	Date
Christoph Hellwig	722c55d13e	hfsplus: remove superflous rootflags field in hfsplus_inode_info The rootflags field in hfsplus_inode_info only caches the immutable and append-only flags in the VFS inode, so we can easily get rid of it. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-14 09:54:33 -04:00
Christoph Hellwig	f6089ff87d	hfsplus: fix link corruption HFS implements hardlink by using indirect catalog entries that refer to a hidden directly. The link target is cached in the dev field in the HFS+ specific inode, which is also used for the device number for device files, and inside for passing the nlink value of the indirect node from hfsplus_cat_write_inode to a helper function. Now if we happen to write out the indirect node while hfsplus_link is creating the catalog entry we'll get a link pointing to the linkid of the current nlink value. This can easily be reproduced by a large enough loop of local git-clone operations. Stop abusing the dev field in the HFS+ inode for short term storage by refactoring the way the permission structure in the catalog entry is set up, and rename the dev field to linkid to avoid any confusion. While we're at it also prevent creating hard links to special files, as the HFS+ dev and linkid share the same space in the on-disk structure. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-14 09:54:28 -04:00
Christoph Hellwig	13571a6977	hfsplus: validate btree flags Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-14 09:54:23 -04:00
Eric Sandeen	9250f92597	hfsplus: handle more on-disk corruptions without oopsing hfs seems prone to bad things when it encounters on disk corruption. Many values are read from disk, and used as lengths to memcpy, as an example. This patch fixes up several of these problematic cases. o sanity check the on-disk maximum key lengths on mount (these are set to a defined value at mkfs time and shouldn't differ) o check on-disk node keylens against the maximum key length for each tree o fix hfs_btree_open so that going out via free_tree: doesn't wind up in hfs_releasepage, which wants to follow the very pointer we were trying to set up: HFS_SB(sb)->cat_tree = hfs_btree_open() . failure gets to hfs_releasepage and tries to follow HFS_SB(sb)->cat_tree Tested with the fsfuzzer; it survives more than it used to. [hch: ported of commit `cf05946250` from hfs] [hch: added the fixes from 5581d018ed3493d226e7a4d645d9c8a5af6c36b] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-14 09:53:48 -04:00
Al Viro	b6b41424f0	hfsplus: hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage oops and fs corruption; the latter can happen even on valid fs in case of oom. [hch: port of commit `3d10a15d69` from hfs] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-14 09:53:42 -04:00
Jeff Mahoney	ee52716245	hfsplus: fix oops on mount with corrupted btree extent records A particular fsfuzzer run caused an hfs file system to crash on mount. This is due to a corrupted MDB extent record causing a miscalculation of HFSPLUS_I(inode)->first_blocks for the extent tree. If the extent records are zereod out, then it won't trigger the first_blocks special case and instead falls through to the extent code, which we're in the middle of initializing. This patch catches the 0 size extent records, reports the corruption, and fails the mount. [hch: ported of commit `47f365eb57` from hfs] Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-14 09:53:37 -04:00
Linus Torvalds	8c35bf368c	Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux * 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink	2010-10-13 16:51:29 -07:00
J. Bruce Fields	b1e86db1de	nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink As of commit `43a9aa64a2` "NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call fh_unlock on a filehandle that isn't fully initialized. We should fix up the callers, but as a quick fix it is also sufficient just to remove this assertion. Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-13 15:48:55 -04:00
Jeff Layton	d7c86ff8cd	cifs: don't use vfsmount to pin superblock for oplock breaks Filesystems aren't really supposed to do anything with a vfsmount. It's considered a layering violation since vfsmounts are entirely managed at the VFS layer. CIFS currently keeps an active reference to a vfsmount in order to prevent the superblock vanishing before an oplock break has completed. What we really want to do instead is to keep sb->s_active high until the oplock break has completed. This patch borrows the scheme that NFS uses for handling sillyrenames. An atomic_t is added to the cifs_sb_info. When it transitions from 0 to 1, an extra reference to the superblock is taken (by bumping the s_active value). When it transitions from 1 to 0, that reference is dropped and a the superblock teardown may proceed if there are no more references to it. Also, the vfsmount pointer is removed from cifsFileInfo and from cifs_new_fileinfo, and some bogus forward declarations are removed from cifsfs.h. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-12 18:08:01 +00:00
Jeff Layton	a5e18bc36e	cifs: keep dentry reference in cifsFileInfo instead of inode reference cifsFileInfo is a bit problematic. It contains a reference back to the struct file itself. This makes it difficult for a cifsFileInfo to exist without a corresponding struct file. It would be better instead of the cifsFileInfo just held info pertaining to the open file on the server instead without any back refrences to the struct file. This would allow it to exist after the filp to which it was originally attached was closed. Much of the use of the file pointer in this struct is to get at the dentry. Begin divorcing the cifsFileInfo from the struct file by keeping a reference to the dentry. Since the dentry will have a reference to the inode, we can eliminate the "pInode" field too and convert the igrab/iput to dget/dput. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-12 18:06:42 +00:00
Jeff Layton	1c456013e9	cifs: on multiuser mount, set ownership to current_fsuid/current_fsgid (try #7 ) commit `3aa1c8c290` made cifs_getattr set the ownership of files to current_fsuid/current_fsgid when multiuser mounts were in use and when mnt_uid/mnt_gid were non-zero. It should have instead based that decision on the CIFS_MOUNT_OVERR_UID/GID flags. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-12 15:43:53 +00:00
Thomas Gleixner	756b0322e5	affs: Use sema_init instead of init_MUTEX Get rid of init_MUTE() and use sema_init() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> LKML-Reference: <20100907125056.511395595@linutronix.de>	2010-10-12 17:39:25 +02:00
Thomas Gleixner	4a94103554	hfs: Convert tree_lock to mutex tree_lock is used as mutex so make it a mutex. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> LKML-Reference: <20100907125056.416332114@linutronix.de>	2010-10-12 17:36:11 +02:00
Shirish Pargaonkar	9daa42e220	CIFS ntlm authentication and signing - Build a proper av/ti pair blob for ntlmv2 without extended security authentication Build an av pair blob as part of ntlmv2 (without extended security) auth request. Include netbios and dns names for domain and server and a timestamp in the blob. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-12 15:14:06 +00:00
Eric Paris	7c5347733d	fanotify: disable fanotify syscalls This patch disables the fanotify syscalls by just not building them and letting the cond_syscall() statements in kernel/sys_ni.c redirect them to sys_ni_syscall(). It was pointed out by Tvrtko Ursulin that the fanotify interface did not include an explicit prioritization between groups. This is necessary for fanotify to be usable for hierarchical storage management software, as they must get first access to the file, before inotify-like notifiers see the file. This feature can be added in an ABI compatible way in the next release (by using a number of bits in the flags field to carry the info) but it was suggested by Alan that maybe we should just hold off and do it in the next cycle, likely with an (new) explicit argument to the syscall. I don't like this approach best as I know people are already starting to use the current interface, but Alan is all wise and noone on list backed me up with just using what we have. I feel this is needlessly ripping the rug out from under people at the last minute, but if others think it needs to be a new argument it might be the best way forward. Three choices: Go with what we got (and implement the new feature next cycle). Add a new field right now (and implement the new feature next cycle). Wait till next cycle to release the ABI (and implement the new feature next cycle). This is number 3. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-11 18:15:28 -07:00
Tristan Ye	7bdb0d18bf	ocfs2: Add a mount option "coherency=" to handle cluster coherency for O_DIRECT writes. Currently, the default behavior of O_DIRECT writes was allowing concurrent writing among nodes to the same file, with no cluster coherency guaranteed (no EX lock held). This can leave stale data in the cache for buffered reads on other nodes. The new mount option introduce a chance to choose two different behaviors for O_DIRECT writes: coherency=full, as the default value, will disallow concurrent O_DIRECT writes by taking EX locks. * coherency=buffered, allow concurrent O_DIRECT writes without EX lock among nodes, which gains high performance at risk of getting stale data on other nodes. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-10-11 14:14:55 -07:00
Goldwyn Rodrigues	75d9bbc738	Initialize max_slots early Functions such as ocfs2_recovery_init() make use of osb->max_slots. Initialize osb->max_slots early so the functions may use the correct value. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-10-11 13:56:32 -07:00
Poyo VL	f30d44f3e5	When I tried to compile I got the following warning: fs/ocfs2/slot_map.c: In function ‘ocfs2_init_slot_info’: fs/ocfs2/slot_map.c:360: warning: ‘bytes’ may be used uninitialized in this function fs/ocfs2/slot_map.c:360: note: ‘bytes’ was declared here Compiler: gcc version 4.4.3 (GCC) on Mandriva I'm not sure why this warning occurs, I think compiler don't know that variable "bytes" is initialized when it is sent by reference to ocfs2_slot_map_physical_size and it throws that ugly warning. However, a simple initialization of "bytes" variable with 0 will fix it. Signed-off-by: Ionut Gabriel Popescu <poyo_vl@yahoo.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-10-11 13:45:52 -07:00
Srinivas Eeda	9b5cd10e4c	ocfs2: validate bg_free_bits_count after update This patch adds a safe check to ensure bg_free_bits_count doesn't exceed bg_bits in a group descriptor. This is to avoid on disk corruption that was seen recently. debugfs: group <52803072> Group Chain: 179 Parent Inode: 11 Generation: 2959379682 CRC32: 00000000 ECC: 0000 ## Block# Total Used Free Contig Size 0 52803072 32256 4294965350 34202 18207 4032 ...... Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-10-11 13:43:24 -07:00
Tejun Heo	6370a6ad3b	workqueue: add and use WQ_MEM_RECLAIM flag Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark WQ_RESCUER as internal and replace all external WQ_RESCUER usages to WQ_MEM_RECLAIM. This makes the API users express the intent of the workqueue instead of indicating the internal mechanism used to guarantee forward progress. This is also to make it cleaner to add more semantics to WQ_MEM_RECLAIM. For example, if deemed necessary, memory reclaim workqueues can be made highpri. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Steven Whitehouse <swhiteho@redhat.com>	2010-10-11 15:20:26 +02:00
Linus Torvalds	8dc54e49ce	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: update issue_seq on cap grant ceph: send cap release message early on failed revoke. ceph: Update max_len with minimum required size ceph: Fix return value of encode_fh function ceph: avoid null deref in osd request error path ceph: fix list_add usage on unsafe_writes list	2010-10-09 12:03:46 -07:00
Linus Torvalds	267aeb6c14	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Fix double page_unlock BUG in write_begin/end	2010-10-09 12:03:23 -07:00
Sunil Mushran	4d94aa1b1d	ocfs2/cluster: Bump up dlm protocol to version 1.1 dlm protocol 1.1. activates messages DLM_QUERY_REGION and DLM_QUERY_NODEINFO that are a must for global heartbeat. It also activates o2hb_global_heartbeat_active(). Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-09 10:27:04 -07:00
Jeff Layton	0dd12c2195	cifs: initialize tlink_tree_lock and tlink_tree Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-08 16:34:49 +00:00
Boaz Harrosh	f17b1f9f1a	exofs: Fix double page_unlock BUG in write_begin/end This BUG is there since the first submit of the code, but only triggered in last Kernel. It's timing related do to the asynchronous object-creation behaviour of exofs. (Which should be investigated farther) The bug is obvious hence the fixed. Signed-off-by: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>	2010-10-08 11:26:54 -04:00
Nicolas Pitre	e4eab08d60	ARM: 6342/1: fix ASLR of PIE executables Since commits `990cb8acf2` and `cc92c28b2d`, it is possible to have full address space layout randomization (ASLR) on ARM. Except that one small change was missing for ASLR of PIE executables. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2010-10-08 10:02:53 +01:00
Steve French	6ea75952d7	Merge branch 'for-next'	2010-10-08 03:42:03 +00:00
Steve French	d244555613	[CIFS] Remove build warning Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-08 03:38:46 +00:00
Jeff Layton	ccc46a7402	cifs: fix module refcount leak in find_domain_name find_domain_name() uses load_nls_default which takes a module reference on the appropriate NLS module, but doesn't put it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-08 03:33:08 +00:00
Jeff Layton	2de970ff69	cifs: implement recurring workqueue job to prune old tcons Create a workqueue job that cleans out unused tlinks. For now, it uses a hardcoded expire time of 10 minutes. When it's done, the work rearms itself. On umount, the work is cancelled before tearing down the tlink tree. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-08 03:31:21 +00:00
Jeff Layton	3aa1c8c290	cifs: on multiuser mount, set ownership to current_fsuid/current_fsgid (try #5 ) ...when unix extensions aren't enabled. This makes everything on the mount appear to be owned by the current user. This version of the patch differs from previous versions however in that the admin can still force the ownership of all files to appear as a single user via the uid=/gid= options. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-08 03:26:28 +00:00
Linus Torvalds	5710c2b275	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: properly account for reclaimed inodes	2010-10-07 13:45:26 -07:00
Steve French	13cd4b7f74	[CIFS] Various small checkpatch cleanups Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-07 18:46:32 +00:00
Jeff Layton	0eb8a132c4	cifs: add "multiuser" mount option This allows someone to declare a mount as a multiuser mount. Multiuser mounts also imply "noperm" since we want to allow the server to handle permission checking. It also (for now) requires Kerberos authentication. Eventually, we could expand this to other authtypes, but that requires a scheme to allow per-user credential stashing in some form. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-07 18:30:45 +00:00
Jeff Layton	9d002df492	cifs: add routines to build sessions and tcons on the fly This patch is rather large, but it's a bit difficult to do piecemeal... For non-multiuser mounts, everything will basically work as it does today. A call to cifs_sb_tlink will return the "master" tcon link. Turn the tcon pointer in the cifs_sb into a radix tree that uses the fsuid of the process as a key. The value is a new "tcon_link" struct that contains info about a tcon that's under construction. When a new process needs a tcon, it'll call cifs_sb_tcon. That will then look up the tcon_link in the radix tree. If it exists and is valid, it's returned. If it doesn't exist, then we stuff a new tcon_link into the tree and mark it as pending and then go and try to build the session/tcon. If that works, the tcon pointer in the tcon_link is updated and the pending flag is cleared. If the construction fails, then we set the tcon pointer to an ERR_PTR and clear the pending flag. If the radix tree is searched and the tcon_link is marked pending then we go to sleep and wait for the pending flag to be cleared. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-07 18:18:00 +00:00
Sage Weil	d91f2438d8	ceph: update issue_seq on cap grant We need to update the issue_seq on any grant operation, be it via an MDS reply or a separate grant message. The update in the grant path was missing. This broke cap release for inodes in which the MDS sent an explicit grant message that was not soon after followed by a successful MDS reply on the same inode. Also fix the signedness on seq locals. Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:01:50 -07:00
Greg Farnum	21b559de56	ceph: send cap release message early on failed revoke. If an MDS tries to revoke caps that we don't have, we want to send releases early since they probably contain the caps message the MDS is looking for. Previously, we only sent the messages if we didn't have the inode either. But in a multi-mds system we can retain the inode after dropping all caps for a single MDS. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:24 -07:00
Aneesh Kumar K.V	bba0cd0e3d	ceph: Update max_len with minimum required size encode_fh on error should update max_len with minimum required size, so that caller can redo the call with the reallocated buffer. This is required with open by handle patch series Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:24 -07:00
Aneesh Kumar K.V	92923dcbfc	ceph: Fix return value of encode_fh function encode_fh function should return 255 on error as done by other file system to indicate EOVERFLOW. Also max_len is in sizeof(u32) units and not in bytes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:23 -07:00
Sage Weil	6bc18876ba	ceph: avoid null deref in osd request error path If we interrupt an osd request, we call __cancel_request, but it wasn't verifying that req->r_osd was non-NULL before dereferencing it. This could cause a crash if osds were flapping and we aborted a request on said osd. Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:23 -07:00
Henry C Chang	936aeb5c4a	ceph: fix list_add usage on unsafe_writes list Fix argument order. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-07 08:00:23 -07:00
Johannes Weiner	081003fff4	xfs: properly account for reclaimed inodes When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@kernel.org Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-10-06 22:35:48 -05:00
Sunil Mushran	43695d095d	ocfs2/cluster: Show per region heartbeat elapsed time This patch adds a per region debugfs file that shows the elapsed time since the time the o2hb timer was last armed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:09 -07:00
Sunil Mushran	d6aa1c7c9e	ocfs2/cluster: Add mlogs for heartbeat up/down events This patch adds mlogs for o2hb up and down events. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 18:50:50 -07:00
Sunil Mushran	1f28530537	ocfs2/cluster: Create debugfs dir/files for each region This patch creates debugfs directory for each o2hb region and creates files to expose the region number and the per region live node bitmap. This information will be useful in debugging cluster issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:12 -07:00
Sunil Mushran	a6de013654	ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps This patch prints the bitmaps of live, quorum and failed regions. This information will be useful in debugging cluster issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:13 -07:00
Sunil Mushran	b1c5ebfbe3	ocfs2/cluster: Maintain bitmap of failed regions In global heartbeat mode, we track the bitmap of regions that have seen heartbeat timeouts. We fence if the number of such regions is greater than or equal to half the number of quorum regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:05:52 -07:00
Sunil Mushran	43182d2a79	ocfs2/cluster: Maintain bitmap of quorum regions o2hb allows online adding of regions. However, a newly added region is not used in quorum calculations unless it has been added on all nodes. This patch tracks a bitmap of such quorum regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:16 -07:00
Sunil Mushran	e7d656baf6	ocfs2/cluster: Track bitmap of live heartbeat regions A heartbeat region becomes live (or active) after a fixed number of (steady) iterations. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:18 -07:00
Sunil Mushran	536f0741f3	ocfs2/cluster: Track number of global heartbeat regions In global heartbeat mode, we have a upper limit for the number of active regions. This patch adds the facility to track the number of active global heartbeat regions and fails to start heartbeat if the number exceeds the maximum. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:03:07 -07:00
Sunil Mushran	823a637ae9	ocfs2/cluster: Maintain live node bitmap per heartbeat region Currently we track a global livenode bitmap that keeps track of all nodes that are heartbeating in all regions. This patch adds the ability to track the livenode bitmap on a per region basis. We will use this facility in a later patch to allow us to withstand the loss of a minority number of regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:21 -07:00
Sunil Mushran	8ca8b0bbd8	ocfs2/cluster: Reorganize o2hb debugfs init o2hb debugfs handling is reorganized to allow for easy expansion. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:01:27 -07:00
Sunil Mushran	0e105d37c2	ocfs2/cluster: Check slots for unconfigured live nodes o2hb currently checks slots for configured nodes only. This patch makes it check the slots for the live nodes too to take care of a race in which a node is removed from the configuration but not from the live map. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:00:16 -07:00
Sunil Mushran	39a298563e	ocfs2/cluster: Print messages when adding/removing nodes Prints messages when the user adds or removes nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 17:30:17 -07:00
Sunil Mushran	18c50cb0d3	ocfs2/cluster: Print messages when adding/removing heartbeat regions Prints messages when the user adds or removes heartbeat regions in global heartbeat mode. These messages are useful when debugging cluster related issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 18:26:59 -07:00
Sunil Mushran	18cfdf1b1a	ocfs2/dlm: Add message DLM_QUERY_NODEINFO Adds new dlm message DLM_QUERY_NODEINFO that sends the attributes of all registered nodes. This message is sent if the negotiated dlm protocol is 1.1 or higher. If the information of the joining node does not match that of any existing nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 16:47:03 -07:00
Sunil Mushran	5f3c6d9c61	ocfs2: Print message if user mounts without starting global heartbeat In global heartbeat mode, the heartbeat is started by the user. This patch prints an error if the user attempts to mount a volume without starting the heartbeat. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:29 -07:00
Sunil Mushran	ea2034416b	ocfs2/dlm: Add message DLM_QUERY_REGION Adds new dlm message DLM_QUERY_REGION that sends the names of all active heartbeat regions. This message is only sent in the global heartbeat mode. If the regions in the joining node do not fully match the ones in the active nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-09 10:26:23 -07:00
Sunil Mushran	b3c85c4cdf	ocfs2/cluster: Get all heartbeat regions Export function in o2hb to get a list of heartbeat regions. It also adds an upper limit to the length of the heartbeat region name. o2hb_global_heartbeat_active() currently disables global heartbeat. It will be enabled in a later patch after all the code is added. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 14:31:06 -07:00
Sunil Mushran	b1365d0bd1	ocfs2/dlm: Expose dlm_protocol in dlm_state Add dlm_protocol to the list of info shown by the debugfs file, dlm_state. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-06 17:55:34 -07:00
Sunil Mushran	2c442719e9	ocfs2: Add support for heartbeat=global mount option Adds support for heartbeat=global mount option. It ensures that the heartbeat mode passed matches the one enabled on disk. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 15:23:50 -07:00
Sunil Mushran	98f486f23b	ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for both userspace and o2cb cluster stacks. It also allows us to extend cluster info to include stack flags. This patch also adds stackflags to sb->s_clusterinfo. It also introduces a clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled global heartbeat mode. This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-09 10:24:46 -07:00
Sunil Mushran	54b5187b5a	ocfs2/cluster: Add heartbeat mode configfs parameter Add heartbeat mode parameter to the configfs tree. This will be used to set/show the heartbeat mode. The user is free to toggle the mode between local and global as long as there is no active heartbeat region. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>	2010-10-07 15:26:08 -07:00
Linus Torvalds	089eed29b4	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: writeback: always use sb->s_bdi for writeback purposes	2010-10-06 11:11:18 -07:00
Linus Torvalds	8fe9793af0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Initialize total_len in fuse_retrieve()	2010-10-06 09:50:41 -07:00
Shirish Pargaonkar	c9928f7040	ntlm authentication and signing - Correct response length for ntlmv2 authentication without extended security Fix incorrect calculation of case sensitive response length in the ntlmv2 (without extended security) response. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-06 16:13:19 +00:00
Jeff Layton	29e07c82a9	cifs: fix cifs_show_options to show "username=" or "multiuser" ...based on CIFS_MOUNT_MULTIUSER flag. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-06 16:13:11 +00:00
Jeff Layton	6508d904e6	cifs: have find_readable/writable_file filter by fsuid When we implement multiuser mounts, we'll need to filter filehandles by fsuid. Add a flag for multiuser mounts and code to filter by fsuid when it's set. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-06 16:12:59 +00:00
Jeff Layton	13cfb7334e	cifs: have cifsFileInfo hold a reference to a tlink rather than tcon pointer cifsFileInfo needs a pointer to a tcon, but it doesn't currently hold a reference to it. Change it to keep a pointer to a tcon_link instead and hold a reference to it. That will keep the tcon from being freed until the file is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-06 16:12:49 +00:00
Jeff Layton	7ffec37245	cifs: add refcounted and timestamped container for holding tcons Eventually, we'll need to track the use of tcons on a per-sb basis, so that we know when it's ok to tear them down. Begin this conversion by adding a new "tcon_link" struct and accessors that get it. For now, the core data structures are untouched -- cifs_sb still just points to a single tcon and the pointers are just cast to deal with the accessor functions. A later patch will flesh this out. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-06 16:12:44 +00:00
Steven Whitehouse	134669854e	GFS2: Fix type mapping for demote_rq interface Mostly the glock operations follow the type of the glock. The one exception is the transaction glock, so we need to check for that directly. Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-10-06 09:58:44 +01:00
Felipe Contreras	5a44a73b90	autofs4: Only declare function when CONFIG_COMPAT is defined The patch solves the following warnings message when CONFIG_COMPAT is not defined: fs/autofs4/root.c:31: warning: ‘autofs4_root_compat_ioctl’ declared ‘static’ but never defined Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Cc: Ian Kent <raven@themaw.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-05 11:02:15 +02:00
Márton Németh	26daad8067	autofs: Only declare function when CONFIG_COMPAT is defined The patch solves the following warnings message when CONFIG_COMPAT is not defined: fs/autofs/root.c:30: warning: ‘autofs_root_compat_ioctl’ declared ‘static’ but never defined Signed-off-by: Márton Németh <nm127@freemail.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-05 11:02:15 +02:00
Petr Vandrovec	2a4df5d332	ncpfs: Lock socket in ncpfs while setting its callbacks Otherwise partially updated pointers could be seen if pointer update is not atomic. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-05 11:02:14 +02:00
Arnd Bergmann	b89f432133	fs/locks.c: prepare for BKL removal This prepares the removal of the big kernel lock from the file locking code. We still use the BKL as long as fs/lockd uses it and ceph might sleep, but we can flip the definition to a private spinlock as soon as that's done. All users outside of fs/lockd get converted to use lock_flocks() instead of lock_kernel() where appropriate. Based on an earlier patch to use a spinlock from Matthew Wilcox, who has attempted this a few times before, the earliest patch from over 10 years ago turned it into a semaphore, which ended up being slower than the BKL and was subsequently reverted. Someone should do some serious performance testing when this becomes a spinlock, since this has caused problems before. Using a spinlock should be at least as good as the BKL in theory, but who knows... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Matthew Wilcox <willy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org	2010-10-05 11:02:04 +02:00
Petr Vandrovec	2e54eb96e2	BKL: Remove BKL from ncpfs Dozen of changes in ncpfs to provide some locking other than BKL. In readdir cache unlock and mark complete first page as last operation, so it can be used for synchronization, as code intended. When updating dentry name on case insensitive filesystems do at least some basic locking... Hold i_mutex when updating inode fields. Push some ncp_conn_is_valid down to ncp_request. Connection can become invalid at any moment, and fewer error code paths to test the better. Use i_size_{read,write} to modify file size. Set inode's backing_dev_info as ncpfs has its own special bdi. In ioctl unbreak ioctls invoked on filesystem mounted 'ro' - tests are for inode writeable or owner match, but were turned to filesystem writeable and inode writeable or owner match. Also collect all permission checks in single place. Add some locking, and remove comments saying that it would be cool to add some locks to the code. Constify some pointers. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:52 +02:00
Arnd Bergmann	6005679412	BKL: Remove BKL from OCFS2 The BKL in ocfs2/dlmfs is used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. The use in ocfs2_control_open is evidently unrelated and the function is protected by ocfs2_control_lock. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com>	2010-10-04 21:10:51 +02:00
Arnd Bergmann	3dbc4b32d0	BKL: Remove BKL from squashfs The BKL is only used in put_super and fill_super, which are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Phillip Lougher <phillip@lougher.demon.co.uk>	2010-10-04 21:10:50 +02:00
Arnd Bergmann	1a028dd2dd	BKL: Remove BKL from jffs2 The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw2@infradead.org>	2010-10-04 21:10:50 +02:00
Arnd Bergmann	18dfe89d7c	BKL: Remove BKL from ecryptfs The BKL is only used in fill_super, which is protected by the superblocks s_umount rw_semaphorei, and in fasync, which does not do anything that could require the BKL. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net	2010-10-04 21:10:48 +02:00
Arnd Bergmann	77f2fe036c	BKL: Remove BKL from afs The BKL is only used in put_super and fill_super, which are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-afs@lists.infradead.org Cc: David Howells <dhowells@redhat.com>	2010-10-04 21:10:48 +02:00
Arnd Bergmann	00e300e1b6	BKL: Remove BKL from autofs4 autofs4 uses the BKL only to guard its ioctl operations. This can be trivially converted to use a mutex, as we have done with most device drivers before. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ian Kent <raven@themaw.net>	2010-10-04 21:10:46 +02:00
Arnd Bergmann	4f819a7899	BKL: Remove BKL from isofs As in other file systems, we can replace the big kernel lock with a private mutex in isofs. This means we can now access multiple file systems concurrently, but it also means that we serialize readdir and lookup across sleeping operations which previously released the big kernel lock. This should not matter though, as these operations are in practice serialized through the hardware access. The isofs_get_blocks functions now does not take any lock any more, it used to recursively get the BKL. After looking at the code for hours, I convinced myself that it was never needed here anyway, because it only reads constant fields of the inode and writes to a buffer head array that is at this time only visible to the caller. The get_sb and fill_super operations do not need the locking at all because they operate on a file system that is either about to be created or to be destroyed but in either case is not visible to other threads. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:45 +02:00
Arnd Bergmann	3768744cfe	BKL: Remove BKL from fat The lock_kernel in fat_put_super is not needed because it only protects the super block itself and we know that no other thread can reach it because we are about to kfree the object. In the two fill_super functions, this converts the locking to use lock_super like elsewhere in the fat code. This is probably not needed either, but is consistent and puts us on the safe side. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Jan Blunck <jblunck@infradead.org>	2010-10-04 21:10:45 +02:00
Jan Blunck	3e44f9f1dc	BKL: Remove BKL from ext2 filesystem The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs() ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(), ext2_fill_super() and ext2_remount() are protected against each other by the struct super_block s_umount rw semaphore. The call in ext2_write_inode() could only protect the modification of the ext2_sb_info through ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount(). ext2_fill_super() and ext2_put_super() can be left out because you need a valid filesystem reference in all three cases, which you do not have when you are one of these functions. If the BKL is only protecting the modification of the ext2_sb_info it can safely be removed since this is protected by the struct ext2_sb_info s_lock. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:44 +02:00
Jan Blunck	6841c05021	BKL: Remove BKL from do_new_mount() After pushing down the BKL to the get_sb/fill_super operations of the filesystems that still make usage of the BKL it is safe to remove it from do_new_mount(). I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:43 +02:00
Jan Blunck	efdffb5403	BKL: Remove BKL from NTFS The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:41 +02:00
Jan Blunck	d6d4c19c5f	BKL: Remove BKL from NILFS2 The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:41 +02:00
Jan Blunck	22b26db6f8	BKL: Remove BKL from JFS The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:40 +02:00
Jan Blunck	8526fb37c9	BKL: Remove BKL from HFS The BKL is only used in put_super and fill_super that are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:39 +02:00
Jan Blunck	f2143c4e2e	BKL: Remove BKL from ext4 filesystem The BKL is still used in ext4_put_super(), ext4_fill_super() and ext4_remount(). All three calles are protected against concurrent calls by the s_umount rw semaphore of struct super_block. Therefore the BKL is protecting nothing in this case. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:38 +02:00
Jan Blunck	77b54a46a8	BKL: Remove BKL from ext3_put_super() and ext3_remount() The BKL lock is protecting the remounting against a potential call to ext3_put_super(). This could not happen, since this is protected by the s_umount rw semaphore of struct super_block. Therefore I think the BKL is protecting nothing here. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:37 +02:00
Jan Blunck	d646cf82e9	BKL: Remove BKL from ext3 fill_super() The BKL is protecting nothing than two memory allocations here. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:36 +02:00
Jan Blunck	b0991aa324	BKL: Remove BKL from CifsFS The BKL is only used in put_super and fill_super that are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Steve French <smfrench@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:36 +02:00
Jan Blunck	ba13d597a6	BKL: Remove BKL from BFS The BKL is only used in put_super and fill_super that are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:35 +02:00
Jan Blunck	74c41429ae	BKL: Remove BKL from Amiga FFS The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:34 +02:00
Jan Blunck	db71922217	BKL: Explicitly add BKL around get_sb/fill_super This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still uses the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both later functions are actually the filesystems follow_link inode operation. vfs_kern_mount() is calling the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystems get_sb/fill_super operation. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-04 21:10:10 +02:00
Christoph Hellwig	aaead25b95	writeback: always use sb->s_bdi for writeback purposes We currently use struct backing_dev_info for various different purposes. Originally it was introduced to describe a backing device which includes an unplug and congestion function and various bits of readahead information and VM-relevant flags. We're also using for tracking dirty inodes for writeback. To make writeback properly find all inodes we need to only access the per-filesystem backing_device pointed to by the superblock in ->s_bdi inside the writeback code, and not the instances pointeded to by inode->i_mapping->backing_dev which can be overriden by special devices or might not be set at all by some filesystems. Long term we should split out the writeback-relevant bits of struct backing_device_info (which includes more than the current bdi_writeback) and only point to it from the superblock while leaving the traditional backing device as a separate structure that can be overriden by devices. The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-04 14:25:33 +02:00
Geert Uytterhoeven	0157443c56	fuse: Initialize total_len in fuse_retrieve() fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this function Initialize total_len to zero, else its value will be undefined. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2010-10-04 10:45:32 +02:00
Linus Torvalds	c6ea21e35b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: prevent infinite recursion in cifs_reconnect_tcon cifs: set backing_dev_info on new S_ISREG inodes	2010-10-01 15:03:37 -07:00
Frederic Weisbecker	9d8117e72b	reiserfs: fix unwanted reiserfs lock recursion Prevent from recursively locking the reiserfs lock in reiserfs_unpack() because we may call journal_begin() that requires the lock to be taken only once, otherwise it won't be able to release the lock while taking other mutexes, ending up in inverted dependencies between the journal mutex and the reiserfs lock for example. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35.4.4a #3 ------------------------------------------------------- lilo/1620 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs] [<c10b6a20>] do_remount_sb+0x60/0x110 [<c10cee25>] do_mount+0x625/0x790 [<c10cf014>] sys_mount+0x84/0xb0 [<c12fca3d>] syscall_call+0x7/0xb -> #0 (&journal->j_mutex){+.+...}: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb other info that might help us debug this: 2 locks held by lilo/1620: #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs] #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3 Call Trace: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: All since 2.6.32 <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:59 -07:00
Frederic Weisbecker	3f259d092c	reiserfs: fix dependency inversion between inode and reiserfs mutexes The reiserfs mutex already depends on the inode mutex, so we can't lock the inode mutex in reiserfs_unpack() without using the safe locking API, because reiserfs_unpack() is always called with the reiserfs mutex locked. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35c #13 ------------------------------------------------------- lilo/1606 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs] [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs] [<c10b7d17>] get_sb_bdev+0x117/0x170 [<d0313e21>] get_super_block+0x21/0x30 [reiserfs] [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0 [<c10b7659>] do_kern_mount+0x39/0xe0 [<c10cebe0>] do_mount+0x340/0x790 [<c10cf0b4>] sys_mount+0x84/0xb0 [<c12f25cd>] syscall_call+0x7/0xb -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb other info that might help us debug this: 1 lock held by lilo/1606: #0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1606, comm: lilo Not tainted 2.6.35c #13 Call Trace: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.32 and later] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:59 -07:00
Jiri Olsa	3036e7b490	proc: make /proc/pid/limits world readable Having the limits file world readable will ease the task of system management on systems where root privileges might be restricted. Having admin restricted with root priviledges, he/she could not check other users process' limits. Also it'd align with most of the /proc stat files. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Eugene Teo <eugene@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:59 -07:00
Jeff Layton	f569599ae7	cifs: prevent infinite recursion in cifs_reconnect_tcon cifs_reconnect_tcon is called from smb_init. After a successful reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo. Those functions also call smb_init. It's possible for the session and tcon reconnect to succeed, and then for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or CIFSSMBSetFSUnixInfo to be called. That'll cause those functions to call smb_init and cifs_reconnect_tcon again, ad infinitum... Break the infinite recursion by having those functions use a new smb_init variant that doesn't attempt to perform a reconnect. Reported-and-Tested-by: Michal Suchanek <hramrach@centrum.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-10-01 17:50:08 +00:00
Christoph Hellwig	40de9a7ceb	hfsplus: fix rename over directories When renaming over a directory we need to use hfsplus_rmdir instead of hfsplus_unlink to evict the victim. This makes sure we properly error out on non-empty directory as required by Posix (BZ #16571), and it also makes sure we do the right thing in case i_nlink will every be set correctly for directories on hfsplus. Reported-by: Vlado Plaga <rechner@vlado-do.de> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 09:12:08 +02:00
Thomas Gleixner	467c3d9cd5	hfsplus: convert tree_lock to mutex tree_lock is used as mutex so make it a mutex. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:46:52 +02:00
Christoph Hellwig	7fcc99f4f2	hfsplus: add missing extent locking in hfsplus_write_inode Most of the extent handling code already does proper SMP locking, but hfsplus_write_inode was calling into hfsplus_ext_write_extent without taking the extents_lock. Fix this by splitting hfsplus_ext_write_extent into an internal helper that expects the lock, and a public interface that first acquires it. Also add a few locking asserts and document the locking rules in hfsplus_fs.h. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:46:31 +02:00
Christoph Hellwig	89755dcace	hfsplus: protect readdir against removals from open_dir_list We already have i_mutex for readdir and the namespace operations that add entries to open_dir_list, the only thing that was missing was the removal in hfsplus_dir_release. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:45:25 +02:00
Christoph Hellwig	84adede312	hfsplus: use atomic bitops for the superblock flags The flags in the HFS+-specific superlock do get modified during runtime, use atomic bitops to make the modifications SMP safe. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:45:20 +02:00
Christoph Hellwig	7ac9fb9c2a	hfsplus: add per-superblock lock for volume header updates Lock updates to the mutal fields in the volume header, and document the locing in the hfsplus_sb_info structure. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:45:08 +02:00
Christoph Hellwig	58a818f532	hfsplus: remove the rsrc_inodes list We never walk the list - the only reason for it is to make the resource fork inodes appear hashed to the writeback code. Borrow a trick from JFS to do that without needing a list head. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:44:02 +02:00
Christoph Hellwig	66e5db05bb	hfsplus: do not cache and write next_alloc We never look at it, nor change the next_alloc field in the superblock. So don't bother caching it or writing it out in hfsplus_sync_fs. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:58 +02:00
Christoph Hellwig	f17c89bfcc	hfsplus: fix error handling in hfsplus_symlink We need to free the inode again on a hfsplus_create_cat failure. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:54 +02:00
Christoph Hellwig	30d3abbec7	hfsplus: merge mknod/mkdir/creat Make hfsplus_mkdir and hfsplus_create call hfsplus_mknod instead of duplicating the code. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:50 +02:00
Christoph Hellwig	b5080f77ed	hfsplus: clean up hfsplus_write_inode Add a new hfsplus_system_write_inode for writing the special system inodes and streamline the fastpath write_inode code. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:43 +02:00
Christoph Hellwig	fc4fff8210	hfsplus: clean up hfsplus_iget Add a new hfsplus_system_read_inode for reading the special system inodes and streamline the fastpath iget code. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:41 +02:00
Christoph Hellwig	6af502de22	hfsplus: fix HFSPLUS_I calling convention HFSPLUS_I doesn't return a pointer to the hfsplus-specific inode information like all other FOO_I macros, but dereference the pointer in a way that made it look like a direct struct derefence. This only works as long as the HFSPLUS_I macro is used directly and prevents us from keepig a local hfsplus_inode_info pointer. Fix the calling convention and introduce a local hip variable in all functions that use it constantly. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:43:31 +02:00
Christoph Hellwig	dd73a01a30	hfsplus: fix HFSPLUS_SB calling convention HFSPLUS_SB doesn't return a pointer to the hfsplus-specific superblock information like all other FOO_SB macros, but dereference the pointer in a way that made it look like a direct struct derefence. This only works as long as the HFSPLUS_SB macro is used directly and prevents us from keepig a local hfsplus_sb_info pointer. Fix the calling convention and introduce a local sbi variable in all functions that use it constantly. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:42:59 +02:00
Christoph Hellwig	e753a62156	hfsplus: remove BKL from hfsplus_put_super Except for ->put_super the BKL is now gone from HFS, which means it's superflous there too as ->put_super is serialized by the VFS. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:53 +02:00
Christoph Hellwig	a9fdbf8c60	hfsplus: use alloc_mutex in hfsplus_sync_fs Use alloc_mutex to protect hfsplus_sync_fs against itself and concurrent allocations, which allows to get rid of lock_super in hfsplus. Note that most fields in the superblock still aren't protected against concurrent allocations, that will follow later. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:50 +02:00
Christoph Hellwig	40bf48afe9	hfsplus: introduce alloc_mutex Use a new per-sb alloc_mutex instead of abusing i_mutex of the alloc_file to protect block allocations. This gets rid of lockdep nesting warnings and prepares for extending the scope of alloc_mutex. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:39 +02:00
Christoph Hellwig	6333816ade	hfsplus: protect setflags using i_mutex Use i_mutex for protecting against concurrent setflags ioctls like in other filesystems and get rid of the BKL in hfsplus_ioctl. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:35 +02:00
Christoph Hellwig	94744567fe	hfsplus: split hfsplus_ioctl Give each ioctl command a function of it's own. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:31 +02:00
Christoph Hellwig	249e635300	hfsplus: fix BKL leak in hfsplus_ioctl Currenly the HFSPLUS_IOC_EXT2_GETFLAGS case never unlocks the BKL, which can lead to easily reproduced lockups when doing multiple GETFLAGS ioctls. Fix this by only taking the BKL for the HFSPLUS_IOC_EXT2_SETFLAGS case as neither HFSPLUS_IOC_EXT2_GETFLAGS not the default error case needs it. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2010-10-01 05:41:27 +02:00
Bob Peterson	46290341cd	GFS2 fatal: filesystem consistency error on rename This patch fixes a GFS2 problem whereby the first rename after a mount can result in a file system consistency error being flagged improperly and cause the file system to withdraw. The problem is that the rename code tries to run the rgrp list with function gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in from disk. The patch makes the rename function hold the rindex glock (as the gfs2_unlink code does today) which reads in the rgrp list if need be. There were a total of three places in the rename code that improperly referenced the rgrp list without the rindex glock and this patch fixes all three. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-30 17:23:03 +01:00
Linus Torvalds	0d4911081c	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Don't walk off the end of fast symlinks.	2010-09-29 20:38:07 -07:00
Joel Becker	1fc8a11786	ocfs2: Don't walk off the end of fast symlinks. ocfs2 fast symlinks are NUL terminated strings stored inline in the inode data area. However, disk corruption or a local attacker could, in theory, remove that NUL. Because we're using strlen() (my fault, introduced in a731d1 when removing vfs_follow_link()), we could walk off the end of that string. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org	2010-09-29 17:33:05 -07:00
Jeff Layton	522440ed55	cifs: set backing_dev_info on new S_ISREG inodes Testing on very recent kernel (2.6.36-rc6) made this warning pop: WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x65/0x70() Hardware name: Dirtiable inode bdi default != sb bdi cifs ...the following patch fixes it and seems to be the obviously correct thing to do for cifs. Cc: stable@kernel.org Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:23:23 +00:00
Jeff Layton	f3983c2133	cifs: fix handling of signing with writepages (try #6 ) Get a reference to the file early so we can eventually base the decision about signing on the correct tcon. If that doesn't work for some reason, then fall back to generic_writepages. That's just as likely to fail, but it simplifies the error handling. In truth, I'm not sure how that could occur anyway, so maybe a NULL open_file here ought to be a BUG()? After that, we drop the reference to the open_file and then we re-get one prior to each WriteAndX call. This helps ensure that the filehandle isn't held open any longer than necessary and that open files are reclaimed prior to each write call. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:33 +00:00
Jeff Layton	f7a40689fd	cifs: have cifs_new_fileinfo take a tcon arg To minimize calls to cifs_sb_tcon and to allow for a clear error path if a tcon can't be acquired. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:33 +00:00
Jeff Layton	0d424ad0a4	cifs: add cifs_sb_master_tcon and convert some callers to use it At mount time, we'll always need to create a tcon that will serve as a template for others that are associated with the mount. This tcon is known as the "master" tcon. In some cases, we'll need to use that tcon regardless of who's accessing the mount. Add an accessor function for the master tcon and go ahead and switch the appropriate places to use it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:33 +00:00
Jeff Layton	f6acb9d059	cifs: temporarily rename cifs_sb->tcon to ptcon to catch stragglers Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:33 +00:00
Jeff Layton	a6e8a8455c	cifs: add function to get a tcon from cifs_sb When we convert cifs to do multiple sessions per mount, we'll need more than one tcon per superblock. At that point "cifs_sb->tcon" will make no sense. Add a new accessor function that gets a tcon given a cifs_sb. For now, it just returns cifs_sb->tcon. Later it'll do more. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:32 +00:00
Jeff Layton	ba00ba64cf	cifs: make various routines use the cifsFileInfo->tcon pointer ...where it's available and appropriate. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:32 +00:00
Steve French	d3bf5221d3	[CIFS] Fix ordering of cleanup on module init failure If registering fs cache failed, we weren't cleaning up proc. Acked-by: Jeff Layton <jlayton@redhat.com> CC: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:32 +00:00
Steve French	17edec6f56	[CIFS] Remove obsolete header We decided not to use connector to do the upcalls so cn_cifs.h is obsolete - remove it. Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:32 +00:00
Jeff Layton	ab9db8b737	cifs: allow matching of tcp sessions in CifsNew state With commit `7332f2a621`, cifsd will no longer exit when the socket abends and the tcpStatus is CifsNew. With that change, there's no reason to avoid matching an existing session in this state. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:31 +00:00
Jeff Layton	5fe97cfddc	cifs: add tcon field to cifsFileInfo struct Eventually, we'll have more than one tcon per superblock. At that point, we'll need to know which one is associated with a particular fid. For now, this is just set from the cifs_sb->tcon pointer, but eventually the caller of cifs_new_fileinfo will pass a tcon pointer in. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:31 +00:00
Stefan Metzmacher	736a332059	cifs: add "mfsymlinks" mount option This is the start for an implementation of "Minshall+French Symlinks" (see http://wiki.samba.org/index.php/UNIX_Extensions#Minshall.2BFrench_symlinks). Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:31 +00:00
Stefan Metzmacher	1b12b9c15b	cifs: use Minshall+French symlink functions If configured, Minshall+French Symlinks are used against all servers. If the server supports UNIX Extensions, we still create Minshall+French Symlinks on write, but on read we fallback to UNIX Extension symlinks. Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:31 +00:00
Stefan Metzmacher	8713d01db8	cifs: implement CIFSCreateMFSymLink() Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:31 +00:00
Stefan Metzmacher	18bddd1059	cifs: implement CIFSFormatMFSymlink() Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:30 +00:00
Stefan Metzmacher	0fd43ae475	cifs: implement CIFSQueryMFSymLink() Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:30 +00:00
Stefan Metzmacher	8bfb50a882	cifs: implement CIFSCouldBeMFSymlink() and CIFSCheckMFSymlink() Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:30 +00:00
Stefan Metzmacher	c69c1b6eae	cifs: implement CIFSParseMFSymlink() Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:30 +00:00
Ben Greear	3eb9a8893a	cifs: Allow binding to local IP address. When using multi-homed machines, it's nice to be able to specify the local IP to use for outbound connections. This patch gives cifs the ability to bind to a particular IP address. Usage: mount -t cifs -o srcaddr=192.168.1.50,user=foo, ... Usage: mount -t cifs -o srcaddr=2002:💯1,user=foo, ... Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: Dr. David Holder <david.holder@erion.co.uk> Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:29 +00:00
Shirish Pargaonkar	2b149f1197	cifs NTLMv2/NTLMSSP ntlmv2 within ntlmssp autentication code Attribue Value (AV) pairs or Target Info (TI) pairs are part of ntlmv2 authentication. Structure ntlmv2_resp had only definition for two av pairs. So removed it, and now allocation of av pairs is dynamic. For servers like Windows 7/2008, av pairs sent by server in challege packet (type 2 in the ntlmssp exchange/negotiation) can vary. Server sends them during ntlmssp negotiation. So when ntlmssp is used as an authentication mechanism, type 2 challenge packet from server has this information. Pluck it and use the entire blob for authenticaiton purpose. If user has not specified, extract (netbios) domain name from the av pairs which is used to calculate ntlmv2 hash. Servers like Windows 7 are particular about the AV pair blob. Servers like Windows 2003, are not very strict about the contents of av pair blob used during ntlmv2 authentication. So when security mechanism such as ntlmv2 is used (not ntlmv2 in ntlmssp), there is no negotiation and so genereate a minimal blob that gets used in ntlmv2 authentication as well as gets sent. Fields tilen and tilbob are session specific. AV pair values are defined. To calculate ntlmv2 response we need ti/av pair blob. For sec mech like ntlmssp, the blob is plucked from type 2 response from the server. From this blob, netbios name of the domain is retrieved, if user has not already provided, to be included in the Target String as part of ntlmv2 hash calculations. For sec mech like ntlmv2, create a minimal, two av pair blob. The allocated blob is freed in case of error. In case there is no error, this blob is used in calculating ntlmv2 response (in CalcNTLMv2_response) and is also copied on the response to the server, and then freed. The type 3 ntlmssp response is prepared on a buffer, 5 * sizeof of struct _AUTHENTICATE_MESSAGE, an empirical value large enough to hold _AUTHENTICATE_MESSAGE plus a blob with max possible 10 values as part of ntlmv2 response and lmv2 keys and domain, user, workstation names etc. Also, kerberos gets selected as a default mechanism if server supports it, over the other security mechanisms. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:29 +00:00
Shirish Pargaonkar	5f98ca9afb	cifs NTLMv2/NTLMSSP Change variable name mac_key to session key to reflect the key it holds Change name of variable mac_key to session key. The reason mac_key was changed to session key is, this structure does not hold message authentication code, it holds the session key (for ntlmv2, ntlmv1 etc.). mac is generated as a signature in cifs_calc* functions. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:29 +00:00
Suresh Jayaraman	aa91c7e4ab	cifs: fix broken oplock handling cifs_new_fileinfo() does not use the 'oplock' value from the callers. Instead, it sets it to REQ_OPLOCK which seems wrong. We should be using the oplock value obtained from the Server to set the inode's clientCanCacheAll or clientCanCacheRead flags. Fix this by passing oplock from the callers to cifs_new_fileinfo(). This change dates back to commit `a6ce4932` (2.6.30-rc3). So, all the affected versions will need this fix. Please Cc stable once reviewed and accepted. Cc: Stable <stable@kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:28 +00:00
Suresh Jayaraman	a347ecb209	cifs: use type __u32 instead of int for the oplock parameter ... and avoid implicit casting from a signed type. Also, pass oplock by value instead by reference as we don't intend to change the value in cifs_open_inode_helper(). Thanks to Jeff Layton for spotting this. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-29 19:04:28 +00:00
Steven Whitehouse	feb47ca931	GFS2: Improve journal allocation via sysfs Recently a feature was added to GFS2 to allow journal id allocation via sysfs. This patch builds upon that so that a negative journal id will be treated as an error code to be passed back as the return code from mount. This allows termination of the mount process if there is a failure. Also, the process has been updated so that the kernel will wait for a journal id, even in the "spectator" case. This is required in order to avoid mounting a filesystem in case there is an error while joining the cluster. In the spectator case, 0 is written into the file to indicate that all is well, and that mount should continue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 15:04:18 +01:00
Steven Whitehouse	43f74c1995	GFS2: Add "norecovery" mount option as a synonym for "spectator" XFS supports the "norecovery" mount option which is basically the same as the GFS2 spectator mode. This adds support for "norecovery" as a synonym for spectator mode, which is hopefully a more obvious description of what it actually does. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:24:41 +01:00
Steven Whitehouse	c741c45512	GFS2: Fix spectator umount issue The tests further down the recovery function relating to unlocking the journal need to be updated to match the intial test. Also, a test in the umount code which was surplus to requirements has been removed. Umounting spectator mounts now works correctly, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:20:52 +01:00
Dave Chinner	80168676eb	xfs: force background CIL push under sustained load I have been seeing occasional pauses in transaction throughput up to 30s long under heavy parallel workloads. The only notable thing was that the xfsaild was trying to be active during the pauses, but making no progress. It was running exactly 20 times a second (on the 50ms no-progress backoff), and the number of pushbuf events was constant across this time as well. IOWs, the xfsaild appeared to be stuck on buffers that it could not push out. Further investigation indicated that it was trying to push out inode buffers that were pinned and/or locked. The xfsbufd was also getting woken at the same frequency (by the xfsaild, no doubt) to push out delayed write buffers. The xfsbufd was not making any progress because all the buffers in the delwri queue were pinned. This scan- and-make-no-progress dance went one in the trace for some seconds, before the xfssyncd came along an issued a log force, and then things started going again. However, I noticed something strange about the log force - there were way too many IO's issued. 516 log buffers were written, to be exact. That added up to 129MB of log IO, which got me very interested because it's almost exactly 25% of the size of the log. He delayed logging code is suppose to aggregate the minimum of 25% of the log or 8MB worth of changes before flushing. That's what really puzzled me - why did a log force write 129MB instead of only 8MB? Essentially what has happened is that no CIL pushes had occurred since the previous tail push which cleared out 25% of the log space. That caused all the new transactions to block because there wasn't log space for them, but they kick the xfsaild to push the tail. However, the xfsaild was not making progress because there were buffers it could not lock and flush, and the xfsbufd could not flush them because they were pinned. As a result, both the xfsaild and the xfsbufd could not move the tail of the log forward without the CIL first committing. The cause of the problem was that the background CIL push, which should happen when 8MB of aggregated changes have been committed, is being held off by the concurrent transaction commit load. The background push does a down_write_trylock() which will fail if there is a concurrent transaction commit holding the push lock in read mode. With 8 CPUs all doing transactions as fast as they can, there was enough concurrent transaction commits to hold off the background push until tail-pushing could no longer free log space, and the halt would occur. It should be noted that there is no reason why it would halt at 25% of log space used by a single CIL checkpoint. This bug could definitely violate the "no transaction should be larger than half the log" requirement and hence result in corruption if the system crashed under heavy load. This sort of bug is exactly the reason why delayed logging was tagged as experimental.... The fix is to start blocking background pushes once the threshold has been exceeded. Rework the threshold calculations to keep the amount of log space a CIL checkpoint can use to below that of the AIL push threshold to avoid the problem completely. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-09-29 07:51:03 -05:00
Steven Whitehouse	d594845106	GFS2: Fix compiler warning from previous patch This shouldn't really be required, but gcc can't tell that "al" is only accessed when initialised. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-28 10:17:47 +01:00
Benjamin Marzinski	bf97b6734e	GFS2: reserve more blocks for transactions Some of the functions in GFS2 were not reserving space in the transaction for the resource group header and the resource groups bitblocks that get added when you do allocation. GFS2 now makes sure to reserve space for the resource group header and either all the bitblocks in the resource group, or one for each block that it may allocate, whichever is smaller using the new gfs2_rg_blocks() inline function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-28 09:44:24 +01:00
Steffen Sledz	54dd55a406	UBIFS: avoid kernel error if ubifs superblock read fails .get_sb is called on mounts with automatic fs detection too, so this function should print an error if it cannot read the superblock in debug mode only (new behaviour conforms the other fs types) Signed-off-by: Steffen Sledz <sledz@dresearch.de> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2010-09-28 09:46:30 +03:00
Steven Whitehouse	d0795f9123	GFS2: Fix journal check for spectator mounts When checking journals for spectator mounts, we cannot rely on the journal being locked, whatever its jid might be. This patch ensures that we always get the journal locks when checking journals for a spectator mount. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-27 15:58:11 +01:00
Linus Torvalds	d1f3e68efb	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: o2dlm: force free mles during dlm exit ocfs2: Sync inode flags with ext2. ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. ocfs2: update ctime when changing the file's permission by setfacl ocfs2/net: fix uninitialized ret in o2net_send_message_vec() Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes. ocfs2: Fix lockdep warning in reflink. ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.	2010-09-24 14:08:15 -07:00
Steven Whitehouse	c80dbb58f9	GFS2: Remove upgrade mount option This option has never done anything useful. Also at the same time this cleans up the sb checks which are done at mount time. The debug option will be accepted, but ignored in future. Since it didn't do anything, there didn't seem much point in retaining it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-24 09:55:07 +01:00
Srinivas Eeda	5dad6c39d1	o2dlm: force free mles during dlm exit While umounting, a block mle doesn't get freed if dlm is shutdown after master request is received but before assert master. This results in unclean shutdown of dlm domain. This patch frees all mles that lie around after other nodes were notified about exiting the dlm and marking dlm state as leaving. Only block mles are expected to be around, so we log ERROR for other mles but still free them. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:53 -07:00
Tao Ma	0000b86202	ocfs2: Sync inode flags with ext2. We sync our inode flags with ext2 and define them by hex values. But actually in commit 3669567(4 years ago), all these values are moved to include/linux/fs.h. So we'd better also use them as what ext2 did. So sync our inode flags with ext2 by using FS_*. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:49 -07:00
Tao Ma	4a452de4fd	ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. The first time I read the function ocfs2_resmap_resv_bits, I consider about what 'wanted' will be used and consider about the comments. Then I find it is only used if the reservation is empty. ;) So we'd better move it to the parens so that it make the code more readable, what's more, ocfs2_resmap_resv_bits is used so frequently and we should save some cpus. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:47 -07:00
Tao Ma	47dea42379	ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:34 -07:00
Tao Ma	12828061cd	ocfs2: update ctime when changing the file's permission by setfacl In commit `30e2bab`, ext3 fixed it. So change it accordingly in ocfs2. Steps to reproduce: # touch aaa # stat -c %Z aaa 1283760364 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1283760364 Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-23 14:16:21 -07:00
Steven Whitehouse	c2048b003c	GFS2: Remove localcaching mount option This option defaulted to on for lock_nolock mounts and off otherwise. The only function was to avoid the revalidation of dentries. In the cluster case, that is entirely pointless and liable to cause coherency problems. The patch changes the revalidation to depend upon whether the fs is a local or cluster fs (i.e. it follows the existing default behaviour). I very much doubt anybody ever used this option as there is no reason to. Even so we will continue to accept it on the mount command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-23 14:00:31 +01:00
Steven Whitehouse	f57a024ed2	GFS2: Remove ignore_local_fs mount argument This is been a no-op for a very long time now. I'm pretty sure nobody uses it, but just in case we'll still accept it on the command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-23 13:41:42 +01:00
Namhyung Kim	72b43570f3	ext2: fix a typo on comment in ext2/inode.c 'excpet' should be 'except'. 'ext3_get_branch' should be 'ext2_get_branch'. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-09-23 13:49:50 +02:00
Joe Perches	0c6d7d5da2	fs/ecryptfs: Remove unnecessary casts of private_data Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-09-23 13:29:38 +02:00
Joe Perches	8209e2f467	fs/seq_file.c: Remove unnecessary casts of private_data Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-09-23 13:28:23 +02:00
KOSAKI Motohiro	1c2499ae87	/proc/pid/smaps: fix dirty pages accounting Currently, /proc/<pid>/smaps has wrong dirty pages accounting. Shared_Dirty and Private_Dirty output only pte dirty pages and ignore PG_dirty page flag. It is difference against documentation, but also inconsistent against Referenced field. (Referenced checks both pte and page flags) This patch fixes it. Test program: large-array.c --------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> char array[110241024*1024L]; int main(void) { memset(array, 1, sizeof(array)); pause(); return 0; } --------------------------------------------------- Test case: 1. run ./large-array 2. cat /proc/`pidof large-array`/smaps 3. swapoff -a 4. cat /proc/`pidof large-array`/smaps again Test result: <before patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 218992 kB <-- showed pages as clean incorrectly Private_Dirty: 829584 kB Referenced: 388364 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB <after patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 1048576 kB <-- fixed Referenced: 388480 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:39 -07:00
Jan Kara	a0c42bac79	aio: do not return ERESTARTSYS as a result of AIO OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:39 -07:00
Arnd Bergmann	c227e69028	/proc/vmcore: fix seeking Commit `73296bc611` ("procfs: Use generic_file_llseek in /proc/vmcore") broke seeking on /proc/vmcore. This changes it back to use default_llseek in order to restore the original behaviour. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual file systems. We should merge generic_file_llseek and default_llseek some day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore is the safer solution. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:38 -07:00
Dan Rosenberg	767b68e969	Prevent freeing uninitialized pointer in compat_do_readv_writev In 32-bit compatibility mode, the error handling for compat_do_readv_writev() may free an uninitialized pointer, potentially leading to all sorts of ugly memory corruption. This is reliably triggerable by unprivileged users by invoking the readv()/writev() syscalls with an invalid iovec pointer. The below patch fixes this to emulate the non-compat version. Introduced by commit `b83733639a` ("compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev") Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Cc: stable@kernel.org (2.6.35) Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-22 17:22:38 -07:00
Linus Torvalds	b68e9d4581	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends char: Mark /dev/zero and /dev/kmem as not capable of writeback bdi: Initialize noop_backing_dev_info properly cfq-iosched: fix a kernel OOPs when usb key is inserted block: fix blk_rq_map_kern bio direction flag cciss: freeing uninitialized data on error path	2010-09-22 09:12:37 -07:00
Jan Kara	692ebd17c2	bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends Inodes of devices such as /dev/zero can get dirty for example via utime(2) syscall or due to atime update. Backing device of such inodes (zero_bdi, etc.) is however unable to handle dirty inodes and thus __mark_inode_dirty complains. In fact, inode should be rather dirtied against backing device of the filesystem holding it. This is generally a good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes in these pseudofilesystems are referenced from ordinary filesystem inodes and carry mapping with real data of the device. Thus for these inodes we have to use inode->i_mapping->backing_dev_info as we did so far. We distinguish these filesystems by checking whether sb->s_bdi points to a non-trivial backing device or not. Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /. There's a device inode A described by a path "/dev/sdb" on this filesystem. This inode will be dirtied against backing device "8:0" after this patch. bdev filesystem contains block device inode B coupled with our inode A. When someone modifies a page of /dev/sdb, it's B that gets dirtied and the dirtying happens against the backing device "8:16". Thus both inodes get filed to a correct bdi list. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-22 09:48:47 +02:00
Jan Kara	371d217ee1	char: Mark /dev/zero and /dev/kmem as not capable of writeback These devices don't do any writeback but their device inodes still can get dirty so mark bdi appropriately so that bdi code does the right thing and files inodes to lists of bdi carrying the device inodes. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-22 09:48:47 +02:00
Linus Torvalds	19746cad00	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: select CRYPTO ceph: check mapping to determine if FILE_CACHE cap is used ceph: only send one flushsnap per cap_snap per mds session ceph: fix cap_snap and realm split ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap ceph: correctly set 'follows' in flushsnap messages ceph: fix dn offset during readdir_prepopulate ceph: fix file offset wrapping at 4GB on 32-bit archs ceph: fix reconnect encoding for old servers ceph: fix pagelist kunmap tail ceph: fix null pointer deref on anon root dentry release	2010-09-21 11:20:10 -07:00
Nikanth Karthikesan	817f2c842d	Fix various typos of valid in comments Fix various typos of valid. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-09-21 17:04:50 +02:00
Corrado Zoccolo	749ef9f842	cfq: improve fsync performance for small files Fsync performance for small files achieved by cfq on high-end disks is lower than what deadline can achieve, due to idling introduced between the sync write happening in process context and the journal commit. Moreover, when competing with a sequential reader, a process writing small files and fsync-ing them is starved. This patch fixes the two problems by: - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE flag set, - force all queues that have REQ_NOIDLE requests to be put in the noidle tree. Having the queue associated to the fsync-ing process and the one associated to journal commits in the noidle tree allows: - switching between them without idling, - fairness vs. competing idling queues, since they will be serviced only after the noidle tree expires its slice. Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-20 15:24:50 +02:00
Steven Whitehouse	8d1235852b	GFS2: Make . and .. qstrs constant Rather than calculating the qstrs for . and .. each time we need them, its better to keep a constant version of these and just refer to them when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>	2010-09-20 11:21:09 +01:00
Steven Whitehouse	9fa0ea9f26	GFS2: Use new workqueue scheme The recovery workqueue can be freezable since we want it to finish what it is doing if the system is to be frozen (although why you'd want to freeze a cluster node is beyond me since it will result in it being ejected from the cluster). It does still make sense for single node GFS2 filesystems though. The glock workqueue will benefit from being able to run more work items concurrently. A test running postmark shows improved performance and multi-threaded workloads are likely to benefit even more. It needs to be high priority because the latency directly affects the latency of filesystem glock operations. The delete workqueue is similar to the recovery workqueue in that it must not get blocked by memory allocations, and may run for a long time. Potentially other GFS2 threads might also be converted to workqueues, but I'll leave that for a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2010-09-20 11:20:36 +01:00
Steven Whitehouse	1fea7c25a0	GFS2: Update handling of DLM return codes to match reality GFS2's idea of which return codes it needs to handle was based upon those listed in dlm.h. Those didn't cover all the possible codes and listed some which never happen. This updates GFS2 to handle all the codes which can actually be returned from the DLM under various circumstances. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:20:12 +01:00
Steven Whitehouse	7b5e3d5fcf	GFS2: Don't enforce min hold time when two demotes occur in rapid succession Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>	2010-09-20 11:19:50 +01:00
Steven Whitehouse	fe08d5a897	GFS2: Fix whitespace in previous patch Removes the offending space Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:35 +01:00
Benjamin Marzinski	3921120e75	GFS2: fallocate support This patch adds support for fallocate to gfs2. Since the gfs2 does not support uninitialized data blocks, it must write out zeros to all the blocks. However, since it does not need to lock any pages to read from, gfs2 can write out the zero blocks much more efficiently. On a moderately full filesystem, fallocate works around 5 times faster on average. The fallocate call also allows gfs2 to add blocks to the file without changing the filesize, which will make it possible for gfs2 to preallocate space for the rindex file, so that gfs2 can grow a completely full filesystem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:17 +01:00
Steven Whitehouse	9a3f236d40	GFS2: Add a bug trap in allocation code This adds a check to ensure that if we reach the block allocator that we don't try and proceed if there is no alloc structure hanging off the inode. This should only happen if there is a bug in GFS2. The error return code is distinctive in order that it will be easily spotted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:59 +01:00
Steven Whitehouse	820969f353	GFS2: No longer experimental I think the time has arrvied to remove the experimental tag from GFS2. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:46 +01:00
Steven Whitehouse	a2e0f79939	GFS2: Remove i_disksize With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:29 +01:00
Steven Whitehouse	ff8f33c8b3	GFS2: New truncate sequence This updates GFS2's truncate code to use the new truncate sequence correctly. This is a stepping stone to being able to remove ip->i_disksize in favour of using i_size everywhere now that the two sizes are always identical. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de>	2010-09-20 11:18:16 +01:00
Artem Bityutskiy	2ef13294d2	UBIFS: introduce new flags for RO mounts Commit `2fde99cb55` "UBIFS: mark VFS SB RO too" introduced regression. This commit made UBIFS set the 'MS_RDONLY' flag in the VFS superblock when it switches to R/O mode due to an error. This was done to make VFS show the R/O UBIFS flag in /proc/mounts. However, several places in UBIFS relied on the 'MS_RDONLY' flag and assume this flag can only change when we re-mount. For example, 'ubifs_put_super()'. This patch introduces new UBIFS flag - 'c->ro_mount' which changes only when we re-mount, and preserves the way UBIFS was originally mounted (R/W or R/O). This allows us to de-initialize UBIFS cleanly in 'ubifs_put_super()'. This patch also changes all 'ubifs_assert(!c->ro_media)' assertions to 'ubifs_assert(!c->ro_media && !c->ro_mount)', because we never should write anything if the FS was mounter R/O. All the places where we test for 'MS_RDONLY' flag in the VFS SB were changed and now we test the 'c->ro_mount' flag instead, because it preserves the original UBIFS mount type, unlike the 'MS_RDONLY' flag. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2010-09-19 21:07:58 +03:00
Jan Harkes	112d421df2	Coda: mount hangs because of missed REQ_WRITE rename Coda's REQ_* defines were renamed to avoid clashes with the block layer (commit 4aeefdc69f7b: "coda: fixup clash with block layer REQ_* defines"). However one was missed and response messages are no longer matched with requests and waiting threads are no longer woken up. This patch fixes this. Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> [ Also fixed up whitespace while at it -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-19 11:03:09 -07:00
Wu Fengguang	50aff04036	ocfs2/net: fix uninitialized ret in o2net_send_message_vec() mmotm/fs/ocfs2/cluster/tcp.c: In function ‘o2net_send_message_vec’: mmotm/fs/ocfs2/cluster/tcp.c:980:6: warning: ‘ret’ may be used uninitialized in this function It seems a real bug introduced by commit `9af0b38ff3` (ocfs2/net: Use wait_event() in o2net_send_message_vec()). cc: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-18 08:48:54 -07:00
Sage Weil	be4f104dfd	ceph: select CRYPTO We select CRYPTO_AES, but not CRYPTO. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-17 12:30:31 -07:00
Sage Weil	a43fb73101	ceph: check mapping to determine if FILE_CACHE cap is used See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process work for inodes that have cached pages. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-17 09:54:31 -07:00
Sage Weil	e835124c2b	ceph: only send one flushsnap per cap_snap per mds session Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-17 08:03:08 -07:00
Artem Bityutskiy	2680d722bf	UBIFS: introduce new flag for RO due to errors The R/O state may have various reasons: 1. The UBI volume is R/O 2. The FS is mounted R/O 3. The FS switched to R/O mode because of an error However, in UBIFS we have only one variable which represents cases 1 and 3 - 'c->ro_media'. Indeed, we set this to 1 if we switch to R/O mode due to an error, and then we test it in many places to make sure that we stop writing as soon as the error happens. But this is very unclean. One consequence of this, for example, is that in 'ubifs_remount_fs()' we use 'c->ro_media' to check whether we are in R/O mode because on an error, and we print a message in this case. However, if we are in R/O mode because the media is R/O, our message is bogus. This patch introduces new flag - 'c->ro_error' which is set when we switch to R/O mode because of an error. It also changes all "if (c->ro_media)" checks to "if (c->ro_error)" checks, because this is what the checks actually mean. We do not need to check for 'c->ro_media' because if the UBI volume is in R/O mode, we do not allow R/W mounting, and now writes can happen. This is guaranteed by VFS. But it is good to double-check this, so this patch also adds many "ubifs_assert(!c->ro_media)" checks. In the 'ubifs_remount_fs()' function this patch makes a bit more changes - it fixes the error messages as well. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2010-09-17 17:08:09 +03:00
Steven Whitehouse	5f4874903d	GFS2: gfs2_logd should be using interruptible waits Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-17 14:00:10 +01:00
Sage Weil	ae00d4f37f	ceph: fix cap_snap and realm split The cap_snap creation/queueing relies on both the current i_head_snapc _and_ the i_snap_realm pointers being correct, so that the new cap_snap can properly reference the old context and the new i_head_snapc can be updated to reference the new snaprealm's context. To fix this, we: - move inodes completely to the new (split) realm so that i_snap_realm is correct, and - generate the new snapc's _before_ queueing the cap_snaps in ceph_update_snap_trace(). Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-16 16:26:51 -07:00
Linus Torvalds	03a7ab083e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix potential double put of TCP session reference	2010-09-16 12:59:11 -07:00
Christoph Hellwig	dd3932eddf	block: remove BLKDEV_IFL_WAIT All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-16 20:52:58 +02:00
Joel Becker	93f3b86fb1	ocfs2: Initialize the bktcnt variable properly, and call it bucket_count Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-15 17:00:58 -07:00
Joel Becker	c0e1a3e80d	ocfs2: Silence unused warning. When CONFIG_OCFS2_DEBUG_MASKLOG is undefined, we don't use the dentry variable in ocfs2_sync_file(). Let's just move all access to the dentry inside the logging call. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-15 16:56:54 -07:00
Will Drewry	eec7ecfede	genhd, efi: add efi partition metadata to hd_structs This change extends the partition_meta_info structure to support EFI GPT-specific metadata and ensures that data is copied in on partition scanning. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-15 16:13:28 +02:00
Will Drewry	6d1d8050b4	block, partition: add partition_meta_info to hd_struct I'm reposting this patch series as v4 since there have been no additional comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches are against Linus's tree: `2bfc96a127` (2.6.36-rc3). Would this patchset be suitable for inclusion in an mm branch? This changes adds a partition_meta_info struct which itself contains a union of structures that provide partition table specific metadata. This change leaves the union empty. The subsequent patch includes an implementation for CONFIG_EFI_PARTITION-based metadata. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-15 16:13:18 +02:00
Linus Torvalds	de8d4f5d75	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies statfs() gives ESTALE error NFS: Fix a typo in nfs_sockaddr_match_ipaddr6 sunrpc: increase MAX_HASHTABLE_BITS to 14 gss:spkm3 miss returning error to caller when import security context gss:krb5 miss returning error to caller when import security context Remove incorrect do_vfs_lock message SUNRPC: cleanup state-machine ordering SUNRPC: Fix a race in rpc_info_open SUNRPC: Fix race corrupting rpc upcall Fix null dereference in call_allocate	2010-09-14 17:04:48 -07:00
Jeff Moyer	75e1c70fc3	aio: check for multiplication overflow in do_io_submit Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array: if (unlikely(nr < 0)) return -EINVAL; if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp))))) return -EFAULT; ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long. This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-14 17:02:37 -07:00
Jeff Layton	460cf3411b	cifs: fix potential double put of TCP session reference cifs_get_smb_ses must be called on a server pointer on which it holds an active reference. It first does a search for an existing SMB session. If it finds one, it'll put the server reference and then try to ensure that the negprot is done, etc. If it encounters an error at that point then it'll return an error. There's a potential problem here though. When cifs_get_smb_ses returns an error, the caller will also put the TCP server reference leading to a double-put. Fix this by having cifs_get_smb_ses only put the server reference if it found an existing session that it could use and isn't returning an error. Cc: stable@kernel.org Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-09-14 23:21:03 +00:00
Sage Weil	cfc0bf6640	ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages or is still writing. We'll send the newer capsnaps only after the older ones complete. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-14 15:50:59 -07:00
Sage Weil	8bef9239ee	ceph: correctly set 'follows' in flushsnap messages The 'follows' should match the seq for the snap context for the given snap cap, which is the context under which we have been dirtying and writing data and metadata. The snapshot that _contains_ those updates thus _follows_ that context's seq #. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-14 15:45:44 -07:00
Linus Torvalds	ed8f425f54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: prevent possible memory corruption in cifs_demultiplex_thread cifs: eliminate some more premature cifsd exits cifs: prevent cifsd from exiting prematurely [CIFS] ntlmv2/ntlmssp remove-unused-function CalcNTLMv2_partial_mac_key cifs: eliminate redundant xdev check in cifs_rename Revert "[CIFS] Fix ntlmv2 auth with ntlmssp" Revert "missing changes during ntlmv2/ntlmssp auth and sign" Revert "Eliminate sparse warning - bad constant expression" Revert "[CIFS] Eliminate unused variable warning"	2010-09-13 12:47:08 -07:00
Sage Weil	467c525109	ceph: fix dn offset during readdir_prepopulate When adding the readdir results to the cache, ceph_set_dentry_offset was clobbered our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not. This bug was introduced by `1cd3935bed`. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-13 11:40:36 -07:00
Aneesh Kumar K.V	1d76e31357	fs/9p: Don't use dotl version of mknod for dotu inode operations We should not use dotlversion for the dotu inode operations Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-09-13 08:13:03 -05:00
Aneesh Kumar K.V	3c30750ffa	fs/9p: Use the correct dentry operations We should use the cached dentry operation only if caching mode is enabled Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-09-13 08:13:03 -05:00
jvrao	62726a7ab3	9p: Check for NULL fid in v9fs_dir_release() NULL fid should be handled in cases where we endup calling v9fs_dir_release() before even we instantiate the fid in filp. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-09-13 08:13:03 -05:00
Aneesh Kumar K.V	5c25f347a7	fs/9p: Fix error handling in v9fs_get_sb This was introduced by 7cadb63d58a932041afa3f957d5cbb6ce69dcee5 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-09-13 08:13:02 -05:00
Latchesar Ionkov	62b2be591a	fs/9p, net/9p: memory leak fixes Four memory leak fixes in the 9P code. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-09-13 08:13:02 -05:00
Trond Myklebust	827e345702	SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies The NFSv4 client's callback server calls svc_gss_principal(), which is defined in the auth_rpcgss.ko The NFSv4 server has the same dependency, and in addition calls svcauth_gss_flavor(), gss_mech_get_by_pseudoflavor(), gss_pseudoflavor_to_service() and gss_mech_put() from the same module. The module auth_rpcgss itself has no dependencies aside from sunrpc, so we only need to select RPCSEC_GSS. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-09-12 19:57:50 -04:00
Menyhart Zoltan	fbf3fdd244	statfs() gives ESTALE error Hi, An NFS client executes a statfs("file", &buff) call. "file" exists / existed, the client has read / written it, but it has already closed it. user_path(pathname, &path) looks up "file" successfully in the directory-cache and restarts the aging timer of the directory-entry. Even if "file" has already been removed from the server, because the lookupcache=positive option I use, keeps the entries valid for a while. nfs_statfs() returns ESTALE if "file" has already been removed from the server. If the user application repeats the statfs("file", &buff) call, we are stuck: "file" remains young forever in the directory-cache. Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-09-12 19:55:26 -04:00
Trond Myklebust	b20d37ca95	NFS: Fix a typo in nfs_sockaddr_match_ipaddr6 Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-09-12 19:55:26 -04:00
Fabio Olive Leite	b1bde04c6d	Remove incorrect do_vfs_lock message The do_vfs_lock function on fs/nfs/file.c is only called if NLM is not being used, via the -onolock mount option. Therefore it cannot really be "out of sync with lock manager" when the local locking function called returns an error, as there will be no corresponding call to the NLM. For details, simply check the if/else on do_setlk and do_unlk on fs/nfs/file.c. Signed-Off-By: Fabio Olive Leite <fleite@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-09-12 19:55:25 -04:00
Sage Weil	a77d9f7dce	ceph: fix file offset wrapping at 4GB on 32-bit archs Cast the value before shifting so that we don't run out of bits with a 32-bit unsigned long. This fixes wrapping of high file offsets into the low 4GB of a file on disk, and the subsequent data corruption for large files. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-11 10:55:25 -07:00
Sage Weil	3612abbd5d	ceph: fix reconnect encoding for old servers Fix the reconnect encoding to encode the cap record when the MDS does not have the FLOCK capability (i.e., pre v0.22). Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-11 10:52:47 -07:00
Yehuda Sadeh	3d4401d9d0	ceph: fix pagelist kunmap tail A wrong parameter was passed to the kunmap. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-11 10:52:47 -07:00
Sage Weil	ca04d9c3ec	ceph: fix null pointer deref on anon root dentry release When we release a root dentry, particularly after a splice, the parent (actually our) inode was evaluating to NULL and was getting dereferenced by ceph_snap(). This is reproduced by something as simple as mount -t ceph monhost:/a/b mnt mount -t ceph monhost:/a mnt2 ls mnt2 A splice_dentry() would kill the old 'b' inode's root dentry, and we'd crash while releasing it. Fix by checking for both the ROOT and NULL cases explicitly. We only need to invalidate the parent dir when we have a correct parent to invalidate. Signed-off-by: Sage Weil <sage@newdream.net>	2010-09-11 10:52:47 -07:00
Linus Torvalds	fbc1487019	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: log IO completion workqueue is a high priority queue xfs: prevent reading uninitialized stack memory	2010-09-10 18:19:26 -07:00
Tristan Ye	228ac63577	Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c This patch tries to handle the case in which list 'dlm->tracking_list' is empty, to avoid accessing an invalid pointer. It fixes the following oops: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1287 Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 09:19:30 -07:00
Tristan Ye	0f4da216b8	Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes. In ocfs2_dx_dir_rebalance(), we need to rejournal_acess the blocks after calling ocfs2_insert_extent() since growing an extent tree may trigger ocfs2_extend_trans(), which makes previous journal_access meaningless. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 09:19:11 -07:00
Tao Ma	07eaac9438	ocfs2: Fix lockdep warning in reflink. This patch change mutex_lock to a new subclass and add a new inode lock subclass for the target inode which caused this lockdep warning. ============================================= [ INFO: possible recursive locking detected ] 2.6.35+ #5 --------------------------------------------- reflink/11086 is trying to acquire lock: (Meta){+++++.}, at: [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] but task is already holding lock: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] other info that might help us debug this: 6 locks held by reflink/11086: #0: (&sb->s_type->i_mutex_key#15/1){+.+.+.}, at: [<ffffffff820e09ec>] lookup_create+0x26/0x97 #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa06f99a0>] ocfs2_reflink_ioctl+0x4d3/0x1229 [ocfs2] #2: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] #3: (&oi->ip_xattr_sem){+.+.+.}, at: [<ffffffffa06f9b58>] ocfs2_reflink_ioctl+0x68b/0x1229 [ocfs2] #4: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa06f9b67>] ocfs2_reflink_ioctl+0x69a/0x1229 [ocfs2] #5: (&sb->s_type->i_mutex_key#15/2){+.+...}, at: [<ffffffffa06f9d4f>] ocfs2_reflink_ioctl+0x882/0x1229 [ocfs2] stack backtrace: Pid: 11086, comm: reflink Not tainted 2.6.35+ #5 Call Trace: [<ffffffff82063dd9>] validate_chain+0x56e/0xd68 [<ffffffff82062275>] ? mark_held_locks+0x49/0x69 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06c9ade>] __ocfs2_cluster_lock+0x975/0xa0d [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06e107b>] ? ocfs2_wait_for_recovery+0x15/0x8a [ocfs2] [<ffffffffa06cb6ea>] ocfs2_inode_lock_full_nested+0x1ac/0xdc5 [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffff820623a0>] ? trace_hardirqs_on_caller+0x10b/0x12f [<ffffffff82060193>] ? debug_mutex_free_waiter+0x4f/0x53 [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06ce24a>] ? ocfs2_file_lock_res_init+0x66/0x78 [ocfs2] [<ffffffff820bb2d2>] ? might_fault+0x40/0x8d [<ffffffffa06df9f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820ee5d3>] ? mntput_no_expire+0x1d/0xb0 [<ffffffff820e07b3>] ? path_put+0x2c/0x31 [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff8233a7f6>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff8200299c>] ? sysret_check+0x27/0x62 [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 09:19:06 -07:00
Tao Ma	5e64b0d9e8	ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock. As the name shows, we shouldn't have any lock in ocfs2_xattr_get_nolock. so lift ip_xattr_sem to the caller. This should be safe for us since the only 2 callers are: 1. ocfs2_xattr_get which will lock the resources. 2. ocfs2_mknod which don't need this locking. And this also resolves the following lockdep warning. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35+ #5 ------------------------------------------------------- reflink/30027 is trying to acquire lock: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] but task is already holding lock: (&oi->ip_xattr_sem){++++..}, at: [<ffffffffa0673b58>] ocfs2_reflink_ioctl+0x68b/0x1226 [ocfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&oi->ip_xattr_sem){++++..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339650>] down_read+0x34/0x47 [<ffffffffa0691cb8>] ocfs2_xattr_get_nolock+0xa0/0x4e6 [ocfs2] [<ffffffffa069d64f>] ocfs2_get_acl_nolock+0x5c/0x132 [ocfs2] [<ffffffffa069d9c7>] ocfs2_init_acl+0x60/0x243 [ocfs2] [<ffffffffa066499d>] ocfs2_mknod+0xae8/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #2 (jbd2_handle){+.+...}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa0604ff8>] start_this_handle+0x4a3/0x4bc [jbd2] [<ffffffffa06051d6>] jbd2__journal_start+0xba/0xee [jbd2] [<ffffffffa0605218>] jbd2_journal_start+0xe/0x10 [jbd2] [<ffffffffa065ca34>] ocfs2_start_trans+0xb7/0x19b [ocfs2] [<ffffffffa06645f3>] ocfs2_mknod+0x73e/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #1 (&journal->j_trans_barrier){.+.+..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82064fa9>] lock_release_non_nested+0x1e5/0x24b [<ffffffff82065999>] lock_release+0x158/0x17a [<ffffffff823389f6>] __mutex_unlock_slowpath+0xbf/0x11b [<ffffffff82338a5b>] mutex_unlock+0x9/0xb [<ffffffffa0679673>] ocfs2_free_ac_resource+0x31/0x67 [ocfs2] [<ffffffffa067c6bc>] ocfs2_free_alloc_context+0x11/0x1d [ocfs2] [<ffffffffa0633de0>] ocfs2_write_begin_nolock+0x141e/0x159b [ocfs2] [<ffffffffa0635523>] ocfs2_write_begin+0x11e/0x1e7 [ocfs2] [<ffffffff820a1297>] generic_file_buffered_write+0x10c/0x210 [<ffffffffa0653624>] ocfs2_file_aio_write+0x4cc/0x6d3 [ocfs2] [<ffffffff820d822d>] do_sync_write+0xc2/0x106 [<ffffffff820d897b>] vfs_write+0xae/0x131 [<ffffffff820d8e55>] sys_write+0x47/0x6f [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #0 (&oi->ip_alloc_sem){+.+.+.}: [<ffffffff82063f92>] validate_chain+0x727/0xd68 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339694>] down_write+0x31/0x52 [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] [<ffffffffa06599f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 09:19:05 -07:00
Goldwyn Rodrigues	5e98d49240	Track negative entries v3 Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 09:18:15 -07:00
Tao Ma	b4d693fcc5	ocfs2: Cache system inodes of other slots. Durring orphan scan, if we are slot 0, and we are replaying orphan_dir:0001, the general process is that for every file in this dir: 1. we will iget orphan_dir:0001, since there is no inode for it. we will have to create an inode and read it from the disk. 2. do the normal work, such as delete_inode and remove it from the dir if it is allowed. 3. call iput orphan_dir:0001 when we are done. In this case, since we have no dcache for this inode, i_count will reach 0, and VFS will have to call clear_inode and in ocfs2_clear_inode we will checkpoint the inode which will let ocfs2_cmt and journald begin to work. 4. We loop back to 1 for the next file. So you see, actually for every deleted file, we have to read the orphan dir from the disk and checkpoint the journal. It is very time consuming and cause a lot of journal checkpoint I/O. A better solution is that we can have another reference for these inodes in ocfs2_super. So if there is no other race among nodes(which will let dlmglue to checkpoint the inode), for step 3, clear_inode won't be called and for step 1, we may only need to read the inode for the 1st time. This is a big win for us. So this patch will try to cache system inodes of other slots so that we will have one more reference for these inodes and avoid the extra inode read and journal checkpoint. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:56:24 -07:00
Joel Becker	a33f13efe0	libfs: Fix shift bug in generic_check_addressable() generic_check_addressable() erroneously shifts pages down by a block factor when it should be shifting up. To prevent overflow, we shift blocks down to pages. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:42:48 -07:00
Patrick J. LoPresti	3bdb8efd94	OCFS2: Allow huge (> 16 TiB) volumes to mount The OCFS2 developers have already done all of the hard work to allow volumes larger than 16 TiB. But there is still a "sanity check" in fs/ocfs2/super.c that prevents the mounting of such volumes, even when the cluster size and journal options would allow it. This patch replaces that sanity check with a more sophisticated one to mount a huge volume provided that (a) it is addressable by the raw word/address size of the system (borrowing a test from ext4); (b) the volume is using JBD2; and (c) the JBD2_FEATURE_INCOMPAT_64BIT flag is set on the journal. I factored out the sanity check into its own function. I also moved it from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier, and the journal will not have been initialized yet. This patch is one of a pair, and it depends on the other ("JBD2: Allow feature checks before journal recovery"). I have tested this patch on small volumes, huge volumes, and huge volumes without 64-bit block support in the journal. All of them appear to work or to fail gracefully, as appropriate. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:42:10 -07:00
Patrick J. LoPresti	1113e1b504	JBD2: Allow feature checks before journal recovery Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to confirm that its journal supports 64-bit offsets. In particular, we need to check the journal's feature bits before recovering the journal. This is not possible with JBD2 at present, because the journal superblock (where the feature bits reside) is not loaded from disk until the journal is recovered. This patch loads the journal superblock in jbd2_journal_check_used_features() if it has not already been loaded, allowing us to check the feature bits before journal recovery. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Cc: linux-ext4@vger.kernel.org Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:41:54 -07:00
Patrick J. LoPresti	30ca22c70e	ext3/ext4: Factor out disk addressability check As part of adding support for OCFS2 to mount huge volumes, we need to check that the sector_t and page cache of the system are capable of addressing the entire volume. An identical check already appears in ext3 and ext4. This patch moves the addressability check into its own function in fs/libfs.c and modifies ext3 and ext4 to invoke it. [Edited to -EINVAL instead of BUG_ON() for bad blocksize_bits -- Joel] Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Cc: linux-ext4@vger.kernel.org Acked-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:41:42 -07:00
Joel Becker	729963a1ff	Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into merge-2	2010-09-10 08:41:04 -07:00
Tao Ma	17ae521158	ocfs2: Remove obsolete comments before ocfs2_start_trans. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:40:18 -07:00
Tao Ma	f9c57ada32	ocfs2: Remove unused old_id in ocfs2_commit_cache. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:40:08 -07:00
Jan Kara	4c38881f87	ocfs2: Remove ocfs2_sync_inode() ocfs2_sync_inode() is used only from ocfs2_sync_file(). But all data has already been written before calling ocfs2_sync_file() and ocfs2 doesn't use inode's private_list for tracking metadata buffers thus sync_mapping_buffers() is superfluous as well. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:39:44 -07:00
Goldwyn Rodrigues	83fd9c7f65	Reorganize data elements to reduce struct sizes Thanks for the comments. I have incorportated them all. CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled. Statistics now look like - ocfs2_write_ctxt: 2144 - 2136 = 8 ocfs2_inode_info: 1960 - 1848 = 112 ocfs2_journal: 168 - 160 = 8 ocfs2_lock_res: 336 - 304 = 32 ocfs2_refcount_tree: 512 - 472 = 40 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:39:27 -07:00
Tao Ma	95fa859a26	ocfs2: Remove obscure error handling in direct_write. In ocfs2, actually we don't allow any direct write pass i_size, see the function ocfs2_prepare_inode_for_write. So we don't need the bogus simple_setsize. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:38:52 -07:00
Tao Ma	3c3f20c981	ocfs2: Add some trace log for orphan scan. Now orphan scan worker has no trace log, so it is very hard to tell whether it is finished or blocked. So add 2 mlog trace log so that we can tell whether the current orphan scan worker is blocked or not. It does help when I analyzed a orphan scan bug. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:35:51 -07:00
Tristan Ye	ddee5cdb70	Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8. The reason why we need this ioctl is to offer the none-privileged end-user a possibility to get filesys info gathering. We use OCFS2_IOC_INFO to manipulate the new ioctl, userspace passes a structure to kernel containing an array of request pointers and request count, such as, * From userspace: struct ocfs2_info_blocksize oib = { .ib_req = { .ir_magic = OCFS2_INFO_MAGIC, .ir_code = OCFS2_INFO_BLOCKSIZE, ... } ... } struct ocfs2_info_clustersize oic = { ... } uint64_t reqs[2] = {(unsigned long)&oib, (unsigned long)&oic}; struct ocfs2_info info = { .oi_requests = reqs, .oi_count = 2, } ret = ioctl(fd, OCFS2_IOC_INFO, &info); * In kernel: Get the request pointers from info, then handle each request one bye one. Idea here is to make the spearated request small enough to guarantee a better backward&forward compatibility since a small piece of request would be less likely to be broken if filesys on raw disk get changed. Currently, the following 7 requests are supported per the requirement from userspace tool o2info, and I believe it will grow over time:-) OCFS2_INFO_CLUSTERSIZE OCFS2_INFO_BLOCKSIZE OCFS2_INFO_MAXSLOTS OCFS2_INFO_LABEL OCFS2_INFO_UUID OCFS2_INFO_FS_FEATURES OCFS2_INFO_JOURNAL_SIZE This ioctl is only specific to OCFS2. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-09-10 08:35:41 -07:00
Dave Chinner	51749e47e1	xfs: log IO completion workqueue is a high priority queue The workqueue implementation in 2.6.36-rcX has changed, resulting in the workqueues no longer having dedicated threads for work processing. This has caused severe livelocks under heavy parallel create workloads because the log IO completions have been getting held up behind metadata IO completions. Hence log commits would stall, memory allocation would stall because pages could not be cleaned, and lock contention on the AIL during inode IO completion processing was being seen to slow everything down even further. By making the log Io completion workqueue a high priority workqueue, they are queued ahead of all data/metadata IO completions and processed before the data/metadata completions. Hence the log never gets stalled, and operations needed to clean memory can continue as quickly as possible. This avoids the livelock conditions and allos the system to keep running under heavy load as per normal. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-09-10 10:16:54 -05:00
Roland McGrath	9aea5a65aa	execve: make responsive to SIGKILL with large arguments An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-10 08:10:26 -07:00
Roland McGrath	7993bc1f46	execve: improve interactivity with large arguments This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-10 08:10:26 -07:00
Roland McGrath	1b528181b2	setup_arg_pages: diagnose excessive argument size The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-10 08:10:26 -07:00
Linus Torvalds	ff3cb3fec3	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Range check cpu in blk_cpu_to_group scatterlist: prevent invalid free when alloc fails writeback: Fix lost wake-up shutting down writeback thread writeback: do not lose wakeup events when forking bdi threads cciss: fix reporting of max queue depth since init block: switch s390 tape_block and mg_disk to elevator_change() block: add function call to switch the IO scheduler from a driver fs/bio-integrity.c: return -ENOMEM on kmalloc failure bio-integrity.c: remove dependency on __GFP_NOFAIL BLOCK: fix bio.bi_rw handling block: put dev->kobj in blk_register_queue fail path cciss: handle allocation failure cfq-iosched: Documentation help for new tunables cfq-iosched: blktrace print per slice sector stats cfq-iosched: Implement tunable group_idle cfq-iosched: Do group share accounting in IOPS when slice_idle=0 cfq-iosched: Do not idle if slice_idle=0 cciss: disable doorbell reset on reset_devices blkio: Fix return code for mkdir calls	2010-09-10 07:26:27 -07:00
Dan Rosenberg	a122eb2fdf	xfs: prevent reading uninitialized stack memory The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-09-10 07:39:28 -05:00

... 3 4 5 6 7 ...

19860 Commits