linux

Author	SHA1	Message	Date
Jan Kara	56fcad29d4	ext3: Flush disk caches on fsync when needed In case we fsync() a file and inode is not dirty, we don't force a transaction to disk and hence don't flush disk caches. Thus file data could be just in disk caches and not on persistent storage. Fix the problem by flushing disk caches if we didn't force a transaction commit. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-16 17:44:11 +02:00
Chris Mason	4f003fd32b	ext3: Add locking to ext3_do_update_inode I've been struggling with this off and on while I've been testing the data=guarded work. The symptom is corrupted orphan lists and inodes with the wrong i_size stored on disk. I was convinced the data=guarded code was just missing a call to ext3_mark_inode_dirty, but tracing showed the i_disksize I was sending to ext3_mark_inode_dirty wasn't actually making it to the drive. ext3_mark_inode_dirty can be called without locks held (atime updates and a few others), so the data=guarded code uses locks while updating the in-memory inode, and then calls ext3_mark_inode_dirty without any locks held. But ext3_mark_inode_dirty has no internal locking to make sure that only one CPU is updating the buffer head at a time. Generally this works out ok because everyone that changes the inode then calls ext3_mark_inode_dirty themselves. Even though it races, eventually someone updates the buffer heads and things move on. But there is still a risk of the wrong values getting in, and the data=guarded code seems to hit the race very often. Since everyone that changes the inode also logs it, it should be possible to fix this with some memory barriers. I'll leave that as an exercise to the reader and lock the buffer head instead. It is probably a good idea to have a different patch series for lockless bit flipping on the ext3 i_state field. ext3_do_update_inode &= clears EXT3_STATE_NEW without any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-16 17:44:11 +02:00
Jan Kara	00171d3c7e	ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() During truncate we are sometimes forced to start a new transaction as the number of blocks to be journaled is both quite large and hard to predict. So far we restarted a transaction while holding truncate_mutex and that violates lock ordering because truncate_mutex ranks below transaction start (and it can lead to a real deadlock with ext3_get_blocks() allocating new blocks from ext3_writepage()). Luckily, the problem is easy to fix: We just drop the truncate_mutex before restarting the transaction and reacquire it afterwards. We are safe to do this as by the time ext3_truncate() is called, all the page cache for the truncated part of the file is dropped and so writepage() cannot come and allocate new blocks in the part of the file we are truncating. The remaining writers are stopped by our holding i_mutex. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-16 17:44:11 +02:00
Jan Kara	3adae9da0b	jbd: Annotate transaction start also for journal_restart() lockdep annotation for a transaction start has been at the end of journal_start(). But a transaction is also started from journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-16 17:44:10 +02:00
Jan Kara	9c28cbccec	jbd: Journal block numbers can ever be only 32-bit use unsigned int for them It does not make sense to store block numbers for the journal as unsigned long since they can be only 32-bit (because of an on-disk format limitation). So change in-memory structures and variables to use unsigned int instead. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-16 17:44:10 +02:00
Andreas Dilger	b449fc6fcc	JBD: round commit timer up to avoid uncommitted transaction Fix jiffies rounding in jbd commit timer setup code. Rounding down could cause the timer to fire before the corresponding transaction has expired. That transaction can stay uncommitted forever if no new transaction is created and no explicit sync/umount happens. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-16 17:44:10 +02:00
Linus Torvalds	ab86e5765d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev debugfs: Modify default debugfs directory for debugging pktcdvd. debugfs: Modified default dir of debugfs for debugging UHCI. debugfs: Change debugfs directory of IWMC3200 debugfs: Change debuhgfs directory of trace-events-sample.h debugfs: Fix mount directory of debugfs by default in events.txt hpilo: add poll f_op hpilo: add interrupt handler hpilo: staging for interrupt handling driver core: platform_device_add_data(): use kmemdup() Driver core: Add support for compatibility classes uio: add generic driver for PCI 2.3 devices driver-core: move dma-coherent.c from kernel to driver/base mem_class: fix bug mem_class: use minor as index instead of searching the array driver model: constify attribute groups UIO: remove 'default n' from Kconfig Driver core: Add accessor for device platform data Driver core: move dev_get/set_drvdata to drivers/base/dd.c Driver core: add new device to bus's list before probing	2009-09-16 08:27:10 -07:00
Nick Piggin	1ef7d9aa32	writeback: fix possible bdi writeback refcounting problem wb_clear_pending AFAIKS should not be called after the item has been put on the list, except by the worker threads. It could lead to the situation where the refcount is decremented below 0 and cause lots of problems. Presumably the !wb_has_dirty_io case is not a common one, so it can be discovered when the thread wakes up to check? Also add a comment in bdi_work_clear. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:53 +02:00
Nick Piggin	77b9d059cb	writeback: Fix bdi use after free in wb_work_complete() By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it can already have been deallocated and used for something else in the !on stack case, giving a false positive in this test and causing corruption. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Nick Piggin	77fad5e625	writeback: improve scalability of bdi writeback work queues If you're going to do an atomic RMW on each list entry, there's not much point in all the RCU complexities of the list walking. This is only going to help the multi-thread case I guess, but it doesn't hurt to do now. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Nick Piggin	deed62edff	writeback: remove smp_mb(), it's not needed with list_add_tail_rcu() list_add_tail_rcu contains required barriers. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Jens Axboe	49db041430	writeback: use schedule_timeout_interruptible() Gets rid of a manual set_current_state(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Jens Axboe	8010c3b634	writeback: add comments to bdi_work structure And document its retriever, get_next_work_item(). Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Jens Axboe	b6e51316da	writeback: separate starting of sync vs opportunistic writeback bdi_start_writeback() is currently split into two paths, one for WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback() for WB_SYNC_ALL writeback and let bdi_start_writeback() handle only WB_SYNC_NONE. Push down the writeback_control allocation and only accept the parameters that make sense for each function. This cleans up the API considerably. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Jens Axboe	bcddc3f01c	writeback: inline allocation failure handling in bdi_alloc_queue_work() This gets rid of work == NULL in bdi_queue_work() and puts the OOM handling where it belongs. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:52 +02:00
Jens Axboe	cfc4ba5365	writeback: use RCU to protect bdi_list Now that bdi_writeback_all() no longer handles integrity writeback, it doesn't have to block anymore. This means that we can switch bdi_list reader side protection to RCU. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:51 +02:00
Jens Axboe	f11fcae840	writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout Data integrity writeback must use bdi_start_writeback() and ensure that wbc->sb and wbc->bdi are set. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:51 +02:00
Jens Axboe	32a88aa1b6	fs: Assign bdi in super_block We do this automatically in get_sb_bdev() from the set_bdev_super() callback. Filesystems that have their own private backing_dev_info must assign that in ->fill_super(). Note that ->s_bdi assignment is required for proper writeback! Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:51 +02:00
Jens Axboe	c4a77a6c7d	writeback: make wb_writeback() take an argument structure We need to be able to pass in range_cyclic as well, so instead of growing yet another argument, split the arguments into a struct wb_writeback_args structure that we can use internally. Also makes it easier to just copy all members to an on-stack struct, since we can't access work after clearing the pending bit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:18:25 +02:00
Christoph Hellwig	f0fad8a530	writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE Since it's an opportunistic writeback and not a data integrity action, don't punt to blocking writeback. Just wake up the thread and it will flush old data. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:16:18 +02:00
Jens Axboe	1fe06ad892	writeback: get rid of wbc->for_writepages It's only set, it's never checked. Kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:16:18 +02:00
Jens Axboe	2c96ce9f20	fs: remove bdev->bd_inode_backing_dev_info It has been unused since it was introduced in: commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac Author: Andrew Morton <akpm@osdl.org> Date: Fri May 21 00:46:17 2004 -0700 [PATCH] block device layer: separate backing_dev_info infrastructure So let's just kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:16:18 +02:00
Csaba Henk	79a9d99434	fuse: add fusectl interface to max_background Make the max_background and congestion_threshold parameters of a FUSE mount tunable at runtime by adding the respective knobs to its directory within the fusectl filesystem. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-09-16 14:15:29 +02:00
Csaba Henk	487ea5af63	fuse: limit user-specified values of max background requests An untrusted user could DoS the system if s/he were allowed to accumulate an arbitrary number of pending background requests by setting the above limits to extremely high values in INIT. This patch excludes this possibility by imposing global upper limits on the possible values of per-mount "max background requests" and "congestion threshold" parameters for unprivileged FUSE filesystems. These global limits are implemented as module parameters. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-09-16 14:15:29 +02:00
Csaba Henk	d6db07ded5	fuse: use drop_nlink() instead of direct nlink manipulation drop_nlink() is the API function to decrease the link count of an inode. However, in one place the control filesystem used the decrement operator on i_nlink directly. Fix this. Cc: Anand Avati <avati@gluster.com> Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2009-09-16 14:15:28 +02:00
Alex Elder	fdec29c5fc	Merge branch 'master' of git://oss.sgi.com/xfs/xfs into for-linus Conflicts: fs/xfs/linux-2.6/xfs_lrw.c	2009-09-15 21:37:47 -05:00
Ricardo Labiaga	b09333c464	nfsd41: Refactor create_client() Move common initialization of 'struct nfs4_client' inside create_client(). Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [nfsd41: Remember the auth flavor to use for callbacks] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:52:13 -04:00
Alexandros Batsakis	3ddc8bf5f3	nfsd41: modify nfsd4.1 backchannel to use new xprt class This patch enables the use of the nfsv4.1 backchannel. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> [initialize rpc_create_args.bc_xprt too] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:52:13 -04:00
Ricardo Labiaga	0421b5c55a	nfsd41: Backchannel: Implement cb_recall over NFSv4.1 Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [nfsd41: cb_recall callback] [Share v4.0 and v4.1 back channel xdr] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Share v4.0 and v4.1 back channel xdr] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use nfsd4_cb_sequence for callback minorversion] [nfsd41: conditionally decode_sequence in nfs4_xdr_dec_cb_recall] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: Backchannel: Add sequence arguments to callback RPC arguments] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [pulled-in definition of nfsd4_cb_done] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:52:12 -04:00
Benny Halevy	2af73580b7	nfsd41: Backchannel: cb_sequence callback Implement the cb_sequence callback conforming to draft-ietf-nfsv4-minorversion1. Note: highest slot id and target highest slot id do not have to be 0 as was previously implemented. They can be greater than what the nfs server sent if the client supports a larger slot table on the backchannel. At this point we just ignore that. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [Rework the back channel xdr using the shared v4.0 and v4.1 framework.] Signed-off-by: Andy Adamson <andros@netapp.com> [fixed indentation] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use nfsd4_cb_sequence for callback minorversion] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: fix verification of CB_SEQUENCE highest slot id] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: Backchannel: Remove old backchannel serialization] [nfsd41: Backchannel: First callback sequence ID should be 1] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: decode_cb_sequence does not need to actually decode ignored fields] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:56 -04:00
Ricardo Labiaga	2a1d1b5938	nfsd41: Backchannel: Setup sequence information Follows the model used by the NFS client. Set up the RPC prepare and done function pointers so that we can populate the sequence information if minorversion == 1. rpc_run_task() is then invoked directly just like existing NFS client operations do. nfsd4_cb_prepare() determines if the sequence information needs to be set up. If the slot is in use, it adds itself to the wait queue. nfsd4_cb_done() wakes anyone sleeping on the callback channel wait queue after our RPC reply has been received. It also sets the task message result pointer to NULL to clearly indicate we're done using it. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [define and initialize cl_cb_seq_nr here] [pulled out unused definition of nfsd4_cb_done] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:56 -04:00
Ricardo Labiaga	199ff35e1c	nfsd41: Backchannel: Server backchannel RPC wait queue RPC callback requests will wait on this wait queue if the backchannel is out of slots. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:55 -04:00
Ricardo Labiaga	132f97715c	nfsd41: Backchannel: Add sequence arguments to callback RPC arguments Follow the model we use in the client. Make the sequence arguments part of the regular RPC arguments. None of the callbacks that are soon to be implemented expect results that need to be passed back to the caller, so we don't define a separate RPC results structure. For session validation, the cb_sequence decoding will use a pointer to the sequence arguments that are part of the RPC argument. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [define struct nfsd4_cb_sequence here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:55 -04:00
Andy Adamson	38524ab38f	nfsd41: Backchannel: callback infrastructure Keep the xprt used for create_session in cl_cb_xprt. Mark cl_callback.cb_minorversion = 1 and remember the client-provided cl_callback.cb_prog rpc program number. Use it to probe the callback path. Use the client's network address to initialize the callback's address, as expected by the xprt creation routines. Define xdr sizes and code nfs4_cb_compound header to be able to send a null callback rpc. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> [get callback minorversion from fore channel's] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: change bc_sock to bc_xprt] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pulled definition for cl_cb_xprt] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: set up backchannel's cb_addr] [moved rpc_create_args init to "nfsd: modify nfsd4.1 backchannel to use new xprt class"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:55 -04:00
J. Bruce Fields	80fc015bdf	nfsd4: use common rpc_cred for all callbacks Callbacks are always made using the machine's identity, so we can use a single auth_generic credential shared among callbacks to all clients and let the rpc code take care of the rest. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:34 -04:00
J. Bruce Fields	29ab23cc5d	nfsd4: allow nfs4 state startup to fail The failure here is pretty unlikely, but we should handle it anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:33 -04:00
J. Bruce Fields	886e3b7fe6	nfsd4: fix null dereference creating nfsv4 callback client On setting up the callback to the client, we attempt to use the same authentication flavor the client did. We find an rpc cred to use by calling rpcauth_lookup_credcache(), which assumes that the given authentication flavor has a credentials cache. However, this is not required to be true--in particular, auth_null does not use one. Instead, we should call the auth's lookup_cred() method. Without this, a client attempting to mount using nfsv4 and auth_null triggers a null dereference. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-15 20:49:33 -04:00
Jaswinder Singh Rajput	9ef96da6ec	xfs: includecheck fix for fs/xfs/xfs_iops.c fix the following 'make includecheck' warning: fs/xfs/linux-2.6/xfs_iops.c: xfs_acl.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-09-15 12:30:30 -05:00
Alexey Dobriyan	361735fd8f	xfs: switch to seq_file create_proc_read_entry() is getting deprecated. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-09-15 12:29:24 -05:00
David Brownell	a4dbd6740d	driver model: constify attribute groups Let attribute group vectors be declared "const". We'd like to let most attribute metadata live in read-only sections... this is a start. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-09-15 09:50:47 -07:00
Artem Bityutskiy	be9e62a730	UBIFS: improve lprops dump Improve 'dbg_dump_lprop()' and print dark and dead space there, decode flags, and journal heads. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-09-15 17:09:48 +03:00
Artem Bityutskiy	055da1b704	UBIFS: various minor commentary fixes Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-09-15 17:09:24 +03:00
Artem Bityutskiy	77a7ae580c	UBIFS: improve journal head debugging prints Convert the journal head integer into the head name when printing debugging information. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-09-15 17:05:06 +03:00
Artem Bityutskiy	d6d140097b	UBIFS: define journal head numbers in ubifs-media.h The journal head names and numbers are part of the UBIFS format, so they should be in the ubifs-media.h. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2009-09-15 14:45:35 +03:00
Theodore Ts'o	3661d28615	ext4: Fix include/trace/events/ext4.h to work with Systemtap Using relative pathnames in #include statements interacts badly with SystemTap, since the fs/ext4/*.h header files are not packaged up as part of a distribution kernel's header files. Since systemtap doesn't use TP_fast_assign(), we can use a blind structure definition and then make sure the needed header files are defined before the ext4 source files #include the trace/events/ext4.h header file. https://bugzilla.redhat.com/show_bug.cgi?id=512478 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-14 22:59:50 -04:00
Linus Torvalds	355bbd8cb8	Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...	2009-09-14 17:55:15 -07:00
Linus Torvalds	4142e0d1de	Merge branch 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: fsync: wait for data writeout completion before calling ->fsync vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() fat: Opencode sync_page_range_nolock() pohmelfs: Use new syncing helper xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() ocfs2: Update syncing after splicing to match generic version ntfs: Use new syncing helpers and update comments ext4: Remove syncing logic from ext4_file_write ext3: Remove syncing logic from ext3_file_write ext2: Update comment about generic_osync_inode vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode vfs: Rename generic_file_aio_write_nolock ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() vfs: Export __generic_file_aio_write() and add some comments vfs: Introduce filemap_fdatawait_range	2009-09-14 14:36:47 -07:00
Linus Torvalds	33f1de6931	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Whitespace fixes GFS2: Remove unused sysfs file GFS2: Be extra careful about deallocating inodes GFS2: Remove no_formal_ino generating code GFS2: Rename eattr.[ch] as xattr.[ch] GFS2: Clean up of extended attribute support GFS2: Add explanation of extended attr on-disk format GFS2: Add "-o errors=panic\|withdraw" mount options GFS2: jumping to wrong label? GFS2: free disk inode which is deleted by remote node -V2 GFS2: Add a document explaining GFS2's uevents GFS2: Add sysfs link to device GFS2: Replace assertion with proper error handling GFS2: Improve error handling in inode allocation GFS2: Add some more info to uevents GFS2: Add online uevent to GFS2	2009-09-14 14:35:56 -07:00
Linus Torvalds	041d6d0be8	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: Fix possible corruption when close races with write udf: Perform preallocation only for regular files udf: Remove wrong assignment in udf_symlink udf: Remove dead code	2009-09-14 14:35:07 -07:00
Linus Torvalds	af8cb8aa38	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (21 commits) fs/Kconfig: move nilfs2 outside misc filesystems nilfs2: convert nilfs_bmap_lookup to an inline function nilfs2: allow btree code to directly call dat operations nilfs2: add update functions of virtual block address to dat nilfs2: remove individual gfp constants for each metadata file nilfs2: stop zero-fill of btree path just before free it nilfs2: remove unused btree argument from btree functions nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free nilfs2: shorten freeze period due to GC in write operation v3 nilfs2: add more check routines in mount process nilfs2: An unassigned variable is assigned to a never used structure member nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT nilfs2: stop using periodic write_super callback nilfs2: clean up nilfs_write_super nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs nilfs2: remove redundant super block commit nilfs2: implement nilfs_show_options to display mount options in /proc/mounts nilfs2: always lookup disk block address before reading metadata block nilfs2: use semaphore to protect pointer to a writable FS-instance nilfs2: fix format string compile warning (ino_t) ...	2009-09-14 14:34:33 -07:00
Linus Torvalds	6cdb5930a6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: consolidate reconnect logic in smb_init routines cifs: Replace wrtPending with a real reference count cifs: protect GlobalOplock_Q with its own spinlock cifs: use tcon pointer in cifs_show_options cifs: send IPv6 addr in upcall with colon delimiters [CIFS] Fix checkpatch warnings PATCH] cifs: fix broken mounts when a SSH tunnel is used (try #4) [CIFS] Memory leak in ntlmv2 hash calculation [CIFS] potential NULL dereference in parse_DFS_referrals()	2009-09-14 14:33:13 -07:00
Linus Torvalds	d7e9660ad9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between `89f56d1e9` ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and `2b980dbd` ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.	2009-09-14 10:37:28 -07:00
Jan Kara	cbc8cc3352	udf: Fix possible corruption when close races with write When we close a file, we remove preallocated blocks from it. But this truncation was not protected by i_mutex and thus it could have raced with a write through a different fd and cause crashes or even filesystem corruption. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 19:13:01 +02:00
Jan Kara	81056dd044	udf: Perform preallocation only for regular files So far we preallocated blocks also for directories, but that raises the problem of when to get rid of preallocated blocks we don't need. So far we removed them in udf_clear_inode(), which has the disadvantages that 1) blocks are unavailable long after writing to a directory finished and thus one can run out of space unnecessarily early, and 2) releasing blocks from udf_clear_inode is problematic because VFS does not expect us to redirty the inode there and it also slows down memory reclaim. So preallocate blocks only for regular files where we can drop preallocation in udf_release_file. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 19:13:00 +02:00
Jan Kara	7c6e3d1aae	udf: Remove wrong assignment in udf_symlink Recomputation of the pointer was wrong (it should have been just an increment). Luckily, we never use the computed value. Remove it. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 19:13:00 +02:00
Jan Kara	5891d9dd2a	udf: Remove dead code Remove code that is never used. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 19:13:00 +02:00
Christoph Hellwig	2daea67e96	fsync: wait for data writeout completion before calling ->fsync Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out the data, the calls into ->fsync to write out the metadata and then finally calls filemap_fdatawait to wait for the data I/O to complete. What sounds like a clever micro-optimization actually is nast trap for many filesystems. For many modern filesystems i_size or other inode information is only updated on I/O completion and we need to wait for I/O to finish before we can write out the metadata. For old fashionen filesystems that instanciate blocks during the actual write and also update the metadata at that point it opens up a large window were we could expose uninitialized blocks after a crash. While a few filesystems that need it already wait for the I/O to finish inside their ->fsync methods it is rather suboptimal as it is done under the i_mutex and also always for the whole file instead of just a part as we could do for O_SYNC handling. Here is a small audit of all fsync instances in the tree: - spufs_mfc_fsync: - ps3flash_fsync: - vol_cdev_fsync: - printer_fsync: - fb_deferred_io_fsync: - bad_file_fsync: - simple_sync_file: don't care - filesystems/drivers do't use the page cache or are purely in-memory. - simple_fsync: - file_fsync: - affs_file_fsync: - fat_file_fsync: - jfs_fsync: - ubifs_fsync: - reiserfs_dir_fsync: - reiserfs_sync_file: never touch pagecache themselves. We need to wait before if we do not want to expose stale data after an allocation. - afs_fsync: - fuse_fsync_common: do the waiting writeback itself in awkward ways, would benefit from proper semantics - block_fsync: Does a filemap_write_and_wait on the block device inode. Because we now have f_mapping that is the same inode we call it on in vfs_fsync. So just removing it and letting the VFS do the work in one go would be an improvement. 
- btrfs_sync_file: - cifs_fsync: - xfs_file_fsync: need the wait first and currently do it themselves. Would benefit from doing it outside i_mutex. - coda_fsync: - ecryptfs_fsync: - exofs_file_fsync: - shm_fsync: only passes the fsync through to the lower layer - ext3_sync_file: doesn't seem to care, comments are confusing. - ext4_sync_file: would need the wait to work correctly for delalloc mode with late i_size updates. Otherwise the ext3 comment applies. Currently implements its own writeback and wait in an odd way, could benefit from doing it properly. - gfs2_fsync: not needed for journaled data mode, but probably harmless there. Currently writes back data asynchronously itself. Needs some major audit. - hostfs_fsync: just calls fsync/datasync on the host FD. Without the wait before, data might not even be in flight yet if we're unlucky. - hpfs_file_fsync: - ncp_fsync: no-ops. Dangerous before and after. - jffs2_fsync: just calls jffs2_flush_wbuf_gc, not sure how this relates to data. - nfs_fsync_dir: just increments stats, claims all directory operations are synchronous - nfs_file_fsync: only writes out data??? Looks very odd. - nilfs_sync_file: looks like it expects all data done, but not sure from the code - ntfs_dir_fsync: - ntfs_file_fsync: appear to do their own data writeback. Very convoluted code. - ocfs2_sync_file: does its own data writeback, but no wait. Probably needs the wait. - smb_fsync: according to a comment expects all pages written already, probably needs the wait before. This patch only changes vfs_fsync_range; removal of the wait in the methods that have it is left to the filesystem maintainers. Note that most filesystems really do need an audit for their fsync methods given the gems found in this very brief audit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:17 +02:00
Jan Kara	18f2ee705d	vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() Remove these three functions since nobody uses them anymore. Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:17 +02:00
Jan Kara	2f3d675bcd	fat: Opencode sync_page_range_nolock() fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the only user of generic_osync_inode() which does not have a file open. So opencode needed actions for FAT so that we can convert generic_osync_inode() to a standard syncing path. Update a comment about generic_osync_inode(). CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:17 +02:00
Jan Kara	af0f4414f3	xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() Christoph Hellwig says that it is enough for XFS to call filemap_write_and_wait_range() instead of sync_page_range() because we do all the metadata syncing when forcing the log. CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:17 +02:00
Jan Kara	d23c937b0f	ocfs2: Update syncing after splicing to match generic version Update ocfs2 specific splicing code to use generic syncing helper. The sync now does not happen under rw_lock because generic_write_sync() acquires i_mutex which ranks above rw_lock. That should not matter because standard fsync path does not hold it either. Acked-by: Joel Becker <Joel.Becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Jan Kara	ebbbf757c6	ntfs: Use new syncing helpers and update comments Use new syncing helpers in .write and .aio_write functions. Also remove superfluous syncing in ntfs_file_buffered_write() and update comments about generic_osync_inode(). CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Jan Kara	0d34ec62e1	ext4: Remove syncing logic from ext4_file_write The syncing is now properly handled by generic_file_aio_write() so no special ext4 code is needed. CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Jan Kara	e367626b61	ext3: Remove syncing logic from ext3_file_write Syncing is now properly done by generic_file_aio_write() so no special logic is needed in ext3. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Jan Kara	a2a735ad66	ext2: Update comment about generic_osync_inode We rely on generic_write_sync() now. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Jan Kara	148f948ba8	vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode Introduce a new function for generic inode syncing (vfs_fsync_range) and use it from the fsync() path. Also introduce a new helper for syncing after a sync write (generic_write_sync) using the generic function. Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. CC: Evgeniy Polyakov <zbr@ioremap.net> CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:15 +02:00
Christoph Hellwig	eef9938067	vfs: Rename generic_file_aio_write_nolock generic_file_aio_write_nolock() is now used only by block devices and raw character device. Filesystems should use __generic_file_aio_write() in case generic_file_aio_write() doesn't suit them. So rename the function to blkdev_aio_write() and move it to fs/block_dev.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:15 +02:00
Jan Kara	918941a3f3	ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock Use the new helper. We have to submit data pages ourselves in case of O_SYNC write because __generic_file_aio_write does not do it for us. OCFS2 developers might think about moving the sync out of i_mutex which seems to be easily possible but that's out of scope of this patch. CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:15 +02:00
Ryusuke Konishi	41f4db0f48	fs/Kconfig: move nilfs2 outside misc filesystems Some people asked me questions like the following: On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote: > just wondering, any reasons why NILFS2 is one of the miscellaneous > filesystems and, for example, btrfs, is not in Kconfig? Actually, nilfs is NOT a filesystem that came from another operating system, but a filesystem created purely for Linux. Nor is it a flash filesystem; it is a filesystem for generic block devices. So, this moves nilfs outside the misc category as I responded in LKML "Re: Why does NILFS2 hide under Miscellaneous filesystems?" (Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	0f3fe33b39	nilfs2: convert nilfs_bmap_lookup to an inline function The nilfs_bmap_lookup() is now a wrapper function of nilfs_bmap_lookup_at_level(). This moves the nilfs_bmap_lookup() to a header file converting it to an inline function and gives an opportunity for optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	2e0c2c7392	nilfs2: allow btree code to directly call dat operations The current btree code is written so that btree functions call dat operations via wrapper functions in bmap.c when they allocate, free, or modify virtual block addresses. This abstraction requires additional function calls and causes frequent calls of the nilfs_bmap_get_dat() function since it is used in every wrapper function. This removes the wrapper functions and makes them available from btree.c and direct.c, which will increase the opportunity of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	bd8169efae	nilfs2: add update functions of virtual block address to dat This is a preparation for the successive cleanup ("nilfs2: allow btree to directly call dat operations"). This adds functions bundling a few operations to change an entry of virtual block address on the dat file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	7a102b0923	nilfs2: remove individual gfp constants for each metadata file This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP, and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP, and can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	3218929dbd	nilfs2: stop zero-fill of btree path just before free it The btree path object is cleared just before it is freed. This will remove the code doing the unnecessary clear operation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	6d28f7ea43	nilfs2: remove unused btree argument from btree functions Even though many btree functions take a btree object as their first argument, most of them are not used in their functions. This sticky use of the btree argument is hurting code readability and giving the possibility of inefficient code generation. So, this removes the unnecessary btree arguments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	9ead986373	nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free These functions are not called from any functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Jiro SEKIBA	1cf58fa840	nilfs2: shorten freeze period due to GC in write operation v3 This is a re-revised patch to shorten the freeze period. This version includes a fix of the bug Konishi-san mentioned last time. When GC is running, GC moves live blocks to different segments. Copying live blocks into memory is done in a transaction; however, it does not need to be in the transaction. This patch will get the nilfs_ioctl_move_blocks() out from the transaction lock and put it before the transaction. I ran the sysbench fileio test against a nilfs partition. I copied some DVD/CD images and created a snapshot to create live blocks before starting the benchmark. The following is a summary of per-request statistics (min/max and avg) for rc8 and rc8 w/ the patch. I ran each test three times and below is the average of those numbers. According to this benchmark result, average time is slightly degraded. However, the worst-case (max) result is significantly improved. This can address a few seconds of write freeze. - random write per-request performance of rc8 min 0.843ms max 680.406ms avg 3.050ms - random write per-request performance of rc8 w/ this patch min 0.843ms -> 100.00% max 380.490ms -> 55.90% avg 3.233ms -> 106.00% - sequential write per-request performance of rc8 min 0.736ms max 774.343ms avg 2.883ms - sequential write per-request performance of rc8 w/ this patch min 0.720ms -> 97.80% max 644.280ms -> 83.20% avg 3.130ms -> 108.50% -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- protection_period 150 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 cleaning_interval 2 retry_interval 60 use_mmap log_priority info -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Zhu Yanhai	43be0ec038	nilfs2: add more check routines in mount process nilfs2: Add more safeguard routines and protections in mount process, which also makes nilfs2 report consistency error messages when checkpoint number is invalid. Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Zhang Qiang	a4f0b9c5b4	nilfs2: An unassigned variable is assigned to a never used structure member nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesystem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Ryusuke Konishi	c1b353f04a	nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	1dfa27105a	nilfs2: stop using periodic write_super callback This removes nilfs_write_super and commits the super block in the nilfs internal thread instead of the periodic write_super callback. The VFS layer calls the ->write_super callback periodically. However, it looks like that callback is omitted when disk I/O is busy. And when cleanerd (nilfs GC) is running, disk I/O tends to be busy, thus the nilfs superblock is not synchronized as nilfs designed. To avoid this, the superblock is synced by the nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	79efdd9411	nilfs2: clean up nilfs_write_super Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	6233caa9d5	nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs This fixes the disorder of nilfs_write_super in nilfs_sync_fs. Committing the super block must be done at the end of the function so that all changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	ec5d66abdb	nilfs2: remove redundant super block commit This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Jiro SEKIBA	b58a285ba4	nilfs2: implement nilfs_show_options to display mount options in /proc/mounts This is a patch to display mount options in procfs. Mount options will show up in /proc/mounts as they do for other filesystems. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	1435110467	nilfs2: always lookup disk block address before reading metadata block The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	027d6404eb	nilfs2: use semaphore to protect pointer to a writable FS-instance This will get rid of the nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit `201913ed74`. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of its own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Heiko Carstens	b5696e5e0d	nilfs2: fix format string compile warning (ino_t) Unlike on most other architectures, ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	1b2f5a641b	nilfs2: fix ignored error code in __nilfs_read_inode() The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:12 +09:00
Steven Whitehouse	86d0063656	GFS2: Whitespace fixes Reported-by: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-09-14 09:50:57 +01:00
Christoph Hellwig	746cd1e7e4	block: use blkdev_issue_discard in blk_ioctl_discard blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitate this, add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting functionality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:53 +02:00
Nikanth Karthikesan	a9327cac44	Separate read and write statistics of in_flight requests Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two separate counters for read and write. This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:52 +02:00
Benny Halevy	4be36ca0ce	nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-13 15:57:39 -04:00
Linus Torvalds	86d710146f	Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (87 commits) NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3' NFS: Allow the "nfs" file system type to support NFSv4 NFS: Move details of nfs4_get_sb() to a helper NFS: Refactor NFSv4 text-based mount option validation NFS: Mount option parser should detect missing "port=" NFS: out of date comment regarding O_EXCL above nfs3_proc_create() NFS: Handle a zero-length auth flavor list SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc... nfs: fix compile error in rpc_pipefs.h nfs: Remove reference to generic_osync_inode from a comment SUNRPC: cache must take a reference to the cache detail's module on open() NFS: Use the DNS resolver in the mount code. NFS: Add a dns resolver for use with NFSv4 referrals and migration SUNRPC: Fix a typo in cache_pipefs_files nfs: nfs4xdr: optimize low level decoding nfs: nfs4xdr: get rid of READ_BUF nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline nfs: nfs4xdr: get rid of COPYMEM nfs: nfs4xdr: introduce decode_sessionid helper nfs: nfs4xdr: introduce decode_verifier helper ...	2009-09-11 16:39:11 -07:00
Theodore Ts'o	7ad9bb651f	ext4: Fix initialization of s_flex_groups The s_flex_groups array should have been initialized using atomic_add to sum up the free counts from the block groups that make up a flex_bg. By using atomic_set, the value of the s_flex_groups array was set to the values of the last block group in the flex_bg. The impact of this bug is that the block and inode allocation algorithms might not pick the best flex_bg for new allocation. Thanks to Damien Guibouret for pointing out this problem! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-11 16:51:28 -04:00
Linus Torvalds	774a694f8c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...	2009-09-11 13:23:18 -07:00
Trond Myklebust	ab3bbaa8b2	Merge branch 'nfs-for-2.6.32'	2009-09-11 14:59:37 -04:00
Linus Torvalds	a9c86d4259	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (377 commits) ASoC: au1x: PSC-AC97 bugfixes ALSA: dummy - Increase MAX_PCM_SUBSTREAMS to 128 ALSA: dummy - Add debug proc file ALSA: Add const prefix to proc helper functions ALSA: Re-export snd_pcm_format_name() function ALSA: hda - Use auto model for HP laptops with ALC268 codec ALSA: cs46xx - Fix minimum period size ASoC: Fix WM835x Out4 capture enumeration ALSA: Remove unneeded ifdef from sound/core.h ALSA: Remove struct snd_monitor_file from public sound/core.h ASoC: Remove unuused hw_read_t sound: oxygen: work around MCE when changing volume ALSA: dummy - Fake buffer allocations ALSA: hda/realtek: Added support for CLEVO M540R subsystem, 6 channel + digital ASoC: fix pxa2xx-ac97.c breakage ALSA: dummy - Fix the timer calculation in systimer mode ALSA: dummy - Add more description ALSA: dummy - Better jiffies handling ALSA: dummy - Support high-res timer mode ALSA: Release v1.0.21 ...	2009-09-11 09:19:35 -07:00
Linus Torvalds	a12e4d304c	Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block * 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: check for registered bdi in flusher add and inode dirty writeback: add name to backing_dev_info writeback: add some debug inode list counters to bdi stats writeback: get rid of pdflush completely writeback: switch to per-bdi threads for flushing data writeback: move dirty inodes from super_block to backing_dev_info writeback: get rid of generic_sync_sb_inodes() export	2009-09-11 09:17:05 -07:00
Linus Torvalds	f6f7919086	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits) binfmt_elf: fix PT_INTERP bss handling TPM: Fixup boot probe timeout for tpm_tis driver sysfs: Add labeling support for sysfs LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. KEYS: Add missing linux/tracehook.h #inclusions KEYS: Fix default security_session_to_parent() Security/SELinux: includecheck fix kernel/sysctl.c KEYS: security_cred_alloc_blank() should return int under all circumstances IMA: open new file for read KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6] KEYS: Do some whitespace cleanups [try #6] KEYS: Make /proc/keys use keyid not numread as file position [try #6] KEYS: Add garbage collection for dead, revoked and expired keys. [try #6] KEYS: Flag dead keys to induce EKEYREVOKED [try #6] KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6] KEYS: Deal with dead-type keys appropriately [try #6] CRED: Add some configurable debugging [try #6] selinux: Support for the new TUN LSM hooks ...	2009-09-11 08:55:49 -07:00

15294 Commits