d_add_ci was lifted 1:1 from ntfs. Change ntfs to use the common
version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As pointed out during review d_add_ci argument order should match d_add,
so switch the dentry and inode arguments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Ouch, if the number taken from IDA is too big, the intent was to signal an
error, not to check for overflow and still do the overflowing addition.
One still needs 2^28 proc entries to notice this.
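A minimal user-space sketch of the intended check, not the actual procfs
code. DYNAMIC_FIRST and id_to_number() are hypothetical names; the base
value is an assumption chosen so that the headroom matches the 2^28 figure
above, and a 32-bit unsigned int is assumed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical base for dynamically allocated numbers (assumption). */
#define DYNAMIC_FIRST 0xF0000000U

/* Headroom above the base is 2^28 - 1 when unsigned int is 32 bits wide. */
static_assert(UINT_MAX - DYNAMIC_FIRST == 0x0FFFFFFFU,
	      "sketch assumes a 32-bit unsigned int");

/* Return the number for an allocated ID, or 0 when the ID does not fit. */
unsigned int id_to_number(unsigned int id)
{
	if (id > UINT_MAX - DYNAMIC_FIRST)
		return 0;	/* signal an error instead of wrapping around */
	return DYNAMIC_FIRST + id;
}
```

The comparison is done before the addition, so the sum can never wrap.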
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch lets the files using linux/version.h match the files that
#include it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It appears that configfs_rmdir() can protect configfs_detach_prep() retries with
fewer calls to {spin,mutex}_{lock,unlock}, and cleaner code.
This patch does not change any behavior, except that it removes two useless
lock/unlock pairs having nothing inside to protect and providing a useless
barrier.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
We were setting i_blocks based on allocation before the extent insert, which
is wrong as the value is a calculation based on ip_clusters which gets
updated as a result of the insert. This patch moves the line in question
to just after the call to ocfs2_insert_extent().
Without this fix, inline directories were temporarily having an i_blocks
value of zero immediately after expansion to extents.
Reported-and-tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
When we fail to insert extent in ocfs2_expand_inline_dir(), we should go to
out_commit, not out.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
This fixes a bug introduced with 539d8264093560b917ee3afe4c7f74e5da09d6a5:
[PATCH 2/2] ocfs2: Fix race between mount and recovery
ocfs2_mark_dead_nodes() was reading journal inodes while holding the
spinlock protecting our in-memory recovery state. The fix is very simple -
the disk state is protected by a cluster lock that's already held, so we
just move the spinlock down past the read.
Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ocfs2/cluster/netdebug.c: fix warning
fs/ocfs2/cluster/netdebug.c:154: warning: format '%lu' expects
type 'long unsigned int', but argument 17 has type 'suseconds_t'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Commit 0f475b2abe (ocfs2/net: Silence build
warnings) made sense as far as it fixed compile warnings, but it was not
required that it made the functions global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Update documentation to remind users to update mke2fs.conf
ext4: Fix small file fragmentation
ext4: Initialize writeback_index to 0 when allocating a new inode
ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC
ext4: journal credit fix for the delayed allocation's writepages() function
ext4: Rework the ext4_da_writepages() function
ext4: journal credits reservation fixes for DIO, fallocate
ext4: journal credits reservation fixes for extent file writepage
ext4: journal credits calulation cleanup and fix for non-extent writepage
ext4: Fix bug where we return ENOSPC even though we have plenty of inodes
ext4: don't try to resize if there are no reserved gdt blocks left
ext4: Use ext4_discard_reservations instead of mballoc-specific call
ext4: Fix ext4_dx_readdir hash collision handling
ext4: Fix delalloc release block reservation for truncate
ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty
ext4: Handle unwritten extent properly with delayed allocation
Always allow truncations to zero, even if budgeting thinks there
is no space. UBIFS reserves some space for deletions anyway.
Otherwise, the following happens:
1. create a file, and write as much as possible there, until ENOSPC
2. truncate the file, which fails with ENOSPC, which is not good.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
After commit a97c9bf33f (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named pipes
on cramfs do not work properly.
It seems the commit makes all named pipes on cramfs share their inode
(and named-pipe buffer).
Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.
Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org> [2.6.14 and later]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a user calls sys_setpriority(PRIO_PGRP ...) on an NPTL-style multi-LWP
process, only the task leader of the process is affected; the other
sibling LWP threads do not receive the setting. The problem is that the
iterator used in sys_setpriority() only iterates over one task for each
process, ignoring all other sibling threads.
Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process. Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.
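The difference between the two iteration styles can be sketched in user
space as follows. This is only a model: struct toy_task and both functions
are hypothetical, and a flat array stands in for the kernel's pid and
thread lists.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task: pid equals tgid for the thread group leader. */
struct toy_task {
	int pgrp;	/* process group ID */
	int tgid;	/* thread group (process) ID */
	int pid;	/* thread ID */
	int nice;
};

/* Old behaviour: visit only the leader of each process in the group. */
int set_nice_leaders_only(struct toy_task *tasks, size_t n, int pgrp, int nice)
{
	int changed = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].pgrp != pgrp || tasks[i].pid != tasks[i].tgid)
			continue;
		tasks[i].nice = nice;
		changed++;
	}
	return changed;
}

/* New behaviour: visit every thread of every process in the group. */
int set_nice_all_threads(struct toy_task *tasks, size_t n, int pgrp, int nice)
{
	int changed = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].pgrp != pgrp)
			continue;
		tasks[i].nice = nice;
		changed++;
	}
	return changed;
}
```

With one process of three threads in the group, the first function touches
one task and the second touches all three.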
Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case the binfmt_misc binary handler is registered *before* the e.g.
script one (when for example being compiled as a module) the following
situation may occur:
1. user launches a script, whose interpreter is a misc binary;
2. the load_misc_binary sets the misc_bang and returns -ENOEXEC,
since the binary is a script;
3. the load_script_binary loads one and calls for search_binary_handler
to run the interpreter;
4. the load_misc_binary is called again, but refuses to load the
binary due to misc_bang bit set.
The fix is to move the misc_bang setting lower - prior to the actual
call to the search_binary_handler.
Caused by the commit 3a2e7f47 (binfmt_misc.c: avoid potential kernel
stack overflow)
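The ordering problem can be modelled in a few lines of user-space C. This
is a simplified sketch, not the binfmt code: struct toy_binprm, the handler
functions and the "fixed" switch are all hypothetical, and the interpreter
run by the misc handler is assumed to be a plain native binary that always
loads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_binprm {
	bool is_script;		/* file starts with "#!" */
	bool is_misc;		/* file matches a registered misc format */
	bool interp_is_misc;	/* the script's interpreter is a misc binary */
	bool misc_bang;		/* recursion guard owned by the misc handler */
};

static int search_handlers(struct toy_binprm *bprm, bool fixed);

/* Misc handler. With fixed == false the guard is set on entry (the bug). */
static int load_misc(struct toy_binprm *bprm, bool fixed)
{
	if (bprm->misc_bang)
		return -ENOEXEC;	/* guard says we were here before */
	if (!fixed)
		bprm->misc_bang = true;	/* too early: set even if we decline */
	if (!bprm->is_misc)
		return -ENOEXEC;	/* not ours, for example a script */
	if (fixed)
		bprm->misc_bang = true;	/* set just before handing over */
	bprm->is_misc = false;		/* assumed native interpreter loads */
	return 0;
}

/* Script handler: swap in the interpreter and search again. */
static int load_script(struct toy_binprm *bprm, bool fixed)
{
	if (!bprm->is_script)
		return -ENOEXEC;
	bprm->is_script = false;
	bprm->is_misc = bprm->interp_is_misc;
	return search_handlers(bprm, fixed);
}

/* Handlers are tried in registration order: misc first, then script. */
static int search_handlers(struct toy_binprm *bprm, bool fixed)
{
	int ret;

	assert(bprm != NULL);
	ret = load_misc(bprm, fixed);
	if (ret != -ENOEXEC)
		return ret;
	return load_script(bprm, fixed);
}

/* Launch a script whose interpreter is a misc binary. */
int exec_script_with_misc_interp(bool fixed)
{
	struct toy_binprm bprm = { .is_script = true, .interp_is_misc = true };

	return search_handlers(&bprm, fixed);
}
```

With the guard set on entry the second visit to the misc handler is
refused; with the guard set only once the format has matched, the
interpreter loads.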
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: <stable@kernel.org> [2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was another FAT BKL conversion deadlock reported by Bart
Trojanowski due to the BKL being used as a recursive lock by FAT, which
was missed because it only triggers with 'sync' (or 'dirsync') mounts.
The recursion worked for the BKL, but after the conversion to lock_super
(which uses a mutex), it just deadlocks.
Thanks to Bart for debugging this and testing the fix. The lock
debugging information from the original report:
=============================================
[ INFO: possible recursive locking detected ]
2.6.27-rc3-bisect-00448-ga7f5aaf #16
---------------------------------------------
mv/4020 is trying to acquire lock:
(&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20
but task is already holding lock:
(&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20
other info that might help us debug this:
3 locks held by mv/4020:
#0: (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140
#1: (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110
#2: (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20
stack backtrace:
Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf #16
[<c014e694>] validate_chain+0x984/0xea0
[<c0108d70>] ? native_sched_clock+0x0/0xf0
[<c014ee9c>] __lock_acquire+0x2ec/0x9b0
[<c014f5cf>] lock_acquire+0x6f/0x90
[<c01a90fe>] ? lock_super+0x1e/0x20
[<c044e5fd>] mutex_lock_nested+0xad/0x300
[<c01a90fe>] ? lock_super+0x1e/0x20
[<c01a90fe>] ? lock_super+0x1e/0x20
[<c01a90fe>] lock_super+0x1e/0x20
[<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat]
[<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80
[<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat]
[<f8b3a962>] fat_sync_inode+0x12/0x20 [fat]
[<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat]
[<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat]
[<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat]
[<c01b0968>] vfs_unlink+0x98/0x110
[<c01b2400>] do_unlinkat+0x130/0x140
[<c016a8f5>] ? audit_syscall_entry+0x105/0x150
[<c01b253b>] sys_unlinkat+0x3b/0x40
[<c01040d3>] sysenter_do_call+0x12/0x3f
=======================
where the deadlock is due to the nesting of lock_super from vfat_unlink
to fat_write_inode:
- do_unlinkat
- vfs_unlink
- vfat_unlink
* lock_super
- fat_remove_entries
- fat_sync_inode
- fat_write_inode
* lock_super
and the fix is to simply remove the use of lock_super() in fat_write_inode.
The lock_super() there had been just an automatic conversion of the
kernel lock to the superblock lock, but no locking was actually needed
there, since the code in fat_write_inode already protected all relevant
accesses with a spinlock (sbi->inode_hash_lock to be exact). The only
code inside the BKL (and thus the superblock lock) was accesses to local
variables or calls to functions that have long been SMP-safe (i.e.
sb_bread, mark_buffer_dirty and brelse).
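The nesting shown in the call chain above can be modelled with a toy
non-recursive lock. This is a single-threaded sketch with hypothetical
names; where a real mutex would block forever, the toy lock reports
failure so that the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive lock such as a mutex. */
struct toy_lock {
	bool held;
};

/* Returns false where a real mutex would self-deadlock. */
bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);	/* releasing an unheld lock is a bug */
	l->held = false;
}

/* Inner function; take_lock selects the behaviour before the fix. */
bool toy_write_inode(struct toy_lock *sb_lock, bool take_lock)
{
	if (take_lock) {
		if (!toy_lock_acquire(sb_lock))
			return false;	/* nested acquisition */
		toy_lock_release(sb_lock);
	}
	return true;
}

/* Outer function holds the lock across the call into the inner one. */
bool toy_unlink(struct toy_lock *sb_lock, bool inner_takes_lock)
{
	bool ok;

	if (!toy_lock_acquire(sb_lock))
		return false;
	ok = toy_write_inode(sb_lock, inner_takes_lock);
	toy_lock_release(sb_lock);
	return ok;
}
```

A recursive lock such as the BKL would have let the inner acquisition
through, which is why the problem only appeared after the conversion.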
Bart reports:
"Looks good. I ran 10 parallel processes creating 1M files truncating
them, writing to them again and then deleting them. This patch fixes
the issue I ran into.
Signed-off-by: Bart Trojanowski <bart@jukie.net>"
Reported-and-tested-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Properly handle MSKRB5 by passing sec=mskrb5 to the upcall so that the
spnego blob can be generated appropriately. Also, make
decode_negTokenInit prefer whichever mechanism is first in the list.
Needed for some NetApp servers, and possibly some older
versions of Windows which treat the two KRB5 mechanisms differently.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs_setup_session references pSesInfo->server several times. That
pointer shouldn't change during the life of the function so grab it
once and store it in a local var. This makes the code look a little
cleaner too.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In case we failed to allocate memory for an inode when creating it, we did not
properly free the block already allocated for this inode. Move the memory allocation
before the block allocation, which fixes this issue (thanks for the idea go to
Ingo Oeser <ioe-lkml@rameria.de>). Also remove a few superfluous
initializations already done in udf_alloc_inode().
Reviewed-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Jan Kara <jack@suse.cz>
A memory allocation inside alloc_mutex must not recurse back into the
filesystem itself because that leads to lock inversion between iprune_mutex and
alloc_mutex (and thus to deadlocks - see traces below). alloc_mutex is actually
needed only to update allocation statistics in the superblock so we can drop it
before we start allocating memory for the inode.
tar D ffff81015b9c8c90 0 6614 6612
ffff8100d5a21a20 0000000000000086 0000000000000000 00000000ffff0000
ffff81015b9c8c90 ffff81015b8f0cd0 ffff81015b9c8ee0 0000000000000000
0000000000000003 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff803c1d8a>] __mutex_lock_slowpath+0x64/0x9b
[<ffffffff803c1bef>] mutex_lock+0xa/0xb
[<ffffffff8027f8c2>] shrink_icache_memory+0x38/0x200
[<ffffffff80257742>] shrink_slab+0xe3/0x15b
[<ffffffff802579db>] try_to_free_pages+0x221/0x30d
[<ffffffff8025657e>] isolate_pages_global+0x0/0x31
[<ffffffff8025324b>] __alloc_pages_internal+0x252/0x3ab
[<ffffffff8026b08b>] cache_alloc_refill+0x22e/0x47b
[<ffffffff8026ae37>] kmem_cache_alloc+0x3b/0x61
[<ffffffff8026b15b>] cache_alloc_refill+0x2fe/0x47b
[<ffffffff8026b34e>] __kmalloc+0x76/0x9c
[<ffffffffa00751f2>] :udf:udf_new_inode+0x202/0x2e2
[<ffffffffa007ae5e>] :udf:udf_create+0x2f/0x16d
[<ffffffffa0078f27>] :udf:udf_lookup+0xa6/0xad
...
kswapd0 D ffff81015b9d9270 0 125 2
ffff81015b903c28 0000000000000046 ffffffff8028cbb0 00000000fffffffb
ffff81015b9d9270 ffff81015b8f0cd0 ffff81015b9d94c0 000000000271b490
ffffe2000271b458 ffffe2000271b420 ffffe20002728dc8 ffffe20002728d90
Call Trace:
[<ffffffff8028cbb0>] __set_page_dirty+0xeb/0xf5
[<ffffffff8025403a>] get_dirty_limits+0x1d/0x22f
[<ffffffff803c1d8a>] __mutex_lock_slowpath+0x64/0x9b
[<ffffffff803c1bef>] mutex_lock+0xa/0xb
[<ffffffffa0073f58>] :udf:udf_bitmap_free_blocks+0x47/0x1eb
[<ffffffffa007df31>] :udf:udf_discard_prealloc+0xc6/0x172
[<ffffffffa007875a>] :udf:udf_clear_inode+0x1e/0x48
[<ffffffff8027f121>] clear_inode+0x6d/0xc4
[<ffffffff8027f7f2>] dispose_list+0x56/0xee
[<ffffffff8027fa5a>] shrink_icache_memory+0x1d0/0x200
[<ffffffff80257742>] shrink_slab+0xe3/0x15b
[<ffffffff80257e93>] kswapd+0x346/0x447
...
Reported-by: Tibor Tajti <tibor.tajti@gmail.com>
Reviewed-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Jan Kara <jack@suse.cz>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] mount of IPC$ breaks with iget patch
[CIFS] remove trailing whitespace
[CIFS] if get root inode fails during mount, cleanup tree connection
* 'linux-next' of git://git.infradead.org/~dedekind/ubifs-2.6: (29 commits)
UBIFS: xattr bugfixes
UBIFS: remove unneeded check
UBIFS: few commentary fixes
UBIFS: fix budgeting request alignment in xattr code
UBIFS: improve arguments checking in debugging messages
UBIFS: always set i_generation to 0
UBIFS: correct spelling of "thrice".
UBIFS: support splice_write
UBIFS: minor tweaks in commit
UBIFS: reserve more space for index
UBIFS: print pid in dump function
UBIFS: align inode data to eight
UBIFS: improve budgeting checks
UBIFS: correct orphan deletion order
UBIFS: fix typos in comments
UBIFS: do not union creat_sqnum and del_cmtno
UBIFS: optimize deletions
UBIFS: increment commit number earlier
UBIFS: remove another unneeded function parameter
UBIFS: remove unneeded function parameter
...
A fuzzed filesystem image failed with OMFS when the extent count was
used in a loop without being checked against the max number of extents.
It also provoked a signed division for an array index that was checked
as if unsigned, leading to index by -1.
omfsck will be updated to fix these cases; in the meantime bail out
gracefully.
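Both kinds of check can be sketched generically. This is not the OMFS
code: MAX_EXTENTS, the function names and the block-to-slot mapping are
hypothetical and only illustrate validating an on-disk count and a signed
index before use.

```c
#include <assert.h>
#include <errno.h>

#define MAX_EXTENTS 16	/* hypothetical per-table limit */

/* Reject an on-disk extent count before it is used as a loop bound. */
int check_extent_count(int extent_count)
{
	if (extent_count < 0 || extent_count > MAX_EXTENTS)
		return -EIO;
	return 0;
}

/*
 * Map a block number to a table slot. The arithmetic is signed, so a
 * block below `base` must be rejected explicitly; checking only the
 * upper bound would let a negative slot through.
 */
int block_to_slot(long block, long base, long per_slot, int nslots)
{
	long slot;

	assert(per_slot > 0 && nslots > 0);
	if (block < base)
		return -EIO;
	slot = (block - base) / per_slot;
	if (slot >= nslots)
		return -EIO;
	return (int)slot;
}
```

Returning an error lets the caller bail out instead of indexing outside
the table.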
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
write_cache_pages() uses i_mapping->writeback_index to pick up where it
left off the last time a given inode was found by pdflush or
balance_dirty_pages (or anyone else who sets wbc->range_cyclic).
alloc_inode() should set it to a sane value so that writeback doesn't
start in the middle of a file. It is somewhat difficult to notice the bug
since write_cache_pages will loop around to the start of the file and the
elevator helps hide the resulting seeks.
For whatever reason, Btrfs hits this often. Unpatched, untarring 30
copies of the linux kernel in series runs at 47MB/s on a single sata
drive. With this fix, it jumps to 62MB/s.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xattr code has not been tested for a while and there were
several bugs. One of them is using the wrong inode in
'ubifs_jnl_change_xattr()'. The other is a deadlock in
'ubifs_setxattr()': the i_mutex is locked in
'cap_inode_need_killpriv()' path, so deadlock happens when
'ubifs_setxattr()' tries to lock it again.
Thanks to Zoltan Sogor for finding these bugs.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
In looking at network named pipe support on cifs, I noticed that
David Howells' iget patch:
iget: stop CIFS from using iget() and read_inode()
broke mounts to IPC$ (the interprocess communication share), and doesn't
handle the error case (when getting info on the root inode fails).
Thanks to Gunter who noted a typo in a debug line in the original
version of this patch.
CC: David Howells <dhowells@redhat.com>
CC: Gunter Kukkukk <linux@kukkukk.com>
CC: Stable Kernel <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The patches that are intended to introduce copy-on-write credentials for 2.6.28
require abstraction of access to some fields of the task structure,
particularly for the case of one task accessing another's credentials where RCU
will have to be observed.
Introduced here are trivial no-op versions of the desired accessors for current
and other tasks so that other subsystems can start to be converted over more
easily.
Wrappers are introduced into a new header (linux/cred.h) for UID/GID,
EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed
user_struct. These wrappers are macros because the ordering between header
files militates against making them inline functions.
linux/cred.h is #included from linux/sched.h.
Further, XFS is modified such that it no longer defines and uses parameterised
versions of current_fs[ug]id(), thus getting rid of the namespace collision
otherwise incurred.
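The idea of a trivial accessor layer can be sketched as below. The names
are hypothetical and do not reproduce the contents of linux/cred.h; the
point is only that callers go through a macro, so the storage behind it
can change later without touching them.

```c
#include <assert.h>

/* Toy task structure with credentials stored inline (assumption). */
struct cred_task {
	unsigned int uid, gid, fsuid, fsgid;
};

/* Trivial accessors: today they just name the field. */
#define toy_task_uid(t)		((t)->uid)
#define toy_task_gid(t)		((t)->gid)
#define toy_task_fsuid(t)	((t)->fsuid)
#define toy_task_fsgid(t)	((t)->fsgid)

/* Example caller that never touches the fields directly. */
int toy_same_owner(const struct cred_task *a, const struct cred_task *b)
{
	assert(a && b);
	return toy_task_uid(a) == toy_task_uid(b);
}
```

If the credentials later move behind a pointer, only the macro bodies
need to change.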
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
* git://oss.sgi.com:8090/xfs/linux-2.6: (45 commits)
[XFS] Fix use after free in xfs_log_done().
[XFS] Make xfs_bmap_*_count_leaves void.
[XFS] Use KM_NOFS for debug trace buffers
[XFS] use KM_MAYFAIL in xfs_mountfs
[XFS] refactor xfs_mount_free
[XFS] don't call xfs_freesb from xfs_unmountfs
[XFS] xfs_unmountfs should return void
[XFS] cleanup xfs_mountfs
[XFS] move root inode IRELE into xfs_unmountfs
[XFS] stop using file_update_time
[XFS] optimize xfs_ichgtime
[XFS] update timestamp in xfs_ialloc manually
[XFS] remove the sema_t from XFS.
[XFS] replace dquot flush semaphore with a completion
[XFS] replace inode flush semaphore with a completion
[XFS] extend completions to provide XFS object flush requirements
[XFS] replace the XFS buf iodone semaphore with a completion
[XFS] clean up stale references to semaphores
[XFS] use get_unaligned_* helpers
[XFS] Fix compile failure in xfs_buf_trace()
...
Add a dlm_ prefix to the struct names in config.c. This resolves a
conflict with struct node in particular, when include/linux/node.h
happens to be included.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
A couple of unlikely error conditions were missing a kfree on the error
exit path.
Reported-by: Juha Leppanen <juha_motorsportcom@luukku.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Commit d70b67c8bc fixed VFS so that
it never calls the FS lookup function in deleted directories now.
We may remove the corresponding UBIFS check.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Data length has to be aligned in the budgeting request. Code
in xattr.c did not do this.
Signed-off-by: Zoltan Sogor <weth@inf.u-szeged.hu>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Use "if (0) printk()" construct in debugging print macros to
make the debugging messages be checked even if debugging is
off.
This patch also removes some unneeded spaces and blank lines.
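A user-space sketch of the construct, with printf() standing in for
printk(). DEBUG_ENABLED, dbg_msg() and count_items() are hypothetical
names, not the UBIFS macros.

```c
#include <assert.h>
#include <stdio.h>

#define DEBUG_ENABLED 0	/* hypothetical switch; set to 1 to print */

/*
 * The call stays visible to the compiler, so the format string and the
 * arguments are still checked, but with the switch off the call is
 * never executed and its arguments are never evaluated.
 */
#define dbg_msg(...)				\
	do {					\
		if (DEBUG_ENABLED)		\
			printf(__VA_ARGS__);	\
	} while (0)

int count_items(int n)
{
	assert(n >= 0);
	dbg_msg("counting %d items\n", n);
	return n;
}
```

An empty macro would hide a wrong format specifier until someone turned
debugging on; this form reports it on every build.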
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS does not presently re-use inode numbers, so leaving
i_generation zero is most appropriate for now.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
At the moment UBIFS reserves twice the old index size for the
index. But this is not enough in some cases: if the indexing
nodes are very fragmented and there are many small gaps, while the
dirty index has big znodes, the in-the-gaps method would fail.
Thus, reserve thrice as much, in which case we are guaranteed that
we can commit in any case.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS aligns node lengths to 8, so budgeting has to do the
same. Well, direntry, inode, and page budgets are already
aligned, but not inode data budget (e.g., data in special
devices or symlinks). Do this for inode data as well.
Also, add corresponding debugging checks.
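Rounding a length up to a multiple of 8 can be sketched as follows.
align_up() and budget_len() are hypothetical helpers, not the UBIFS
functions, and lengths close to UINT_MAX would wrap.

```c
#include <assert.h>

/* Round len up to the next multiple of a, where a is a power of two. */
unsigned int align_up(unsigned int len, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (len + a - 1) & ~(a - 1);
}

/* Budget for inode data, aligned the same way node lengths are. */
unsigned int budget_len(unsigned int data_len)
{
	return align_up(data_len, 8);
}
```

A 13-byte symlink target is then budgeted as 16 bytes, the same size it
occupies once written.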
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Budgeting is a crucial UBIFS subsystem - add more assertions
to improve requests checking. This is not compiled in when
UBIFS debugging is disabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The debug function that checks orphans does so using the
TNC mutex. That means it will not see a correct picture
if the inode is removed from the orphan tree before it is
removed from TNC.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
The values in these two fields need to be preserved independently
and so a union cannot be used.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Every time anything is deleted, UBIFS writes the deletion inode
node twice - once in 'ubifs_jnl_update()' and the second time in
'ubifs_jnl_write_inode()'. However, the second write is not needed
if no commit happened after 'ubifs_jnl_update()'. This patch
checks that condition and avoids writing the deletion inode for
the second time.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Increment the commit number at the beginning of the commit, instead
of doing this after the commit. This is needed for further
optimizations.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The 'last_reference' parameter of 'pack_inode()' is not really
needed because 'inode->i_nlink' may be tested instead. Zap it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Simplify 'ubifs_jnl_write_inode()' by removing the 'deletion'
parameter which is not really needed because we may test
inode->i_nlink and check whether this is a deletion or not.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Orphan inodes are deleted inodes which will disappear after FS
re-mount. There is no need to write orphan inodes back, because
they are not needed on the flash media.
So optimize orphans a little by not writing them back. Just mark
them as clean, free the budget, and report success to VFS.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We use ubifs_ro_mode() quite a lot, and not in fast-path, so
there is no reason to blow the code up by having it inlined.
Also, we usually want R/O mode change to be seen by other
CPUs as soon as possible, so when we make this a function
call, we will automatically have a memory barrier.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBI transparently handles write errors by automatically copying
and remapping the affected eraseblock. If UBI is unable to do
that, for example its pool of eraseblocks reserved for bad block
handling is empty, then the error is propagated to UBIFS. UBIFS
must protect the media from falling into an inconsistent state
by immediately switching to read-only mode. In the case of log
updates, this was not being done.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
UBIFS recovery testing debug facility simulates media failures.
When simulating an IO error, the error code returned must be
-EIO, but this was not always the case if the user switched off the
debug recovery testing option at the same time.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Although the inode is marked as clean when it is being deleted,
it might stay and be used as orphan, and be marked as dirty.
So we have to free the budget when we delete it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The 'ubifs_release_dirty_inode_budget()' was buggy and incorrectly
freed the budget, which led to not freeing all dirty data budget.
This patch fixes that.
Also, this patch fixes ubifs_mkdir() which passed 1 in dirty_ino_d,
which makes no sense. Well, it is harmless though.
Also, add a few more useful assertions and improve a few debugging
messages.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We encourage people to mount using volume name, not device
numbers. So print the name of the mounted UBI volume, not just
IDs.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The ticket allocation code got reworked in 2.6.26 and we now free tickets
whereas before we used to cache them so the use-after-free went
undetected.
SGI-PV: 985525
SGI-Modid: xfs-linux-melb:xfs-kern:31877a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Use KM_NOFS to prevent recursion back into the filesystem which can cause
deadlocks.
In the case of xfs_iread() we hold the lock on the inode cluster buffer
while allocating memory for the trace buffers. If we recurse back into XFS
to flush data, that may require a transaction to allocate extents, which
needs log space. This can deadlock with the xfsaild thread which can't
push the tail of the log because it is trying to get the inode cluster
buffer lock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31838a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Use KM_MAYFAIL for the m_perag allocation; we can deal with the error
easily and blocking forever during mount is not a good idea either.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31837a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_mount_free mostly frees the perag data, which is something that is
duplicated in the mount error path.
Move the XFS_QM_DONE call to the caller and remove the useless
mutex_destroy/spinlock_destroy calls so that we can re-use it for the
mount error path. Also rename it to xfs_free_perag to reflect what it
does.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31836a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_readsb is called before xfs_mount so xfs_freesb should be called after
xfs_unmountfs, too. This means it now happens after a few things during
xfs_unmount which all have nothing to do with the superblock.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31835a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_unmountfs can't and shouldn't return errors so declare it as returning
void.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31833a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Remove all the useless flags and code keyed off it in xfs_mountfs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31831a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The root inode is allocated in xfs_mountfs so it should be released in
xfs_unmountfs. For the unmount case that means we do it after the
xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
dmapi unmount event. Note that both reference the rip variable which might
be freed by that time in case inode flushing has kicked in, so strictly
speaking this might count as a bug fix.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31830a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
xfs_ichgtime updates the xfs_inode and Linux inode timestamps just fine, no
need to call file_update_time and then copy the values over to the XFS
inode. The only additional thing in file_update_time are checks not
applicable to the write path.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31829a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Port a little optimization from file_update_time to xfs_ichgtime, and only
update the timestamp and mark the inode dirty if the timestamp actually
changes in the timer tick resolution supported by the running kernel.
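The shape of that check, sketched in user space. struct tick_inode and
tick_update_mtime() are hypothetical; the caller is assumed to pass a
current time that is already truncated to the tick granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Toy inode carrying one timestamp and a dirty flag. */
struct tick_inode {
	struct timespec mtime;
	bool dirty;
};

/* Update mtime only when it differs from now; report whether it changed. */
bool tick_update_mtime(struct tick_inode *ip, struct timespec now)
{
	assert(ip);
	if (ip->mtime.tv_sec == now.tv_sec &&
	    ip->mtime.tv_nsec == now.tv_nsec)
		return false;	/* same tick: nothing to write back */
	ip->mtime = now;
	ip->dirty = true;
	return true;
}
```

Repeated writes within one tick then leave the inode clean after the
first update.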
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31827a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In xfs_ialloc we just want to set all timestamps to the current time. We
don't need to mark the inode dirty like xfs_ichgtime does, and we don't
need nor want the optimizations in xfs_ichgtime that I will introduce in
the next patch.
So just opencode the timestamp update in xfs_ialloc, and remove the now
unused XFS_ICHGTIME_ACC case in xfs_ichgtime.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31825a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that all users of the sema_t are gone from XFS we can finally kill it.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31823a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Use the new completion flush code to implement the dquot flush lock.
Removes one of the final users of semaphores in the XFS code base.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31822a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Use the new completion flush code to implement the inode flush lock.
Removes one of the final users of semaphores in the XFS code base.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31817a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
completion. Change it to a completion and remove the last user of the
sema_t from XFS.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31815a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
A lot of code has been converted away from semaphores, but there are still
comments that reference semaphore behaviour. The log code is the worst
offender. Update the comments to reflect what the code really does now.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31814a
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The alloc and inobt btrees use the same agbp/agno pair in the btree_cur
union. Make them use the same bc_private.a union member so that code for
these two short form btree implementations can be shared.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31788a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Sanitize setting up the Linux inode.
Setting up the xfs_inode <-> inode link is opencoded in xfs_iget_core now
because that's the only place it needs to be done, xfs_initialize_vnode is
renamed to xfs_setup_inode and loses all superfluous parameters. The check
for I_NEW is removed because it always is true and the di_mode check moves
into xfs_iget_core because it's only needed there.
xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode
and the whole thing is moved into xfs_iops.c where it belongs.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31782a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All remaining bhv_vnode_t instances are in code that's more or less Linux
specific. (Well, for xfs_acl.c that could be argued, but that code is on
the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the
whole tree. We can clean up variable naming and some useless helpers
later.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31781a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
In various places we can just move a VFS_I call into the argument list of
called functions/macros instead of having a local bhv_vnode_t.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31776a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
When multiple inodes are locked in XFS it happens in order of the inode
number, with everything but the first inode trylocked if any of the
previous inodes is in the AIL.
Except for the sorting of the inodes this logic is implemented in
xfs_lock_inodes, but it is also partially duplicated in
xfs_lock_dir_and_entry, which in a particularly stupid way adds a lock
roundtrip if the inode ordering is not optimal.
This patch adds a new helper xfs_lock_two_inodes that takes two inodes and
locks them in the optimal way according to the above locking protocol,
and uses it for all places that want to lock two inodes.
The only caller of xfs_lock_inodes is xfs_rename, which might lock up to
four inodes.
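The locking protocol described above can be sketched in userspace roughly as follows. This is a single-threaded toy model under stated assumptions: struct toy_inode, the in_ail flag and the toy_* functions are hypothetical stand-ins, not the real xfs_lock_two_inodes, and the "lock" is a plain flag rather than a real sleeping lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; not the real struct xfs_inode. */
struct toy_inode {
	uint64_t ino;		/* inode number, defines the lock order */
	bool	 in_ail;	/* stand-in for "inode is in the AIL" */
	bool	 locked;	/* toy lock flag, single-threaded model only */
};

static void toy_lock(struct toy_inode *ip)
{
	/* A real lock would block here; the toy model requires it to be free. */
	assert(!ip->locked);
	ip->locked = true;
}

static bool toy_trylock(struct toy_inode *ip)
{
	if (ip->locked)
		return false;
	ip->locked = true;
	return true;
}

static void toy_unlock(struct toy_inode *ip)
{
	ip->locked = false;
}

/* The inode with the lower inode number is always locked first. */
static struct toy_inode *toy_first(struct toy_inode *a, struct toy_inode *b)
{
	return a->ino < b->ino ? a : b;
}

/*
 * Lock two distinct inodes in inode-number order. If the first inode is
 * in the AIL, the second one is only trylocked; on failure the first lock
 * is dropped and false is returned so that the caller can retry.
 */
static bool toy_lock_two(struct toy_inode *a, struct toy_inode *b)
{
	struct toy_inode *first = toy_first(a, b);
	struct toy_inode *second = (first == a) ? b : a;

	assert(a != b);
	toy_lock(first);
	if (!first->in_ail) {
		toy_lock(second);
		return true;
	}
	if (toy_trylock(second))
		return true;
	toy_unlock(first);	/* back off instead of waiting */
	return false;
}
```

Because the order depends only on the inode numbers, the caller may pass the two inodes in either order and gets the same acquisition sequence, which avoids an extra lock roundtrip.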
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31772a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All the error injection is already enabled through ifdef DEBUG, so kill
the never-set second cpp symbol that would activate it without the rest
of the debugging infrastructure.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31771a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement
IHOLD/IRELE directly.
For the IHOLD case also replace igrab with a direct increment of i_count
because we are guaranteed to already have a live and referenced inode by
the VFS. Also remove the vn_hold statistic because it's been rather
meaningless for some time with most references done by other callers.
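The difference between the two ways of taking a reference can be sketched in userspace like this. It is a minimal model under stated assumptions: struct toy_inode, its freeing flag and the toy_* functions are hypothetical and only mimic the idea of igrab versus a direct i_count increment; they are not the kernel implementations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
	atomic_int i_count;	/* reference count */
	bool	   freeing;	/* stand-in for "inode is being torn down" */
};

/* Careful variant: refuses to take a reference on an inode being freed. */
static struct toy_inode *toy_igrab(struct toy_inode *inode)
{
	if (inode->freeing)
		return NULL;
	atomic_fetch_add(&inode->i_count, 1);
	return inode;
}

/* Direct variant: only valid when the caller already holds a reference. */
static void toy_ihold(struct toy_inode *inode)
{
	assert(atomic_load(&inode->i_count) > 0);
	atomic_fetch_add(&inode->i_count, 1);
}

/* Drop one reference and return the remaining count. */
static int toy_irele(struct toy_inode *inode)
{
	return atomic_fetch_sub(&inode->i_count, 1) - 1;
}
```

The direct variant skips the liveness check because a caller that already holds a reference cannot race with teardown.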
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31764a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
All the ACL routines are called from inode operations which are guaranteed
to have a referenced inode by the VFS, so there's no need for the ACL code
to grab another temporary one.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31763a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31761a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
bhv_vnode_t is just a typedef for struct inode, so there's
no need for a helper to convert between the two.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31760a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Looks like somehow xfs got missed in the conversion that took place in
e231c2ee64, "Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)"
<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f>
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31757a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Thanks to hch's endian work, INT_GET etc. are no longer used, and may as
well be removed. INT_SET is still used in the acl code, though.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31756a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Move xfs_attr_rolltrans from the attr code to the transaction code as
xfs_trans_roll and make the attr code call the new function.
Rolltrans is really useful whenever we want to use rolling transactions
and should be generic; it isn't dependent on any part of the attr code
anyway.
We use this excuse to change all the:
	if ((error = xfs_attr_rolltrans()))
calls into:
	error = xfs_trans_roll();
	if (error)
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31729a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Add a helper to free the m_fsname/m_rtname/m_logname allocations and use
it properly for all mount failure cases. Also switch the allocations for
these to kstrdup while we're at it.
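The pattern of one cleanup helper shared by all failure paths can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct toy_mount and the toy_* functions are hypothetical, toy_strdup stands in for kstrdup, and only the m_fsname/m_rtname/m_logname field names are taken from the description above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the mount structure; only the names matter. */
struct toy_mount {
	char *m_fsname;
	char *m_rtname;
	char *m_logname;
};

/* Userspace stand-in for kstrdup: returns NULL for NULL input or on OOM. */
static char *toy_strdup(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Single helper that frees all three names; safe on partially set up state. */
static void toy_free_fsname(struct toy_mount *mp)
{
	free(mp->m_fsname);
	free(mp->m_rtname);
	free(mp->m_logname);
	mp->m_fsname = mp->m_rtname = mp->m_logname = NULL;
}

/* Returns 0 on success, -1 on failure; rtname and logname are optional. */
static int toy_set_names(struct toy_mount *mp, const char *fsname,
			 const char *rtname, const char *logname)
{
	memset(mp, 0, sizeof(*mp));

	mp->m_fsname = toy_strdup(fsname);
	if (!mp->m_fsname)
		goto fail;
	if (rtname && !(mp->m_rtname = toy_strdup(rtname)))
		goto fail;
	if (logname && !(mp->m_logname = toy_strdup(logname)))
		goto fail;
	return 0;

fail:
	toy_free_fsname(mp);	/* same helper for every failure case */
	return -1;
}
```

Since free(NULL) is a no-op, the helper can be called no matter how far the setup got, which is what makes it usable from every failure path.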
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31728a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
We will need that to be able to calculate the size of log we need for a
specific attr (for Create+EA). The local flag is needed so that we can
fail if we run into ENOSPC when trying to alloc blocks.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31727a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>