In e9964c10 we change cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure. It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.
Instead, move inodes with dirty to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant). This is cleaner and more
robust.
Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.
Signed-off-by: Sage Weil <sage@newdream.net>
When the cluster is marked full, subscribe to subsequent map updates to
ensure we find out promptly when it is no longer full. This will prevent
us from spewing ENOSPC for (much) longer than necessary.
Signed-off-by: Sage Weil <sage@newdream.net>
Old incrementals encode a 0 value (nearly always) when an osd goes down.
Change that to allow any state bit(s) to be flipped. Special case 0 to
mean flip the CEPH_OSD_UP bit to mimic the old behavior.
Signed-off-by: Sage Weil <sage@newdream.net>
Since we pass the nofail arg, we should never get an error; BUG if we do.
(And fix the function to not return an error if __map_request fails.)
Signed-off-by: Sage Weil <sage@newdream.net>
If we get a WAIT as a client something went wrong; error out. And don't
fall through to an unrelated case.
Signed-off-by: Sage Weil <sage@newdream.net>
Both off and fi->offset are unsigned, so the difference is always >= 0.
Compare them directly instead of the sign of the difference.
Signed-off-by: Sage Weil <sage@newdream.net>
If we grab new_cap, retake the lock, and find we already have a cap now
for the given mds, release new_cap.
Signed-off-by: Sage Weil <sage@newdream.net>
If there is no get_authorizer method we set the out_kvec to a bogus
pointer. The length is also zero in that case, so it doesn't much matter,
but it's better not to add the empty item in the first place.
Signed-off-by: Sage Weil <sage@newdream.net>
If a connection is closed and/or reopened (ceph_con_close, ceph_con_open)
it can race with a callback. con_work does various state checks for
closed or reopened sockets at the beginning, but drops con->mutex before
making callbacks. We need to check for state bit changes after retaking
the lock to ensure we restart con_work and execute those CLOSED/OPENING
tests or else we may end up operating under stale assumptions.
In Jim's case, this was causing 'bad tag' errors.
There are four cases where we re-take the con->mutex inside con_work: catch
them all and return EAGAIN from try_{read,write} so that we can restart
con_work.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
We put ourselves on an inode list for the parent directory of metadata
operations so that an fsync on the directory will wait for metadata updates
to commit to disk. We weren't holding a reference to that directory,
however, and under certain workloads (fsstress in this case) the directory
can go away.
Signed-off-by: Sage Weil <sage@newdream.net>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
configfs: Fix race between configfs_readdir() and configfs_d_iput()
configfs: Don't try to d_delete() negative dentries.
ocfs2/dlm: Target node death during resource migration leads to thread spin
ocfs2: Skip mount recovery for hard-ro mounts
ocfs2/cluster: Heartbeat mismatch message improved
ocfs2/cluster: Increase the live threshold for global heartbeat
ocfs2/dlm: Use negotiated o2dlm protocol version
ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes.
ocfs2: Initialize data_ac (might be used uninitialized)
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
drivercore: revert addition of of_match to struct device
of: fix race when matching drivers
Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time. This was unsafe
because matching is not an atomic operation with probing a driver. If
two or more drivers are attempted to be matched to a driver at the
same time, then the cached matching entry pointer could get
overwritten.
This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
If two drivers are probing devices at the same time, both will write
their match table result to the dev->of_match cache at the same time.
Only write the result if the device matches.
In a thread titled "SBus devices sometimes detected, sometimes not",
Meelis reported his SBus hme was not detected about 50% of the time.
From the debug suggested by Grant it was obvious another driver matched
some devices between the call to match the hme and the hme discovery
failling.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Milton Miller <miltonm@bga.com>
[grant.likely: modified to only call of_match_device() once]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: don't delay blk_run_queue_async
scsi: remove performance regression due to async queue run
blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
block: rescan partitions on invalidated devices on -ENOMEDIA too
cdrom: always check_disk_change() on open
block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
The 'size' variable contains the correct register size for both AR7
and Titan, but we never used it to ioremap the correct register size.
This problem only shows up on Titan.
[ralf@linux-mips.org: Fixed the fix. The original patch as in patchwork
recognizes the problem correctly then fails to fix it ...]
Reported-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/2380/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This is the MIPS portion of Joe Perches <joe@perches.com>'s
https://patchwork.linux-mips.org/patch/2172/ which seems to have been
lost in time and space.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
configfs_readdir() will use the existing inode numbers of inodes in the
dcache, but it makes them up for attribute files that aren't currently
instantiated. There is a race where a closing attribute file can be
tearing down at the same time as configfs_readdir() is trying to get its
inode number.
We want to get the inode number of open attribute files, because they
should match while instantiated. We can't lock down the transition
where dentry->d_inode is set to NULL, so we just check for NULL there.
We can, however, ensure that an inode we find isn't iput() in
configfs_d_iput() until after we've accessed it.
Signed-off-by: Joel Becker <jlbec@evilplan.org>
When configfs is faking mkdir() on its subsystem or default group
objects, it starts by adding a negative dentry. It then tries to
instantiate the group. If that should fail, it must clean up after
itself.
I was using d_delete() here, but configfs_attach_group() promises to
return an empty dentry on error. d_delete() explodes with the entry
dentry. Let's try d_drop() instead. The unhashing is what we want for
our dentry.
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Let's check a scenario:
1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
2. blk_run_queue_async();
the second one will became a noop, because q->delay_work already has
WORK_STRUCT_PENDING_BIT set, so the delayed work will still run after
SCSI_QUEUE_DELAY. But blk_run_queue_async actually hopes the delayed
work runs immediately.
Fix this by doing a cancel on potentially pending delayed work
before queuing an immediate run of the workqueue.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf evlist: Fix per thread mmap setup
perf tools: Honour the cpu list parameter when also monitoring a thread list
kprobes, x86: Disable irqs during optimized callback
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix cifsConvertToUCS() for the mapchars case
cifs: add fallback in is_path_accessible for old servers
Provide a stub for proc_mkdir_mode() when CONFIG_PROC_FS is not
enabled, just like the stub for proc_mkdir().
Fixes this linux-next build error:
drivers/net/wireless/airo.c:4504: error: implicit declaration of function 'proc_mkdir_mode'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
os_dump_core() uses abort() to terminate UML in case of an fatal error.
glibc's abort() calls raise(SIGABRT) which makes use of tgkill().
tgkill() has no effect within UML's kernel threads because they are not
pthreads. As fallback abort() executes an invalid instruction to
terminate the process. Therefore UML gets killed by SIGSEGV and leaves a
ugly log entry in the host's kernel ring buffer.
To get rid of this we use our own abort routine.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ZONE_CONGESTED should be a state of global memory reclaim. If not, a busy
memcg sets this and give unnecessary throttoling in wait_iff_congested()
against memory recalim in other contexts. This makes system performance
bad.
I'll think about "memcg is congested!" flag is required or not, later.
But this fix is required first.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Ying Han <yinghan@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding the necessary MODULE_DEVICE_TABLE() information allows the driver
to be automatically loaded by udev.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Shreshtha Kumar SAHU <shreshthakumar.sahu@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix switch initialization to ensure that all switches have default routing
disabled. This guarantees that no unexpected RapidIO packets arrive to
the default port set by reset and there is no default routing destination
until it is properly configured by software.
This update also unifies handling of unmapped destinations by tsi57x, IDT
Gen1 and IDT Gen2 switches.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: <stable@kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Metze pointed out, commit 84cdf74e broke mapchars option:
Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
(84cdf74e80) does multiple steps
in just one commit (moving the function and changing it without
testing).
put_unaligned_le16(temp, &target[j]); is never called for any
codepoint the goes via the 'default' switch statement. As a result
we put just zero (or maybe uninitialized) bytes into the target
buffer.
His proposed patch looks correct, but doesn't apply to the current head
of the tree. This patch should also fix it.
Cc: <stable@kernel.org> # .38.x: 581ade4: cifs: clean up various nits in unicode routines (try #2)
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The is_path_accessible check uses a QPathInfo call, which isn't
supported by ancient win9x era servers. Fall back to an older
SMBQueryInfo call if it fails with the magic error codes.
Cc: stable@kernel.org
Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had wrt to that.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>