The function tracer had two different versions of function tracing.
The disabling of irqs version and the preempt disable version.
As function tracing in very intrusive and can cause nasty recursion
issues, it has its own recursion protection. But the old method to
do this was a flat layer. If it detected that a recursion was happening
then it would just return without recording.
This made the preempt version (much faster than the irq disabling one)
not very useful, because if an interrupt were to occur after the
recursion flag was set, the interrupt would not be traced at all,
because every function that was traced would think it recursed on
itself (due to the context it preempted setting the recursive flag).
Now that we have a recursion flag for every context level, we
no longer need to worry about that. We can disable preemption,
set the current context recursion check bit, and go on. If an
interrupt were to come along, it would check its own context bit
and happily continue to trace.
As the preempt version is faster than the irq disable version,
there's no more reason to keep the preempt version around.
And the irq disable version still had an issue with missing
out on tracing NMI code.
Remove the irq disable function tracer version and have the
preempt disable version be the default (and only version).
Before this patch we had from running:
# echo function > /debug/tracing/current_tracer
# for i in `seq 10`; do ./hackbench 50; done
Time: 12.028
Time: 11.945
Time: 11.925
Time: 11.964
Time: 12.002
Time: 11.910
Time: 11.944
Time: 11.929
Time: 11.941
Time: 11.924
(average: 11.9512)
Now we have:
# echo function > /debug/tracing/current_tracer
# for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253
(average: 10.3012)
a 13.8% savings!
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When function tracing occurs, the following steps are made:
If arch does not support a ftrace feature:
call internal function (uses INTERNAL bits) which calls...
If callback is registered to the "global" list, the list
function is called and recursion checks the GLOBAL bits.
then this function calls...
The function callback, which can use the FTRACE bits to
check for recursion.
Now if the arch does not suppport a feature, and it calls
the global list function which calls the ftrace callback
all three of these steps will do a recursion protection.
There's no reason to do one if the previous caller already
did. The recursion that we are protecting against will
go through the same steps again.
To prevent the multiple recursion checks, if a recursion
bit is set that is higher than the MAX bit of the current
check, then we know that the check was made by the previous
caller, and we can skip the current check.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently for recursion checking in the function tracer, ftrace
tests a task_struct bit to determine if the function tracer had
recursed or not. If it has, then it will will return without going
further.
But this leads to races. If an interrupt came in after the bit
was set, the functions being traced would see that bit set and
think that the function tracer recursed on itself, and would return.
Instead add a bit for each context (normal, softirq, irq and nmi).
A check of which context the task is in is made before testing the
associated bit. Now if an interrupt preempts the function tracer
after the previous context has been set, the interrupt functions
can still be traced.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There is lots of places that perform:
op = rcu_dereference_raw(ftrace_control_list);
while (op != &ftrace_list_end) {
Add a helper macro to do this, and also optimize for a single
entity. That is, gcc will optimize a loop for either no iterations
or more than one iteration. But usually only a single callback
is registered to the function tracer, thus the optimized case
should be a single pass. to do this we now do:
op = rcu_dereference_raw(list);
do {
[...]
} while (likely(op = rcu_dereference_raw((op)->next)) &&
unlikely((op) != &ftrace_list_end));
An op is always registered (ftrace_list_end when no callbacks is
registered), thus when a single callback is registered, the link
list looks like:
top => callback => ftrace_list_end => NULL.
The likely(op = op->next) still must be performed due to the race
of removing the callback, where the first op assignment could
equal ftrace_list_end. In that case, the op->next would be NULL.
But this is unlikely (only happens in a race condition when
removing the callback).
But it is very likely that the next op would be ftrace_list_end,
unless more than one callback has been registered. This tells
gcc what the most common case is and makes the fast path with
the least amount of branches.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The function tracing recursion self test should not crash
the machine if the resursion test fails. If it detects that
the function tracing is recursing when it should not be, then
bail, don't go into an infinite recursive loop.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If one of the function tracers set by the global ops is not recursion
safe, it can still be called directly without the added recursion
supplied by the ftrace infrastructure.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The test that checks function recursion does things differently
if the arch does not support all ftrace features. But that really
doesn't make a difference with how the test runs, and either way
the count variable should be 2 at the end.
Currently the test wrongly fails for archs that don't support all
the ftrace features.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There's a race condition between the setting of a new tracer and
the update of the max trace buffers (the swap). When a new tracer
is added, it sets current_trace to nop_trace before disabling
the old tracer. At this moment, if the old tracer uses update_max_tr(),
the update may trigger the warning against !current_trace->use_max-tr,
as nop_trace doesn't have that set.
As update_max_tr() requires that interrupts be disabled, we can
add a check to see if current_trace == nop_trace and bail if it
does. Then when disabling the current_trace, set it to nop_trace
and run synchronize_sched(). This will make sure all calls to
update_max_tr() have completed (it was called with interrupts disabled).
As a clean up, this commit also removes shrinking and recreating
the max_tr buffer if the old and new tracers both have use_max_tr set.
The old way use to always shrink the buffer, and then expand it
for the next tracer. This is a waste of time.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As trace_clock is used by other things besides tracing, and it
does not require anything from trace.h, it is best not to include
the header file in trace_clock.c.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Due to a userspace issue with PowerTop v2beta, which hardcoded
the offset of event fields that it was using, it broke when
we removed the Big Kernel Lock counter from the event header.
(commit e6e1e2593 "tracing: Remove lock_depth from event entry")
Because this broke userspace, it was determined that we must
keep those 4 bytes around.
(commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"")
This unfortunately wastes space in the ring buffer. 4 bytes per
event, where a lot of events are just 24 bytes. That's 16% of the
buffer wasted. A million events will add 4 megs of white space
into the buffer.
It was later noticed that PowerTop v2beta could not work on systems
where the kernel was 64 bit but the userspace was 32 bits.
The reason was because the offsets are different between the
two and the hard coded offset of one would not work with the other.
With PowerTop v2 final, it implemented the same interface that both
perf and trace-cmd use. That is, it reads the format file of
the event to find the offsets of the fields it needs. This fixes
the problem with running powertop on a 32 bit userspace running
on a 64 bit kernel. It also no longer requires the 4 byte padding.
As PowerTop v2 has been out for a while, and is included in all
major distributions, it is time that we can safely remove the
4 bytes of padding. Users of PowerTop v2beta should upgrade to
PowerTop v2 final.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.
Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add the file max_graph_depth to the debug tracing directory that lets
the user define the depth of the function graph.
A very useful operation is to set the depth to 1. Then it traces only
the first function that is called when entering the kernel. This can
be used to determine what system operations interrupt a process.
For example, to work on NOHZ processes (single tasks running without
a timer tick), if any interrupt goes off and preempts that task, this
code will show it happening.
# cd /sys/kernel/debug/tracing
# echo 1 > max_graph_depth
# echo function_graph > current_tracer
# cat per_cpu/cpu/<cpu-of-process>/trace
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When function tracing with either debug locks enabled or tracing
preempt disabled, the add_preempt_count() is traced. This is an
issue with lockdep and function tracing. As function tracing
can disable interrupts, and lockdep records that change,
lockdep may not be able to handle this recursion if it happens from
an NMI context.
The first thing that an NMI does is:
#define nmi_enter() \
do { \
ftrace_nmi_enter(); \
BUG_ON(in_nmi()); \
add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
lockdep_off(); \
rcu_nmi_enter(); \
trace_hardirq_enter(); \
} while (0)
When the add_preempt_count() is traced, and the tracing callback
disables interrupts, it will jump into the lockdep code. There's
some places in lockdep that can't handle this re-entrance, and
causes lockdep to fail.
As the lockdep_off() (and lockdep_on) is a simple:
void lockdep_off(void)
{
current->lockdep_recursion++;
}
and is never traced, it can be called first in nmi_enter()
and lockdep_on() last in nmi_exit().
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There's now a check in tracing_reset_online_cpus() if the buffer is
allocated or NULL. No need to do a check before calling it with max_tr.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
max_tr->buffer could be NULL in the tracing_reset{_online_cpus}. In this
case, a NULL pointer dereference happens, so we should return immediately
from these functions.
Note, the current code does not call tracing_reset*() with max_tr when
its buffer is NULL, but future code will. This patch is needed to prevent
the future code from crashing.
Link: http://lkml.kernel.org/r/20121219070234.31200.93863.stgit@liselsia
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some functions in the syscall tracing is used only locally to
the file, but they are labeled global. Convert them to static functions.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Without this patch, we can register a uprobe event for a directory.
Enabling such a uprobe event would anyway fail.
Example:
$ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
However dirctories cannot be valid targets for uprobe.
Hence verify if the target is a regular file during the probe
registration.
Link: http://lkml.kernel.org/r/20130103004212.690763002@goodmis.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ cleaned up whitespace and removed redundant IS_DIR() check ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
typeof(&buffer) is a pointer to array of 1024 char, or char (*)[1024].
But, typeof(&buffer[0]) is a pointer to char which match the return type of get_trace_buf().
As well-known, the value of &buffer is equal to &buffer[0].
so return this_cpu_ptr(&percpu_buffer->buffer[0]) can avoid type cast.
Link: http://lkml.kernel.org/r/50A1A800.3020102@gmail.com
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The original ring-buffer code had special checks at the start
of rb_advance_iter() and instead of repeating them again at the
end of the function if a certain condition existed, I just did
a recursive call to rb_advance_iter() because the special condition
would cause rb_advance_iter() to return early (after the checks).
But as things have changed, the special checks no longer exist
and the only thing done for the special_condition is to call
rb_inc_iter() and return. Instead of doing a confusing recursive call,
just call rb_inc_iter instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Sparse complains when is_signed_type() is used on a pointer.
This macro is needed for the format output used for ftrace
and perf, to know if a binary field is a signed type or not.
The is_signed_type() macro is used against all fields that are
recorded by events to automate the operation.
The problem sparse has is with the current way is_signed_type()
works:
((type)-1 < 0)
If "type" is a poiner, than sparse does not like it being compared
to an integer (zero). The simple fix is to just give zero the
same type. The runtime result stays the same.
Reported-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If some other kernel subsystem has a module notifier, and adds a kprobe
to a ftrace mcount point (now that kprobes work on ftrace points),
when the ftrace notifier runs it will fail and disable ftrace, as well
as kprobes that are attached to ftrace points.
Here's the error:
WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
Hardware name: Bochs
Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Call Trace:
[<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810fd189>] ftrace_bug+0x239/0x280
[<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
[<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
[<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
[<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff810c2a23>] sys_init_module+0x73/0x220
[<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
---[ end trace 9ef46351e53bbf80 ]---
ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
actual: cc:bb:d2:4b:e1
A kprobe was added to the init_once() function in the fat module on load.
But this happened before ftrace could have touched the code. As ftrace
didn't run yet, the kprobe system had no idea it was a ftrace point and
simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).
Then when ftrace went to modify the location from a call to mcount/fentry
into a nop, it didn't see a call op, but instead it saw the breakpoint op
and not knowing what to do with it, ftrace shut itself down.
The solution is to simply give the ftrace module notifier the max priority.
This should have been done regardless, as the core code ftrace modification
also happens very early on in boot up. This makes the module modification
closer to core modification.
Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull more s390 patches from Martin Schwidefsky:
"A couple of bug fixes: one of the transparent huge page primitives is
broken, the sched_clock function overflows after 417 days, the XFS
module has grown too large for -fpic and the new pci code has broken
normal channel subsystem notifications."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/chsc: fix SEI usage
s390/time: fix sched_clock() overflow
s390: use -fPIC for module compile
s390/mm: fix pmd_pfn() for thp
- fix(es) for compound buffers
- fix for dquot soft timer asserts due to overflow of d_blk_softlimit
- fix for regression in dir v2 code introduced in commit 20f7e9f3
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJQ9zKnAAoJENaLyazVq6ZORGcP/RemqCHJEw0a89Y0tLLLAcz/
Es97kJMESdvi3gX3JTdz3vC8LP21dSCR3k3MvVgucb8RsvGoiLixrmluIRxKb79M
DEmz9YJ/qxFIpnM9y46VxCYV+/ezxUDEv68wA6T2wJbof26nTLlTj2gAgqjvyWiF
R1c1OmdCsTfA257UvxfxSVixVnWv7E2io2ZXUGsrBkP4J9OMaMtn00UYOuP1YL8S
NJ44z9QAzTqVEbAfGeaeV/QVUJzMj/IqWCwF7YKEhfmccO/tPyN0+nMG2DI0Fp5e
cYGsi4JnaFbqE6Aa/7mu3kv8lYnPe0n3t9d3EwzxOEx+PAvuY8N0EW8Qa4c+805n
zXFvAroLgP0jYEEuIfEGYIwDPxG0xjor6ztu8e2twcIj6cDHzSpeYaDPnYvWJlwu
FiupnVu+3FX6mVY1jCealI47nOwzM12R7nXysqF3F6Sf95xGJtG3BoTIKioNqk1g
dzJGMQvwg/WLvquYb9W/ZNb1T314R23wdYtmI7gWJ74z9IQqWCZBWFYyBhQ8y1Pr
Vf3LFjzqNqqnYNzoe8Wnn9wKQ57Es7onAo34Y9HZCOkslZsn5nKriNTXNN6Q9Upc
5RKvj1CbTpKAJYrrhWryI1HtlDKqqtMFdmRQulSu+O9ZJuWZh4XNTu4t3oewt0Ac
5otZwOdk53V3tGxt3prs
=gA4q
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.8-rc4' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
- fix(es) for compound buffers
- fix for dquot soft timer asserts due to overflow of d_blk_softlimit
- fix for regression in dir v2 code introduced in commit 20f7e9f372
("xfs: factor dir2 block read operations")
* tag 'for-linus-v3.8-rc4' of git://oss.sgi.com/xfs/xfs:
xfs: recalculate leaf entry pointer after compacting a dir2 block
xfs: remove int casts from debug dquot soft limit timer asserts
xfs: fix the multi-segment log buffer format
xfs: fix segment in xfs_buf_item_format_segment
xfs: rename bli_format to avoid confusion with bli_formats
xfs: use b_maps[] for discontiguous buffers
* cpuidle initialization regression fix from Krzysztof Mazur.
* cpuidle fix for power usage fields handling from Daniel Lezcano.
* ACPI build fix from Yinghai Lu.
-
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQ9xwEAAoJEKhOf7ml8uNsaUEP/iwVRuWSPqEzzl++mLBe8uf5
vP1+72Ko5NBPG56uqQMCanuB6M9YsIRr1yv4SSYIF15K4DKbYfpXMvR6yoZox3CA
Y+vrlA62AYOBsX3wOHo+5JVtBdV82IZOBXYhy9hNcxIVzh0NiAWtyz2QxlNIz7I1
9R33HEfIKwi4L2SSiXBqLEMuz0JKie131FunBwvHEtZ4QTq2OFxmCWxfaFz0syvH
9NZfOnh2ijiGb0ou3FTAXLqbEJHJUIhYzZnejobrxFCJmhA+hfsmxRnokrRdLZJ+
14lOpdBQJas06QePs+hadWwLrebjvio+CTb8w0Fhclt5O2fqgMG2jdwO+f4pEWA9
E7DBo0LJCKoDPofsnAXYjoOI3r9EL6o0fhhMzIrZdZazEFOj8WP+EoK7/nG2KRq2
eIO4Lv0sfKmlnJriUUzhEjdkLql0ctLBGZk8T+x/o8WQMPYUw6AnNf1+voEvLTPQ
C2/yyzs+1bPzFj0/0qsvUx5ee6xNgT3p/+YaQW89RlTibW91LN1m5ezNtAF5atEk
K9va5y1w54molOL/j2U56bP+RrktSTKmrnFHluHWWb9tUVBapOTRrCg03xSgvJOq
PEv5LHUIfjHHl2r7I67/Lf2LJjgvpqO0BfEGgmfCgJE/BUFTmT7S1FYxllaNJVk+
EvdSOXokr52pFltHG5Bl
=4ifX
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-for-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- cpuidle regression fix related to the initialization of state
kobjects from Krzysztof Mazur.
- cpuidle fix removing some not very useful code and making some
user-visible problems go away at the same time. From Daniel Lezcano.
- ACPI build fix from Yinghai Lu.
* tag 'pm+acpi-for-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: remove the power_specified field in the driver
ACPI / glue: Fix build with ACPI_GLUE_DEBUG set
cpuidle: fix number of initialized/destroyed states
Dave Jones hit this assert when doing a compile on recent git, with
CONFIG_XFS_DEBUG enabled:
XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828
Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup)
contained "2" and not the proper offset, and I found that this value was
changed after the memmoves under "Use a stale leaf for our new entry."
in xfs_dir2_block_addname(), i.e.
memmove(&blp[mid + 1], &blp[mid],
(highstale - mid) * sizeof(*blp));
overwrote it.
What has happened is that the previous call to xfs_dir2_block_compact()
has rearranged things; it changes btp->count as well as the
blp array. So after we make that call, we must recalculate the
proper pointer to the leaf entries by making another call to
xfs_dir2_block_leaf_p().
Dave provided a metadump image which led to a simple reproducer
(create a particular filename in the affected directory) and this
resolves the testcase as well as the bug on his live system.
Thanks also to dchinner for looking at this one with me.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The int casts here make it easy to trigger an assert with a large
soft limit. For example, set a >4TB soft limit on an empty volume
to reproduce a (0 > -x) comparison due to an overflow of
d_blk_softlimit.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Per Dave Chinner suggestion, this patch:
1) Corrects the detection of whether a multi-segment buffer is
still tracking data.
2) Clears all the buffer log formats for a multi-segment buffer.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Not every segment in a multi-segment buffer is dirty in a
transaction and they will not be outputted. The assert in
xfs_buf_item_format_segment() that checks for the at least
one chunk of data in the segment to be used is not necessary
true for multi-segmented buffers.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Rename the bli_format structure to __bli_format to avoid
accidently confusing them with the bli_formats pointer.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Commits starting at 77c1a08 introduced a multiple segment support
to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment
buffer in the transaction because it was looking at the single segment
block number rather than the multi-segment b_maps[0].bm.bn. This
results on a recursive buffer lock that can never be satisfied.
This patch:
1) Changed the remaining b_map accesses to be b_maps[0] accesses.
2) Renames the single segment b_map structure to __b_map to avoid
future confusion.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
In commit 281dc5c5ec ("Give up on pushing CC_OPTIMIZE_FOR_SIZE") we
already changed the actual default value, but the help-text still
suggested 'y'. Fix the help text too, for all the same reasons.
Sadly, -Os keeps on generating some very suboptimal code for certain
cases, to the point where any I$ miss upside is swamped by the downside.
The main ones are:
- using "rep movsb" for memcpy, even on CPU's where that is
horrendously bad for performance.
- not honoring branch prediction information, so any I$ footprint you
win from smaller code, you lose from less code density in the I$.
- using divide instructions when that is very expensive.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the build error:
drivers/built-in.o: In function `twl_probe':
drivers/mfd/twl-core.c:1256: undefined reference to `devm_regmap_init_i2c'
make: *** [vmlinux] Error 1
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
[ Samuel is busy, taking it directly - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ We should make fun of people who can't speel too, but then we'd have
no time for any real work at all - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1b963c81b1 ("lockdep, rwsem: provide down_write_nest_lock()")
contains a bug in a codepath when CONFIG_DEBUG_LOCK_ALLOC is disabled,
which causes down_read() to be called instead of down_write() by mistake
on such configurations. Fix that.
Reported-and-tested-by: Andrew Clayton <andrew@digital-domain.net>
Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yet a few more fixes popped up in this week.
The biggest change here is the addition of pinctrl support for Atmel,
which turned out to be almost mandatory to make things working.
The rest are a few fixes for M-Audio usb-audio device and a fix for
regression of HD-audio HDMI codecs with alsactl in the recent kernel.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQ9nYgAAoJEGwxgFQ9KSmkM84QAKWlUp8NFsr5HNNiwj6urp18
jhPoM4AbMozeb5abfZpWwwalAVhbq/E5R2w2z8ETdnMnd1ohKqhU5Mx/e0mmUprF
3bZoxm8etTFfqallxPBBTj9exF8iAdA/XPNe5Zw2r1jY7w3viZiQYCgivB1TTSOG
wt0Y5SF0FmawyHgqujqEjo4nm/K04Rp4FPS4/MpdjRXCfzmW+x9nP6CBbdDxGk5J
q8v48mOhk7RTBrRCmfCF0Jw/eJNrS9JYL2RagEaKuPFoy5OEm06OwQZZ76mt3XTF
8S7ExCwfmvbzHW8mIKE3ZFLLDXjWgjxh3jQXeULOAYnPrfe4SHTkUF7mCdmHdbG2
sDTh86C3R784aIwhusXPAZGyVZJ7km+wqFPZa+20Jzbo848PBNgDotlRgmULSqlo
cK8Bsuf5QyRmdpVVON58Owo3Mqorp0EtPiFbfwljkr98JsUQrRX5COaAZ+UHmd2i
18fK0rltPhmJkKwKEAx+0vtqcucoAfvxiS1DSNsjafxDXTy1XJYQ/HmmSUeq1uD/
i1b2kN1yzQQ/Kki7dW9YhekoF5WYyzRP0OoPO73ekSioaCimTwDOo7IF3RwbVfQM
G6eiwLkNpA6BWi3V/q3Cic+eKN/NguM9UlZEKYlCpZq01pMLndreG8MNbpub/O3F
97TzflJSAyIGCShKZH6K
=Sd6n
-----END PGP SIGNATURE-----
Merge tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull second round of sound fixes from Takashi Iwai:
"Yet a few more fixes popped up in this week.
The biggest change here is the addition of pinctrl support for Atmel,
which turned out to be almost mandatory to make things working.
The rest are a few fixes for M-Audio usb-audio device and a fix for
regression of HD-audio HDMI codecs with alsactl in the recent kernel."
* tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi - Work around "alsactl restore" errors
ALSA: usb-audio: selector map for M-Audio FT C400
ALSA: usb-audio: M-Audio FT C400 skip packet quirk
ALSA: usb-audio: correct M-Audio C400 clock source quirk
ALSA: usb - fix race in creation of M-Audio Fast track pro driver
ASoC: atmel-ssc: add pinctrl selection to driver
ARM: at91/dts: add pinctrl support for SSC peripheral
Pull scsi target fixes from Nicholas Bellinger:
"This includes an important >= v3.6 regression bugfix for active I/O
shutdown (Roland), some TMR related failure / corner cases fixes for
long outstanding I/O (Roland), two FCoE target mode fabric fabric role
fixes (MDR), a fix for an incorrect sense code during LUN
communication failure (Dr. Hannes), plus a handful of other minor
fixes.
There are still some outstanding zero-length control CDB regression
fixes that need to be addressed for v3.8, that will be coming in a
follow-up PULL request."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: Fix CmdSN comparison (use cmd->cmd_sn instead of cmd->stat_sn)
target: Release se_cmd when LUN lookup fails for TMR
target: Fix use-after-free in LUN RESET handling
target: Fix missing CMD_T_ACTIVE bit regression for pending WRITEs
tcm_fc: Do not report target role when target is not defined
tcm_fc: Do not indicate retry capability to initiators
target: Use TCM_NO_SENSE for initialisation
target: Introduce TCM_NO_SENSE
target: use correct sense code for LUN communication failure
Pull ext3 and udf fixes from Jan Kara:
"One ext3 performance regression fix and one udf regression fix (oops
on interrupted mount)."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
UDF: Fix a null pointer dereference in udf_sb_free_partitions
jbd: don't wake kjournald unnecessarily
Pull x86 fixes from Peter Anvin:
"This is mainly a workaround for a bug in Sandy Bridge graphics which
causes corruption of certain memory pages."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
x86/Sandy Bridge: mark arrays in __init functions as __initconst
x86/Sandy Bridge: reserve pages when integrated graphics is present
x86, efi: correct precedence of operators in setup_efi_pci
Timur Tabi no longer works for Freescale, so update the email address
and status for all of his maintained projects.
Also mark the QE library as orphaned, for lack of interest in
maintaining it.
The CS4270 driver is marked as "Odd Fixes" because appropriate hardware
is no longer available.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the requested firmware file size is 0 bytes in the filesytem, we
will try to vmalloc(0), which causes a warning:
vmalloc: allocation failure: 0 bytes
kworker/1:1: page allocation failure: order:0, mode:0xd2
__vmalloc_node_range+0x164/0x208
__vmalloc_node+0x4c/0x58
vmalloc+0x38/0x44
_request_firmware_load+0x220/0x6b0
request_firmware+0x64/0xc8
wl18xx_setup+0xb4/0x570 [wl18xx]
wlcore_nvs_cb+0x64/0x9f8 [wlcore]
request_firmware_work_func+0x94/0x100
process_one_work+0x1d0/0x750
worker_thread+0x184/0x4ac
kthread+0xb4/0xc0
To fix this, check whether the file size is less than or equal to zero
in fw_read_file_contents().
Cc: stable <stable@vger.kernel.org> [3.7]
Signed-off-by: Luciano Coelho <coelho@ti.com>
Acked-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the default iosched is built as module, the kernel may deadlock
while trying to load the iosched module on device probe if the probing
was running off async. This is because async_synchronize_full() at
the end of module init ends up waiting for the async job which
initiated the module loading.
async A modprobe
1. finds a device
2. registers the block device
3. request_module(default iosched)
4. modprobe in userland
5. load and init module
6. async_synchronize_full()
Async A waits for modprobe to finish in request_module() and modprobe
waits for async A to finish in async_synchronize_full().
Because there's no easy to track dependency once control goes out to
userland, implementing properly nested flushing is difficult. For
now, make module init perform async_synchronize_full() iff module init
has queued async jobs as suggested by Linus.
This avoids the described deadlock because iosched module doesn't use
async and thus wouldn't invoke async_synchronize_full(). This is
hacky and incomplete. It will deadlock if async module loading nests;
however, this works around the known problem case and seems to be the
best of bad options.
For more details, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1420814
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cbc0dd1 "s390/pci: CHSC PCI support for error and availability events"
introduced a new SEI notification type as part of pci support.
The way SEI was called with nt2 and nt0 consecutive broke the nt0
stuff used for channel subsystem notifications.
The reason why this was broken with the mentioned patch is that you
cannot selectively disable type 0 notifications (so even when asked
for type 2 only, type 0 could be presented).
The way to do it is to tell SEI which types of notification you can
process and -this is the important part- look at the SEI result which
notification type you actually received.
Reviewed-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Converting a 64 Bit TOD format value to nanoseconds means that the value
must be divided by 4.096. In order to achieve that we multiply with 125
and divide by 512.
When used within sched_clock() this triggers an overflow after appr.
417 days. Resulting in a sched_clock() return value that is much smaller
than previously and therefore may cause all sort of weird things in
subsystems that rely on a monotonic sched_clock() behaviour.
To fix this implement a tod_to_ns() helper function which converts TOD
values without overflow and call this function from both places that
open coded the conversion: sched_clock() and kvm_s390_handle_wait().
Cc: stable@kernel.org
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>