in scheduler-intense workloads native_read_tsc() overhead accounts for
20% of the system overhead:
659567 system_call 41222.9375
686796 schedule 435.7843
718382 __switch_to 665.1685
823875 switch_mm 4526.7857
1883122 native_read_tsc 55385.9412
9761990 total 2.8468
this is large part due to the rdtsc_barrier() that is done before
and after reading the TSC.
But sched_clock() is not a precise clock in the GTOD sense, using such
barriers is completely pointless. So remove the barriers and only use
them in vget_cycles().
This improves lat_ctx performance by about 5%.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: change in trace output
Because the trace buffers are per cpu ring buffers, the start of
the trace can be confusing. If one CPU is very active at the
end of the trace, its history will not go as far back as the
other CPU traces. This means that output for a particular CPU
may not appear for the first part of a trace.
To help annotate what is happening, and to prevent any more
confusion, this patch adds a line that annotates the start of
a CPU buffer output.
For example:
automount-3495 [001] 184.596443: dnotify_parent <-vfs_write
[...]
automount-3495 [001] 184.596449: dput <-path_put
automount-3496 [002] 184.596450: down_read_trylock <-do_page_fault
[...]
sshd-3497 [001] 184.597069: up_read <-do_page_fault
<idle>-0 [000] 184.597074: __exit_idle <-exit_idle
[...]
automount-3496 [002] 184.597257: filemap_fault <-__do_fault
<idle>-0 [003] 184.597261: exit_idle <-smp_apic_timer_interrupt
Note, parsers of a trace output should always ignore any lines that
start with a '#'.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preemptoff not tested in selftest
Due to the BKL not being preemptable anymore, the selftest of the
preemptoff code can not be tested. It requires that it is called
with preemption enabled, but since the BKL is held, that is no
longer the case.
This patch simply skips those tests if it detects that the context
is not preemptable. The following will now show up in the tests:
Testing tracer preemptoff: can not test ... force PASSED
Testing tracer preemptirqsoff: can not test ... force PASSED
When the BKL is removed, or it becomes preemptable once again, then
the tests will be performed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add alignment option for recordmcount.pl script
Align the __mcount_loc sections so that architectures with strict
alignment requirements need not worry about performing unaligned
accesses.
This fixes an issue where I was seeing unaligned accesses, which are not
supported on our architecture (the results of an unaligned access are
undefined).
Signed-off-by: Matt Fleming <matthew.fleming@imgtec.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove obsolete variable in trace_array structure
With the new start / stop method of ftrace, the ctrl variable
in the trace_array structure is now obsolete. Remove it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Remove the ctrl_update tracer method
With the new quick start/stop method of tracing, the ctrl_update
method is out of date.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: have the ftrace_printk enabled on startup
It is confusing to have to "echo trace_printk > /debug/tracing/iter_ctrl"
after adding ftrace_printk in the kernel.
Currently the trace_printk is set to off by default. ftrace_printk
should only be in open kernel code when used for debugging, and thus
it should be enabled by default.
It may also be used to record data within a tracer, but those ftrace_printks
should be within wrappers that are either enabled by trace_points or
have a variable protecting the code path from being entered when the
tracer is disabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to irqsoff tracer output
In converting to the new start / stop ftrace handling, the irqsoff
tracer start called the irqsoff reset function. irqsoff tracer is
not the same as the other traces, and it resets the buffers while
searching for the longest latency.
The reset that the irqsoff stop method calls disables the function
tracing. That means that, by starting the tracer, the function
tracer is disabled incorrectly.
This patch simply removes the call to reset which keeps the function
tracing enabled. Reset is not needed for the irqsoff stop method.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix for sched_switch that broke dynamic ftrace startup
The commit: tracing/fastboot: use sched switch tracer from boot tracer
broke the API of the sched_switch trace. The use of the
tracing_start/stop_cmdline record is for only recording the cmdline,
NOT recording the schedule switches themselves.
Seeing that the boot tracer broke the API to do something that it
wanted, this patch adds a new interface for the API while
puting back the original interface of the old API.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: boot tracer startup modified
The boot tracer calls into some of the schedule tracing private functions
that should not be exported. This patch cleans it up, and makes
way for further changes in the ftrace infrastructure.
This patch adds a api to assign a tracer array to the schedule
context switch tracer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix of output of set_ftrace_filter
Commit ftrace: do not show freed records in available_filter_functions
Removed a bit too much from the set_ftrace_filter code, where we now see
all functions in the set_ftrace_filter file even when we set a filter.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thanks to Randy Dunlap for finding this problem.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This Kconfig change allows the common 'make allmodconfig' and
'make allyesconfig' build options to skip the staging tree, which is
probably what you want to have happen anyway.
This makes the linux-next developer's life a lot easier so he doesn't
have to worry about changes that break the staging tree, that's for me
to worry about...
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Reserve elfcorehdr memory in CONFIG_CRASH_DUMP
[IA64] fix boot panic caused by offline CPUs
[IA64] reorder Kconfig options to match x86
[IA64] Build VT-D iommu support into generic kernel
[IA64] remove dead BIO_VMERGE_BOUNDARY definition
[IA64] remove duplicated #include from pci-dma.c
[IA64] use common header for software IO/TLB
[IA64] fix the difference between node_mem_map and node_start_pfn
[IA64] Add error_recovery_info field to SAL section header
[IA64] Add UV watchlist support.
[IA64] Simplify SGI uv vs. sn2 driver issues
IA64 kdump kernel failed to initialize /proc/vmcore in 2.6.28-rc2.
A bug was introduced in this patch commit:
d9a9855d0b
always reserve elfcore header memory in crash kernel
The problem was that the call to reserve_elfcorehdr() should be placed
in CONFIG_CRASH_DUMP rather than in CONFIG_CRASH_KERNEL, which does
not exist.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Hormon <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: fix range check on mmapped sysfs resource files
PCI: remove excess kernel-doc notation
PCI: annotate return value of pci_ioremap_bar with __iomem
PCI: fix VPD limit quirk for Broadcom 5708S
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, xen: fix use of pgd_page now that it really does return a page
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xen: make sure stray alias mappings are gone before pinning
vmap: cope with vm_unmap_aliases before vmalloc_init()
Fix the counter overflow check for CPUs with counter width > 32
I had a similar change in a different patch that I didn't submit
and I didn't notice the problem earlier because it was always
tested together.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
This triggers false bug reports as it does a bogus kmalloc with locks held
but is never really compiled into the kernel.
Closes#8329
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As we've lost our trivial maintainer for the moment I'll send this
directly. Only touches a comment
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add checksum calculation when clearing UNINIT flag in ext4_new_inode
ext4: Mark the buffer_heads as dirty and uptodate after prepare_write
ext4: calculate journal credits correctly
ext4: wait on all pending commits in ext4_sync_fs()
ext4: Convert to host order before using the values.
ext4: fix missing ext4_unlock_group in error path
jbd2: deregister proc on failure in jbd2_journal_init_inode
jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space
jbd: don't give up looking for space so easily in __log_wait_for_space
Tune SD_MC_INIT the same way as SD_CPU_INIT:
unset SD_BALANCE_NEWIDLE, and set SD_WAKE_BALANCE.
This improves vmark by 5%:
vmark 132102 125968 125497 messages/sec avg 127855.66 .984
vmark 139404 131719 131272 messages/sec avg 134131.66 1.033
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
# *DOCUMENTATION*
When initializing an uninitialized block group in ext4_new_inode(),
its block group checksum must be re-calculated. This fixes a race
when several threads try to allocate a new inode in an UNINIT'd group.
There is some question whether we need to be initializing the block
bitmap in ext4_new_inode() at all, but for now, if we are going to
init the block group, let's eliminate the race.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to make sure we mark the buffer_heads as dirty and uptodate
so that block_write_full_page write them correctly.
This fixes mmap corruptions that can occur in low memory situations.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Xen requires that all mappings of pagetable pages are read-only, so
that they can't be updated illegally. As a result, if a page is being
turned into a pagetable page, we need to make sure all its mappings
are RO.
If the page had been used for ioremap or vmalloc, it may still have
left over mappings as a result of not having been lazily unmapped.
This change makes sure we explicitly mop them all up before pinning
the page.
Unlike aliases created by kmap, the there can be vmalloc aliases even
for non-high pages, so we must do the flush unconditionally.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xen can end up calling vm_unmap_aliases() before vmalloc_init() has
been called. In this case its safe to make it a simple no-op.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yup, this appears to be the problem, thanks. I think &hso_net->net->dev
is more intuitive for the error message, so I've used that. I've also
added missing line endings on the error messages and set our local
rfkill structure element to NULL on failure so we don't try to call
rfkill_unregister on driver removal if we failed to register at all.
The patch below Works For Me (TM); the device is detected fine, can be
removed without problems and connects ok. I'll have a prod at why the
rfkill stuff isn't working next, but I believe this cleanup of the error
handling is appropriate no matter what the issue with registration is.
Signed-Off-By: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Due to a hardware bug, the originally assigned range cannot reliably
be used for boot configuration and must not be modifiable through
ethtool.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Cc: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tried to deactivate rx ring that wasn't activated,
used wrong index.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Impact: fix rare memory leak in the sched-domains manual reconfiguration code
In the failure path, rd is not attached to a sched domain,
so it causes a leak.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xfrm_policy_destroy() will oops if not dead policy is passed to it.
On error path in pfkey_compile_policy() exactly this happens.
Oopsable for CAP_NET_ADMIN owners.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Fix recursive descent in __scm_destroy().
iwl3945: fix deadlock on suspend
iwl3945: do not send scan command if channel count zero
iwl3945: clear scanning bits upon failure
ath5k: correct handling of rx status fields
zd1211rw: Add 2 device IDs
Fix logic error in rfkill_check_duplicity
iwlagn: avoid sleep in softirq context
iwlwifi: clear scanning bits upon failure
Revert "ath5k: honor FIF_BCN_PRBRESP_PROMISC in STA mode"
tcp: Fix recvmsg MSG_PEEK influence of blocking behavior.
netfilter: netns ct: walk netns list under RTNL
ipv6: fix run pending DAD when interface becomes ready
net/9p: fix printk format warnings
net: fix packet socket delivery in rx irq handler
xfrm: Have af-specific init_tempsel() initialize family field of temporary selector
sound/pci/pcxhr/pcxhr_core.c: In function 'pcxhr_set_pipe_cmd_params':
sound/pci/pcxhr/pcxhr_core.c:700: warning: statement with no effect
sound/pci/pcxhr/pcxhr_core.c:706: warning: statement with no effect
sound/pci/pcxhr/pcxhr_core.c:710: warning: statement with no effect
Due to
try to fix this, and be more conventional about the empty stubs.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "x86: default to reboot via ACPI"
x86: align DirectMap in /proc/meminfo
AMD IOMMU: fix lazy IO/TLB flushing in unmap path
x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
x86: remove VISWS and PARAVIRT around NR_IRQS puzzle
x86: mention ACPI in top-level Kconfig menu
x86: size NR_IRQS on 32-bit systems the same way as 64-bit
x86: don't allow nr_irqs > NR_IRQS
x86/docs: remove noirqbalance param docs
x86: don't use tsc_khz to calculate lpj if notsc is passed
x86, voyager: fix smp_intr_init() compile breakage
AMD IOMMU: fix detection of NP capable IOMMUs
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Block: use round_jiffies_up()
Add round_jiffies_up and related routines
block: fix __blkdev_get() for removable devices
generic-ipi: fix the smp_mb() placement
blk: move blk_delete_timer call in end_that_request_last
block: add timer on blkdev_dequeue_request() not elv_next_request()
bio: define __BIOVEC_PHYS_MERGEABLE
block: remove unused ll_new_mergeable()
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] SAM9 watchdog - supported on all SAM9 and CAP9 processors
[WATCHDOG] SAM9 watchdog - update for moved headers
* 'for-linus' of git://neil.brown.name/md:
md: linear: Fix a division by zero bug for very small arrays.
md: fix bug in raid10 recovery.
md: revert the recent addition of a call to the BLKRRPART ioctl.