linux/include
Joel Fernandes (Google) e6753f23d9 tracepoint: Make rcuidle tracepoint callers use SRCU
In recent tests with IRQ on/off tracepoints, a large performance
overhead of ~10% was noticed when running hackbench. This was root-caused
to calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that SRCU is a better alternative for use during RCU idle.
Although it does involve extra barriers, it's lighter than the sched-RCU
version, which has to make additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has been shown to improve
performance a lot for the high-frequency IRQ enable/disable tracepoints.

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

Results from running the following 30 times on a single-core x86 QEMU
instance with 1GB of memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66%)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033
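
As a rough sketch (not part of the patch), the relative changes can be
recomputed from the rounded means listed above. The helper name
relative_change and the constant names are hypothetical and exist only
for this illustration; the two conventions shown differ in which run is
taken as the reference.

```python
# Illustrative sketch only: recompute relative changes from the rounded
# means listed in this message. relative_change() is a hypothetical helper.

def relative_change(reference: float, value: float) -> float:
    """Return the change of `value` relative to `reference`, in percent."""
    return (value - reference) / reference * 100.0


BASELINE_MEAN = 3.048  # no patches (without this series)
RCU_MEAN = 3.451       # irq tracepoints with RCU implementation
SRCU_MEAN = 3.020      # irq tracepoints with SRCU implementation

# Slowdown of the RCU implementation, measured against the baseline.
rcu_vs_baseline = relative_change(BASELINE_MEAN, RCU_MEAN)

# The same gap, measured against the slower RCU run instead.
baseline_vs_rcu = relative_change(RCU_MEAN, BASELINE_MEAN)

# SRCU implementation, measured against the baseline.
srcu_vs_baseline = relative_change(BASELINE_MEAN, SRCU_MEAN)

print(f"RCU vs baseline:  {rcu_vs_baseline:+.2f}%")
print(f"baseline vs RCU:  {baseline_vs_rcu:+.2f}%")
print(f"SRCU vs baseline: {srcu_vs_baseline:+.2f}%")
```

The second convention lands close to the -11.66% quoted above; the small
remaining difference may come from the means being rounded to three
decimals. The SRCU mean differs from the baseline by 0.028 s, which is
smaller than the baseline standard deviation of 0.064 s.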

[1] https://patchwork.kernel.org/patch/10344297/

[remove rcu_read_lock_sched_notrace as it's the equivalent of
preempt_disable_notrace and is unnecessary to call in tracepoint code]
Link: http://lkml.kernel.org/r/20180730222423.196630-3-joel@joelfernandes.org

Cleaned-up-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ Simplified WARN_ON_ONCE() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-30 19:13:03 -04:00
acpi ACPI / processor: Finish making acpi_processor_ppc_has_changed() void 2018-06-20 10:50:40 +02:00
asm-generic mm: allow arch to supply p??_free_tlb functions 2018-07-14 11:11:09 -07:00
clocksource
crypto Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
drm drm for v4.18-rc1 2018-06-06 08:16:33 -07:00
dt-bindings dt-bindings: clock: imx6ul: Do not change the clock definition order 2018-06-29 11:40:20 -07:00
keys docs: Fix some broken references 2018-06-15 18:10:01 -03:00
kvm KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to 512 2018-05-25 12:29:27 +01:00
linux tracepoint: Make rcuidle tracepoint callers use SRCU 2018-07-30 19:13:03 -04:00
math-emu
media media: v4l2-core: push taking ioctl mutex down to ioctl handler 2018-05-28 16:31:44 -04:00
memory
misc ocxl: Expose the thread_id needed for wait on POWER9 2018-06-03 20:40:32 +10:00
net ipv6: fix useless rol32 call on hash 2018-07-18 15:11:09 -07:00
pcmcia
ras
rdma 4.18-rc 2018-06-21 07:22:30 +09:00
scsi SCSI misc on 20180610 2018-06-10 13:01:12 -07:00
soc ARM: SoC: late updates 2018-06-11 18:19:45 -07:00
sound sound updates for 4.18 2018-06-06 09:08:38 -07:00
target
trace NFS client updates for Linux 4.18 2018-06-12 10:09:03 -07:00
uapi Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-07-22 12:04:51 -07:00
video fbdev changes for v4.18: 2018-06-17 05:00:24 +09:00
xen xen: fixes for 4.18-rc2 2018-06-23 20:44:11 +08:00