perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify the number of times to repeat the run (default: 10).

-f::
--format=::
Specify the format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe # with no style specified
(executing 1000000 pipe operations between two tasks)
Total time:5.855 sec
5.855061 usecs/op
170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe # with the simple style specified
5.988
---------------------

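Because the 'simple' style prints only the elapsed time, a script can read it directly. The following Python sketch shows one way to do so. The helper names `parse_simple` and `run_pipe_bench` are hypothetical, and the sketch assumes the output is a single number of seconds on standard output, as in the example above.

```python
import subprocess


def parse_simple(output: str) -> float:
    """Parse 'simple' style output, assumed to be one number (seconds)."""
    return float(output.strip())


def run_pipe_bench() -> float:
    """Run 'perf bench --format=simple sched pipe' and return the time.

    Requires perf to be installed; it is not called when this file is loaded.
    """
    result = subprocess.run(
        ["perf", "bench", "--format=simple", "sched", "pipe"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_simple(result.stdout)


if __name__ == "__main__":
    # Parse the sample output shown in the example above.
    print(parse_simple("5.988\n"))
```

Keeping the parsing separate from the invocation allows the parser to be checked against saved output on a machine without perf.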

SUBSYSTEM
---------

'sched'::
Scheduler and IPC mechanisms.

'mem'::
Memory access performance.

'numa'::
NUMA scheduling and MM benchmarks.

'futex'::
Futex stressing benchmarks.

'epoll'::
Eventpoll (epoll) stressing benchmarks.

'all'::
All benchmark subsystems.

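The command line shape given in the SYNOPSIS can be assembled programmatically. The Python sketch below is only an illustration: `build_command` and `SUBSYSTEMS` are hypothetical names, and the set of subsystems is copied from the list above rather than queried from perf.

```python
# Subsystem names as listed in this page (not queried from perf itself).
SUBSYSTEMS = {"sched", "mem", "numa", "futex", "epoll", "all"}


def build_command(subsystem, suite, common_options=(), options=()):
    """Assemble: perf bench [<common options>] <subsystem> <suite> [<options>]."""
    if subsystem not in SUBSYSTEMS:
        raise ValueError(f"unknown subsystem: {subsystem!r}")
    # Common options precede the subsystem; suite options follow the suite.
    return ["perf", "bench", *common_options, subsystem, suite, *options]


if __name__ == "__main__":
    print(build_command("sched", "pipe", common_options=["--format=simple"]))
```

The returned list can be passed to `subprocess.run` without going through a shell.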

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use threads instead of processes.

-g::
--group=::
Specify the number of groups.

-l::
--nr_loops=::
Specify the number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

Total time:0.308 sec

% perf bench sched messaging -t -g 20 # use threads, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

Total time:0.582 sec
---------------------

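The task counts printed in the example follow from the group count: each group holds 20 senders and 20 receivers, so every group contributes 40 tasks. A small Python sketch of that arithmetic, where `tasks_run` is a hypothetical helper and the figure of 20 per group is taken from the example output:

```python
def tasks_run(groups: int, per_group: int = 20) -> int:
    """Total tasks: each group has per_group senders plus per_group receivers."""
    return groups * per_group * 2


if __name__ == "__main__":
    print(tasks_run(10))  # run with default options in the example
    print(tasks_run(20))  # run with '-g 20' in the example
```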

*pipe*::
Suite for evaluating the pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify the number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

Total time:8.091 sec
8.091833 usecs/op
123581 ops/sec

% perf bench sched pipe -l 1000 # run 1000 loops
(executing 1000 pipe operations between two tasks)

Total time:0.016 sec
16.948000 usecs/op
59004 ops/sec
---------------------

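The per-operation figures in the example can be derived from the total time and the loop count. The Python sketch below reproduces them under two assumptions: the unrounded total times are 8.091833 and 0.016948 seconds (inferred from the usecs/op lines, since the printed total is cut to milliseconds), and ops/sec is truncated to an integer, which matches the figures shown. `pipe_metrics` is a hypothetical helper, not part of perf.

```python
def pipe_metrics(total_seconds: float, loops: int):
    """Return (usecs_per_op, ops_per_sec) for a run of 'loops' operations."""
    usecs_per_op = total_seconds * 1_000_000 / loops
    # Assumption: the reported ops/sec value is truncated, not rounded.
    ops_per_sec = int(loops / total_seconds)
    return usecs_per_op, ops_per_sec


if __name__ == "__main__":
    print(pipe_metrics(8.091833, 1_000_000))  # default run in the example
    print(pipe_metrics(0.016948, 1000))       # run with '-l 1000'
```

The shorter run reports a higher cost per operation, which suggests that very small loop counts mostly measure startup effects.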

SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, the functions x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of the gettimeofday() syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, the functions x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of the gettimeofday() syscall.

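The size strings accepted by the memcpy and memset suites combine a number with a case-insensitive unit. The Python sketch below shows how such a string might be turned into a byte count. It is not perf's own parser: `parse_size` is a hypothetical helper, the units are assumed to be powers of 1024, and a bare number without a unit is rejected here for simplicity.

```python
import re

# Assumption: units are powers of 1024, as is common for memory sizes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '1MB' or '512kb' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"malformed size: {text!r}")
    number, unit = match.groups()
    try:
        # Upper-casing the unit makes the lookup case insensitive.
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in size: {text!r}") from None


if __name__ == "__main__":
    print(parse_size("1MB"))  # the documented default size
```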

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

SEE ALSO
--------
linkperf:perf[1]