Event Tracing

Documentation written by Theodore Ts'o
Updated by Li Zefan and Tom Zanussi

1. Introduction
===============

Tracepoints (see Documentation/trace/tracepoints.txt) can be used
without creating custom kernel modules to register probe functions
using the event tracing infrastructure.

Not all tracepoints can be traced using the event tracing system;
the kernel developer must provide code snippets which define how the
tracing information is saved into the tracing buffer, and how the
tracing information should be printed.

2. Using Event Tracing
======================

2.1 Via the 'set_event' interface
---------------------------------

The events which are available for tracing can be found in the file
/sys/kernel/debug/tracing/available_events.

To enable a particular event, such as 'sched_wakeup', simply echo it
to /sys/kernel/debug/tracing/set_event. For example:

# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event

[ Note: '>>' is necessary, otherwise it will first disable
all the events. ]

To disable an event, echo the event name to the set_event file prefixed
with an exclamation point:

# echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event

To disable all events, echo an empty line to the set_event file:

# echo > /sys/kernel/debug/tracing/set_event

To enable all events, echo '*:*' or '*:' to the set_event file:

# echo '*:*' > /sys/kernel/debug/tracing/set_event

The events are organized into subsystems, such as ext4, irq, sched,
etc., and a full event name looks like this: <subsystem>:<event>. The
subsystem name is optional, but it is displayed in the available_events
file. All of the events in a subsystem can be specified via the syntax
"<subsystem>:*"; for example, to enable all irq events, you can use the
command:

# echo 'irq:*' > /sys/kernel/debug/tracing/set_event
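
The set_event commands above can be wrapped in two small shell
helpers. This is only a sketch: the function names and the TRACING_DIR
variable are assumptions made for illustration and are not part of the
tracing interface. TRACING_DIR lets the helpers be tried against a
scratch directory first.

```shell
# TRACING_DIR is an assumption of this sketch; override it to experiment
# on a scratch directory instead of the real tracing files.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Enable each named event. '>>' appends, so events that are already
# enabled stay enabled.
enable_events() {
    for ev in "$@"; do
        printf '%s\n' "$ev" >> "$TRACING_DIR/set_event" || return 1
    done
}

# Disable each named event by writing it with a leading '!'.
disable_events() {
    for ev in "$@"; do
        printf '!%s\n' "$ev" >> "$TRACING_DIR/set_event" || return 1
    done
}
```

For example, "enable_events sched:sched_wakeup 'irq:*'" followed by
"disable_events sched:sched_wakeup" should leave only the irq events
enabled.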

2.2 Via the 'enable' toggle
---------------------------

The available events are also listed in the
/sys/kernel/debug/tracing/events/ hierarchy of directories.

To enable event 'sched_wakeup':

# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To disable it:

# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

To enable all events in the sched subsystem:

# echo 1 > /sys/kernel/debug/tracing/events/sched/enable

To enable all events:

# echo 1 > /sys/kernel/debug/tracing/events/enable

When reading one of these enable files, there are four possible results:

0 - all events this file affects are disabled
1 - all events this file affects are enabled
X - there is a mixture of events enabled and disabled
? - this file does not affect any event
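
The four values listed above can be turned into readable text with a
small helper. This is a sketch only; the function name describe_enable
is hypothetical, and it simply maps the documented values to
descriptions.

```shell
# Print a description of the value read from an 'enable' file.
# $1 is the path to the enable file.
describe_enable() {
    state=$(cat "$1") || return 1
    case "$state" in
        0)   echo "all events disabled" ;;
        1)   echo "all events enabled" ;;
        X)   echo "mixture of enabled and disabled events" ;;
        '?') echo "file does not affect any event" ;;
        *)   echo "unexpected value: $state" ;;
    esac
}
```

It could be called as, for instance,
"describe_enable /sys/kernel/debug/tracing/events/sched/enable".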

2.3 Boot option
---------------

In order to facilitate early boot debugging, use the boot option:

trace_event=[event-list]

event-list is a comma-separated list of events. See section 2.1 for event
format.

3. Defining an event-enabled tracepoint
=======================================

See the example provided in samples/trace_events

4. Event formats
================

Each trace event has a 'format' file associated with it that contains
a description of each field in a logged event. This information can
be used to parse the binary trace stream, and is also the place to
find the field names that can be used in event filters (see section 5).

It also displays the format string that will be used to print the
event in text mode, along with the event name and ID used for
profiling.

Every event has a set of 'common' fields associated with it; these are
the fields prefixed with 'common_'. The other fields vary between
events and correspond to the fields defined in the TRACE_EVENT
definition for that event.

Each field in the format has the form:

field:field-type field-name; offset:N; size:N;

where offset is the offset of the field in the trace record and size
is the size of the data item, in bytes.

For example, here's the information displayed for the 'sched_wakeup'
event:

# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format

name: sched_wakeup
ID: 60
format:
field:unsigned short common_type; offset:0; size:2;
field:unsigned char common_flags; offset:2; size:1;
field:unsigned char common_preempt_count; offset:3; size:1;
field:int common_pid; offset:4; size:4;
field:int common_tgid; offset:8; size:4;

field:char comm[TASK_COMM_LEN]; offset:12; size:16;
field:pid_t pid; offset:28; size:4;
field:int prio; offset:32; size:4;
field:int success; offset:36; size:4;
field:int cpu; offset:40; size:4;

print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
REC->prio, REC->success, REC->cpu

This event contains 10 fields, the first 5 common and the remaining 5
event-specific. All the fields for this event are numeric, except for
'comm' which is a string, a distinction important for event filtering.
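
Since the field names in a 'format' file are the names accepted by
event filters, it can be handy to extract just the names. The helper
below is a sketch under the assumption that every field line has the
"field:type name; offset:N; size:N;" shape shown above; list_fields is
a hypothetical name, not an existing tool.

```shell
# Print the field names found in an event 'format' file, one per line.
# $1 is the path to the format file.
list_fields() {
    # Keep the "type name" text between 'field:' and the first ';',
    # then print its last word with any array suffix removed.
    sed -n 's/^[[:space:]]*field:\([^;]*\);.*/\1/p' "$1" |
        awk '{ name = $NF; sub(/\[.*$/, "", name); print name }'
}
```

Run against the sched_wakeup format file, this should list the common
fields followed by comm, pid, prio, success and cpu.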

5. Event filtering
==================

Trace events can be filtered in the kernel by associating boolean
'filter expressions' with them. As soon as an event is logged into
the trace buffer, its fields are checked against the filter expression
associated with that event type. An event with field values that
'match' the filter will appear in the trace output, and an event whose
values don't match will be discarded. An event with no filter
associated with it matches everything, and is the default when no
filter has been set for an event.

5.1 Expression syntax
---------------------

A filter expression consists of one or more 'predicates' that can be
combined using the logical operators '&&' and '||'. A predicate is
simply a clause that compares the value of a field contained within a
logged event with a constant value and returns either 0 or 1 depending
on whether the field value matched (1) or didn't match (0):

field-name relational-operator value

Parentheses can be used to provide arbitrary logical groupings and
double-quotes can be used to prevent the shell from interpreting
operators as shell metacharacters.

The field-names available for use in filters can be found in the
'format' files for trace events (see section 4).

The relational-operators depend on the type of the field being tested:

The operators available for numeric fields are:

==, !=, <, <=, >, >=, &

And for string fields they are:

==, !=, ~

The glob (~) only accepts a wild card character (*) at the start and/or
end of the string. For example:

prev_comm ~ "*sh"
prev_comm ~ "sh*"
prev_comm ~ "*sh*"

But it does not allow the wild card to appear within the string:

prev_comm ~ "ba*sh" <-- is invalid
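
The glob restriction described above can be checked before a filter is
written. The function below is a sketch of that rule as stated in this
section only (a '*' is accepted at the start and/or end of the pattern,
nowhere else); glob_is_valid is a hypothetical helper name, and the
kernel's own parser remains the authority.

```shell
# Return 0 if the pattern uses '*' only at the start and/or the end,
# and 1 if a '*' appears inside the string.
glob_is_valid() {
    inner=${1#\*}        # drop one leading '*', if present
    inner=${inner%\*}    # drop one trailing '*', if present
    case "$inner" in
        *\**) return 1 ;;   # a '*' remains inside the string
        *)    return 0 ;;
    esac
}
```

With this sketch, "*sh", "sh*" and "*sh*" are accepted, while "ba*sh"
is rejected.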

5.2 Setting filters
-------------------

A filter for an individual event is set by writing a filter expression
to the 'filter' file for the given event.

For example:

# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
# echo "common_preempt_count > 4" > filter

A slightly more involved example:

# cd /sys/kernel/debug/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter

If there is an error in the expression, you'll get an 'Invalid
argument' error when setting it, and the erroneous string along with
an error message can be seen by looking at the filter e.g.:

# cd /sys/kernel/debug/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
-bash: echo: write error: Invalid argument
# cat filter
((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
^
parse_error: Field not found

Currently the caret ('^') for an error always appears at the beginning of
the filter string; the error message should still be useful though
even without more accurate position info.
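
The write-then-inspect sequence above can be combined into one helper
that shows the parse error whenever a filter is rejected. This is a
minimal sketch; set_filter is a hypothetical name, and it relies only
on the behaviour described in this section (a failed write, followed by
the error text being readable from the 'filter' file).

```shell
# Write a filter expression to an event's 'filter' file.
# $1 is the event directory, $2 is the filter expression.
# On failure, print the contents of the filter file (which hold the
# rejected string and the parse error) to stderr and return 1.
set_filter() {
    event_dir=$1
    expr=$2
    if ! printf '%s\n' "$expr" > "$event_dir/filter" 2>/dev/null; then
        cat "$event_dir/filter" >&2
        return 1
    fi
}
```

A call might look like
"set_filter /sys/kernel/debug/tracing/events/signal/signal_generate 'sig == 17'".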

5.3 Clearing filters
--------------------

To clear the filter for an event, write a '0' to the event's filter
file.

To clear the filters for all events in a subsystem, write a '0' to the
subsystem's filter file.

5.4 Subsystem filters
---------------------

For convenience, filters for every event in a subsystem can be set or
cleared as a group by writing a filter expression into the filter file
at the root of the subsystem. Note however, that if any event within
the subsystem lacks a field specified in the subsystem filter, or if
the filter can't be applied for any other reason, the filter for that
event will retain its previous setting. This can result in an
unintended mixture of filters which could lead to confusing (to the
user who might think different filters are in effect) trace output.
Only filters that reference just the common fields can be guaranteed
to propagate successfully to all events.

Here are a few subsystem filter examples that also illustrate the
above points:

Clear the filters on all events in the sched subsystem:

# cd /sys/kernel/debug/tracing/events/sched
# echo 0 > filter
# cat sched_switch/filter
none
# cat sched_wakeup/filter
none

Set a filter using only common fields for all events in the sched
subsystem (all events end up with the same filter):

# cd /sys/kernel/debug/tracing/events/sched
# echo common_pid == 0 > filter
# cat sched_switch/filter
common_pid == 0
# cat sched_wakeup/filter
common_pid == 0

Attempt to set a filter using a non-common field for all events in the
sched subsystem (all events but those that have a prev_pid field retain
their old filters):

# cd /sys/kernel/debug/tracing/events/sched
# echo prev_pid == 0 > filter
# cat sched_switch/filter
prev_pid == 0
# cat sched_wakeup/filter
common_pid == 0

5.5 PID filtering
-----------------

The set_event_pid file, located in the same directory as the top-level
events directory, filters out events from any task whose PID is not
listed in the set_event_pid file.

# cd /sys/kernel/debug/tracing
# echo $$ > set_event_pid
# echo 1 > events/enable

This will only trace events for the current task.

To add more PIDs without losing the PIDs already included, use '>>'.

# echo 123 244 1 >> set_event_pid
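
The difference between replacing and extending the PID list can be
made explicit with two helpers. This is a sketch only: the function
names and the TRACING_DIR variable are assumptions introduced for
illustration, so that the helpers can be tried on a scratch directory.

```shell
# TRACING_DIR is an assumption of this sketch, not part of the interface.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Replace the PID list: '>' discards any PIDs listed before.
set_event_pids() {
    printf '%s\n' "$*" > "$TRACING_DIR/set_event_pid"
}

# Extend the PID list: '>>' keeps the PIDs already listed.
add_event_pids() {
    printf '%s\n' "$*" >> "$TRACING_DIR/set_event_pid"
}
```

For example, "set_event_pids $$" followed by "add_event_pids 123 244"
would be expected to trace the current shell plus PIDs 123 and 244.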

6. Event triggers
=================

Trace events can be made to conditionally invoke trigger 'commands'
which can take various forms and are described in detail below;
examples would be enabling or disabling other trace events or invoking
a stack trace whenever the trace event is hit. Whenever a trace event
with attached triggers is invoked, the set of trigger commands
associated with that event is invoked. Any given trigger can
additionally have an event filter of the same form as described in
section 5 (Event filtering) associated with it - the command will only
be invoked if the event being invoked passes the associated filter.
If no filter is associated with the trigger, it always passes.

Triggers are added to and removed from a particular event by writing
trigger expressions to the 'trigger' file for the given event.

A given event can have any number of triggers associated with it,
subject to any restrictions that individual commands may have in that
regard.

Event triggers are implemented on top of "soft" mode, which means that
whenever a trace event has one or more triggers associated with it,
the event is activated even if it isn't actually enabled, but is
disabled in a "soft" mode. That is, the tracepoint will be called,
but just will not be traced, unless of course it's actually enabled.
This scheme allows triggers to be invoked even for events that aren't
enabled, and also allows the current event filter implementation to be
used for conditionally invoking triggers.

The syntax for event triggers is roughly based on the syntax for
set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
section of Documentation/trace/ftrace.txt), but there are major
differences and the implementation isn't currently tied to it in any
way, so beware about making generalizations between the two.

6.1 Expression syntax
---------------------

Triggers are added by echoing the command to the 'trigger' file:

# echo 'command[:count] [if filter]' > trigger

Triggers are removed by echoing the same command but starting with '!'
to the 'trigger' file:

# echo '!command[:count] [if filter]' > trigger

The [if filter] part isn't used in matching commands when removing, so
leaving that off in a '!' command will accomplish the same thing as
having it in.

The filter syntax is the same as that described in the 'Event
filtering' section above.

For ease of use, writing to the trigger file using '>' currently just
adds or removes a single trigger and there's no explicit '>>' support
('>' actually behaves like '>>') or truncation support to remove all
triggers (you have to use '!' for each one added.)
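
The 'command[:count] [if filter]' form can be assembled from its parts
with a small helper, which avoids quoting mistakes when scripting
triggers. This is a sketch; make_trigger is a hypothetical name, and an
empty count or filter argument simply omits that part.

```shell
# Build a trigger expression of the form 'command[:count] [if filter]'.
# $1 = command, $2 = optional count, $3 = optional filter expression.
make_trigger() {
    out=$1
    if [ -n "$2" ]; then
        out="$out:$2"
    fi
    if [ -n "$3" ]; then
        out="$out if $3"
    fi
    printf '%s\n' "$out"
}
```

The result can then be written to an event's 'trigger' file, and
prefixed with '!' to remove the same trigger later.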

6.2 Supported trigger commands
------------------------------

The following commands are supported:

- enable_event/disable_event

These commands can enable or disable another trace event whenever
the triggering event is hit. When these commands are registered,
the other trace event is activated, but disabled in a "soft" mode.
That is, the tracepoint will be called, but just will not be traced.
The event tracepoint stays in this mode as long as there's a trigger
in effect that can trigger it.

For example, the following trigger causes kmalloc events to be
traced when a read system call is entered, and the :1 at the end
specifies that this enablement happens only once:

# echo 'enable_event:kmem:kmalloc:1' > \
    /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger

The following trigger causes kmalloc events to stop being traced
when a read system call exits. This disablement happens on every
read system call exit:

# echo 'disable_event:kmem:kmalloc' > \
    /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger

The format is:

enable_event:<system>:<event>[:count]
disable_event:<system>:<event>[:count]

To remove the above commands:

# echo '!enable_event:kmem:kmalloc:1' > \
    /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger

# echo '!disable_event:kmem:kmalloc' > \
    /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger

Note that there can be any number of enable/disable_event triggers
per triggering event, but there can only be one trigger per
triggered event, e.g. sys_enter_read can have triggers enabling both
kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
could be combined into a single filter on kmem:kmalloc though).

- stacktrace

This command dumps a stacktrace in the trace buffer whenever the
triggering event occurs.

For example, the following trigger dumps a stacktrace every time the
kmalloc tracepoint is hit:

# echo 'stacktrace' > \
    /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

The following trigger dumps a stacktrace the first 5 times a kmalloc
request happens with a size >= 64K:

# echo 'stacktrace:5 if bytes_req >= 65536' > \
    /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

The format is:

stacktrace[:count]

To remove the above commands:

# echo '!stacktrace' > \
    /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# echo '!stacktrace:5 if bytes_req >= 65536' > \
    /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

The latter can also be removed more simply by the following (without
the filter):

# echo '!stacktrace:5' > \
    /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

Note that there can be only one stacktrace trigger per triggering
event.

- snapshot

This command causes a snapshot to be triggered whenever the
triggering event occurs.

The following command creates a snapshot every time a block request
queue is unplugged with a depth > 1. If you were tracing a set of
events or functions at the time, the snapshot trace buffer would
capture those events when the trigger event occurred:

# echo 'snapshot if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

To only snapshot once:

# echo 'snapshot:1 if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

To remove the above commands:

# echo '!snapshot if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

# echo '!snapshot:1 if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

Note that there can be only one snapshot trigger per triggering
event.

- traceon/traceoff

These commands turn tracing on and off when the specified events are
hit. The parameter determines how many times the tracing system is
turned on and off. If unspecified, there is no limit.

The following command turns tracing off the first time a block
request queue is unplugged with a depth > 1. If you were tracing a
set of events or functions at the time, you could then examine the
trace buffer to see the sequence of events that led up to the
trigger event:

# echo 'traceoff:1 if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

To always disable tracing when nr_rq > 1:

# echo 'traceoff if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

To remove the above commands:

# echo '!traceoff:1 if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

# echo '!traceoff if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger

Note that there can be only one traceon or traceoff trigger per
triggering event.

- hist

This command aggregates event hits into a hash table keyed on one or
more trace event format fields (or stacktrace) and a set of running
totals derived from one or more trace event format fields and/or
event counts (hitcount).

The format of a hist trigger is as follows:

hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
     [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
     [:clear] [if <filter>]

When a matching event is hit, an entry is added to a hash table
using the key(s) and value(s) named. Keys and values correspond to
fields in the event's format description. Values must correspond to
numeric fields - on an event hit, the value(s) will be added to a
sum kept for that field. The special string 'hitcount' can be used
in place of an explicit value field - this is simply a count of
event hits. If 'values' isn't specified, an implicit 'hitcount'
value will be automatically created and used as the only value.
Keys can be any field, or the special string 'stacktrace', which
will use the event's kernel stacktrace as the key. The keywords
'keys' or 'key' can be used to specify keys, and the keywords
'values', 'vals', or 'val' can be used to specify values. Compound
keys consisting of up to two fields can be specified by the 'keys'
keyword. Hashing a compound key produces a unique entry in the
table for each unique combination of component keys, and can be
useful for providing more fine-grained summaries of event data.
Additionally, sort keys consisting of up to two fields can be
specified by the 'sort' keyword. If more than one field is
specified, the result will be a 'sort within a sort': the first key
is taken to be the primary sort key and the second the secondary
key.

'hist' triggers add a 'hist' file to each event's subdirectory.
Reading the 'hist' file for the event will dump the hash table in
its entirety to stdout. Each printed hash table entry is a simple
list of the keys and values comprising the entry; keys are printed
first and are delineated by curly braces, and are followed by the
set of value fields for the entry. By default, numeric fields are
displayed as base-10 integers. This can be modified by appending
any of the following modifiers to the field name:

.hex        display a number as a hex value
.sym        display an address as a symbol
.sym-offset display an address as a symbol and offset
.syscall    display a syscall id as a system call name
.execname   display a common_pid as a program name

Note that in general the semantics of a given field aren't
interpreted when applying a modifier to it, but there are some
restrictions to be aware of in this regard:

- only the 'hex' modifier can be used for values (because values
  are essentially sums, and the other modifiers don't make sense
  in that context).
- the 'execname' modifier can only be used on a 'common_pid'. The
  reason for this is that the execname is simply the 'comm' value
  saved for the 'current' process when an event was triggered,
  which is the same as the common_pid value saved by the event
  tracing code. Trying to apply that comm value to other pid
  values wouldn't be correct, and typically events that care save
  pid-specific comm fields in the event itself.

A typical usage scenario would be the following to enable a hist
trigger, read its current contents, and then turn it off:

# echo 'hist:keys=skbaddr.hex:vals=len' > \
    /sys/kernel/debug/tracing/events/net/netif_rx/trigger

# cat /sys/kernel/debug/tracing/events/net/netif_rx/hist

# echo '!hist:keys=skbaddr.hex:vals=len' > \
    /sys/kernel/debug/tracing/events/net/netif_rx/trigger

The trigger file itself can be read to show the details of the
currently attached hist trigger. This information is also displayed
at the top of the 'hist' file when read.

By default, the size of the hash table is 2048 entries. The 'size'
parameter can be used to specify more or fewer than that. The units
are in terms of hashtable entries - if a run uses more entries than
specified, the results will show the number of 'drops', the number
of hits that were ignored. The size should be a power of 2 between
128 and 131072 (any non-power-of-2 number specified will be rounded
up).

The 'sort' parameter can be used to specify a value field to sort
on. The default if unspecified is 'hitcount' and the default sort
order is 'ascending'. To sort in the opposite direction, append
'.descending' to the sort key.

The 'pause' parameter can be used to pause an existing hist trigger
or to start a hist trigger but not log any events until told to do
so. 'continue' or 'cont' can be used to start or restart a paused
hist trigger.

The 'clear' parameter will clear the contents of a running hist
trigger and leave its current paused/active state.

Note that the 'pause', 'cont', and 'clear' parameters should be
applied using the 'append' shell operator ('>>') if applied to an
existing trigger, rather than via the '>' operator, which will cause
the trigger to be removed through truncation.
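
Following the note above, the 'pause', 'cont' and 'clear' parameters
can be applied to an existing hist trigger with a helper that always
appends. This is a sketch only; hist_control is a hypothetical name,
and it assumes the trigger specification passed in matches the one
originally used to create the trigger.

```shell
# Apply 'pause', 'cont' or 'clear' to an existing hist trigger.
# $1 = event directory, $2 = original hist trigger specification,
# $3 = action. Uses '>>' so the existing trigger is not truncated.
hist_control() {
    event_dir=$1
    spec=$2
    action=$3
    case "$action" in
        pause|cont|clear) ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    printf '%s:%s\n' "$spec" "$action" >> "$event_dir/trigger"
}
```

For the netif_rx example above, pausing the trigger would be
"hist_control /sys/kernel/debug/tracing/events/net/netif_rx
'hist:keys=skbaddr.hex:vals=len' pause".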

- enable_hist/disable_hist

The enable_hist and disable_hist triggers can be used to have one
event conditionally start and stop another event's already-attached
hist trigger. Any number of enable_hist and disable_hist triggers
can be attached to a given event, allowing that event to kick off
and stop aggregations on a host of other events.

The format is very similar to the enable/disable_event triggers:

enable_hist:<system>:<event>[:count]
disable_hist:<system>:<event>[:count]

Instead of enabling or disabling the tracing of the target event
into the trace buffer as the enable/disable_event triggers do, the
enable/disable_hist triggers enable or disable the aggregation of
the target event into a hash table.

A typical usage scenario for the enable_hist/disable_hist triggers
would be to first set up a paused hist trigger on some event,
followed by an enable_hist/disable_hist pair that turns the hist
aggregation on and off when conditions of interest are hit:

# echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
    /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger

# echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
    /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger

# echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
    /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger

The above sets up an initially paused hist trigger which is unpaused
and starts aggregating events when a given program is executed, and
which stops aggregating when the process exits and the hist trigger
is paused again.

The examples below provide a more concrete illustration of the
concepts and typical usage patterns discussed above.


6.2 'hist' trigger examples
|
|
|
|
---------------------------
|
|
|
|
|
|
|
|
The first set of examples creates aggregations using the kmalloc
|
|
|
|
event. The fields that can be used for the hist trigger are listed
|
|
|
|
in the kmalloc event's format file:
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format
|
|
|
|
name: kmalloc
|
|
|
|
ID: 374
|
|
|
|
format:
|
|
|
|
field:unsigned short common_type; offset:0; size:2; signed:0;
|
|
|
|
field:unsigned char common_flags; offset:2; size:1; signed:0;
|
|
|
|
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
|
|
|
|
field:int common_pid; offset:4; size:4; signed:1;
|
|
|
|
|
|
|
|
field:unsigned long call_site; offset:8; size:8; signed:0;
|
|
|
|
field:const void * ptr; offset:16; size:8; signed:0;
|
|
|
|
field:size_t bytes_req; offset:24; size:8; signed:0;
|
|
|
|
field:size_t bytes_alloc; offset:32; size:8; signed:0;
|
|
|
|
field:gfp_t gfp_flags; offset:40; size:4; signed:0;
|
|
|
|
|
|
|
|
We'll start by creating a hist trigger that generates a simple table
|
|
|
|
that lists the total number of bytes requested for each function in
|
|
|
|
the kernel that made one or more calls to kmalloc:
|
|
|
|
|
|
|
|
# echo 'hist:key=call_site:val=bytes_req' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
This tells the tracing system to create a 'hist' trigger using the
|
|
|
|
call_site field of the kmalloc event as the key for the table, which
|
|
|
|
just means that each unique call_site address will have an entry
|
|
|
|
created for it in the table. The 'val=bytes_req' parameter tells
|
|
|
|
the hist trigger that for each unique entry (call_site) in the
|
|
|
|
table, it should keep a running total of the number of bytes
|
|
|
|
requested by that call_site.
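
As a rough userspace model of what that trigger accumulates (not the
kernel's implementation), the same bookkeeping can be sketched with
awk: one table entry per unique key, a hit counter per entry, and a
running sum of the value field.  The input format, one
"call_site bytes_req" pair per line, and the aggregate_by_key name
are assumptions made for illustration.

```shell
# aggregate_by_key (hypothetical helper): read "call_site bytes_req" pairs
# from stdin, one event per line, and print
# "call_site hitcount bytes_req_total", ordered by increasing hitcount.
aggregate_by_key() {
    awk '{
             hits[$1]++          # implicit per-entry hit counter
             total[$1] += $2     # running sum of the value field
         }
         END {
             for (k in hits)
                 print k, hits[k], total[k]
         }' | sort -k2,2n -k1,1
}
```

Feeding it the three events "a 8", "b 16" and "a 24" yields one entry
for b (1 hit, 16 bytes) followed by one entry for a (2 hits, 32
bytes).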
|
|
|
|
|
|
|
|
We'll let it run for a while and then dump the contents of the 'hist'
|
|
|
|
file in the kmalloc event's subdirectory (for readability, a number
|
|
|
|
of entries have been omitted):
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
|
|
|
|
|
|
|
|
{ call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176
|
|
|
|
{ call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024
|
|
|
|
{ call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384
|
|
|
|
{ call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24
|
|
|
|
{ call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8
|
|
|
|
{ call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152
|
|
|
|
{ call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144
|
|
|
|
{ call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144
|
|
|
|
{ call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560
|
|
|
|
{ call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576
|
|
|
|
{ call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336
|
|
|
|
{ call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504
|
|
|
|
{ call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584
|
|
|
|
{ call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448
|
|
|
|
{ call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720
|
|
|
|
{ call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088
|
|
|
|
{ call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920
|
|
|
|
{ call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716
|
|
|
|
{ call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712
|
|
|
|
{ call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160
|
|
|
|
{ call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 4610
|
|
|
|
Entries: 45
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The output displays a line for each entry, beginning with the key
|
|
|
|
specified in the trigger, followed by the value(s) also specified in
|
|
|
|
the trigger. At the beginning of the output is a line that displays
|
|
|
|
the trigger info, which can also be displayed by reading the
|
|
|
|
'trigger' file:
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
|
|
|
|
|
|
|
|
At the end of the output are a few lines that display the overall
|
|
|
|
totals for the run. The 'Hits' field shows the total number of
|
|
|
|
times the event trigger was hit, the 'Entries' field shows the total
|
|
|
|
number of used entries in the hash table, and the 'Dropped' field
|
|
|
|
shows the number of hits that were dropped because the number of
|
|
|
|
used entries for the run exceeded the maximum number of entries
|
|
|
|
allowed for the table (normally 0; a nonzero value is a hint that you may
|
|
|
|
want to increase the size of the table using the 'size' parameter).
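
A script that collects hist output may want to check the 'Dropped'
total before trusting the numbers.  The sketch below assumes the
"Totals:" layout shown above; hist_dropped and hist_check are
hypothetical helper names.

```shell
# hist_dropped (hypothetical helper): print the 'Dropped' total found in
# hist output read from stdin.
hist_dropped() {
    awk '$1 == "Dropped:" { print $2 }'
}

# hist_check (hypothetical helper): succeed only if a 'Dropped' line was
# found and its value is 0.
hist_check() {
    dropped="$(hist_dropped)"
    [ -n "$dropped" ] && [ "$dropped" -eq 0 ]
}
```

For example, 'hist_check < .../events/kmem/kmalloc/hist' failing
would suggest re-creating the trigger with a larger 'size' value.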
|
|
|
|
|
|
|
|
Notice in the above output that there's an extra field, 'hitcount',
|
|
|
|
which wasn't specified in the trigger. Also notice that in the
|
|
|
|
trigger info output, there's a parameter, 'sort=hitcount', which
|
|
|
|
wasn't specified in the trigger either. The reason for that is that
|
|
|
|
every trigger implicitly keeps a count of the total number of hits
|
|
|
|
attributed to a given entry, called the 'hitcount'. That hitcount
|
|
|
|
information is explicitly displayed in the output, and in the
|
|
|
|
absence of a user-specified sort parameter, is used as the default
|
|
|
|
sort field.
|
|
|
|
|
|
|
|
The value 'hitcount' can be used in place of an explicit value in
|
|
|
|
the 'values' parameter if you don't really need to have any
|
|
|
|
particular field summed and are mainly interested in hit
|
|
|
|
frequencies.
|
|
|
|
|
|
|
|
To turn the hist trigger off, simply call up the trigger in the
|
|
|
|
command history and re-execute it with a '!' prepended:
|
|
|
|
|
|
|
|
# echo '!hist:key=call_site:val=bytes_req' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
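
In a script there is no command history to call up, so the removal
has to be built from the original trigger string.  A minimal sketch
follows; the TRACING variable and the hist_remove name are
assumptions made for illustration.  The '!' is kept inside single
quotes, as in the command above, so that an interactive bash does
not treat it as a history expansion.

```shell
# Assumed location of the tracing directory.
TRACING="${TRACING:-/sys/kernel/debug/tracing}"

# hist_remove EVENT SPEC (hypothetical helper): write SPEC with a '!'
# prepended to the trigger file of EVENT, given as <system>/<event>.
hist_remove() {
    printf '!%s\n' "$2" > "$TRACING/events/$1/trigger"
}
```

With this, 'hist_remove kmem/kmalloc hist:key=call_site:val=bytes_req'
writes the same string as the command above.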
|
|
|
|
|
|
|
|
Finally, notice that the call_site as displayed in the output above
|
|
|
|
isn't really very useful. It's an address, but normally addresses
|
|
|
|
are displayed in hex. To have a numeric field displayed as a hex
|
|
|
|
value, simply append '.hex' to the field name in the trigger:
|
|
|
|
|
|
|
|
# echo 'hist:key=call_site.hex:val=bytes_req' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active]
|
|
|
|
|
|
|
|
{ call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433
|
|
|
|
{ call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176
|
|
|
|
{ call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384
|
|
|
|
{ call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8
|
|
|
|
{ call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511
|
|
|
|
{ call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12
|
|
|
|
{ call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152
|
|
|
|
{ call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24
|
|
|
|
{ call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144
|
|
|
|
{ call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648
|
|
|
|
{ call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144
|
|
|
|
{ call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024
|
|
|
|
{ call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680
|
|
|
|
{ call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112
|
|
|
|
{ call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232
|
|
|
|
{ call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360
|
|
|
|
{ call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640
|
|
|
|
{ call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600
|
|
|
|
{ call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584
|
|
|
|
{ call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656
|
|
|
|
{ call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456
|
|
|
|
{ call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 4775
|
|
|
|
Entries: 46
|
|
|
|
Dropped: 0
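
A decimal call_site saved from an earlier run can also be converted
after the fact.  The sketch below assumes a shell whose printf
accepts 64-bit unsigned values, as bash does; call_site_hex is a
hypothetical helper name.

```shell
# call_site_hex (hypothetical helper): print a decimal address as
# lowercase hex, without a 0x prefix, matching the '.hex' output format.
call_site_hex() {
    printf '%x\n' "$1"
}
```

For instance, 'call_site_hex 18446744072106379007' prints
ffffffffa07186ff, which is the entry that requested 176 bytes in both
of the listings above.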
|
|
|
|
|
|
|
|
Even that's only marginally more useful - while hex values do look
|
|
|
|
more like addresses, what users are typically more interested in
|
|
|
|
when looking at text addresses are the corresponding symbols
|
|
|
|
instead. To have an address displayed as a symbolic value instead,
|
|
|
|
simply append '.sym' or '.sym-offset' to the field name in the
|
|
|
|
trigger:
|
|
|
|
|
|
|
|
# echo 'hist:key=call_site.sym:val=bytes_req' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active]
|
|
|
|
|
|
|
|
{ call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024
|
|
|
|
{ call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
|
|
|
|
{ call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
|
|
|
|
{ call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192
|
|
|
|
{ call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
|
|
|
|
{ call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
|
|
|
|
{ call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
|
|
|
|
{ call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528
|
|
|
|
{ call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624
|
|
|
|
{ call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96
|
|
|
|
{ call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464
|
|
|
|
{ call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304
|
|
|
|
{ call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
|
|
|
|
{ call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240
|
|
|
|
{ call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280
|
|
|
|
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672
|
|
|
|
{ call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208
|
|
|
|
{ call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840
|
|
|
|
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312
|
|
|
|
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152
|
|
|
|
{ call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576
|
|
|
|
{ call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248
|
|
|
|
{ call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384
|
|
|
|
{ call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584
|
|
|
|
{ call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176
|
|
|
|
{ call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 109928
|
|
|
|
Entries: 71
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
Because the default sort key above is 'hitcount', the above shows
|
|
|
|
the list of call_sites by increasing hitcount, so that at the bottom
|
|
|
|
we see the functions that made the most kmalloc calls during the
|
|
|
|
run. If instead we wanted to see the top kmalloc callers in
|
|
|
|
terms of the number of bytes requested rather than the number of
|
|
|
|
calls, and we wanted the top caller to appear at the top, we can use
|
|
|
|
the 'sort' parameter, along with the 'descending' modifier:
|
|
|
|
|
|
|
|
# echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
|
|
|
|
|
|
|
|
{ call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464
|
|
|
|
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176
|
|
|
|
{ call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135
|
|
|
|
{ call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128
|
|
|
|
{ call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784
|
|
|
|
{ call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992
|
|
|
|
{ call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072
|
|
|
|
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824
|
|
|
|
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704
|
|
|
|
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088
|
|
|
|
{ call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536
|
|
|
|
{ call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664
|
|
|
|
{ call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
|
|
|
|
{ call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
|
|
|
|
{ call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48
|
|
|
|
{ call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48
|
|
|
|
{ call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48
|
|
|
|
{ call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
|
|
|
|
{ call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16
|
|
|
|
{ call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
|
|
|
|
{ call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
|
|
|
|
{ call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 32133
|
|
|
|
Entries: 81
|
|
|
|
Dropped: 0
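
A listing that has already been saved to a file can also be re-ranked
in userspace.  The sketch below assumes single-line entries of the
form shown above (it does not handle multi-line entries) and that the
entries contain no tab characters; hist_top is a hypothetical helper
name.

```shell
# hist_top FIELD N (hypothetical helper): from hist output on stdin, print
# the N entries with the largest FIELD value, largest first.
hist_top() {
    awk -v f="$1:" '
        /^[[:space:]]*\{/ {
            # Find "FIELD:" and emit its value as a sort key in front of
            # the untouched entry line.
            for (i = 1; i < NF; i++)
                if ($i == f) { print $(i + 1) "\t" $0; break }
        }' | sort -k1,1nr | head -n "$2" | cut -f2-
}
```

For example, 'hist_top bytes_req 10 < saved_hist' would list the ten
entries with the most bytes requested, where saved_hist is a
placeholder file name.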
|
|
|
|
|
|
|
|
To display the offset and size information in addition to the symbol
|
|
|
|
name, just use 'sym-offset' instead:
|
|
|
|
|
|
|
|
# echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
|
|
|
|
|
|
|
|
{ call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720
|
|
|
|
{ call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936
|
|
|
|
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936
|
|
|
|
{ call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832
|
|
|
|
{ call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384
|
|
|
|
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040
|
|
|
|
{ call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072
|
|
|
|
{ call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880
|
|
|
|
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488
|
|
|
|
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696
|
|
|
|
{ call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640
|
|
|
|
{ call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128
|
|
|
|
{ call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96
|
|
|
|
{ call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96
|
|
|
|
{ call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84
|
|
|
|
{ call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8
|
|
|
|
{ call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7
|
|
|
|
{ call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 26098
|
|
|
|
Entries: 64
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
We can also add multiple fields to the 'values' parameter. For
|
|
|
|
example, we might want to see the total number of bytes allocated
|
|
|
|
alongside bytes requested, and display the result sorted by bytes
|
|
|
|
allocated in descending order:
|
|
|
|
|
|
|
|
# echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active]
|
|
|
|
|
|
|
|
{ call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016
|
|
|
|
{ call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224
|
|
|
|
{ call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568
|
|
|
|
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760
|
|
|
|
{ call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744
|
|
|
|
{ call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400
|
|
|
|
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496
|
|
|
|
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304
|
|
|
|
{ call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640
|
|
|
|
{ call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760
|
|
|
|
{ call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312
|
|
|
|
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192
|
|
|
|
{ call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
|
|
|
|
{ call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
|
|
|
|
{ call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
|
|
|
|
{ call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
|
|
|
|
{ call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96
|
|
|
|
{ call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64
|
|
|
|
{ call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
|
|
|
|
{ call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8
|
|
|
|
{ call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 66598
|
|
|
|
Entries: 65
|
|
|
|
Dropped: 0
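
When several variations of a trigger are tried in a row, the trigger
string can be composed from its parts.  The sketch below uses the
'keys', 'vals' and 'sort' spellings echoed in the trigger info lines
above; hist_spec is a hypothetical helper name.

```shell
# hist_spec KEYS VALS [SORT] (hypothetical helper): print a hist trigger
# string built from comma-separated key and value lists and an optional
# sort specification.
hist_spec() {
    spec="hist:keys=$1:vals=$2"
    if [ -n "${3:-}" ]; then
        spec="$spec:sort=$3"
    fi
    printf '%s\n' "$spec"
}
```

For example, 'hist_spec call_site.sym bytes_req,bytes_alloc
bytes_alloc.descending' prints a string equivalent to the trigger
used above, which can then be redirected into the event's trigger
file.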
|
|
|
|
|
|
|
|
Finally, to finish off our kmalloc example, instead of simply having
|
|
|
|
the hist trigger display symbolic call_sites, we can have the hist
|
|
|
|
trigger additionally display the complete set of kernel stack traces
|
|
|
|
that led to each call_site. To do that, we simply use the special
|
|
|
|
value 'stacktrace' for the key parameter:
|
|
|
|
|
|
|
|
# echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
|
|
|
|
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
|
|
|
|
|
|
|
|
The above trigger will use the kernel stack trace in effect when an
|
|
|
|
event is triggered as the key for the hash table. This allows the
|
|
|
|
enumeration of every kernel callpath that led up to a particular
|
|
|
|
event, along with a running total of any of the event fields for
|
|
|
|
that event. Here we tally bytes requested and bytes allocated for
|
|
|
|
every callpath in the system that led up to a kmalloc (in this case
|
|
|
|
every callpath to a kmalloc for a kernel compile):
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
|
|
|
|
# trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
|
|
|
|
|
|
|
|
{ stacktrace:
|
|
|
|
__kmalloc_track_caller+0x10b/0x1a0
|
|
|
|
kmemdup+0x20/0x50
|
|
|
|
hidraw_report_event+0x8a/0x120 [hid]
|
|
|
|
hid_report_raw_event+0x3ea/0x440 [hid]
|
|
|
|
hid_input_report+0x112/0x190 [hid]
|
|
|
|
hid_irq_in+0xc2/0x260 [usbhid]
|
|
|
|
__usb_hcd_giveback_urb+0x72/0x120
|
|
|
|
usb_giveback_urb_bh+0x9e/0xe0
|
|
|
|
tasklet_hi_action+0xf8/0x100
|
|
|
|
__do_softirq+0x114/0x2c0
|
|
|
|
irq_exit+0xa5/0xb0
|
|
|
|
do_IRQ+0x5a/0xf0
|
|
|
|
ret_from_intr+0x0/0x30
|
|
|
|
cpuidle_enter+0x17/0x20
|
|
|
|
cpu_startup_entry+0x315/0x3e0
|
|
|
|
rest_init+0x7c/0x80
|
|
|
|
} hitcount: 3 bytes_req: 21 bytes_alloc: 24
|
|
|
|
{ stacktrace:
|
|
|
|
__kmalloc_track_caller+0x10b/0x1a0
|
|
|
|
kmemdup+0x20/0x50
|
|
|
|
hidraw_report_event+0x8a/0x120 [hid]
|
|
|
|
hid_report_raw_event+0x3ea/0x440 [hid]
|
|
|
|
hid_input_report+0x112/0x190 [hid]
|
|
|
|
hid_irq_in+0xc2/0x260 [usbhid]
|
|
|
|
__usb_hcd_giveback_urb+0x72/0x120
|
|
|
|
usb_giveback_urb_bh+0x9e/0xe0
|
|
|
|
tasklet_hi_action+0xf8/0x100
|
|
|
|
__do_softirq+0x114/0x2c0
|
|
|
|
irq_exit+0xa5/0xb0
|
|
|
|
do_IRQ+0x5a/0xf0
|
|
|
|
ret_from_intr+0x0/0x30
|
|
|
|
} hitcount: 3 bytes_req: 21 bytes_alloc: 24
|
|
|
|
{ stacktrace:
|
|
|
|
kmem_cache_alloc_trace+0xeb/0x150
|
|
|
|
aa_alloc_task_context+0x27/0x40
|
|
|
|
apparmor_cred_prepare+0x1f/0x50
|
|
|
|
security_prepare_creds+0x16/0x20
|
|
|
|
prepare_creds+0xdf/0x1a0
|
|
|
|
SyS_capset+0xb5/0x200
|
|
|
|
system_call_fastpath+0x12/0x6a
|
|
|
|
} hitcount: 1 bytes_req: 32 bytes_alloc: 32
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ stacktrace:
|
|
|
|
__kmalloc+0x11b/0x1b0
|
|
|
|
i915_gem_execbuffer2+0x6c/0x2c0 [i915]
|
|
|
|
drm_ioctl+0x349/0x670 [drm]
|
|
|
|
do_vfs_ioctl+0x2f0/0x4f0
|
|
|
|
SyS_ioctl+0x81/0xa0
|
|
|
|
system_call_fastpath+0x12/0x6a
|
|
|
|
} hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808
|
|
|
|
{ stacktrace:
|
|
|
|
__kmalloc+0x11b/0x1b0
|
|
|
|
load_elf_phdrs+0x76/0xa0
|
|
|
|
load_elf_binary+0x102/0x1650
|
|
|
|
search_binary_handler+0x97/0x1d0
|
|
|
|
do_execveat_common.isra.34+0x551/0x6e0
|
|
|
|
SyS_execve+0x3a/0x50
|
|
|
|
return_from_execve+0x0/0x23
|
|
|
|
} hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048
|
|
|
|
{ stacktrace:
|
|
|
|
kmem_cache_alloc_trace+0xeb/0x150
|
|
|
|
apparmor_file_alloc_security+0x27/0x40
|
|
|
|
security_file_alloc+0x16/0x20
|
|
|
|
get_empty_filp+0x93/0x1c0
|
|
|
|
path_openat+0x31/0x5f0
|
|
|
|
do_filp_open+0x3a/0x90
|
|
|
|
do_sys_open+0x128/0x220
|
|
|
|
SyS_open+0x1e/0x20
|
|
|
|
system_call_fastpath+0x12/0x6a
|
|
|
|
} hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376
|
|
|
|
{ stacktrace:
|
|
|
|
__kmalloc+0x11b/0x1b0
|
|
|
|
seq_buf_alloc+0x1b/0x50
|
|
|
|
seq_read+0x2cc/0x370
|
|
|
|
proc_reg_read+0x3d/0x80
|
|
|
|
__vfs_read+0x28/0xe0
|
|
|
|
vfs_read+0x86/0x140
|
|
|
|
SyS_read+0x46/0xb0
|
|
|
|
system_call_fastpath+0x12/0x6a
|
|
|
|
} hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 6085872
|
|
|
|
Entries: 253
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
If you key a hist trigger on common_pid, for example to
|
|
|
|
gather and display sorted totals for each process, you can use the
|
|
|
|
special .execname modifier to display the executable names for the
|
|
|
|
processes in the table rather than raw pids. The example below
|
|
|
|
keeps a per-process sum of the bytes requested by read calls:
|
|
|
|
|
|
|
|
# echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \
|
|
|
|
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
|
|
|
|
# trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active]
|
|
|
|
|
|
|
|
{ common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512
|
|
|
|
{ common_pid: Xorg [ 1309] } hitcount: 525 count: 256640
|
|
|
|
{ common_pid: compiz [ 2889] } hitcount: 59 count: 254400
|
|
|
|
{ common_pid: bash [ 8710] } hitcount: 3 count: 66369
|
|
|
|
{ common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739
|
|
|
|
{ common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648
|
|
|
|
{ common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216
|
|
|
|
{ common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396
|
|
|
|
{ common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264
|
|
|
|
{ common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424
|
|
|
|
{ common_pid: gmain [ 1315] } hitcount: 18 count: 6336
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ common_pid: postgres [ 1892] } hitcount: 2 count: 32
|
|
|
|
{ common_pid: postgres [ 1891] } hitcount: 2 count: 32
|
|
|
|
{ common_pid: gmain [ 8704] } hitcount: 2 count: 32
|
|
|
|
{ common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21
|
|
|
|
{ common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16
|
|
|
|
{ common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16
|
|
|
|
{ common_pid: gdbus [ 2998] } hitcount: 1 count: 16
|
|
|
|
{ common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8
|
|
|
|
{ common_pid: init [ 1] } hitcount: 2 count: 2
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 2116
|
|
|
|
Entries: 51
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
Similarly, if you key a hist trigger on syscall id, for example to
|
|
|
|
gather and display a list of systemwide syscall hits, you can use
|
|
|
|
the special .syscall modifier to display the syscall names rather
|
|
|
|
than raw ids. The example below keeps a running total of syscall
|
|
|
|
counts for the system during the run:
|
|
|
|
|
|
|
|
# echo 'hist:key=id.syscall:val=hitcount' > \
|
|
|
|
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
|
|
|
|
# trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active]
|
|
|
|
|
|
|
|
{ id: sys_fsync [ 74] } hitcount: 1
|
|
|
|
{ id: sys_newuname [ 63] } hitcount: 1
|
|
|
|
{ id: sys_prctl [157] } hitcount: 1
|
|
|
|
{ id: sys_statfs [137] } hitcount: 1
|
|
|
|
{ id: sys_symlink [ 88] } hitcount: 1
|
|
|
|
{ id: sys_sendmmsg [307] } hitcount: 1
|
|
|
|
{ id: sys_semctl [ 66] } hitcount: 1
|
|
|
|
{ id: sys_readlink [ 89] } hitcount: 3
|
|
|
|
{ id: sys_bind [ 49] } hitcount: 3
|
|
|
|
{ id: sys_getsockname [ 51] } hitcount: 3
|
|
|
|
{ id: sys_unlink [ 87] } hitcount: 3
|
|
|
|
{ id: sys_rename [ 82] } hitcount: 4
|
|
|
|
{ id: unknown_syscall [ 58] } hitcount: 4
|
|
|
|
{ id: sys_connect [ 42] } hitcount: 4
|
|
|
|
{ id: sys_getpid [ 39] } hitcount: 4
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ id: sys_rt_sigprocmask [ 14] } hitcount: 952
|
|
|
|
{ id: sys_futex [202] } hitcount: 1534
|
|
|
|
{ id: sys_write [ 1] } hitcount: 2689
|
|
|
|
{ id: sys_setitimer [ 38] } hitcount: 2797
|
|
|
|
{ id: sys_read [ 0] } hitcount: 3202
|
|
|
|
{ id: sys_select [ 23] } hitcount: 3773
|
|
|
|
{ id: sys_writev [ 20] } hitcount: 4531
|
|
|
|
{ id: sys_poll [ 7] } hitcount: 8314
|
|
|
|
{ id: sys_recvmsg [ 47] } hitcount: 13738
|
|
|
|
{ id: sys_ioctl [ 16] } hitcount: 21843
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 67612
|
|
|
|
Entries: 72
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The syscall counts above provide a rough overall picture of system
|
|
|
|
call activity on the system; we can see for example that the most
|
|
|
|
popular system call on this system was the 'sys_ioctl' system call.
|
|
|
|
|
|
|
|
We can use 'compound' keys to refine that number and provide some
|
|
|
|
further insight as to which processes exactly contribute to the
|
|
|
|
overall ioctl count.
|
|
|
|
|
|
|
|
The command below keeps a hitcount for every unique combination of
|
|
|
|
system call id and pid - the end result is essentially a table
|
|
|
|
that keeps a per-pid sum of system call hits. The results are
|
|
|
|
sorted using the system call id as the primary key, and the
|
|
|
|
hitcount sum as the secondary key:
|
|
|
|
|
|
|
|
# echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \
|
|
|
|
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
|
|
|
|
# trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active]
|
|
|
|
|
|
|
|
{ id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1
|
|
|
|
{ id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1
|
|
|
|
{ id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1
|
|
|
|
{ id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1
|
|
|
|
{ id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2
|
|
|
|
{ id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2
|
|
|
|
{ id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2
|
|
|
|
{ id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2
|
|
|
|
{ id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2
|
|
|
|
{ id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3
|
|
|
|
{ id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16
|
|
|
|
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2
|
|
|
|
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4
|
|
|
|
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4
|
|
|
|
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4
|
|
|
|
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4
|
|
|
|
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6
|
|
|
|
{ id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2
|
|
|
|
{ id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4
|
|
|
|
{ id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 31536
|
|
|
|
Entries: 323
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The above list does give us a breakdown of the ioctl syscall by
|
|
|
|
pid, but it also gives us quite a bit more than that, which we
|
|
|
|
don't really care about at the moment. Since we know the syscall
|
|
|
|
id for sys_ioctl (16, displayed next to the sys_ioctl name), we
|
|
|
|
can use that to filter out all the other syscalls:
|
|
|
|
|
|
|
|
# echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \
|
|
|
|
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
|
|
|
|
# trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active]
|
|
|
|
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: evolution-addre [ 8571] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gmain [ 3003] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gmain [ 2781] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gmain [ 2829] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: bash [ 8726] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: bash [ 8508] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gmain [ 2970] } hitcount: 1
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: gmain [ 2768] } hitcount: 1
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: pool [ 8559] } hitcount: 45
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: pool [ 8555] } hitcount: 48
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: pool [ 8551] } hitcount: 48
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: avahi-daemon [ 896] } hitcount: 66
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 26674
|
|
|
|
{ id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 73443
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 101162
|
|
|
|
Entries: 103
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The above output shows that 'compiz' and 'Xorg' are far and away
|
|
|
|
the heaviest ioctl callers (which might lead to questions about
|
|
|
|
whether they really need to be making all those calls and to
|
|
|
|
possible avenues for further investigation.)
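When the hist output is long, it can help to reduce it to the few heaviest callers. The following is only a sketch: 'hist_by_pid' and 'top_callers' are hypothetical helper names, not part of the tracing interface, and they assume hist entries keyed on common_pid.execname in the format shown above.

```shell
# Hypothetical helpers (not part of the kernel interface).
# Read a hist file on stdin and print "hitcount comm pid" for every
# entry that has a common_pid.execname key.
hist_by_pid() {
    sed -n 's/^.*common_pid: *\(.*[^ ]\) *\[ *\([0-9][0-9]*\)] *} *hitcount: *\([0-9][0-9]*\).*$/\3 \1 \2/p'
}

# Print the N entries with the highest hitcount (default 5).
top_callers() {
    hist_by_pid | sort -rn | head -n "${1:-5}"
}

# Possible use, as root:
#   top_callers 2 < /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
```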
|
|
|
|
|
|
|
|
The compound key examples used a key and a sum value (hitcount) to
|
|
|
|
sort the output, but we can just as easily use two keys instead.
|
|
|
|
Here's an example where we use a compound key composed of the
|
|
|
|
common_pid and size event fields. Sorting with pid as the primary
|
|
|
|
key and 'size' as the secondary key allows us to display an
|
|
|
|
ordered summary of the recvfrom sizes, with counts, received by
|
|
|
|
each process:
|
|
|
|
|
|
|
|
# echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \
|
|
|
|
/sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist
|
|
|
|
# trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active]
|
|
|
|
|
|
|
|
{ common_pid: smbd [ 784], size: 4 } hitcount: 1
|
|
|
|
{ common_pid: dnsmasq [ 1412], size: 4096 } hitcount: 672
|
|
|
|
{ common_pid: postgres [ 1796], size: 1000 } hitcount: 6
|
|
|
|
{ common_pid: postgres [ 1867], size: 1000 } hitcount: 10
|
|
|
|
{ common_pid: bamfdaemon [ 2787], size: 28 } hitcount: 2
|
|
|
|
{ common_pid: bamfdaemon [ 2787], size: 14360 } hitcount: 1
|
|
|
|
{ common_pid: compiz [ 2994], size: 8 } hitcount: 1
|
|
|
|
{ common_pid: compiz [ 2994], size: 20 } hitcount: 11
|
|
|
|
{ common_pid: gnome-terminal [ 3199], size: 4 } hitcount: 2
|
|
|
|
{ common_pid: firefox [ 8817], size: 4 } hitcount: 1
|
|
|
|
{ common_pid: firefox [ 8817], size: 8 } hitcount: 5
|
|
|
|
{ common_pid: firefox [ 8817], size: 588 } hitcount: 2
|
|
|
|
{ common_pid: firefox [ 8817], size: 628 } hitcount: 1
|
|
|
|
{ common_pid: firefox [ 8817], size: 6944 } hitcount: 1
|
|
|
|
{ common_pid: firefox [ 8817], size: 408880 } hitcount: 2
|
|
|
|
{ common_pid: firefox [ 8822], size: 8 } hitcount: 2
|
|
|
|
{ common_pid: firefox [ 8822], size: 160 } hitcount: 2
|
|
|
|
{ common_pid: firefox [ 8822], size: 320 } hitcount: 2
|
|
|
|
{ common_pid: firefox [ 8822], size: 352 } hitcount: 1
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
|
|
|
|
{ common_pid: pool [ 8923], size: 1960 } hitcount: 10
|
|
|
|
{ common_pid: pool [ 8923], size: 2048 } hitcount: 10
|
|
|
|
{ common_pid: pool [ 8924], size: 1960 } hitcount: 10
|
|
|
|
{ common_pid: pool [ 8924], size: 2048 } hitcount: 10
|
|
|
|
{ common_pid: pool [ 8928], size: 1964 } hitcount: 4
|
|
|
|
{ common_pid: pool [ 8928], size: 1965 } hitcount: 2
|
|
|
|
{ common_pid: pool [ 8928], size: 2048 } hitcount: 6
|
|
|
|
{ common_pid: pool [ 8929], size: 1982 } hitcount: 1
|
|
|
|
{ common_pid: pool [ 8929], size: 2048 } hitcount: 1
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 2016
|
|
|
|
Entries: 224
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The above example also illustrates the fact that although a compound
|
|
|
|
key is treated as a single entity for hashing purposes, the sub-keys
|
|
|
|
it's composed of can be accessed independently.
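Because the sub-keys are printed separately, the output can also be regrouped on just one of them after the fact. The sketch below is an illustration only: 'recv_totals' is a hypothetical helper that assumes the common_pid.execname,size entry format shown above. It sums the hitcounts per pid and multiplies each size by its hitcount. We assume 'size' is the buffer length passed on syscall entry, so the last column is the total requested, not necessarily the total received.

```shell
# Hypothetical helper (not part of the kernel interface).
# Read a recvfrom hist file on stdin and print, per pid:
#   pid comm total_calls total_requested_bytes
recv_totals() {
    sed -n 's/^.*common_pid: *\(.*[^ ]\) *\[ *\([0-9][0-9]*\)], *size: *\([0-9][0-9]*\) *} *hitcount: *\([0-9][0-9]*\).*$/\2|\1|\3|\4/p' |
    awk -F'|' '
        # $1 = pid, $2 = comm, $3 = size, $4 = hitcount
        { calls[$1] += $4; bytes[$1] += $3 * $4; comm[$1] = $2 }
        END {
            for (pid in calls)
                printf "%s %s %d %d\n", pid, comm[pid], calls[pid], bytes[pid]
        }
    ' | sort -n
}

# Possible use, as root:
#   recv_totals < /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist
```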
|
|
|
|
|
|
|
|
The next example uses a string field as the hash key and
|
|
|
|
demonstrates how you can manually pause and continue a hist trigger.
|
|
|
|
In this example, we'll aggregate fork counts and don't expect a
|
|
|
|
large number of entries in the hash table, so we'll drop it to a
|
|
|
|
much smaller number, say 256:
|
|
|
|
|
|
|
|
# echo 'hist:key=child_comm:val=hitcount:size=256' > \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
|
|
|
|
# trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
|
|
|
|
|
|
|
|
{ child_comm: dconf worker } hitcount: 1
|
|
|
|
{ child_comm: ibus-daemon } hitcount: 1
|
|
|
|
{ child_comm: whoopsie } hitcount: 1
|
|
|
|
{ child_comm: smbd } hitcount: 1
|
|
|
|
{ child_comm: gdbus } hitcount: 1
|
|
|
|
{ child_comm: kthreadd } hitcount: 1
|
|
|
|
{ child_comm: dconf worker } hitcount: 1
|
|
|
|
{ child_comm: evolution-alarm } hitcount: 2
|
|
|
|
{ child_comm: Socket Thread } hitcount: 2
|
|
|
|
{ child_comm: postgres } hitcount: 2
|
|
|
|
{ child_comm: bash } hitcount: 3
|
|
|
|
{ child_comm: compiz } hitcount: 3
|
|
|
|
{ child_comm: evolution-sourc } hitcount: 4
|
|
|
|
{ child_comm: dhclient } hitcount: 4
|
|
|
|
{ child_comm: pool } hitcount: 5
|
|
|
|
{ child_comm: nm-dispatcher.a } hitcount: 8
|
|
|
|
{ child_comm: firefox } hitcount: 8
|
|
|
|
{ child_comm: dbus-daemon } hitcount: 8
|
|
|
|
{ child_comm: glib-pacrunner } hitcount: 10
|
|
|
|
{ child_comm: evolution } hitcount: 23
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 89
|
|
|
|
Entries: 20
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
If we want to pause the hist trigger, we can simply append :pause to
|
|
|
|
the command that started the trigger. Notice that the trigger info
|
|
|
|
displays as [paused]:
|
|
|
|
|
|
|
|
# echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
|
|
|
|
# trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused]
|
|
|
|
|
|
|
|
{ child_comm: dconf worker } hitcount: 1
|
|
|
|
{ child_comm: kthreadd } hitcount: 1
|
|
|
|
{ child_comm: dconf worker } hitcount: 1
|
|
|
|
{ child_comm: gdbus } hitcount: 1
|
|
|
|
{ child_comm: ibus-daemon } hitcount: 1
|
|
|
|
{ child_comm: Socket Thread } hitcount: 2
|
|
|
|
{ child_comm: evolution-alarm } hitcount: 2
|
|
|
|
{ child_comm: smbd } hitcount: 2
|
|
|
|
{ child_comm: bash } hitcount: 3
|
|
|
|
{ child_comm: whoopsie } hitcount: 3
|
|
|
|
{ child_comm: compiz } hitcount: 3
|
|
|
|
{ child_comm: evolution-sourc } hitcount: 4
|
|
|
|
{ child_comm: pool } hitcount: 5
|
|
|
|
{ child_comm: postgres } hitcount: 6
|
|
|
|
{ child_comm: firefox } hitcount: 8
|
|
|
|
{ child_comm: dhclient } hitcount: 10
|
|
|
|
{ child_comm: emacs } hitcount: 12
|
|
|
|
{ child_comm: dbus-daemon } hitcount: 20
|
|
|
|
{ child_comm: nm-dispatcher.a } hitcount: 20
|
|
|
|
{ child_comm: evolution } hitcount: 35
|
|
|
|
{ child_comm: glib-pacrunner } hitcount: 59
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 199
|
|
|
|
Entries: 21
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
To manually continue having the trigger aggregate events, append
|
|
|
|
:cont instead. Notice that the trigger info displays as [active]
|
|
|
|
again, and the data has changed:
|
|
|
|
|
|
|
|
# echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
|
|
|
|
# trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
|
|
|
|
|
|
|
|
{ child_comm: dconf worker } hitcount: 1
|
|
|
|
{ child_comm: dconf worker } hitcount: 1
|
|
|
|
{ child_comm: kthreadd } hitcount: 1
|
|
|
|
{ child_comm: gdbus } hitcount: 1
|
|
|
|
{ child_comm: ibus-daemon } hitcount: 1
|
|
|
|
{ child_comm: Socket Thread } hitcount: 2
|
|
|
|
{ child_comm: evolution-alarm } hitcount: 2
|
|
|
|
{ child_comm: smbd } hitcount: 2
|
|
|
|
{ child_comm: whoopsie } hitcount: 3
|
|
|
|
{ child_comm: compiz } hitcount: 3
|
|
|
|
{ child_comm: evolution-sourc } hitcount: 4
|
|
|
|
{ child_comm: bash } hitcount: 5
|
|
|
|
{ child_comm: pool } hitcount: 5
|
|
|
|
{ child_comm: postgres } hitcount: 6
|
|
|
|
{ child_comm: firefox } hitcount: 8
|
|
|
|
{ child_comm: dhclient } hitcount: 11
|
|
|
|
{ child_comm: emacs } hitcount: 12
|
|
|
|
{ child_comm: dbus-daemon } hitcount: 22
|
|
|
|
{ child_comm: nm-dispatcher.a } hitcount: 22
|
|
|
|
{ child_comm: evolution } hitcount: 35
|
|
|
|
{ child_comm: glib-pacrunner } hitcount: 59
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 206
|
|
|
|
Entries: 21
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The previous example showed how to start and stop a hist trigger by
|
|
|
|
appending 'pause' and 'cont' to the hist trigger command. A
|
|
|
|
hist trigger can also be started in a paused state by initially
|
|
|
|
starting the trigger with ':pause' appended. This allows you to
|
|
|
|
start the trigger only when you're ready to start collecting data
|
|
|
|
and not before. For example, you could start the trigger in a
|
|
|
|
paused state, then unpause it and do something you want to measure,
|
|
|
|
then pause the trigger again when done.
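The unpause, measure, pause sequence just described can be wrapped in a small shell function. This is a sketch under assumptions: 'hist_measure' is a hypothetical name, and it expects the hist trigger to have been installed beforehand with ':pause' appended, using the same spec that is passed in here.

```shell
# Hypothetical helper (not part of the kernel interface).
#   $1      path of the event's trigger file
#   $2      hist trigger spec as originally written, without :pause/:cont
#   $3...   command to measure
hist_measure() {
    trigger_file=$1
    spec=$2
    shift 2
    # Unpause the trigger, run the workload, then pause it again.
    printf '%s\n' "${spec}:cont" >> "$trigger_file" || return 1
    "$@"
    status=$?
    printf '%s\n' "${spec}:pause" >> "$trigger_file" || return 1
    # Hand back the exit status of the measured command.
    return "$status"
}

# Possible use, as root ('make' is just an example workload):
#   hist_measure /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger \
#       'hist:key=child_comm:val=hitcount:size=256' make
```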
|
|
|
|
|
|
|
|
Of course, doing this manually can be difficult and error-prone, but
|
|
|
|
it is possible to automatically start and stop a hist trigger based
|
|
|
|
on some condition, via the enable_hist and disable_hist triggers.
|
|
|
|
|
|
|
|
For example, suppose we wanted to take a look at the relative
|
|
|
|
weights in terms of skb length for each callpath that leads to a
|
|
|
|
netif_receive_skb event when downloading a decent-sized file using
|
|
|
|
wget.
|
|
|
|
|
|
|
|
First we set up an initially paused stacktrace trigger on the
|
|
|
|
netif_receive_skb event:
|
|
|
|
|
|
|
|
# echo 'hist:key=stacktrace:vals=len:pause' > \
|
|
|
|
/sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
|
|
|
|
|
|
|
|
Next, we set up an 'enable_hist' trigger on the sched_process_exec
|
|
|
|
event, with an 'if filename==/usr/bin/wget' filter. The effect of
|
|
|
|
this new trigger is that it will 'unpause' the hist trigger we just
|
|
|
|
set up on netif_receive_skb if and only if it sees a
|
|
|
|
sched_process_exec event with a filename of '/usr/bin/wget'. When
|
|
|
|
that happens, all netif_receive_skb events are aggregated into a
|
|
|
|
hash table keyed on stacktrace:
|
|
|
|
|
|
|
|
# echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
|
|
|
|
|
|
|
|
The aggregation continues until the netif_receive_skb hist trigger is paused
|
|
|
|
again, which is what the following disable_hist trigger does by
|
|
|
|
creating a similar setup on the sched_process_exit event, using the
|
|
|
|
filter 'comm==wget':
|
|
|
|
|
|
|
|
# echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
|
|
|
|
|
|
|
|
Whenever a process exits and the comm field of the disable_hist
|
|
|
|
trigger filter matches 'comm==wget', the netif_receive_skb hist
|
|
|
|
trigger is paused again.
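The enable_hist/disable_hist pair can be installed in one step for an arbitrary program. The function below is a sketch, not an established tool: 'hist_window_for_program' is a hypothetical name, and the tracing directory is passed in as a parameter. It derives the 'comm' filter from the program's basename, truncated to 15 characters on the assumption that the kernel stores at most that many in the comm field. A program that renames itself after exec would not match.

```shell
# Hypothetical helper (not part of the kernel interface).
#   $1  tracing directory, e.g. /sys/kernel/debug/tracing
#   $2  subsystem:event carrying the paused hist trigger
#   $3  full path of the program, as matched against 'filename'
hist_window_for_program() {
    tracefs=$1
    target=$2
    prog=$3
    # comm is assumed to hold at most 15 characters of the basename.
    comm=$(printf '%.15s' "${prog##*/}")
    printf '%s\n' "enable_hist:${target} if filename==${prog}" \
        > "$tracefs/events/sched/sched_process_exec/trigger" &&
    printf '%s\n' "disable_hist:${target} if comm==${comm}" \
        > "$tracefs/events/sched/sched_process_exit/trigger"
}

# Possible use, as root:
#   hist_window_for_program /sys/kernel/debug/tracing net:netif_receive_skb /usr/bin/wget
```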
|
|
|
|
|
|
|
|
The overall effect is that netif_receive_skb events are aggregated
|
|
|
|
into the hash table for only the duration of the wget. Executing a
|
|
|
|
wget command and then listing the 'hist' file will display the
|
|
|
|
output generated by the wget command:
|
|
|
|
|
|
|
|
$ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
|
|
|
|
# trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
|
|
|
|
|
|
|
|
{ stacktrace:
|
|
|
|
__netif_receive_skb_core+0x46d/0x990
|
|
|
|
__netif_receive_skb+0x18/0x60
|
|
|
|
netif_receive_skb_internal+0x23/0x90
|
|
|
|
napi_gro_receive+0xc8/0x100
|
|
|
|
ieee80211_deliver_skb+0xd6/0x270 [mac80211]
|
|
|
|
ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
|
|
|
|
ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
|
|
|
|
ieee80211_rx+0x31d/0x900 [mac80211]
|
|
|
|
iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
|
|
|
|
iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
|
|
|
|
iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
|
|
|
|
irq_thread_fn+0x20/0x50
|
|
|
|
irq_thread+0x11f/0x150
|
|
|
|
kthread+0xd2/0xf0
|
|
|
|
ret_from_fork+0x42/0x70
|
|
|
|
} hitcount: 85 len: 28884
|
|
|
|
{ stacktrace:
|
|
|
|
__netif_receive_skb_core+0x46d/0x990
|
|
|
|
__netif_receive_skb+0x18/0x60
|
|
|
|
netif_receive_skb_internal+0x23/0x90
|
|
|
|
napi_gro_complete+0xa4/0xe0
|
|
|
|
dev_gro_receive+0x23a/0x360
|
|
|
|
napi_gro_receive+0x30/0x100
|
|
|
|
ieee80211_deliver_skb+0xd6/0x270 [mac80211]
|
|
|
|
ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
|
|
|
|
ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
|
|
|
|
ieee80211_rx+0x31d/0x900 [mac80211]
|
|
|
|
iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
|
|
|
|
iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
|
|
|
|
iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
|
|
|
|
irq_thread_fn+0x20/0x50
|
|
|
|
irq_thread+0x11f/0x150
|
|
|
|
kthread+0xd2/0xf0
|
|
|
|
} hitcount: 98 len: 664329
|
|
|
|
{ stacktrace:
|
|
|
|
__netif_receive_skb_core+0x46d/0x990
|
|
|
|
__netif_receive_skb+0x18/0x60
|
|
|
|
process_backlog+0xa8/0x150
|
|
|
|
net_rx_action+0x15d/0x340
|
|
|
|
__do_softirq+0x114/0x2c0
|
|
|
|
do_softirq_own_stack+0x1c/0x30
|
|
|
|
do_softirq+0x65/0x70
|
|
|
|
__local_bh_enable_ip+0xb5/0xc0
|
|
|
|
ip_finish_output+0x1f4/0x840
|
|
|
|
ip_output+0x6b/0xc0
|
|
|
|
ip_local_out_sk+0x31/0x40
|
|
|
|
ip_send_skb+0x1a/0x50
|
|
|
|
udp_send_skb+0x173/0x2a0
|
|
|
|
udp_sendmsg+0x2bf/0x9f0
|
|
|
|
inet_sendmsg+0x64/0xa0
|
|
|
|
sock_sendmsg+0x3d/0x50
|
|
|
|
} hitcount: 115 len: 13030
|
|
|
|
{ stacktrace:
|
|
|
|
__netif_receive_skb_core+0x46d/0x990
|
|
|
|
__netif_receive_skb+0x18/0x60
|
|
|
|
netif_receive_skb_internal+0x23/0x90
|
|
|
|
napi_gro_complete+0xa4/0xe0
|
|
|
|
napi_gro_flush+0x6d/0x90
|
|
|
|
iwl_pcie_irq_handler+0x92a/0x12f0 [iwlwifi]
|
|
|
|
irq_thread_fn+0x20/0x50
|
|
|
|
irq_thread+0x11f/0x150
|
|
|
|
kthread+0xd2/0xf0
|
|
|
|
ret_from_fork+0x42/0x70
|
|
|
|
} hitcount: 934 len: 5512212
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 1232
|
|
|
|
Entries: 4
|
|
|
|
Dropped: 0
|
|
|
|
|
|
|
|
The above shows all the netif_receive_skb callpaths and their total
|
|
|
|
lengths for the duration of the wget command.
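Since each entry reports both a hitcount and a summed len, the average skb length per callpath follows by division. A minimal sketch, assuming the closing line of each stacktrace entry has the '} hitcount: N len: M' form shown above ('hist_avg_len' is a hypothetical helper name):

```shell
# Hypothetical helper (not part of the kernel interface).
# Read a hist file on stdin and print, for each entry in order:
#   hitcount total_len average_len   (average truncated to an integer)
hist_avg_len() {
    awk '
        $1 == "}" && $2 == "hitcount:" && $4 == "len:" {
            hits = $3
            total = $5
            if (hits > 0)
                printf "%d %d %d\n", hits, total, int(total / hits)
        }
    '
}

# Possible use, as root:
#   hist_avg_len < /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
```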
|
|
|
|
|
|
|
|
The 'clear' hist trigger param can be used to clear the hash table.
|
|
|
|
Suppose we wanted to try another run of the previous example but
|
|
|
|
this time also wanted to see the complete list of events that went
|
|
|
|
into the histogram. In order to avoid having to set everything up
|
|
|
|
again, we can just clear the histogram first:
|
|
|
|
|
|
|
|
# echo 'hist:key=stacktrace:vals=len:clear' >> \
|
|
|
|
/sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
|
|
|
|
|
|
|
|
Just to verify that it is in fact cleared, here's what we now see in
|
|
|
|
the hist file:
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
|
|
|
|
# trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
|
|
|
|
|
|
|
|
Totals:
|
|
|
|
Hits: 0
|
|
|
|
Entries: 0
|
|
|
|
Dropped: 0
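In a script, the same check can be made on the Totals section rather than by eye. This is a sketch only: 'hist_is_empty' is a hypothetical helper that succeeds when both the Hits and Entries totals read zero.

```shell
# Hypothetical helper (not part of the kernel interface).
# Read a hist file on stdin; exit status 0 if the Totals section
# reports zero hits and zero entries, non-zero otherwise.
hist_is_empty() {
    awk '
        $1 == "Hits:"    { hits = $2; seen++ }
        $1 == "Entries:" { entries = $2; seen++ }
        END { exit !(seen == 2 && hits == 0 && entries == 0) }
    '
}

# Possible use, as root:
#   hist_is_empty < /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist \
#       && echo "histogram cleared"
```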
|
|
|
|
|
|
|
|
Since we want to see the detailed list of every netif_receive_skb
|
|
|
|
event occurring during the new run, which are in fact the same
|
|
|
|
events being aggregated into the hash table, we add some additional
|
|
|
|
enable_event and disable_event triggers to the sched_process_exec and
|
|
|
|
sched_process_exit events as such:
|
|
|
|
|
|
|
|
# echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
|
|
|
|
|
|
|
|
# echo 'disable_event:net:netif_receive_skb if comm==wget' > \
|
|
|
|
/sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
|
|
|
|
|
|
|
|
If you read the trigger files for the sched_process_exec and
|
|
|
|
sched_process_exit triggers, you should see two triggers for each:
|
|
|
|
one enabling/disabling the hist aggregation and the other
|
|
|
|
enabling/disabling the logging of events:
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
|
|
|
|
enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
|
|
|
|
enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
|
|
|
|
disable_event:net:netif_receive_skb:unlimited if comm==wget
|
|
|
|
disable_hist:net:netif_receive_skb:unlimited if comm==wget
|
|
|
|
|
|
|
|
In other words, whenever either of the sched_process_exec or
|
|
|
|
sched_process_exit events is hit and matches 'wget', it enables or
|
|
|
|
disables both the histogram and the event log, and what you end up
|
|
|
|
with is a hash table and set of events just covering the specified
|
|
|
|
duration. Run the wget command again:
|
|
|
|
|
|
|
|
$ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
|
|
|
|
|
|
|
|
Displaying the 'hist' file should show something similar to what you
|
|
|
|
saw in the last run, but this time you should also see the
|
|
|
|
individual events in the trace file:
|
|
|
|
|
|
|
|
# cat /sys/kernel/debug/tracing/trace
|
|
|
|
|
|
|
|
# tracer: nop
|
|
|
|
#
|
|
|
|
# entries-in-buffer/entries-written: 183/1426 #P:4
|
|
|
|
#
|
|
|
|
# _-----=> irqs-off
|
|
|
|
# / _----=> need-resched
|
|
|
|
# | / _---=> hardirq/softirq
|
|
|
|
# || / _--=> preempt-depth
|
|
|
|
# ||| / delay
|
|
|
|
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
|
|
|
# | | | |||| | |
|
|
|
|
wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60
|
|
|
|
wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60
|
|
|
|
dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130
|
|
|
|
dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138
|
|
|
|
##### CPU 2 buffer started ####
|
|
|
|
irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948
|
|
|
|
irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500
|
|
|
|
irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948
|
|
|
|
irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948
|
|
|
|
irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500
|
|
|
|
.
|
|
|
|
.
|
|
|
|
.
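The logged events carry the same 'len' field that the histogram sums, so the trace file can be used to cross-check the totals or to regroup them, for example per device. A sketch, assuming the 'dev=... len=...' event format shown above ('trace_len_by_dev' is a hypothetical helper name):

```shell
# Hypothetical helper (not part of the kernel interface).
# Read the trace file on stdin and print, per device:
#   dev packet_count total_len
trace_len_by_dev() {
    awk '
        /^#/ { next }                     # skip header and buffer markers
        / netif_receive_skb: / {
            dev = ""
            len = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^dev=/) dev = substr($i, 5)
                if ($i ~ /^len=/) len = substr($i, 5) + 0
            }
            if (dev != "") { pkts[dev]++; bytes[dev] += len }
        }
        END { for (d in pkts) printf "%s %d %d\n", d, pkts[d], bytes[d] }
    ' | sort
}

# Possible use, as root:
#   trace_len_by_dev < /sys/kernel/debug/tracing/trace
```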
|