Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-nq1wvtky4mpu0nupjyar7sbw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have addch() for chars, add() for fixed size data, and addstr() for
variable length strings, use them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-0ap02fn2xtvpduj2j6b2o1j4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That doesn't chekcs malloc return and that, when using strbuf, if it
can't grow, just explodes away via die().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vr8qsjbwub7e892hpa9msz95@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-s87zi5d03m6rz622y1z6rlsa@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use instead the copy just made to tools/include/linux/.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-q736w12nwy98x5ox2hamp5ow@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is code in tools/ that is directly including this file from the
kernel, and this is verboten for a while, copy it so that the next csets
can fix this situation.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-e0r3nks2uai020ndghvxv5qw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit a674533078 ("tools lib traceevent: Split pevent_print_event()
into specific functionality functions") broke apart the function
pevent_print_event() into three functions.
The first function prints the comm, pid and CPU, the second prints the
timestamp.
But that commit added the printing of the CPU in the timestamp function,
which now causes pevent_print_event() to duplicate the CPU output.
Remove the redundant printing of the record's CPU from the timestamp
function.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a674533078 ("tools lib traceevent: Split pevent_print_event() into specific functionality functions")
Link: http://lkml.kernel.org/r/20160323101628.459375d2@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-w246stf7ponfamclsai6b9zo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This should die altogether, but for now lets remove a bit of this stuff,
as it is not used at all.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ade3n99xscldhg5mx2vzd8p3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-elxg25jd4dhwod4wqbko87qh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since none of the perf_event fields are used anymore, just the
perf_sample ones, and since this resolves to (map, symbol) from data
structures within struct thread, rename it to thread__resolve and make
the argument ordering similar to the one in machine__resolve().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2b33hs9bp550tezzlhl4kejh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we only deal with fields in the passed struct perf_sample move
this method to struct machine, that is where the perf_sample fields
will be resolved to a struct addr_location, i.e. thread, map, symbol,
etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-a1ww2lbm2vbuqsv4p7ilubu9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid parsing event->header.misc in many locations.
This will also allow setting perf.sample.{ip,cpumode} in a single place,
from tracepoint fields, as needed by 'perf kvm' with PPC guests, where
the guest hardware counters is not available at the host.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qp3yradhyt6q3wl895b1aat0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It _will_ be used, no sense in receiving it and nor fowarding it along.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ht8v5et209wuoh5o6nh9pzyq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
All over the tree.
Cc: David Ahern <dsahern@gmail.com>
cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-8nzhnokxyp8y4v7gf0j00oyb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Correctly document what is implemented for :ppp on Intel CPUs in recent
kernels.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1458575793-12091-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Comparing bits and bytes in numa benchmark assertion
I hit the issue on two socket Power8 machine presenting its numa nodes
as 0,1,16,17 (according to numactl). Therefore I got error (and hang of
parent process):
perf: bench/numa.c:296: bind_to_memnode: Assertion `!(g->p.nr_nodes > (int)sizeof(nodemask))' failed.
This is obviously false positive. We can fit all the 18 nodes into
bitfield of 8 bytes (long on 64b architecture).
Signed-off-by: Jakub Jelen <jakuje@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jakub Jelen <jjelen@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1458388687-24421-1-git-send-email-jakuje@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite
it being documented in the SDM, if we write 1 to that bit then we can get a #GP
fault.
Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing.
Also, there are no public events which use that bit, so remove ev_sel_ext
bit support for PCU.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].
This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.
Here the gory details of how the computation is done:
* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
PTSC period
* Jmax: max compute unit accumulated power which is indicated by
MSR_C001007b[MaxCpuSwPwrAcc]
* Jx/Jy: compute unit accumulated power which is indicated by
MSR_C001007a[CpuSwPwrAcc]
* Tx/Ty: the value of performance timestamp counter which is indicated
by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power
i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].
ii. Read the full range of the cumulative energy value from the new
MSR MaxCpuSwPwrAcc.
Jmax = value returned.
iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.
iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.
v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:
if (Jy < Jx) // Rollover has occurred
Jdelta = (Jy + Jmax) - Jx
else
Jdelta = Jy - Jx
PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)
Simple example:
root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHK include/generated/timeconst.h
CHK include/generated/bounds.h
CHK include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
SKIPPED include/generated/compile.h
Building modules, stage 2.
Kernel: arch/x86/boot/bzImage is ready (#40)
MODPOST 4225 modules
Performance counter stats for 'system wide':
183.44 mWatts power/power-pkg/
341.837270111 seconds time elapsed
root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10
Performance counter stats for 'system wide':
0.18 mWatts power/power-pkg/
10.012551815 seconds time elapsed
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jacob.w.shin@gmail.com
Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com
[ Fixed the modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
AMD CPU family 15h model 0x60 introduces a mechanism for measuring
accumulated power. It is used to report the processor power consumption
and support for it is indicated by CPUID Fn8000_0007_EDX[12].
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wan Zongshun <Vincent.Wan@amd.com>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com
[ Resolved conflict and moved the synthetic CPUID slot to 19. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Having the same code twice (and once quite ugly) is fragile.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds a per package timer which periodically updates the
memory bandwidth counters for the events that are currently active.
Current patch has a periodic timer every 1s since the SDM guarantees
that the counter will not overflow in 1s but this time can be definitely
improved by calibrating on the system. The overflow is really a function
of the max memory b/w that the socket can support, max counter value and
scaling factor.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/013b756c5006b1c4ca411f3ecf43ed52f19fbf87.1457723885.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
RMID could be allocated or deallocated as part of RMID recycling.
When an RMID is allocated for MBM event, the MBM counter needs to be
initialized because next time we read the counter we need the previous
value to account for total bytes that went to the memory controller.
Similarly, when RMID is deallocated we need to update the ->count
variable.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-6-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Includes all the core infrastructure to measure the total_bytes and
bandwidth.
We have per socket counters for both total system wide L3 external
bytes and local socket memory-controller bytes. The OS does MSR writes
to MSR_IA32_QM_EVTSEL and MSR_IA32_QM_CTR to read the counters and
uses the IA32_PQR_ASSOC_MSR to associate the RMID with the task. The
tasks have a common RMID for CQM (cache quality of service monitoring)
and MBM. Hence most of the scheduling code is reused from CQM.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Restructured rmid_read to not have an obvious hole, removed MBM_CNTR_MAX as its unused. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/abd7aac9a18d93b95b985b931cf258df0164746d.1457723885.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring)
and initializes the perf events and datastructures for monitoring the
memory b/w.
Its based on original patch series by Tony Luck and Kanaka Juvva.
Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. It supports both 'local
bandwidth' and 'total bandwidth' monitoring for the socket. Local
bandwidth measures the amount of data sent through the memory controller
on the socket and total b/w measures the total system bandwidth.
Extending the cache quality of service monitoring (CQM) we add two
more events to the perf infrastructure:
intel_cqm_llc/local_bytes - bytes sent through local socket memory controller
intel_cqm_llc/total_bytes - total L3 external bytes sent
The tasks are associated with a Resouce Monitoring ID (RMID) just like
in CQM and OS uses a MSR write to indicate the RMID of the task during
scheduling.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently CQM (cache quality of service monitoring) is grouping all
events belonging to same PID to use one RMID. However its not counting
all of these different events. Hence we end up with a count of zero
for all events other than the group leader.
The patch tries to address the issue by keeping a flag in the
perf_event.hw which has other CQM related fields. The field is updated
at event creation and during grouping.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
[peterz: Changed hw_perf_event::is_group_event to an int]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-2-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sasha reported:
[ 3494.030114] UBSAN: Undefined behaviour in kernel/events/ring_buffer.c:685:22
[ 3494.030647] shift exponent -1 is negative
Andrey spotted that this is because:
It happens if nr_pages = 0:
rb->page_order = ilog2(nr_pages);
Fix it by making both assignments conditional on nr_pages; since
otherwise they should both be 0 anyway, and will be because of the
kzalloc() used to allocate the structure.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20160129141751.GA407@worktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were two problems with the dynamic interrupt throttle mechanism,
both triggered by the same action.
When you (or perf_fuzzer) write a huge value into
/proc/sys/kernel/perf_event_max_sample_rate the computed
perf_sample_allowed_ns becomes 0. This effectively disables the whole
dynamic throttle.
This is fixed by ensuring update_perf_cpu_limits() never sets the
value to 0. However, we allow disabling of the dynamic throttle by
writing 100 to /proc/sys/kernel/perf_cpu_time_max_percent. This will
generate a warning in dmesg.
The second problem is that by setting the max_sample_rate to a huge
number, the adaptive process can take a few tries, since it halfs the
limit each time. Change that to directly compute a new value based on
the observed duration.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Interrupt throttling is normally only done against
sysctl_perf_event_sample_rate. This means that if that number is too
high (for whatever reason) you can lock up your machine.
We have, however, a dynamic throttling scheme too, but for that to
work, we need to add a callback to the interrupt handler, IBS did not
have this, so add it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While tracing the IBS bits I saw the NMI hitting between clearing
IBS_STARTING and the actual MSR writes to disable the counter.
Since IBS_STARTING was cleared, the handler assumed these were spurious
NMIs and because STOPPING wasn't set yet either, insta-triggered an
"Unknown NMI".
Cure this by clearing IBS_STARTING after disabling the hardware.
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the IBS IRQ handler get a !0 return from perf_event_overflow;
meaning it should throttle the event, it only disables it, it doesn't
call perf_ibs_stop().
This confuses the state machine, as we'll use pmu::start() ->
perf_ibs_start() to unthrottle.
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dvyukov@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/20160311142346.GE6344@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Its possible to IOC_PERIOD while the event is throttled, this would
re-start the event and the next tick would then try to unthrottle it,
and find the event wasn't actually stopped anymore.
This would tickle a WARN in the x86-pmu code which isn't expecting to
start a !stopped event.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dvyukov@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160310143924.GR6356@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The notifier is missing the CPU_DOWN_FAILED transition. That leaves the
heartbeat disabled when CPU_DOWN_PREPARE fails.
It also does not handle the FROZEN transition variants. That might not be an
issue for UV, but it's inconsistent.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
nr_cpu_ids can be limited on the command line via nr_cpus=. That can break the
logical package management because it results in a smaller number of packages,
but the cpus to online are occupying the full package space as the hyper
threads are enumerated after the physical cores typically.
total_cpus is the real possible cpu space not limited by nr_cpus command line
and gives us the proper number of packages.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Xiong Zhou <jencce.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andreas Herrmann <aherrmann@suse.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603181254330.3978@nanos
As per the comment in the code; due to BIOS it is sometimes impossible to know
if there actually are smp siblings until the machine is fully enumerated. So
we rather overestimate the number of possible packages.
Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: aherrmann@suse.com
Cc: jencce.kernel@gmail.com
Cc: bp@alien8.de
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20160318150538.611014173@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
That first branch testing pkg against __max_logical_packages is wrong,
because if the first pkg id is larger, then the find_first_zero will
find us logical package id 0. However, if the second pkg id is indeed
0, we'll again claim it without testing if it was already taken.
Also, it fails to print the mapping.
Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <jencce.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: aherrmann@suse.com
Cc: bp@alien8.de
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Link: http://lkml.kernel.org/r/20160318150538.482393396@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Store DSO's .text offset into DSO, used for VDSOs and will also be used for
other needs, like handling kernel modules.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-2-git-send-email-wangnan0@huawei.com
[ Extracted from larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As it is used by several other tools, better move it outside tools/perf.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-34s9kue3xq9w5mijdmfrfx8s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Harry reported, that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().
When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine an wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.
I should have known that, but I completely overlooked it being concentrated on
the locking issues around the vectors. And the existance of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.
We have to look at two cases:
1) The move_in_progress flag of the interrupt is set
This means the ioapic has been updated with the new vector, but it has not
fired yet. In theory there is a race:
set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
i.e. it's raised on the old vector.
So if the target cpu cannot handle that interrupt before the old vector is
cleaned up, we get a spurious interrupt and in the worst case the ioapic
irq line becomes stale, but my experiments so far have only resulted in
spurious interrupts.
But in case of cpu hotplug this should be a non issue because if the
affinity update happens right before all cpus rendevouz in stop machine,
there is no way that the interrupt can be blocked on the target cpu because
all cpus loops first with interrupts enabled in stop machine, so the old
vector is not yet cleaned up when the interrupt fires.
So the only way to run into this issue is if the delivery of the interrupt
on the apic/system bus would be delayed beyond the point where the target
cpu disables interrupts in stop machine. I doubt that it can happen, but at
least there is a theroretical chance. Virtualization might be able to
expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
hardware.
I've spent quite some time over the weekend to enforce that situation,
though I was not able to trigger the delayed case.
2) The move_in_progress flag is not set and the old_domain cpu mask is not
empty.
That means, that an interrupt was delivered after the change and the
cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
responded to it yet.
In both cases we can assume that the next interrupt will arrive on the new
vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
mask.
Fixes: 98229aa36c "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior <harryjr@outlook.fr>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The topology_core_cpumask is used to find a neighbour cpu in
calibrate_delay_is_known(). It might not be allocated at the first invocation
of that function on the boot cpu, when CONFIG_CPUMASK_OFFSTACK is set.
The mask is allocated later in native_smp_prepare_cpus. As a consequence the
underlying find_next_bit() call dereferences a NULL pointer.
Add a proper check to prevent this.
Fixes: c25323c073 "x86/tsc: Use topology functions"
Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603180843270.3978@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>