Commit Graph

35571 Commits

Author SHA1 Message Date
Ilya Leoshkevich b1828f0b04 bpf: Add BTF_KIND_FLOAT support
On the kernel side, introduce a new btf_kind_operations. It is
similar to that of BTF_KIND_INT, however, it does not need to
handle encodings and bit offsets. Do not implement printing, since
the kernel does not know how to format floating-point values.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-7-iii@linux.ibm.com
2021-03-04 17:58:16 -08:00
Yonghong Song 06dcdcd4b9 bpf: Add arraymap support for bpf_for_each_map_elem() helper
This patch added support for arraymap and percpu arraymap.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204928.3885192-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song 314ee05e2f bpf: Add hashtab support for bpf_for_each_map_elem() helper
This patch added support for hashmap, percpu hashmap,
lru hashmap and percpu lru hashmap.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204927.3885020-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song 69c087ba62 bpf: Add bpf_for_each_map_elem() helper
The bpf_for_each_map_elem() helper is introduced which
iterates all map elements with a callback function. The
helper signature looks like
  long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
and for each map element, the callback_fn will be called. For example,
like hashmap, the callback signature may look like
  long callback_fn(map, key, val, callback_ctx)

There are two known use cases for this. One is from upstream ([1]) where
a for_each_map_elem helper may help implement a timeout mechanism
in a more generic way. Another is from our internal discussion
for a firewall use case where a map contains all the rules. The packet
data can be compared to all these rules to decide allow or deny
the packet.

For array maps, users can already use a bounded loop to traverse
elements. Using this helper can avoid using bounded loop. For other
type of maps (e.g., hash maps) where bounded loop is hard or
impossible to use, this helper provides a convenient way to
operate on all elements.

For callback_fn, besides map and map element, a callback_ctx,
allocated on caller stack, is also passed to the callback
function. This callback_ctx argument can provide additional
input and allow to write to caller stack for output.

If the callback_fn returns 0, the helper will iterate through next
element if available. If the callback_fn returns 1, the helper
will stop iterating and returns to the bpf program. Other return
values are not used for now.

Currently, this helper is only available with jit. It is possible
to make it work with interpreter with so effort but I leave it
as the future work.

[1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song 282a0f46d6 bpf: Change return value of verifier function add_subprog()
Currently, verifier function add_subprog() returns 0 for success
and negative value for failure. Change the return value
to be the subprog number for success. This functionality will be
used in the next patch to save a call to find_subprog().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204924.3884848-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song 1435137573 bpf: Refactor check_func_call() to allow callback function
Later proposed bpf_for_each_map_elem() helper has callback
function as one of its arguments. This patch refactored
check_func_call() to permit callback function which sets
callee state. Different callback functions may have
different callee states.
There is no functionality change for this patch.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204923.3884627-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song bc2591d63f bpf: Factor out verbose_invalid_scalar()
Factor out the function verbose_invalid_scalar() to verbose
print if a scalar is not in a tnum range. There is no
functionality change and the function will be used by
later patch which introduced bpf_for_each_map_elem().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204922.3884375-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song efdb22de7d bpf: Factor out visit_func_call_insn() in check_cfg()
During verifier check_cfg(), all instructions are
visited to ensure verifier can handle program control flows.
This patch factored out function visit_func_call_insn()
so it can be reused in later patch to visit callback function
calls. There is no functionality change for this patch.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204920.3884136-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Brendan Jackman e6ac593372 bpf: Rename fixup_bpf_calls and add some comments
This function has become overloaded, it actually does lots of diverse
things in a single pass. Rename it to avoid confusion, and add some
concise commentary.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210217104509.2423183-1-jackmanb@google.com
2021-02-26 12:05:07 -08:00
Dmitrii Banshchikov 523a4cf491 bpf: Use MAX_BPF_FUNC_REG_ARGS macro
Instead of using integer literal here and there use macro name for
better context.

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210225202629.585485-1-me@ubique.spb.ru
2021-02-26 11:59:53 -08:00
Song Liu bc235cdb42 bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]
BPF helpers bpf_task_storage_[get|delete] could hold two locks:
bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
these helpers from fentry/fexit programs on functions in bpf_*_storage.c
may cause deadlock on either locks.

Prevent such deadlock with a per cpu counter, bpf_task_storage_busy. We
need this counter to be global, because the two locks here belong to two
different objects: bpf_local_storage_map and bpf_local_storage. If we
pick one of them as the owner of the counter, it is still possible to
trigger deadlock on the other lock. For example, if bpf_local_storage_map
owns the counters, it cannot prevent deadlock on bpf_local_storage->lock
when two maps are used.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210225234319.336131-3-songliubraving@fb.com
2021-02-26 11:51:48 -08:00
Song Liu a10787e6d5 bpf: Enable task local storage for tracing programs
To access per-task data, BPF programs usually creates a hash table with
pid as the key. This is not ideal because:
 1. The user need to estimate the proper size of the hash table, which may
    be inaccurate;
 2. Big hash tables are slow;
 3. To clean up the data properly during task terminations, the user need
    to write extra logic.

Task local storage overcomes these issues and offers a better option for
these per-task data. Task local storage is only available to BPF_LSM. Now
enable it for tracing programs.

Unlike LSM programs, tracing programs can be called in IRQ contexts.
Helpers that access task local storage are updated to use
raw_spin_lock_irqsave() instead of raw_spin_lock_bh().

Tracing programs can attach to functions on the task free path, e.g.
exit_creds(). To avoid allocating task local storage after
bpf_task_storage_free(). bpf_task_storage_get() is updated to not allocate
new storage when the task is not refcounted (task->usage == 0).

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210225234319.336131-2-songliubraving@fb.com
2021-02-26 11:51:47 -08:00
Linus Torvalds d310ec03a3 The performance event updates for v5.12 are:
- Add CPU-PMU support for Intel Sapphire Rapids CPUs
 
  - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter
    sampling event feedback. Not used yet, but is intended for Golden Cove
    CPU-PMU, which can provide both the instruction latency and the cache
    latency information for memory profiling events.
 
  - Remove experimental, default-disabled perfmon-v4 counter_freezing support
    that could only be enabled via a boot option. The hardware is hopelessly
    broken, we'd like to make sure nobody starts relying on this, as it would
    only end in tears.
 
  - Fix energy/power events on Intel SPR platforms
 
  - Simplify the uprobes resume_execution() logic
 
  - Misc smaller fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtf7kRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iJ2xAAvygKF8hm/UAGyT2R3iEruO49wRrmUfgt
 13iBBA1DotKw2b8F5UN5MqjfwS8UgGKuAd8agvQ6XXANpnJ5mpy0nrzgjEXUx4j+
 sQUqL7vxSdZ5J3kKblSZ4QoMzLVYSUkEDmw818vsa4eFWN8z58FJsv+ySegIFbXx
 +I3hF1O9a8MERZBUz4T5xHlgcbSDGEX6EvYRcO+zZ0rXfARfo9StfHYv1V53j6iY
 EOotFEKEn/5naczAd/sQo1SE1IgHtX2cbjOaKF7LulgEwZQWHpdKq0gww6nFK5yz
 XMSE9oXAFXRkRCJbrSqC0Dvrrf8hdlxWbKYbj9L7XILoxw199AdOBDbliJm6P/UH
 6+JSEu/N4R0TFYc7TX6yef7ncw12e+64USjKOlWWwww97rVWWH1/tFTdlXhS6s+d
 jVI3yEECKyZlddrDdsetRdUj+QKyZQfDqbMXPXiDTv9P6AFqBvNLZYT0UPU3akk5
 jXueHJQYSSgqnN+eRaIwvm4ZYWa031YHJXxiq2E89RnzL4JJArBYaddpukgxTYka
 c6Tn8L7f4zP5Bghu7hHv5Vy69i1N/3YvzUoYc6ljjmapgAJzxzq/yoEKrBlKnjtA
 MrstHhnwnPJl+PKjlbLpjl74rtcCiKJxjVhm+a5UbEcYoVuzJ86lmQK2WrLaoCTU
 B/zFplUF8C4=
 =BCcg
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance event updates from Ingo Molnar:

 - Add CPU-PMU support for Intel Sapphire Rapids CPUs

 - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer
   two-parameter sampling event feedback. Not used yet, but is intended
   for Golden Cove CPU-PMU, which can provide both the instruction
   latency and the cache latency information for memory profiling
   events.

 - Remove experimental, default-disabled perfmon-v4 counter_freezing
   support that could only be enabled via a boot option. The hardware is
   hopelessly broken, we'd like to make sure nobody starts relying on
   this, as it would only end in tears.

 - Fix energy/power events on Intel SPR platforms

 - Simplify the uprobes resume_execution() logic

 - Misc smaller fixes.

* tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Fix psys-energy event on Intel SPR platform
  perf/x86/rapl: Only check lower 32bits for RAPL energy counters
  perf/x86/rapl: Add msr mask support
  perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[]
  perf/x86/intel: Support CPUID 10.ECX to disable fixed counters
  perf/x86/intel: Add perf core PMU support for Sapphire Rapids
  perf/x86/intel: Filter unsupported Topdown metrics event
  perf/x86/intel: Factor out intel_update_topdown_event()
  perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT
  perf/intel: Remove Perfmon-v4 counter_freezing support
  x86/perf: Use static_call for x86_pmu.guest_get_msrs
  perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info
  perf/x86/intel/uncore: Store the logical die id instead of the physical die id.
  x86/kprobes: Do not decode opcode in resume_execution()
2021-02-21 12:49:32 -08:00
Linus Torvalds 657bd90c93 Scheduler updates for v5.12:
[ NOTE: unfortunately this tree had to be freshly rebased today,
         it's a same-content tree of 82891be90f3c (-next published)
         merged with v5.11.
 
         The main reason for the rebase was an authorship misattribution
         problem with a new commit, which we noticed in the last minute,
         and which we didn't want to be merged upstream. The offending
         commit was deep in the tree, and dependent commits had to be
         rebased as well. ]
 
 - Core scheduler updates:
 
   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full),
     to allow distros to build a PREEMPT kernel but fall back to
     close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling
     behavior via a boot time selection.
 
     There's also the /debug/sched_debug switch to do this runtime.
 
     This feature is implemented via runtime patching (a new variant of static calls).
 
     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.
 
     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )
 
     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority
     of distributions, are supposed to be unaffected.
 
   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.
 
     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address
     the underlying issue of missed preemption events. These are the
     initial fixes that should fix current incarnations of the bug.
 
   - Clean up rbtree usage in the scheduler, by providing & using the following
     consistent set of rbtree APIs:
 
      partial-order; less() based:
        - rb_add(): add a new entry to the rbtree
        - rb_add_cached(): like rb_add(), but for a rb_root_cached
 
      total-order; cmp() based:
        - rb_find(): find an entry in an rbtree
        - rb_find_add(): find an entry, and add if not found
 
        - rb_find_first(): find the first (leftmost) matching entry
        - rb_next_match(): continue from rb_find_first()
        - rb_for_each(): iterate a sub-tree using the previous two
 
   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass.
     This is a 4-commit series where each commit improves one aspect of the idle
     sibling scan logic.
 
   - Improve the cpufreq cooling driver by getting the effective CPU utilization
     metrics from the scheduler
 
   - Improve the fair scheduler's active load-balancing logic by reducing the number
     of active LB attempts & lengthen the load-balancing interval. This improves
     stress-ng mmapfork performance.
 
   - Fix CFS's estimated utilization (util_est) calculation bug that can result in
     too high utilization values
 
 - Misc updates & fixes:
 
    - Fix the HRTICK reprogramming & optimization feature
    - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
    - Reduce dl_add_task_root_domain() overhead
    - Fix uprobes refcount bug
    - Process pending softirqs in flush_smp_call_function_from_idle()
    - Clean up task priority related defines, remove *USER_*PRIO and
      USER_PRIO()
    - Simplify the sched_init_numa() deduplication sort
    - Documentation updates
    - Fix EAS bug in update_misfit_status(), which degraded the quality
      of energy-balancing
    - Smaller cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtHBsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1itgg/+NGed12pgPjYBzesdou60Lvx7LZLGjfOt
 M1F1EnmQGn/hEH2fCY6ZoqIZQTVltm7GIcBNabzYTzlaHZsdtyuDUJBZyj19vTlk
 zekcj7WVt+qvfjChaNwEJhQ9nnOM/eohMgEOHMAAJd9zlnQvve7NOLQ56UDM+kn/
 9taFJ5ZPvb4avP6C5p3KivvKex6Bjof/Tl0m3utpNyPpI/qK3FyGxwdgCxU0yepT
 ABWQX5ZQCufFvo1bgnBPfqyzab4MqhoM3bNKBsLQfuAlssG1xRv4KQOev4dRwrt9
 pXJikV5C9yez5d2lGe5p0ltH5IZS/l9x2yI/ZQj3OUDTFyV1ic6WfFAqJgDzVF8E
 i/vvA4NPQiI241Bkps+ErcCw4aVOgiY6TWli74cHjLUIX0+As6aHrFWXGSxUmiHB
 WR+B8KmdfzRTTlhOxMA+cvlpZcKCfxWkJJmXzr/lDZzIuKPqM3QCE2wD9sixkfVo
 JNICT0IvZghWOdbMEfZba8Psh/e2LVI9RzdpEiuYJz1ZrVlt1hO0M6jBxY0hMz9n
 k54z81xODw0a8P2FHMtpmB1vhAeqCmvwA6DO8z0Oxs0DFi+KM2bLf2efHsCKafI+
 Bm5v9YFaOk/55R76hJVh+aYLlyFgFkKd+P/niJTPDnxOk3SqJuXvTrql1HeGHkNr
 kYgQa23dsZk=
 =pyaG
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler updates:

   - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
     preempt=none/voluntary/full boot options (default: full), to allow
     distros to build a PREEMPT kernel but fall back to close to
     PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
     a boot time selection.

     There's also the /debug/sched_debug switch to do this runtime.

     This feature is implemented via runtime patching (a new variant of
     static calls).

     The scope of the runtime patching can be best reviewed by looking
     at the sched_dynamic_update() function in kernel/sched/core.c.

     ( Note that the dynamic none/voluntary mode isn't 100% identical,
       for example preempt-RCU is available in all cases, plus the
       preempt count is maintained in all models, which has runtime
       overhead even with the code patching. )

     The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
     majority of distributions, are supposed to be unaffected.

   - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
     was found via rcutorture triggering a hang. The bug is that
     rcu_idle_enter() may wake up a NOCB kthread, but this happens after
     the last generic need_resched() check. Some cpuidle drivers fix it
     by chance but many others don't.

     In true 2020 fashion the original bug fix has grown into a 5-patch
     scheduler/RCU fix series plus another 16 RCU patches to address the
     underlying issue of missed preemption events. These are the initial
     fixes that should fix current incarnations of the bug.

   - Clean up rbtree usage in the scheduler, by providing & using the
     following consistent set of rbtree APIs:

       partial-order; less() based:
         - rb_add(): add a new entry to the rbtree
         - rb_add_cached(): like rb_add(), but for a rb_root_cached

       total-order; cmp() based:
         - rb_find(): find an entry in an rbtree
         - rb_find_add(): find an entry, and add if not found

         - rb_find_first(): find the first (leftmost) matching entry
         - rb_next_match(): continue from rb_find_first()
         - rb_for_each(): iterate a sub-tree using the previous two

   - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
     single pass. This is a 4-commit series where each commit improves
     one aspect of the idle sibling scan logic.

   - Improve the cpufreq cooling driver by getting the effective CPU
     utilization metrics from the scheduler

   - Improve the fair scheduler's active load-balancing logic by
     reducing the number of active LB attempts & lengthen the
     load-balancing interval. This improves stress-ng mmapfork
     performance.

   - Fix CFS's estimated utilization (util_est) calculation bug that can
     result in too high utilization values

  Misc updates & fixes:

   - Fix the HRTICK reprogramming & optimization feature

   - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code

   - Reduce dl_add_task_root_domain() overhead

   - Fix uprobes refcount bug

   - Process pending softirqs in flush_smp_call_function_from_idle()

   - Clean up task priority related defines, remove *USER_*PRIO and
     USER_PRIO()

   - Simplify the sched_init_numa() deduplication sort

   - Documentation updates

   - Fix EAS bug in update_misfit_status(), which degraded the quality
     of energy-balancing

   - Smaller cleanups"

* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  sched,x86: Allow !PREEMPT_DYNAMIC
  entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
  entry: Explicitly flush pending rcuog wakeup before last rescheduling point
  rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
  rcu/nocb: Perform deferred wake up before last idle's need_resched() check
  rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
  sched/features: Distinguish between NORMAL and DEADLINE hrtick
  sched/features: Fix hrtick reprogramming
  sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
  uprobes: (Re)add missing get_uprobe() in __find_uprobe()
  smp: Process pending softirqs in flush_smp_call_function_from_idle()
  sched: Harden PREEMPT_DYNAMIC
  static_call: Allow module use without exposing static_call_key
  sched: Add /debug/sched_preempt
  preempt/dynamic: Support dynamic preempt with preempt= boot option
  preempt/dynamic: Provide irqentry_exit_cond_resched() static call
  preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
  preempt/dynamic: Provide cond_resched() and might_resched() static calls
  preempt: Introduce CONFIG_PREEMPT_DYNAMIC
  static_call: Provide DEFINE_STATIC_CALL_RET0()
  ...
2021-02-21 12:35:04 -08:00
Linus Torvalds 9eef023345 These are the v5.12 updates for the locking subsystem:
- Core locking primitives updates:
 
     - Remove mutex_trylock_recursive() from the API - no users left
     - Simplify + constify the futex code a bit
 
  - Lockdep updates:
 
     - Teach lockdep about local_lock_t
     - Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for
       potentially unsafe IRQ mask restoration patterns. (I.e.
       calling raw_local_irq_restore() with IRQs enabled.)
     - Add wait context self-tests
     - Fix graph lock corner case corrupting internal data structures
     - Fix noinstr annotations
 
  - LKMM updates:
 
     - Simplify the litmus tests
     - Documentation fixes
 
  - KCSAN updates:
 
     - Re-enable KCSAN instrumentation in lib/random32.c
 
  - Misc fixes:
 
     - Don't branch-trace static label APIs
     - DocBook fix
     - Remove stale leftover empty file
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAs//sRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1im+g/9G8taVrfiBQ7hg4PoEo28w8fzu5pGBOWd
 rYzUNJO96dW262FbQE6txGDBeGEahnVTz1sGwqKcy1NfZgQBCWj4uZMOluyECrY3
 SV8Iccz2+M6CV+pyjM6Agm7OrgEHxlB/oorZy3TD6s2YeuR6nVGfO3vAXbNNeAsk
 N8TR5mKY8ELbKXkjrc4KauOOiaqsQmVMuV/l/1DLoydDxATYq4Fczh0lcIdwMtYB
 pqzWAKa0Qy2mKcHXe2YMYjddn2JEcDWNGJCsmZTa6m45aaAW1XyICLLxcQ2X8aL+
 aj9rxYTBkZl9vAjrICfbJTtYku6fN48JiDoNRQxUShGVmVKAlHxYQ4vZ7dJz0NHz
 EdRrd9JIr25ImXNHlX2KCKGc/aUm4TvDtNVXCdxVlZGwnEEF8J5VocWKRKmXmA1W
 MkAvPnXnynqRfcMkFaTtTfdMTan41uEixwEnUy++JTuNSMx2ie3VGMC0MgxvTBiH
 iKN5iVtZVa1mUN2593Jd1qdZvGQMeIydMj+WaT4xh5hptjLCGLg4yPgYuoO7vNMT
 uEfv8oODvTN8BqEixNP1Ef9pzxujuSiPoO4ZO4DNnbJJZVw1TwAZIK5Zz1wR1Zso
 Wf1LKPaEOyqz5cFAJ/OxcnxvxMv3fat0vhLNzJlBEFEgKmfRhbsQVUNNL1AcdMJA
 +Npbj/v5seo=
 =BYju
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Core locking primitives updates:
    - Remove mutex_trylock_recursive() from the API - no users left
    - Simplify + constify the futex code a bit

  Lockdep updates:
    - Teach lockdep about local_lock_t
    - Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for
      potentially unsafe IRQ mask restoration patterns. (I.e.
      calling raw_local_irq_restore() with IRQs enabled.)
    - Add wait context self-tests
    - Fix graph lock corner case corrupting internal data structures
    - Fix noinstr annotations

  LKMM updates:
    - Simplify the litmus tests
    - Documentation fixes

  KCSAN updates:
    - Re-enable KCSAN instrumentation in lib/random32.c

  Misc fixes:
    - Don't branch-trace static label APIs
    - DocBook fix
    - Remove stale leftover empty file"

* tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  checkpatch: Don't check for mutex_trylock_recursive()
  locking/mutex: Kill mutex_trylock_recursive()
  s390: Use arch_local_irq_{save,restore}() in early boot code
  lockdep: Noinstr annotate warn_bogus_irq_restore()
  locking/lockdep: Avoid unmatched unlock
  locking/rwsem: Remove empty rwsem.h
  locking/rtmutex: Add missing kernel-doc markup
  futex: Remove unneeded gotos
  futex: Change utime parameter to be 'const ... *'
  lockdep: report broken irq restoration
  jump_label: Do not profile branch annotations
  locking: Add Reviewers
  locking/selftests: Add local_lock inversion tests
  locking/lockdep: Exclude local_lock_t from IRQ inversions
  locking/lockdep: Clean up check_redundant() a bit
  locking/lockdep: Add a skip() function to __bfs()
  locking/lockdep: Mark local_lock_t
  locking/selftests: More granular debug_locks_verbose
  lockdep/selftest: Add wait context selftests
  tools/memory-model: Fix typo in klitmus7 compatibility table
  ...
2021-02-21 12:12:01 -08:00
Linus Torvalds d089f48fba These are the latest RCU updates for v5.12:
- Documentation updates.
 
  - Miscellaneous fixes.
 
  - kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
    addresses to more easily locate bugs.  This has a couple of RCU-related commits,
    but is mostly MM.  Was pulled in with akpm's agreement.
 
  - Per-callback-batch tracking of numbers of callbacks,
    which enables better debugging information and smarter
    reactions to large numbers of callbacks.
 
  - The first round of changes to allow CPUs to be runtime switched from and to
    callback-offloaded state.
 
  - CONFIG_PREEMPT_RT-related changes.
 
  - RCU CPU stall warning updates.
 
  - Addition of polling grace-period APIs for SRCU.
 
  - Torture-test and torture-test scripting updates, including a "torture everything"
    script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
    Plus does an allmodconfig build.
 
  - nolibc fixes for the torture tests
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAs9lgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j/axAAsqIvarDD6OLmgcOPCyWSvfG6LsIFgqI9
 CY0JdQBtvFBTvE8Q2No5ktbLVmuYsBh0dGeFkv4HQZJyRlr7mjstVMNN4SeBDVIS
 /+zZO1wlwzaXfKQopLctTK1O/UzFqIN2sqyzA3nLzGGj8DqgxXJyreJ10feK5XM+
 6ttZPd1qm4hqtpA22ZEODbct5OFqZuvnK8VNqBb2YHabA1rasUXbIEJPBpsuv/W2
 l9W5AGP4erdOFm3nHJxiCpvLJtgHy4njvw0HJp5f99Abj6OVeAzw5kFjvRB3n1Qd
 ayKyTw8T/1mfmkjvYkGsMAqhEmqwXcryFX0dR/14/XPdXyjPhZlbkz+MfRKrn4NT
 LBJPX+MlX9lVFWBNR9HMe2o/083+gorlwZt9wtyt0OBBGGgudYo4uKNdbyy6tB3Y
 Gb98P2vtVSO24EsQce6M+ppHN4TgVBd6id82MQxNuFw+PQJdBiCY0JJfNQApbAry
 cIKOchSSR2SkJHlAevNVaKAeiTnkAXd1jDBKtCnvCqOUyvtnhE3rQCqwS5xT2Cno
 oQydpudwBKT7uO/GUyS0ESErjHuy9zhExNSYD0ydxlBCrGbzrrgPg57ntXHA1die
 mtFyvc2tfT/AshWRNYiuCG+eaUG3qK7n7jN7Vc6/K5DR4GMb5tOhL9wPx2ljCRGu
 Z8WDg0pJGz4=
 =31yj
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "These are the latest RCU updates for v5.12:

   - Documentation updates.

   - Miscellaneous fixes.

   - kfree_rcu() updates: Addition of mem_dump_obj() to provide
     allocator return addresses to more easily locate bugs. This has a
     couple of RCU-related commits, but is mostly MM. Was pulled in with
     akpm's agreement.

   - Per-callback-batch tracking of numbers of callbacks, which enables
     better debugging information and smarter reactions to large numbers
     of callbacks.

   - The first round of changes to allow CPUs to be runtime switched
     from and to callback-offloaded state.

   - CONFIG_PREEMPT_RT-related changes.

   - RCU CPU stall warning updates.

   - Addition of polling grace-period APIs for SRCU.

   - Torture-test and torture-test scripting updates, including a
     "torture everything" script that runs rcutorture, locktorture,
     scftorture, rcuscale, and refscale. Plus does an allmodconfig
     build.

   - nolibc fixes for the torture tests"

* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
  percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
  rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
  mm: Make mem_obj_dump() vmalloc() dumps include start and length
  mm: Make mem_dump_obj() handle vmalloc() memory
  mm: Make mem_dump_obj() handle NULL and zero-sized pointers
  mm: Add mem_dump_obj() to print source of memory block
  tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
  tools/nolibc: Fix position of -lgcc in the documented example
  tools/nolibc: Emit detailed error for missing alternate syscall number definitions
  tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
  tools/nolibc: Get timeval, timespec and timezone from linux/time.h
  tools/nolibc: Implement poll() based on ppoll()
  tools/nolibc: Implement fork() based on clone()
  tools/nolibc: Make getpgrp() fall back to getpgid(0)
  tools/nolibc: Make dup2() rely on dup3() when available
  tools/nolibc: Add the definition for dup()
  rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
  torture: Maintain torture-specific set of CPUs-online books
  torture: Clean up after torture-test CPU hotplugging
  rcutorture: Make object_debug also double call_rcu() heap object
  ...
2021-02-21 12:04:41 -08:00
Linus Torvalds 3f6ec19f2d Time and timer updates:
- Instead of new drivers remove tango, sirf, u300 and atlas drivers
  - Add suspend/resume support for microchip pit64b
  - The usual fixes, improvements and cleanups here and there
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmAqjecTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodVTD/9UuKlOifRBNd4ECR+yF65MzLfjqNHU
 j76E3Dzuf4QCbXTjmQAsadaqQ+w9l8Ie7OVT51XlmTczEBkJiV3FOGLVJeRun6R4
 7OLF3VOYyBUtMoRdHzyXaYOTbsOK9gZitucDeLCQhvKhDCkVnKKFXNJR+TTkSYM3
 xDBLwuI7uuHWHyh0+W+3SI1pTiEA4yMe5ZbqqoJbjGQapr3Eao+nyjd1aa3ERb2f
 PtS7UVQ69QowRqq6DQyZk0yKit8J3HZnHfCPH/T6eXsxGnui36GiUnTGCMhLMZpD
 Xvl/5cjqQuKjgt2093t8nGiumOGBOfrb8uvc/qMW777DzFe/VJtXrC/7pySVLAhK
 oc9Swj0iX/WPARzlpyOk3lfpDMzv6qyjMNJIXcnav2lrknITp+TMORKWOADB03UV
 sswlN7YFTrNe7d7uxEdybKkNX6bUwgOzo2m69A1IdSXwKPzYkZQmku6Y7GnzYErZ
 aiJiZl858VB9g24ROKLt/uQTarzYCS0sjcdnDgO1KSR7zKHZ4iUpd3zucd3mlmUo
 fGTMIbCqL/gzl4Zcl6njvzMVfJeMzOeDiQ41wCyYnOsXlIKmWNi1rONdGZcyDYvN
 bOiGVUMKicEZwBlSZQQ1GdR9eGf8/Ix5cTlynZepN3dkHUDc7SU+OvQZdg4Yd/RY
 BsquFDHnt4gl+A==
 =xZQt
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Time and timer updates:

   - Instead of new drivers remove tango, sirf, u300 and atlas drivers

   - Add suspend/resume support for microchip pit64b

   - The usual fixes, improvements and cleanups here and there"

* tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timens: Delete no-op time_ns_init()
  alarmtimer: Update kerneldoc
  clocksource/drivers/timer-microchip-pit64b: Add clocksource suspend/resume
  clocksource/drivers/prima: Remove sirf prima driver
  clocksource/drivers/atlas: Remove sirf atlas driver
  clocksource/drivers/tango: Remove tango driver
  clocksource/drivers/u300: Remove the u300 driver
  dt-bindings: timer: nuvoton: Clarify that interrupt of timer 0 should be specified
  clocksource/drivers/davinci: Move pr_fmt() before the includes
  clocksource/drivers/efm32: Drop unused timer code
2021-02-21 11:55:43 -08:00
Linus Torvalds b5183bc94b Updates for the irq subsystem:
- The usual new irq chip driver (Realtek RTL83xx)
   - Removal of sirfsoc and tango irq chip drivers
   - Conversion of the sun6i chip support to hierarchical irq domains
   - The usual fixes, improvements and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmAqjRITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYob0ND/9ghLC0XF7gSrIFBxNXczQcckd0Ev8y
 gt7EJMxy9+UG/krEmLk4p5FkkKaUQy4v8PMqhDpI0pR7Fdm2oaOCgPRHsNdVYr5J
 g7FawE99qPiRXaR29rpfX6qIBRP3rJ4/MS90YlcXGvKbgPtqCXStEWrIyu1/1Q8A
 3BdfKL3K15/TeWeu9075yHX8V4Ovss6Birc0/XNtQWiHNk84GAZVIQPjfdHEu83j
 Kv4QxlrfSDjCvB016YKL04kjrwzUXxksvGnyjFXNXFZwg0bVtLTujSdkXajUBQjv
 1ZpcEM+gBPdeOU99ELYeaQ9B0Di2mNjya5TBttmKdKW/nQ01BkWUouoqyBRqjpvd
 +2uXaad/D7iKUYXqXtSim8OpwBLb5THQdG4qn1vgYZOUxWu9DKdtZEO3hw8edgjN
 zJEQFI8yBeDtcAjKTqm0C2fomlWPhDM0A97EHmV8LiQpz0PyY9q8Oogg51t1RggH
 7w9+YjjrGkaPifZXX+YogJT/CbFanK6lOUIFg/qro1QcicCsEbWrTaHUOwdpWpHr
 B4FKZIfrlnrY7yMC7aClPQSJr3iUXBUjfWhRKvYiwOLTabfUWz+HRFA++R8vytbL
 RvEZnhuvi3eyRn91yiapQM/f69yC1yslTZHy4/Ww9kvtOoawQpmegnZ3dLMhwBN1
 aFgQi4U9Z7G2eA==
 =rM8s
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the irq subsystem:

   - The usual new irq chip driver (Realtek RTL83xx)

   - Removal of sirfsoc and tango irq chip drivers

   - Conversion of the sun6i chip support to hierarchical irq domains

   - The usual fixes, improvements and cleanups all over the place"

* tag 'irq-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/imx: IMX_INTMUX should not default to y, unconditionally
  irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
  irqchip/csky-mpintc: Prevent selection on unsupported platforms
  irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller
  dt-bindings: interrupt-controller: Add Realtek RTL838x/RTL839x support
  irqchip/ls-extirq: add IRQCHIP_SKIP_SET_WAKE to the irqchip flags
  genirq: Use new tasklet API for resend_tasklet
  dt-bindings: qcom,pdc: Add compatible for SM8350
  dt-bindings: qcom,pdc: Add compatible for SM8250
  irqchip/sun6i-r: Add wakeup support
  irqchip/sun6i-r: Use a stacked irqchip driver
  dt-bindings: irq: sun6i-r: Add a compatible for the H3
  dt-bindings: irq: sun6i-r: Split the binding from sun7i-nmi
  irqchip/gic-v3: Fix typos in PMR/RPR SCR_EL3.FIQ handling explanation
  irqchip: Remove sirfsoc driver
  irqchip: Remove sigma tango driver
2021-02-21 11:53:06 -08:00
Linus Torvalds 582cd91f69 for-5.12/block-2021-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtmIwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplzLEAC5O+3rBM8QuiJdo39Yppmuw4hDJ6hOKynP
 EJQLKQQi0VfXgU+MprGvcbpFYmNbgICvUICQkEzJuk++kPCu/BJtJz0yErQeLgS+
 RdXiPV6enbF7iRML5TVRTr1q/z7sJMXcIIJ8Pz/rU/JNfGYExVd0WfnEY9mp1jOt
 Bl9V+qyTazdP+Ma4+uEPatSayqcdi1rxB5I+7v/sLiOvKZZWkaRZjUZ/mxAjUfvK
 dBOOPjMygEo3tCLkIyyA6lpLvr1r+SUZhLuebRLEKa3To3TW6RtoG0qwpKmI2iKw
 ylLeVLB60nM9RUxjflVOfBsHxz1bDg5Ve86y5nCjQd4Jo8x1c4DnecyGE5/Tu8Rg
 rgbsfD6nFWzhDCvcZT0XrfQ4ZAjIL2IfT+ypQiQ6UlRd3hvIKRmzWMkjuH2svr0u
 ey9Kq+lYerI4cM0F3W73gzUKdIQOuCzBCYxQuSQQomscBa7FCInyU192dAI9Aj6l
 Yd06mgKu6qCx6zLv6JfpBqaBHZMwyGE4dmZgPQFuuwO+b4N+Ck3Jm5fzEzw/xIxQ
 wdo/DlsAl60BXentB6FByGBJaCjVdSymRqN/xNCAbFKCjmr6TLBuXPfg1gYYO7xC
 VOcVjWe8iN3wWHZab3t2mxMKH9B9B/KKzIhu6TNHSmgtQ5paZPRCBx995pDyRw26
 WC22RGC2MA==
 =os1E
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
 "Another nice round of removing more code than what is added, mostly
  due to Christoph's relentless pursuit of tech debt removal/cleanups.
  This pull request contains:

   - Two series of BFQ improvements (Paolo, Jan, Jia)

   - Block iov_iter improvements (Pavel)

   - bsg error path fix (Pan)

   - blk-mq scheduler improvements (Jan)

   - -EBUSY discard fix (Jan)

   - bvec allocation improvements (Ming, Christoph)

   - bio allocation and init improvements (Christoph)

   - Store bdev pointer in bio instead of gendisk + partno (Christoph)

   - Block trace point cleanups (Christoph)

   - hard read-only vs read-only split (Christoph)

   - Block based swap cleanups (Christoph)

   - Zoned write granularity support (Damien)

   - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"

* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
  mm: simplify swapdev_block
  sd_zbc: clear zone resources for non-zoned case
  block: introduce blk_queue_clear_zone_settings()
  zonefs: use zone write granularity as block size
  block: introduce zone_write_granularity limit
  block: use blk_queue_set_zoned in add_partition()
  nullb: use blk_queue_set_zoned() to setup zoned devices
  nvme: cleanup zone information initialization
  block: document zone_append_max_bytes attribute
  block: use bi_max_vecs to find the bvec pool
  md/raid10: remove dead code in reshape_request
  block: mark the bio as cloned in bio_iov_bvec_set
  block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
  block: remove a layer of indentation in bio_iov_iter_get_pages
  block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
  block: remove the 1 and 4 vec bvec_slabs entries
  block: streamline bvec_alloc
  block: factor out a bvec_alloc_gfp helper
  block: move struct biovec_slab to bio.c
  block: reuse BIO_INLINE_VECS for integrity bvecs
  ...
2021-02-21 11:02:48 -08:00
Linus Torvalds 24880bef41 Remove oprofile and dcookies support
The "oprofile" user-space tools don't use the kernel OPROFILE support any more,
 and haven't in a long time. User-space has been converted to the perf
 interfaces.
 
 The dcookies stuff is only used by the oprofile code. Now that oprofile's
 support is getting removed from the kernel, there is no need for dcookies as
 well.
 
 Remove kernel's old oprofile and dcookies support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJgJMEVAAoJENK5HDyugRIcL8YP/jkmXH5CZT80ntcqrJGWKcG7
 lWbach7uNeQteht7B1ZPKvojxizTkmfrN2sClX0B2hbGkc5TiWUQ2ZSnvnfWDZ8+
 z2qQcEB11G/ReL2vvRk1fJlWdAOyUfrPee/44AkemnLRv+Niw/8PqnGd87yDQGsK
 qy5E1XXfbjUq6Y/uMiLOX3+21I6w6o2Q6I3NNXC93s0wS3awqnft8n0XBC7iAPBj
 eowRJxpdRU2Vcuj8UOzzOI7gQlwdjwYImyLPbRy/V8NawC8a+FHrPrf5/GCYlVzl
 7TGFBsDQSmzvrBChUfoGz1Rq/VZ1a357p5rhRqemfUrdkjW+vyzelnD8I1W/hb2o
 SmBXoPoyl3+UkFHNyJI0mI7obaV+2PzyXMV0JIQUj+IiX/mfeFv0nF4XfZD2IkRt
 6xhaYj775Zrx32iBdGZIvvLg5Gh9ZkZmR5vJ7Fi/EIZFe6Z+bZnPKUROnAgS/o0z
 +UkSygOhgo/1XbqrzZVk1iweWeu+EUMbY4YQv2qVnFhpvsq4ieThcUGQpWcxGjjH
 WP8O0n1yq1slsnpUtxhiTsm46ENajx9zZp6Iv6Ws+NM0RUqjND8BdF1co9WGD3LS
 cnZMFBs4Bg/V1HICL/D4s6L7t1ofrEXIgJH1y3iF0HeECq03mU4CgA/qly9Aebqg
 UxPF3oNlVOPlds9FzsU2
 =I2Ac
 -----END PGP SIGNATURE-----

Merge tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux

Pull oprofile and dcookies removal from Viresh Kumar:
 "Remove oprofile and dcookies support

  The 'oprofile' user-space tools don't use the kernel OPROFILE support
  any more, and haven't in a long time. User-space has been converted to
  the perf interfaces.

  The dcookies stuff is only used by the oprofile code. Now that
  oprofile's support is getting removed from the kernel, there is no
  need for dcookies as well.

  Remove kernel's old oprofile and dcookies support"

* tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
  fs: Remove dcookies support
  drivers: Remove CONFIG_OPROFILE support
  arch: xtensa: Remove CONFIG_OPROFILE support
  arch: x86: Remove CONFIG_OPROFILE support
  arch: sparc: Remove CONFIG_OPROFILE support
  arch: sh: Remove CONFIG_OPROFILE support
  arch: s390: Remove CONFIG_OPROFILE support
  arch: powerpc: Remove oprofile
  arch: powerpc: Stop building and using oprofile
  arch: parisc: Remove CONFIG_OPROFILE support
  arch: mips: Remove CONFIG_OPROFILE support
  arch: microblaze: Remove CONFIG_OPROFILE support
  arch: ia64: Remove rest of perfmon support
  arch: ia64: Remove CONFIG_OPROFILE support
  arch: hexagon: Don't select HAVE_OPROFILE
  arch: arc: Remove CONFIG_OPROFILE support
  arch: arm: Remove CONFIG_OPROFILE support
  arch: alpha: Remove CONFIG_OPROFILE support
2021-02-21 10:40:34 -08:00
Linus Torvalds 591fd30eee Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ELF compat updates from Al Viro:
 "Sanitizing ELF compat support, especially for triarch architectures:

   - X32 handling cleaned up

   - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now

   - Kconfig side of things regularized

  Eventually I hope to have compat_binfmt_elf.c killed, with both native
  and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
  by kbuild, but that's a separate story - not included here"

* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of COMPAT_ELF_EXEC_PAGESIZE
  compat_binfmt_elf: don't bother with undef of ELF_ARCH
  Kconfig: regularize selection of CONFIG_BINFMT_ELF
  mips compat: switch to compat_binfmt_elf.c
  mips: don't bother with ELF_CORE_EFLAGS
  mips compat: don't bother with ELF_ET_DYN_BASE
  mips: KVM_GUEST makes no sense for 64bit builds...
  mips: kill unused definitions in binfmt_elf[on]32.c
  mips binfmt_elf*32.c: use elfcore-compat.h
  x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
  [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
  elf_prstatus: collect the common part (everything before pr_reg) into a struct
  binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-02-21 09:29:23 -08:00
Linus Torvalds 02f9fc286e Power management updates for 5.12-rc1
- Add new power capping facility called DTPM (Dynamic Thermal Power
    Management), based on the existing power capping framework, to
    allow aggregate power constraints to be applied to sets of devices
    in a distributed manner, along with a CPU backend driver based on
    the Energy Model (Daniel Lezcano, Dan Carpenter, Colin Ian King).
 
  - Add AlderLake Mobile support to the Intel RAPL power capping
    driver and make it use the topology interface when laying out the
    system topology (Zhang Rui, Yunfeng Ye).
 
  - Drop the cpufreq tango driver belonging to a platform that is not
    supported any more (Arnd Bergmann).
 
  - Drop the redundant CPUFREQ_STICKY and CPUFREQ_PM_NO_WARN cpufreq
    driver flags (Viresh Kumar).
 
  - Update cpufreq drivers:
 
    * Fix max CPU frequency discovery in the intel_pstate driver and
      make janitorial changes in it (Chen Yu, Rafael Wysocki, Nigel
      Christian).
 
    * Fix resource leaks in the brcmstb-avs-cpufreq driver (Christophe
      JAILLET).
 
    * Make the tegra20 driver use the resource-managed API (Dmitry
      Osipenko).
 
    * Enable boost support in the qcom-hw driver (Shawn Guo).
 
  - Update the operating performance points (OPP) framework:
 
    * Clean up the OPP core (Dmitry Osipenko, Viresh Kumar).
 
    * Extend the OPP API by adding new helpers to it (Dmitry Osipenko,
      Viresh Kumar).
 
    * Allow required OPPs to be used for devfreq devices and update
      the devfreq governor code accordingly (Saravana Kannan).
 
    * Prepare the framework for introducing new dev_pm_opp_set_opp()
      helper (Viresh Kumar).
 
    * Drop dev_pm_opp_set_bw() and update related drivers (Viresh
      Kumar).
 
    * Allow lazy linking of required-OPPs (Viresh Kumar).
 
  - Simplify and clean up devfreq somewhat (Lukasz Luba, Yang Li,
    Pierre Kuo).
 
  - Update the generic power domains (genpd) framework:
 
    * Use device's next wakeup to determine domain idle state (Lina
      Iyer).
 
    * Improve initialization and debug (Dmitry Osipenko).
 
    * Simplify computations (Abaci Team).
 
  - Make janitorial changes in the core code handling system sleep
    and PM-runtime (Bhaskar Chowdhury, Bjorn Helgaas, Rikard Falkeborn,
    Zqiang).
 
  - Update the MAINTAINERS entry for the exynos cpuidle driver and
    drop DEBUG definition from intel_idle (Krzysztof Kozlowski, Tom
    Rix).
 
  - Extend the PM clock layer to cover clocks that must sleep (Nicolas
    Pitre).
 
  - Update the cpupower utility:
 
    * Update cpupower command, add support for AMD family 0x19 and clean
      up the code to remove many of the family checks to make future
      family updates easier (Nathan Fontenot, Robert Richter).
 
    * Add Makefile dependencies for install targets to allow building
      cpupower in parallel rather than serially (Ivan Babrou).
 
  - Make janitorial changes in power management Kconfig (Lukasz Luba).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmAquvMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx7MEQAIFx7WGu0TquvYYbX1Op4oxHaN5wLZba
 XZjh5g4dSn+fKF+WXO4+Ze8VwFx2E6p2650OYJ9A3H83yqjWz5x0CoYm++LTWkWJ
 CRPhyAL2JzrqDL2oJ/7XdK41cz/DT1p2B5cdOVeI4OaLhTzLa4TAO3EwCA/Eyayb
 qumbp6vt3G3zfSDLMA8Wa/HDNaWcN0/NDdnhlj9zlT27COyFJJUARa1jGA/5+2BK
 k/LP7homFIPf8vTGMkL/JuHU7Fsqa2vCFEzB22xyD8GE59dHoXYTlKA2pOr/2lJM
 VV5h0FDS6lFbkp6AQDHwh5tsGusT7dREFXebBUWtmETZmB9iQZXAo4k+MnUbYGSt
 stYFdDJpQkK42icF7uhJE1xuZkQ16xBm02pBvlJWMSyYyHCHhUH83VxhA11szA5/
 glLHfhhdbAa1BKmFHuTEZiwCAssY+YmoAvUpgLW04csJ4s4+my7VCgVe6jOILv2H
 0PdakYam/UEXPoDR4bAdJePZQvwbeUQUtmZ/oYb7Ab2ztfFcpVtZ5T0QeKazXkiZ
 BDtJ+XQJAhZOmLL4u2owGdevvjQU+68QZF1NfMoOTW2K7bcok3pVjj5uKRkfir6F
 R74mzHNNKOWBO9NwByinrMNJqHqPxfbREvg92Xi7GGDtt+rD7/K3K4Au+9W9MxwL
 AvUhPGadUZT1
 =9xhj
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add a new power capping facility allowing aggregate power
  constraints to be applied to sets of devices in a distributed manner,
  add a new CPU ID to the RAPL power capping driver and improve it, drop
  a cpufreq driver belonging to a platform that is not supported any
  more, drop two redundant cpufreq driver flags, update cpufreq drivers
  (intel_pstate, brcmstb-avs, qcom-hw), update the operating performance
  points (OPP) framework (code cleanups, new helpers, devfreq-related
  modifications), clean up devfreq, extend the PM clock layer, update
  the cpupower utility and make assorted janitorial changes.

  Specifics:

   - Add new power capping facility called DTPM (Dynamic Thermal Power
     Management), based on the existing power capping framework, to
     allow aggregate power constraints to be applied to sets of devices
     in a distributed manner, along with a CPU backend driver based on
     the Energy Model (Daniel Lezcano, Dan Carpenter, Colin Ian King).

   - Add AlderLake Mobile support to the Intel RAPL power capping driver
     and make it use the topology interface when laying out the system
     topology (Zhang Rui, Yunfeng Ye).

   - Drop the cpufreq tango driver belonging to a platform that is not
     supported any more (Arnd Bergmann).

   - Drop the redundant CPUFREQ_STICKY and CPUFREQ_PM_NO_WARN cpufreq
     driver flags (Viresh Kumar).

   - Update cpufreq drivers:

      * Fix max CPU frequency discovery in the intel_pstate driver and
        make janitorial changes in it (Chen Yu, Rafael Wysocki, Nigel
        Christian).

      * Fix resource leaks in the brcmstb-avs-cpufreq driver (Christophe
        JAILLET).

      * Make the tegra20 driver use the resource-managed API (Dmitry
        Osipenko).

      * Enable boost support in the qcom-hw driver (Shawn Guo).

   - Update the operating performance points (OPP) framework:

      * Clean up the OPP core (Dmitry Osipenko, Viresh Kumar).

      * Extend the OPP API by adding new helpers to it (Dmitry Osipenko,
        Viresh Kumar).

      * Allow required OPPs to be used for devfreq devices and update
        the devfreq governor code accordingly (Saravana Kannan).

      * Prepare the framework for introducing new dev_pm_opp_set_opp()
        helper (Viresh Kumar).

      * Drop dev_pm_opp_set_bw() and update related drivers (Viresh
        Kumar).

      * Allow lazy linking of required-OPPs (Viresh Kumar).

   - Simplify and clean up devfreq somewhat (Lukasz Luba, Yang Li,
     Pierre Kuo).

   - Update the generic power domains (genpd) framework:

      * Use device's next wakeup to determine domain idle state (Lina
        Iyer).

      * Improve initialization and debug (Dmitry Osipenko).

      * Simplify computations (Abaci Team).

   - Make janitorial changes in the core code handling system sleep and
     PM-runtime (Bhaskar Chowdhury, Bjorn Helgaas, Rikard Falkeborn,
     Zqiang).

   - Update the MAINTAINERS entry for the exynos cpuidle driver and drop
     DEBUG definition from intel_idle (Krzysztof Kozlowski, Tom Rix).

   - Extend the PM clock layer to cover clocks that must sleep (Nicolas
     Pitre).

   - Update the cpupower utility:

      * Update cpupower command, add support for AMD family 0x19 and
        clean up the code to remove many of the family checks to make
        future family updates easier (Nathan Fontenot, Robert Richter).

      * Add Makefile dependencies for install targets to allow building
        cpupower in parallel rather than serially (Ivan Babrou).

   - Make janitorial changes in power management Kconfig (Lukasz Luba)"

* tag 'pm-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  MAINTAINERS: cpuidle: exynos: include header in file pattern
  powercap: intel_rapl: Use topology interface in rapl_init_domains()
  powercap: intel_rapl: Use topology interface in rapl_add_package()
  PM: sleep: Constify static struct attribute_group
  PM: Kconfig: remove unneeded "default n" options
  PM: EM: update Kconfig description and drop "default n" option
  cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN
  cpufreq: Remove CPUFREQ_STICKY flag
  PM / devfreq: Add required OPPs support to passive governor
  PM / devfreq: Cache OPP table reference in devfreq
  OPP: Add function to look up required OPP's for a given OPP
  PM / devfreq: rk3399_dmc: Remove unneeded semicolon
  opp: Replace ENOTSUPP with EOPNOTSUPP
  opp: Fix "foo * bar" should be "foo *bar"
  opp: Don't ignore clk_get() errors other than -ENOENT
  opp: Update bandwidth requirements based on scaling up/down
  opp: Allow lazy-linking of required-opps
  opp: Remove dev_pm_opp_set_bw()
  devfreq: tegra30: Migrate to dev_pm_opp_set_opp()
  drm: msm: Migrate to dev_pm_opp_set_opp()
  ...
2021-02-20 21:42:18 -08:00
Linus Torvalds 51e6d17809 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Here is what we have this merge window:

   1) Support SW steering for mlx5 Connect-X6Dx, from Yevgeny Kliteynik.

   2) Add RSS multi group support to octeontx2-pf driver, from Geetha
      Sowjanya.

   3) Add support for KS8851 PHY. From Marek Vasut.

   4) Add support for GarfieldPeak bluetooth controller from Kiran K.

   5) Add support for half-duplex tcan4x5x can controllers.

   6) Add batch skb rx processing to bcrm63xx_enet, from Sieng Piaw
      Liew.

   7) Rework RX port offload infrastructure, particularly wrt, UDP
      tunneling, from Jakub Kicinski.

   8) Add BCM72116 PHY support, from Florian Fainelli.

   9) Remove Dsa specific notifiers, they are unnecessary. From Vladimir
      Oltean.

  10) Add support for picosecond rx delay in dwmac-meson8b chips. From
      Martin Blumenstingl.

  11) Support TSO on xfrm interfaces from Eyal Birger.

  12) Add support for MP_PRIO to mptcp stack, from Geliang Tang.

  13) Support BCM4908 integrated switch, from Rafał Miłecki.

  14) Support for directly accessing kernel module variables via module
      BTF info, from Andrii Naryiko.

  15) Add DASH (esktop and mobile Architecture for System Hardware)
      support to r8169 driver, from Heiner Kallweit.

  16) Add rx vlan filtering to dpaa2-eth, from Ionut-robert Aron.

  17) Add support for 100 base0x SFP devices, from Bjarni Jonasson.

  18) Support link aggregation in DSA, from Tobias Waldekranz.

  19) Support for bitwidse atomics in bpf, from Brendan Jackman.

  20) SmartEEE support in at803x driver, from Russell King.

  21) Add support for flow based tunneling to GTP, from Pravin B Shelar.

  22) Allow arbitrary number of interconnrcts in ipa, from Alex Elder.

  23) TLS RX offload for bonding, from Tariq Toukan.

  24) RX decap offklload support in mac80211, from Felix Fietkou.

  25) devlink health saupport in octeontx2-af, from George Cherian.

  26) Add TTL attr to SCM_TIMESTAMP_OPT_STATS, from Yousuk Seung

  27) Delegated actionss support in mptcp, from Paolo Abeni.

  28) Support receive timestamping when doin zerocopy tcp receive. From
      Arjun Ray.

  29) HTB offload support for mlx5, from Maxim Mikityanskiy.

  30) UDP GRO forwarding, from Maxim Mikityanskiy.

  31) TAPRIO offloading in dsa hellcreek driver, from Kurt Kanzenbach.

  32) Weighted random twos choice algorithm for ipvs, from Darby Payne.

  33) Fix netdev registration deadlock, from Johannes Berg.

  34) Various conversions to new tasklet api, from EmilRenner Berthing.

  35) Bulk skb allocations in veth, from Lorenzo Bianconi.

  36) New ethtool interface for lane setting, from Danielle Ratson.

  37) Offload failiure notifications for routes, from Amit Cohen.

  38) BCM4908 support, from Rafał Miłecki.

  39) Support several new iwlwifi chips, from Ihab Zhaika.

  40) Flow drector support for ipv6 in i40e, from Przemyslaw Patynowski.

  41) Support for mhi prrotocols, from Loic Poulain.

  42) Optimize bpf program stats.

  43) Implement RFC6056, for better port randomization, from Eric
      Dumazet.

  44) hsr tag offloading support from George McCollister.

  45) Netpoll support in qede, from Bhaskar Upadhaya.

  46) 2005/400g speed support in bonding 3ad mode, from Nikolay
      Aleksandrov.

  47) Netlink event support in mptcp, from Florian Westphal.

  48) Better skbuff caching, from Alexander Lobakin.

  49) MRP (Media Redundancy Protocol) offloading in DSA and a few
      drivers, from Horatiu Vultur.

  50) mqprio saupport in mvneta, from Maxime Chevallier.

  51) Remove of_phy_attach, no longer needed, from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1766 commits)
  octeontx2-pf: Fix otx2_get_fecparam()
  cteontx2-pf: cn10k: Prevent harmless double shift bugs
  net: stmmac: Add PCI bus info to ethtool driver query output
  ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary
  ptp: ptp_clockmatrix: Simplify code - remove unnecessary `err` variable.
  ptp: ptp_clockmatrix: Coding style - tighten vertical spacing.
  ptp: ptp_clockmatrix: Clean-up dev_*() messages.
  ptp: ptp_clockmatrix: Remove unused header declarations.
  ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.
  ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.
  net: stmmac: dwmac-sun8i: Add a shutdown callback
  net: stmmac: dwmac-sun8i: Minor probe function cleanup
  net: stmmac: dwmac-sun8i: Use reset_control_reset
  net: stmmac: dwmac-sun8i: Remove unnecessary PHY power check
  net: stmmac: dwmac-sun8i: Return void from PHY unpower
  r8169: use macro pm_ptr
  net: mdio: Remove of_phy_attach()
  net: mscc: ocelot: select PACKING in the Kconfig
  net: re-solve some conflicts after net -> net-next merge
  net: dsa: tag_rtl4_a: Support also egress tags
  ...
2021-02-20 17:45:32 -08:00
Frederic Weisbecker 4ae7dc97f7 entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point upon resuming to guest mode. This
way we can avoid to do it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 47b8ff194c entry: Explicitly flush pending rcuog wakeup before last rescheduling point
Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point on resuming to user mode. This
way we can avoid to do it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-5-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker f8bb5cae96 rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Unfortunately the call to rcu_user_enter() is already past the last
rescheduling opportunity before we resume to userspace or to guest mode.
We may escape there with the woken task ignored.

The ultimate resort to fix every callsites is to trigger a self-IPI
(nohz_full depends on arch to implement arch_irq_work_raise()) that will
trigger a reschedule on IRQ tail or guest exit.

Eventually every site that want a saner treatment will need to carefully
place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
need_resched() check upon resume.

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 43789ef3f7 rcu/nocb: Perform deferred wake up before last idle's need_resched() check
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a local wake up happening while running the idle task is handled
in one of the need_resched() checks carefully placed within the idle
loop that can break to the scheduler.

Unfortunately the call to rcu_idle_enter() is already beyond the last
generic need_resched() check and we may halt the CPU with a resched
request unhandled, leaving the task hanging.

Fix this with splitting the rcuog wakeup handling from rcu_idle_enter()
and place it before the last generic need_resched() check in the idle
loop. It is then assumed that no call to call_rcu() will be performed
after that in the idle loop until the CPU is put in low power mode.

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 54b7429eff rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to
be handled differently whether initiated by idle, user or guest. Prepare
with pulling that control up to rcu_eqs_enter() callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-2-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Juri Lelli e0ee463c93 sched/features: Distinguish between NORMAL and DEADLINE hrtick
The HRTICK feature has traditionally been servicing configurations that
need precise preemptions point for NORMAL tasks. More recently, the
feature has been extended to also service DEADLINE tasks with stringent
runtime enforcement needs (e.g., runtime < 1ms with HZ=1000).

Enabling HRTICK sched feature currently enables the additional timer and
task tick for both classes, which might introduced undesired overhead
for no additional benefit if one needed it only for one of the cases.

Separate HRTICK sched feature in two (and leave the traditional case
name unmodified) so that it can be selectively enabled when needed.

With:

  $ echo HRTICK > /sys/kernel/debug/sched_features

the NORMAL/fair hrtick gets enabled.

With:

  $ echo HRTICK_DL > /sys/kernel/debug/sched_features

the DEADLINE hrtick gets enabled.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-3-juri.lelli@redhat.com
2021-02-17 14:12:42 +01:00
Juri Lelli 156ec6f42b sched/features: Fix hrtick reprogramming
Hung tasks and RCU stall cases were reported on systems which were not
100% busy. Investigation of such unexpected cases (no sign of potential
starvation caused by tasks hogging the system) pointed out that the
periodic sched tick timer wasn't serviced anymore after a certain point
and that caused all machinery that depends on it (timers, RCU, etc.) to
stop working as well. This issues was however only reproducible if
HRTICK was enabled.

Looking at core dumps it was found that the rbtree of the hrtimer base
used also for the hrtick was corrupted (i.e. next as seen from the base
root and actual leftmost obtained by traversing the tree are different).
Same base is also used for periodic tick hrtimer, which might get "lost"
if the rbtree gets corrupted.

Much alike what described in commit 1f71addd34 ("tick/sched: Do not
mess with an enqueued hrtimer") there is a race window between
hrtimer_set_expires() in hrtick_start and hrtimer_start_expires() in
__hrtick_restart() in which the former might be operating on an already
queued hrtick hrtimer, which might lead to corruption of the base.

Use hrtick_start() (which removes the timer before enqueuing it back) to
ensure hrtick hrtimer reprogramming is entirely guarded by the base
lock, so that no race conditions can occur.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.lelli@redhat.com
2021-02-17 14:12:42 +01:00
Dietmar Eggemann de40f33e78 sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
dl_add_task_root_domain() is called during sched domain rebuild:

  rebuild_sched_domains_locked()
    partition_and_rebuild_sched_domains()
      rebuild_root_domains()
         for all top_cpuset descendants:
           update_tasks_root_domain()
             for all tasks of cpuset:
               dl_add_task_root_domain()

Change it so that only the task pi lock is taken to check if the task
has a SCHED_DEADLINE (DL) policy. In case that p is a DL task take the
rq lock as well to be able to safely de-reference root domain's DL
bandwidth structure.

Most of the tasks will have another policy (namely SCHED_NORMAL) and
can now bail without taking the rq lock.

One thing to note here: Even in case that there aren't any DL user
tasks, a slow frequency switching system with cpufreq gov schedutil has
a DL task (sugov) per frequency domain running which participates in DL
bandwidth management.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20210119083542.19856-1-dietmar.eggemann@arm.com
2021-02-17 14:12:42 +01:00
Sven Schnelle b0d6d47896 uprobes: (Re)add missing get_uprobe() in __find_uprobe()
commit c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
accidentally removed the refcount increase. Add it again.

Fixes: c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210209150711.36778-1-svens@linux.ibm.com
2021-02-17 14:12:42 +01:00
Sebastian Andrzej Siewior f9d34595ae smp: Process pending softirqs in flush_smp_call_function_from_idle()
send_call_function_single_ipi() may wake an idle CPU without sending an
IPI. The woken up CPU will process the SMP-functions in
flush_smp_call_function_from_idle(). Any raised softirq from within the
SMP-function call will not be processed.
Should the CPU have no tasks assigned, then it will go back to idle with
pending softirqs and the NOHZ will rightfully complain.

Process pending softirqs on return from flush_smp_call_function_queue().

Fixes: b2a02fc43a ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210123201027.3262800-2-bigeasy@linutronix.de
2021-02-17 14:12:42 +01:00
Peter Zijlstra ef72661e28 sched: Harden PREEMPT_DYNAMIC
Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport
the static_call_key for the PREEMPT_DYNAMIC calls such that modules
can no longer update these calls.

Having modules change/hi-jack the preemption calls would be horrible.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-17 14:12:42 +01:00
Josh Poimboeuf 73f44fe19d static_call: Allow module use without exposing static_call_key
When exporting static_call_key; with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called.  This is
not desirable in general.

Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.

Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try and map the
trampole back to a key before it constructs the normal sites list.

Doing this requires a trampoline -> key associsation, so add another
magic section that keeps those.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
2021-02-17 14:12:42 +01:00
Peter Zijlstra e59e10f8ef sched: Add /debug/sched_preempt
Add a debugfs file to muck about with the preempt mode at runtime.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/YAsGiUYf6NyaTplX@hirez.programming.kicks-ass.net
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) 826bfeb37b preempt/dynamic: Support dynamic preempt with preempt= boot option
Support the preempt= boot option and patch the static call sites
accordingly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-9-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) 40607ee97e preempt/dynamic: Provide irqentry_exit_cond_resched() static call
Provide static call to control IRQ preemption (called in CONFIG_PREEMPT)
so that we can override its behaviour when preempt= is overriden.

Since the default behaviour is full preemption, its call is
initialized to provide IRQ preemption when preempt= isn't passed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-8-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) 2c9a98d3bc preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
Provide static calls to control preempt_schedule[_notrace]()
(called in CONFIG_PREEMPT) so that we can override their behaviour when
preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
initialized to the arch provided wrapper, if any.

[fweisbec: only define static calls when PREEMPT_DYNAMIC, make it less
           dependent on x86 with __preempt_schedule_func]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-7-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) b965f1ddb4 preempt/dynamic: Provide cond_resched() and might_resched() static calls
Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT)
and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) to that we
can override their behaviour when preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.

  [fweisbec: branch might_resched() directly to __cond_resched(), only
             define static calls when PREEMPT_DYNAMIC]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Michal Hocko 6ef869e064 preempt: Introduce CONFIG_PREEMPT_DYNAMIC
Preemption mode selection is currently hardcoded on Kconfig choices.
Introduce a dedicated option to tune preemption flavour at boot time,

This will be only available on architectures efficiently supporting
static calls in order not to tempt with the feature against additional
overhead that might be prohibitive or undesirable.

CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if
the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE,
CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() /
__preempt_schedule_notrace_function()).

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[peterz: relax requirement to HAVE_STATIC_CALL]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org
2021-02-17 14:12:24 +01:00
Peter Zijlstra 3f2a8fc4b1 static_call/x86: Add __static_call_return0()
Provide a stub function that return 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-2-frederic@kernel.org
2021-02-17 14:08:43 +01:00
Dietmar Eggemann c541bb7835 sched/core: Update task_prio() function header
The description of the RT offset and the values for 'normal' tasks needs
update. Moreover there are DL tasks now.
task_prio() has to stay like it is to guarantee compatibility with the
/proc/<pid>/stat priority field:

  # cat /proc/<pid>/stat | awk '{ print $18; }'

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-4-dietmar.eggemann@arm.com
2021-02-17 14:08:30 +01:00
Dietmar Eggemann 9d061ba6bc sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48d ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddc ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

  MAX_USER_PRIO = USER_PRIO(MAX_PRIO)

                = MAX_PRIO - MAX_RT_PRIO

       MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

  MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO

  MAX_USER_PRIO = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com
2021-02-17 14:08:17 +01:00
Dietmar Eggemann ae18ad281e sched: Remove MAX_USER_RT_PRIO
Commit d46523ea32 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO")
was introduced due to a a small time period in which the realtime patch
set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO.

This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO.

Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO
instead.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-2-dietmar.eggemann@arm.com
2021-02-17 14:08:11 +01:00
Dietmar Eggemann 71e5f6644f sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
Commit "sched/topology: Make sched_init_numa() use a set for the
deduplicating sort" allocates 'i + nr_levels (level)' instead of
'i + nr_levels + 1' sched_domain_topology_level.

This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):

sched_init_domains
  build_sched_domains()
    __free_domain_allocs()
      __sdt_free() {
	...
        for_each_sd_topology(tl)
	  ...
          sd = *per_cpu_ptr(sdd->sd, j); <--
	  ...
      }

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com
2021-02-17 14:08:05 +01:00
Peter Zijlstra 5a7987253e rbtree, rtmutex: Use rb_add_cached()
Reduce rbtree boiler plate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:57 +01:00
Peter Zijlstra a905e84e64 rbtree, uprobes: Use rbtree helpers
Reduce rbtree boilerplate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:52 +01:00
Peter Zijlstra a3b8986455 rbtree, perf: Use new rbtree helpers
Reduce rbtree boiler plate by using the new helpers.

One noteworthy change is unification of the various (partial) compare
functions. We construct a subtree match by forcing the sub-order to
always match, see __group_cmp().

Due to 'const' we had to touch cgroup_id().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:48 +01:00
Peter Zijlstra 8ecca39483 rbtree, sched/deadline: Use rb_add_cached()
Reduce rbtree boiler plate by using the new helpers.

Make rb_add_cached() / rb_erase_cached() return a pointer to the
leftmost node to aid in updating additional state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:44 +01:00