doc: Update removal of RCU-bh/sched update machinery
The RCU-bh update API is now defined in terms of that of RCU-preempt and RCU-sched, so this commit updates the documentation accordingly. In addition, although RCU-sched persists in !PREEMPT kernels, in the PREEMPT case its update API is now defined in terms of that of RCU-preempt, so this commit also updates the documentation accordingly. While in the area, this commit removes the documentation for the now-obsolete synchronize_rcu_mult() and clarifies the Tasks RCU documentation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
parent ea24c125fe
commit 77095901b8
@@ -1374,8 +1374,7 @@ that is, if the CPU is currently idle.
 Accessor Functions</a></h3>
 
 <p>The following listing shows the
-<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt>,
-<tt>rcu_for_each_nonleaf_node_breadth_first()</tt>, and
+<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt> and
 <tt>rcu_for_each_leaf_node()</tt> function and macros:
 
 <pre>
@@ -1388,13 +1387,9 @@ Accessor Functions</a></h3>
 7 for ((rnp) = &(rsp)->node[0]; \
 8 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
 9
-10 #define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \
-11 for ((rnp) = &(rsp)->node[0]; \
-12 (rnp) < (rsp)->level[NUM_RCU_LVLS - 1]; (rnp)++)
-13
-14 #define rcu_for_each_leaf_node(rsp, rnp) \
-15 for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
-16 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
+10 #define rcu_for_each_leaf_node(rsp, rnp) \
+11 for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
+12 (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
 </pre>
 
 <p>The <tt>rcu_get_root()</tt> simply returns a pointer to the
@@ -1407,10 +1402,7 @@ macro takes advantage of the layout of the <tt>rcu_node</tt>
 structures in the <tt>rcu_state</tt> structure's
 <tt>->node[]</tt> array, performing a breadth-first traversal by
 simply traversing the array in order.
-The <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> macro operates
-similarly, but traverses only the first part of the array, thus excluding
-the leaf <tt>rcu_node</tt> structures.
-Finally, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only
+Similarly, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only
 the last part of the array, thus traversing only the leaf
 <tt>rcu_node</tt> structures.
 
@@ -1418,15 +1410,14 @@ the last part of the array, thus traversing only the leaf
 <tr><th> </th></tr>
 <tr><th align="left">Quick Quiz:</th></tr>
 <tr><td>
-What do <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> and
+What does
 <tt>rcu_for_each_leaf_node()</tt> do if the <tt>rcu_node</tt> tree
 contains only a single node?
 </td></tr>
 <tr><th align="left">Answer:</th></tr>
 <tr><td bgcolor="#ffffff"><font color="ffffff">
 In the single-node case,
-<tt>rcu_for_each_nonleaf_node_breadth_first()</tt> is a no-op
-and <tt>rcu_for_each_leaf_node()</tt> traverses the single node.
+<tt>rcu_for_each_leaf_node()</tt> traverses the single node.
 </font></td></tr>
 <tr><td> </td></tr>
 </table>

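The two traversal macros that survive this change can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel's code: the `rcu_node` and `rcu_state` definitions are simplified stand-ins, the tree size (one root, two leaves) is an assumption chosen for illustration, and the `demo_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the kernel's rcu_node and
 * rcu_state structures. The sizes below are assumptions chosen for
 * illustration: a two-level tree with one root and two leaves.
 */
#define NUM_RCU_LVLS  2
#define NUM_RCU_NODES 3

struct rcu_node {
	int id;
};

struct rcu_state {
	struct rcu_node node[NUM_RCU_NODES];  /* Tree stored breadth-first. */
	struct rcu_node *level[NUM_RCU_LVLS]; /* First node of each level. */
};

/* Same shape as the macro in the listing: walk the whole array. */
#define rcu_for_each_node_breadth_first(rsp, rnp) \
	for ((rnp) = &(rsp)->node[0]; \
	     (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)

/* Walk only the last level, that is, the leaves. */
#define rcu_for_each_leaf_node(rsp, rnp) \
	for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
	     (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)

/* Point ->level[] at the root (index 0) and at the first leaf (index 1). */
static void demo_init(struct rcu_state *rsp)
{
	int i;

	assert(rsp != NULL);
	for (i = 0; i < NUM_RCU_NODES; i++)
		rsp->node[i].id = i;
	rsp->level[0] = &rsp->node[0];
	rsp->level[1] = &rsp->node[1];
}

/* Count the nodes visited by the breadth-first traversal. */
static int demo_count_all(struct rcu_state *rsp)
{
	struct rcu_node *rnp;
	int n = 0;

	rcu_for_each_node_breadth_first(rsp, rnp)
		n++;
	return n;
}

/* Count the nodes visited by the leaf-only traversal. */
static int demo_count_leaves(struct rcu_state *rsp)
{
	struct rcu_node *rnp;
	int n = 0;

	rcu_for_each_leaf_node(rsp, rnp)
		n++;
	return n;
}
```

With this layout the breadth-first walk visits all three nodes and the leaf walk visits the last two. In a single-node tree (`NUM_RCU_LVLS` and `NUM_RCU_NODES` both 1, with `level[0]` pointing at `node[0]`), the same leaf macro would visit the root, which matches the Quick Quiz answer in the diff above.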
@@ -12,10 +12,9 @@ high efficiency and minimal disturbance, expedited grace periods accept
 lower efficiency and significant disturbance to attain shorter latencies.
 
 <p>
-There are three flavors of RCU (RCU-bh, RCU-preempt, and RCU-sched),
-but only two flavors of expedited grace periods because the RCU-bh
-expedited grace period maps onto the RCU-sched expedited grace period.
-Each of the remaining two implementations is covered in its own section.
+There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
+third RCU-bh flavor having been implemented in terms of the other two.
+Each of the two implementations is covered in its own section.
 
 <ol>
 <li> <a href="#Expedited Grace Period Design">

@@ -1306,8 +1306,6 @@ doing so would degrade real-time response.
 
 <p>
 This non-requirement appeared with preemptible RCU.
-If you need a grace period that waits on non-preemptible code regions, use
-<a href="#Sched Flavor">RCU-sched</a>.
 
 <h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2>
 
@@ -2165,14 +2163,9 @@ however, this is not a panacea because there would be severe restrictions
 on what operations those callbacks could invoke.
 
 <p>
-Perhaps surprisingly, <tt>synchronize_rcu()</tt>,
-<a href="#Bottom-Half Flavor"><tt>synchronize_rcu_bh()</tt></a>
-(<a href="#Bottom-Half Flavor">discussed below</a>),
-<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>,
+Perhaps surprisingly, <tt>synchronize_rcu()</tt> and
 <tt>synchronize_rcu_expedited()</tt>,
-<tt>synchronize_rcu_bh_expedited()</tt>, and
-<tt>synchronize_sched_expedited()</tt>
-will all operate normally
+will operate normally
 during very early boot, the reason being that there is only one CPU
 and preemption is disabled.
 This means that the call <tt>synchronize_rcu()</tt> (or friends)
@@ -2861,15 +2854,22 @@ The other four flavors are listed below, with requirements for each
 described in a separate section.
 
 <ol>
-<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor</a>
-<li> <a href="#Sched Flavor">Sched Flavor</a>
+<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a>
+<li> <a href="#Sched Flavor">Sched Flavor (Historical)</a>
 <li> <a href="#Sleepable RCU">Sleepable RCU</a>
 <li> <a href="#Tasks RCU">Tasks RCU</a>
-<li> <a href="#Waiting for Multiple Grace Periods">
-Waiting for Multiple Grace Periods</a>
 </ol>
 
-<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor</a></h3>
+<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a></h3>
 
 <p>
+The RCU-bh flavor of RCU has since been expressed in terms of
+the other RCU flavors as part of a consolidation of the three
+flavors into a single flavor.
+The read-side API remains, and continues to disable softirq and to
+be accounted for by lockdep.
+Much of the material in this section is therefore strictly historical
+in nature.
+
+<p>
 The softirq-disable (AKA “bottom-half”,
@@ -2929,8 +2929,20 @@ includes
 <tt>call_rcu_bh()</tt>,
 <tt>rcu_barrier_bh()</tt>, and
 <tt>rcu_read_lock_bh_held()</tt>.
+However, the update-side APIs are now simple wrappers for other RCU
+flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt
+otherwise.
 
-<h3><a name="Sched Flavor">Sched Flavor</a></h3>
+<h3><a name="Sched Flavor">Sched Flavor (Historical)</a></h3>
 
 <p>
+The RCU-sched flavor of RCU has since been expressed in terms of
+the other RCU flavors as part of a consolidation of the three
+flavors into a single flavor.
+The read-side API remains, and continues to disable preemption and to
+be accounted for by lockdep.
+Much of the material in this section is therefore strictly historical
+in nature.
+
+<p>
 Before preemptible RCU, waiting for an RCU grace period had the
@@ -3150,94 +3162,14 @@ The tasks-RCU API is quite compact, consisting only of
 <tt>call_rcu_tasks()</tt>,
 <tt>synchronize_rcu_tasks()</tt>, and
 <tt>rcu_barrier_tasks()</tt>.
-
-<h3><a name="Waiting for Multiple Grace Periods">
-Waiting for Multiple Grace Periods</a></h3>
-
-<p>
-Perhaps you have an RCU protected data structure that is accessed from
-RCU read-side critical sections, from softirq handlers, and from
-hardware interrupt handlers.
-That is three flavors of RCU, the normal flavor, the bottom-half flavor,
-and the sched flavor.
-How to wait for a compound grace period?
-
-<p>
-The best approach is usually to “just say no!” and
-insert <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>
-around each RCU read-side critical section, regardless of what
-environment it happens to be in.
-But suppose that some of the RCU read-side critical sections are
-on extremely hot code paths, and that use of <tt>CONFIG_PREEMPT=n</tt>
-is not a viable option, so that <tt>rcu_read_lock()</tt> and
-<tt>rcu_read_unlock()</tt> are not free.
-What then?
-
-<p>
-You <i>could</i> wait on all three grace periods in succession, as follows:
-
-<blockquote>
-<pre>
-1 synchronize_rcu();
-2 synchronize_rcu_bh();
-3 synchronize_sched();
-</pre>
-</blockquote>
-
-<p>
-This works, but triples the update-side latency penalty.
-In cases where this is not acceptable, <tt>synchronize_rcu_mult()</tt>
-may be used to wait on all three flavors of grace period concurrently:
-
-<blockquote>
-<pre>
-1 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
-</pre>
-</blockquote>
-
-<p>
-But what if it is necessary to also wait on SRCU?
-This can be done as follows:
-
-<blockquote>
-<pre>
-1 static void call_my_srcu(struct rcu_head *head,
-2 void (*func)(struct rcu_head *head))
-3 {
-4 call_srcu(&my_srcu, head, func);
-5 }
-6
-7 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched, call_my_srcu);
-</pre>
-</blockquote>
-
-<p>
-If you needed to wait on multiple different flavors of SRCU
-(but why???), you would need to create a wrapper function resembling
-<tt>call_my_srcu()</tt> for each SRCU flavor.
-
-<table>
-<tr><th> </th></tr>
-<tr><th align="left">Quick Quiz:</th></tr>
-<tr><td>
-But what if I need to wait for multiple RCU flavors, but I also need
-the grace periods to be expedited?
-</td></tr>
-<tr><th align="left">Answer:</th></tr>
-<tr><td bgcolor="#ffffff"><font color="ffffff">
-If you are using expedited grace periods, there should be less penalty
-for waiting on them in succession.
-But if that is nevertheless a problem, you can use workqueues
-or multiple kthreads to wait on the various expedited grace
-periods concurrently.
-</font></td></tr>
-<tr><td> </td></tr>
-</table>
-
-<p>
-Again, it is usually better to adjust the RCU read-side critical sections
-to use a single flavor of RCU, but when this is not feasible, you can use
-<tt>synchronize_rcu_mult()</tt>.
+In <tt>CONFIG_PREEMPT=n</tt> kernels, trampolines cannot be preempted,
+so these APIs map to
+<tt>call_rcu()</tt>,
+<tt>synchronize_rcu()</tt>, and
+<tt>rcu_barrier()</tt>, respectively.
+In <tt>CONFIG_PREEMPT=y</tt> kernels, trampolines can be preempted,
+and these three APIs are therefore implemented by separate functions
+that check for voluntary context switches.
 
 <h2><a name="Possible Future Changes">Possible Future Changes</a></h2>
 
@@ -3248,12 +3180,6 @@ If this becomes a serious problem, it will be necessary to rework the
 grace-period state machine so as to avoid the need for the additional
 latency.
 
 <p>
-Expedited grace periods scan the CPUs, so their latency and overhead
-increases with increasing numbers of CPUs.
-If this becomes a serious problem on large systems, it will be necessary
-to do some redesign to avoid this scalability problem.
-
-<p>
 RCU disables CPU hotplug in a few places, perhaps most notably in the
 <tt>rcu_barrier()</tt> operations.
@@ -3298,11 +3224,6 @@ Please note that arrangements that require RCU to remap CPU numbers will
 require extremely good demonstration of need and full exploration of
 alternatives.
 
 <p>
-There is an embarrassingly large number of flavors of RCU, and this
-number has been increasing over time.
-Perhaps it will be possible to combine some at some future date.
-
-<p>
 RCU's various kthreads are reasonably recent additions.
 It is quite likely that adjustments will be required to more gracefully

@@ -16,12 +16,9 @@ o A CPU looping in an RCU read-side critical section.
 
 o A CPU looping with interrupts disabled.
 
-o A CPU looping with preemption disabled. This condition can
-result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
-stalls.
+o A CPU looping with preemption disabled.
 
-o A CPU looping with bottom halves disabled. This condition can
-result in RCU-sched and RCU-bh stalls.
+o A CPU looping with bottom halves disabled.
 
 o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
 without invoking schedule(). If the looping in the kernel is
@@ -87,9 +84,9 @@ o A hardware failure. This is quite unlikely, but has occurred
 This resulted in a series of RCU CPU stall warnings, eventually
 leading the realization that the CPU had failed.
 
-The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
-warning. Note that SRCU does -not- have CPU stall warnings. Please note
-that RCU only detects CPU stalls when there is a grace period in progress.
+The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
+Note that SRCU does -not- have CPU stall warnings. Please note that
+RCU only detects CPU stalls when there is a grace period in progress.
 No grace period, no CPU stall warnings.
 
 To diagnose the cause of the stall, inspect the stack traces.

@@ -934,7 +934,8 @@ c. Do you need to treat NMI handlers, hardirq handlers,
 d. Do you need RCU grace periods to complete even in the face
 of softirq monopolization of one or more of the CPUs? For
 example, is your code subject to network-based denial-of-service
-attacks? If so, you need RCU-bh.
+attacks? If so, you should disable softirq across your readers,
+for example, by using rcu_read_lock_bh().
 
 e. Is your workload too update-intensive for normal use of
 RCU, but inappropriate for other synchronization mechanisms?

@@ -3534,14 +3534,14 @@
 
 In kernels built with CONFIG_RCU_NOCB_CPU=y, set
 the specified list of CPUs to be no-callback CPUs.
-Invocation of these CPUs' RCU callbacks will
-be offloaded to "rcuox/N" kthreads created for
-that purpose, where "x" is "b" for RCU-bh, "p"
-for RCU-preempt, and "s" for RCU-sched, and "N"
-is the CPU number. This reduces OS jitter on the
-offloaded CPUs, which can be useful for HPC and
-real-time workloads. It can also improve energy
-efficiency for asymmetric multiprocessors.
+Invocation of these CPUs' RCU callbacks will be
+offloaded to "rcuox/N" kthreads created for that
+purpose, where "x" is "p" for RCU-preempt, and
+"s" for RCU-sched, and "N" is the CPU number.
+This reduces OS jitter on the offloaded CPUs,
+which can be useful for HPC and real-time
+workloads. It can also improve energy efficiency
+for asymmetric multiprocessors.
 
 rcu_nocb_poll [KNL]
 Rather than requiring that offloaded CPUs

@@ -321,7 +321,7 @@ To reduce its OS jitter, do at least one of the following:
 to do.
 
 Name:
-rcuob/%d, rcuop/%d, and rcuos/%d
+rcuop/%d and rcuos/%d
 
 Purpose:
 Offload RCU callbacks from the corresponding CPU.
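The "rcuox/N" naming scheme described in the two hunks above can be illustrated in user space. The sketch below assumes only what the documentation states: "x" is "p" for RCU-preempt or "s" for RCU-sched, and "N" is the CPU number. The helper `demo_nocb_kthread_name()` is hypothetical and is not a kernel function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build an "rcuo<x>/<N>" name from a flavor letter
 * ('p' for RCU-preempt, 's' for RCU-sched) and a CPU number.
 * Returns the value of snprintf(), so a result >= size means the
 * name was truncated.
 */
static int demo_nocb_kthread_name(char *buf, size_t size, char flavor, int cpu)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "rcuo%c/%d", flavor, cpu);
}
```

For example, passing `'p'` and `3` fills the buffer with `rcuop/3`. After this commit there is no longer a `'b'` variant, because RCU-bh no longer has its own callback-offload kthreads.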