The scheduler is currently required to hold rq/pi locks across the entire
RCU read-side critical section or not at all. This is inconvenient and
leaves traps for the unwary, including the author of this commit.
But now that excessively long grace periods enable scheduling-clock
interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in
rcu_read_unlock_special() can be dispensed with. In other words, the
rcu_read_unlock_special() function can refrain from doing wakeups unless
such wakeups are guaranteed safe.
This commit therefore avoids unsafe wakeups, freeing the scheduler to
hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU
read-side critical section might have been preempted. This commit also
updates RCU's requirements documentation.
This commit is inspired by a patch from Lai Jiangshan:
https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com
This commit is further intended to be a step towards his goal of permitting
the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock().
Cc: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Commit ccc9971e21 ("docs: rcu: convert some articles from html to
ReST") has converted a few of html RCU docs into ReST files, but a few
of html tags which not supported on rst is remaining. This commit
converts those to ReST appropriate alternatives.
Reviewed-by: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
While Paul was explaining some RCU magic I noticed a typo in
rcu_note_context_switch(). As a result, this commit replaces
rcu_node_context_switch() with rcu_note_context_switch().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This restores docs back in ReST format.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Added Joel's SoB per Stephen Rothwell feedback. ]
[ paulmck: Joel approved via private email. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This restores docs back in ReST format.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Added Joel's SoB per Stephen Rothwell feedback. ]
[ paulmck: Joel approved via private email. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Mauro's auto conversion broken these links, fix them.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
There are 4 RCU articles that are written on html format.
The way they are, they can't be part of the Linux Kernel
documentation body nor share the styles and pdf output.
So, convert them to ReST format.
This way, make htmldocs and make pdfdocs will produce a
documentation output that will be like the original ones, but
will be part of the Linux Kernel documentation body.
Part of the conversion was done with the help of pandoc, but
the result had some broken things that had to be manually
fixed.
Following are manual changes Mauro made when doing the automatic conversion:
Quoting from: https://lore.kernel.org/rcu/20190726154550.5eeae294@coco.lan/
> > At least the pandoc's version I used here has a bug: its conversion
> > from html to ReST on those files only start after a <body> tag - or
> > when the first quiz table starts. I only discovered that adding a
> > <body> at the beginning of the file solve this book at the last
> > conversions.
> >
> > So, for most html->ReST conversions, I manually converted the first
> > part of the document, basically stripping html paragraph tags and
> > by replacing highlights by the ReST syntax.
> >
> > Also, all the quiz tables seem to assume some javascript macro or
> > css style that would be hiding the answer part until the mouse moves
> > to it. Such macro/css was not there at the kernel tree. So, the quiz
> > answers have the same color as the background, making them invisible.
> > Even if we had such macro/css, this is not portable for pdf/LaTeX output
> > (and I'm not sure if this would work with ePub).
> >
> > So, I ended by manually doing the table conversion.
> >
> > Finally, I double-checked if the conversions ended ok, addressing any
> > issues that might have heppened.
> >
> > So, after both automatic conversion and manual fixes, I opened both the
> > html files produced by Sphinx and the original ones and compared them
> > line per line (except for the indexes, as Sphinx produces them
> > automatically), in order to see if all information from the original
> > files will be there on a format close to what we have on other ReST
> > files, fixing any pending issues if any.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This reverts docs from commit 355e9972da81e803bbb825b76106ae9b358caf8e.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Added Joel's SoB per Stephen Rothwell feedback. ]
[ paulmck: Joel approved via private email. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This reverts docs from commit d6b9cd7dc8e041ee83cb1362fce59a3cdb1f2709.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Added Joel's SoB per Stephen Rothwell feedback. ]
[ paulmck: Joel approved via private email. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_dereference_raw_notrace() API name is confusing. It is equivalent
to rcu_dereference_raw() except that it also does sparse pointer checking.
There are only a few users of rcu_dereference_raw_notrace(). This patches
renames all of them to be rcu_dereference_raw_check() with the "_check()"
indicating sparse checking.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Fix checkpatch warnings about parentheses. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Commit bb73c52bad ("rcu: Don't disable preemption for Tiny and Tree
RCU readers") removed the barrier() calls from rcu_read_lock() and
rcu_write_lock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels.
Within RCU, this commit was OK, but it failed to account for things like
get_user() that can pagefault and that can be reordered by the compiler.
Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock()
can cause these page faults to migrate into RCU read-side critical
sections, which in CONFIG_PREEMPT=n kernels could result in too-short
grace periods and arbitrary misbehavior. Please see commit 386afc9114
("spinlocks and preemption points need to be at least compiler barriers")
and Linus's commit 66be4e66a7 ("rcu: locking and unlocking need to
always be at least barriers"), this last of which restores the barrier()
call to both rcu_read_lock() and rcu_read_unlock().
This commit removes barrier() calls that are no longer needed given that
the addition of them in Linus's commit noted above. The combination of
this commit and Linus's commit effectively reverts commit bb73c52bad
("rcu: Don't disable preemption for Tiny and Tree RCU readers").
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix embarrassing typo located by Alan Stern. ]
Now that synchronize_rcu_bh, synchronize_rcu_bh_expedited, call_rcu_bh,
rcu_barrier_bh, synchronize_sched, synchronize_sched_expedited,
call_rcu_sched, rcu_barrier_sched, get_state_synchronize_sched,
and cond_synchronize_sched are obsolete, let's remove them from the
documentation aside from a small historical section.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Although the name rcu_process_callbacks() still makes sense for Tiny
RCU, where most of what it does is invoke callbacks, it no longer makes
much sense for Tree RCU, especially given that the actually callback
invocation is relegated to rcu_do_batch(), or, for no-CBs CPUs, to the
rcuo kthreads. Especially in the latter case, rcu_process_callbacks()
has very little to do with actual callbacks. A better description of
this function is that it performs RCU's core processing.
This commit therefore changes the name of Tree RCU's rcu_process_callbacks()
function to rcu_core(), which also has the virtue of being consistent with
the existing invoke_rcu_core() function.
While in the area, the header comment is reworked.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The name rcu_check_callbacks() arguably made sense back in the early
2000s when RCU was quite a bit simpler than it is today, but it has
become quite misleading, especially with the advent of dyntick-idle
and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into
the scheduling-clock interrupt, and is now but one of many ways that
callbacks get promoted to invocable state.
This commit therefore changes the name to rcu_sched_clock_irq(),
which is the same number of characters and clearly indicates this
function's relation to the rest of the Linux kernel. In addition, for
the sake of consistency, rcu_flavor_check_callbacks() is also renamed
to rcu_flavor_sched_clock_irq().
While in the area, the header comments for both functions are reworked.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
SRCU's synchronize_srcu() may not be invoked from CPU-hotplug notifiers,
due to the fact that SRCU grace periods make use of timers and the
possibility of timers being temporarily stranded on the outgoing CPU.
This stranding of timers means that timers posted to the outgoing CPU
will not fire until late in the CPU-hotplug process. The problem is
that if a notifier is waiting on an SRCU grace period, that grace period
is waiting on a timer, and that timer is stranded on the outgoing CPU,
then the notifier will never be awakened, in other words, deadlock has
occurred. This same situation of course also prohibits srcu_barrier()
from being invoked from CPU-hotplug notifiers.
This commit therefore updates the requirements to include this restriction.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Back when there could be multiple RCU flavors running in the same kernel
at the same time, it was necessary to specify the expedited grace-period
IPI handler at runtime. Now that there is only one RCU flavor, the
IPI handler can be determined at build time. There is therefore no
longer any reason for the RCU-preempt and RCU-sched IPI handlers to
have different names, nor is there any reason to pass these handlers in
function arguments and in the data structures enclosing workqueues.
This commit therefore makes all these changes, pushing the specification
of the expedited grace-period IPI handler down to the point of use.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit replaces "struction" with the correct "structure".
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Given RCU flavor consolidation, when rcu_read_unlock() is invoked with
interrupts disabled, the reporting of the corresponding quiescent state is
deferred until interrupts are re-enabled. There was therefore some hope
that this would allow dropping the restriction against holding scheduler
spinlocks across an rcu_read_unlock() without disabling interrupts across
the entire corresponding RCU read-side critical section. Unfortunately,
the need to quickly provide a quiescent state to expedited grace periods
sometimes requires a call to raise_softirq() during rcu_read_unlock()
execution. Because raise_softirq() can sometimes acquire the scheduler
spinlocks, the restriction must remain in effect. This commit therefore
updates the RCU requirements documentation accordingly.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The code listing under this section has a quick quiz that says line
19 uses rcu_access_pointer, but the code listing itself instead uses
rcu_dereference(). This commit therefore makes the code listing match
the quick quiz.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The Requirements.html document says "Disabling Preemption Does
Not Block Grace Periods". However this is no longer true with
the RCU consolidation. This commit therefore removes the obsolete
(non-)requirement entirely.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The rcu_state structure doesn't have a gp_seq_needed field. Update the
description under rcu_data accordingly, to reflect this.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
An important note under the rcu_segcblist description could use a more
detailed description. Especially explanation of the scenario where the
->head field may be temporarily NULL making it not wise to rely on it
to determine if callbacks are associated with the rcu_segcblist.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This patch updates all Data-Structures document figures and text and
removes some unwanted figures, to reflect the recent work Paul has been
doing with consolidating all flavors of RCU.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
rcu_dynticks was folded into rcu_data structure. Update the data
structures RCU document accordingly.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Since commit fced9c8cfe ("rcu: Avoid resched_cpu() when rescheduling
the current CPU"), resched_cpu is not directly called from
sync_sched_exp_handler. Update the documentation about the same.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
RCU Data-Structures document describes a trick to test RCU with small
number of CPUs but with a taller tree. It wasn't immediately clear how
the document arrived at 16 CPUs which also requires setting the
FANOUT_LEAF to 2 instead of the default of 16. This commit therefore
provides the needed clarification.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit adds a section to the requirements documentation setting down
requirements for grace-period and callback-invocation forward progress.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled. These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation. Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.
This also means that disabling preemption, interrupts, and/or
softirqs will act as an RCU-preempt read-side critical section.
This is enforced by checking preempt_count() as needed.
Some special cases must be handled on an ad-hoc basis, for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption. In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling. Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.
In theory, this change lifts a long-standing restriction that required
that if interrupts were disabled across a call to rcu_read_unlock()
that the matching rcu_read_lock() also be contained within that
interrupts-disabled region of code. Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until
after interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue and
priority-inheritance locks. This may allow some code simplification that
might reduce interrupt latency a bit. Unfortunately, in practice this
would also defer deboosting a low-priority task that had been subjected
to RCU priority boosting, so real-time-response considerations might
well force this restriction to remain in place.
Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may
require some additional plumbing to provide the network denial-of-service
guarantees that have been traditionally provided by RCU-bh. Once these
are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
into RCU-sched. This would mean that all kernels would have but
one flavor of RCU, which would open the door to significant code
cleanup.
Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
from Joel Fernandes. ]
[ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
response to bug reports from kbuild test robot. ]
[ paulmck: Fix bug located by kbuild test robot involving recursion
via rcu_preempt_deferred_qs(). ]
The RCU-bh update API is now defined in terms of that of RCU-bh and
RCU-sched, so this commit updates the documentation accordingly.
In addition, although RCU-sched persists in !PREEMPT kernels, in
the PREEMPT case its update API is now defined in terms of that of
RCU-preempt, so this commit also updates the documentation accordingly.
While in the area, this commit removes the documentation for the
now-obsolete synchronize_rcu_mult() and clarifies the Tasks RCU
documentation.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The very useful RCU Data-Structures describes that the dynticks counter
of the rcu_dynticks data structure is incremented when we transitions to
or from dynticks-idle mode. However it doesn't mention that it is also
incremented due to transitions to and from user mode which for dynticks
purposes is an extended quiescent state.
I found this with tracing calls to rcu_dynticks_eqs_enter which can also
happen from rcu_user_enter. Lets add this information to the
Data-Structures document.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Two of the Requirements.html LKML links are broken. This patch changes
them to use the archive from lore.kernel.org, which works fine.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Make Requirements.html talk about how NMI handlers can take what appear
to RCU to be normal interrupts.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit keeps only the historical and low-level discussion of
smp_read_barrier_depends().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Adjusted to allow for David Howells feedback on prior commit. ]
Now that cond_resched() also provides RCU quiescent states when
needed, it can be used in place of cond_resched_rcu_qs(). This
commit therefore documents this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle are:
- Documentation updates
- RCU CPU stall-warning updates
- Torture-test updates
- Miscellaneous fixes
Size wise the biggest updates are to documentation. Excluding
documentation most of the code increase comes from a single commit
which expands debugging"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
srcu: Add parameters to SRCU docbook comments
doc: Rewrite confusing statement about memory barriers
memory-barriers.txt: Fix typo in pairing example
rcu/segcblist: Include rcupdate.h
rcu: Add extended-quiescent-state testing advice
rcu: Suppress lockdep false-positive ->boost_mtx complaints
rcu: Do not include rtmutex_common.h unconditionally
torture: Provide TMPDIR environment variable to specify tmpdir
rcutorture: Dump writer stack if stalled
rcutorture: Add interrupt-disable capability to stall-warning tests
rcu: Suppress RCU CPU stall warnings while dumping trace
rcu: Turn off tracing before dumping trace
rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
sched,rcu: Make cond_resched() provide RCU quiescent state
sched: Make resched_cpu() unconditional
irq_work: Map irq_work_on_queue() to irq_work_on() in !SMP
rcu: Create call_rcu_tasks() kthread at boot time
rcu: Fix up pending cbs check in rcu_prepare_for_idle
memory-barriers: Rework multicopy-atomicity section
memory-barriers: Replace uses of "transitive"
...
This commit provides text and diagrams showing how Tree RCU implements
its grace-period memory ordering guarantees.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
This commit documents the situations in which RCU needs the
scheduling-clock interrupt to be enabled, along with the consequences
of failing to meet RCU's needs in this area.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
RCU's debugfs tracing used to be the only reasonable low-level debug
information available, but ftrace and event tracing has since surpassed
the RCU debugfs level of usefulness. This commit therefore removes
RCU's debugfs tracing.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The sparse-based checking for non-RCU accesses to RCU-protected pointers
has been around for a very long time, and it is now the only type of
sparse-based checking that is optional. This commit therefore makes
it unconditional.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
by commit 0edd1b1784 ("nohz_full: Add full-system-idle state machine"),
but has not been used. This commit therefore removes it.
If it turns out to be needed later, this commit can always be reverted.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit classifies tail recursion as an alternative way to write
a loop, with similar limitations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>