mirror of https://gitee.com/openkylin/linux.git
Merge tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar.

These were the main changes in this cycle:

 - LKMM updates: mostly documentation changes, but also some new litmus
   tests for atomic ops.

 - KCSAN updates: the most important change is that GCC 11 now has all
   fixes in place to support KCSAN, so GCC support can be enabled again.
   Also more annotations.

 - futex updates: minor cleanups and simplifications.

 - seqlock updates: merge preparatory changes/cleanups for the
   'associated locks' facilities.

 - lockdep updates:
    - simplify IRQ trace event handling
    - add various new debug checks
    - simplify header dependencies, split out <linux/lockdep_types.h>,
      decouple lockdep from other low level headers some more
    - fix NMI handling
    - misc cleanups and smaller fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>

* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  kcsan: Improve IRQ state trace reporting
  lockdep: Refactor IRQ trace events fields into struct
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  lockdep: Add preemption enabled/disabled assertion APIs
  seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
  seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
  seqlock: Reorder seqcount_t and seqlock_t API definitions
  seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
  seqlock: Properly format kernel-doc code samples
  Documentation: locking: Describe seqlock design and usage
  locking/qspinlock: Do not include atomic.h from qspinlock_types.h
  locking/atomic: Move ATOMIC_INIT into linux/types.h
  lockdep: Move list.h inclusion into lockdep.h
  locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
  futex: Remove unused or redundant includes
  futex: Consistently use fshared as boolean
  futex: Remove needless goto's
  futex: Remove put_futex_key()
  rwsem: fix commas in initialisation
  docs: locking: Replace HTTP links with HTTPS ones
  ...

This commit is contained in:

commit 9ba19ccd2d
Documentation/atomic_t.txt:

@ -85,21 +85,21 @@ smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.

A subtle detail of atomic_set{}() is that it should be observable to the RMW
ops. That is:
A note for the implementation of atomic_set{}() is that it must not break the
atomicity of the RMW ops. That is:

C atomic-set
C Atomic-RMW-ops-are-atomic-WRT-atomic_set

{
atomic_set(v, 1);
atomic_t v = ATOMIC_INIT(1);
}

P0(atomic_t *v)
{
(void)atomic_add_unless(v, 1, 0);
}

P1(atomic_t *v)
{
atomic_add_unless(v, 1, 0);
}

P2(atomic_t *v)
{
atomic_set(v, 0);
}

@ -233,19 +233,19 @@ as well. Similarly, something like:
is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

C strong-acquire
C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

{
}

P1(int *x, atomic_t *y)
P0(int *x, atomic_t *y)
{
r0 = READ_ONCE(*x);
smp_rmb();
r1 = atomic_read(y);
}

P2(int *x, atomic_t *y)
P1(int *x, atomic_t *y)
{
atomic_inc(y);
smp_mb__after_atomic();
@ -253,14 +253,14 @@ strictly stronger than ACQUIRE. As illustrated:
}

exists
(r0=1 /\ r1=0)
(0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE. Thus:

P1                      P2
P0                      P1

t = LL.acq *y (0)
t++;

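The hunks above tighten the wording around atomic_set() and rename the embedded
litmus tests. For readers who prefer plain kernel C over litmus syntax, the
ordering property behind the second test (an atomic RMW followed by
smp_mb__after_atomic() being stronger than ACQUIRE) corresponds roughly to the
sketch below; the function and variable names are illustrative and not part of
the patch.

    /* Illustrative sketch only -- not from the patch. */
    #include <linux/atomic.h>

    static atomic_t nr_events = ATOMIC_INIT(0);
    static int published;

    /*
     * Writer: atomic_inc() followed by smp_mb__after_atomic() orders both
     * the load and the store half of the RMW before the WRITE_ONCE(),
     * i.e. it acts as a full barrier rather than just ACQUIRE.
     */
    void publish(void)
    {
            atomic_inc(&nr_events);
            smp_mb__after_atomic();
            WRITE_ONCE(published, 1);
    }

    /*
     * Reader: if it observes published == 1, it must also observe the
     * incremented counter -- the forbidden litmus outcome is
     * 0:r0=1 /\ 0:r1=0 above.
     */
    int consume(void)
    {
            int p = READ_ONCE(published);

            smp_rmb();
            return p ? atomic_read(&nr_events) : -1;
    }
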
Documentation/dev-tools/kcsan.rst:

@ -8,7 +8,8 @@ approach to detect races. KCSAN's primary purpose is to detect `data races`_.
Usage
-----

KCSAN requires Clang version 11 or later.
KCSAN is supported by both GCC and Clang. With GCC we require version 11 or
later, and with Clang also require version 11 or later.

To enable KCSAN configure the kernel with::

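The KCSAN hunk only changes the compiler requirements (GCC 11 or later is now
supported in addition to Clang 11 or later). As a reminder of what the tool
reports once enabled, a data race in its simplest form looks like the sketch
below; the names are made up for illustration and are not from the patch.

    /* Illustrative sketch only -- not from the patch. */
    #include <linux/compiler.h>

    static int shared_flag;

    /* Thread A: plain store, concurrent with the plain load below. */
    void set_flag(void)
    {
            shared_flag = 1;        /* KCSAN: reported as a data race */
    }

    /* Thread B: plain load of the same variable, no synchronization. */
    int poll_flag(void)
    {
            return shared_flag;     /* typical fix: READ_ONCE(shared_flag) */
    }
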
Documentation/litmus-tests/README:

@ -0,0 +1,35 @@
============
LITMUS TESTS
============

Each subdirectory contains litmus tests that are typical to describe the
semantics of the respective kernel APIs.
For more information about how to "run" a litmus test or how to generate
a kernel test module based on a litmus test, please see
tools/memory-model/README.


atomic (/atomic directory)
--------------------------

Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus
        Test that an atomic RMW followed by a smp_mb__after_atomic() is
        stronger than a normal acquire: both the read and write parts of
        the RMW are ordered before the subsequential memory accesses.

Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
        Test that atomic_set() cannot break the atomicity of atomic RMWs.
        NOTE: Requires herd7 7.56 or later, which supports "(void)expr".


RCU (/rcu directory)
--------------------

MP+onceassign+derefonce.litmus (under tools/memory-model/litmus-tests/)
        Demonstrates the use of rcu_assign_pointer() and rcu_dereference() to
        ensure that an RCU reader will not see pre-initialization garbage.

RCU+sync+read.litmus
RCU+sync+free.litmus
        Both the above litmus tests demonstrate the RCU grace period guarantee
        that an RCU read-side critical section can never span a grace period.

Documentation/litmus-tests/atomic/Atomic-RMW+mb__after_atomic-is-stronger-than-acquire.litmus:

@ -0,0 +1,32 @@
C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

(*
 * Result: Never
 *
 * Test that an atomic RMW followed by a smp_mb__after_atomic() is
 * stronger than a normal acquire: both the read and write parts of
 * the RMW are ordered before the subsequential memory accesses.
 *)

{
}

P0(int *x, atomic_t *y)
{
        int r0;
        int r1;

        r0 = READ_ONCE(*x);
        smp_rmb();
        r1 = atomic_read(y);
}

P1(int *x, atomic_t *y)
{
        atomic_inc(y);
        smp_mb__after_atomic();
        WRITE_ONCE(*x, 1);
}

exists
(0:r0=1 /\ 0:r1=0)

Documentation/litmus-tests/atomic/Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus:

@ -0,0 +1,25 @@
C Atomic-RMW-ops-are-atomic-WRT-atomic_set

(*
 * Result: Never
 *
 * Test that atomic_set() cannot break the atomicity of atomic RMWs.
 * NOTE: This requires herd7 7.56 or later which supports "(void)expr".
 *)

{
        atomic_t v = ATOMIC_INIT(1);
}

P0(atomic_t *v)
{
        (void)atomic_add_unless(v, 1, 0);
}

P1(atomic_t *v)
{
        atomic_set(v, 0);
}

exists
(v=2)

Documentation/litmus-tests/rcu/RCU+sync+free.litmus:

@ -0,0 +1,42 @@
C RCU+sync+free

(*
 * Result: Never
 *
 * This litmus test demonstrates that an RCU reader can never see a write that
 * follows a grace period, if it did not see writes that precede that grace
 * period.
 *
 * This is a typical pattern of RCU usage, where the write before the grace
 * period assigns a pointer, and the writes following the grace period destroy
 * the object that the pointer used to point to.
 *
 * This is one implication of the RCU grace-period guarantee, which says (among
 * other things) that an RCU read-side critical section cannot span a grace period.
 *)

{
        int x = 1;
        int *y = &x;
        int z = 1;
}

P0(int *x, int *z, int **y)
{
        int *r0;
        int r1;

        rcu_read_lock();
        r0 = rcu_dereference(*y);
        r1 = READ_ONCE(*r0);
        rcu_read_unlock();
}

P1(int *x, int *z, int **y)
{
        rcu_assign_pointer(*y, z);
        synchronize_rcu();
        WRITE_ONCE(*x, 0);
}

exists (0:r0=x /\ 0:r1=0)

Documentation/litmus-tests/rcu/RCU+sync+read.litmus:

@ -0,0 +1,37 @@
C RCU+sync+read

(*
 * Result: Never
 *
 * This litmus test demonstrates that after a grace period, an RCU updater always
 * sees all stores done in prior RCU read-side critical sections. Such
 * read-side critical sections would have ended before the grace period ended.
 *
 * This is one implication of the RCU grace-period guarantee, which says (among
 * other things) that an RCU read-side critical section cannot span a grace period.
 *)

{
        int x = 0;
        int y = 0;
}

P0(int *x, int *y)
{
        rcu_read_lock();
        WRITE_ONCE(*x, 1);
        WRITE_ONCE(*y, 1);
        rcu_read_unlock();
}

P1(int *x, int *y)
{
        int r0;
        int r1;

        r0 = READ_ONCE(*x);
        synchronize_rcu();
        r1 = READ_ONCE(*y);
}

exists (1:r0=1 /\ 1:r1=0)

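The comment in RCU+sync+free describes the classic publish/retire pattern. A
minimal kernel-C rendering of that pattern, assuming a single updater, is shown
below; the types and function names are illustrative and not from the patch.

    /* Illustrative sketch only -- not from the patch. */
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct item {
            int val;
    };

    static struct item __rcu *global_item;

    /* Reader: may run concurrently with replace_item() below. */
    int read_item(void)
    {
            struct item *p;
            int val;

            rcu_read_lock();
            p = rcu_dereference(global_item);
            val = p ? READ_ONCE(p->val) : -1;
            rcu_read_unlock();

            return val;
    }

    /*
     * Updater: publish the replacement, wait for a grace period, then free
     * the old object.  Any reader that obtained the old pointer has left
     * its read-side critical section before kfree() runs -- the guarantee
     * that the RCU+sync+free litmus test checks.  A single updater is
     * assumed, hence the "1" condition to rcu_dereference_protected().
     */
    void replace_item(struct item *new)
    {
            struct item *old = rcu_dereference_protected(global_item, 1);

            rcu_assign_pointer(global_item, new);
            synchronize_rcu();
            kfree(old);
    }
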
Documentation/locking/index.rst:

@ -14,6 +14,7 @@ locking
    mutex-design
    rt-mutex-design
    rt-mutex
    seqlock
    spinlocks
    ww-mutex-design
    preempt-locking

@ -18,7 +18,7 @@ as an alternative to these. This new data structure provided a number
of advantages, including simpler interfaces, and at that time smaller
code (see Disadvantages).

[1] http://lwn.net/Articles/164802/
[1] https://lwn.net/Articles/164802/

Implementation
--------------

Documentation/locking/seqlock.rst:

@ -0,0 +1,170 @@
======================================
Sequence counters and sequential locks
======================================

Introduction
============

Sequence counters are a reader-writer consistency mechanism with
lockless readers (read-only retry loops), and no writer starvation. They
are used for data that's rarely written to (e.g. system time), where the
reader wants a consistent set of information and is willing to retry if
that information changes.

A data set is consistent when the sequence count at the beginning of the
read side critical section is even and the same sequence count value is
read again at the end of the critical section. The data in the set must
be copied out inside the read side critical section. If the sequence
count has changed between the start and the end of the critical section,
the reader must retry.

Writers increment the sequence count at the start and the end of their
critical section. After starting the critical section the sequence count
is odd and indicates to the readers that an update is in progress. At
the end of the write side critical section the sequence count becomes
even again, which lets readers make progress.

A sequence counter write side critical section must never be preempted
or interrupted by read side sections. Otherwise the reader will spin for
the entire scheduler tick due to the odd sequence count value and the
interrupted writer. If that reader belongs to a real-time scheduling
class, it can spin forever and the kernel will livelock.

This mechanism cannot be used if the protected data contains pointers,
as the writer can invalidate a pointer that the reader is following.


.. _seqcount_t:

Sequence counters (``seqcount_t``)
==================================

This is the raw counting mechanism, which does not protect against
multiple writers. Write side critical sections must thus be serialized
by an external lock.

If the write serialization primitive is not implicitly disabling
preemption, preemption must be explicitly disabled before entering the
write side section. If the read section can be invoked from hardirq or
softirq contexts, interrupts or bottom halves must also be respectively
disabled before entering the write section.

If it's desired to automatically handle the sequence counter
requirements of writer serialization and non-preemptibility, use
:ref:`seqlock_t` instead.

Initialization::

        /* dynamic */
        seqcount_t foo_seqcount;
        seqcount_init(&foo_seqcount);

        /* static */
        static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

        /* C99 struct init */
        struct {
                .seq = SEQCNT_ZERO(foo.seq),
        } foo;

Write path::

        /* Serialized context with disabled preemption */

        write_seqcount_begin(&foo_seqcount);

        /* ... [[write-side critical section]] ... */

        write_seqcount_end(&foo_seqcount);

Read path::

        do {
                seq = read_seqcount_begin(&foo_seqcount);

                /* ... [[read-side critical section]] ... */

        } while (read_seqcount_retry(&foo_seqcount, seq));


.. _seqlock_t:

Sequential locks (``seqlock_t``)
================================

This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
embedded spinlock for writer serialization and non-preemptibility.

If the read side section can be invoked from hardirq or softirq context,
use the write side function variants which disable interrupts or bottom
halves respectively.

Initialization::

        /* dynamic */
        seqlock_t foo_seqlock;
        seqlock_init(&foo_seqlock);

        /* static */
        static DEFINE_SEQLOCK(foo_seqlock);

        /* C99 struct init */
        struct {
                .seql = __SEQLOCK_UNLOCKED(foo.seql)
        } foo;

Write path::

        write_seqlock(&foo_seqlock);

        /* ... [[write-side critical section]] ... */

        write_sequnlock(&foo_seqlock);

Read path, three categories:

1. Normal Sequence readers which never block a writer but they must
   retry if a writer is in progress by detecting change in the sequence
   number. Writers do not wait for a sequence reader::

        do {
                seq = read_seqbegin(&foo_seqlock);

                /* ... [[read-side critical section]] ... */

        } while (read_seqretry(&foo_seqlock, seq));

2. Locking readers which will wait if a writer or another locking reader
   is in progress. A locking reader in progress will also block a writer
   from entering its critical section. This read lock is
   exclusive. Unlike rwlock_t, only one locking reader can acquire it::

        read_seqlock_excl(&foo_seqlock);

        /* ... [[read-side critical section]] ... */

        read_sequnlock_excl(&foo_seqlock);

3. Conditional lockless reader (as in 1), or locking reader (as in 2),
   according to a passed marker. This is used to avoid lockless readers
   starvation (too many retry loops) in case of a sharp spike in write
   activity. First, a lockless read is tried (even marker passed). If
   that trial fails (odd sequence counter is returned, which is used as
   the next iteration marker), the lockless read is transformed to a
   full locking read and no retry loop is necessary::

        /* marker; even initialization */
        int seq = 0;
        do {
                read_seqbegin_or_lock(&foo_seqlock, &seq);

                /* ... [[read-side critical section]] ... */

        } while (need_seqretry(&foo_seqlock, seq));
        done_seqretry(&foo_seqlock, seq);


API documentation
=================

.. kernel-doc:: include/linux/seqlock.h

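To tie the new document to concrete usage, here is a minimal sketch of a
seqlock_t protecting a small structure, following the write path and the
lockless read path described above; the structure and function names are
invented for this example and are not from the patch.

    /* Illustrative sketch only -- not from the patch. */
    #include <linux/types.h>
    #include <linux/seqlock.h>

    struct mono_time {
            u64 sec;
            u32 nsec;
    };

    static struct mono_time cur_time;
    static DEFINE_SEQLOCK(time_seqlock);

    /* Writer: serialized and non-preemptible via the embedded spinlock. */
    void time_update(u64 sec, u32 nsec)
    {
            write_seqlock(&time_seqlock);
            cur_time.sec = sec;
            cur_time.nsec = nsec;
            write_sequnlock(&time_seqlock);
    }

    /* Lockless reader: copies the data out and retries if a writer ran. */
    struct mono_time time_read(void)
    {
            struct mono_time snapshot;
            unsigned int seq;

            do {
                    seq = read_seqbegin(&time_seqlock);
                    snapshot = cur_time;
            } while (read_seqretry(&time_seqlock, seq));

            return snapshot;
    }
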
MAINTAINERS:

@ -9981,6 +9981,7 @@ M: Luc Maranget <luc.maranget@inria.fr>
M:      "Paul E. McKenney" <paulmck@kernel.org>
R:      Akira Yokosawa <akiyks@gmail.com>
R:      Daniel Lustig <dlustig@nvidia.com>
R:      Joel Fernandes <joel@joelfernandes.org>
L:      linux-kernel@vger.kernel.org
L:      linux-arch@vger.kernel.org
S:      Supported
@ -9989,6 +9990,7 @@ F: Documentation/atomic_bitops.txt
F:      Documentation/atomic_t.txt
F:      Documentation/core-api/atomic_ops.rst
F:      Documentation/core-api/refcount-vs-atomic.rst
F:      Documentation/litmus-tests/
F:      Documentation/memory-barriers.txt
F:      tools/memory-model/

@ -24,7 +24,6 @@
|
|||
#define __atomic_acquire_fence()
|
||||
#define __atomic_post_full_fence()
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
#define ATOMIC64_INIT(i) { (i) }
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
|
|
|
@ -14,8 +14,6 @@
|
|||
#include <asm/barrier.h>
|
||||
#include <asm/smp.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#ifndef CONFIG_ARC_PLAT_EZNPS
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
|
|
|
@ -15,8 +15,6 @@
|
|||
#include <asm/barrier.h>
|
||||
#include <asm/cmpxchg.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#ifdef __KERNEL__
|
||||
|
||||
/*
|
||||
|
|
|
@ -5,7 +5,7 @@
|
|||
#ifndef _ASM_ARM_PERCPU_H_
|
||||
#define _ASM_ARM_PERCPU_H_
|
||||
|
||||
#include <asm/thread_info.h>
|
||||
register unsigned long current_stack_pointer asm ("sp");
|
||||
|
||||
/*
|
||||
* Same as asm-generic/percpu.h, except that we store the per cpu offset
|
||||
|
|
|
@ -75,11 +75,6 @@ struct thread_info {
|
|||
.addr_limit = KERNEL_DS, \
|
||||
}
|
||||
|
||||
/*
|
||||
* how to get the current stack pointer in C
|
||||
*/
|
||||
register unsigned long current_stack_pointer asm ("sp");
|
||||
|
||||
/*
|
||||
* how to get the thread information struct from C
|
||||
*/
|
||||
|
|
|
@ -99,8 +99,6 @@ static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
|
|||
return __lse_ll_sc_body(atomic64_dec_if_positive, v);
|
||||
}
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#define arch_atomic_read(v) __READ_ONCE((v)->counter)
|
||||
#define arch_atomic_set(v, i) __WRITE_ONCE(((v)->counter), (i))
|
||||
|
||||
|
|
|
@ -12,8 +12,6 @@
|
|||
* resource counting etc..
|
||||
*/
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
|
||||
|
||||
|
|
|
@ -12,8 +12,6 @@
|
|||
#include <asm/cmpxchg.h>
|
||||
#include <asm/barrier.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
/* Normal writes in our arch don't clear lock reservations */
|
||||
|
||||
static inline void atomic_set(atomic_t *v, int new)
|
||||
|
|
|
@ -19,7 +19,6 @@
|
|||
#include <asm/barrier.h>
|
||||
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
#define ATOMIC64_INIT(i) { (i) }
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
|
|
|
@ -16,8 +16,6 @@
|
|||
* We do not have SMP m68k systems, so we don't have to deal with that.
|
||||
*/
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
|
||||
|
||||
|
|
|
@ -45,7 +45,6 @@ static __always_inline type pfx##_xchg(pfx##_t *v, type n) \
|
|||
return xchg(&v->counter, n); \
|
||||
}
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
ATOMIC_OPS(atomic, int)
|
||||
|
||||
#ifdef CONFIG_64BIT
|
||||
|
|
|
@ -136,8 +136,6 @@ ATOMIC_OPS(xor, ^=)
|
|||
#undef ATOMIC_OP_RETURN
|
||||
#undef ATOMIC_OP
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#ifdef CONFIG_64BIT
|
||||
|
||||
#define ATOMIC64_INIT(i) { (i) }
|
||||
|
|
|
@ -11,8 +11,6 @@
|
|||
#include <asm/cmpxchg.h>
|
||||
#include <asm/barrier.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
/*
|
||||
* Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
|
||||
* a "bne-" instruction at the end, so an isync is enough as a acquire barrier
|
||||
|
|
|
@ -0,0 +1,52 @@
|
|||
#ifndef _ASM_POWERPC_DTL_H
|
||||
#define _ASM_POWERPC_DTL_H
|
||||
|
||||
#include <asm/lppaca.h>
|
||||
#include <linux/spinlock_types.h>
|
||||
|
||||
/*
|
||||
* Layout of entries in the hypervisor's dispatch trace log buffer.
|
||||
*/
|
||||
struct dtl_entry {
|
||||
u8 dispatch_reason;
|
||||
u8 preempt_reason;
|
||||
__be16 processor_id;
|
||||
__be32 enqueue_to_dispatch_time;
|
||||
__be32 ready_to_enqueue_time;
|
||||
__be32 waiting_to_ready_time;
|
||||
__be64 timebase;
|
||||
__be64 fault_addr;
|
||||
__be64 srr0;
|
||||
__be64 srr1;
|
||||
};
|
||||
|
||||
#define DISPATCH_LOG_BYTES 4096 /* bytes per cpu */
|
||||
#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
|
||||
|
||||
/*
|
||||
* Dispatch trace log event enable mask:
|
||||
* 0x1: voluntary virtual processor waits
|
||||
* 0x2: time-slice preempts
|
||||
* 0x4: virtual partition memory page faults
|
||||
*/
|
||||
#define DTL_LOG_CEDE 0x1
|
||||
#define DTL_LOG_PREEMPT 0x2
|
||||
#define DTL_LOG_FAULT 0x4
|
||||
#define DTL_LOG_ALL (DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
|
||||
|
||||
extern struct kmem_cache *dtl_cache;
|
||||
extern rwlock_t dtl_access_lock;
|
||||
|
||||
/*
|
||||
* When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls
|
||||
* reading from the dispatch trace log. If other code wants to consume
|
||||
* DTL entries, it can set this pointer to a function that will get
|
||||
* called once for each DTL entry that gets processed.
|
||||
*/
|
||||
extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
|
||||
|
||||
extern void register_dtl_buffer(int cpu);
|
||||
extern void alloc_dtl_buffers(unsigned long *time_limit);
|
||||
extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity);
|
||||
|
||||
#endif /* _ASM_POWERPC_DTL_H */
|
|
@ -42,7 +42,6 @@
|
|||
*/
|
||||
#include <linux/cache.h>
|
||||
#include <linux/threads.h>
|
||||
#include <linux/spinlock_types.h>
|
||||
#include <asm/types.h>
|
||||
#include <asm/mmu.h>
|
||||
#include <asm/firmware.h>
|
||||
|
@ -146,49 +145,6 @@ struct slb_shadow {
|
|||
} save_area[SLB_NUM_BOLTED];
|
||||
} ____cacheline_aligned;
|
||||
|
||||
/*
|
||||
* Layout of entries in the hypervisor's dispatch trace log buffer.
|
||||
*/
|
||||
struct dtl_entry {
|
||||
u8 dispatch_reason;
|
||||
u8 preempt_reason;
|
||||
__be16 processor_id;
|
||||
__be32 enqueue_to_dispatch_time;
|
||||
__be32 ready_to_enqueue_time;
|
||||
__be32 waiting_to_ready_time;
|
||||
__be64 timebase;
|
||||
__be64 fault_addr;
|
||||
__be64 srr0;
|
||||
__be64 srr1;
|
||||
};
|
||||
|
||||
#define DISPATCH_LOG_BYTES 4096 /* bytes per cpu */
|
||||
#define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
|
||||
|
||||
/*
|
||||
* Dispatch trace log event enable mask:
|
||||
* 0x1: voluntary virtual processor waits
|
||||
* 0x2: time-slice preempts
|
||||
* 0x4: virtual partition memory page faults
|
||||
*/
|
||||
#define DTL_LOG_CEDE 0x1
|
||||
#define DTL_LOG_PREEMPT 0x2
|
||||
#define DTL_LOG_FAULT 0x4
|
||||
#define DTL_LOG_ALL (DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
|
||||
|
||||
extern struct kmem_cache *dtl_cache;
|
||||
extern rwlock_t dtl_access_lock;
|
||||
|
||||
/*
|
||||
* When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE = y, the cpu accounting code controls
|
||||
* reading from the dispatch trace log. If other code wants to consume
|
||||
* DTL entries, it can set this pointer to a function that will get
|
||||
* called once for each DTL entry that gets processed.
|
||||
*/
|
||||
extern void (*dtl_consumer)(struct dtl_entry *entry, u64 index);
|
||||
|
||||
extern void register_dtl_buffer(int cpu);
|
||||
extern void alloc_dtl_buffers(unsigned long *time_limit);
|
||||
extern long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity);
|
||||
|
||||
#endif /* CONFIG_PPC_BOOK3S */
|
||||
|
|
|
@ -29,7 +29,6 @@
|
|||
#include <asm/hmi.h>
|
||||
#include <asm/cpuidle.h>
|
||||
#include <asm/atomic.h>
|
||||
#include <asm/rtas-types.h>
|
||||
|
||||
#include <asm-generic/mmiowb_types.h>
|
||||
|
||||
|
@ -53,6 +52,7 @@ extern unsigned int debug_smp_processor_id(void); /* from linux/smp.h */
|
|||
#define get_slb_shadow() (get_paca()->slb_shadow_ptr)
|
||||
|
||||
struct task_struct;
|
||||
struct rtas_args;
|
||||
|
||||
/*
|
||||
* Defines the layout of the paca.
|
||||
|
|
|
@ -183,6 +183,8 @@ static inline unsigned long read_spurr(unsigned long tb)
|
|||
|
||||
#ifdef CONFIG_PPC_SPLPAR
|
||||
|
||||
#include <asm/dtl.h>
|
||||
|
||||
/*
|
||||
* Scan the dispatch trace log and count up the stolen time.
|
||||
* Should be called with interrupts disabled.
|
||||
|
|
|
@ -74,6 +74,7 @@
|
|||
#include <asm/hw_breakpoint.h>
|
||||
#include <asm/kvm_book3s_uvmem.h>
|
||||
#include <asm/ultravisor.h>
|
||||
#include <asm/dtl.h>
|
||||
|
||||
#include "book3s.h"
|
||||
|
||||
|
|
|
@ -12,6 +12,7 @@
|
|||
#include <asm/smp.h>
|
||||
#include <linux/uaccess.h>
|
||||
#include <asm/firmware.h>
|
||||
#include <asm/dtl.h>
|
||||
#include <asm/lppaca.h>
|
||||
#include <asm/debugfs.h>
|
||||
#include <asm/plpar_wrappers.h>
|
||||
|
|
|
@ -40,6 +40,7 @@
|
|||
#include <asm/fadump.h>
|
||||
#include <asm/asm-prototypes.h>
|
||||
#include <asm/debugfs.h>
|
||||
#include <asm/dtl.h>
|
||||
|
||||
#include "pseries.h"
|
||||
|
||||
|
|
|
@ -70,6 +70,7 @@
|
|||
#include <asm/idle.h>
|
||||
#include <asm/swiotlb.h>
|
||||
#include <asm/svm.h>
|
||||
#include <asm/dtl.h>
|
||||
|
||||
#include "pseries.h"
|
||||
#include "../../../../drivers/pci/pci.h"
|
||||
|
|
|
@ -11,6 +11,7 @@
|
|||
#include <asm/svm.h>
|
||||
#include <asm/swiotlb.h>
|
||||
#include <asm/ultravisor.h>
|
||||
#include <asm/dtl.h>
|
||||
|
||||
static int __init init_svm(void)
|
||||
{
|
||||
|
|
|
@ -19,8 +19,6 @@
|
|||
#include <asm/cmpxchg.h>
|
||||
#include <asm/barrier.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#define __atomic_acquire_fence() \
|
||||
__asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory")
|
||||
|
||||
|
|
|
@ -15,8 +15,6 @@
|
|||
#include <asm/barrier.h>
|
||||
#include <asm/cmpxchg.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
static inline int atomic_read(const atomic_t *v)
|
||||
{
|
||||
int c;
|
||||
|
|
|
@ -10,6 +10,7 @@
|
|||
|
||||
#include <asm/sigp.h>
|
||||
#include <asm/lowcore.h>
|
||||
#include <asm/processor.h>
|
||||
|
||||
#define raw_smp_processor_id() (S390_lowcore.cpu_nr)
|
||||
|
||||
|
|
|
@ -24,7 +24,6 @@
|
|||
#ifndef __ASSEMBLY__
|
||||
#include <asm/lowcore.h>
|
||||
#include <asm/page.h>
|
||||
#include <asm/processor.h>
|
||||
|
||||
#define STACK_INIT_OFFSET \
|
||||
(THREAD_SIZE - STACK_FRAME_OVERHEAD - sizeof(struct pt_regs))
|
||||
|
|
|
@ -19,8 +19,6 @@
|
|||
#include <asm/cmpxchg.h>
|
||||
#include <asm/barrier.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
#define atomic_set(v,i) WRITE_ONCE((v)->counter, (i))
|
||||
|
||||
|
|
|
@ -18,8 +18,6 @@
|
|||
#include <asm/barrier.h>
|
||||
#include <asm-generic/atomic64.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
int atomic_add_return(int, atomic_t *);
|
||||
int atomic_fetch_add(int, atomic_t *);
|
||||
int atomic_fetch_and(int, atomic_t *);
|
||||
|
|
|
@ -12,7 +12,6 @@
|
|||
#include <asm/cmpxchg.h>
|
||||
#include <asm/barrier.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
#define ATOMIC64_INIT(i) { (i) }
|
||||
|
||||
#define atomic_read(v) READ_ONCE((v)->counter)
|
||||
|
|
|
@ -4,7 +4,9 @@
|
|||
|
||||
#include <linux/compiler.h>
|
||||
|
||||
#ifndef BUILD_VDSO
|
||||
register unsigned long __local_per_cpu_offset asm("g5");
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
|
||||
|
|
|
@ -2,6 +2,8 @@
|
|||
#ifndef _SPARC_TRAP_BLOCK_H
|
||||
#define _SPARC_TRAP_BLOCK_H
|
||||
|
||||
#include <linux/threads.h>
|
||||
|
||||
#include <asm/hypervisor.h>
|
||||
#include <asm/asi.h>
|
||||
|
||||
|
|
|
@ -3,6 +3,9 @@
|
|||
config TRACE_IRQFLAGS_SUPPORT
|
||||
def_bool y
|
||||
|
||||
config TRACE_IRQFLAGS_NMI_SUPPORT
|
||||
def_bool y
|
||||
|
||||
config EARLY_PRINTK_USB
|
||||
bool
|
||||
|
||||
|
|
|
@ -559,8 +559,7 @@ SYSCALL_DEFINE0(ni_syscall)
|
|||
}
|
||||
|
||||
/**
|
||||
* idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional
|
||||
* RCU handling
|
||||
* idtentry_enter - Handle state tracking on ordinary idtentries
|
||||
* @regs: Pointer to pt_regs of interrupted context
|
||||
*
|
||||
* Invokes:
|
||||
|
@ -572,6 +571,9 @@ SYSCALL_DEFINE0(ni_syscall)
|
|||
* - The hardirq tracer to keep the state consistent as low level ASM
|
||||
* entry disabled interrupts.
|
||||
*
|
||||
* As a precondition, this requires that the entry came from user mode,
|
||||
* idle, or a kernel context in which RCU is watching.
|
||||
*
|
||||
* For kernel mode entries RCU handling is done conditional. If RCU is
|
||||
* watching then the only RCU requirement is to check whether the tick has
|
||||
* to be restarted. If RCU is not watching then rcu_irq_enter() has to be
|
||||
|
@ -585,18 +587,21 @@ SYSCALL_DEFINE0(ni_syscall)
|
|||
* establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
|
||||
* would not be possible.
|
||||
*
|
||||
* Returns: True if RCU has been adjusted on a kernel entry
|
||||
* False otherwise
|
||||
* Returns: An opaque object that must be passed to idtentry_exit()
|
||||
*
|
||||
* The return value must be fed into the rcu_exit argument of
|
||||
* idtentry_exit_cond_rcu().
|
||||
* The return value must be fed into the state argument of
|
||||
* idtentry_exit().
|
||||
*/
|
||||
bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
|
||||
noinstr idtentry_state_t idtentry_enter(struct pt_regs *regs)
|
||||
{
|
||||
idtentry_state_t ret = {
|
||||
.exit_rcu = false,
|
||||
};
|
||||
|
||||
if (user_mode(regs)) {
|
||||
check_user_regs(regs);
|
||||
enter_from_user_mode();
|
||||
return false;
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -634,7 +639,8 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
|
|||
trace_hardirqs_off_finish();
|
||||
instrumentation_end();
|
||||
|
||||
return true;
|
||||
ret.exit_rcu = true;
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -649,7 +655,7 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
|
|||
trace_hardirqs_off();
|
||||
instrumentation_end();
|
||||
|
||||
return false;
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
|
||||
|
@ -667,10 +673,9 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
|
|||
}
|
||||
|
||||
/**
|
||||
* idtentry_exit_cond_rcu - Handle return from exception with conditional RCU
|
||||
* handling
|
||||
* idtentry_exit - Handle return from exception that used idtentry_enter()
|
||||
* @regs: Pointer to pt_regs (exception entry regs)
|
||||
* @rcu_exit: Invoke rcu_irq_exit() if true
|
||||
* @state: Return value from matching call to idtentry_enter()
|
||||
*
|
||||
* Depending on the return target (kernel/user) this runs the necessary
|
||||
* preemption and work checks if possible and reguired and returns to
|
||||
|
@ -679,10 +684,10 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
|
|||
* This is the last action before returning to the low level ASM code which
|
||||
* just needs to return to the appropriate context.
|
||||
*
|
||||
* Counterpart to idtentry_enter_cond_rcu(). The return value of the entry
|
||||
* function must be fed into the @rcu_exit argument.
|
||||
* Counterpart to idtentry_enter(). The return value of the entry
|
||||
* function must be fed into the @state argument.
|
||||
*/
|
||||
void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
|
||||
noinstr void idtentry_exit(struct pt_regs *regs, idtentry_state_t state)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
|
||||
|
@ -695,7 +700,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
|
|||
* carefully and needs the same ordering of lockdep/tracing
|
||||
* and RCU as the return to user mode path.
|
||||
*/
|
||||
if (rcu_exit) {
|
||||
if (state.exit_rcu) {
|
||||
instrumentation_begin();
|
||||
/* Tell the tracer that IRET will enable interrupts */
|
||||
trace_hardirqs_on_prepare();
|
||||
|
@ -714,7 +719,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
|
|||
* IRQ flags state is correct already. Just tell RCU if it
|
||||
* was not watching on entry.
|
||||
*/
|
||||
if (rcu_exit)
|
||||
if (state.exit_rcu)
|
||||
rcu_irq_exit();
|
||||
}
|
||||
}
|
||||
|
@ -726,7 +731,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
|
|||
* Invokes enter_from_user_mode() to establish the proper context for
|
||||
* NOHZ_FULL. Otherwise scheduling on exit would not be possible.
|
||||
*/
|
||||
void noinstr idtentry_enter_user(struct pt_regs *regs)
|
||||
noinstr void idtentry_enter_user(struct pt_regs *regs)
|
||||
{
|
||||
check_user_regs(regs);
|
||||
enter_from_user_mode();
|
||||
|
@ -744,13 +749,47 @@ void noinstr idtentry_enter_user(struct pt_regs *regs)
|
|||
*
|
||||
* Counterpart to idtentry_enter_user().
|
||||
*/
|
||||
void noinstr idtentry_exit_user(struct pt_regs *regs)
|
||||
noinstr void idtentry_exit_user(struct pt_regs *regs)
|
||||
{
|
||||
lockdep_assert_irqs_disabled();
|
||||
|
||||
prepare_exit_to_usermode(regs);
|
||||
}
|
||||
|
||||
noinstr bool idtentry_enter_nmi(struct pt_regs *regs)
|
||||
{
|
||||
bool irq_state = lockdep_hardirqs_enabled();
|
||||
|
||||
__nmi_enter();
|
||||
lockdep_hardirqs_off(CALLER_ADDR0);
|
||||
lockdep_hardirq_enter();
|
||||
rcu_nmi_enter();
|
||||
|
||||
instrumentation_begin();
|
||||
trace_hardirqs_off_finish();
|
||||
ftrace_nmi_enter();
|
||||
instrumentation_end();
|
||||
|
||||
return irq_state;
|
||||
}
|
||||
|
||||
noinstr void idtentry_exit_nmi(struct pt_regs *regs, bool restore)
|
||||
{
|
||||
instrumentation_begin();
|
||||
ftrace_nmi_exit();
|
||||
if (restore) {
|
||||
trace_hardirqs_on_prepare();
|
||||
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
|
||||
}
|
||||
instrumentation_end();
|
||||
|
||||
rcu_nmi_exit();
|
||||
lockdep_hardirq_exit();
|
||||
if (restore)
|
||||
lockdep_hardirqs_on(CALLER_ADDR0);
|
||||
__nmi_exit();
|
||||
}
|
||||
|
||||
#ifdef CONFIG_XEN_PV
|
||||
#ifndef CONFIG_PREEMPTION
|
||||
/*
|
||||
|
@ -800,9 +839,10 @@ static void __xen_pv_evtchn_do_upcall(void)
|
|||
__visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
|
||||
{
|
||||
struct pt_regs *old_regs;
|
||||
bool inhcall, rcu_exit;
|
||||
bool inhcall;
|
||||
idtentry_state_t state;
|
||||
|
||||
rcu_exit = idtentry_enter_cond_rcu(regs);
|
||||
state = idtentry_enter(regs);
|
||||
old_regs = set_irq_regs(regs);
|
||||
|
||||
instrumentation_begin();
|
||||
|
@ -812,13 +852,13 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
|
|||
set_irq_regs(old_regs);
|
||||
|
||||
inhcall = get_and_clear_inhcall();
|
||||
if (inhcall && !WARN_ON_ONCE(rcu_exit)) {
|
||||
if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
|
||||
instrumentation_begin();
|
||||
idtentry_exit_cond_resched(regs, true);
|
||||
instrumentation_end();
|
||||
restore_inhcall(inhcall);
|
||||
} else {
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit);
|
||||
idtentry_exit(regs, state);
|
||||
}
|
||||
}
|
||||
#endif /* CONFIG_XEN_PV */
|
||||
|
|
|
@ -14,8 +14,6 @@
|
|||
* resource counting etc..
|
||||
*/
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
/**
|
||||
* arch_atomic_read - read atomic variable
|
||||
* @v: pointer of type atomic_t
|
||||
|
|
|
@ -13,8 +13,15 @@
|
|||
void idtentry_enter_user(struct pt_regs *regs);
|
||||
void idtentry_exit_user(struct pt_regs *regs);
|
||||
|
||||
bool idtentry_enter_cond_rcu(struct pt_regs *regs);
|
||||
void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit);
|
||||
typedef struct idtentry_state {
|
||||
bool exit_rcu;
|
||||
} idtentry_state_t;
|
||||
|
||||
idtentry_state_t idtentry_enter(struct pt_regs *regs);
|
||||
void idtentry_exit(struct pt_regs *regs, idtentry_state_t state);
|
||||
|
||||
bool idtentry_enter_nmi(struct pt_regs *regs);
|
||||
void idtentry_exit_nmi(struct pt_regs *regs, bool irq_state);
|
||||
|
||||
/**
|
||||
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points
|
||||
|
@ -54,12 +61,12 @@ static __always_inline void __##func(struct pt_regs *regs); \
|
|||
\
|
||||
__visible noinstr void func(struct pt_regs *regs) \
|
||||
{ \
|
||||
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
|
||||
idtentry_state_t state = idtentry_enter(regs); \
|
||||
\
|
||||
instrumentation_begin(); \
|
||||
__##func (regs); \
|
||||
instrumentation_end(); \
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit); \
|
||||
idtentry_exit(regs, state); \
|
||||
} \
|
||||
\
|
||||
static __always_inline void __##func(struct pt_regs *regs)
|
||||
|
@ -101,12 +108,12 @@ static __always_inline void __##func(struct pt_regs *regs, \
|
|||
__visible noinstr void func(struct pt_regs *regs, \
|
||||
unsigned long error_code) \
|
||||
{ \
|
||||
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
|
||||
idtentry_state_t state = idtentry_enter(regs); \
|
||||
\
|
||||
instrumentation_begin(); \
|
||||
__##func (regs, error_code); \
|
||||
instrumentation_end(); \
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit); \
|
||||
idtentry_exit(regs, state); \
|
||||
} \
|
||||
\
|
||||
static __always_inline void __##func(struct pt_regs *regs, \
|
||||
|
@ -199,7 +206,7 @@ static __always_inline void __##func(struct pt_regs *regs, u8 vector); \
|
|||
__visible noinstr void func(struct pt_regs *regs, \
|
||||
unsigned long error_code) \
|
||||
{ \
|
||||
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
|
||||
idtentry_state_t state = idtentry_enter(regs); \
|
||||
\
|
||||
instrumentation_begin(); \
|
||||
irq_enter_rcu(); \
|
||||
|
@ -207,7 +214,7 @@ __visible noinstr void func(struct pt_regs *regs, \
|
|||
__##func (regs, (u8)error_code); \
|
||||
irq_exit_rcu(); \
|
||||
instrumentation_end(); \
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit); \
|
||||
idtentry_exit(regs, state); \
|
||||
} \
|
||||
\
|
||||
static __always_inline void __##func(struct pt_regs *regs, u8 vector)
|
||||
|
@ -241,7 +248,7 @@ static void __##func(struct pt_regs *regs); \
|
|||
\
|
||||
__visible noinstr void func(struct pt_regs *regs) \
|
||||
{ \
|
||||
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
|
||||
idtentry_state_t state = idtentry_enter(regs); \
|
||||
\
|
||||
instrumentation_begin(); \
|
||||
irq_enter_rcu(); \
|
||||
|
@ -249,7 +256,7 @@ __visible noinstr void func(struct pt_regs *regs) \
|
|||
run_on_irqstack_cond(__##func, regs, regs); \
|
||||
irq_exit_rcu(); \
|
||||
instrumentation_end(); \
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit); \
|
||||
idtentry_exit(regs, state); \
|
||||
} \
|
||||
\
|
||||
static noinline void __##func(struct pt_regs *regs)
|
||||
|
@ -270,7 +277,7 @@ static __always_inline void __##func(struct pt_regs *regs); \
|
|||
\
|
||||
__visible noinstr void func(struct pt_regs *regs) \
|
||||
{ \
|
||||
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
|
||||
idtentry_state_t state = idtentry_enter(regs); \
|
||||
\
|
||||
instrumentation_begin(); \
|
||||
__irq_enter_raw(); \
|
||||
|
@ -278,7 +285,7 @@ __visible noinstr void func(struct pt_regs *regs) \
|
|||
__##func (regs); \
|
||||
__irq_exit_raw(); \
|
||||
instrumentation_end(); \
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit); \
|
||||
idtentry_exit(regs, state); \
|
||||
} \
|
||||
\
|
||||
static __always_inline void __##func(struct pt_regs *regs)
|
||||
|
|
|
@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
|
|||
noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
|
||||
{
|
||||
u32 reason = kvm_read_and_reset_apf_flags();
|
||||
bool rcu_exit;
|
||||
idtentry_state_t state;
|
||||
|
||||
switch (reason) {
|
||||
case KVM_PV_REASON_PAGE_NOT_PRESENT:
|
||||
|
@ -243,7 +243,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
|
|||
return false;
|
||||
}
|
||||
|
||||
rcu_exit = idtentry_enter_cond_rcu(regs);
|
||||
state = idtentry_enter(regs);
|
||||
instrumentation_begin();
|
||||
|
||||
/*
|
||||
|
@ -264,7 +264,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
|
|||
}
|
||||
|
||||
instrumentation_end();
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit);
|
||||
idtentry_exit(regs, state);
|
||||
return true;
|
||||
}
|
||||
|
||||
|
|
|
@ -330,7 +330,6 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
|
|||
__this_cpu_write(last_nmi_rip, regs->ip);
|
||||
|
||||
instrumentation_begin();
|
||||
trace_hardirqs_off_finish();
|
||||
|
||||
handled = nmi_handle(NMI_LOCAL, regs);
|
||||
__this_cpu_add(nmi_stats.normal, handled);
|
||||
|
@ -417,8 +416,6 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
|
|||
unknown_nmi_error(reason, regs);
|
||||
|
||||
out:
|
||||
if (regs->flags & X86_EFLAGS_IF)
|
||||
trace_hardirqs_on_prepare();
|
||||
instrumentation_end();
|
||||
}
|
||||
|
||||
|
@ -478,6 +475,8 @@ static DEFINE_PER_CPU(unsigned long, nmi_dr7);
|
|||
|
||||
DEFINE_IDTENTRY_RAW(exc_nmi)
|
||||
{
|
||||
bool irq_state;
|
||||
|
||||
if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
|
||||
return;
|
||||
|
||||
|
@ -491,14 +490,14 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
|
|||
|
||||
this_cpu_write(nmi_dr7, local_db_save());
|
||||
|
||||
nmi_enter();
|
||||
irq_state = idtentry_enter_nmi(regs);
|
||||
|
||||
inc_irq_stat(__nmi_count);
|
||||
|
||||
if (!ignore_nmis)
|
||||
default_do_nmi(regs);
|
||||
|
||||
nmi_exit();
|
||||
idtentry_exit_nmi(regs, irq_state);
|
||||
|
||||
local_db_restore(this_cpu_read(nmi_dr7));
|
||||
|
||||
|
|
|
@ -245,7 +245,7 @@ static noinstr bool handle_bug(struct pt_regs *regs)
|
|||
|
||||
DEFINE_IDTENTRY_RAW(exc_invalid_op)
|
||||
{
|
||||
bool rcu_exit;
|
||||
idtentry_state_t state;
|
||||
|
||||
/*
|
||||
* We use UD2 as a short encoding for 'CALL __WARN', as such
|
||||
|
@ -255,11 +255,11 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
|
|||
if (!user_mode(regs) && handle_bug(regs))
|
||||
return;
|
||||
|
||||
rcu_exit = idtentry_enter_cond_rcu(regs);
|
||||
state = idtentry_enter(regs);
|
||||
instrumentation_begin();
|
||||
handle_invalid_op(regs);
|
||||
instrumentation_end();
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit);
|
||||
idtentry_exit(regs, state);
|
||||
}
|
||||
|
||||
DEFINE_IDTENTRY(exc_coproc_segment_overrun)
|
||||
|
@ -405,7 +405,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
|
|||
}
|
||||
#endif
|
||||
|
||||
nmi_enter();
|
||||
idtentry_enter_nmi(regs);
|
||||
instrumentation_begin();
|
||||
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
|
||||
|
||||
|
@ -651,15 +651,12 @@ DEFINE_IDTENTRY_RAW(exc_int3)
|
|||
instrumentation_end();
|
||||
idtentry_exit_user(regs);
|
||||
} else {
|
||||
nmi_enter();
|
||||
bool irq_state = idtentry_enter_nmi(regs);
|
||||
instrumentation_begin();
|
||||
trace_hardirqs_off_finish();
|
||||
if (!do_int3(regs))
|
||||
die("int3", regs, 0);
|
||||
if (regs->flags & X86_EFLAGS_IF)
|
||||
trace_hardirqs_on_prepare();
|
||||
instrumentation_end();
|
||||
nmi_exit();
|
||||
idtentry_exit_nmi(regs, irq_state);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -867,9 +864,8 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
|
|||
static __always_inline void exc_debug_kernel(struct pt_regs *regs,
|
||||
unsigned long dr6)
|
||||
{
|
||||
nmi_enter();
|
||||
bool irq_state = idtentry_enter_nmi(regs);
|
||||
instrumentation_begin();
|
||||
trace_hardirqs_off_finish();
|
||||
|
||||
/*
|
||||
* If something gets miswired and we end up here for a user mode
|
||||
|
@ -886,10 +882,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
|
|||
|
||||
handle_debug(regs, dr6, false);
|
||||
|
||||
if (regs->flags & X86_EFLAGS_IF)
|
||||
trace_hardirqs_on_prepare();
|
||||
instrumentation_end();
|
||||
nmi_exit();
|
||||
idtentry_exit_nmi(regs, irq_state);
|
||||
}
|
||||
|
||||
static __always_inline void exc_debug_user(struct pt_regs *regs,
|
||||
|
@ -905,6 +899,7 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
|
|||
instrumentation_begin();
|
||||
|
||||
handle_debug(regs, dr6, true);
|
||||
|
||||
instrumentation_end();
|
||||
idtentry_exit_user(regs);
|
||||
}
|
||||
|
|
|
@ -1377,7 +1377,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
|
|||
DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
|
||||
{
|
||||
unsigned long address = read_cr2();
|
||||
bool rcu_exit;
|
||||
idtentry_state_t state;
|
||||
|
||||
prefetchw(¤t->mm->mmap_lock);
|
||||
|
||||
|
@ -1412,11 +1412,11 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
|
|||
* code reenabled RCU to avoid subsequent wreckage which helps
|
||||
* debugability.
|
||||
*/
|
||||
rcu_exit = idtentry_enter_cond_rcu(regs);
|
||||
state = idtentry_enter(regs);
|
||||
|
||||
instrumentation_begin();
|
||||
handle_page_fault(regs, error_code, address);
|
||||
instrumentation_end();
|
||||
|
||||
idtentry_exit_cond_rcu(regs, rcu_exit);
|
||||
idtentry_exit(regs, state);
|
||||
}
|
||||
|
|
|
@ -135,7 +135,7 @@ static inline void cpa_inc_2m_checked(void)
|
|||
|
||||
static inline void cpa_inc_4k_install(void)
|
||||
{
|
||||
cpa_4k_install++;
|
||||
data_race(cpa_4k_install++);
|
||||
}
|
||||
|
||||
static inline void cpa_inc_lp_sameprot(int level)
|
||||
|
|
|
@ -19,8 +19,6 @@
|
|||
#include <asm/cmpxchg.h>
|
||||
#include <asm/barrier.h>
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
/*
|
||||
* This Xtensa implementation assumes that the right mechanism
|
||||
* for exclusion is for locking interrupts to level EXCM_LEVEL.
|
||||
|
|
|
@ -159,8 +159,6 @@ ATOMIC_OP(xor, ^)
|
|||
* resource counting etc..
|
||||
*/
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
|
||||
|
||||
/**
|
||||
* atomic_read - read atomic variable
|
||||
* @v: pointer of type atomic_t
|
||||
|
|
|
@ -11,6 +11,7 @@
|
|||
#define __ASM_GENERIC_QSPINLOCK_H
|
||||
|
||||
#include <asm-generic/qspinlock_types.h>
|
||||
#include <linux/atomic.h>
|
||||
|
||||
/**
|
||||
* queued_spin_is_locked - is the spinlock locked?
|
||||
|
|
|
@ -9,15 +9,7 @@
|
|||
#ifndef __ASM_GENERIC_QSPINLOCK_TYPES_H
|
||||
#define __ASM_GENERIC_QSPINLOCK_TYPES_H
|
||||
|
||||
/*
|
||||
* Including atomic.h with PARAVIRT on will cause compilation errors because
|
||||
* of recursive header file incluson via paravirt_types.h. So don't include
|
||||
* it if PARAVIRT is on.
|
||||
*/
|
||||
#ifndef CONFIG_PARAVIRT
|
||||
#include <linux/types.h>
|
||||
#include <linux/atomic.h>
|
||||
#endif
|
||||
|
||||
typedef struct qspinlock {
|
||||
union {
|
||||
|
|
|
@ -111,32 +111,42 @@ extern void rcu_nmi_exit(void);
|
|||
/*
|
||||
* nmi_enter() can nest up to 15 times; see NMI_BITS.
|
||||
*/
|
||||
#define nmi_enter() \
|
||||
#define __nmi_enter() \
|
||||
do { \
|
||||
lockdep_off(); \
|
||||
arch_nmi_enter(); \
|
||||
printk_nmi_enter(); \
|
||||
lockdep_off(); \
|
||||
BUG_ON(in_nmi() == NMI_MASK); \
|
||||
__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
rcu_nmi_enter(); \
|
||||
} while (0)
|
||||
|
||||
#define nmi_enter() \
|
||||
do { \
|
||||
__nmi_enter(); \
|
||||
lockdep_hardirq_enter(); \
|
||||
rcu_nmi_enter(); \
|
||||
instrumentation_begin(); \
|
||||
ftrace_nmi_enter(); \
|
||||
instrumentation_end(); \
|
||||
} while (0)
|
||||
|
||||
#define __nmi_exit() \
|
||||
do { \
|
||||
BUG_ON(!in_nmi()); \
|
||||
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
printk_nmi_exit(); \
|
||||
arch_nmi_exit(); \
|
||||
lockdep_on(); \
|
||||
} while (0)
|
||||
|
||||
#define nmi_exit() \
|
||||
do { \
|
||||
instrumentation_begin(); \
|
||||
ftrace_nmi_exit(); \
|
||||
instrumentation_end(); \
|
||||
lockdep_hardirq_exit(); \
|
||||
rcu_nmi_exit(); \
|
||||
BUG_ON(!in_nmi()); \
|
||||
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
lockdep_on(); \
|
||||
printk_nmi_exit(); \
|
||||
arch_nmi_exit(); \
|
||||
lockdep_hardirq_exit(); \
|
||||
__nmi_exit(); \
|
||||
} while (0)
|
||||
|
||||
#endif /* LINUX_HARDIRQ_H */
|
||||
|
|
|
@ -14,6 +14,7 @@
|
|||
|
||||
#include <linux/typecheck.h>
|
||||
#include <asm/irqflags.h>
|
||||
#include <asm/percpu.h>
|
||||
|
||||
/* Currently lockdep_softirqs_on/off is used only by lockdep */
|
||||
#ifdef CONFIG_PROVE_LOCKING
|
||||
|
@ -31,17 +32,34 @@
|
|||
#endif
|
||||
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
|
||||
/* Per-task IRQ trace events information. */
|
||||
struct irqtrace_events {
|
||||
unsigned int irq_events;
|
||||
unsigned long hardirq_enable_ip;
|
||||
unsigned long hardirq_disable_ip;
|
||||
unsigned int hardirq_enable_event;
|
||||
unsigned int hardirq_disable_event;
|
||||
unsigned long softirq_disable_ip;
|
||||
unsigned long softirq_enable_ip;
|
||||
unsigned int softirq_disable_event;
|
||||
unsigned int softirq_enable_event;
|
||||
};
|
||||
|
||||
DECLARE_PER_CPU(int, hardirqs_enabled);
|
||||
DECLARE_PER_CPU(int, hardirq_context);
|
||||
|
||||
extern void trace_hardirqs_on_prepare(void);
|
||||
extern void trace_hardirqs_off_finish(void);
|
||||
extern void trace_hardirqs_on(void);
|
||||
extern void trace_hardirqs_off(void);
|
||||
# define lockdep_hardirq_context(p) ((p)->hardirq_context)
|
||||
# define lockdep_hardirq_context() (this_cpu_read(hardirq_context))
|
||||
# define lockdep_softirq_context(p) ((p)->softirq_context)
|
||||
# define lockdep_hardirqs_enabled(p) ((p)->hardirqs_enabled)
|
||||
# define lockdep_hardirqs_enabled() (this_cpu_read(hardirqs_enabled))
|
||||
# define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled)
|
||||
# define lockdep_hardirq_enter() \
|
||||
do { \
|
||||
if (!current->hardirq_context++) \
|
||||
if (this_cpu_inc_return(hardirq_context) == 1) \
|
||||
current->hardirq_threaded = 0; \
|
||||
} while (0)
|
||||
# define lockdep_hardirq_threaded() \
|
||||
|
@ -50,7 +68,7 @@ do { \
|
|||
} while (0)
|
||||
# define lockdep_hardirq_exit() \
|
||||
do { \
|
||||
current->hardirq_context--; \
|
||||
this_cpu_dec(hardirq_context); \
|
||||
} while (0)
|
||||
# define lockdep_softirq_enter() \
|
||||
do { \
|
||||
|
@ -104,9 +122,9 @@ do { \
|
|||
# define trace_hardirqs_off_finish() do { } while (0)
|
||||
# define trace_hardirqs_on() do { } while (0)
|
||||
# define trace_hardirqs_off() do { } while (0)
|
||||
# define lockdep_hardirq_context(p) 0
|
||||
# define lockdep_hardirq_context() 0
|
||||
# define lockdep_softirq_context(p) 0
|
||||
# define lockdep_hardirqs_enabled(p) 0
|
||||
# define lockdep_hardirqs_enabled() 0
|
||||
# define lockdep_softirqs_enabled(p) 0
|
||||
# define lockdep_hardirq_enter() do { } while (0)
|
||||
# define lockdep_hardirq_threaded() do { } while (0)
|
||||
|
|
|
@ -10,33 +10,15 @@
|
|||
#ifndef __LINUX_LOCKDEP_H
|
||||
#define __LINUX_LOCKDEP_H
|
||||
|
||||
#include <linux/lockdep_types.h>
|
||||
#include <asm/percpu.h>
|
||||
|
||||
struct task_struct;
|
||||
struct lockdep_map;
|
||||
|
||||
/* for sysctl */
|
||||
extern int prove_locking;
|
||||
extern int lock_stat;
|
||||
|
||||
#define MAX_LOCKDEP_SUBCLASSES 8UL
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
enum lockdep_wait_type {
|
||||
LD_WAIT_INV = 0, /* not checked, catch all */
|
||||
|
||||
LD_WAIT_FREE, /* wait free, rcu etc.. */
|
||||
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
|
||||
|
||||
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
|
||||
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
|
||||
#else
|
||||
LD_WAIT_CONFIG = LD_WAIT_SPIN,
|
||||
#endif
|
||||
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
|
||||
|
||||
LD_WAIT_MAX, /* must be last */
|
||||
};
|
||||
|
||||
#ifdef CONFIG_LOCKDEP
|
||||
|
||||
#include <linux/linkage.h>
|
||||
|
@ -44,147 +26,6 @@ enum lockdep_wait_type {
|
|||
#include <linux/debug_locks.h>
|
||||
#include <linux/stacktrace.h>
|
||||
|
||||
/*
|
||||
* We'd rather not expose kernel/lockdep_states.h this wide, but we do need
|
||||
* the total number of states... :-(
|
||||
*/
|
||||
#define XXX_LOCK_USAGE_STATES (1+2*4)
|
||||
|
||||
/*
|
||||
* NR_LOCKDEP_CACHING_CLASSES ... Number of classes
|
||||
* cached in the instance of lockdep_map
|
||||
*
|
||||
* Currently main class (subclass == 0) and signle depth subclass
|
||||
* are cached in lockdep_map. This optimization is mainly targeting
|
||||
* on rq->lock. double_rq_lock() acquires this highly competitive with
|
||||
* single depth.
|
||||
*/
|
||||
#define NR_LOCKDEP_CACHING_CLASSES 2
|
||||
|
||||
/*
|
||||
* A lockdep key is associated with each lock object. For static locks we use
|
||||
* the lock address itself as the key. Dynamically allocated lock objects can
|
||||
* have a statically or dynamically allocated key. Dynamically allocated lock
|
||||
* keys must be registered before being used and must be unregistered before
|
||||
* the key memory is freed.
|
||||
*/
|
||||
struct lockdep_subclass_key {
|
||||
char __one_byte;
|
||||
} __attribute__ ((__packed__));
|
||||
|
||||
/* hash_entry is used to keep track of dynamically allocated keys. */
|
||||
struct lock_class_key {
|
||||
union {
|
||||
struct hlist_node hash_entry;
|
||||
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
|
||||
};
|
||||
};
|
||||
|
||||
extern struct lock_class_key __lockdep_no_validate__;
|
||||
|
||||
struct lock_trace;
|
||||
|
||||
#define LOCKSTAT_POINTS 4
|
||||
|
||||
/*
|
||||
* The lock-class itself. The order of the structure members matters.
|
||||
* reinit_class() zeroes the key member and all subsequent members.
|
||||
*/
|
||||
struct lock_class {
|
||||
/*
|
||||
* class-hash:
|
||||
*/
|
||||
struct hlist_node hash_entry;
|
||||
|
||||
/*
|
||||
* Entry in all_lock_classes when in use. Entry in free_lock_classes
|
||||
* when not in use. Instances that are being freed are on one of the
|
||||
* zapped_classes lists.
|
||||
*/
|
||||
struct list_head lock_entry;
|
||||
|
||||
/*
|
||||
* These fields represent a directed graph of lock dependencies,
|
||||
* to every node we attach a list of "forward" and a list of
|
||||
* "backward" graph nodes.
|
||||
*/
|
||||
struct list_head locks_after, locks_before;
|
||||
|
||||
const struct lockdep_subclass_key *key;
|
||||
unsigned int subclass;
|
||||
unsigned int dep_gen_id;
|
||||
|
||||
/*
|
||||
* IRQ/softirq usage tracking bits:
|
||||
*/
|
||||
unsigned long usage_mask;
|
||||
const struct lock_trace *usage_traces[XXX_LOCK_USAGE_STATES];
|
||||
|
||||
/*
|
||||
* Generation counter, when doing certain classes of graph walking,
|
||||
* to ensure that we check one node only once:
|
||||
*/
|
||||
int name_version;
|
||||
const char *name;
|
||||
|
||||
short wait_type_inner;
|
||||
short wait_type_outer;
|
||||
|
||||
#ifdef CONFIG_LOCK_STAT
|
||||
unsigned long contention_point[LOCKSTAT_POINTS];
|
||||
unsigned long contending_point[LOCKSTAT_POINTS];
|
||||
#endif
|
||||
} __no_randomize_layout;
|
||||
|
||||
#ifdef CONFIG_LOCK_STAT
|
||||
struct lock_time {
|
||||
s64 min;
|
||||
s64 max;
|
||||
s64 total;
|
||||
unsigned long nr;
|
||||
};
|
||||
|
||||
enum bounce_type {
|
||||
bounce_acquired_write,
|
||||
bounce_acquired_read,
|
||||
bounce_contended_write,
|
||||
bounce_contended_read,
|
||||
nr_bounce_types,
|
||||
|
||||
bounce_acquired = bounce_acquired_write,
|
||||
bounce_contended = bounce_contended_write,
|
||||
};
|
||||
|
||||
struct lock_class_stats {
|
||||
unsigned long contention_point[LOCKSTAT_POINTS];
|
||||
unsigned long contending_point[LOCKSTAT_POINTS];
|
||||
struct lock_time read_waittime;
|
||||
struct lock_time write_waittime;
|
||||
struct lock_time read_holdtime;
|
||||
struct lock_time write_holdtime;
|
||||
unsigned long bounces[nr_bounce_types];
|
||||
};
|
||||
|
||||
struct lock_class_stats lock_stats(struct lock_class *class);
|
||||
void clear_lock_stats(struct lock_class *class);
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Map the lock object (the lock instance) to the lock-class object.
|
||||
* This is embedded into specific lock instances:
|
||||
*/
|
||||
struct lockdep_map {
|
||||
struct lock_class_key *key;
|
||||
struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES];
|
||||
const char *name;
|
||||
short wait_type_outer; /* can be taken in this context */
|
||||
short wait_type_inner; /* presents this context */
|
||||
#ifdef CONFIG_LOCK_STAT
|
||||
int cpu;
|
||||
unsigned long ip;
|
||||
#endif
|
||||
};
|
||||
|
||||
static inline void lockdep_copy_map(struct lockdep_map *to,
|
||||
struct lockdep_map *from)
|
||||
{
|
||||
|
@ -440,8 +281,6 @@ static inline void lock_set_subclass(struct lockdep_map *lock,
|
|||
|
||||
extern void lock_downgrade(struct lockdep_map *lock, unsigned long ip);
|
||||
|
||||
struct pin_cookie { unsigned int val; };
|
||||
|
||||
#define NIL_COOKIE (struct pin_cookie){ .val = 0U, }
|
||||
|
||||
extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
|
||||
|
@ -520,10 +359,6 @@ static inline void lockdep_set_selftest_task(struct task_struct *task)
|
|||
# define lockdep_reset() do { debug_locks = 1; } while (0)
|
||||
# define lockdep_free_key_range(start, size) do { } while (0)
|
||||
# define lockdep_sys_exit() do { } while (0)
|
||||
/*
|
||||
* The class key takes no space if lockdep is disabled:
|
||||
*/
|
||||
struct lock_class_key { };
|
||||
|
||||
static inline void lockdep_register_key(struct lock_class_key *key)
|
||||
{
|
||||
|
@ -533,11 +368,6 @@ static inline void lockdep_unregister_key(struct lock_class_key *key)
|
|||
{
|
||||
}
|
||||
|
||||
/*
|
||||
* The lockdep_map takes no space if lockdep is disabled:
|
||||
*/
|
||||
struct lockdep_map { };
|
||||
|
||||
#define lockdep_depth(tsk) (0)
|
||||
|
||||
#define lockdep_is_held_type(l, r) (1)
|
||||
|
@ -549,8 +379,6 @@ struct lockdep_map { };
|
|||
|
||||
#define lockdep_recursing(tsk) (0)
|
||||
|
||||
struct pin_cookie { };
|
||||
|
||||
#define NIL_COOKIE (struct pin_cookie){ }
|
||||
|
||||
#define lockdep_pin_lock(l) ({ struct pin_cookie cookie = { }; cookie; })
|
||||
|
@ -703,38 +531,58 @@ do { \
|
|||
lock_release(&(lock)->dep_map, _THIS_IP_); \
|
||||
} while (0)
|
||||
|
||||
#define lockdep_assert_irqs_enabled() do { \
|
||||
WARN_ONCE(debug_locks && !current->lockdep_recursion && \
|
||||
!current->hardirqs_enabled, \
|
||||
"IRQs not enabled as expected\n"); \
|
||||
DECLARE_PER_CPU(int, hardirqs_enabled);
|
||||
DECLARE_PER_CPU(int, hardirq_context);
|
||||
|
||||
#define lockdep_assert_irqs_enabled() \
|
||||
do { \
|
||||
WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirqs_enabled)); \
|
||||
} while (0)
|
||||
|
||||
#define lockdep_assert_irqs_disabled() do { \
|
||||
WARN_ONCE(debug_locks && !current->lockdep_recursion && \
|
||||
current->hardirqs_enabled, \
|
||||
"IRQs not disabled as expected\n"); \
|
||||
#define lockdep_assert_irqs_disabled() \
|
||||
do { \
|
||||
WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled)); \
|
||||
} while (0)
|
||||
|
||||
#define lockdep_assert_in_irq() do { \
|
||||
WARN_ONCE(debug_locks && !current->lockdep_recursion && \
|
||||
!current->hardirq_context, \
|
||||
"Not in hardirq as expected\n"); \
|
||||
#define lockdep_assert_in_irq() \
|
||||
do { \
|
||||
WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context)); \
|
||||
} while (0)
|
||||
|
||||
#define lockdep_assert_preemption_enabled() \
|
||||
do { \
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
|
||||
debug_locks && \
|
||||
(preempt_count() != 0 || \
|
||||
!this_cpu_read(hardirqs_enabled))); \
|
||||
} while (0)
|
||||
|
||||
#define lockdep_assert_preemption_disabled() \
|
||||
do { \
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
|
||||
debug_locks && \
|
||||
(preempt_count() == 0 && \
|
||||
this_cpu_read(hardirqs_enabled))); \
|
||||
} while (0)
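As a usage illustration (not part of this patch), the new preemption assertions let code that updates preemption-sensitive data document its context requirement directly. The structure and function below are hypothetical, a minimal sketch only:

struct my_stats {
        unsigned long nr_events;
};

static void stats_account_event(struct my_stats *stats)
{
        /* Caller must hold a spinlock or otherwise have preemption off. */
        lockdep_assert_preemption_disabled();
        stats->nr_events++;
}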
|
||||
|
||||
#else
|
||||
# define might_lock(lock) do { } while (0)
|
||||
# define might_lock_read(lock) do { } while (0)
|
||||
# define might_lock_nested(lock, subclass) do { } while (0)
|
||||
|
||||
# define lockdep_assert_irqs_enabled() do { } while (0)
|
||||
# define lockdep_assert_irqs_disabled() do { } while (0)
|
||||
# define lockdep_assert_in_irq() do { } while (0)
|
||||
|
||||
# define lockdep_assert_preemption_enabled() do { } while (0)
|
||||
# define lockdep_assert_preemption_disabled() do { } while (0)
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
|
||||
|
||||
# define lockdep_assert_RT_in_threaded_ctx() do { \
|
||||
WARN_ONCE(debug_locks && !current->lockdep_recursion && \
|
||||
current->hardirq_context && \
|
||||
lockdep_hardirq_context() && \
|
||||
!(current->hardirq_threaded || current->irq_config), \
|
||||
"Not in threaded context on PREEMPT_RT as expected\n"); \
|
||||
} while (0)
|
||||
|
|
|
@ -0,0 +1,194 @@
|
|||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/*
|
||||
* Runtime locking correctness validator
|
||||
*
|
||||
* Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
|
||||
* Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
|
||||
*
|
||||
* see Documentation/locking/lockdep-design.rst for more details.
|
||||
*/
|
||||
#ifndef __LINUX_LOCKDEP_TYPES_H
|
||||
#define __LINUX_LOCKDEP_TYPES_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
#define MAX_LOCKDEP_SUBCLASSES 8UL
|
||||
|
||||
enum lockdep_wait_type {
|
||||
LD_WAIT_INV = 0, /* not checked, catch all */
|
||||
|
||||
LD_WAIT_FREE, /* wait free, rcu etc.. */
|
||||
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
|
||||
|
||||
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
|
||||
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
|
||||
#else
|
||||
LD_WAIT_CONFIG = LD_WAIT_SPIN,
|
||||
#endif
|
||||
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
|
||||
|
||||
LD_WAIT_MAX, /* must be last */
|
||||
};
|
||||
|
||||
#ifdef CONFIG_LOCKDEP
|
||||
|
||||
/*
|
||||
* We'd rather not expose kernel/lockdep_states.h this wide, but we do need
|
||||
* the total number of states... :-(
|
||||
*/
|
||||
#define XXX_LOCK_USAGE_STATES (1+2*4)
|
||||
|
||||
/*
|
||||
* NR_LOCKDEP_CACHING_CLASSES ... Number of classes
|
||||
* cached in the instance of lockdep_map
|
||||
*
|
||||
* Currently main class (subclass == 0) and single depth subclass
|
||||
* are cached in lockdep_map. This optimization is mainly targeting
|
||||
* on rq->lock. double_rq_lock() acquires this highly competitive with
|
||||
* single depth.
|
||||
*/
|
||||
#define NR_LOCKDEP_CACHING_CLASSES 2
|
||||
|
||||
/*
|
||||
* A lockdep key is associated with each lock object. For static locks we use
|
||||
* the lock address itself as the key. Dynamically allocated lock objects can
|
||||
* have a statically or dynamically allocated key. Dynamically allocated lock
|
||||
* keys must be registered before being used and must be unregistered before
|
||||
* the key memory is freed.
|
||||
*/
|
||||
struct lockdep_subclass_key {
|
||||
char __one_byte;
|
||||
} __attribute__ ((__packed__));
|
||||
|
||||
/* hash_entry is used to keep track of dynamically allocated keys. */
|
||||
struct lock_class_key {
|
||||
union {
|
||||
struct hlist_node hash_entry;
|
||||
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
|
||||
};
|
||||
};
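To illustrate the registration rule described in the comment above, a dynamically allocated lock that carries its own dynamically allocated key might be set up roughly as below. This is a hedged sketch; struct my_obj and the alloc/free helpers are invented for the example, only the lockdep calls are real:

#include <linux/lockdep.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct my_obj {
        spinlock_t lock;
        struct lock_class_key key;      /* dynamically allocated key */
};

static struct my_obj *my_obj_alloc(void)
{
        struct my_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

        if (!obj)
                return NULL;
        /* The key must be registered before it is used... */
        lockdep_register_key(&obj->key);
        spin_lock_init(&obj->lock);
        lockdep_set_class(&obj->lock, &obj->key);
        return obj;
}

static void my_obj_free(struct my_obj *obj)
{
        /* ...and unregistered before its memory is freed. */
        lockdep_unregister_key(&obj->key);
        kfree(obj);
}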
|
||||
|
||||
extern struct lock_class_key __lockdep_no_validate__;
|
||||
|
||||
struct lock_trace;
|
||||
|
||||
#define LOCKSTAT_POINTS 4
|
||||
|
||||
/*
|
||||
* The lock-class itself. The order of the structure members matters.
|
||||
* reinit_class() zeroes the key member and all subsequent members.
|
||||
*/
|
||||
struct lock_class {
|
||||
/*
|
||||
* class-hash:
|
||||
*/
|
||||
struct hlist_node hash_entry;
|
||||
|
||||
/*
|
||||
* Entry in all_lock_classes when in use. Entry in free_lock_classes
|
||||
* when not in use. Instances that are being freed are on one of the
|
||||
* zapped_classes lists.
|
||||
*/
|
||||
struct list_head lock_entry;
|
||||
|
||||
/*
|
||||
* These fields represent a directed graph of lock dependencies,
|
||||
* to every node we attach a list of "forward" and a list of
|
||||
* "backward" graph nodes.
|
||||
*/
|
||||
struct list_head locks_after, locks_before;
|
||||
|
||||
const struct lockdep_subclass_key *key;
|
||||
unsigned int subclass;
|
||||
unsigned int dep_gen_id;
|
||||
|
||||
/*
|
||||
* IRQ/softirq usage tracking bits:
|
||||
*/
|
||||
unsigned long usage_mask;
|
||||
const struct lock_trace *usage_traces[XXX_LOCK_USAGE_STATES];
|
||||
|
||||
/*
|
||||
* Generation counter, when doing certain classes of graph walking,
|
||||
* to ensure that we check one node only once:
|
||||
*/
|
||||
int name_version;
|
||||
const char *name;
|
||||
|
||||
short wait_type_inner;
|
||||
short wait_type_outer;
|
||||
|
||||
#ifdef CONFIG_LOCK_STAT
|
||||
unsigned long contention_point[LOCKSTAT_POINTS];
|
||||
unsigned long contending_point[LOCKSTAT_POINTS];
|
||||
#endif
|
||||
} __no_randomize_layout;
|
||||
|
||||
#ifdef CONFIG_LOCK_STAT
|
||||
struct lock_time {
|
||||
s64 min;
|
||||
s64 max;
|
||||
s64 total;
|
||||
unsigned long nr;
|
||||
};
|
||||
|
||||
enum bounce_type {
|
||||
bounce_acquired_write,
|
||||
bounce_acquired_read,
|
||||
bounce_contended_write,
|
||||
bounce_contended_read,
|
||||
nr_bounce_types,
|
||||
|
||||
bounce_acquired = bounce_acquired_write,
|
||||
bounce_contended = bounce_contended_write,
|
||||
};
|
||||
|
||||
struct lock_class_stats {
|
||||
unsigned long contention_point[LOCKSTAT_POINTS];
|
||||
unsigned long contending_point[LOCKSTAT_POINTS];
|
||||
struct lock_time read_waittime;
|
||||
struct lock_time write_waittime;
|
||||
struct lock_time read_holdtime;
|
||||
struct lock_time write_holdtime;
|
||||
unsigned long bounces[nr_bounce_types];
|
||||
};
|
||||
|
||||
struct lock_class_stats lock_stats(struct lock_class *class);
|
||||
void clear_lock_stats(struct lock_class *class);
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Map the lock object (the lock instance) to the lock-class object.
|
||||
* This is embedded into specific lock instances:
|
||||
*/
|
||||
struct lockdep_map {
|
||||
struct lock_class_key *key;
|
||||
struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES];
|
||||
const char *name;
|
||||
short wait_type_outer; /* can be taken in this context */
|
||||
short wait_type_inner; /* presents this context */
|
||||
#ifdef CONFIG_LOCK_STAT
|
||||
int cpu;
|
||||
unsigned long ip;
|
||||
#endif
|
||||
};
|
||||
|
||||
struct pin_cookie { unsigned int val; };
|
||||
|
||||
#else /* !CONFIG_LOCKDEP */
|
||||
|
||||
/*
|
||||
* The class key takes no space if lockdep is disabled:
|
||||
*/
|
||||
struct lock_class_key { };
|
||||
|
||||
/*
|
||||
* The lockdep_map takes no space if lockdep is disabled:
|
||||
*/
|
||||
struct lockdep_map { };
|
||||
|
||||
struct pin_cookie { };
|
||||
|
||||
#endif /* !LOCKDEP */
|
||||
|
||||
#endif /* __LINUX_LOCKDEP_TYPES_H */
|
|
@ -248,6 +248,8 @@ static inline void __list_splice_init_rcu(struct list_head *list,
|
|||
*/
|
||||
|
||||
sync();
|
||||
ASSERT_EXCLUSIVE_ACCESS(*first);
|
||||
ASSERT_EXCLUSIVE_ACCESS(*last);
|
||||
|
||||
/*
|
||||
* Readers are finished with the source list, so perform splice.
|
||||
|
|
|
@ -60,39 +60,39 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
|
|||
}
|
||||
|
||||
#define RWSEM_UNLOCKED_VALUE 0L
|
||||
#define __RWSEM_INIT_COUNT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
|
||||
#define __RWSEM_COUNT_INIT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
|
||||
|
||||
/* Common initializer macros and functions */
|
||||
|
||||
#ifdef CONFIG_DEBUG_LOCK_ALLOC
|
||||
# define __RWSEM_DEP_MAP_INIT(lockname) \
|
||||
, .dep_map = { \
|
||||
.dep_map = { \
|
||||
.name = #lockname, \
|
||||
.wait_type_inner = LD_WAIT_SLEEP, \
|
||||
}
|
||||
},
|
||||
#else
|
||||
# define __RWSEM_DEP_MAP_INIT(lockname)
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_DEBUG_RWSEMS
|
||||
# define __DEBUG_RWSEM_INITIALIZER(lockname) , .magic = &lockname
|
||||
# define __RWSEM_DEBUG_INIT(lockname) .magic = &lockname,
|
||||
#else
|
||||
# define __DEBUG_RWSEM_INITIALIZER(lockname)
|
||||
# define __RWSEM_DEBUG_INIT(lockname)
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
|
||||
#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED
|
||||
#define __RWSEM_OPT_INIT(lockname) .osq = OSQ_LOCK_UNLOCKED,
|
||||
#else
|
||||
#define __RWSEM_OPT_INIT(lockname)
|
||||
#endif
|
||||
|
||||
#define __RWSEM_INITIALIZER(name) \
|
||||
{ __RWSEM_INIT_COUNT(name), \
|
||||
{ __RWSEM_COUNT_INIT(name), \
|
||||
.owner = ATOMIC_LONG_INIT(0), \
|
||||
.wait_list = LIST_HEAD_INIT((name).wait_list), \
|
||||
.wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock) \
|
||||
__RWSEM_OPT_INIT(name) \
|
||||
__DEBUG_RWSEM_INITIALIZER(name) \
|
||||
.wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock),\
|
||||
.wait_list = LIST_HEAD_INIT((name).wait_list), \
|
||||
__RWSEM_DEBUG_INIT(name) \
|
||||
__RWSEM_DEP_MAP_INIT(name) }
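With the trailing commas now kept inside each helper macro, the initializer composes cleanly whichever config options are enabled. A small usage sketch (the semaphore name and reader function are arbitrary):

static DECLARE_RWSEM(cfg_rwsem);        /* expands to __RWSEM_INITIALIZER(cfg_rwsem) */

static int read_config(void)
{
        down_read(&cfg_rwsem);
        /* ... read shared configuration ... */
        up_read(&cfg_rwsem);
        return 0;
}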
|
||||
|
||||
#define DECLARE_RWSEM(name) \
|
||||
|
|
|
@ -18,6 +18,7 @@
|
|||
#include <linux/mutex.h>
|
||||
#include <linux/plist.h>
|
||||
#include <linux/hrtimer.h>
|
||||
#include <linux/irqflags.h>
|
||||
#include <linux/seccomp.h>
|
||||
#include <linux/nodemask.h>
|
||||
#include <linux/rcupdate.h>
|
||||
|
@ -980,19 +981,9 @@ struct task_struct {
|
|||
#endif
|
||||
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
unsigned int irq_events;
|
||||
struct irqtrace_events irqtrace;
|
||||
unsigned int hardirq_threaded;
|
||||
unsigned long hardirq_enable_ip;
|
||||
unsigned long hardirq_disable_ip;
|
||||
unsigned int hardirq_enable_event;
|
||||
unsigned int hardirq_disable_event;
|
||||
int hardirqs_enabled;
|
||||
int hardirq_context;
|
||||
u64 hardirq_chain_key;
|
||||
unsigned long softirq_disable_ip;
|
||||
unsigned long softirq_enable_ip;
|
||||
unsigned int softirq_disable_event;
|
||||
unsigned int softirq_enable_event;
|
||||
int softirqs_enabled;
|
||||
int softirq_context;
|
||||
int irq_config;
|
||||
|
@ -1193,8 +1184,12 @@ struct task_struct {
|
|||
#ifdef CONFIG_KASAN
|
||||
unsigned int kasan_depth;
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_KCSAN
|
||||
struct kcsan_ctx kcsan_ctx;
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
struct irqtrace_events kcsan_save_irqtrace;
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
|
||||
|
|
|
@ -1,36 +1,15 @@
|
|||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __LINUX_SEQLOCK_H
|
||||
#define __LINUX_SEQLOCK_H
|
||||
|
||||
/*
|
||||
* Reader/writer consistent mechanism without starving writers. This type of
|
||||
* lock for data where the reader wants a consistent set of information
|
||||
* and is willing to retry if the information changes. There are two types
|
||||
* of readers:
|
||||
* 1. Sequence readers which never block a writer but they may have to retry
|
||||
* if a writer is in progress by detecting change in sequence number.
|
||||
* Writers do not wait for a sequence reader.
|
||||
* 2. Locking readers which will wait if a writer or another locking reader
|
||||
* is in progress. A locking reader in progress will also block a writer
|
||||
* from going forward. Unlike the regular rwlock, the read lock here is
|
||||
* exclusive so that only one locking reader can get it.
|
||||
* seqcount_t / seqlock_t - a reader-writer consistency mechanism with
|
||||
* lockless readers (read-only retry loops), and no writer starvation.
|
||||
*
|
||||
* This is not as cache friendly as brlock. Also, this may not work well
|
||||
* for data that contains pointers, because any writer could
|
||||
* invalidate a pointer that a reader was following.
|
||||
* See Documentation/locking/seqlock.rst
|
||||
*
|
||||
* Expected non-blocking reader usage:
|
||||
* do {
|
||||
* seq = read_seqbegin(&foo);
|
||||
* ...
|
||||
* } while (read_seqretry(&foo, seq));
|
||||
*
|
||||
*
|
||||
* On non-SMP the spin locks disappear but the writer still needs
|
||||
* to increment the sequence variables because an interrupt routine could
|
||||
* change the state of the data.
|
||||
*
|
||||
* Based on x86_64 vsyscall gettimeofday
|
||||
* by Keith Owens and Andrea Arcangeli
|
||||
* Copyrights:
|
||||
* - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
|
||||
*/
|
||||
|
||||
#include <linux/spinlock.h>
|
||||
|
@ -41,8 +20,8 @@
|
|||
#include <asm/processor.h>
|
||||
|
||||
/*
|
||||
* The seqlock interface does not prescribe a precise sequence of read
|
||||
* begin/retry/end. For readers, typically there is a call to
|
||||
* The seqlock seqcount_t interface does not prescribe a precise sequence of
|
||||
* read begin/retry/end. For readers, typically there is a call to
|
||||
* read_seqcount_begin() and read_seqcount_retry(), however, there are more
|
||||
* esoteric cases which do not follow this pattern.
|
||||
*
|
||||
|
@ -50,16 +29,30 @@
|
|||
* via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
|
||||
* pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
|
||||
* atomics; if there is a matching read_seqcount_retry() call, no following
|
||||
* memory operations are considered atomic. Usage of seqlocks via seqlock_t
|
||||
* interface is not affected.
|
||||
* memory operations are considered atomic. Usage of the seqlock_t interface
|
||||
* is not affected.
|
||||
*/
|
||||
#define KCSAN_SEQLOCK_REGION_MAX 1000
|
||||
|
||||
/*
|
||||
* Version using sequence counter only.
|
||||
* This can be used when code has its own mutex protecting the
|
||||
* updating starting before the write_seqcountbeqin() and ending
|
||||
* after the write_seqcount_end().
|
||||
* Sequence counters (seqcount_t)
|
||||
*
|
||||
* This is the raw counting mechanism, without any writer protection.
|
||||
*
|
||||
* Write side critical sections must be serialized and non-preemptible.
|
||||
*
|
||||
* If readers can be invoked from hardirq or softirq contexts,
|
||||
* interrupts or bottom halves must also be respectively disabled before
|
||||
* entering the write section.
|
||||
*
|
||||
* This mechanism can't be used if the protected data contains pointers,
|
||||
* as the writer can invalidate a pointer that a reader is following.
|
||||
*
|
||||
* If it's desired to automatically handle the sequence counter writer
|
||||
* serialization and non-preemptibility requirements, use a sequential
|
||||
* lock (seqlock_t) instead.
|
||||
*
|
||||
* See Documentation/locking/seqlock.rst
|
||||
*/
|
||||
typedef struct seqcount {
|
||||
unsigned sequence;
|
||||
|
@ -82,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
|
|||
# define SEQCOUNT_DEP_MAP_INIT(lockname) \
|
||||
.dep_map = { .name = #lockname } \
|
||||
|
||||
/**
|
||||
* seqcount_init() - runtime initializer for seqcount_t
|
||||
* @s: Pointer to the seqcount_t instance
|
||||
*/
|
||||
# define seqcount_init(s) \
|
||||
do { \
|
||||
static struct lock_class_key __key; \
|
||||
|
@ -105,13 +102,15 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
|
|||
# define seqcount_lockdep_reader_access(x)
|
||||
#endif
|
||||
|
||||
#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
|
||||
|
||||
/**
|
||||
* SEQCNT_ZERO() - static initializer for seqcount_t
|
||||
* @name: Name of the seqcount_t instance
|
||||
*/
|
||||
#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
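A seqcount_t can thus be initialized either statically or at runtime; both forms below are equivalent sketches, with illustrative variable names:

static seqcount_t stats_seq = SEQCNT_ZERO(stats_seq);   /* static initializer */

static void init_other_seq(seqcount_t *s)
{
        seqcount_init(s);       /* runtime initializer, sets up the lockdep class */
}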
|
||||
|
||||
/**
|
||||
* __read_seqcount_begin - begin a seq-read critical section (without barrier)
|
||||
* @s: pointer to seqcount_t
|
||||
* Returns: count to be passed to read_seqcount_retry
|
||||
* __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
|
||||
* barrier. Callers should ensure that smp_rmb() or equivalent ordering is
|
||||
|
@ -120,6 +119,8 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
|
|||
*
|
||||
* Use carefully, only in critical code, and comment how the barrier is
|
||||
* provided.
|
||||
*
|
||||
* Return: count to be passed to read_seqcount_retry()
|
||||
*/
|
||||
static inline unsigned __read_seqcount_begin(const seqcount_t *s)
|
||||
{
|
||||
|
@ -136,13 +137,40 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
|
|||
}
|
||||
|
||||
/**
|
||||
* raw_read_seqcount - Read the raw seqcount
|
||||
* @s: pointer to seqcount_t
|
||||
* Returns: count to be passed to read_seqcount_retry
|
||||
* raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* Return: count to be passed to read_seqcount_retry()
|
||||
*/
|
||||
static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
|
||||
{
|
||||
unsigned ret = __read_seqcount_begin(s);
|
||||
smp_rmb();
|
||||
return ret;
|
||||
}
|
||||
|
||||
/**
|
||||
* read_seqcount_begin() - begin a seqcount_t read critical section
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* Return: count to be passed to read_seqcount_retry()
|
||||
*/
|
||||
static inline unsigned read_seqcount_begin(const seqcount_t *s)
|
||||
{
|
||||
seqcount_lockdep_reader_access(s);
|
||||
return raw_read_seqcount_begin(s);
|
||||
}
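The typical lockless read side built from read_seqcount_begin()/read_seqcount_retry() is a retry loop. A hedged sketch over hypothetical data (struct mono_time is made up for the example):

struct mono_time {
        seqcount_t seq;
        u64 sec, nsec;
};

static u64 mono_time_ns(struct mono_time *t)
{
        unsigned int start;
        u64 sec, nsec;

        do {
                start = read_seqcount_begin(&t->seq);
                sec  = t->sec;
                nsec = t->nsec;
        } while (read_seqcount_retry(&t->seq, start));

        return sec * NSEC_PER_SEC + nsec;
}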
|
||||
|
||||
/**
|
||||
* raw_read_seqcount() - read the raw seqcount_t counter value
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* raw_read_seqcount opens a read critical section of the given
|
||||
* seqcount without any lockdep checking and without checking or
|
||||
* masking the LSB. Calling code is responsible for handling that.
|
||||
* seqcount_t, without any lockdep checking, and without checking or
|
||||
* masking the sequence counter LSB. Calling code is responsible for
|
||||
* handling that.
|
||||
*
|
||||
* Return: count to be passed to read_seqcount_retry()
|
||||
*/
|
||||
static inline unsigned raw_read_seqcount(const seqcount_t *s)
|
||||
{
|
||||
|
@ -153,63 +181,35 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
|
|||
}
|
||||
|
||||
/**
|
||||
* raw_read_seqcount_begin - start seq-read critical section w/o lockdep
|
||||
* @s: pointer to seqcount_t
|
||||
* Returns: count to be passed to read_seqcount_retry
|
||||
* raw_seqcount_begin() - begin a seqcount_t read critical section w/o
|
||||
* lockdep and w/o counter stabilization
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* raw_read_seqcount_begin opens a read critical section of the given
|
||||
* seqcount, but without any lockdep checking. Validity of the critical
|
||||
* section is tested by checking read_seqcount_retry function.
|
||||
*/
|
||||
static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
|
||||
{
|
||||
unsigned ret = __read_seqcount_begin(s);
|
||||
smp_rmb();
|
||||
return ret;
|
||||
}
|
||||
|
||||
/**
|
||||
* read_seqcount_begin - begin a seq-read critical section
|
||||
* @s: pointer to seqcount_t
|
||||
* Returns: count to be passed to read_seqcount_retry
|
||||
* raw_seqcount_begin opens a read critical section of the given
|
||||
* seqcount_t. Unlike read_seqcount_begin(), this function will not wait
|
||||
* for the count to stabilize. If a writer is active when it begins, it
|
||||
* will fail the read_seqcount_retry() at the end of the read critical
|
||||
* section instead of stabilizing at the beginning of it.
|
||||
*
|
||||
* read_seqcount_begin opens a read critical section of the given seqcount.
|
||||
* Validity of the critical section is tested by checking read_seqcount_retry
|
||||
* function.
|
||||
*/
|
||||
static inline unsigned read_seqcount_begin(const seqcount_t *s)
|
||||
{
|
||||
seqcount_lockdep_reader_access(s);
|
||||
return raw_read_seqcount_begin(s);
|
||||
}
|
||||
|
||||
/**
|
||||
* raw_seqcount_begin - begin a seq-read critical section
|
||||
* @s: pointer to seqcount_t
|
||||
* Returns: count to be passed to read_seqcount_retry
|
||||
* Use this only in special kernel hot paths where the read section is
|
||||
* small and has a high probability of success through other external
|
||||
* means. It will save a single branching instruction.
|
||||
*
|
||||
* raw_seqcount_begin opens a read critical section of the given seqcount.
|
||||
* Validity of the critical section is tested by checking read_seqcount_retry
|
||||
* function.
|
||||
*
|
||||
* Unlike read_seqcount_begin(), this function will not wait for the count
|
||||
* to stabilize. If a writer is active when we begin, we will fail the
|
||||
* read_seqcount_retry() instead of stabilizing at the beginning of the
|
||||
* critical section.
|
||||
* Return: count to be passed to read_seqcount_retry()
|
||||
*/
|
||||
static inline unsigned raw_seqcount_begin(const seqcount_t *s)
|
||||
{
|
||||
unsigned ret = READ_ONCE(s->sequence);
|
||||
smp_rmb();
|
||||
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
|
||||
return ret & ~1;
|
||||
/*
|
||||
* If the counter is odd, let read_seqcount_retry() fail
|
||||
* by decrementing the counter.
|
||||
*/
|
||||
return raw_read_seqcount(s) & ~1;
|
||||
}
|
||||
|
||||
/**
|
||||
* __read_seqcount_retry - end a seq-read critical section (without barrier)
|
||||
* @s: pointer to seqcount_t
|
||||
* @start: count, from read_seqcount_begin
|
||||
* Returns: 1 if retry is required, else 0
|
||||
* __read_seqcount_retry() - end a seqcount_t read section w/o barrier
|
||||
* @s: Pointer to seqcount_t
|
||||
* @start: count, from read_seqcount_begin()
|
||||
*
|
||||
* __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
|
||||
* barrier. Callers should ensure that smp_rmb() or equivalent ordering is
|
||||
|
@ -218,6 +218,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
|
|||
*
|
||||
* Use carefully, only in critical code, and comment how the barrier is
|
||||
* provided.
|
||||
*
|
||||
* Return: true if a read section retry is required, else false
|
||||
*/
|
||||
static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
|
||||
{
|
||||
|
@ -226,14 +228,15 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
|
|||
}
|
||||
|
||||
/**
|
||||
* read_seqcount_retry - end a seq-read critical section
|
||||
* @s: pointer to seqcount_t
|
||||
* @start: count, from read_seqcount_begin
|
||||
* Returns: 1 if retry is required, else 0
|
||||
* read_seqcount_retry() - end a seqcount_t read critical section
|
||||
* @s: Pointer to seqcount_t
|
||||
* @start: count, from read_seqcount_begin()
|
||||
*
|
||||
* read_seqcount_retry closes a read critical section of the given seqcount.
|
||||
* If the critical section was invalid, it must be ignored (and typically
|
||||
* retried).
|
||||
* read_seqcount_retry closes the read critical section of given
|
||||
* seqcount_t. If the critical section was invalid, it must be ignored
|
||||
* (and typically retried).
|
||||
*
|
||||
* Return: true if a read section retry is required, else false
|
||||
*/
|
||||
static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
|
||||
{
|
||||
|
@ -241,8 +244,10 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
|
|||
return __read_seqcount_retry(s, start);
|
||||
}
|
||||
|
||||
|
||||
|
||||
/**
|
||||
* raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
|
||||
* @s: Pointer to seqcount_t
|
||||
*/
|
||||
static inline void raw_write_seqcount_begin(seqcount_t *s)
|
||||
{
|
||||
kcsan_nestable_atomic_begin();
|
||||
|
@ -250,6 +255,10 @@ static inline void raw_write_seqcount_begin(seqcount_t *s)
|
|||
smp_wmb();
|
||||
}
|
||||
|
||||
/**
|
||||
* raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
|
||||
* @s: Pointer to seqcount_t
|
||||
*/
|
||||
static inline void raw_write_seqcount_end(seqcount_t *s)
|
||||
{
|
||||
smp_wmb();
|
||||
|
@ -257,20 +266,79 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
|
|||
kcsan_nestable_atomic_end();
|
||||
}
|
||||
|
||||
static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
|
||||
{
|
||||
raw_write_seqcount_begin(s);
|
||||
seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
|
||||
}
|
||||
|
||||
/**
|
||||
* raw_write_seqcount_barrier - do a seq write barrier
|
||||
* @s: pointer to seqcount_t
|
||||
* write_seqcount_begin_nested() - start a seqcount_t write section with
|
||||
* custom lockdep nesting level
|
||||
* @s: Pointer to seqcount_t
|
||||
* @subclass: lockdep nesting level
|
||||
*
|
||||
* This can be used to provide an ordering guarantee instead of the
|
||||
* usual consistency guarantee. It is one wmb cheaper, because we can
|
||||
* collapse the two back-to-back wmb()s.
|
||||
* See Documentation/locking/lockdep-design.rst
|
||||
*/
|
||||
static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
|
||||
{
|
||||
lockdep_assert_preemption_disabled();
|
||||
__write_seqcount_begin_nested(s, subclass);
|
||||
}
|
||||
|
||||
/*
|
||||
* A write_seqcount_begin() variant w/o lockdep non-preemptibility checks.
|
||||
*
|
||||
* Use for internal seqlock.h code where it's known that preemption is
|
||||
* already disabled. For example, seqlock_t write side functions.
|
||||
*/
|
||||
static inline void __write_seqcount_begin(seqcount_t *s)
|
||||
{
|
||||
__write_seqcount_begin_nested(s, 0);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_seqcount_begin() - start a seqcount_t write side critical section
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* write_seqcount_begin opens a write side critical section of the given
|
||||
* seqcount_t.
|
||||
*
|
||||
* Context: seqcount_t write side critical sections must be serialized and
|
||||
* non-preemptible. If readers can be invoked from hardirq or softirq
|
||||
* context, interrupts or bottom halves must be respectively disabled.
|
||||
*/
|
||||
static inline void write_seqcount_begin(seqcount_t *s)
|
||||
{
|
||||
write_seqcount_begin_nested(s, 0);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_seqcount_end() - end a seqcount_t write side critical section
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* The write section must've been opened with write_seqcount_begin().
|
||||
*/
|
||||
static inline void write_seqcount_end(seqcount_t *s)
|
||||
{
|
||||
seqcount_release(&s->dep_map, _RET_IP_);
|
||||
raw_write_seqcount_end(s);
|
||||
}
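On the write side the counter is bumped around the update, with the serialization and non-preemptibility requirement met by the caller; continuing the hypothetical mono_time example from the reader sketch above, a spinlock provides both:

static DEFINE_SPINLOCK(mono_time_lock);

static void mono_time_set(struct mono_time *t, u64 sec, u64 nsec)
{
        spin_lock(&mono_time_lock);     /* serializes writers, disables preemption */
        write_seqcount_begin(&t->seq);
        t->sec  = sec;
        t->nsec = nsec;
        write_seqcount_end(&t->seq);
        spin_unlock(&mono_time_lock);
}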
|
||||
|
||||
/**
|
||||
* raw_write_seqcount_barrier() - do a seqcount_t write barrier
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* This can be used to provide an ordering guarantee instead of the usual
|
||||
* consistency guarantee. It is one wmb cheaper, because it can collapse
|
||||
* the two back-to-back wmb()s.
|
||||
*
|
||||
* Note that writes surrounding the barrier should be declared atomic (e.g.
|
||||
* via WRITE_ONCE): a) to ensure the writes become visible to other threads
|
||||
* atomically, avoiding compiler optimizations; b) to document which writes are
|
||||
* meant to propagate to the reader critical section. This is necessary because
|
||||
* neither writes before and after the barrier are enclosed in a seq-writer
|
||||
* critical section that would ensure readers are aware of ongoing writes.
|
||||
* critical section that would ensure readers are aware of ongoing writes::
|
||||
*
|
||||
* seqcount_t seq;
|
||||
* bool X = true, Y = false;
|
||||
|
@ -307,6 +375,37 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s)
|
|||
kcsan_nestable_atomic_end();
|
||||
}
|
||||
|
||||
/**
|
||||
* write_seqcount_invalidate() - invalidate in-progress seqcount_t read
|
||||
* side operations
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* After write_seqcount_invalidate, no seqcount_t read side operations
|
||||
* will complete successfully and see data older than this.
|
||||
*/
|
||||
static inline void write_seqcount_invalidate(seqcount_t *s)
|
||||
{
|
||||
smp_wmb();
|
||||
kcsan_nestable_atomic_begin();
|
||||
s->sequence+=2;
|
||||
kcsan_nestable_atomic_end();
|
||||
}
|
||||
|
||||
/**
|
||||
* raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* Use seqcount_t latching to switch between two storage places protected
|
||||
* by a sequence counter. Doing so allows having interruptible, preemptible,
|
||||
* seqcount_t write side critical sections.
|
||||
*
|
||||
* Check raw_write_seqcount_latch() for more details and a full reader and
|
||||
* writer usage example.
|
||||
*
|
||||
* Return: sequence counter raw value. Use the lowest bit as an index for
|
||||
* picking which data copy to read. The full counter value must then be
|
||||
* checked with read_seqcount_retry().
|
||||
*/
|
||||
static inline int raw_read_seqcount_latch(seqcount_t *s)
|
||||
{
|
||||
/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
|
||||
|
@ -315,8 +414,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
|
|||
}
|
||||
|
||||
/**
|
||||
* raw_write_seqcount_latch - redirect readers to even/odd copy
|
||||
* @s: pointer to seqcount_t
|
||||
* raw_write_seqcount_latch() - redirect readers to even/odd copy
|
||||
* @s: Pointer to seqcount_t
|
||||
*
|
||||
* The latch technique is a multiversion concurrency control method that allows
|
||||
* queries during non-atomic modifications. If you can guarantee queries never
|
||||
|
@ -332,7 +431,7 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
|
|||
* Very simply put: we first modify one copy and then the other. This ensures
|
||||
* there is always one copy in a stable state, ready to give us an answer.
|
||||
*
|
||||
* The basic form is a data structure like:
|
||||
* The basic form is a data structure like::
|
||||
*
|
||||
* struct latch_struct {
|
||||
* seqcount_t seq;
|
||||
|
@ -340,24 +439,24 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
|
|||
* };
|
||||
*
|
||||
* Where a modification, which is assumed to be externally serialized, does the
|
||||
* following:
|
||||
* following::
|
||||
*
|
||||
* void latch_modify(struct latch_struct *latch, ...)
|
||||
* {
|
||||
* smp_wmb(); <- Ensure that the last data[1] update is visible
|
||||
* smp_wmb(); // Ensure that the last data[1] update is visible
|
||||
* latch->seq++;
|
||||
* smp_wmb(); <- Ensure that the seqcount update is visible
|
||||
* smp_wmb(); // Ensure that the seqcount update is visible
|
||||
*
|
||||
* modify(latch->data[0], ...);
|
||||
*
|
||||
* smp_wmb(); <- Ensure that the data[0] update is visible
|
||||
* smp_wmb(); // Ensure that the data[0] update is visible
|
||||
* latch->seq++;
|
||||
* smp_wmb(); <- Ensure that the seqcount update is visible
|
||||
* smp_wmb(); // Ensure that the seqcount update is visible
|
||||
*
|
||||
* modify(latch->data[1], ...);
|
||||
* }
|
||||
*
|
||||
* The query will have a form like:
|
||||
* The query will have a form like::
|
||||
*
|
||||
* struct entry *latch_query(struct latch_struct *latch, ...)
|
||||
* {
|
||||
|
@ -370,8 +469,8 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
|
|||
* idx = seq & 0x01;
|
||||
* entry = data_query(latch->data[idx], ...);
|
||||
*
|
||||
* smp_rmb();
|
||||
* } while (seq != latch->seq);
|
||||
* // read_seqcount_retry() includes needed smp_rmb()
|
||||
* } while (read_seqcount_retry(&latch->seq, seq));
|
||||
*
|
||||
* return entry;
|
||||
* }
|
||||
|
@ -380,7 +479,9 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
|
|||
* modify data[0]. When that is complete, we redirect queries back to data[0]
|
||||
* and we can modify data[1].
|
||||
*
|
||||
* NOTE: The non-requirement for atomic modifications does _NOT_ include
|
||||
* NOTE:
|
||||
*
|
||||
* The non-requirement for atomic modifications does _NOT_ include
|
||||
* the publishing of new entries in the case where data is a dynamic
|
||||
* data structure.
|
||||
*
|
||||
|
@ -388,7 +489,9 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
|
|||
* to miss an entire modification sequence, once it resumes it might
|
||||
* observe the new entry.
|
||||
*
|
||||
* NOTE: When data is a dynamic data structure; one should use regular RCU
|
||||
* NOTE:
|
||||
*
|
||||
* When data is a dynamic data structure; one should use regular RCU
|
||||
* patterns to manage the lifetimes of the objects within.
|
||||
*/
|
||||
static inline void raw_write_seqcount_latch(seqcount_t *s)
|
||||
|
@ -399,67 +502,48 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
|
|||
}
|
||||
|
||||
/*
|
||||
* Sequence counter only version assumes that callers are using their
|
||||
* own mutexing.
|
||||
*/
|
||||
static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
|
||||
{
|
||||
raw_write_seqcount_begin(s);
|
||||
seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
|
||||
}
|
||||
|
||||
static inline void write_seqcount_begin(seqcount_t *s)
|
||||
{
|
||||
write_seqcount_begin_nested(s, 0);
|
||||
}
|
||||
|
||||
static inline void write_seqcount_end(seqcount_t *s)
|
||||
{
|
||||
seqcount_release(&s->dep_map, _RET_IP_);
|
||||
raw_write_seqcount_end(s);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_seqcount_invalidate - invalidate in-progress read-side seq operations
|
||||
* @s: pointer to seqcount_t
|
||||
* Sequential locks (seqlock_t)
|
||||
*
|
||||
* After write_seqcount_invalidate, no read-side seq operations will complete
|
||||
* successfully and see data older than this.
|
||||
* Sequence counters with an embedded spinlock for writer serialization
|
||||
* and non-preemptibility.
|
||||
*
|
||||
* For more info, see:
|
||||
* - Comments on top of seqcount_t
|
||||
* - Documentation/locking/seqlock.rst
|
||||
*/
|
||||
static inline void write_seqcount_invalidate(seqcount_t *s)
|
||||
{
|
||||
smp_wmb();
|
||||
kcsan_nestable_atomic_begin();
|
||||
s->sequence+=2;
|
||||
kcsan_nestable_atomic_end();
|
||||
}
|
||||
|
||||
typedef struct {
|
||||
struct seqcount seqcount;
|
||||
spinlock_t lock;
|
||||
} seqlock_t;
|
||||
|
||||
/*
|
||||
* These macros triggered gcc-3.x compile-time problems. We think these are
|
||||
* OK now. Be cautious.
|
||||
*/
|
||||
#define __SEQLOCK_UNLOCKED(lockname) \
|
||||
{ \
|
||||
.seqcount = SEQCNT_ZERO(lockname), \
|
||||
.lock = __SPIN_LOCK_UNLOCKED(lockname) \
|
||||
}
|
||||
|
||||
#define seqlock_init(x) \
|
||||
/**
|
||||
* seqlock_init() - dynamic initializer for seqlock_t
|
||||
* @sl: Pointer to the seqlock_t instance
|
||||
*/
|
||||
#define seqlock_init(sl) \
|
||||
do { \
|
||||
seqcount_init(&(x)->seqcount); \
|
||||
spin_lock_init(&(x)->lock); \
|
||||
seqcount_init(&(sl)->seqcount); \
|
||||
spin_lock_init(&(sl)->lock); \
|
||||
} while (0)
|
||||
|
||||
#define DEFINE_SEQLOCK(x) \
|
||||
seqlock_t x = __SEQLOCK_UNLOCKED(x)
|
||||
/**
|
||||
* DEFINE_SEQLOCK() - Define a statically allocated seqlock_t
|
||||
* @sl: Name of the seqlock_t instance
|
||||
*/
|
||||
#define DEFINE_SEQLOCK(sl) \
|
||||
seqlock_t sl = __SEQLOCK_UNLOCKED(sl)
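Both initializers can be used as follows (names illustrative):

static DEFINE_SEQLOCK(xtime_seqlock);           /* static definition */

static void init_dynamic_seqlock(seqlock_t *sl)
{
        seqlock_init(sl);                       /* runtime initialization */
}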
|
||||
|
||||
/*
|
||||
* Read side functions for starting and finalizing a read side section.
|
||||
/**
|
||||
* read_seqbegin() - start a seqlock_t read side critical section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* Return: count, to be passed to read_seqretry()
|
||||
*/
|
||||
static inline unsigned read_seqbegin(const seqlock_t *sl)
|
||||
{
|
||||
|
@ -470,6 +554,17 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
|
|||
return ret;
|
||||
}
|
||||
|
||||
/**
|
||||
* read_seqretry() - end a seqlock_t read side section
|
||||
* @sl: Pointer to seqlock_t
|
||||
* @start: count, from read_seqbegin()
|
||||
*
|
||||
* read_seqretry closes the read side critical section of given seqlock_t.
|
||||
* If the critical section was invalid, it must be ignored (and typically
|
||||
* retried).
|
||||
*
|
||||
* Return: true if a read section retry is required, else false
|
||||
*/
|
||||
static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
|
||||
{
|
||||
/*
|
||||
|
@ -481,41 +576,85 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
|
|||
return read_seqcount_retry(&sl->seqcount, start);
|
||||
}
|
||||
|
||||
/*
|
||||
* Lock out other writers and update the count.
|
||||
* Acts like a normal spin_lock/unlock.
|
||||
* Don't need preempt_disable() because that is in the spin_lock already.
|
||||
/**
|
||||
* write_seqlock() - start a seqlock_t write side critical section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* write_seqlock opens a write side critical section for the given
|
||||
* seqlock_t. It also implicitly acquires the spinlock_t embedded inside
|
||||
* that sequential lock. All seqlock_t write side sections are thus
|
||||
* automatically serialized and non-preemptible.
|
||||
*
|
||||
* Context: if the seqlock_t read section, or other write side critical
|
||||
* sections, can be invoked from hardirq or softirq contexts, use the
|
||||
* _irqsave or _bh variants of this function instead.
|
||||
*/
|
||||
static inline void write_seqlock(seqlock_t *sl)
|
||||
{
|
||||
spin_lock(&sl->lock);
|
||||
write_seqcount_begin(&sl->seqcount);
|
||||
__write_seqcount_begin(&sl->seqcount);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_sequnlock() - end a seqlock_t write side critical section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* write_sequnlock closes the (serialized and non-preemptible) write side
|
||||
* critical section of given seqlock_t.
|
||||
*/
|
||||
static inline void write_sequnlock(seqlock_t *sl)
|
||||
{
|
||||
write_seqcount_end(&sl->seqcount);
|
||||
spin_unlock(&sl->lock);
|
||||
}
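The matching write side acquires the embedded spinlock implicitly, so the caller needs no extra serialization; continuing the hypothetical foo example above:

static void foo_update(int x, int y)
{
        write_seqlock(&foo_lock);       /* serializes writers and bumps the count */
        foo_x = x;
        foo_y = y;
        write_sequnlock(&foo_lock);
}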
|
||||
|
||||
/**
|
||||
* write_seqlock_bh() - start a softirqs-disabled seqlock_t write section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* _bh variant of write_seqlock(). Use only if the read side section, or
|
||||
* other write side sections, can be invoked from softirq contexts.
|
||||
*/
|
||||
static inline void write_seqlock_bh(seqlock_t *sl)
|
||||
{
|
||||
spin_lock_bh(&sl->lock);
|
||||
write_seqcount_begin(&sl->seqcount);
|
||||
__write_seqcount_begin(&sl->seqcount);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_sequnlock_bh() - end a softirqs-disabled seqlock_t write section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* write_sequnlock_bh closes the serialized, non-preemptible, and
|
||||
* softirqs-disabled, seqlock_t write side critical section opened with
|
||||
* write_seqlock_bh().
|
||||
*/
|
||||
static inline void write_sequnlock_bh(seqlock_t *sl)
|
||||
{
|
||||
write_seqcount_end(&sl->seqcount);
|
||||
spin_unlock_bh(&sl->lock);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_seqlock_irq() - start a non-interruptible seqlock_t write section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* _irq variant of write_seqlock(). Use only if the read side section, or
|
||||
* other write sections, can be invoked from hardirq contexts.
|
||||
*/
|
||||
static inline void write_seqlock_irq(seqlock_t *sl)
|
||||
{
|
||||
spin_lock_irq(&sl->lock);
|
||||
write_seqcount_begin(&sl->seqcount);
|
||||
__write_seqcount_begin(&sl->seqcount);
|
||||
}
|
||||
|
||||
/**
|
||||
* write_sequnlock_irq() - end a non-interruptible seqlock_t write section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* write_sequnlock_irq closes the serialized and non-interruptible
|
||||
* seqlock_t write side section opened with write_seqlock_irq().
|
||||
*/
|
||||
static inline void write_sequnlock_irq(seqlock_t *sl)
|
||||
{
|
||||
write_seqcount_end(&sl->seqcount);
|
||||
|
@ -527,13 +666,32 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
|
|||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&sl->lock, flags);
|
||||
write_seqcount_begin(&sl->seqcount);
|
||||
__write_seqcount_begin(&sl->seqcount);
|
||||
return flags;
|
||||
}
|
||||
|
||||
/**
|
||||
* write_seqlock_irqsave() - start a non-interruptible seqlock_t write
|
||||
* section
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @flags: Stack-allocated storage for saving caller's local interrupt
|
||||
* state, to be passed to write_sequnlock_irqrestore().
|
||||
*
|
||||
* _irqsave variant of write_seqlock(). Use it only if the read side
|
||||
* section, or other write sections, can be invoked from hardirq context.
|
||||
*/
|
||||
#define write_seqlock_irqsave(lock, flags) \
|
||||
do { flags = __write_seqlock_irqsave(lock); } while (0)
|
||||
|
||||
/**
|
||||
* write_sequnlock_irqrestore() - end non-interruptible seqlock_t write
|
||||
* section
|
||||
* @sl: Pointer to seqlock_t
|
||||
* @flags: Caller's saved interrupt state, from write_seqlock_irqsave()
|
||||
*
|
||||
* write_sequnlock_irqrestore closes the serialized and non-interruptible
|
||||
* seqlock_t write section previously opened with write_seqlock_irqsave().
|
||||
*/
|
||||
static inline void
|
||||
write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
|
||||
{
|
||||
|
@ -541,65 +699,79 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
|
|||
spin_unlock_irqrestore(&sl->lock, flags);
|
||||
}
|
||||
|
||||
/*
|
||||
* A locking reader exclusively locks out other writers and locking readers,
|
||||
* but doesn't update the sequence number. Acts like a normal spin_lock/unlock.
|
||||
* Don't need preempt_disable() because that is in the spin_lock already.
|
||||
/**
|
||||
* read_seqlock_excl() - begin a seqlock_t locking reader section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* read_seqlock_excl opens a seqlock_t locking reader critical section. A
|
||||
* locking reader exclusively locks out *both* other writers *and* other
|
||||
* locking readers, but it does not update the embedded sequence number.
|
||||
*
|
||||
* Locking readers act like a normal spin_lock()/spin_unlock().
|
||||
*
|
||||
* Context: if the seqlock_t write section, *or other read sections*, can
|
||||
* be invoked from hardirq or softirq contexts, use the _irqsave or _bh
|
||||
* variant of this function instead.
|
||||
*
|
||||
* The opened read section must be closed with read_sequnlock_excl().
|
||||
*/
|
||||
static inline void read_seqlock_excl(seqlock_t *sl)
|
||||
{
|
||||
spin_lock(&sl->lock);
|
||||
}
|
||||
|
||||
/**
|
||||
* read_sequnlock_excl() - end a seqlock_t locking reader critical section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*/
|
||||
static inline void read_sequnlock_excl(seqlock_t *sl)
|
||||
{
|
||||
spin_unlock(&sl->lock);
|
||||
}
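A locking reader is useful when retrying would mean redoing expensive work. A hedged sketch, again reusing the hypothetical foo data from the earlier reader example:

static int foo_sum_locked(void)
{
        int sum;

        read_seqlock_excl(&foo_lock);   /* blocks writers and other locking readers */
        sum = foo_x + foo_y;
        read_sequnlock_excl(&foo_lock);

        return sum;
}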
|
||||
|
||||
/**
|
||||
* read_seqbegin_or_lock - begin a sequence number check or locking block
|
||||
* @lock: sequence lock
|
||||
* @seq : sequence number to be checked
|
||||
* read_seqlock_excl_bh() - start a seqlock_t locking reader section with
|
||||
* softirqs disabled
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* First try it once optimistically without taking the lock. If that fails,
|
||||
* take the lock. The sequence number is also used as a marker for deciding
|
||||
* whether to be a reader (even) or writer (odd).
|
||||
* N.B. seq must be initialized to an even number to begin with.
|
||||
* _bh variant of read_seqlock_excl(). Use this variant only if the
|
||||
* seqlock_t write side section, *or other read sections*, can be invoked
|
||||
* from softirq contexts.
|
||||
*/
|
||||
static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
|
||||
{
|
||||
if (!(*seq & 1)) /* Even */
|
||||
*seq = read_seqbegin(lock);
|
||||
else /* Odd */
|
||||
read_seqlock_excl(lock);
|
||||
}
|
||||
|
||||
static inline int need_seqretry(seqlock_t *lock, int seq)
|
||||
{
|
||||
return !(seq & 1) && read_seqretry(lock, seq);
|
||||
}
|
||||
|
||||
static inline void done_seqretry(seqlock_t *lock, int seq)
|
||||
{
|
||||
if (seq & 1)
|
||||
read_sequnlock_excl(lock);
|
||||
}
|
||||
|
||||
static inline void read_seqlock_excl_bh(seqlock_t *sl)
|
||||
{
|
||||
spin_lock_bh(&sl->lock);
|
||||
}
|
||||
|
||||
/**
|
||||
* read_sequnlock_excl_bh() - stop a seqlock_t softirq-disabled locking
|
||||
* reader section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*/
|
||||
static inline void read_sequnlock_excl_bh(seqlock_t *sl)
|
||||
{
|
||||
spin_unlock_bh(&sl->lock);
|
||||
}
|
||||
|
||||
/**
|
||||
* read_seqlock_excl_irq() - start a non-interruptible seqlock_t locking
|
||||
* reader section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*
|
||||
* _irq variant of read_seqlock_excl(). Use this only if the seqlock_t
|
||||
* write side section, *or other read sections*, can be invoked from a
|
||||
* hardirq context.
|
||||
*/
|
||||
static inline void read_seqlock_excl_irq(seqlock_t *sl)
|
||||
{
|
||||
spin_lock_irq(&sl->lock);
|
||||
}
|
||||
|
||||
/**
|
||||
* read_sequnlock_excl_irq() - end an interrupts-disabled seqlock_t
|
||||
* locking reader section
|
||||
* @sl: Pointer to seqlock_t
|
||||
*/
|
||||
static inline void read_sequnlock_excl_irq(seqlock_t *sl)
|
||||
{
|
||||
spin_unlock_irq(&sl->lock);
|
||||
|
@ -613,15 +785,117 @@ static inline unsigned long __read_seqlock_excl_irqsave(seqlock_t *sl)
|
|||
return flags;
|
||||
}
|
||||
|
||||
/**
|
||||
* read_seqlock_excl_irqsave() - start a non-interruptible seqlock_t
|
||||
* locking reader section
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @flags: Stack-allocated storage for saving caller's local interrupt
|
||||
* state, to be passed to read_sequnlock_excl_irqrestore().
|
||||
*
|
||||
* _irqsave variant of read_seqlock_excl(). Use this only if the seqlock_t
|
||||
* write side section, *or other read sections*, can be invoked from a
|
||||
* hardirq context.
|
||||
*/
|
||||
#define read_seqlock_excl_irqsave(lock, flags) \
|
||||
do { flags = __read_seqlock_excl_irqsave(lock); } while (0)
|
||||
|
||||
/**
|
||||
* read_sequnlock_excl_irqrestore() - end non-interruptible seqlock_t
|
||||
* locking reader section
|
||||
* @sl: Pointer to seqlock_t
|
||||
* @flags: Caller saved interrupt state, from read_seqlock_excl_irqsave()
|
||||
*/
|
||||
static inline void
|
||||
read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
|
||||
{
|
||||
spin_unlock_irqrestore(&sl->lock, flags);
|
||||
}
|
||||
|
||||
/**
|
||||
* read_seqbegin_or_lock() - begin a seqlock_t lockless or locking reader
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @seq : Marker and return parameter. If the passed value is even, the
|
||||
* reader will become a *lockless* seqlock_t reader as in read_seqbegin().
|
||||
* If the passed value is odd, the reader will become a *locking* reader
|
||||
* as in read_seqlock_excl(). In the first call to this function, the
|
||||
* caller *must* initialize and pass an even value to @seq; this way, a
|
||||
* lockless read can be optimistically tried first.
|
||||
*
|
||||
* read_seqbegin_or_lock is an API designed to optimistically try a normal
|
||||
* lockless seqlock_t read section first. If an odd counter is found, the
|
||||
* lockless read trial has failed, and the next read iteration transforms
|
||||
* itself into a full seqlock_t locking reader.
|
||||
*
|
||||
* This is typically used to avoid seqlock_t lockless readers starvation
|
||||
* (too much retry loops) in the case of a sharp spike in write side
|
||||
* activity.
|
||||
*
|
||||
* Context: if the seqlock_t write section, *or other read sections*, can
|
||||
* be invoked from hardirq or softirq contexts, use the _irqsave or _bh
|
||||
* variant of this function instead.
|
||||
*
|
||||
* Check Documentation/locking/seqlock.rst for template example code.
|
||||
*
|
||||
* Return: the encountered sequence counter value, through the @seq
|
||||
* parameter, which is overloaded as a return parameter. This returned
|
||||
* value must be checked with need_seqretry(). If the read section needs to
|
||||
* be retried, this returned value must also be passed as the @seq
|
||||
* parameter of the next read_seqbegin_or_lock() iteration.
|
||||
*/
|
||||
static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
|
||||
{
|
||||
if (!(*seq & 1)) /* Even */
|
||||
*seq = read_seqbegin(lock);
|
||||
else /* Odd */
|
||||
read_seqlock_excl(lock);
|
||||
}
|
||||
|
||||
/**
|
||||
* need_seqretry() - validate seqlock_t "locking or lockless" read section
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @seq: sequence count, from read_seqbegin_or_lock()
|
||||
*
|
||||
* Return: true if a read section retry is required, false otherwise
|
||||
*/
|
||||
static inline int need_seqretry(seqlock_t *lock, int seq)
|
||||
{
|
||||
return !(seq & 1) && read_seqretry(lock, seq);
|
||||
}
|
||||
|
||||
/**
|
||||
* done_seqretry() - end seqlock_t "locking or lockless" reader section
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @seq: count, from read_seqbegin_or_lock()
|
||||
*
|
||||
* done_seqretry finishes the seqlock_t read side critical section started
|
||||
* with read_seqbegin_or_lock() and validated by need_seqretry().
|
||||
*/
|
||||
static inline void done_seqretry(seqlock_t *lock, int seq)
|
||||
{
|
||||
if (seq & 1)
|
||||
read_sequnlock_excl(lock);
|
||||
}
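Put together, the "lockless first, then locking" reader template referred to above looks roughly like the sketch below (see Documentation/locking/seqlock.rst; the data is again the hypothetical foo example, and the explicit seq = 1 on retry mirrors how typical callers switch to the locking mode):

static int foo_sum_or_lock(void)
{
        int seq = 0;    /* even: first pass is a lockless reader */
        int sum;

        do {
                read_seqbegin_or_lock(&foo_lock, &seq);
                sum = foo_x + foo_y;
                if (!need_seqretry(&foo_lock, seq))
                        break;
                seq = 1;        /* lockless read raced a writer: retry as a locking reader */
        } while (1);
        done_seqretry(&foo_lock, seq);

        return sum;
}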
|
||||
|
||||
/**
|
||||
* read_seqbegin_or_lock_irqsave() - begin a seqlock_t lockless reader, or
|
||||
* a non-interruptible locking reader
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @seq: Marker and return parameter. Check read_seqbegin_or_lock().
|
||||
*
|
||||
* This is the _irqsave variant of read_seqbegin_or_lock(). Use it only if
|
||||
* the seqlock_t write section, *or other read sections*, can be invoked
|
||||
* from hardirq context.
|
||||
*
|
||||
* Note: Interrupts will be disabled only for "locking reader" mode.
|
||||
*
|
||||
* Return:
|
||||
*
|
||||
* 1. The saved local interrupts state in case of a locking reader, to
|
||||
* be passed to done_seqretry_irqrestore().
|
||||
*
|
||||
* 2. The encountered sequence counter value, returned through @seq
|
||||
* overloaded as a return parameter. Check read_seqbegin_or_lock().
|
||||
*/
|
||||
static inline unsigned long
|
||||
read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
|
||||
{
|
||||
|
@ -635,6 +909,18 @@ read_seqbegin_or_lock_irqsave(seqlock_t *lock, int *seq)
|
|||
return flags;
|
||||
}
|
||||
|
||||
/**
|
||||
* done_seqretry_irqrestore() - end a seqlock_t lockless reader, or a
|
||||
* non-interruptible locking reader section
|
||||
* @lock: Pointer to seqlock_t
|
||||
* @seq: Count, from read_seqbegin_or_lock_irqsave()
|
||||
* @flags: Caller's saved local interrupt state in case of a locking
|
||||
* reader, also from read_seqbegin_or_lock_irqsave()
|
||||
*
|
||||
* This is the _irqrestore variant of done_seqretry(). The read section
|
||||
* must've been opened with read_seqbegin_or_lock_irqsave(), and validated
|
||||
* by need_seqretry().
|
||||
*/
|
||||
static inline void
|
||||
done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
|
||||
{
|
||||
|
|
|
@ -56,6 +56,7 @@
|
|||
#include <linux/kernel.h>
|
||||
#include <linux/stringify.h>
|
||||
#include <linux/bottom_half.h>
|
||||
#include <linux/lockdep.h>
|
||||
#include <asm/barrier.h>
|
||||
#include <asm/mmiowb.h>
|
||||
|
||||
|
|
|
@ -15,7 +15,7 @@
|
|||
# include <linux/spinlock_types_up.h>
|
||||
#endif
|
||||
|
||||
#include <linux/lockdep.h>
|
||||
#include <linux/lockdep_types.h>
|
||||
|
||||
typedef struct raw_spinlock {
|
||||
arch_spinlock_t raw_lock;
|
||||
|
|
|
@ -167,6 +167,8 @@ typedef struct {
|
|||
int counter;
|
||||
} atomic_t;
|
||||
|
||||
#define ATOMIC_INIT(i) { (i) }
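Moving ATOMIC_INIT() next to the atomic_t definition keeps static initialization available from <linux/types.h> alone, so headers such as qspinlock_types.h no longer need <linux/atomic.h>; e.g. (name illustrative):

static atomic_t nr_users = ATOMIC_INIT(0);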
|
||||
|
||||
#ifdef CONFIG_64BIT
|
||||
typedef struct {
|
||||
s64 counter;
|
||||
|
|
|
@ -359,7 +359,13 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
|
|||
struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
|
||||
|
||||
if (new) {
|
||||
*new = *orig;
|
||||
ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
|
||||
ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
|
||||
/*
|
||||
* orig->shared.rb may be modified concurrently, but the clone
|
||||
* will be reinitialized.
|
||||
*/
|
||||
*new = data_race(*orig);
|
||||
INIT_LIST_HEAD(&new->anon_vma_chain);
|
||||
new->vm_next = new->vm_prev = NULL;
|
||||
}
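As a side note, a hedged sketch (not from the patch; struct and field names are hypothetical) of how the two KCSAN annotations used above are meant to be read: data_race() marks an access whose race is known and tolerated, while ASSERT_EXCLUSIVE_WRITER() asks KCSAN to report any concurrent writer to the given variable:

	struct stats {
		unsigned long hits;	/* updated locklessly elsewhere */
		unsigned long flags;	/* written only by the owner */
	};

	static void snapshot_stats(struct stats *dst, struct stats *src)
	{
		/* Benign race: a stale or torn value is acceptable here. */
		dst->hits = data_race(src->hits);

		/* Any concurrent writer to src->flags is a bug; let KCSAN flag it. */
		ASSERT_EXCLUSIVE_WRITER(src->flags);
		dst->flags = src->flags;
	}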
|
||||
|
@ -1954,8 +1960,8 @@ static __latent_entropy struct task_struct *copy_process(
|
|||
|
||||
rt_mutex_init_task(p);
|
||||
|
||||
lockdep_assert_irqs_enabled();
|
||||
#ifdef CONFIG_PROVE_LOCKING
|
||||
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
|
||||
DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
|
||||
#endif
|
||||
retval = -EAGAIN;
|
||||
|
@ -2035,18 +2041,10 @@ static __latent_entropy struct task_struct *copy_process(
|
|||
seqcount_init(&p->mems_allowed_seq);
|
||||
#endif
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
p->irq_events = 0;
|
||||
p->hardirqs_enabled = 0;
|
||||
p->hardirq_enable_ip = 0;
|
||||
p->hardirq_enable_event = 0;
|
||||
p->hardirq_disable_ip = _THIS_IP_;
|
||||
p->hardirq_disable_event = 0;
|
||||
memset(&p->irqtrace, 0, sizeof(p->irqtrace));
|
||||
p->irqtrace.hardirq_disable_ip = _THIS_IP_;
|
||||
p->irqtrace.softirq_enable_ip = _THIS_IP_;
|
||||
p->softirqs_enabled = 1;
|
||||
p->softirq_enable_ip = _THIS_IP_;
|
||||
p->softirq_enable_event = 0;
|
||||
p->softirq_disable_ip = 0;
|
||||
p->softirq_disable_event = 0;
|
||||
p->hardirq_context = 0;
|
||||
p->softirq_context = 0;
|
||||
#endif
|
||||
|
||||
|
|
kernel/futex.c
|
@ -32,30 +32,13 @@
|
|||
* "But they come in a choice of three flavours!"
|
||||
*/
|
||||
#include <linux/compat.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/poll.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/file.h>
|
||||
#include <linux/jhash.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/futex.h>
|
||||
#include <linux/mount.h>
|
||||
#include <linux/pagemap.h>
|
||||
#include <linux/syscalls.h>
|
||||
#include <linux/signal.h>
|
||||
#include <linux/export.h>
|
||||
#include <linux/magic.h>
|
||||
#include <linux/pid.h>
|
||||
#include <linux/nsproxy.h>
|
||||
#include <linux/ptrace.h>
|
||||
#include <linux/sched/rt.h>
|
||||
#include <linux/sched/wake_q.h>
|
||||
#include <linux/sched/mm.h>
|
||||
#include <linux/hugetlb.h>
|
||||
#include <linux/freezer.h>
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/fault-inject.h>
|
||||
#include <linux/refcount.h>
|
||||
|
||||
#include <asm/futex.h>
|
||||
|
||||
|
@ -476,7 +459,7 @@ static u64 get_inode_sequence_number(struct inode *inode)
|
|||
/**
|
||||
* get_futex_key() - Get parameters which are the keys for a futex
|
||||
* @uaddr: virtual address of the futex
|
||||
* @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
|
||||
* @fshared: false for a PROCESS_PRIVATE futex, true for PROCESS_SHARED
|
||||
* @key: address where result is stored.
|
||||
* @rw: mapping needs to be read/write (values: FUTEX_READ,
|
||||
* FUTEX_WRITE)
|
||||
|
@ -500,8 +483,8 @@ static u64 get_inode_sequence_number(struct inode *inode)
|
|||
*
|
||||
* lock_page() might sleep, the caller should not hold a spinlock.
|
||||
*/
|
||||
static int
|
||||
get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_access rw)
|
||||
static int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
|
||||
enum futex_access rw)
|
||||
{
|
||||
unsigned long address = (unsigned long)uaddr;
|
||||
struct mm_struct *mm = current->mm;
|
||||
|
@ -538,7 +521,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
|
|||
|
||||
again:
|
||||
/* Ignore any VERIFY_READ mapping (futex common case) */
|
||||
if (unlikely(should_fail_futex(fshared)))
|
||||
if (unlikely(should_fail_futex(true)))
|
||||
return -EFAULT;
|
||||
|
||||
err = get_user_pages_fast(address, 1, FOLL_WRITE, &page);
|
||||
|
@ -626,7 +609,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
|
|||
* A RO anonymous page will never change and thus doesn't make
|
||||
* sense for futex operations.
|
||||
*/
|
||||
if (unlikely(should_fail_futex(fshared)) || ro) {
|
||||
if (unlikely(should_fail_futex(true)) || ro) {
|
||||
err = -EFAULT;
|
||||
goto out;
|
||||
}
|
||||
|
@ -677,10 +660,6 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
|
|||
return err;
|
||||
}
|
||||
|
||||
static inline void put_futex_key(union futex_key *key)
|
||||
{
|
||||
}
|
||||
|
||||
/**
|
||||
* fault_in_user_writeable() - Fault in user address and verify RW access
|
||||
* @uaddr: pointer to faulting user space address
|
||||
|
@ -1611,13 +1590,13 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
|
|||
|
||||
ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ);
|
||||
if (unlikely(ret != 0))
|
||||
goto out;
|
||||
return ret;
|
||||
|
||||
hb = hash_futex(&key);
|
||||
|
||||
/* Make sure we really have tasks to wakeup */
|
||||
if (!hb_waiters_pending(hb))
|
||||
goto out_put_key;
|
||||
return ret;
|
||||
|
||||
spin_lock(&hb->lock);
|
||||
|
||||
|
@ -1640,9 +1619,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
|
|||
|
||||
spin_unlock(&hb->lock);
|
||||
wake_up_q(&wake_q);
|
||||
out_put_key:
|
||||
put_futex_key(&key);
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
@ -1709,10 +1685,10 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
|
|||
retry:
|
||||
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
|
||||
if (unlikely(ret != 0))
|
||||
goto out;
|
||||
return ret;
|
||||
ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
|
||||
if (unlikely(ret != 0))
|
||||
goto out_put_key1;
|
||||
return ret;
|
||||
|
||||
hb1 = hash_futex(&key1);
|
||||
hb2 = hash_futex(&key2);
|
||||
|
@ -1730,13 +1706,13 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
|
|||
* an MMU, but we might get them from range checking
|
||||
*/
|
||||
ret = op_ret;
|
||||
goto out_put_keys;
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (op_ret == -EFAULT) {
|
||||
ret = fault_in_user_writeable(uaddr2);
|
||||
if (ret)
|
||||
goto out_put_keys;
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (!(flags & FLAGS_SHARED)) {
|
||||
|
@ -1744,8 +1720,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
|
|||
goto retry_private;
|
||||
}
|
||||
|
||||
put_futex_key(&key2);
|
||||
put_futex_key(&key1);
|
||||
cond_resched();
|
||||
goto retry;
|
||||
}
|
||||
|
@ -1781,11 +1755,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
|
|||
out_unlock:
|
||||
double_unlock_hb(hb1, hb2);
|
||||
wake_up_q(&wake_q);
|
||||
out_put_keys:
|
||||
put_futex_key(&key2);
|
||||
out_put_key1:
|
||||
put_futex_key(&key1);
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
@ -1992,20 +1961,18 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
|
|||
retry:
|
||||
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ);
|
||||
if (unlikely(ret != 0))
|
||||
goto out;
|
||||
return ret;
|
||||
ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2,
|
||||
requeue_pi ? FUTEX_WRITE : FUTEX_READ);
|
||||
if (unlikely(ret != 0))
|
||||
goto out_put_key1;
|
||||
return ret;
|
||||
|
||||
/*
|
||||
* The check above which compares uaddrs is not sufficient for
|
||||
* shared futexes. We need to compare the keys:
|
||||
*/
|
||||
if (requeue_pi && match_futex(&key1, &key2)) {
|
||||
ret = -EINVAL;
|
||||
goto out_put_keys;
|
||||
}
|
||||
if (requeue_pi && match_futex(&key1, &key2))
|
||||
return -EINVAL;
|
||||
|
||||
hb1 = hash_futex(&key1);
|
||||
hb2 = hash_futex(&key2);
|
||||
|
@ -2025,13 +1992,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
|
|||
|
||||
ret = get_user(curval, uaddr1);
|
||||
if (ret)
|
||||
goto out_put_keys;
|
||||
return ret;
|
||||
|
||||
if (!(flags & FLAGS_SHARED))
|
||||
goto retry_private;
|
||||
|
||||
put_futex_key(&key2);
|
||||
put_futex_key(&key1);
|
||||
goto retry;
|
||||
}
|
||||
if (curval != *cmpval) {
|
||||
|
@ -2090,12 +2055,10 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
|
|||
case -EFAULT:
|
||||
double_unlock_hb(hb1, hb2);
|
||||
hb_waiters_dec(hb2);
|
||||
put_futex_key(&key2);
|
||||
put_futex_key(&key1);
|
||||
ret = fault_in_user_writeable(uaddr2);
|
||||
if (!ret)
|
||||
goto retry;
|
||||
goto out;
|
||||
return ret;
|
||||
case -EBUSY:
|
||||
case -EAGAIN:
|
||||
/*
|
||||
|
@ -2106,8 +2069,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
|
|||
*/
|
||||
double_unlock_hb(hb1, hb2);
|
||||
hb_waiters_dec(hb2);
|
||||
put_futex_key(&key2);
|
||||
put_futex_key(&key1);
|
||||
/*
|
||||
* Handle the case where the owner is in the middle of
|
||||
* exiting. Wait for the exit to complete otherwise
|
||||
|
@ -2216,12 +2177,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
|
|||
double_unlock_hb(hb1, hb2);
|
||||
wake_up_q(&wake_q);
|
||||
hb_waiters_dec(hb2);
|
||||
|
||||
out_put_keys:
|
||||
put_futex_key(&key2);
|
||||
out_put_key1:
|
||||
put_futex_key(&key1);
|
||||
out:
|
||||
return ret ? ret : task_count;
|
||||
}
|
||||
|
||||
|
@ -2567,7 +2522,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
|
|||
*/
|
||||
if (q->pi_state->owner != current)
|
||||
ret = fixup_pi_state_owner(uaddr, q, current);
|
||||
goto out;
|
||||
return ret ? ret : locked;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -2580,7 +2535,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
|
|||
*/
|
||||
if (q->pi_state->owner == current) {
|
||||
ret = fixup_pi_state_owner(uaddr, q, NULL);
|
||||
goto out;
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -2594,8 +2549,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
|
|||
q->pi_state->owner);
|
||||
}
|
||||
|
||||
out:
|
||||
return ret ? ret : locked;
|
||||
return ret;
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -2692,12 +2646,11 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
|
|||
|
||||
ret = get_user(uval, uaddr);
|
||||
if (ret)
|
||||
goto out;
|
||||
return ret;
|
||||
|
||||
if (!(flags & FLAGS_SHARED))
|
||||
goto retry_private;
|
||||
|
||||
put_futex_key(&q->key);
|
||||
goto retry;
|
||||
}
|
||||
|
||||
|
@ -2706,9 +2659,6 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
|
|||
ret = -EWOULDBLOCK;
|
||||
}
|
||||
|
||||
out:
|
||||
if (ret)
|
||||
put_futex_key(&q->key);
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
@ -2853,7 +2803,6 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
|
|||
* - EAGAIN: The user space value changed.
|
||||
*/
|
||||
queue_unlock(hb);
|
||||
put_futex_key(&q.key);
|
||||
/*
|
||||
* Handle the case where the owner is in the middle of
|
||||
* exiting. Wait for the exit to complete otherwise
|
||||
|
@ -2961,13 +2910,11 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
|
|||
put_pi_state(pi_state);
|
||||
}
|
||||
|
||||
goto out_put_key;
|
||||
goto out;
|
||||
|
||||
out_unlock_put_key:
|
||||
queue_unlock(hb);
|
||||
|
||||
out_put_key:
|
||||
put_futex_key(&q.key);
|
||||
out:
|
||||
if (to) {
|
||||
hrtimer_cancel(&to->timer);
|
||||
|
@ -2980,12 +2927,11 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
|
|||
|
||||
ret = fault_in_user_writeable(uaddr);
|
||||
if (ret)
|
||||
goto out_put_key;
|
||||
goto out;
|
||||
|
||||
if (!(flags & FLAGS_SHARED))
|
||||
goto retry_private;
|
||||
|
||||
put_futex_key(&q.key);
|
||||
goto retry;
|
||||
}
|
||||
|
||||
|
@ -3114,16 +3060,13 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
|
|||
out_unlock:
|
||||
spin_unlock(&hb->lock);
|
||||
out_putkey:
|
||||
put_futex_key(&key);
|
||||
return ret;
|
||||
|
||||
pi_retry:
|
||||
put_futex_key(&key);
|
||||
cond_resched();
|
||||
goto retry;
|
||||
|
||||
pi_faulted:
|
||||
put_futex_key(&key);
|
||||
|
||||
ret = fault_in_user_writeable(uaddr);
|
||||
if (!ret)
|
||||
|
@ -3265,7 +3208,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
|
|||
*/
|
||||
ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
|
||||
if (ret)
|
||||
goto out_key2;
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* The check above which compares uaddrs is not sufficient for
|
||||
|
@ -3274,7 +3217,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
|
|||
if (match_futex(&q.key, &key2)) {
|
||||
queue_unlock(hb);
|
||||
ret = -EINVAL;
|
||||
goto out_put_keys;
|
||||
goto out;
|
||||
}
|
||||
|
||||
/* Queue the futex_q, drop the hb lock, wait for wakeup. */
|
||||
|
@ -3284,7 +3227,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
|
|||
ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
|
||||
spin_unlock(&hb->lock);
|
||||
if (ret)
|
||||
goto out_put_keys;
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* In order for us to be here, we know our q.key == key2, and since
|
||||
|
@ -3374,11 +3317,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
|
|||
ret = -EWOULDBLOCK;
|
||||
}
|
||||
|
||||
out_put_keys:
|
||||
put_futex_key(&q.key);
|
||||
out_key2:
|
||||
put_futex_key(&key2);
|
||||
|
||||
out:
|
||||
if (to) {
|
||||
hrtimer_cancel(&to->timer);
|
||||
|
|
|
@ -7,8 +7,11 @@ CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
|
|||
CFLAGS_REMOVE_debugfs.o = $(CC_FLAGS_FTRACE)
|
||||
CFLAGS_REMOVE_report.o = $(CC_FLAGS_FTRACE)
|
||||
|
||||
CFLAGS_core.o := $(call cc-option,-fno-conserve-stack,) \
|
||||
$(call cc-option,-fno-stack-protector,)
|
||||
CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
|
||||
-fno-stack-protector -DDISABLE_BRANCH_PROFILING
|
||||
|
||||
obj-y := core.o debugfs.o report.o
|
||||
obj-$(CONFIG_KCSAN_SELFTEST) += test.o
|
||||
obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
|
||||
|
||||
CFLAGS_kcsan-test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
|
||||
obj-$(CONFIG_KCSAN_TEST) += kcsan-test.o
|
||||
|
|
|
@ -3,8 +3,7 @@
|
|||
#ifndef _KERNEL_KCSAN_ATOMIC_H
|
||||
#define _KERNEL_KCSAN_ATOMIC_H
|
||||
|
||||
#include <linux/jiffies.h>
|
||||
#include <linux/sched.h>
|
||||
#include <linux/types.h>
|
||||
|
||||
/*
|
||||
* Special rules for certain memory where concurrent conflicting accesses are
|
||||
|
@ -13,8 +12,7 @@
|
|||
*/
|
||||
static bool kcsan_is_atomic_special(const volatile void *ptr)
|
||||
{
|
||||
/* volatile globals that have been observed in data races. */
|
||||
return ptr == &jiffies || ptr == &current->state;
|
||||
return false;
|
||||
}
|
||||
|
||||
#endif /* _KERNEL_KCSAN_ATOMIC_H */
|
||||
|
|
|
@ -291,6 +291,20 @@ static inline unsigned int get_delay(void)
|
|||
0);
|
||||
}
|
||||
|
||||
void kcsan_save_irqtrace(struct task_struct *task)
|
||||
{
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
task->kcsan_save_irqtrace = task->irqtrace;
|
||||
#endif
|
||||
}
|
||||
|
||||
void kcsan_restore_irqtrace(struct task_struct *task)
|
||||
{
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
task->irqtrace = task->kcsan_save_irqtrace;
|
||||
#endif
|
||||
}
|
||||
|
||||
/*
|
||||
* Pull everything together: check_access() below contains the performance
|
||||
* critical operations; the fast-path (including check_access) functions should
|
||||
|
@ -336,9 +350,11 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
|
|||
flags = user_access_save();
|
||||
|
||||
if (consumed) {
|
||||
kcsan_save_irqtrace(current);
|
||||
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
|
||||
KCSAN_REPORT_CONSUMED_WATCHPOINT,
|
||||
watchpoint - watchpoints);
|
||||
kcsan_restore_irqtrace(current);
|
||||
} else {
|
||||
/*
|
||||
* The other thread may not print any diagnostics, as it has
|
||||
|
@ -396,9 +412,14 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
|
|||
goto out;
|
||||
}
|
||||
|
||||
/*
|
||||
* Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
|
||||
* runtime is entered for every memory access, and potentially useful
|
||||
* information is lost if dirtied by KCSAN.
|
||||
*/
|
||||
kcsan_save_irqtrace(current);
|
||||
if (!kcsan_interrupt_watcher)
|
||||
/* Use raw to avoid lockdep recursion via IRQ flags tracing. */
|
||||
raw_local_irq_save(irq_flags);
|
||||
local_irq_save(irq_flags);
|
||||
|
||||
watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
|
||||
if (watchpoint == NULL) {
|
||||
|
@ -539,7 +560,8 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
|
|||
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
|
||||
out_unlock:
|
||||
if (!kcsan_interrupt_watcher)
|
||||
raw_local_irq_restore(irq_flags);
|
||||
local_irq_restore(irq_flags);
|
||||
kcsan_restore_irqtrace(current);
|
||||
out:
|
||||
user_access_restore(ua_flags);
|
||||
}
|
||||
|
@ -754,6 +776,7 @@ EXPORT_SYMBOL(__kcsan_check_access);
|
|||
*/
|
||||
|
||||
#define DEFINE_TSAN_READ_WRITE(size) \
|
||||
void __tsan_read##size(void *ptr); \
|
||||
void __tsan_read##size(void *ptr) \
|
||||
{ \
|
||||
check_access(ptr, size, 0); \
|
||||
|
@ -762,6 +785,7 @@ EXPORT_SYMBOL(__kcsan_check_access);
|
|||
void __tsan_unaligned_read##size(void *ptr) \
|
||||
__alias(__tsan_read##size); \
|
||||
EXPORT_SYMBOL(__tsan_unaligned_read##size); \
|
||||
void __tsan_write##size(void *ptr); \
|
||||
void __tsan_write##size(void *ptr) \
|
||||
{ \
|
||||
check_access(ptr, size, KCSAN_ACCESS_WRITE); \
|
||||
|
@ -777,12 +801,14 @@ DEFINE_TSAN_READ_WRITE(4);
|
|||
DEFINE_TSAN_READ_WRITE(8);
|
||||
DEFINE_TSAN_READ_WRITE(16);
|
||||
|
||||
void __tsan_read_range(void *ptr, size_t size);
|
||||
void __tsan_read_range(void *ptr, size_t size)
|
||||
{
|
||||
check_access(ptr, size, 0);
|
||||
}
|
||||
EXPORT_SYMBOL(__tsan_read_range);
|
||||
|
||||
void __tsan_write_range(void *ptr, size_t size);
|
||||
void __tsan_write_range(void *ptr, size_t size)
|
||||
{
|
||||
check_access(ptr, size, KCSAN_ACCESS_WRITE);
|
||||
|
@ -799,6 +825,7 @@ EXPORT_SYMBOL(__tsan_write_range);
|
|||
* the size-check of compiletime_assert_rwonce_type().
|
||||
*/
|
||||
#define DEFINE_TSAN_VOLATILE_READ_WRITE(size) \
|
||||
void __tsan_volatile_read##size(void *ptr); \
|
||||
void __tsan_volatile_read##size(void *ptr) \
|
||||
{ \
|
||||
const bool is_atomic = size <= sizeof(long long) && \
|
||||
|
@ -811,6 +838,7 @@ EXPORT_SYMBOL(__tsan_write_range);
|
|||
void __tsan_unaligned_volatile_read##size(void *ptr) \
|
||||
__alias(__tsan_volatile_read##size); \
|
||||
EXPORT_SYMBOL(__tsan_unaligned_volatile_read##size); \
|
||||
void __tsan_volatile_write##size(void *ptr); \
|
||||
void __tsan_volatile_write##size(void *ptr) \
|
||||
{ \
|
||||
const bool is_atomic = size <= sizeof(long long) && \
|
||||
|
@ -836,14 +864,17 @@ DEFINE_TSAN_VOLATILE_READ_WRITE(16);
|
|||
* The below are not required by KCSAN, but can still be emitted by the
|
||||
* compiler.
|
||||
*/
|
||||
void __tsan_func_entry(void *call_pc);
|
||||
void __tsan_func_entry(void *call_pc)
|
||||
{
|
||||
}
|
||||
EXPORT_SYMBOL(__tsan_func_entry);
|
||||
void __tsan_func_exit(void);
|
||||
void __tsan_func_exit(void)
|
||||
{
|
||||
}
|
||||
EXPORT_SYMBOL(__tsan_func_exit);
|
||||
void __tsan_init(void);
|
||||
void __tsan_init(void)
|
||||
{
|
||||
}
|
||||
|
|
File diff suppressed because it is too large
|
@ -9,6 +9,7 @@
|
|||
#define _KERNEL_KCSAN_KCSAN_H
|
||||
|
||||
#include <linux/kcsan.h>
|
||||
#include <linux/sched.h>
|
||||
|
||||
/* The number of adjacent watchpoints to check. */
|
||||
#define KCSAN_CHECK_ADJACENT 1
|
||||
|
@ -22,6 +23,12 @@ extern unsigned int kcsan_udelay_interrupt;
|
|||
*/
|
||||
extern bool kcsan_enabled;
|
||||
|
||||
/*
|
||||
* Save/restore IRQ flags state trace dirtied by KCSAN.
|
||||
*/
|
||||
void kcsan_save_irqtrace(struct task_struct *task);
|
||||
void kcsan_restore_irqtrace(struct task_struct *task);
|
||||
|
||||
/*
|
||||
* Initialize debugfs file.
|
||||
*/
|
||||
|
|
|
@ -308,6 +308,9 @@ static void print_verbose_info(struct task_struct *task)
|
|||
if (!task)
|
||||
return;
|
||||
|
||||
/* Restore IRQ state trace for printing. */
|
||||
kcsan_restore_irqtrace(task);
|
||||
|
||||
pr_err("\n");
|
||||
debug_show_held_locks(task);
|
||||
print_irqtrace_events(task);
|
||||
|
@ -606,10 +609,11 @@ void kcsan_report(const volatile void *ptr, size_t size, int access_type,
|
|||
goto out;
|
||||
|
||||
/*
|
||||
* With TRACE_IRQFLAGS, lockdep's IRQ trace state becomes corrupted if
|
||||
* we do not turn off lockdep here; this could happen due to recursion
|
||||
* into lockdep via KCSAN if we detect a race in utilities used by
|
||||
* lockdep.
|
||||
* Because we may generate reports when we're in scheduler code, the use
|
||||
* of printk() could deadlock. Until such time that all printing code
|
||||
* called in print_report() is scheduler-safe, accept the risk, and just
|
||||
* get our message out. As such, also disable lockdep to hide the
|
||||
* warning, and avoid disabling lockdep for the rest of the kernel.
|
||||
*/
|
||||
lockdep_off();
|
||||
|
||||
|
|
|
@ -395,7 +395,7 @@ void lockdep_init_task(struct task_struct *task)
|
|||
|
||||
static __always_inline void lockdep_recursion_finish(void)
|
||||
{
|
||||
if (WARN_ON_ONCE(--current->lockdep_recursion))
|
||||
if (WARN_ON_ONCE((--current->lockdep_recursion) & LOCKDEP_RECURSION_MASK))
|
||||
current->lockdep_recursion = 0;
|
||||
}
|
||||
|
||||
|
@ -2062,9 +2062,9 @@ print_bad_irq_dependency(struct task_struct *curr,
|
|||
pr_warn("-----------------------------------------------------\n");
|
||||
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
|
||||
curr->comm, task_pid_nr(curr),
|
||||
curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
|
||||
lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT,
|
||||
curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
|
||||
curr->hardirqs_enabled,
|
||||
lockdep_hardirqs_enabled(),
|
||||
curr->softirqs_enabled);
|
||||
print_lock(next);
|
||||
|
||||
|
@ -3331,9 +3331,9 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
|
|||
|
||||
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
|
||||
curr->comm, task_pid_nr(curr),
|
||||
lockdep_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
|
||||
lockdep_hardirq_context(), hardirq_count() >> HARDIRQ_SHIFT,
|
||||
lockdep_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
|
||||
lockdep_hardirqs_enabled(curr),
|
||||
lockdep_hardirqs_enabled(),
|
||||
lockdep_softirqs_enabled(curr));
|
||||
print_lock(this);
|
||||
|
||||
|
@ -3484,19 +3484,21 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this,
|
|||
|
||||
void print_irqtrace_events(struct task_struct *curr)
|
||||
{
|
||||
printk("irq event stamp: %u\n", curr->irq_events);
|
||||
const struct irqtrace_events *trace = &curr->irqtrace;
|
||||
|
||||
printk("irq event stamp: %u\n", trace->irq_events);
|
||||
printk("hardirqs last enabled at (%u): [<%px>] %pS\n",
|
||||
curr->hardirq_enable_event, (void *)curr->hardirq_enable_ip,
|
||||
(void *)curr->hardirq_enable_ip);
|
||||
trace->hardirq_enable_event, (void *)trace->hardirq_enable_ip,
|
||||
(void *)trace->hardirq_enable_ip);
|
||||
printk("hardirqs last disabled at (%u): [<%px>] %pS\n",
|
||||
curr->hardirq_disable_event, (void *)curr->hardirq_disable_ip,
|
||||
(void *)curr->hardirq_disable_ip);
|
||||
trace->hardirq_disable_event, (void *)trace->hardirq_disable_ip,
|
||||
(void *)trace->hardirq_disable_ip);
|
||||
printk("softirqs last enabled at (%u): [<%px>] %pS\n",
|
||||
curr->softirq_enable_event, (void *)curr->softirq_enable_ip,
|
||||
(void *)curr->softirq_enable_ip);
|
||||
trace->softirq_enable_event, (void *)trace->softirq_enable_ip,
|
||||
(void *)trace->softirq_enable_ip);
|
||||
printk("softirqs last disabled at (%u): [<%px>] %pS\n",
|
||||
curr->softirq_disable_event, (void *)curr->softirq_disable_ip,
|
||||
(void *)curr->softirq_disable_ip);
|
||||
trace->softirq_disable_event, (void *)trace->softirq_disable_ip,
|
||||
(void *)trace->softirq_disable_ip);
|
||||
}
|
||||
|
||||
static int HARDIRQ_verbose(struct lock_class *class)
|
||||
|
@ -3646,10 +3648,19 @@ static void __trace_hardirqs_on_caller(void)
|
|||
*/
|
||||
void lockdep_hardirqs_on_prepare(unsigned long ip)
|
||||
{
|
||||
if (unlikely(!debug_locks || current->lockdep_recursion))
|
||||
if (unlikely(!debug_locks))
|
||||
return;
|
||||
|
||||
if (unlikely(current->hardirqs_enabled)) {
|
||||
/*
|
||||
* NMIs do not (and cannot) track lock dependencies, nothing to do.
|
||||
*/
|
||||
if (unlikely(in_nmi()))
|
||||
return;
|
||||
|
||||
if (unlikely(current->lockdep_recursion & LOCKDEP_RECURSION_MASK))
|
||||
return;
|
||||
|
||||
if (unlikely(lockdep_hardirqs_enabled())) {
|
||||
/*
|
||||
* Neither irq nor preemption are disabled here
|
||||
* so this is racy by nature but losing one hit
|
||||
|
@ -3677,7 +3688,7 @@ void lockdep_hardirqs_on_prepare(unsigned long ip)
|
|||
* Can't allow enabling interrupts while in an interrupt handler,
|
||||
* that's general bad form and such. Recursion, limited stack etc..
|
||||
*/
|
||||
if (DEBUG_LOCKS_WARN_ON(current->hardirq_context))
|
||||
if (DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context()))
|
||||
return;
|
||||
|
||||
current->hardirq_chain_key = current->curr_chain_key;
|
||||
|
@ -3690,12 +3701,35 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on_prepare);
|
|||
|
||||
void noinstr lockdep_hardirqs_on(unsigned long ip)
|
||||
{
|
||||
struct task_struct *curr = current;
|
||||
struct irqtrace_events *trace = &current->irqtrace;
|
||||
|
||||
if (unlikely(!debug_locks || curr->lockdep_recursion))
|
||||
if (unlikely(!debug_locks))
|
||||
return;
|
||||
|
||||
if (curr->hardirqs_enabled) {
|
||||
/*
|
||||
* NMIs can happen in the middle of local_irq_{en,dis}able() where the
|
||||
* tracking state and hardware state are out of sync.
|
||||
*
|
||||
* NMIs must save lockdep_hardirqs_enabled() to restore IRQ state from,
|
||||
* and not rely on hardware state like normal interrupts.
|
||||
*/
|
||||
if (unlikely(in_nmi())) {
|
||||
if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
|
||||
return;
|
||||
|
||||
/*
|
||||
* Skip:
|
||||
* - recursion check, because NMI can hit lockdep;
|
||||
* - hardware state check, because above;
|
||||
* - chain_key check, see lockdep_hardirqs_on_prepare().
|
||||
*/
|
||||
goto skip_checks;
|
||||
}
|
||||
|
||||
if (unlikely(current->lockdep_recursion & LOCKDEP_RECURSION_MASK))
|
||||
return;
|
||||
|
||||
if (lockdep_hardirqs_enabled()) {
|
||||
/*
|
||||
* Neither irq nor preemption are disabled here
|
||||
* so this is racy by nature but losing one hit
|
||||
|
@ -3720,10 +3754,11 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
|
|||
DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key !=
|
||||
current->curr_chain_key);
|
||||
|
||||
skip_checks:
|
||||
/* we'll do an OFF -> ON transition: */
|
||||
curr->hardirqs_enabled = 1;
|
||||
curr->hardirq_enable_ip = ip;
|
||||
curr->hardirq_enable_event = ++curr->irq_events;
|
||||
this_cpu_write(hardirqs_enabled, 1);
|
||||
trace->hardirq_enable_ip = ip;
|
||||
trace->hardirq_enable_event = ++trace->irq_events;
|
||||
debug_atomic_inc(hardirqs_on_events);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
|
||||
|
@ -3733,9 +3768,18 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
|
|||
*/
|
||||
void noinstr lockdep_hardirqs_off(unsigned long ip)
|
||||
{
|
||||
struct task_struct *curr = current;
|
||||
if (unlikely(!debug_locks))
|
||||
return;
|
||||
|
||||
if (unlikely(!debug_locks || curr->lockdep_recursion))
|
||||
/*
|
||||
* Matching lockdep_hardirqs_on(), allow NMIs in the middle of lockdep;
|
||||
* they will restore the software state. This ensures the software
|
||||
* state is consistent inside NMIs as well.
|
||||
*/
|
||||
if (in_nmi()) {
|
||||
if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
|
||||
return;
|
||||
} else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
|
||||
return;
|
||||
|
||||
/*
|
||||
|
@ -3745,13 +3789,15 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
|
|||
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
|
||||
return;
|
||||
|
||||
if (curr->hardirqs_enabled) {
|
||||
if (lockdep_hardirqs_enabled()) {
|
||||
struct irqtrace_events *trace = &current->irqtrace;
|
||||
|
||||
/*
|
||||
* We have done an ON -> OFF transition:
|
||||
*/
|
||||
curr->hardirqs_enabled = 0;
|
||||
curr->hardirq_disable_ip = ip;
|
||||
curr->hardirq_disable_event = ++curr->irq_events;
|
||||
this_cpu_write(hardirqs_enabled, 0);
|
||||
trace->hardirq_disable_ip = ip;
|
||||
trace->hardirq_disable_event = ++trace->irq_events;
|
||||
debug_atomic_inc(hardirqs_off_events);
|
||||
} else {
|
||||
debug_atomic_inc(redundant_hardirqs_off);
|
||||
|
@ -3764,7 +3810,7 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
|
|||
*/
|
||||
void lockdep_softirqs_on(unsigned long ip)
|
||||
{
|
||||
struct task_struct *curr = current;
|
||||
struct irqtrace_events *trace = &current->irqtrace;
|
||||
|
||||
if (unlikely(!debug_locks || current->lockdep_recursion))
|
||||
return;
|
||||
|
@ -3776,7 +3822,7 @@ void lockdep_softirqs_on(unsigned long ip)
|
|||
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
|
||||
return;
|
||||
|
||||
if (curr->softirqs_enabled) {
|
||||
if (current->softirqs_enabled) {
|
||||
debug_atomic_inc(redundant_softirqs_on);
|
||||
return;
|
||||
}
|
||||
|
@ -3785,17 +3831,17 @@ void lockdep_softirqs_on(unsigned long ip)
|
|||
/*
|
||||
* We'll do an OFF -> ON transition:
|
||||
*/
|
||||
curr->softirqs_enabled = 1;
|
||||
curr->softirq_enable_ip = ip;
|
||||
curr->softirq_enable_event = ++curr->irq_events;
|
||||
current->softirqs_enabled = 1;
|
||||
trace->softirq_enable_ip = ip;
|
||||
trace->softirq_enable_event = ++trace->irq_events;
|
||||
debug_atomic_inc(softirqs_on_events);
|
||||
/*
|
||||
* We are going to turn softirqs on, so set the
|
||||
* usage bit for all held locks, if hardirqs are
|
||||
* enabled too:
|
||||
*/
|
||||
if (curr->hardirqs_enabled)
|
||||
mark_held_locks(curr, LOCK_ENABLED_SOFTIRQ);
|
||||
if (lockdep_hardirqs_enabled())
|
||||
mark_held_locks(current, LOCK_ENABLED_SOFTIRQ);
|
||||
lockdep_recursion_finish();
|
||||
}
|
||||
|
||||
|
@ -3804,8 +3850,6 @@ void lockdep_softirqs_on(unsigned long ip)
|
|||
*/
|
||||
void lockdep_softirqs_off(unsigned long ip)
|
||||
{
|
||||
struct task_struct *curr = current;
|
||||
|
||||
if (unlikely(!debug_locks || current->lockdep_recursion))
|
||||
return;
|
||||
|
||||
|
@ -3815,13 +3859,15 @@ void lockdep_softirqs_off(unsigned long ip)
|
|||
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
|
||||
return;
|
||||
|
||||
if (curr->softirqs_enabled) {
|
||||
if (current->softirqs_enabled) {
|
||||
struct irqtrace_events *trace = &current->irqtrace;
|
||||
|
||||
/*
|
||||
* We have done an ON -> OFF transition:
|
||||
*/
|
||||
curr->softirqs_enabled = 0;
|
||||
curr->softirq_disable_ip = ip;
|
||||
curr->softirq_disable_event = ++curr->irq_events;
|
||||
current->softirqs_enabled = 0;
|
||||
trace->softirq_disable_ip = ip;
|
||||
trace->softirq_disable_event = ++trace->irq_events;
|
||||
debug_atomic_inc(softirqs_off_events);
|
||||
/*
|
||||
* Whoops, we wanted softirqs off, so why aren't they?
|
||||
|
@ -3843,7 +3889,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
|
|||
*/
|
||||
if (!hlock->trylock) {
|
||||
if (hlock->read) {
|
||||
if (curr->hardirq_context)
|
||||
if (lockdep_hardirq_context())
|
||||
if (!mark_lock(curr, hlock,
|
||||
LOCK_USED_IN_HARDIRQ_READ))
|
||||
return 0;
|
||||
|
@ -3852,7 +3898,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
|
|||
LOCK_USED_IN_SOFTIRQ_READ))
|
||||
return 0;
|
||||
} else {
|
||||
if (curr->hardirq_context)
|
||||
if (lockdep_hardirq_context())
|
||||
if (!mark_lock(curr, hlock, LOCK_USED_IN_HARDIRQ))
|
||||
return 0;
|
||||
if (curr->softirq_context)
|
||||
|
@ -3890,7 +3936,7 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
|
|||
|
||||
static inline unsigned int task_irq_context(struct task_struct *task)
|
||||
{
|
||||
return LOCK_CHAIN_HARDIRQ_CONTEXT * !!task->hardirq_context +
|
||||
return LOCK_CHAIN_HARDIRQ_CONTEXT * !!lockdep_hardirq_context() +
|
||||
LOCK_CHAIN_SOFTIRQ_CONTEXT * !!task->softirq_context;
|
||||
}
|
||||
|
||||
|
@ -3983,7 +4029,7 @@ static inline short task_wait_context(struct task_struct *curr)
|
|||
* Set appropriate wait type for the context; for IRQs we have to take
|
||||
* into account force_irqthread as that is implied by PREEMPT_RT.
|
||||
*/
|
||||
if (curr->hardirq_context) {
|
||||
if (lockdep_hardirq_context()) {
|
||||
/*
|
||||
* Check if force_irqthreads will run us threaded.
|
||||
*/
|
||||
|
@ -4826,11 +4872,11 @@ static void check_flags(unsigned long flags)
|
|||
return;
|
||||
|
||||
if (irqs_disabled_flags(flags)) {
|
||||
if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
|
||||
if (DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())) {
|
||||
printk("possible reason: unannotated irqs-off.\n");
|
||||
}
|
||||
} else {
|
||||
if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
|
||||
if (DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())) {
|
||||
printk("possible reason: unannotated irqs-on.\n");
|
||||
}
|
||||
}
|
||||
|
|
|
@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
|
|||
*/
|
||||
|
||||
for (;;) {
|
||||
if (prev->next == node &&
|
||||
/*
|
||||
* cpu_relax() below implies a compiler barrier which would
|
||||
* prevent this comparison being optimized away.
|
||||
*/
|
||||
if (data_race(prev->next) == node &&
|
||||
cmpxchg(&prev->next, node, NULL) == node)
|
||||
break;
|
||||
|
||||
|
|
|
@ -107,6 +107,12 @@ static bool ksoftirqd_running(unsigned long pending)
|
|||
* where hardirqs are disabled legitimately:
|
||||
*/
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
|
||||
DEFINE_PER_CPU(int, hardirqs_enabled);
|
||||
DEFINE_PER_CPU(int, hardirq_context);
|
||||
EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
|
||||
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
|
||||
|
||||
void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
@ -224,7 +230,7 @@ static inline bool lockdep_softirq_start(void)
|
|||
{
|
||||
bool in_hardirq = false;
|
||||
|
||||
if (lockdep_hardirq_context(current)) {
|
||||
if (lockdep_hardirq_context()) {
|
||||
in_hardirq = true;
|
||||
lockdep_hardirq_exit();
|
||||
}
|
||||
|
|
|
@ -1117,6 +1117,7 @@ config PROVE_LOCKING
|
|||
select DEBUG_RWSEMS
|
||||
select DEBUG_WW_MUTEX_SLOWPATH
|
||||
select DEBUG_LOCK_ALLOC
|
||||
select PREEMPT_COUNT if !ARCH_NO_PREEMPT
|
||||
select TRACE_IRQFLAGS
|
||||
default n
|
||||
help
|
||||
|
@ -1325,11 +1326,17 @@ config WW_MUTEX_SELFTEST
|
|||
endmenu # lock debugging
|
||||
|
||||
config TRACE_IRQFLAGS
|
||||
depends on TRACE_IRQFLAGS_SUPPORT
|
||||
bool
|
||||
help
|
||||
Enables hooks to interrupt enabling and disabling for
|
||||
either tracing or lock debugging.
|
||||
|
||||
config TRACE_IRQFLAGS_NMI
|
||||
def_bool y
|
||||
depends on TRACE_IRQFLAGS
|
||||
depends on TRACE_IRQFLAGS_NMI_SUPPORT
|
||||
|
||||
config STACKTRACE
|
||||
bool "Stack backtrace support"
|
||||
depends on STACKTRACE_SUPPORT
|
||||
|
|
|
@ -4,7 +4,8 @@ config HAVE_ARCH_KCSAN
|
|||
bool
|
||||
|
||||
config HAVE_KCSAN_COMPILER
|
||||
def_bool CC_IS_CLANG && $(cc-option,-fsanitize=thread -mllvm -tsan-distinguish-volatile=1)
|
||||
def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=thread -mllvm -tsan-distinguish-volatile=1)) || \
|
||||
(CC_IS_GCC && $(cc-option,-fsanitize=thread --param tsan-distinguish-volatile=1))
|
||||
help
|
||||
For the list of compilers that support KCSAN, please see
|
||||
<file:Documentation/dev-tools/kcsan.rst>.
|
||||
|
@ -59,7 +60,28 @@ config KCSAN_SELFTEST
|
|||
bool "Perform short selftests on boot"
|
||||
default y
|
||||
help
|
||||
Run KCSAN selftests on boot. On test failure, causes the kernel to panic.
|
||||
Run KCSAN selftests on boot. On test failure, causes the kernel to
|
||||
panic. Recommended to be enabled, ensuring critical functionality
|
||||
works as intended.
|
||||
|
||||
config KCSAN_TEST
|
||||
tristate "KCSAN test for integrated runtime behaviour"
|
||||
depends on TRACEPOINTS && KUNIT
|
||||
select TORTURE_TEST
|
||||
help
|
||||
KCSAN test focusing on behaviour of the integrated runtime. Tests
|
||||
various race scenarios, and verifies the reports generated to
|
||||
console. Makes use of KUnit for test organization, and the Torture
|
||||
framework for test thread control.
|
||||
|
||||
Each test case may run at least up to KCSAN_REPORT_ONCE_IN_MS
|
||||
milliseconds. Test run duration may be optimized by building the
|
||||
kernel and KCSAN test with KCSAN_REPORT_ONCE_IN_MS set to a lower
|
||||
than default value.
|
||||
|
||||
Say Y here if you want the test to be built into the kernel and run
|
||||
during boot; say M if you want the test to build as a module; say N
|
||||
if you are unsure.
|
||||
|
||||
config KCSAN_EARLY_ENABLE
|
||||
bool "Early enable during boot"
|
||||
|
|
|
@ -6,7 +6,7 @@ ifdef CONFIG_KCSAN
|
|||
ifdef CONFIG_CC_IS_CLANG
|
||||
cc-param = -mllvm -$(1)
|
||||
else
|
||||
cc-param = --param -$(1)
|
||||
cc-param = --param $(1)
|
||||
endif
|
||||
|
||||
# Keep most options here optional, to allow enabling more compilers if absence
|
||||
|
|
|
@ -2,9 +2,9 @@
|
|||
#ifndef _LIBLOCKDEP_LINUX_TRACE_IRQFLAGS_H_
|
||||
#define _LIBLOCKDEP_LINUX_TRACE_IRQFLAGS_H_
|
||||
|
||||
# define lockdep_hardirq_context(p) 0
|
||||
# define lockdep_hardirq_context() 0
|
||||
# define lockdep_softirq_context(p) 0
|
||||
# define lockdep_hardirqs_enabled(p) 0
|
||||
# define lockdep_hardirqs_enabled() 0
|
||||
# define lockdep_softirqs_enabled(p) 0
|
||||
# define lockdep_hardirq_enter() do { } while (0)
|
||||
# define lockdep_hardirq_exit() do { } while (0)
|
||||
|
|
|
@ -1985,28 +1985,36 @@ outcome undefined.
|
|||
|
||||
In technical terms, the compiler is allowed to assume that when the
|
||||
program executes, there will not be any data races. A "data race"
|
||||
occurs when two conflicting memory accesses execute concurrently;
|
||||
two memory accesses "conflict" if:
|
||||
occurs when there are two memory accesses such that:
|
||||
|
||||
they access the same location,
|
||||
1. they access the same location,
|
||||
|
||||
they occur on different CPUs (or in different threads on the
|
||||
same CPU),
|
||||
2. at least one of them is a store,
|
||||
|
||||
at least one of them is a plain access,
|
||||
3. at least one of them is plain,
|
||||
|
||||
and at least one of them is a store.
|
||||
4. they occur on different CPUs (or in different threads on the
|
||||
same CPU), and
|
||||
|
||||
The LKMM tries to determine whether a program contains two conflicting
|
||||
accesses which may execute concurrently; if it does then the LKMM says
|
||||
there is a potential data race and makes no predictions about the
|
||||
program's outcome.
|
||||
5. they execute concurrently.
|
||||
|
||||
Determining whether two accesses conflict is easy; you can see that
|
||||
all the concepts involved in the definition above are already part of
|
||||
the memory model. The hard part is telling whether they may execute
|
||||
concurrently. The LKMM takes a conservative attitude, assuming that
|
||||
accesses may be concurrent unless it can prove they cannot.
|
||||
In the literature, two accesses are said to "conflict" if they satisfy
|
||||
1 and 2 above. We'll go a little farther and say that two accesses
|
||||
are "race candidates" if they satisfy 1 - 4. Thus, whether or not two
|
||||
race candidates actually do race in a given execution depends on
|
||||
whether they are concurrent.
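An illustrative litmus-style sketch (not part of the patch): below, P0's plain store to x and P1's marked load are race candidates -- they access the same location, one of them is a store, one of them is plain, and they run on different CPUs. Whether they actually race in a given execution then depends only on whether they can be concurrent.

	int x = 0;

	P0()
	{
		x = 1;			/* plain write */
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(x);	/* marked read */
	}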
|
||||
|
||||
The LKMM tries to determine whether a program contains race candidates
|
||||
which may execute concurrently; if it does then the LKMM says there is
|
||||
a potential data race and makes no predictions about the program's
|
||||
outcome.
|
||||
|
||||
Determining whether two accesses are race candidates is easy; you can
|
||||
see that all the concepts involved in the definition above are already
|
||||
part of the memory model. The hard part is telling whether they may
|
||||
execute concurrently. The LKMM takes a conservative attitude,
|
||||
assuming that accesses may be concurrent unless it can prove they
|
||||
are not.
|
||||
|
||||
If two memory accesses aren't concurrent then one must execute before
|
||||
the other. Therefore the LKMM decides two accesses aren't concurrent
|
||||
|
@ -2169,8 +2177,8 @@ again, now using plain accesses for buf:
|
|||
}
|
||||
|
||||
This program does not contain a data race. Although the U and V
|
||||
accesses conflict, the LKMM can prove they are not concurrent as
|
||||
follows:
|
||||
accesses are race candidates, the LKMM can prove they are not
|
||||
concurrent as follows:
|
||||
|
||||
The smp_wmb() fence in P0 is both a compiler barrier and a
|
||||
cumul-fence. It guarantees that no matter what hash of
|
||||
|
@ -2324,12 +2332,11 @@ could now perform the load of x before the load of ptr (there might be
|
|||
a control dependency but no address dependency at the machine level).
|
||||
|
||||
Finally, it turns out there is a situation in which a plain write does
|
||||
not need to be w-post-bounded: when it is separated from the
|
||||
conflicting access by a fence. At first glance this may seem
|
||||
impossible. After all, to be conflicting the second access has to be
|
||||
on a different CPU from the first, and fences don't link events on
|
||||
different CPUs. Well, normal fences don't -- but rcu-fence can!
|
||||
Here's an example:
|
||||
not need to be w-post-bounded: when it is separated from the other
|
||||
race-candidate access by a fence. At first glance this may seem
|
||||
impossible. After all, to be race candidates the two accesses must
|
||||
be on different CPUs, and fences don't link events on different CPUs.
|
||||
Well, normal fences don't -- but rcu-fence can! Here's an example:
|
||||
|
||||
int x, y;
|
||||
|
||||
|
@ -2365,7 +2372,7 @@ concurrent and there is no race, even though P1's plain store to y
|
|||
isn't w-post-bounded by any marked accesses.
|
||||
|
||||
Putting all this material together yields the following picture. For
|
||||
two conflicting stores W and W', where W ->co W', the LKMM says the
|
||||
race-candidate stores W and W', where W ->co W', the LKMM says the
|
||||
stores don't race if W can be linked to W' by a
|
||||
|
||||
w-post-bounded ; vis ; w-pre-bounded
|
||||
|
@ -2378,8 +2385,8 @@ sequence, and if W' is plain then they also have to be linked by a
|
|||
|
||||
w-post-bounded ; vis ; r-pre-bounded
|
||||
|
||||
sequence. For a conflicting load R and store W, the LKMM says the two
|
||||
accesses don't race if R can be linked to W by an
|
||||
sequence. For race-candidate load R and store W, the LKMM says the
|
||||
two accesses don't race if R can be linked to W by an
|
||||
|
||||
r-post-bounded ; xb* ; w-pre-bounded
|
||||
|
||||
|
@ -2411,20 +2418,20 @@ is, the rules governing the memory subsystem's choice of a store to
|
|||
satisfy a load request and its determination of where a store will
|
||||
fall in the coherence order):
|
||||
|
||||
If R and W conflict and it is possible to link R to W by one
|
||||
of the xb* sequences listed above, then W ->rfe R is not
|
||||
allowed (i.e., a load cannot read from a store that it
|
||||
If R and W are race candidates and it is possible to link R to
|
||||
W by one of the xb* sequences listed above, then W ->rfe R is
|
||||
not allowed (i.e., a load cannot read from a store that it
|
||||
executes before, even if one or both is plain).
|
||||
|
||||
If W and R conflict and it is possible to link W to R by one
|
||||
of the vis sequences listed above, then R ->fre W is not
|
||||
allowed (i.e., if a store is visible to a load then the load
|
||||
must read from that store or one coherence-after it).
|
||||
If W and R are race candidates and it is possible to link W to
|
||||
R by one of the vis sequences listed above, then R ->fre W is
|
||||
not allowed (i.e., if a store is visible to a load then the
|
||||
load must read from that store or one coherence-after it).
|
||||
|
||||
If W and W' conflict and it is possible to link W to W' by one
|
||||
of the vis sequences listed above, then W' ->co W is not
|
||||
allowed (i.e., if one store is visible to a second then the
|
||||
second must come after the first in the coherence order).
|
||||
If W and W' are race candidates and it is possible to link W
|
||||
to W' by one of the vis sequences listed above, then W' ->co W
|
||||
is not allowed (i.e., if one store is visible to a second then
|
||||
the second must come after the first in the coherence order).
|
||||
|
||||
This is the extent to which the LKMM deals with plain accesses.
|
||||
Perhaps it could say more (for example, plain accesses might
|
||||
|
|
|
@ -126,7 +126,7 @@ However, it is not necessarily the case that accesses ordered by
|
|||
locking will be seen as ordered by CPUs not holding that lock.
|
||||
Consider this example:
|
||||
|
||||
/* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
|
||||
/* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
|
||||
void CPU0(void)
|
||||
{
|
||||
spin_lock(&mylock);
|
||||
|
|
|
@ -73,6 +73,18 @@ o Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
|
|||
Linux-kernel memory model
|
||||
=========================
|
||||
|
||||
o Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
|
||||
Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
|
||||
Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra.
|
||||
2019. "Calibrating your fear of big bad optimizing compilers"
|
||||
Linux Weekly News. https://lwn.net/Articles/799218/
|
||||
|
||||
o Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
|
||||
Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
|
||||
Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra.
|
||||
2019. "Who's afraid of a big bad optimizing compiler?"
|
||||
Linux Weekly News. https://lwn.net/Articles/793253/
|
||||
|
||||
o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
|
||||
Alan Stern. 2018. "Frightening small children and disconcerting
|
||||
grown-ups: Concurrency in the Linux kernel". In Proceedings of
|
||||
|
@ -88,6 +100,11 @@ o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
|
|||
Alan Stern. 2017. "A formal kernel memory-ordering model (part 2)"
|
||||
Linux Weekly News. https://lwn.net/Articles/720550/
|
||||
|
||||
o Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
|
||||
Alan Stern. 2017-2019. "A Formal Model of Linux-Kernel Memory
|
||||
Ordering" (backup material for the LWN articles)
|
||||
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/
|
||||
|
||||
|
||||
Memory-model tooling
|
||||
====================
|
||||
|
@ -110,5 +127,5 @@ Memory-model comparisons
|
|||
========================
|
||||
|
||||
o Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
|
||||
Feng. 2016. "Linux-Kernel Memory Model". (6 June 2016).
|
||||
http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0124r2.html.
|
||||
Feng. 2018. "Linux-Kernel Memory Model". (27 September 2018).
|
||||
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0124r6.html.
|
||||
|
|
|
@ -28,8 +28,34 @@ downloaded separately:
|
|||
See "herdtools7/INSTALL.md" for installation instructions.
|
||||
|
||||
Note that although these tools usually provide backwards compatibility,
|
||||
this is not absolutely guaranteed. Therefore, if a later version does
|
||||
not work, please try using the exact version called out above.
|
||||
this is not absolutely guaranteed.
|
||||
|
||||
For example, a future version of herd7 might not work with the model
|
||||
in this release. A compatible model will likely be made available in
|
||||
a later release of the Linux kernel.
|
||||
|
||||
If you absolutely need to run the model in this particular release,
|
||||
please try using the exact version called out above.
|
||||
|
||||
klitmus7 is independent of the model provided here. It has its own
|
||||
dependency on a target kernel release where converted code is built
|
||||
and executed. Any change in kernel APIs essential to klitmus7 will
|
||||
necessitate an upgrade of klitmus7.
|
||||
|
||||
If you find any compatibility issues in klitmus7, please inform the
|
||||
memory model maintainers.
|
||||
|
||||
klitmus7 Compatibility Table
|
||||
----------------------------
|
||||
|
||||
============ ==========
|
||||
target Linux herdtools7
|
||||
------------ ----------
|
||||
-- 4.18 7.48 --
|
||||
4.15 -- 4.19 7.49 --
|
||||
4.20 -- 5.5 7.54 --
|
||||
5.6 -- 7.56 --
|
||||
============ ==========
|
||||
|
||||
|
||||
==================
|
||||
|
@ -207,11 +233,15 @@ The Linux-kernel memory model (LKMM) has the following limitations:
|
|||
case as a store release.
|
||||
|
||||
b. The "unless" RMW operations are not currently modeled:
|
||||
atomic_long_add_unless(), atomic_add_unless(),
|
||||
atomic_inc_unless_negative(), and
|
||||
atomic_dec_unless_positive(). These can be emulated
|
||||
atomic_long_add_unless(), atomic_inc_unless_negative(),
|
||||
and atomic_dec_unless_positive(). These can be emulated
|
||||
in litmus tests, for example, by using atomic_cmpxchg().
|
||||
|
||||
One exception of this limitation is atomic_add_unless(),
|
||||
which is provided directly by herd7 (so no corresponding
|
||||
definition in linux-kernel.def). atomic_add_unless() is
|
||||
modeled by herd7 therefore it can be used in litmus tests.
|
||||
|
||||
c. The call_rcu() function is not modeled. It can be
|
||||
emulated in litmus tests by adding another process that
|
||||
invokes synchronize_rcu() and the body of the callback
|