Commit Graph

13 Commits

Author SHA1 Message Date
Peter Zijlstra 726328d92a locking/spinlock, arch: Update and fix spin_unlock_wait() implementations
This patch updates/fixes all spin_unlock_wait() implementations.

The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.

This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.

I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.

Furthermore, again for ticket locks, I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value,
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.

I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
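
For the ticket-lock variants, a hedged sketch of "wait only on the
current holder" together with the smp_rmb() in question (the layout and
names are made up for illustration):

	struct example_ticket_lock { u16 owner, next; };	/* owner == next: unlocked */

	static inline void example_ticket_unlock_wait(struct example_ticket_lock *lock)
	{
		u16 owner = READ_ONCE(lock->owner);	/* holder we see at entry */

		smp_rmb();	/* order the snapshot above against the loop loads below */

		for (;;) {
			u16 cur_owner = READ_ONCE(lock->owner);
			u16 cur_next  = READ_ONCE(lock->next);

			/* done once the lock is free or that holder has moved on */
			if (cur_owner == cur_next || cur_owner != owner)
				break;
			cpu_relax();
		}
		/* a real version also needs the acquire ordering discussed above */
	}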

Note: PPC32 will be fixed independently

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14 11:55:15 +02:00
Vineet Gupta ed6aefed72 Revert "ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff"
This reverts commit e78fdfef84.

The issue was fixed in hardware in the HS2.1C release and there are no
known external users of the affected RTL, so revert the whole delayed
retry series!

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-06-02 10:59:23 +05:30
Vineet Gupta 819f3602dc Revert "ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle"
This reverts commit b89aa12c17.

The issue was fixed in hardware in the HS2.1C release and there are no
known external users of the affected RTL, so revert the whole delayed
retry series!

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-06-02 10:59:23 +05:30
Vineet Gupta 42316a201a Revert "ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff"
This reverts commit 1097163870.

The issue was fixed in hardware in the HS2.1C release and there are no
known external users of the affected RTL, so revert the whole delayed
retry series!

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-06-02 10:59:22 +05:30
Noam Camus 2a1021fce8 ARC: rwlock: disable interrupts in !LLSC variant
If we hold a rwlock and an interrupt occurs, we may end up spinning on
it forever in softirq context. Note that this lock is an internal lock,
and since it is free to be used from any context, it needs to be
IRQ-safe.

Below is an example of an interrupt that arrived while nl_table_lock's
internal rw->lock_mutex was held, and we spun on it forever.

The concept for the fix was taken from SPARC.
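
A hedged C sketch of the concept (example_rwlock_t and the counter
convention are illustrative): the internal lock_mutex is only ever taken
with interrupts disabled, so an IRQ on the same CPU can never spin on a
lock that its own CPU already holds.

	static inline int example_read_trylock(example_rwlock_t *rw)
	{
		unsigned long flags;
		int got_it = 0;

		local_irq_save(flags);			/* block local interrupts first */
		arch_spin_lock(&rw->lock_mutex);	/* EX-based protecting spin lock */

		if (rw->counter > 0) {			/* no writer: take a reader slot */
			rw->counter--;
			got_it = 1;
		}

		arch_spin_unlock(&rw->lock_mutex);
		local_irq_restore(flags);

		return got_it;
	}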

[2015-05-12 19:16:12] Stack Trace:
[2015-05-12 19:16:12]   arc_unwind_core+0xb8/0x11c
[2015-05-12 19:16:12]   dump_stack+0x68/0xac
[2015-05-12 19:16:12]   _raw_read_lock+0xa8/0xac
[2015-05-12 19:16:12]   netlink_broadcast_filtered+0x56/0x35c
[2015-05-12 19:16:12]   nlmsg_notify+0x42/0xa4
[2015-05-12 19:16:13]   neigh_update+0x1fe/0x44c
[2015-05-12 19:16:13]   neigh_event_ns+0x40/0xa4
[2015-05-12 19:16:13]   arp_process+0x46e/0x5a8
[2015-05-12 19:16:13]   __netif_receive_skb_core+0x358/0x500
[2015-05-12 19:16:13]   process_backlog+0x92/0x154
[2015-05-12 19:16:13]   net_rx_action+0xb8/0x188
[2015-05-12 19:16:13]   __do_softirq+0xda/0x1d8
[2015-05-12 19:16:14]   irq_exit+0x8a/0x8c
[2015-05-12 19:16:14]   arch_do_IRQ+0x6c/0xa8
[2015-05-12 19:16:14]   handle_interrupt_level1+0xe4/0xf0

Signed-off-by: Noam Camus <noamc@ezchip.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
2016-05-09 09:32:32 +05:30
Vineet Gupta 1097163870 ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff
Updating the delay counter previously took 2 instructions:
Arithmetic Shift Left (ASL) + set to 1 on overflow

This can be done in 1 instruction using Rotate Left (ROL).
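
Expressed in C for illustration (the real change is in the ARC assembly
backoff code), the two approaches look roughly like this:

	unsigned int delay;	/* current backoff count, 32-bit */

	/* before: double the delay, and restart at 1 if the shift overflowed */
	delay <<= 1;
	if (!delay)
		delay = 1;

	/* after: a rotate-left wraps the top bit back into bit 0, so an
	 * overflow restarts the sequence by itself - one operation, not two */
	delay = (delay << 1) | (delay >> 31);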

Suggested-by: Nigel Topham <ntopham@synopsys.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-07 13:56:16 +05:30
Vineet Gupta b89aa12c17 ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle
The previous commit for delayed retry of SCOND needs some fine tuning
for spin locks.

The backoff from delayed retry, in conjunction with the spin looping on
the lock itself, can potentially cause the delay counter to reach high
values. So, to provide fairness to any lock operation, once a lock
"seems" available (i.e. just before the first SCOND try), reset the
delay counter back to its starting value of 1.

Essentially, reset the delay to 1 for each new spin-wait-loop-acquire
cycle.
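
In terms of the C-level backoff sketch under the delayed-retry commit
further down this list, the refinement is roughly (illustrative names,
not the actual inline asm):

	while (READ_ONCE(lock->slock) != EXAMPLE_UNLOCKED)
		cpu_relax();		/* wait until the lock at least looks free */

	delay = 1;			/* fresh spin-wait cycle: forget earlier backoff */
	/* ... LLOCK/SCOND attempt follows, doubling 'delay' on each SCOND failure ... */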

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:35 +05:30
Vineet Gupta e78fdfef84 ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff
This is to work around the llock/scond livelock.

HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping
coherency transactions in the SCU. The exclusive line state keeps rotating
among the contending cores, leading to a never-ending cycle. So break the
cycle by deferring the retry of a failed exclusive access (SCOND). The actual
delay needed is a function of the number of contending cores as well as the
unrelated coherency traffic from other cores. To keep the code simple, start
off with a small delay of 1, which suffices in most cases, and double the
delay on each contended retry. Eventually the delay is large enough for the
coherency pipeline to drain, so a subsequent exclusive access will succeed.
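
A C-level sketch of the scheme (the real code is LLOCK/SCOND inline asm;
example_llsc_try_set() is a made-up stand-in for one LLOCK/SCOND
attempt):

	unsigned int delay = 1, i;	/* small initial delay suffices when uncontended */

	for (;;) {
		if (example_llsc_try_set(&lock->slock))	/* SCOND succeeded */
			break;

		for (i = 0; i < delay; i++)
			cpu_relax();	/* defer the retry so the coherency pipeline drains */

		delay *= 2;		/* exponential backoff under contention */
	}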

Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.com
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:34 +05:30
Vineet Gupta 69cbe630f5 ARC: LLOCK/SCOND based rwlock
With LLOCK/SCOND, the rwlock counter can be atomically updated without
the need for a guarding spin lock.

This in turn elides the EXchange-instruction-based spinning, which forces
the cacheline into exclusive state; concurrent spinning across cores
would cause the line to keep bouncing around. The LLOCK/SCOND based
implementation is superior, as spinning on LLOCK keeps the cacheline in
shared state.
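
A hedged C sketch of the reader fast path, using an atomic cmpxchg loop
to stand in for the LLOCK/SCOND sequence (the counter convention shown
is illustrative: positive means reader slots are available):

	static inline int example_read_trylock_llsc(atomic_t *counter)
	{
		int old = atomic_read(counter);

		while (old > 0) {
			int prev = atomic_cmpxchg(counter, old, old - 1);

			if (prev == old)
				return 1;	/* took a reader slot atomically */
			old = prev;		/* lost a race; retry with the fresh value */
		}
		return 0;			/* writer present: caller keeps spinning */
	}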

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:33 +05:30
Vineet Gupta ae7eae9e03 ARC: LLOCK/SCOND based spin_lock
The current spin_lock uses the EXchange instruction to implement the
atomic test-and-set of the lock location (it reads the original value
and stores 1). This however forces the cacheline into exclusive state
(because of the ST), and concurrent loops in multiple cores will bounce
the line around between cores.

Instead, use LLOCK/SCOND to implement the atomic test-and-set, which is
better as the line stays in shared state while the lock is spinning on
LLOCK.
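
Roughly, in C terms (the real lock is inline asm; example_scond_store()
is a made-up stand-in for the store-conditional):

	static inline void example_llsc_spin_lock(unsigned int *lock_word)
	{
		for (;;) {
			/* the LLOCK-side spin is an ordinary read, so the line stays shared */
			while (READ_ONCE(*lock_word))
				cpu_relax();

			/* only then attempt the exclusive store; loop again if it fails */
			if (example_scond_store(lock_word, 1))
				break;
		}
	}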

The real motivation of this change, however, is to make way for future
changes in atomics to implement delayed retry (with backoff). An initial
experiment with delayed retry in atomics combined with the original
EX-based spinlock was a total disaster (it broke even LMBench), as
struct sock has an atomic_t and a spinlock sharing a cache line. The
tight spinning on the lock caused the atomic retry to keep backing off
such that it would never finish.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-04 09:26:33 +05:30
Vineet Gupta 2576c28e3f ARC: add smp barriers around atomics per Documentation/atomic_ops.txt
 - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers.
   Since ARCv2 only provides load/load, store/store and all/all, we need
   the full barrier (see the sketch after this list).

 - LLOCK/SCOND based atomics, bitops and cmpxchg, which return the
   modified values, were lacking the explicit smp barriers.

 - Non LLOCK/SCOND variants don't need the explicit barriers, since those
   are implicitly provided by the spin locks used to implement the
   critical section (the spin lock barriers are in turn also fixed in
   this commit, as explained above).
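
For the spin lock part, the placement looks roughly like this (hedged
sketch; since only full barriers are available, smp_mb() stands in for
the ACQUIRE/RELEASE semantics):

	static inline void example_arch_spin_lock(unsigned int *lock_word)
	{
		/* ... LLOCK/SCOND acquisition loop as in the commits below ... */

		smp_mb();	/* ACQUIRE: keep the critical section after the lock */
	}

	static inline void example_arch_spin_unlock(unsigned int *lock_word)
	{
		smp_mb();	/* RELEASE: complete the critical section before freeing */

		WRITE_ONCE(*lock_word, 0);	/* 0 = unlocked in this sketch */
	}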

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-25 06:00:16 +05:30
Vineet Gupta 6c00350b57 ARC: Workaround spinlock livelock in SMP SystemC simulation
Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.

So rwlocks need to use a protecting spinlock for the atomic cmp-n-exchange
operation that updates the reader(s)/writer count.

The spinlock operation itself looks as follows:

	mov reg, 1		; 1=locked, 0=unlocked
retry:
	EX reg, [lock]		; load existing, store 1, atomically
	BREQ reg, 1, retry	; if already locked, retry

In the single-threaded simulation, SystemC alternates between the 2 cores,
scheduling "N" insns on each in turn. Additionally, for an insn with a
global side effect, such as EX writing to shared mem, a core switch is
enforced too.

Given that, with 2 cores doing repeated EX on the same location, Linux
often got into a livelock, e.g. when both cores were fiddling with the
tasklist lock (gdbserver / hackbench) for read/write respectively, as
the sequence diagram below shows:

           core1                                   core2
         --------                                --------
1. spin lock [EX r=0, w=1] - LOCKED
2. rwlock(Read)            - LOCKED
3. spin unlock  [ST 0]     - UNLOCKED
4.                                       spin lock [EX r=0,w=1] - LOCKED
                      -- resched core 1----

5. spin lock [EX r=1] - ALREADY-LOCKED

                      -- resched core 2----
6.                                       rwlock(Write) - READER-LOCKED
7.                                       spin unlock [ST 0]
8.                                       rwlock failed, retry again

9.                                       spin lock  [EX r=0, w=1]
                      -- resched core 1----

10. spinlock locked in #9, retry #5
11. spin lock [EX gets 1]
                      -- resched core 2----
...
...

The fix was to unlock using the EX insn too (step 7), to trigger another
SystemC scheduling pass which would let core1 proceed, eliding the
livelock.
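
A hedged sketch of the fixed unlock (example_atomic_ex() is a made-up
wrapper around the EX instruction): releasing with an EX rather than a
plain store makes the access globally visible to the model, forcing the
SystemC core switch that lets the other core make progress.

	static inline void example_sim_spin_unlock(unsigned int *lock_word)
	{
		unsigned int val = 0;	/* 0 = unlocked, as in the snippet above */

		smp_mb();		/* release ordering: full barrier, as elsewhere on ARC */

		/* atomically swap 0 into the lock word instead of a plain ST 0;
		 * the returned old value is not needed here */
		example_atomic_ex(lock_word, val);
	}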

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-09-27 16:28:48 +05:30
Vineet Gupta 6e35fa2d43 ARC: Spinlock/rwlock/mutex primitives
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2013-02-11 20:00:35 +05:30