include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via
asm-generic/qspinlock.h. However, this does not work because architectures
might be using queued rwlocks but not queued spinlocks (csky), or because they
might be defining their own queued_* macros before including asm/qspinlock.h.
To fix this, ensure that asm/spinlock.h always includes qrwlock.h after
defining arch_spin_is_locked (either directly for csky, or via
asm/qspinlock.h for other architectures). The only inclusion elsewhere
is in kernel/locking/qrwlock.c. That one is really unnecessary because
the file is only compiled in SMP configurations (config QUEUED_RWLOCKS
depends on SMP) and in that case linux/spinlock.h already includes
asm/qrwlock.h if needed, via asm/spinlock.h.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Waiman Long <longman@redhat.com>
Fixes: 26128cb6c7 ("locking/rwlocks: Add contention detection for rwlocks")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ben Gardon <bgardon@google.com>
[Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mmiowb() macro is horribly difficult to use and drivers will continue
to work most of the time if they omit a call when it is required.
Rather than rely on driver authors getting this right, push mmiowb() into
arch_spin_unlock() for mips. If this is deemed to be a performance issue,
a subsequent optimisation could make use of ARCH_HAS_MMIOWB to elide
the barrier in cases where no I/O writes were performed inside the
critical section.
Acked-by: Paul Burton <paul.burton@mips.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
non-flags versions by the majority of architectures, so do this in core
code and remove the dummy implementations. Also remove the implementation
in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
local_irq_save(flags) anyway.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch_{read,spin,write}_relax() are defined as cpu_relax() by the core
code, so architectures that can't do better (i.e. most of them) don't
need to bother with the dummy definitions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch switches MIPS to make use of generically implemented queued
spinlocks, rather than the ticket spinlocks used previously. This allows
us to drop a whole load of inline assembly, share more generic code, and
is also a performance win.
Results from running the AIM7 short workload on a MIPS Creator Ci40 (ie.
2 core 2 thread interAptiv CPU clocked at 546MHz) with v4.12-rc4
pistachio_defconfig, with ftrace disabled due to a current bug, and both
with & without use of queued rwlocks & spinlocks:
Forks | v4.12-rc4 | +qlocks | Change
-------|-----------|----------|--------
10 | 52630.32 | 53316.31 | +1.01%
20 | 51777.80 | 52623.15 | +1.02%
30 | 51645.92 | 52517.26 | +1.02%
40 | 51634.88 | 52419.89 | +1.02%
50 | 51506.75 | 52307.81 | +1.02%
60 | 51500.74 | 52322.72 | +1.02%
70 | 51434.81 | 52288.60 | +1.02%
80 | 51423.22 | 52434.85 | +1.02%
90 | 51428.65 | 52410.10 | +1.02%
The kernels used for these tests also had my "MIPS: Hardcode cpu_has_*
where known at compile time due to ISA" patch applied, which allows the
kernel_uses_llsc checks in cmpxchg() & xchg() to be optimised away at
compile time.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16358/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch switches MIPS to make use of generically implemented queued
read/write locks, rather than the custom implementation used previously.
This allows us to drop a whole load of inline assembly, share more
generic code, and is also a performance win.
Results from running the AIM7 short workload on a MIPS Creator Ci40 (ie.
2 core 2 thread interAptiv CPU clocked at 546MHz) with v4.12-rc4
pistachio_defconfig, with ftrace disabled due to a current bug, and both
with & without use of queued rwlocks & spinlocks:
Forks | v4.12-rc4 | +qlocks | Change
-------|-----------|----------|--------
10 | 52630.32 | 53316.31 | +1.01%
20 | 51777.80 | 52623.15 | +1.02%
30 | 51645.92 | 52517.26 | +1.02%
40 | 51634.88 | 52419.89 | +1.02%
50 | 51506.75 | 52307.81 | +1.02%
60 | 51500.74 | 52322.72 | +1.02%
70 | 51434.81 | 52288.60 | +1.02%
80 | 51423.22 | 52434.85 | +1.02%
90 | 51428.65 | 52410.10 | +1.02%
The kernels used for these tests also had my "MIPS: Hardcode cpu_has_*
where known at compile time due to ISA" patch applied, which allows the
kernel_uses_llsc checks in cmpxchg() & xchg() to be optimised away at
compile time.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16357/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When building for microMIPS we need to ensure that the assembler always
knows that there is code at the target of a branch or jump. Recent
toolchains will fail to link a microMIPS kernel when this isn't the case
due to what it thinks is a branch to non-microMIPS code.
mips-mti-linux-gnu-ld kernel/built-in.o: .spinlock.text+0x2fc: Unsupported branch between ISA modes.
mips-mti-linux-gnu-ld final link failed: Bad value
This is due to inline assembly labels in spinlock.h not being followed
by an instruction mnemonic, either due to a .subsection pseudo-op or the
end of the inline asm block.
Fix this with a .insn direction after such labels.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Maciej W. Rozycki <macro@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/15325/
Signed-off-by: James Hogan <james.hogan@imgtec.com>
This patch updates/fixes all spin_unlock_wait() implementations.
The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.
This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.
I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.
Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.
I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
Note: PPC32 will be fixed independently
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On MIPS64 we have spinlocks that are 32b in size and an efficient
cmpxchg64 implementation, so we qualify to make use of cmpxchg backed
lockrefs. Select the ARCH_USE_CMPXCHG_LOCKREF Kconfig symbol and provide
a trivial implementation of arch_spin_value_unlocked to satisfy the
lockref code.
Using Linus' simple testcase from
http://article.gmane.org/gmane.linux.file-systems/77466 on a dual core
system with an in-development MIPS64 CPU running on FPGA I see around an
8% gain:
Pre-patch:
Total loops: 252698
Total loops: 251482
Total loops: 250806
Total loops: 252885
Total loops: 251666
Post-patch:
Total loops: 273728
Total loops: 269932
Total loops: 269341
Total loops: 275004
Total loops: 270208
[ralf@linux-mips.org: Fixed conflict.]
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: Maciej W. Rozycki <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Patchwork: https://patchwork.linux-mips.org/patch/10810/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit 5753762cbd1c("MIPS: asm: spinlock: Replace "sub" instruction
with "addiu") replaced the "sub" instruction with addiu but it did
not update the immediate value in the R10000_LLSC_WAR case.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Fixes: 5753762cbd1c("MIPS: asm: spinlock: Replace "sub" instruction with "addiu"")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9385/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
"sub $reg, imm" is not a real MIPS instruction. The assembler can
replace that with "addi $reg, -imm". However, addi has been removed
from R6, so we replace the "sub" instruction with the "addiu" one.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
The GCC_OFF12_ASM macro is used for 12-bit immediate constrains
but we will also use it for 9-bit constrains on MIPS R6 so we
rename it to something more appropriate.
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
In the microMIPS encoding some memory access instructions have their
immediate offset reduced to 12 bits only. That does not match the GCC
`R' constraint we use in some places to satisfy the requirement,
resulting in build failures like this:
{standard input}: Assembler messages:
{standard input}:720: Error: macro used $at after ".set noat"
{standard input}:720: Warning: macro instruction expanded into multiple instructions
Fix the problem by defining a macro, `GCC_OFF12_ASM', that expands to
the right constraint depending on whether microMIPS or standard MIPS
code is produced. Also apply the fix to where `m' is used as in the
worst case this change does nothing, e.g. where the pointer was already
in a register such as a function argument and no further offset was
requested, and in the best case it avoids an extraneous sequence of up
to two instructions to load the high 20 bits of the address in the LL/SC
loop. This reduces the risk of lock contention that is the higher the
more instructions there are in the critical section between LL and SC.
Strictly speaking we could just bulk-replace `R' with `ZC' as the latter
constraint adjusts automatically depending on the ISA selected.
However it was only introduced with GCC 4.9 and we keep supporing older
compilers for the standard MIPS configuration, hence the slightly more
complicated approach I chose.
The choice of a zero-argument function-like rather than an object-like
macro was made so that it does not look like a function call taking the
C expression used for the constraint as an argument. This is so as not
to confuse the reader or formatting checkers like `checkpatch.pl' and
follows previous practice.
Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com>
Signed-off-by: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8482/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
We were doing:
SRL $r,$?,16
ANDI $r,$r,0xffff
The logical right shift by 16 leaves the upper 16 bits clear, so the
subsequent masking out of those bits is redundant, and can safely be
removed.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
7837314d14 [MIPS: Get rid of branches to
.subsections] removed most uses of .subsection] removed most uses of
.subsection in inline assembler code.
It left the instances in spinlock.h alone because we knew their use was
in fairly small files where .subsection use was fine but of course this
was a fragile assumption. LTO breaks this assumption resulting in build
errors due to exceeded branch range, so remove further instances of
.subsection.
The two functions that still use .macro don't currently cause issues
however this use is still fragile.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Having received another series of whitespace patches I decided to do this
once and for all rather than dealing with this kind of patches trickling
in forever.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The current locking mechanism uses a ll/sc sequence to release a
spinlock. This is slower than a wmb() followed by a store to unlock.
The branching forward to .subsection 2 on sc failure slows down the
contended case. So we get rid of that part too.
Since we are now working on naturally aligned u16 values, we can get
rid of a masking operation as the LHU already does the right thing.
The ANDI are reversed for better scheduling on multi-issue CPUs
On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet
forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/937/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Replace some instances of smp_llsc_mb() with a new macro
smp_mb__before_llsc(). It is used before ll/sc sequences that are
documented as needing write barrier semantics.
The default implementation of smp_mb__before_llsc() is just smp_llsc_mb(),
so there are no changes in semantics.
Also simplify definition of smp_mb(), smp_rmb(), and smp_wmb() to be just
barrier() in the non-SMP case.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/851/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Name space cleanup for rwlock functions. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Not strictly necessary for -rt as -rt does not have non sleeping
rwlocks, but it's odd to not have a consistent naming convention.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Pass the original flags to rwlock arch-code, so that it can re-enable
interrupts if implemented for that architecture.
Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs
which just do the same thing as non-flags variants.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the lock is not acquired and has to spin *and* the second attempt
to acquire the lock fails, the delay time is not masked by the ticket
range mask. If the ticket number wraps around to zero, the result is
that the lock sampling delay is essentially infinite (due to casting
-1 to an unsigned int).
The fix: Always mask the difference between my_ticket and the current
ticket value before calculating the delay.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Architectures other than mips and x86 are not using ticket spinlocks.
Therefore, the contention on the lock is meaningless, since there is
nobody known to be waiting on it (arguably /fairly/ unfair locks).
Dummy it out to return 0 on other architectures.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>