linux/arch/s390/include/asm/barrier.h

/*
 * Copyright IBM Corp. 1999, 2009
 *
 * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
 */

#ifndef __ASM_BARRIER_H
#define __ASM_BARRIER_H

/*
 * Force strict CPU ordering.
 * And yes, this is required on UP too when we're talking
 * to devices.
 */

#ifdef CONFIG_HAVE_MARCH_Z196_FEATURES
/* Fast-BCR without checkpoint synchronization */
#define __ASM_BARRIER "bcr 14,0\n"
#else
#define __ASM_BARRIER "bcr 15,0\n"
#endif

#define mb() do {  asm volatile(__ASM_BARRIER : : : "memory"); } while (0)

#define rmb()				mb()
#define wmb()				mb()
#define dma_rmb()			rmb()
#define dma_wmb()			wmb()
#define smp_mb()			mb()
#define smp_rmb()			rmb()
#define smp_wmb()			wmb()

#define read_barrier_depends()		do { } while (0)
#define smp_read_barrier_depends()	do { } while (0)

#define smp_mb__before_atomic()		smp_mb()
#define smp_mb__after_atomic()		smp_mb()

#define set_mb(var, value)		do { WRITE_ONCE(var, value); mb(); } while (0)

#define smp_store_release(p, v)						\
do {									\
	compiletime_assert_atomic_type(*p);				\
	barrier();							\
	ACCESS_ONCE(*p) = (v);						\
} while (0)

#define smp_load_acquire(p)						\
({									\
	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
	compiletime_assert_atomic_type(*p);				\
	barrier();							\
	___p1;								\
})

#endif /* __ASM_BARRIER_H */
Disintegrate asm/system.h for S390 Disintegrate asm/system.h for S390. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-s390@vger.kernel.org 2012-03-29 01:30:02 +08:00			`/*`
			`* Copyright IBM Corp. 1999, 2009`
			`*`
			`* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>`
			`*/`

			`#ifndef __ASM_BARRIER_H`
			`#define __ASM_BARRIER_H`

			`/*`
			`* Force strict CPU ordering.`
			`* And yes, this is required on UP too when we're talking`
			`* to devices.`
			`*/`

s390/barrier: make use of fast-bcr facility If the kernel gets compiled for at least z196, make use of the fast-BCR facility. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2012-05-14 18:41:54 +08:00			`#ifdef CONFIG_HAVE_MARCH_Z196_FEATURES`
s390/barrier: convert mb() to define again Some of the now available common code drivers only compile if mb() is a define. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2013-01-30 20:56:14 +08:00			`/* Fast-BCR without checkpoint synchronization */`
s390/spinlock: optimize spin_unlock code Use a memory barrier + store sequence instead of a load + compare and swap sequence to unlock a spinlock and an rw lock. For the spinlock case this saves us two memory reads and a not needed cpu serialization after the compare and swap instruction stored the new value. The kernel size (performance_defconfig) gets reduced by ~14k. Average execution time of a tight inlined spin_unlock loop drops from 5.8ns to 0.7ns on a zEC12 machine. An artificial stress test case where several counters are protected with a single spinlock and which are only incremented while holding the spinlock shows ~30% improvement on a 4 cpu machine. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2014-09-08 14:20:43 +08:00			`#define __ASM_BARRIER "bcr 14,0\n"`
s390/barrier: make use of fast-bcr facility If the kernel gets compiled for at least z196, make use of the fast-BCR facility. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2012-05-14 18:41:54 +08:00			`#else`
s390/spinlock: optimize spin_unlock code Use a memory barrier + store sequence instead of a load + compare and swap sequence to unlock a spinlock and an rw lock. For the spinlock case this saves us two memory reads and a not needed cpu serialization after the compare and swap instruction stored the new value. The kernel size (performance_defconfig) gets reduced by ~14k. Average execution time of a tight inlined spin_unlock loop drops from 5.8ns to 0.7ns on a zEC12 machine. An artificial stress test case where several counters are protected with a single spinlock and which are only incremented while holding the spinlock shows ~30% improvement on a 4 cpu machine. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2014-09-08 14:20:43 +08:00			`#define __ASM_BARRIER "bcr 15,0\n"`
s390/barrier: make use of fast-bcr facility If the kernel gets compiled for at least z196, make use of the fast-BCR facility. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2012-05-14 18:41:54 +08:00			`#endif`
s390/barrier: cleanup barrier functions s390 really has no eieio instruction, so get rid of the implied ppc semantics and in addition change mb() into a function. Also remove SYNC_OTHER_CORES() since it is unused. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2012-05-14 18:40:43 +08:00
s390/spinlock: optimize spin_unlock code Use a memory barrier + store sequence instead of a load + compare and swap sequence to unlock a spinlock and an rw lock. For the spinlock case this saves us two memory reads and a not needed cpu serialization after the compare and swap instruction stored the new value. The kernel size (performance_defconfig) gets reduced by ~14k. Average execution time of a tight inlined spin_unlock loop drops from 5.8ns to 0.7ns on a zEC12 machine. An artificial stress test case where several counters are protected with a single spinlock and which are only incremented while holding the spinlock shows ~30% improvement on a 4 cpu machine. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2014-09-08 14:20:43 +08:00			`#define mb() do { asm volatile(__ASM_BARRIER : : : "memory"); } while (0)`

s390/barrier: cleanup barrier functions s390 really has no eieio instruction, so get rid of the implied ppc semantics and in addition change mb() into a function. Also remove SYNC_OTHER_CORES() since it is unused. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2012-05-14 18:40:43 +08:00			`#define rmb() mb()`
			`#define wmb() mb()`
arch: Add lightweight memory barriers dma_rmb() and dma_wmb() There are a number of situations where the mandatory barriers rmb() and wmb() are used to order memory/memory operations in the device drivers and those barriers are much heavier than they actually need to be. For example in the case of PowerPC wmb() calls the heavy-weight sync instruction when for coherent memory operations all that is really needed is an lsync or eieio instruction. This commit adds a coherent only version of the mandatory memory barriers rmb() and wmb(). In most cases this should result in the barrier being the same as the SMP barriers for the SMP case, however in some cases we use a barrier that is somewhere in between rmb() and smp_rmb(). For example on ARM the rmb barriers break down as follows: Barrier Call Explanation --------- -------- ---------------------------------- rmb() dsb() Data synchronization barrier - system dma_rmb() dmb(osh) data memory barrier - outer sharable smp_rmb() dmb(ish) data memory barrier - inner sharable These new barriers are not as safe as the standard rmb() and wmb(). Specifically they do not guarantee ordering between coherent and incoherent memories. The primary use case for these would be to enforce ordering of reads and writes when accessing coherent memory that is shared between the CPU and a device. It may also be noted that there is no dma_mb(). Most architectures don't provide a good mechanism for performing a coherent only full barrier without resorting to the same mechanism used in mb(). As such there isn't much to be gained in trying to define such a function. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 2014-12-12 07:02:06 +08:00			`#define dma_rmb() rmb()`
			`#define dma_wmb() wmb()`
s390/barrier: cleanup barrier functions s390 really has no eieio instruction, so get rid of the implied ppc semantics and in addition change mb() into a function. Also remove SYNC_OTHER_CORES() since it is unused. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> 2012-05-14 18:40:43 +08:00			`#define smp_mb() mb()`
			`#define smp_rmb() rmb()`
			`#define smp_wmb() wmb()`
arch: Cleanup read_barrier_depends() and comments This patch is meant to cleanup the handling of read_barrier_depends and smp_read_barrier_depends. In multiple spots in the kernel headers read_barrier_depends is defined as "do {} while (0)", however we then go into the SMP vs non-SMP sections and have the SMP version reference read_barrier_depends, and the non-SMP define it as yet another empty do/while. With this commit I went through and cleaned out the duplicate definitions and reduced the number of definitions down to 2 per header. In addition I moved the 50 line comments for the macro from the x86 and mips headers that defined it as an empty do/while to those that were actually defining the macro, alpha and blackfin. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 2014-12-12 07:01:55 +08:00
			`#define read_barrier_depends() do { } while (0)`
			`#define smp_read_barrier_depends() do { } while (0)`
arch,s390: Convert smp_mb__*() As per the existing implementation; implement the new one using smp_mb(). AFAICT the s390 compare-and-swap does imply a barrier, however there are some immediate ops that seem to be singly-copy atomic and do not imply a barrier. One such is the "ni" op (which would be and-immediate) which is used for the constant clear_bit implementation. Therefore s390 needs full barriers for the {before,after} atomic ops. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-kme5dz5hcobpnufnnkh1ech2@git.kernel.org Cc: Chen Gang <gang.chen@asianux.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux390@de.ibm.com Cc: linux-kernel@vger.kernel.org Cc: linux-s390@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> 2014-03-14 02:00:35 +08:00
			`#define smp_mb__before_atomic() smp_mb()`
			`#define smp_mb__after_atomic() smp_mb()`
Disintegrate asm/system.h for S390 Disintegrate asm/system.h for S390. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-s390@vger.kernel.org 2012-03-29 01:30:02 +08:00
locking/arch: Add WRITE_ONCE() to set_mb() Since we assume set_mb() to result in a single store followed by a full memory barrier, employ WRITE_ONCE(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> 2015-05-12 16:52:27 +08:00			`#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)`
Disintegrate asm/system.h for S390 Disintegrate asm/system.h for S390. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-s390@vger.kernel.org 2012-03-29 01:30:02 +08:00
arch: Introduce smp_load_acquire(), smp_store_release() A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specifed store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> 2013-11-06 21:57:36 +08:00			`#define smp_store_release(p, v) \`
			`do { \`
			`compiletime_assert_atomic_type(*p); \`
			`barrier(); \`
			`ACCESS_ONCE(*p) = (v); \`
			`} while (0)`

			`#define smp_load_acquire(p) \`
			`({ \`
			`typeof(p) ___p1 = ACCESS_ONCE(p); \`
			`compiletime_assert_atomic_type(*p); \`
			`barrier(); \`
			`___p1; \`
			`})`

Disintegrate asm/system.h for S390 Disintegrate asm/system.h for S390. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-s390@vger.kernel.org 2012-03-29 01:30:02 +08:00			`#endif /* __ASM_BARRIER_H */`