linux_old1/arch/arm64/include/asm
Will Deacon 8e86f0b409 arm64: atomics: fix use of acquire + release for full barrier semantics
Linux requires a number of atomic operations to provide full barrier
semantics; that is, no memory access after the operation can be
observed before any access up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

	// A, B, C are independent memory locations

	<Access [A]>

	// atomic_op (B)
1:	ldaxr	x0, [B]		// Exclusive load with acquire
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b

	<Access [C]>

The assumption here is that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture.
In fact, the accesses to A and C are each permitted to pass their
nearest half barrier, resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

	<Access [A]>

	// atomic_op (B)
	dmb	ish		// Full barrier
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stxr	w1, x0, [B]	// Exclusive store
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

	<Access [A]>

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.
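
As a rough illustration of the orderings discussed so far, the C sketch
below enumerates every observation order of the four events A, Bl, Bs
and C and keeps those allowed by a deliberately simplified set of
pairwise constraints. The constraint tables, the names (enumerate,
half_barriers, release_then_dmb) and the model itself are assumptions
made for illustration only. It ignores the exclusive monitor and is not
the architecture's memory model.

```c
#include <assert.h>

/* Events of the sequence, listed in program order. */
enum { EV_A, EV_BL, EV_BS, EV_C, NEV };

/* "before" must be observed ahead of "after". */
struct edge { int before, after; };

struct result {
	int permitted;	/* orderings satisfying every constraint */
	int c_before_a;	/* ...of which C is observed before A */
	int a_after_bl;	/* ...of which A is observed after Bl */
};

/* Assumed model of ldaxr/stlxr: acquire on Bl, release on Bs. */
const struct edge half_barriers[3] = {
	{ EV_BL, EV_BS },	/* acquire: later store stays after Bl */
	{ EV_BL, EV_C },	/* acquire: later access stays after Bl */
	{ EV_A,  EV_BS },	/* release: prior access stays before Bs */
};

/* Assumed model of ldxr/stlxr followed by a single dmb ish. */
const struct edge release_then_dmb[5] = {
	{ EV_BL, EV_BS },	/* release: the load of B stays before Bs */
	{ EV_A,  EV_BS },	/* release: prior access stays before Bs */
	{ EV_A,  EV_C },	/* dmb: everything before it precedes C */
	{ EV_BL, EV_C },
	{ EV_BS, EV_C },
};

/* Try all 4^4 slot assignments and keep the valid permutations. */
struct result enumerate(const struct edge *edges, int nedges)
{
	struct result r = { 0, 0, 0 };
	int pos[NEV];	/* pos[e] = slot in which event e is observed */

	for (int code = 0; code < 256; code++) {
		int used = 0, ok = 1;

		for (int e = 0; e < NEV; e++) {
			pos[e] = (code >> (2 * e)) & 3;
			used |= 1 << pos[e];
		}
		if (used != 0xf)
			continue;	/* two events share a slot */

		for (int i = 0; i < nedges; i++) {
			assert(edges[i].before < NEV && edges[i].after < NEV);
			if (pos[edges[i].before] > pos[edges[i].after])
				ok = 0;
		}
		if (!ok)
			continue;

		r.permitted++;
		if (pos[EV_C] < pos[EV_A])
			r.c_before_a++;
		if (pos[EV_BL] < pos[EV_A])
			r.a_after_bl++;
	}
	return r;
}
```

Under these assumed constraints, enumerate(half_barriers, 3) admits
five orderings, one of which (Bl -> C -> A -> Bs) observes C before A.
enumerate(release_then_dmb, 5) admits two orderings, neither of which
observes C before A. One of those two still observes A after Bl, which
is the ldxr re-ordering noted above.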

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.
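
For readers more familiar with C11 atomics than with arm64 assembly,
the shape of the chosen sequence (a release-ordered update followed by
one trailing full fence) can be loosely sketched in userspace as below.
The helper name add_return_release_then_fence is hypothetical, this is
not the kernel code touched by the patch, and the C11 memory model is
not the Linux kernel's, so the sketch makes no claim to provide the
kernel's full barrier semantics. Which instructions a compiler emits
for it is also not guaranteed.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: add i to *v and return the new value, using a
 * release-ordered read-modify-write followed by a single full fence.
 */
int add_return_release_then_fence(atomic_int *v, int i)
{
	int old;

	assert(v);
	old = atomic_load_explicit(v, memory_order_relaxed);

	/*
	 * Retry loop, playing the role of the ldxr/stlxr/cbnz loop.
	 * On failure the compare-exchange reloads 'old' with the
	 * current value, so 'old + i' is recomputed on each attempt.
	 */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
						      memory_order_release,
						      memory_order_relaxed))
		;

	/* One trailing full fence, playing the role of the dmb ish. */
	atomic_thread_fence(memory_order_seq_cst);

	return old + i;
}
```

With an atomic_int initialised to 5, calling the helper with i = 3
returns 8 and leaves 8 in the variable.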

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07 16:45:43 +00:00
xen xen/arm64: do not call the swiotlb functions twice 2013-12-11 16:21:00 +00:00
Kbuild Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
arch_timer.h ARM64: arch_timer: add support to configure and enable event stream 2013-09-26 09:47:43 +01:00
asm-offsets.h
assembler.h arm64: asm: add CPU_LE & CPU_BE assembler helpers 2013-10-25 15:59:38 +01:00
atomic.h arm64: atomics: fix use of acquire + release for full barrier semantics 2014-02-07 16:45:43 +00:00
barrier.h arm64: barriers: allow dsb macro to take option parameter 2014-02-06 11:39:11 +00:00
bitops.h arm64: klib: Optimised atomic bitops 2013-03-21 17:39:31 +00:00
cache.h arm64: Cache maintenance routines 2012-09-17 13:42:00 +01:00
cacheflush.h arm64: add DSB after icache flush in __flush_icache_all() 2014-02-05 10:26:35 +00:00
cachetype.h arm64: Cache maintenance routines 2012-09-17 13:42:00 +01:00
cmpxchg.h arm64: atomics: fix use of acquire + release for full barrier semantics 2014-02-07 16:45:43 +00:00
compat.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
compiler.h arm64: Miscellaneous header files 2012-09-17 13:42:21 +01:00
cpu_ops.h arm64: kernel: cpu_{suspend/resume} implementation 2013-12-16 17:17:31 +00:00
cputable.h arm64: CPU support 2012-09-17 13:41:59 +01:00
cputype.h Merge tag 'arm64-suspend' of git://linux-arm.org/linux-2.6-lp into upstream 2013-12-19 17:57:51 +00:00
debug-monitors.h arm64: support single-step and breakpoint handler hooks 2013-12-19 17:43:11 +00:00
device.h arm64: device: add iommu pointer to device archdata 2013-06-11 18:15:55 +01:00
dma-contiguous.h arm64: fix build error if DMA_CMA is enabled 2014-01-27 12:00:25 +00:00
dma-mapping.h arm64/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain 2013-10-18 16:01:32 +00:00
elf.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
esr.h arm64: fix typo: s/SERRROR/SERROR/ 2014-02-05 10:42:32 +00:00
exception.h arm64: Use irqchip_init() for interrupt controller initialisation 2013-03-26 16:02:23 +00:00
exec.h arm64: Miscellaneous header files 2012-09-17 13:42:21 +01:00
fb.h arm64: Device specific operations 2012-09-17 13:42:04 +01:00
fpsimd.h arm64: elf: fix core dumping definitions for GP and FP registers 2012-11-08 16:06:20 +00:00
fpsimdmacros.h arm64: move FP-SIMD save/restore code to a macro 2012-12-05 11:26:50 +00:00
futex.h arm64: atomics: fix use of acquire + release for full barrier semantics 2014-02-07 16:45:43 +00:00
hardirq.h arm64: enable generic clockevent broadcast 2013-12-16 17:17:35 +00:00
hugetlb.h ARM64: mm: HugeTLB support. 2013-06-14 09:52:40 +01:00
hw_breakpoint.h arm64: Debugging support 2012-09-17 13:42:14 +01:00
hwcap.h ARM64: arch_timer: add support to configure and enable event stream 2013-09-26 09:47:43 +01:00
hypervisor.h arm64/xen: introduce asm/xen header files on arm64 2013-06-07 10:39:45 +00:00
insn.h arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions 2014-01-08 15:21:29 +00:00
io.h Revert "arm64: Fix memory shareability attribute for ioremap_wc/cache" 2014-01-16 18:32:25 +00:00
irq.h arm64: add CPU_HOTPLUG infrastructure 2013-10-25 11:33:21 +01:00
irqflags.h arm64: Unmask asynchronous aborts when in kernel mode 2013-11-25 16:44:05 +00:00
jump_label.h arm64, jump label: optimize jump label implementation 2014-01-08 15:23:53 +00:00
kvm_arm.h arm64: fix typo: s/SERRROR/SERROR/ 2014-02-05 10:42:32 +00:00
kvm_asm.h arm64: KVM: perform save/restore of PAR_EL1 2013-08-09 13:19:28 +01:00
kvm_coproc.h arm64: KVM: 32bit handling of coprocessor traps 2013-06-12 16:42:16 +01:00
kvm_emulate.h A handful of fixes for KVM/arm64: 2013-11-11 12:05:20 +01:00
kvm_host.h arm64: KVM: Add Kconfig option for max VCPUs per-Guest 2013-12-28 10:28:50 +00:00
kvm_mmio.h arm64: KVM: MMIO access backend 2013-06-07 14:03:38 +01:00
kvm_mmu.h arm/arm64: kvm: Use virt_to_idmap instead of virt_to_phys for idmap mappings 2013-12-11 09:49:31 -08:00
kvm_psci.h arm64: KVM: PSCI implementation 2013-06-12 16:40:32 +01:00
linkage.h arm64: fix alignment padding in assembly code 2012-10-20 11:12:01 +01:00
memblock.h arm64: MMU initialisation 2012-09-17 13:41:56 +01:00
memory.h arm64: Correct virt_addr_valid 2013-12-19 17:43:02 +00:00
mmu.h arm64: Add simple earlyprintk support 2013-01-22 17:51:01 +00:00
mmu_context.h arm64: mm: don't bother invalidating the icache in switch_mm 2013-06-07 18:00:11 +01:00
module.h arm64: Loadable modules 2012-09-17 13:42:19 +01:00
neon.h arm64: add support for kernel mode NEON 2013-08-20 12:12:26 +01:00
page.h arm64: MMU fault handling and page table management 2012-09-17 13:41:57 +01:00
percpu.h arm64: percpu: implement optimised pcpu access using tpidr_el1 2013-12-19 17:43:06 +00:00
perf_event.h arm64: perf: add guest vs host discrimination 2013-01-29 16:56:17 +00:00
pgalloc.h arm64: handle pgtable_page_ctor() fail 2013-11-15 09:32:16 +09:00
pgtable-2level-hwdef.h arm64: Use 42-bit address space with 64K pages 2013-11-05 17:23:52 +00:00
pgtable-2level-types.h ARM64: include: asm: include "asm/types.h" in "pgtable-2level-types.h" and "pgtable-3level-types.h" 2013-08-22 11:44:41 +01:00
pgtable-3level-hwdef.h arm64: MMU definitions 2012-09-17 13:41:56 +01:00
pgtable-3level-types.h ARM64: include: asm: include "asm/types.h" in "pgtable-2level-types.h" and "pgtable-3level-types.h" 2013-08-22 11:44:41 +01:00
pgtable-hwdef.h arm64: mm: Fix PMD_SECT_PROT_NONE definition 2013-12-06 17:22:44 +00:00
pgtable.h arm64: mm: Introduce PTE_WRITE 2014-01-31 11:30:49 +00:00
pmu.h arm64: Performance counters support 2012-09-17 13:42:17 +01:00
proc-fns.h arm64: kernel: suspend/resume registers save/restore 2013-12-16 17:17:31 +00:00
processor.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
psci.h arm64: unify smp_psci.c and psci.c 2013-10-25 11:33:19 +01:00
ptrace.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
shmparam.h arm64: ELF definitions 2012-09-17 13:42:07 +01:00
sigcontext.h UAPI: (Scripted) Disintegrate arch/arm64/include/asm 2012-10-11 11:05:13 +01:00
signal32.h arm64: 32-bit (compat) applications support 2012-09-17 13:42:12 +01:00
smp.h arm64: add CPU_HOTPLUG infrastructure 2013-10-25 11:33:21 +01:00
smp_plat.h arm64: kernel: build MPIDR_EL1 hash function data structure 2013-12-16 17:17:30 +00:00
sparsemem.h arm64: MMU definitions 2012-09-17 13:41:56 +01:00
spinlock.h arm64: lockref: add support for lockless lockrefs using cmpxchg 2013-10-24 15:46:34 +01:00
spinlock_types.h arm64: Fix the endianness of arch_spinlock_t 2013-10-25 16:10:22 +01:00
stacktrace.h arm64: Exception handling 2012-09-17 10:24:46 +01:00
stat.h UAPI: (Scripted) Disintegrate arch/arm64/include/asm 2012-10-11 11:05:13 +01:00
string.h arm64: klib: Optimised string functions 2013-03-21 17:39:30 +00:00
suspend.h arm64: kernel: cpu_{suspend/resume} implementation 2013-12-16 17:17:31 +00:00
sync_bitops.h arm64/xen: introduce asm/xen header files on arm64 2013-06-07 10:39:45 +00:00
syscall.h arm64: check for number of arguments in syscall_get/set_arguments() 2013-10-23 15:45:35 +01:00
syscalls.h arm64: switch to generic sigaltstack 2013-02-14 09:17:29 -05:00
system_misc.h arm64: use common reboot infrastructure 2013-07-19 15:57:08 +01:00
thread_info.h preempt: Make PREEMPT_ACTIVE generic 2013-11-13 20:21:47 +01:00
timex.h arm64: kernel: compiling issue, need delete read_current_timer() 2013-06-10 17:58:20 +01:00
tlb.h Fix TLB gather virtual address range invalidation corner cases 2013-08-16 08:52:46 -07:00
tlbflush.h ARM64: mm: THP support. 2013-06-14 09:52:41 +01:00
traps.h arm64: Exception handling 2012-09-17 10:24:46 +01:00
uaccess.h arm64: use generic strnlen_user and strncpy_from_user functions 2013-12-19 17:43:06 +00:00
ucontext.h arm64: fix padding computation in struct ucontext 2013-03-18 10:42:16 +00:00
unistd.h burying unused conditionals 2013-02-14 09:21:15 -05:00
unistd32.h arm64: compat: Wire up new AArch32 syscalls 2014-02-05 12:03:52 +00:00
vdso.h arm64: VDSO support 2012-09-17 13:42:09 +01:00
vdso_datapage.h arm64: VDSO support 2012-09-17 13:42:09 +01:00
virt.h arm64: head: create a new function for setting the boot_cpu_mode flag 2013-10-25 15:59:39 +01:00
word-at-a-time.h arm64: dcache: select DCACHE_WORD_ACCESS for little-endian CPUs 2013-12-19 17:43:08 +00:00