Commit Graph

7 Commits

Author SHA1 Message Date
Ard Biesheuvel f10dc56c64 crypto: arm64 - revert NEON yield for fast AEAD implementations
As it turns out, checking the TIF_NEED_RESCHED flag after each
iteration results in a significant performance regression (~10%)
when running fast algorithms (i.e., ones that use special instructions
and operate in the < 4 cycles per byte range) on in-order cores with
comparatively slow memory accesses such as the Cortex-A53.

Given the speed of these ciphers, and the fact that the page-based
nature of the AEAD scatterwalk API guarantees that the core NEON
transform is never invoked with more than a single page's worth of
input, we can estimate the worst-case duration of any resulting
scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
processing a page's worth of input at 4 cycles per byte results in
a delay of ~250 us, which is a reasonable upper bound.
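
That estimate can be checked with a few lines of C (a sketch of the
arithmetic only; all figures are the ones quoted above):

    /* Back-of-the-envelope check of the worst-case NEON blackout:
     * a 64 KiB page at 4 cycles per byte on a 1 GHz Cortex-A53. */
    #include <stdio.h>

    int main(void)
    {
        const double page_bytes = 64 * 1024; /* 64k pages */
        const double cpb = 4.0;              /* worst case for these ciphers */
        const double hz = 1e9;               /* 1 GHz clock */

        /* 262144 cycles / 1e9 Hz = ~262 us, i.e. on the order of the
         * ~250 us quoted above */
        printf("blackout: %.0f us\n", page_bytes * cpb / hz * 1e6);
        return 0;
    }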

So let's remove the yield checks from the fused AES-CCM and AES-GCM
routines entirely.

This reverts commit 7b67ae4d5c and
partially reverts commit 7c50136a8a.

Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every block of input")
Fixes: 7b67ae4d5c ("crypto: arm64/aes-ccm - yield NEON after every block of input")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07 17:26:23 +08:00
Ard Biesheuvel 7c50136a8a crypto: arm64/aes-ghash - yield NEON after every block of input
Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
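
A minimal sketch of the resulting kernel-side pattern
(ghash_one_block() is a hypothetical stand-in for the NEON transform;
kernel_neon_begin()/kernel_neon_end() are the real arm64 kernel API
from <asm/neon.h>):

    /* Sketch only: one kernel_neon_begin()/kernel_neon_end() pair per
     * block re-enables preemption between blocks, bounding scheduling
     * latency to one block's worth of work. */
    while (nbytes >= GHASH_BLOCK_SIZE) {
        kernel_neon_begin();          /* claim NEON, preemption disabled */
        ghash_one_block(ctx, src);    /* hypothetical one-block NEON transform */
        kernel_neon_end();            /* release NEON: preemption point */
        src += GHASH_BLOCK_SIZE;
        nbytes -= GHASH_BLOCK_SIZE;
    }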

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-12 00:13:10 +08:00
Ard Biesheuvel 03c9a333fe crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL
Implement a NEON fallback for systems that do support NEON but have
no support for the optional 64x64->128 polynomial multiplication
instruction that is part of the ARMv8 Crypto Extensions. It is based
on the paper "Fast Software Polynomial Multiplication on ARM Processors
Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked
extensively for the AArch64 ISA.
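
For reference, the operation being emulated is a 64x64->128 carry-less
multiply; written as a plain C shift-and-XOR loop it looks as follows
(an illustration of the semantics only, not the paper's NEON
construction, which assembles it from smaller 8x8 multiplies):

    #include <stdint.h>

    /* Polynomial (carry-less) multiply over GF(2): like a schoolbook
     * multiply, but partial products combine with XOR, not ADD. */
    static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
    {
        uint64_t h = 0, l = 0;

        for (int i = 0; i < 64; i++) {
            if ((b >> i) & 1) {
                l ^= a << i;              /* bits landing in the low word */
                if (i)
                    h ^= a >> (64 - i);   /* bits carried into the high word */
            }
        }
        *hi = h;
        *lo = l;
    }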

On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the
NEON-based implementation is 4x faster than the table-based one, and
is time-invariant as well, making it less vulnerable to timing attacks.
When combined with the bit-sliced NEON implementation of AES-CTR, the
AES-GCM performance increases by 2x (from 58 to 29 cycles per byte).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:25 +08:00
Ard Biesheuvel 537c1445ab crypto: arm64/gcm - implement native driver using v8 Crypto Extensions
Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL-based
GHASH driver. This is suboptimal, given the fact that the input data needs
to be loaded twice, once for the encryption and again for the MAC
calculation.

On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.

So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb).
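
In outline, the fused loop looks like the following sketch (every
helper name is a hypothetical stand-in for the corresponding NEON
routine; the point is the single pass over the data):

    #include <stddef.h>
    #include <stdint.h>

    void aes_encrypt_block(const void *key, uint8_t out[16],
                           const uint8_t in[16]);                /* hypothetical */
    void ghash_update(uint8_t digest[16], const uint8_t h[16],
                      const uint8_t in[16]);                     /* hypothetical */
    void ctr_increment(uint8_t ctr[16]);                         /* hypothetical */

    static void gcm_encrypt_fused(const void *key, const uint8_t h[16],
                                  uint8_t digest[16], uint8_t ctr[16],
                                  uint8_t *dst, const uint8_t *src,
                                  size_t nblocks)
    {
        uint8_t ks[16];

        for (size_t i = 0; i < nblocks; i++, src += 16, dst += 16) {
            aes_encrypt_block(key, ks, ctr);  /* keystream from the counter */
            ctr_increment(ctr);
            for (int j = 0; j < 16; j++)
                dst[j] = src[j] ^ ks[j];      /* CTR encryption: src loaded once */
            ghash_update(digest, h, dst);     /* MAC while the block is still hot */
        }
    }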

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:23 +08:00
Ard Biesheuvel 9c433ad508 crypto: arm64/ghash-ce - fix for big endian
The GHASH key and digest are both pairs of 64-bit quantities, but the
GHASH code does not always refer to them as such, causing failures when
built for big endian. So replace the 16x1 loads and stores (sixteen
byte-sized elements) with 2x8 ones (two 64-bit elements).
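
The failure mode can be sketched in plain C (an analogue of the
vector-lane issue, not the actual NEON code): reinterpreting a byte
array as 64-bit words yields host-endian-dependent values, so data the
algorithm treats as 64-bit quantities must be loaded as such:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* A GHASH key/digest is a pair of 64-bit quantities stored
         * as 16 bytes. */
        uint8_t raw[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
                            8, 9, 10, 11, 12, 13, 14, 15 };
        uint64_t pair[2];

        /* Reinterpreting the bytes in place (the "16x1" view): the
         * value printed differs between LE and BE hosts. */
        memcpy(pair, raw, sizeof(pair));
        printf("%016llx\n", (unsigned long long)pair[0]);
        return 0;
    }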

Fixes: b913a6404c ("arm64/crypto: improve performance of GHASH algorithm")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:43 +08:00
Ard Biesheuvel b913a6404c arm64/crypto: improve performance of GHASH algorithm
This patch modifies the GHASH secure hash implementation to switch to a
faster reduction based on polynomial multiplication instead of one that
uses shifts and rotates.
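
Background, from the GCM specification rather than this patch: GHASH
works in the binary field GF(2^128) defined by
g(x) = x^128 + x^7 + x^2 + x + 1, and reducing a double-width
carry-less product can itself be expressed as polynomial
multiplication:

    % c(x) = c_hi(x) * x^128 + c_lo(x) is the unreduced product.
    % Since x^128 = x^7 + x^2 + x + 1 (mod g), folding the high half
    % is itself a polynomial multiplication (applied twice, as the
    % first fold can still overflow degree 127):
    \[
      c(x) \;\equiv\; c_{\mathrm{lo}}(x) \oplus
      c_{\mathrm{hi}}(x)\cdot\bigl(x^{7}+x^{2}+x+1\bigr) \pmod{g(x)}
    \]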

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-06-18 12:40:54 +01:00
Ard Biesheuvel fdd2389457 arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the
GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the
optional PMULL/PMULL2 instructions (polynomial multiply long, which
Intel calls carry-less multiplication).
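
For reference, the GHASH chaining step being ported, as defined in the
GCM specification, where H = E_K(0^128) is the hash key and the C_i are
128-bit blocks of input; each step costs one GF(2^128) multiplication,
which is what PMULL/PMULL2 accelerate:

    \[
      X_i \;=\; (X_{i-1} \oplus C_i) \cdot H, \qquad X_0 = 0
    \]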

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14 10:04:07 -07:00