linux

Commit Graph

Author	SHA1	Message	Date
Ard Biesheuvel	efa7cebdbf	crypto: arm/crc32 - add build time test for CRC instruction support The accelerated CRC32 module for ARM may use either the scalar CRC32 instructions, the NEON 64x64 to 128 bit polynomial multiplication (vmull.p64) instruction, or both, depending on what the current CPU supports. However, this also requires support in binutils, and as it turns out, versions of binutils exist that support the vmull.p64 instruction but not the crc32 instructions. So refactor the Makefile logic so that this module only gets built if binutils has support for both. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-03-01 19:47:53 +08:00
Ard Biesheuvel	cc477bf645	crypto: arm/aes - replace bit-sliced OpenSSL NEON code This replaces the unwieldy generated implementation of bit-sliced AES in CBC/CTR/XTS modes that originated in the OpenSSL project with a new version that is heavily based on the OpenSSL implementation, but has a number of advantages over the old version: - it does not rely on the scalar AES cipher that also originated in the OpenSSL project and contains redundant lookup tables and key schedule generation routines (which we already have in crypto/aes_generic.) - it uses the same expanded key schedule for encryption and decryption, reducing the size of the per-key data structure by 1696 bytes - it adds an implementation of AES in ECB mode, which can be wrapped by other generic chaining mode implementations - it moves the handling of corner cases that are non critical to performance to the glue layer written in C - it was written directly in assembler rather than generated from a Perl script Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-01-13 18:27:31 +08:00
Ard Biesheuvel	81edb42629	crypto: arm/aes - replace scalar AES cipher This replaces the scalar AES cipher that originates in the OpenSSL project with a new implementation that is ~15% () faster (on modern cores), and reuses the lookup tables and the key schedule generation routines from the generic C implementation (which is usually compiled in anyway due to networking and other subsystems depending on it). Note that the bit sliced NEON code for AES still depends on the scalar cipher that this patch replaces, so it is not removed entirely yet. On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte for 128-bit keys. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-01-13 00:26:50 +08:00
Ard Biesheuvel	afaf712e99	crypto: arm/chacha20 - implement NEON version based on SSE3 code This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-01-13 00:26:48 +08:00
Herbert Xu	5386e5d1f8	Revert "crypto: arm64/ARM: NEON accelerated ChaCha20" This patch reverts the following commits: `8621caa0d4` `8096667273` I should not have applied them because they had already been obsoleted by a subsequent patch series. They also cause a build failure because of the subsequent commit `9ae433bc79`. Fixes: `9ae433bc79` ("crypto: chacha20 - convert generic and...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-28 17:39:26 +08:00
Ard Biesheuvel	8096667273	crypto: arm/chacha20 - implement NEON version based on SSE3 code This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-27 17:47:29 +08:00
Ard Biesheuvel	d0a3431a7b	crypto: arm/crc32 - accelerated support based on x86 SSE implementation This is a combination of the the Intel algorithm implemented using SSE and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in version 8 of the architecture. Two versions of the above combo are provided, one for CRC32 and one for CRC32C. The PMULL/NEON algorithm is faster, but operates on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-07 20:01:24 +08:00
Ard Biesheuvel	1d481f1cd8	crypto: arm/crct10dif - port x86 SSE implementation to ARM This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-07 20:01:21 +08:00
Ard Biesheuvel	c80ae7ca37	crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON This replaces the SHA-512 NEON module with the faster and more versatile implementation from the OpenSSL project. It consists of both a NEON and a generic ASM version of the core SHA-512 transform, where the NEON version reverts to the ASM version when invoked in non-process context. This patch is based on the OpenSSL upstream version b1a5d1c65208 of sha512-armv4.pl, which can be found here: https://git.openssl.org/gitweb/?p=openssl.git;h=b1a5d1c65208 Performance relative to the generic implementation (measured using tcrypt.ko mode=306 sec=1 running on a Cortex-A57 under KVM): input size block size asm neon old neon 16 16 1.39 2.54 2.21 64 16 1.32 2.33 2.09 64 64 1.38 2.53 2.19 256 16 1.31 2.28 2.06 256 64 1.38 2.54 2.25 256 256 1.40 2.77 2.39 1024 16 1.29 2.22 2.01 1024 256 1.40 2.82 2.45 1024 1024 1.41 2.93 2.53 2048 16 1.33 2.21 2.00 2048 256 1.40 2.84 2.46 2048 1024 1.41 2.96 2.55 2048 2048 1.41 2.98 2.56 4096 16 1.34 2.20 1.99 4096 256 1.40 2.84 2.46 4096 1024 1.41 2.97 2.56 4096 4096 1.41 3.01 2.58 8192 16 1.34 2.19 1.99 8192 256 1.40 2.85 2.47 8192 1024 1.41 2.98 2.56 8192 4096 1.41 2.71 2.59 8192 8192 1.51 3.51 2.69 Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-05-11 15:08:01 +08:00
Ard Biesheuvel	3abafaf219	crypto: arm - workaround for building with old binutils Old versions of binutils (before 2.23) do not yet understand the crypto-neon-fp-armv8 fpu instructions, and an attempt to build these files results in a build failure: arch/arm/crypto/aes-ce-core.S:133: Error: selected processor does not support ARM mode `vld1.8 {q10-q11},[ip]!' arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q8' arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0' arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aese.8 q0,q9' arch/arm/crypto/aes-ce-core.S:133: Error: bad instruction `aesmc.8 q0,q0' Since the affected versions are still in widespread use, and this breaks 'allmodconfig' builds, we should try to at least get a successful kernel build. Unfortunately, I could not come up with a way to make the Kconfig symbol depend on the binutils version, which would be the nicest solution. Instead, this patch uses the 'as-instr' Kbuild macro to find out whether the support is present in the assembler, and otherwise emits a non-fatal warning indicating which selected modules could not be built. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: http://storage.kernelci.org/next/next-20150410/arm-allmodconfig/build.log Fixes: `864cbeed4a` ("crypto: arm - add support for SHA1 using ARMv8 Crypto Instructions") [ard.biesheuvel: - omit modules entirely instead of building empty ones if binutils is too old - update commit log accordingly] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-04-13 12:09:11 +08:00
Sami Tolvanen	f2f770d74a	crypto: arm/sha256 - Add optimized SHA-256/224 Add Andy Polyakov's optimized assembly and NEON implementations for SHA-256/224. The sha256-armv4.pl script for generating the assembly code is from OpenSSL commit 51f8d095562f36cdaa6893597b5c609e943b0565. Compared to sha256-generic these implementations have the following tcrypt speed improvements on Motorola Nexus 6 (Snapdragon 805): bs b/u sha256-neon sha256-asm 16 16 x1.32 x1.19 64 16 x1.27 x1.15 64 64 x1.36 x1.20 256 16 x1.22 x1.11 256 64 x1.36 x1.19 256 256 x1.59 x1.23 1024 16 x1.21 x1.10 1024 256 x1.65 x1.23 1024 1024 x1.76 x1.25 2048 16 x1.21 x1.10 2048 256 x1.66 x1.23 2048 1024 x1.78 x1.25 2048 2048 x1.79 x1.25 4096 16 x1.20 x1.09 4096 256 x1.66 x1.23 4096 1024 x1.79 x1.26 4096 4096 x1.82 x1.26 8192 16 x1.20 x1.09 8192 256 x1.67 x1.23 8192 1024 x1.80 x1.26 8192 4096 x1.85 x1.28 8192 8192 x1.85 x1.27 Where bs refers to block size and b/u to bytes per update. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Cc: Andy Polyakov <appro@openssl.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-04-03 18:03:40 +08:00
Ard Biesheuvel	f1e866b106	crypto: arm - add support for GHASH using ARMv8 Crypto Extensions This implements the GHASH hash algorithm (as used by the GCM AEAD chaining mode) using the AArch32 version of the 64x64 to 128 bit polynomial multiplication instruction (vmull.p64) that is part of the ARMv8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-12 21:13:36 +11:00
Ard Biesheuvel	86464859cc	crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions This implements the ECB, CBC, CTR and XTS asynchronous block ciphers using the AArch32 versions of the ARMv8 Crypto Extensions for AES. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-12 21:13:36 +11:00
Ard Biesheuvel	006d0624fa	crypto: arm - add support for SHA-224/256 using ARMv8 Crypto Extensions This implements the SHA-224/256 secure hash algorithm using the AArch32 versions of the ARMv8 Crypto Extensions for SHA2. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-12 21:13:35 +11:00
Ard Biesheuvel	864cbeed4a	crypto: arm - add support for SHA1 using ARMv8 Crypto Instructions This implements the SHA1 secure hash algorithm using the AArch32 versions of the ARMv8 Crypto Extensions for SHA1. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-03-12 21:13:35 +11:00
Jussi Kivilinna	c8611d712a	ARM: 8120/1: crypto: sha512: add ARM NEON implementation This patch adds ARM NEON assembly implementation of SHA-512 and SHA-384 algorithms. tcrypt benchmark results on Cortex-A8, sha512-generic vs sha512-neon-asm: block-size bytes/update old-vs-new 16 16 2.99x 64 16 2.67x 64 64 3.00x 256 16 2.64x 256 64 3.06x 256 256 3.33x 1024 16 2.53x 1024 256 3.39x 1024 1024 3.52x 2048 16 2.50x 2048 256 3.41x 2048 1024 3.54x 2048 2048 3.57x 4096 16 2.49x 4096 256 3.42x 4096 1024 3.56x 4096 4096 3.59x 8192 16 2.48x 8192 256 3.42x 8192 1024 3.56x 8192 4096 3.60x 8192 8192 3.60x Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-08-02 08:51:50 +01:00
Jussi Kivilinna	604682551a	ARM: 8119/1: crypto: sha1: add ARM NEON implementation This patch adds ARM NEON assembly implementation of SHA-1 algorithm. tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm: block-size bytes/update old-vs-new 16 16 1.04x 64 16 1.02x 64 64 1.05x 256 16 1.03x 256 64 1.04x 256 256 1.30x 1024 16 1.03x 1024 256 1.36x 1024 1024 1.52x 2048 16 1.03x 2048 256 1.39x 2048 1024 1.55x 2048 2048 1.59x 4096 16 1.03x 4096 256 1.40x 4096 1024 1.57x 4096 4096 1.62x 8192 16 1.03x 8192 256 1.40x 8192 1024 1.58x 8192 4096 1.63x 8192 8192 1.63x Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-08-02 08:51:47 +01:00
Ard Biesheuvel	e4e7f10bfc	ARM: add support for bit sliced AES using NEON instructions Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption and around 25% for decryption. This implementation of the AES algorithm does not rely on any lookup tables so it is believed to be invulnerable to cache timing attacks. This algorithm processes up to 8 blocks in parallel in constant time. This means that it is not usable by chaining modes that are strictly sequential in nature, such as CBC encryption. CBC decryption, however, can benefit from this implementation and runs about 25% faster. The other chaining modes implemented in this module, XTS and CTR, can execute fully in parallel in both directions. The core code has been adopted from the OpenSSL project (in collaboration with the original author, on cc). For ease of maintenance, this version is identical to the upstream OpenSSL code, i.e., all modifications that were required to make it suitable for inclusion into the kernel have been made upstream. The original can be found here: http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130 Note to integrators: While this implementation is significantly faster than the existing table based ones (generic or ARM asm), especially in CTR mode, the effects on power efficiency are unclear as of yet. This code does fundamentally more work, by calculating values that the table based code obtains by a simple lookup; only by doing all of that work in a SIMD fashion, it manages to perform better. Cc: Andy Polyakov <appro@openssl.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2013-10-04 20:48:38 +02:00
David McCullough	f0be44f4fb	arm/crypto: Add optimized AES and SHA1 routines Add assembler versions of AES and SHA1 for ARM platforms. This has provided up to a 50% improvement in IPsec/TCP throughout for tunnels using AES128/SHA1. Platform CPU SPeed Endian Before (bps) After (bps) Improvement IXP425 533 MHz big 11217042 15566294 ~38% KS8695 166 MHz little 3828549 5795373 ~51% Signed-off-by: David McCullough <ucdevel@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-09-07 04:17:02 +08:00

19 Commits