Commit Graph

87 Commits

Author SHA1 Message Date
Ard Biesheuvel 537c1445ab crypto: arm64/gcm - implement native driver using v8 Crypto Extensions
Currently, the AES-GCM implementation for arm64 systems that support the
ARMv8 Crypto Extensions is based on the generic GCM module, which combines
the AES-CTR implementation using AES instructions with the PMULL based
GHASH driver. This is suboptimal, given the fact that the input data needs
to be loaded twice, once for the encryption and again for the MAC
calculation.

On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing
for the AES instructions, AES executes at less than 1 cycle per byte, which
means that any cycles wasted on loading the data twice hurt even more.

So implement a new GCM driver that combines the AES and PMULL instructions
at the block level. This improves performance on Cortex-A57 by ~37% (from
3.5 cpb to 2.6 cpb).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:23 +08:00
Ard Biesheuvel ec808bbef0 crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR
Of the various chaining modes implemented by the bit sliced AES driver,
only CTR is exposed as a synchronous cipher, and requires a fallback in
order to remain usable once we update the kernel mode NEON handling logic
to disallow nested use. So wire up the existing CTR fallback C code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:22 +08:00
Ard Biesheuvel 611d5324f4 crypto: arm64/chacha20 - take may_use_simd() into account
To accommodate systems that disallow the use of kernel mode NEON in
some circumstances, take the return value of may_use_simd into
account when deciding whether to invoke the C fallback routine.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:22 +08:00
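
A minimal C sketch of the fallback pattern adopted across this series, with
hypothetical function names standing in for the real NEON and scalar routines
(the actual glue code also walks the request's scatterlists):

  #include <linux/types.h>
  #include <asm/neon.h>
  #include <asm/simd.h>

  /* chacha20_do_neon() and chacha20_do_generic() are hypothetical stand-ins */
  static void chacha20_crypt(u8 *dst, const u8 *src, unsigned int bytes)
  {
          if (!may_use_simd()) {
                  chacha20_do_generic(dst, src, bytes);   /* scalar C fallback */
                  return;
          }

          kernel_neon_begin();
          chacha20_do_neon(dst, src, bytes);              /* NEON fast path */
          kernel_neon_end();
  }
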
Ard Biesheuvel e211506979 crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR
To accommodate systems that may disallow use of the NEON in kernel mode
in some circumstances, introduce a C fallback for synchronous AES in CTR
mode, and use it if may_use_simd() returns false.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:21 +08:00
Ard Biesheuvel 5092fcf349 crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback
The arm64 kernel will shortly disallow nested kernel mode NEON.

So honour this in the ARMv8 Crypto Extensions implementation of
CCM-AES, and fall back to a scalar implementation using the generic
crypto helpers for AES, XOR and incrementing the CTR counter.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:21 +08:00
Ard Biesheuvel b8fb993a83 crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:20 +08:00
Ard Biesheuvel f402e3115e crypto: arm64/aes-ce-cipher - match round key endianness with generic code
In order to be able to reuse the generic AES code as a fallback for
situations where the NEON may not be used, update the key handling
to match the byte order of the generic code: it stores round keys
as sequences of 32-bit quantities rather than streams of bytes, and
so our code needs to be updated to reflect that.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:19 +08:00
Ard Biesheuvel da1793312f crypto: arm64/sha2-ce - add non-SIMD scalar fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:19 +08:00
Ard Biesheuvel 0771f3234d crypto: arm64/sha1-ce - add non-SIMD generic fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:18 +08:00
Ard Biesheuvel 15c7d8f8a2 crypto: arm64/crc32 - add non-SIMD scalar fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:17 +08:00
Ard Biesheuvel 2dde374e1f crypto: arm64/crct10dif - add non-SIMD generic fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:16 +08:00
Ard Biesheuvel 6d6254d728 crypto: arm64/ghash-ce - add non-SIMD scalar fallback
The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:16 +08:00
Ard Biesheuvel 45fe93dff2 crypto: algapi - make crypto_xor() take separate dst and src arguments
There are quite a number of occurrences in the kernel of the pattern

  if (dst != src)
          memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
  crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);

or

  crypto_xor(keystream, src, nbytes);
  memcpy(dst, keystream, nbytes);

where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:15 +08:00
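
With the new helper, the second pattern above collapses into a single call
(sketch based on the example in this message):

  /* before: XOR into a temporary buffer, then copy it to the destination */
  crypto_xor(keystream, src, nbytes);
  memcpy(dst, keystream, nbytes);

  /* after: separate source and destination arguments, no extra memcpy() */
  crypto_xor_cpy(dst, keystream, src, nbytes);
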
Ard Biesheuvel f4857f4c2e crypto: arm64/sha - avoid non-standard inline asm tricks
Replace the inline asm which exports struct offsets as ELF symbols
with proper const variables exposing the same values. This works
around an issue with Clang which does not interpret the "i" (or "I")
constraints in the same way as GCC.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-05-18 13:19:52 +08:00
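
The general shape of the change, with hypothetical names (not the actual
arm64 glue code):

  /* before: export a struct offset to the .S file via inline asm, relying
   * on the "i" constraint that Clang interprets differently from GCC */
  asm(".set .Lsha_state_count_offset, %0"
      : : "i"(offsetof(struct sha_ce_state, count)));

  /* after: expose the same value as an ordinary constant */
  const u32 sha_state_count_offset = offsetof(struct sha_ce_state, count);
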
Ard Biesheuvel 4860620da7 crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driver
On ARMv8 implementations that do not support the Crypto Extensions,
such as the Raspberry Pi 3, the CCM driver falls back to the generic
table based AES implementation to perform the MAC part of the
algorithm, which is slow and not time invariant. So add a CBCMAC
implementation to the shared glue code between NEON AES and Crypto
Extensions AES, so that it can be used instead now that the CCM
driver has been updated to look for CBCMAC implementations other
than the one it supplies itself.

Also, given how these algorithms mostly only differ in the way the key
handling and the final encryption are implemented, expose CMAC and XCBC
algorithms as well based on the same core update code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:50:45 +08:00
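
A rough sketch of the CBC-MAC update step the three modes share
(aes_encrypt_block() is a hypothetical stand-in for the NEON/CE block cipher
call; CMAC and XCBC differ mainly in key derivation and final-block handling):

  while (len >= AES_BLOCK_SIZE) {
          crypto_xor(mac, in, AES_BLOCK_SIZE);      /* fold block into state */
          aes_encrypt_block(ctx, mac, mac);         /* encrypt the MAC state */
          in  += AES_BLOCK_SIZE;
          len -= AES_BLOCK_SIZE;
  }
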
Ard Biesheuvel 5d3d9c8bda crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers
The PMULL based CRC32 implementation already contains code based on the
separate, optional CRC32 instructions to fallback to when operating on
small quantities of data. We can expose these routines directly on systems
that lack the 64x64 PMULL instructions but do implement the CRC32 ones,
which makes the driver that is based solely on those CRC32 instructions
redundant. So remove it.

Note that this aligns arm64 with ARM, whose accelerated CRC32 driver
also combines the CRC32 extension based and the PMULL based versions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:50:38 +08:00
Ard Biesheuvel 88a3f582be crypto: arm64/aes - don't use IV buffer to return final keystream block
The arm64 bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.

So use a separate buffer to return the final keystream block.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:20 +08:00
Ard Biesheuvel 12fcd92305 crypto: arm64/aes - replace scalar fallback with plain NEON fallback
The new bitsliced NEON implementation of AES uses a fallback in two
places: CBC encryption (which is strictly sequential, whereas this
driver can only operate efficiently on 8 blocks at a time), and the
XTS tweak generation, which involves encrypting a single AES block
with a different key schedule.

The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback,
given that it is faster than scalar on low end cores (which is what
the NEON implementations target, since high end cores have dedicated
instructions for AES), and shows similar behavior in terms of D-cache
footprint and sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:20 +08:00
Ard Biesheuvel 4edd7d015b crypto: arm64/aes-neon-blk - tweak performance for low end cores
The non-bitsliced AES implementation using the NEON is highly sensitive
to micro-architectural details, and, as it turns out, the Cortex-A53 on
the Raspberry Pi 3 is a core that can benefit from this code, given that
its scalar AES performance is abysmal (32.9 cycles per byte).

The new bitsliced AES code manages 19.8 cycles per byte on this core,
but can only operate on 8 blocks at a time, which is not supported by
all chaining modes. With a bit of tweaking, we can get the plain NEON
code to run at 22.0 cycles per byte, making it useful for sequential
modes like CBC encryption. (Like bitsliced NEON, the plain NEON
implementation does not use any lookup tables, which makes it easy on
the D-cache, and invulnerable to cache timing attacks)

So tweak the plain NEON AES code to use tbl instructions rather than
shl/sri pairs, and to avoid the need to reload permutation vectors or
other constants from memory in every round. Also, improve the decryption
performance by switching to 16x8 pmul instructions for performing the
multiplications in GF(2^8).

To allow the ECB and CBC encrypt routines to be reused by the bitsliced
NEON code in a subsequent patch, export them from the module.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:20 +08:00
Ard Biesheuvel c458c4ada0 crypto: arm64/aes - performance tweak
Shuffle some instructions around in the __hround macro to shave off
0.1 cycles per byte on Cortex-A57.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:20 +08:00
Ard Biesheuvel 262ea4f670 crypto: arm64/aes - avoid literals for cross-module symbol references
Using simple adrp/add pairs to refer to the AES lookup tables exposed by
the generic AES driver (which could be loaded far away from this driver
when KASLR is in effect) was unreliable at module load time before commit
41c066f2c4 ("arm64: assembler: make adr_l work in modules under KASLR"),
which is why the AES code used literals instead.

So now we can get rid of the literals, and switch to the adr_l macro.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:20 +08:00
Ard Biesheuvel 4d1108fd74 crypto: arm64/chacha20 - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:19 +08:00
Ard Biesheuvel ccc5d51ef9 crypto: arm64/aes-blk - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:19 +08:00
Ard Biesheuvel 8f4102dbd9 crypto: arm64/aes-ce-ccm - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 18:16:19 +08:00
Herbert Xu 34cb582139 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree to pick up arm64 output IV patch.
2017-02-03 18:14:10 +08:00
Ard Biesheuvel 11e3b725cf crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
Update the ARMv8 Crypto Extensions and the plain NEON AES implementations
in CBC and CTR modes to return the next IV back to the skcipher API client.
This is necessary for chaining to work correctly.

Note that for CTR, this is only done if the request is a round multiple of
the block size, since otherwise, chaining is impossible anyway.

Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:41:33 +08:00
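
Conceptually, for CBC encryption the outgoing IV is just the last ciphertext
block; a hedged sketch of what the glue code must guarantee (the real patch
propagates the value from the assembly via the walk's IV buffer):

  /* after encrypting req->cryptlen bytes into dst, the last ciphertext
   * block becomes the chaining value for a follow-up request */
  memcpy(req->iv, dst + req->cryptlen - AES_BLOCK_SIZE, AES_BLOCK_SIZE);
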
Ard Biesheuvel 1abee99eaf crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64
This is a reimplementation of the NEON version of the bit-sliced AES
algorithm. This code is heavily based on Andy Polyakov's OpenSSL version
for ARM, which is also available in the kernel. This is an alternative for
the existing NEON implementation for arm64 authored by me, which suffers
from poor performance due to its reliance on the pathologically slow four
register variant of the tbl/tbx NEON instruction.

This version is ~30% (*) faster than the generic C code, but only in
cases where the input can be 8x interleaved (this is a fundamental property
of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR
are implemented. (The significance of ECB is that it could potentially be
used by other chaining modes)

* Measured on Cortex-A57. Note that this is still an order of magnitude
  slower than the implementations that use the dedicated AES instructions
  introduced in ARMv8, but those are part of an optional extension, and so
  it is good to have a fallback.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:51 +08:00
Ard Biesheuvel bed593c0e8 crypto: arm64/aes - add scalar implementation
This adds a scalar implementation of AES, based on the precomputed tables
that are exposed by the generic AES code. Since rotates are cheap on arm64,
this implementation only uses the 4 core tables (of 1 KB each), and avoids
the prerotated ones, reducing the D-cache footprint by 75%.

On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster
than the generic C code. (Note that this is still >13x slower than the code
that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per
byte.)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:49 +08:00
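
The space saving comes from computing the rotated table entries on the fly;
a simplified sketch of one output column of an AES round, assuming a single
256-entry u32 table tab[] (illustrative only, not the kernel's actual code):

  /* one 1 KB table plus cheap rotations, instead of four prerotated
   * 1 KB copies of the same table */
  u32 col = tab[(s0 >>  0) & 0xff]            ^
            rol32(tab[(s1 >>  8) & 0xff],  8) ^
            rol32(tab[(s2 >> 16) & 0xff], 16) ^
            rol32(tab[(s3 >> 24) & 0xff], 24);
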
Ard Biesheuvel 293614ce3e crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well
In addition to wrapping the AES-CTR cipher into the async SIMD wrapper,
which exposes it as an async skcipher that defers processing to process
context, expose our AES-CTR implementation directly as a synchronous cipher
as well, but with a lower priority.

This makes the AES-CTR transform usable in places where synchronous
transforms are required, such as the mac80211 encryption code, which
executes in softirq context, where SIMD processing is allowed on arm64.
Users of the async transform will keep the existing behavior.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:49 +08:00
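
In skcipher terms this amounts to registering the CTR code a second time as a
plain synchronous algorithm at a lower priority, so the async SIMD wrapper is
still preferred by default. A hedged outline (field values and the driver
name are illustrative):

  static struct skcipher_alg ctr_sync_alg = {
          .base.cra_name        = "ctr(aes)",
          .base.cra_driver_name = "ctr-aes-neon-sync",   /* hypothetical */
          .base.cra_priority    = 300 - 1,               /* below the async wrapper */
          .base.cra_blocksize   = 1,
          .min_keysize          = AES_MIN_KEY_SIZE,
          .max_keysize          = AES_MAX_KEY_SIZE,
          .ivsize               = AES_BLOCK_SIZE,
          .chunksize            = AES_BLOCK_SIZE,
  };
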
Ard Biesheuvel b7171ce9eb crypto: arm64/chacha20 - implement NEON version based on SSE3 code
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:48 +08:00
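
The walksize attribute mentioned here lets the skcipher walk hand the driver
several blocks at once; a hedged sketch of the relevant algorithm fields
(values follow the commit text):

  .chunksize = CHACHA20_BLOCK_SIZE,
  .walksize  = 4 * CHACHA20_BLOCK_SIZE,   /* present input in 4-block strides */
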
Herbert Xu 5386e5d1f8 Revert "crypto: arm64/ARM: NEON accelerated ChaCha20"
This patch reverts the following commits:

8621caa0d4
8096667273

I should not have applied them because they had already been
obsoleted by a subsequent patch series.  They also cause a build
failure because of the subsequent commit 9ae433bc79.

Fixes: 9ae433bc79 ("crypto: chacha20 - convert generic and...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-28 17:39:26 +08:00
Ard Biesheuvel 8621caa0d4 crypto: arm64/chacha20 - implement NEON version based on SSE3 code
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-27 17:47:28 +08:00
Ard Biesheuvel 8fefde90e9 crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
This is a combination of the Intel algorithm implemented using SSE
and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and
the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in
version 8 of the architecture. Two versions of the above combo are
provided, one for CRC32 and one for CRC32C.

The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 20:01:22 +08:00
Ard Biesheuvel 6ef5737f39 crypto: arm64/crct10dif - port x86 SSE implementation to arm64
This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16-byte aligned (but of any size).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07 20:01:17 +08:00
Herbert Xu 0be8a270b3 crypto: arm64/aes-ce-ccm - Fix AEAD decryption length
This patch fixes AEAD decryption in the arm64 CE CCM implementation by
using skcipher_walk_aead_decrypt instead of skcipher_walk_aead,
which ensures the correct length is used when doing the walk.

Fixes: cf2c0fe740 ("crypto: aes-ce-ccm - Use skcipher walk interface")
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-01 21:06:37 +08:00
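
A sketch of the change in skcipher walk terms (the atomic flag shown is
illustrative); the decrypt-aware helper accounts for the authentication tag
so that only the actual ciphertext is walked:

  /* before: walks over the full req->cryptlen, including the auth tag */
  err = skcipher_walk_aead(&walk, req, true);

  /* after: the correct length for decryption */
  err = skcipher_walk_aead_decrypt(&walk, req, true);
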
Ard Biesheuvel b3e1e0cbd9 crypto: arm64/aes-ce-ctr - fix skcipher conversion
Fix a missing statement that got lost in the skcipher conversion of
the CTR transform.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-30 20:01:44 +08:00
Ard Biesheuvel 7f329c1742 crypto: arm/aes-ce - fix broken monolithic build
When building the arm64 kernel with both CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
and CONFIG_CRYPTO_AES_ARM64_NEON_BLK=y configured, the build breaks with
the following error:

arch/arm64/crypto/aes-neon-blk.o:(.bss+0x0): multiple definition of `aes_simd_algs'
arch/arm64/crypto/aes-ce-blk.o:(.bss+0x0): first defined here

Fix this by making aes_simd_algs 'static'.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-30 20:01:41 +08:00
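
The fix itself is a one-liner; shown here as a sketch (the array type matches
how the glue code uses it):

  /* before: a tentative definition in each object file, colliding at link time */
  struct simd_skcipher_alg *aes_simd_algs[ARRAY_SIZE(aes_algs)];

  /* after: internal linkage, so each glue module keeps its own private copy */
  static struct simd_skcipher_alg *aes_simd_algs[ARRAY_SIZE(aes_algs)];
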
Herbert Xu 585b5fa63d crypto: arm/aes - Select SIMD in Kconfig
The skcipher conversion for ARM missed the select on CRYPTO_SIMD,
causing build failures if SIMD was not otherwise enabled.

Fixes: da40e7a4ba ("crypto: aes-ce - Convert to skcipher")
Fixes: 211f41af53 ("crypto: aesbs - Convert to skcipher")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-29 16:11:14 +08:00
Ard Biesheuvel a4b15bed54 crypto: arm64/sha2 - add generated .S files to .gitignore
Add the files that are generated by the recently merged OpenSSL
SHA-256/512 implementation to .gitignore so Git disregards them
when showing untracked files.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-29 16:06:56 +08:00
Herbert Xu d0ed0db149 crypto: arm64/aes - Convert to skcipher
This patch converts arm64/aes over to the skcipher interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-28 21:23:20 +08:00
Herbert Xu cf2c0fe740 crypto: aes-ce-ccm - Use skcipher walk interface
This patch makes use of the new skcipher walk interface instead of
the obsolete blkcipher walk interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-28 21:23:17 +08:00
Ard Biesheuvel 7918ecef07 crypto: arm64/sha2 - integrate OpenSSL implementations of SHA256/SHA512
This integrates both the accelerated scalar and the NEON implementations
of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.

Relative performance compared to the respective generic C versions:

                 |  SHA256-scalar  | SHA256-NEON* |  SHA512  |
     ------------+-----------------+--------------+----------+
     Cortex-A53  |      1.63x      |     1.63x    |   2.34x  |
     Cortex-A57  |      1.43x      |     1.59x    |   1.95x  |
     Cortex-A73  |      1.26x      |     1.56x    |     ?    |

The core crypto code was authored by Andy Polyakov of the OpenSSL
project, in collaboration with whom the upstream code was adapted so
that this module can be built from the same version of sha512-armv8.pl.

The version in this patch was taken from OpenSSL commit 32bbb62ea634
("sha/asm/sha512-armv8.pl: fix big-endian support in __KERNEL__ case.")

* The core SHA algorithm is fundamentally sequential, but there is a
  secondary transformation involved, called the schedule update, which
  can be performed independently. The NEON version of SHA-224/SHA-256
  only implements this part of the algorithm using NEON instructions,
  the sequential part is always done using scalar instructions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-28 19:58:05 +08:00
Ard Biesheuvel caf4b9e2b3 crypto: arm64/aes-xts-ce: fix for big endian
Emit the XTS tweak literal constants in the appropriate order for a
single 128-bit scalar literal load.

Fixes: 49788fe2a1 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:45 +08:00
Ard Biesheuvel a2c435cc99 crypto: arm64/aes-neon - fix for big endian
The AES implementation using pure NEON instructions relies on the generic
AES key schedule generation routines, which store the round keys as arrays
of 32-bit quantities stored in memory using native endianness. This means
we should refer to these round keys using 4x4 loads rather than 16x1 loads.
In addition, the ShiftRows tables are loaded using a single scalar load,
which is also affected by endianness, so emit these tables in the correct
order depending on whether we are building for big endian or not.

Fixes: 49788fe2a1 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:45 +08:00
Ard Biesheuvel 56e4e76c68 crypto: arm64/aes-ccm-ce: fix for big endian
The AES-CCM implementation that uses ARMv8 Crypto Extensions instructions
refers to the AES round keys as pairs of 64-bit quantities, which causes
failures when building the code for big endian. In addition, it byte swaps
the input counter unconditionally, while this is only required for little
endian builds. So fix both issues.

Fixes: 12ac3efe74 ("arm64/crypto: use crypto instructions to generate AES key schedule")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:43 +08:00
Ard Biesheuvel 174122c39c crypto: arm64/sha2-ce - fix for big endian
The SHA256 digest is an array of 8 32-bit quantities, so we should refer
to them as such in order for this code to work correctly when built for
big endian. So replace 16 byte scalar loads and stores with 4x32 vector
ones where appropriate.

Fixes: 6ba6c74dfc ("arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:43 +08:00
Ard Biesheuvel ee71e5f1e7 crypto: arm64/sha1-ce - fix for big endian
The SHA1 digest is an array of 5 32-bit quantities, so we should refer
to them as such in order for this code to work correctly when built for
big endian. So replace 16 byte scalar loads and stores with 4x4 vector
ones where appropriate.

Fixes: 2c98833a42 ("arm64/crypto: SHA-1 using ARMv8 Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:43 +08:00
Ard Biesheuvel 9c433ad508 crypto: arm64/ghash-ce - fix for big endian
The GHASH key and digest are both pairs of 64-bit quantities, but the
GHASH code does not always refer to them as such, causing failures when
built for big endian. So replace the 16x1 loads and stores with 2x8 ones.

Fixes: b913a6404c ("arm64/crypto: improve performance of GHASH algorithm")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:43 +08:00
Ard Biesheuvel 1803b9a52c crypto: arm64/aes-ce - fix for big endian
The core AES cipher implementation that uses ARMv8 Crypto Extensions
instructions erroneously loads the round keys as 64-bit quantities,
which causes the algorithm to fail when built for big endian. In
addition, the key schedule generation routine fails to take endianness
into account as well, when loading and combining the input key with
the round constants. So fix both issues.

Fixes: 12ac3efe74 ("arm64/crypto: use crypto instructions to generate AES key schedule")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-21 11:03:42 +08:00
Ard Biesheuvel 2db34e78f1 crypto: arm64/aes-ctr - fix NULL dereference in tail processing
The AES-CTR glue code avoids calling into the blkcipher API for the
tail portion of the walk, by comparing the remainder of walk.nbytes
modulo AES_BLOCK_SIZE with the residual nbytes, and jumping straight
into the tail processing block if they are equal. This tail processing
block checks whether nbytes != 0, and does nothing otherwise.

However, in case of an allocation failure in the blkcipher layer, we
may enter this code with walk.nbytes == 0, while nbytes > 0. In this
case, we should not dereference the source and destination pointers,
since they may be NULL. So instead of checking for nbytes != 0, check
for (walk.nbytes % AES_BLOCK_SIZE) != 0, which implies the former in
non-error conditions.

Fixes: 49788fe2a1 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Cc: stable@vger.kernel.org
Reported-by: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-09-13 18:44:59 +08:00
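
A hedged sketch of the guard change in the CTR tail path (ctr_do_tail() is a
hypothetical helper standing in for the inline tail handling):

  /* before: nbytes can be > 0 even when the walk failed with
   * walk.nbytes == 0, so the tail code could chase NULL pointers */
  if (nbytes)
          ctr_do_tail(&walk, nbytes);

  /* after: a partial final block exists only if the walk says so */
  if (walk.nbytes % AES_BLOCK_SIZE)
          ctr_do_tail(&walk, walk.nbytes % AES_BLOCK_SIZE);
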