Currently, inner IV/DIGEST data are only copied once into the hash
engines and not set explicitly before launching a request that is not a
first frag. This is an issue especially when multiple ahash reqs are
computed in parallel or chained with cipher request, as the state of the
request being computed is not updated into the hash engine. It leads to
non-deterministic corrupted digest results.
Fixes: commit 2786cee8e5 ("crypto: marvell - Move SRAM I/O operations to step functions")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, we used a dedicated dma pool to copy the result of outer IV for
cipher requests. Instead of using a dma pool per outer data, we prefer
use the op dma pool that contains all part of the request from the SRAM.
Then, the outer data that is likely to be used by the 'complete'
operation, is copied later. In this way, any type of result can be
retrieved by DMA for cipher or ahash requests.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Cryptographic Engines and Security Accelerators (CESA) supports the
Multi-Packet Chain Mode. With this mode enabled, multiple tdma requests
can be chained and processed by the hardware without software
intervention. This mode was already activated, however the crypto
requests were not chained together. By doing so, we reduce significantly
the number of IRQs. Instead of being interrupted at the end of each
crypto request, we are interrupted at the end of the last cryptographic
request processed by the engine.
This commits re-factorizes the code, changes the code architecture and
adds the required data structures to chain cryptographic requests
together before sending them to an engine (stopped or possibly already
running).
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This commits adds support for fine grained load balancing on
multi-engine IPs. The engine is pre-selected based on its current load
and on the weight of the crypto request that is about to be processed.
The global crypto queue is also moved to each engine. These changes are
required to allow chaining crypto requests at the DMA level. By using
a crypto queue per engine, we make sure that we keep the state of the
tdma chain synchronized with the crypto queue. We also reduce contention
on 'cesa_dev->lock' and improve parallelism.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, the 'process' operation was used to check if the current request
was correctly handled by the engine, if it was the case it copied
information from the SRAM to the main memory. Now, we split this
operation. We keep the 'process' operation, which still checks if the
request was correctly handled by the engine or not, then we add a new
operation for completion. The 'complete' method copies the content of
the SRAM to memory. This will soon become useful if we want to call
the process and the complete operations from different locations
depending on the type of the request (different cleanup logic).
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the only way to access the tdma chain is to use the 'req'
union from a mv_cesa_{ablkcipher,ahash}. This will soon become a problem
if we want to handle the TDMA chaining vs standard/non-DMA processing in
a generic way (with generic functions at the cesa.c level detecting
whether the request should be queued at the DMA level or not). Hence the
decision to move the chain field a the mv_cesa_req level at the expense
of adding 2 void * fields to all request contexts (including non-DMA
ones) and to remove the type completly. To limit the overhead, we get
rid of the type field, which can now be deduced from the req->chain.first
value. Once these changes are done the union is no longer needed, so
remove it and move mv_cesa_ablkcipher_std_req and mv_cesa_req
to mv_cesa_ablkcipher_req directly. There are also no needs to keep the
'base' field into the union of mv_cesa_ahash_req, so move it into the
upper structure.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a TDMA descriptor at the end of the request for copying the
output IV vector via a DMA transfer. This is a good way for offloading
as much as processing as possible to the DMA and the crypto engine.
This is also required for processing multiple cipher requests
in chained mode, otherwise the content of the IV vector would be
overwritten by the last processed request.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto requests are not guaranteed to be finalized (->final() call),
and can be freed at any moment, without getting any notification from
the core. This can lead to memory leaks of the ->cache buffer.
Make this buffer part of the request object, and allocate an extra buffer
from the DMA cache pool when doing DMA operations.
As a side effect, this patch also fixes another bug related to cache
allocation and DMA operations. When the core allocates a new request and
import an existing state, a cache buffer can be allocated (depending
on the state). The problem is, at that very moment, we don't know yet
whether the request will use DMA or not, and since everything is
likely to be initialized to zero, mv_cesa_ahash_alloc_cache() thinks it
should allocate a buffer for standard operation. But when
mv_cesa_ahash_free_cache() is called, req->type has been set to
CESA_DMA_REQ in the meantime, thus leading to an invalind dma_pool_free()
call (the buffer passed in argument has not been allocated from the pool).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
"API:
- Add support for cipher output IVs in testmgr
- Add missing crypto_ahash_blocksize helper
- Mark authenc and des ciphers as not allowed under FIPS.
Algorithms:
- Add CRC support to 842 compression
- Add keywrap algorithm
- A number of changes to the akcipher interface:
+ Separate functions for setting public/private keys.
+ Use SG lists.
Drivers:
- Add Intel SHA Extension optimised SHA1 and SHA256
- Use dma_map_sg instead of custom functions in crypto drivers
- Add support for STM32 RNG
- Add support for ST RNG
- Add Device Tree support to exynos RNG driver
- Add support for mxs-dcp crypto device on MX6SL
- Add xts(aes) support to caam
- Add ctr(aes) and xts(aes) support to qat
- A large set of fixes from Russell King for the marvell/cesa driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
hwrng: exynos - Add Device Tree support
hwrng: exynos - Fix missing configuration after suspend to RAM
hwrng: exynos - Add timeout for waiting on init done
dt-bindings: rng: Describe Exynos4 PRNG bindings
crypto: marvell/cesa - use __le32 for hardware descriptors
crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
crypto: marvell/cesa - use gfp_t for gfp flags
crypto: marvell/cesa - use dma_addr_t for cur_dma
crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
crypto: caam - fix indentation of close braces
crypto: caam - only export the state we really need to export
crypto: caam - fix non-block aligned hash calculation
crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
crypto: caam - print errno code when hash registration fails
crypto: marvell/cesa - fix memory leak
crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
crypto: marvell/cesa - rearrange handling for sw padded hashes
...
Much of the driver uses cpu_to_le32() to convert values for descriptors
to little endian before writing. Use __le32 to define the hardware-
accessed parts of the descriptors, and ensure most places where it's
reasonable to do so use cpu_to_le32() when assigning to these.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cur_dma is part of the software state, not read by the hardware.
Storing it in LE32 format is wrong, use dma_addr_t for this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use relaxed IO accessors where appropriate.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Multiple locations in the driver test the operation context fragment
type, checking whether it is a first fragment or not. Introduce a
mv_cesa_mac_op_is_first_frag() helper, which returns true if the
fragment operation is for a first fragment.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
mv_cesa_get_op_cfg() does not write to its argument, it only reads.
So, let's make it const.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rather than determining whether we're using a MD5 hash by looking at
the digest size, switch to a cleaner solution using a per-request flag
initialised by the method type.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, we read/write the state in CPU endian, but on the final
request, we convert its endian according to the requested algorithm.
(md5 is little endian, SHA are big endian.)
Always keep creq->state in CPU native endian format, and perform the
necessary conversion when copying the hash to the result.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The mv_cesa_queue_req() function calls crypto_enqueue_request() to
enqueue a request. In the normal case (i.e the queue isn't full), this
function returns -EINPROGRESS. The current Marvell CESA crypto driver
takes this into account and cleans up the request only if an error
occured, i.e if the return value is not -EINPROGRESS.
Unfortunately this causes problems with
CRYPTO_TFM_REQ_MAY_BACKLOG-flagged requests. When such a request is
passed to crypto_enqueue_request() and the queue is full,
crypto_enqueue_request() will return -EBUSY, but will keep the request
enqueued nonetheless. This situation was not properly handled by the
Marvell CESA driver, which was anyway cleaning up the request in such
a situation. When later on the request was taken out of the backlog
and actually processed, a kernel crash occured due to the internal
driver data structures for this structure having been cleaned up.
To avoid this situation, this commit adds a
mv_cesa_req_needs_cleanup() helper function which indicates if the
request needs to be cleaned up or not after a call to
crypto_enqueue_request(). This helper allows to do the cleanup only in
the appropriate cases, and all call sites of mv_cesa_queue_req() are
fixed to use this new helper function.
Reported-by: Vincent Donnefort <vdonnefort@gmail.com>
Fixes: db509a4533 ("crypto: marvell/cesa - add TDMA support")
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for SHA256 operations.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for MD5 operations.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for Triple-DES operations.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for DES operations.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CESA IP supports CPU offload through a dedicated DMA engine (TDMA)
which can control the crypto block.
When you use this mode, all the required data (operation metadata and
payload data) are transferred using DMA, and the results are retrieved
through DMA when possible (hash results are not retrieved through DMA yet),
thus reducing the involvement of the CPU and providing better performances
in most cases (for small requests, the cost of DMA preparation might
exceed the performance gain).
Note that some CESA IPs do not embed this dedicated DMA, hence the
activation of this feature on a per platform basis.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The existing mv_cesa driver supports some features of the CESA IP but is
quite limited, and reworking it to support new features (like involving the
TDMA engine to offload the CPU) is almost impossible.
This driver has been rewritten from scratch to take those new features into
account.
This commit introduce the base infrastructure allowing us to add support
for DMA optimization.
It also includes support for one hash (SHA1) and one cipher (AES)
algorithm, and enable those features on the Armada 370 SoC.
Other algorithms and platforms will be added later on.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>