The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algorithm ever sleeps, making this flag a no-op.
With this being the case, inevitably some users who can't sleep wrongly
pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
actually started sleeping. For example, the shash_ahash_*() functions,
which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
from the ahash API to the shash API. However, the shash functions are
called under kmap_atomic(), so actually they're assumed to never sleep.
Even if it turns out that some users do need preemption points while
hashing large buffers, we could easily provide a helper function
crypto_shash_update_large() which divides the data into smaller chunks
and calls crypto_shash_update() and cond_resched() for each chunk. It's
not necessary to have a flag in 'struct shash_desc', nor is it necessary
to make individual shash algorithms aware of this at all.
Therefore, remove shash_desc::flags, and document that the
crypto_shash_*() functions can be called from any context.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Many ahash algorithms set .cra_flags = CRYPTO_ALG_TYPE_AHASH. But this
is redundant with the C structure type ('struct ahash_alg'), and
crypto_register_ahash() already sets the type flag automatically,
clearing any type flag that was already there. Apparently the useless
assignment has just been copy+pasted around.
So, remove the useless assignment from all the ahash algorithms.
This patch shouldn't change any actual behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 8043bb1ae0 ("crypto: omap-sham - convert driver logic to use sgs for data xmit")
The memory pages freed in omap_sham_finish_req() were less than those
allocated in omap_sham_copy_sgs().
Cc: stable@vger.kernel.org
Signed-off-by: Bin Liu <b-liu@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 8043bb1ae0 ("crypto: omap-sham - convert driver logic to use
sgs for data xmit") removed the if() clause leaving the statement as is.
The intention was in that case to finish the request always so the goto
instruction seems sensible.
Remove the indentation to fix Smatch warning:
drivers/crypto/omap-sham.c:1761 omap_sham_done_task() warn: inconsistent indenting
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ahash_request 'req' argument passed by the caller
omap_sham_handle_queue() cannot be NULL here because it is obtained from
non-NULL pointer via container_of().
This fixes smatch warning:
drivers/crypto/omap-sham.c:812 omap_sham_prepare_request() warn: variable dereferenced before check 'req' (see line 805)
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver queue size can now be configured from userspace. This
allows optimizing the queue usage based on use case. Default queue
size is still 10 entries.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver fallback size can now be configured from userspace. This
allows optimizing the DMA usage based on use case. Default fallback
size of 256 is still used.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In certain platforms like DRA7xx having memory > 2GB with LPAE enabled
has a constraint that DMA can be done with the initial 2GB and marks it
as ZONE_DMA. But openssl when used with cryptodev does not make sure that
input buffer is DMA capable. So, adding a check to verify if the input
buffer is capable of DMA.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of of_device_get_match_data reduce the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() return NULL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove unnecessary static on local variable dd. Such variable
is initialized before being used, on every execution path throughout
the function. The static has no benefit and, removing it reduces the
object file size.
This issue was detected using Coccinelle and the following semantic patch:
https://github.com/GustavoARSilva/coccinelle/blob/master/static/static_unused.cocci
In the following log you can see a difference in the object file size.
This log is the output of the size command, before and after the code
change:
before:
text data bss dec hex filename
26135 11944 128 38207 953f drivers/crypto/omap-sham.o
after:
text data bss dec hex filename
26084 11856 64 38004 9474 drivers/crypto/omap-sham.o
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This was previously missed from the code, causing SDMA to hang in
some cases where the buffer ended up being not aligned.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently there is an interesting corner case failure with omap-sham
driver, if the finalize call is done separately with no data, but
all previous data has already been processed. In this case, it is not
possible to close the hash with the hardware without providing any data,
so we get incorrect results. Fix this by adjusting the size of data
sent to the hardware crypto engine in case the non-final data size falls
on the block size boundary, by reducing the amount of data sent by one
full block. This makes it sure that we always have some data available
for the finalize call and we can close the hash properly.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Aparna Balasubramanian <aparnab@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the hash later code only handles the cases when we have
either new data coming in with the request or old data in the buffer,
but not the combination when we have both. Fix this by changing the
ordering of the code a bit and handling both cases properly
simultaneously if needed. Also, fix an issue with omap_sham_update
that surfaces with this fix, so that the code checks the bufcnt
instead of total data amount against buffer length to avoid any
buffer overflows.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch simply replace all occurrence of HMAC IPAD/OPAD value by their
define.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current internal buffer size is way too large for crypto core, so
shrink it to be smaller. This makes the buffer to fit into the space
reserved for the export/import buffers also.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the driver has been converted to use scatterlists for data
handling, add proper implementation for the export/import stubs also.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the internal buffer has been used for data transmission. Change
this so that scatterlists are used instead, and change the driver to
actually use the previously introduced helper functions for scatterlist
preparation.
This patch also removes the old buffer handling code which is no longer
needed.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the threshold value was hardcoded in the driver. Having a define
for it makes it easier to configure.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently omap-sham uses a huge internal buffer for caching data, and
pushing this out to the DMA as large chunks. This, unfortunately,
doesn't work too well with the export/import functionality required
for ahash algorithms, and must be changed towards more scatterlist
centric approach.
This patch adds support functions for (mostly) scatterlist based data
handling. omap_sham_prepare_request() prepares a scatterlist for DMA
transfer to SHA crypto accelerator. This requires checking the data /
offset / length alignment of the data, splitting the data to SHA block
size granularity, and adding any remaining data back to the buffer.
With this patch, the code doesn't actually go live yet, the support code
will be taken properly into use with additional patches that modify the
SHA driver functionality itself.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current usage of sgl will be deprecated, and will be replaced by an
array required by the sg based driver implementation. Rename the existing
variable as sgl_tmp so that it can be removed from the driver easily later.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
OMAP HW generally expects data for DMA to be on word boundary, so make the
SHA driver inform crypto framework of the same preference.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Initially these just return -ENOTSUPP to indicate that they don't
really do anything yet. Some sort of implementation is required
for the driver to at least probe.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If software fallback is used on older hardware accelerator setup (OMAP2/
OMAP3), the first block of data must be purged from the buffer. The
first block contains the pre-generated ipad value required by the HW,
but the software fallback algorithm generates its own, causing wrong
results.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If we have processed any data with the hardware accelerator (digcnt > 0),
we must complete the entire hash by using it. This is because the current
hash value can't be imported to the software fallback algorithm. Otherwise
we end up with wrong hash results.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some of the call paths of OMAP SHA driver can avoid executing the next
step of the crypto queue under tasklet; instead, execute the next step
directly via function call. This avoids a costly round-trip via the
scheduler giving a slight performance boost.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm-neon-sha implementations have cra_priority of 150...300, so
increase omap-sham priority to 400 to ensure it is on top of any
software alg.
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Adds software fallback support for small crypto requests. In these cases,
it is undesirable to use DMA, as setting it up itself is rather heavy
operation. Gives about 40% extra performance in ipsec usecase.
Signed-off-by: Bin Liu <b-liu@ti.com>
[t-kristo@ti.com: dropped the extra traces, updated some comments
on the code]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The extra call to dmaengine_terminate_all is not needed, as the DMA
is not running at this point. This improves performance slightly.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Change crypto queue size from 1 to 10 for omap SHA driver. This should
allow clients to enqueue requests more effectively to avoid serializing
whole crypto sequences, giving extra performance.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API for every block causes serious performance hit to
crypto operations that are done on a long buffer. As crypto is performed
on a page boundary, encrypting large buffers can cause a series of crypto
operations divided by page. The runtime PM API is also called those many
times.
Convert the driver to use runtime_pm autosuspend instead, with a default
timeout value of 1 second. This results in upto ~50% speedup.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This if statement is reversed so we end up either leaking or Oopsing on
error.
Fixes: dbe246209b ('crypto: omap-sham - Use dma_request_chan() for requesting DMA channel')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With the new dma_request_chan() the client driver does not need to look for
the DMA resource and it does not need to pass filter_fn anymore.
By switching to the new API the driver can now support deferred probing
against DMA.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: David S. Miller <davem@davemloft.net>
CC: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[hch: split from a larger patch by Dan]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jens Axboe <axboe@fb.com>
omap3 support is same as omap2, just with different IO address (specified in DT)
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Function pm_runtime_get_sync could fail and we need to check return
value to prevent kernel crash.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kmap_atomic() gives only the page address of the input page.
Driver should take care of adding the offset of the scatterlist
within the page to the returned page address.
omap-sham driver is not adding the offset to page and directly operates
on the return vale of kmap_atomic(), because of which the following
error comes when running crypto tests:
00000000: d9 a1 1b 7c aa 90 3b aa 11 ab cb 25 00 b8 ac bf
[ 2.338169] 00000010: c1 39 cd ff 48 d0 a8 e2 2b fa 33 a1
[ 2.344008] alg: hash: Chunking test 1 failed for omap-sha256
So adding the scatterlist offset to vaddr.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
omap_sham_handle_queue() can be called as part of done_task tasklet.
During this its atomic and any calls to pm functions cannot sleep.
But there is a call to pm_runtime_get_sync() (which can sleep) in
omap_sham_handle_queue(), because of which the following appears:
" [ 116.169969] BUG: scheduling while atomic: kworker/0:2/2676/0x00000100"
Add pm_runtime_irq_safe() to avoid this.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99
compliant equivalent. This patch allocates the appropriate amount of memory
using a char array using the SHASH_DESC_ON_STACK macro.
The new code can be compiled with both gcc and clang.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
HIGHMEM pages may not be mapped so we must kmap them before accessing.
This resolves a random OOPs error that was showing up during OpenSSL SHA tests.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Command "tcrypt sec=1 mode=403" give the follwoing error for Polling
mode:
root@am335x-evm:/# insmod tcrypt.ko sec=1 mode=403
[...]
[ 346.982754] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 4352 opers/sec, 17825792 bytes/sec
[ 347.992661] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 7095 opers/sec, 29061120 bytes/sec
[ 349.002667] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates):
[ 349.010882] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 349.020037] pgd = ddeac000
[ 349.022884] [00000000] *pgd=9dcb4831, *pte=00000000, *ppte=00000000
[ 349.029816] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[ 349.035482] Modules linked in: tcrypt(+)
[ 349.039617] CPU: 0 PID: 1473 Comm: insmod Not tainted 3.12.4-01566-g6279006-dirty #38
[ 349.047832] task: dda91540 ti: ddcd2000 task.ti: ddcd2000
[ 349.053517] PC is at omap_sham_xmit_dma+0x6c/0x238
[ 349.058544] LR is at omap_sham_xmit_dma+0x38/0x238
[ 349.063570] pc : [<c04eb7cc>] lr : [<c04eb798>] psr: 20000013
[ 349.063570] sp : ddcd3c78 ip : 00000000 fp : 9d8980b8
[ 349.075610] r10: 00000000 r9 : 00000000 r8 : 00000000
[ 349.081090] r7 : 00001000 r6 : dd898000 r5 : 00000040 r4 : ddb10550
[ 349.087935] r3 : 00000004 r2 : 00000010 r1 : 53100080 r0 : 00000000
[ 349.094783] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 349.102268] Control: 10c5387d Table: 9deac019 DAC: 00000015
[ 349.108294] Process insmod (pid: 1473, stack limit = 0xddcd2248)
[...]
This is because polling_mode is not enabled for ctx without FLAGS_FINUP.
For polling mode the bufcnt is made 0 unconditionally. But it should be made 0
only if it is a final update or a total is not zero(This condition is similar
to what is done in DMA case). Because of this wrong hashes are produced.
Fixing the same.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In omap_sham_probe() and omap_sham_remove(), 'dd->dma_lch'
is released without checking to see if it was successfully
requested or not. This is a bug and was identified and
reported by Dan Carpenter here:
http://www.spinics.net/lists/devicetree/msg11023.html
Add code to only release 'dd->dma_lch' when its not NULL
(that is, when it was successfully requested).
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Joel Fernandes <joelf@ti.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
- Made x86 ablk_helper generic for ARM
- Phase out chainiv in favour of eseqiv (affects IPsec)
- Fixed aes-cbc IV corruption on s390
- Added constant-time crypto_memneq which replaces memcmp
- Fixed aes-ctr in omap-aes
- Added OMAP3 ROM RNG support
- Add PRNG support for MSM SoC's
- Add and use Job Ring API in caam
- Misc fixes
[ NOTE! This pull request was sent within the merge window, but Herbert
has some questionable email sending setup that makes him public enemy
#1 as far as gmail is concerned. So most of his emails seem to be
trapped by gmail as spam, resulting in me not seeing them. - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (49 commits)
crypto: s390 - Fix aes-cbc IV corruption
crypto: omap-aes - Fix CTR mode counter length
crypto: omap-sham - Add missing modalias
padata: make the sequence counter an atomic_t
crypto: caam - Modify the interface layers to use JR API's
crypto: caam - Add API's to allocate/free Job Rings
crypto: caam - Add Platform driver for Job Ring
hwrng: msm - Add PRNG support for MSM SoC's
ARM: DT: msm: Add Qualcomm's PRNG driver binding document
crypto: skcipher - Use eseqiv even on UP machines
crypto: talitos - Simplify key parsing
crypto: picoxcell - Simplify and harden key parsing
crypto: ixp4xx - Simplify and harden key parsing
crypto: authencesn - Simplify key parsing
crypto: authenc - Export key parsing helper function
crypto: mv_cesa: remove deprecated IRQF_DISABLED
hwrng: OMAP3 ROM Random Number Generator support
crypto: sha256_ssse3 - also test for BMI2
crypto: mv_cesa - Remove redundant of_match_ptr
crypto: sahara - Remove redundant of_match_ptr
...
Replace some instances of of_irq_map_one()/irq_create_of_mapping() and
of_irq_to_resource() by the simpler equivalent irq_of_parse_and_map().
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
[grant.likely: resolved conflicts with core code renames]
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Each cycle of SHA512 operates on 32 data words where as
SHA256 operates on 16 data words. This needs to be updated
while configuring DMA channels. Doing the same.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For writing input buffer into DATA_IN register current driver
has the following state machine:
-> if input buffer < 9 : use fallback driver
-> else if input buffer < block size : Copy input buffer into data_in regs
-> else use dma transfer.
In cases where requesting for DMA channels fails for some reason,
or channel numbers are not provided in DT or platform data, probe
also fails. Instead of returning from driver use cpu polling mode.
In this mode processor polls on INPUT_READY bit and writes data into
data_in regs when it equals 1. This operation is repeated until the
length of message.
Now the state machine looks like:
-> if input buffer < 9 : use fallback driver
-> else if input buffer < block size : Copy input buffer into data_in regs
-> else if dma enabled: use dma transfer
else use cpu polling mode.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use devm_kzalloc() to make cleanup paths simpler.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using devm_request_irq() rather than request_irq().
So removing free_irq() calls from the probe error
path and the remove handler.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>