twofish-x86_64-3way uses functions from LRW and XTS modules, so selecting would
appear to be better option than using #ifdefs in twofish_glue_3way.c to
enable/disable LRW and XTS features.
This also fixes build problem when twofish-x86_64-3way would be build into
kernel but XTS/LRW are build as modules.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
XTS has been EXPERIMENTAL since it was introduced in 2007. I'd say by now
it has seen enough testing to justify removal of EXPERIMENTAL tag.
CC: Rik Snel <rsnel@cube.dyndns.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
LRW has been EXPERIMENTAL since it was introduced in 2006. I'd say by now
it has seen enough testing to justify removal of EXPERIMENTAL tag.
CC: Rik Snel <rsnel@cube.dyndns.org>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since serpent_sse2_glue.c uses cryptd, CRYPTO_SERPENT_SSE2_X86_64 and
CRYPTO_SERPENT_SSE2_586 should be selecting CRYPTO_CRYPTD.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix a typo in the Kconfig file help text.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Selecting NET causes all sorts of issues, including a dependency
loop involving bluetooth. This patch makes it a dependency instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a basic userspace configuration API for the crypto layer.
With this it is possible to instantiate, remove and to show crypto
algorithms from userspace.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
module. New assembler functions crypt data in three blocks chunks, improving
cipher performance on out-of-order CPUs.
Patch has been tested with tcrypt and automated filesystem tests.
Summary of the tcrypt benchmarks:
Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
encrypt: 1.3x speed
decrypt: 1.3x speed
Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
encrypt: 1.07x speed
decrypt: 1.4x speed
Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
encrypt: 1.4x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
encrypt: 1.0x speed
decrypt: 1.0x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
encrypt: 0.84x speed
decrypt: 1.09x speed
Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
encrypt: 1.15x speed
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txthttp://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txthttp://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt
Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
Also userspace test were run on:
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping : 11
Userspace test results:
Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
encrypt: 1.27x
decrypt: 1.25x
Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
encrypt: 1.36x
decrypt: 1.36x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Patch adds x86_64 assembly implementation of blowfish. Two set of assembler
functions are provided. First set is regular 'one-block at time'
encrypt/decrypt functions. Second is 'four-block at time' functions that
gain performance increase on out-of-order CPUs. Performance of 4-way
functions should be equal to 1-way functions with in-order CPUs.
Summary of the tcrypt benchmarks:
Blowfish assembler vs blowfish C (256bit 8kb block ECB)
encrypt: 2.2x speed
decrypt: 2.3x speed
Blowfish assembler vs blowfish C (256bit 8kb block CBC)
encrypt: 1.12x speed
decrypt: 2.5x speed
Blowfish assembler vs blowfish C (256bit 8kb block CTR)
encrypt: 2.5x speed
Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txthttp://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt
Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
stepping : 0
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Patch splits up the blowfish crypto routine into a common part (key setup)
which will be used by blowfish crypto modules (x86_64 assembly and generic-c).
Also fixes errors/warnings reported by checkpatch.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).
Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
faster.
Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.
With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.
Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, meassuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).
Using this algorithm on a IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
FTB-8510:
frame size sha1-generic sha1-ssse3 delta
64 byte 37.5 MBit/s 37.5 MBit/s 0.0%
128 byte 56.3 MBit/s 62.5 MBit/s +11.0%
256 byte 87.5 MBit/s 100.0 MBit/s +14.3%
512 byte 131.3 MBit/s 150.0 MBit/s +14.2%
1024 byte 162.5 MBit/s 193.8 MBit/s +19.3%
1280 byte 175.0 MBit/s 212.5 MBit/s +21.4%
1420 byte 175.0 MBit/s 218.7 MBit/s +25.0%
1518 byte 150.0 MBit/s 181.2 MBit/s +20.8%
The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make there way through the IPsec tunnel.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
fs: Merge split strings
treewide: fix potentially dangerous trailing ';' in #defined values/expressions
uwb: Fix misspelling of neighbourhood in comment
net, netfilter: Remove redundant goto in ebt_ulog_packet
trivial: don't touch files that are removed in the staging tree
lib/vsprintf: replace link to Draft by final RFC number
doc: Kconfig: `to be' -> `be'
doc: Kconfig: Typo: square -> squared
doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
drivers/net: static should be at beginning of declaration
drivers/media: static should be at beginning of declaration
drivers/i2c: static should be at beginning of declaration
XTENSA: static should be at beginning of declaration
SH: static should be at beginning of declaration
MIPS: static should be at beginning of declaration
ARM: static should be at beginning of declaration
rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Update my e-mail address
PCIe ASPM: forcedly -> forcibly
gma500: push through device driver tree
...
Fix up trivial conflicts:
- arch/arm/mach-ep93xx/dma-m2p.c (deleted)
- drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
- drivers/net/r8169.c (just context changes)
CRYPTO_GHASH_CLMUL_NI_INTEL and CRYPTO_AES_NI_INTEL cannot be used
on UML.
Commit 3e02e5cb and 54b6a1b enabled them by accident.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Loading fpu without aesni-intel does nothing. Loading aesni-intel
without fpu causes modes like xts to fail. (Unloading
aesni-intel will restore those modes.)
One solution would be to make aesni-intel depend on fpu, but it
seems cleaner to just combine the modules.
This is probably responsible for bugs like:
https://bugzilla.redhat.com/show_bug.cgi?id=589390
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This feature no longer needs the experimental tag.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add missing dependency on NET since we require sockets for our
interface.
Should really be a select but kconfig doesn't like that:
net/Kconfig:6:error: found recursive dependency: NET -> NETWORK_FILESYSTEMS -> AFS_FS -> AF_RXRPC -> CRYPTO -> CRYPTO_USER_API_HASH -> CRYPTO_USER_API -> NET
Reported-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES-NI instructions are also available in legacy mode so the 32-bit
architecture may profit from those, too.
To illustrate the performance gain here's a short summary of a dm-crypt
speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
implementations:
x86: i568 aes-ni delta
ECB, 256 bit: 93.8 MB/s 123.3 MB/s +31.4%
CBC, 256 bit: 84.8 MB/s 262.3 MB/s +209.3%
LRW, 256 bit: 108.6 MB/s 222.1 MB/s +104.5%
XTS, 256 bit: 105.0 MB/s 205.5 MB/s +95.7%
Additionally, due to some minor optimizations, the 64-bit version also
got a minor performance gain as seen below:
x86-64: old impl. new impl. delta
ECB, 256 bit: 121.1 MB/s 123.0 MB/s +1.5%
CBC, 256 bit: 285.3 MB/s 290.8 MB/s +1.9%
LRW, 256 bit: 263.7 MB/s 265.3 MB/s +0.6%
XTS, 256 bit: 251.1 MB/s 255.3 MB/s +1.7%
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the af_alg plugin for symmetric key ciphers,
corresponding to the ablkcipher kernel operation type.
Keys can optionally be set through the setsockopt interface.
Once a sendmsg call occurs without MSG_MORE no further writes
may be made to the socket until all previous data has been read.
IVs and and whether encryption/decryption is performed can be
set through the setsockopt interface or as a control message
to sendmsg.
The interface is completely synchronous, all operations are
carried out in recvmsg(2) and will complete prior to the system
call returning.
The splice(2) interface support reading the user-space data directly
without copying (except that the Crypto API itself may copy the data
if alignment is off).
The recvmsg(2) interface supports directly writing to user-space
without additional copying, i.e., the kernel crypto interface will
receive the user-space address as its output SG list.
Thakns to Miloslav Trmac for reviewing this and contributing
fixes and improvements.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
This patch adds the af_alg plugin for hash, corresponding to
the ahash kernel operation type.
Keys can optionally be set through the setsockopt interface.
Each sendmsg call will finalise the hash unless sent with a MSG_MORE
flag.
Partial hash states can be cloned using accept(2).
The interface is completely synchronous, all operations will
complete prior to the system call returning.
Both sendmsg(2) and splice(2) support reading the user-space
data directly without copying (except that the Crypto API itself
may copy the data if alignment is off).
For now only the splice(2) interface supports performing digest
instead of init/update/final. In future the sendmsg(2) interface
will also be modified to use digest/finup where possible so that
hardware that cannot return a partial hash state can still benefit
from this interface.
Thakns to Miloslav Trmac for reviewing this and contributing
fixes and improvements.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Willi <martin@strongswan.org>
This patch creates the backbone of the user-space interface for
the Crypto API, through a new socket family AF_ALG.
Each session corresponds to one or more connections obtained from
that socket. The number depends on the number of inputs/outputs
of that particular type of operation. For most types there will
be a s ingle connection/file descriptor that is used for both input
and output. AEAD is one of the few that require two inputs.
Each algorithm type will provide its own implementation that plugs
into af_alg. They're keyed using a string such as "skcipher" or
"hash".
IOW this patch only contains the boring bits that is required
to hold everything together.
Thakns to Miloslav Trmac for reviewing this and contributing
fixes and improvements.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Willi <martin@strongswan.org>
Below is a patch to update the broken web addresses, in crypto/*
that I could locate. Some are just simple typos that needed to be
fixed, and some had a change in location altogether..
let me know if any of them need to be changed and such.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On Thu, Aug 05, 2010 at 07:01:21PM -0700, Linus Torvalds wrote:
> On Thu, Aug 5, 2010 at 6:40 PM, Herbert Xu <herbert@gondor.hengli.com.au> wrote:
> >
> > -config CRYPTO_MANAGER_TESTS
> > - bool "Run algolithms' self-tests"
> > - default y
> > - depends on CRYPTO_MANAGER2
> > +config CRYPTO_MANAGER_DISABLE_TESTS
> > + bool "Disable run-time self tests"
> > + depends on CRYPTO_MANAGER2 && EMBEDDED
>
> Why do you still want to force-enable those tests? I was going to
> complain about the "default y" anyway, now I'm _really_ complaining,
> because you've now made it impossible to disable those tests. Why?
As requested, this patch sets the default to y and removes the
EMBEDDED dependency.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes a serious bug in the test disabling patch where
it can cause an spurious load of the cryptomgr module even when
it's compiled in.
It also negates the test disabling option so that its absence
causes tests to be enabled.
The Kconfig option is also now behind EMBEDDED.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
By default, CONFIG_CRYPTO_MANAGER_TESTS will be enabled and thus
self-tests will still run, but it is now possible to disable them
to gain some time during bootup.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The PCOMP Kconfig entry current allows the following combination
which is illegal:
ZLIB=y
PCOMP=y
ALGAPI=m
ALGAPI2=y
MANAGER=m
MANAGER2=m
This patch fixes this by adding PCOMP2 so that PCOMP can select
ALGAPI to propagate the setting to MANAGER2.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a parallel crypto template that takes a crypto
algorithm and converts it to process the crypto transforms in
parallel. For the moment only aead algorithms are supported.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CLMUL-NI accelerated GHASH should be turned off on non-x86_64 machine.
Reported-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
carry-less multiplication. More information about PCLMULQDQ can be
found at:
http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/
Because PCLMULQDQ changes XMM state, its usage must be enclosed with
kernel_fpu_begin/end, which can be used only in process context, the
acceleration is implemented as crypto_ahash. That is, request in soft
IRQ context will be defered to the cryptd kernel thread.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds VMAC (a fast MAC) support into crypto framework.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
What about something like this? It defaults the CPRNG to m and makes FIPS
dependent on the CPRNG. That way you get a module build by default, but you can
change it to y manually during config and still satisfy the dependency, and if
you select N it disables FIPS as well. I rather like that better than making
FIPS a tristate. I just tested it out here and it seems to work well. Let me
know what you think
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit 215ccd6f55.
It causes CPRNG and everything selected by it to be built-in
whenever FIPS is enabled. The problem is that it is selecting
a tristate from a bool, which is usually not what is intended.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the dedicated GHASH implementation in GCM, and uses the GHASH
digest algorithm instead. This will make GCM uses hardware accelerated
GHASH implementation automatically if available.
ahash instead of shash interface is used, because some hardware
accelerated GHASH implementation needs asynchronous interface.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
GHASH is implemented as a shash algorithm. The actual implementation
is copied from gcm.c. This makes it possible to add
architecture/hardware accelerated GHASH implementation.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ANSI CPRNG has no dependence on FIPS support. FIPS support however,
requires the use of the CPRNG. Adjust that depedency relationship in Kconfig.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Because kernel_fpu_begin() and kernel_fpu_end() operations are too
slow, the performance gain of general mode implementation + aes-aesni
is almost all compensated.
The AES-NI support for more modes are implemented as follow:
- Add a new AES algorithm implementation named __aes-aesni without
kernel_fpu_begin/end()
- Use fpu(<mode>(AES)) to provide kenrel_fpu_begin/end() invoking
- Add <mode>(AES) ablkcipher, which uses cryptd(fpu(<mode>(AES))) to
defer cryption to cryptd context in soft_irq context.
Now the ctr, lrw, pcbc and xts support are added.
Performance testing based on dm-crypt shows that cryption time can be
reduced to 50% of general mode implementation + aes-aesni implementation.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Blkcipher touching FPU need to be enclosed by kernel_fpu_begin() and
kernel_fpu_end(). If they are invoked in cipher algorithm
implementation, they will be invoked for each block, so that
performance will be hurt, because they are "slow" operations. This
patch implements "fpu" template, which makes these operations to be
invoked for each request.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current "comp" crypto interface supports one-shot (de)compression only,
i.e. the whole data buffer to be (de)compressed must be passed at once, and
the whole (de)compressed data buffer will be received at once.
In several use-cases (e.g. compressed file systems that store files in big
compressed blocks), this workflow is not suitable.
Furthermore, the "comp" type doesn't provide for the configuration of
(de)compression parameters, and always allocates workspace memory for both
compression and decompression, which may waste memory.
To solve this, add a "pcomp" partial (de)compression interface that provides
the following operations:
- crypto_compress_{init,update,final}() for compression,
- crypto_decompress_{init,update,final}() for decompression,
- crypto_{,de}compress_setup(), to configure (de)compression parameters
(incl. allocating workspace memory).
The (de)compression methods take a struct comp_request, which was mimicked
after the z_stream object in zlib, and contains buffer pointer and length
pairs for input and output.
The setup methods take an opaque parameter pointer and length pair. Parameters
are supposed to be encoded using netlink attributes, whose meanings depend on
the actual (name of the) (de)compression algorithm.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
keventd_wq has potential starvation problem, so use dedicated
kcrypto_wq instead.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>