linux_old1/arch/x86/crypto/camellia_glue.c

1764 lines
72 KiB
C
Raw Normal View History

crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
/*
* Glue Code for assembler optimized version of Camellia
*
* Copyright (c) 2012 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
*
* Camellia parts based on code by:
* Copyright (C) 2006 NTT (Nippon Telegraph and Telephone Corporation)
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
* USA
*
*/
#include <asm/processor.h>
#include <asm/unaligned.h>
#include <linux/crypto.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>
#include <crypto/algapi.h>
#include <crypto/lrw.h>
#include <crypto/xts.h>
#include <asm/crypto/glue_helper.h>
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
#define CAMELLIA_MIN_KEY_SIZE 16
#define CAMELLIA_MAX_KEY_SIZE 32
#define CAMELLIA_BLOCK_SIZE 16
#define CAMELLIA_TABLE_BYTE_LEN 272
struct camellia_ctx {
u64 key_table[CAMELLIA_TABLE_BYTE_LEN / sizeof(u64)];
u32 key_length;
};
/* regular block cipher functions */
asmlinkage void __camellia_enc_blk(struct camellia_ctx *ctx, u8 *dst,
const u8 *src, bool xor);
asmlinkage void camellia_dec_blk(struct camellia_ctx *ctx, u8 *dst,
const u8 *src);
/* 2-way parallel cipher functions */
asmlinkage void __camellia_enc_blk_2way(struct camellia_ctx *ctx, u8 *dst,
const u8 *src, bool xor);
asmlinkage void camellia_dec_blk_2way(struct camellia_ctx *ctx, u8 *dst,
const u8 *src);
static inline void camellia_enc_blk(struct camellia_ctx *ctx, u8 *dst,
const u8 *src)
{
__camellia_enc_blk(ctx, dst, src, false);
}
static inline void camellia_enc_blk_xor(struct camellia_ctx *ctx, u8 *dst,
const u8 *src)
{
__camellia_enc_blk(ctx, dst, src, true);
}
static inline void camellia_enc_blk_2way(struct camellia_ctx *ctx, u8 *dst,
const u8 *src)
{
__camellia_enc_blk_2way(ctx, dst, src, false);
}
static inline void camellia_enc_blk_xor_2way(struct camellia_ctx *ctx, u8 *dst,
const u8 *src)
{
__camellia_enc_blk_2way(ctx, dst, src, true);
}
static void camellia_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
camellia_enc_blk(crypto_tfm_ctx(tfm), dst, src);
}
static void camellia_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
camellia_dec_blk(crypto_tfm_ctx(tfm), dst, src);
}
/* camellia sboxes */
const u64 camellia_sp10011110[256] = {
0x7000007070707000ULL, 0x8200008282828200ULL, 0x2c00002c2c2c2c00ULL,
0xec0000ecececec00ULL, 0xb30000b3b3b3b300ULL, 0x2700002727272700ULL,
0xc00000c0c0c0c000ULL, 0xe50000e5e5e5e500ULL, 0xe40000e4e4e4e400ULL,
0x8500008585858500ULL, 0x5700005757575700ULL, 0x3500003535353500ULL,
0xea0000eaeaeaea00ULL, 0x0c00000c0c0c0c00ULL, 0xae0000aeaeaeae00ULL,
0x4100004141414100ULL, 0x2300002323232300ULL, 0xef0000efefefef00ULL,
0x6b00006b6b6b6b00ULL, 0x9300009393939300ULL, 0x4500004545454500ULL,
0x1900001919191900ULL, 0xa50000a5a5a5a500ULL, 0x2100002121212100ULL,
0xed0000edededed00ULL, 0x0e00000e0e0e0e00ULL, 0x4f00004f4f4f4f00ULL,
0x4e00004e4e4e4e00ULL, 0x1d00001d1d1d1d00ULL, 0x6500006565656500ULL,
0x9200009292929200ULL, 0xbd0000bdbdbdbd00ULL, 0x8600008686868600ULL,
0xb80000b8b8b8b800ULL, 0xaf0000afafafaf00ULL, 0x8f00008f8f8f8f00ULL,
0x7c00007c7c7c7c00ULL, 0xeb0000ebebebeb00ULL, 0x1f00001f1f1f1f00ULL,
0xce0000cececece00ULL, 0x3e00003e3e3e3e00ULL, 0x3000003030303000ULL,
0xdc0000dcdcdcdc00ULL, 0x5f00005f5f5f5f00ULL, 0x5e00005e5e5e5e00ULL,
0xc50000c5c5c5c500ULL, 0x0b00000b0b0b0b00ULL, 0x1a00001a1a1a1a00ULL,
0xa60000a6a6a6a600ULL, 0xe10000e1e1e1e100ULL, 0x3900003939393900ULL,
0xca0000cacacaca00ULL, 0xd50000d5d5d5d500ULL, 0x4700004747474700ULL,
0x5d00005d5d5d5d00ULL, 0x3d00003d3d3d3d00ULL, 0xd90000d9d9d9d900ULL,
0x0100000101010100ULL, 0x5a00005a5a5a5a00ULL, 0xd60000d6d6d6d600ULL,
0x5100005151515100ULL, 0x5600005656565600ULL, 0x6c00006c6c6c6c00ULL,
0x4d00004d4d4d4d00ULL, 0x8b00008b8b8b8b00ULL, 0x0d00000d0d0d0d00ULL,
0x9a00009a9a9a9a00ULL, 0x6600006666666600ULL, 0xfb0000fbfbfbfb00ULL,
0xcc0000cccccccc00ULL, 0xb00000b0b0b0b000ULL, 0x2d00002d2d2d2d00ULL,
0x7400007474747400ULL, 0x1200001212121200ULL, 0x2b00002b2b2b2b00ULL,
0x2000002020202000ULL, 0xf00000f0f0f0f000ULL, 0xb10000b1b1b1b100ULL,
0x8400008484848400ULL, 0x9900009999999900ULL, 0xdf0000dfdfdfdf00ULL,
0x4c00004c4c4c4c00ULL, 0xcb0000cbcbcbcb00ULL, 0xc20000c2c2c2c200ULL,
0x3400003434343400ULL, 0x7e00007e7e7e7e00ULL, 0x7600007676767600ULL,
0x0500000505050500ULL, 0x6d00006d6d6d6d00ULL, 0xb70000b7b7b7b700ULL,
0xa90000a9a9a9a900ULL, 0x3100003131313100ULL, 0xd10000d1d1d1d100ULL,
0x1700001717171700ULL, 0x0400000404040400ULL, 0xd70000d7d7d7d700ULL,
0x1400001414141400ULL, 0x5800005858585800ULL, 0x3a00003a3a3a3a00ULL,
0x6100006161616100ULL, 0xde0000dededede00ULL, 0x1b00001b1b1b1b00ULL,
0x1100001111111100ULL, 0x1c00001c1c1c1c00ULL, 0x3200003232323200ULL,
0x0f00000f0f0f0f00ULL, 0x9c00009c9c9c9c00ULL, 0x1600001616161600ULL,
0x5300005353535300ULL, 0x1800001818181800ULL, 0xf20000f2f2f2f200ULL,
0x2200002222222200ULL, 0xfe0000fefefefe00ULL, 0x4400004444444400ULL,
0xcf0000cfcfcfcf00ULL, 0xb20000b2b2b2b200ULL, 0xc30000c3c3c3c300ULL,
0xb50000b5b5b5b500ULL, 0x7a00007a7a7a7a00ULL, 0x9100009191919100ULL,
0x2400002424242400ULL, 0x0800000808080800ULL, 0xe80000e8e8e8e800ULL,
0xa80000a8a8a8a800ULL, 0x6000006060606000ULL, 0xfc0000fcfcfcfc00ULL,
0x6900006969696900ULL, 0x5000005050505000ULL, 0xaa0000aaaaaaaa00ULL,
0xd00000d0d0d0d000ULL, 0xa00000a0a0a0a000ULL, 0x7d00007d7d7d7d00ULL,
0xa10000a1a1a1a100ULL, 0x8900008989898900ULL, 0x6200006262626200ULL,
0x9700009797979700ULL, 0x5400005454545400ULL, 0x5b00005b5b5b5b00ULL,
0x1e00001e1e1e1e00ULL, 0x9500009595959500ULL, 0xe00000e0e0e0e000ULL,
0xff0000ffffffff00ULL, 0x6400006464646400ULL, 0xd20000d2d2d2d200ULL,
0x1000001010101000ULL, 0xc40000c4c4c4c400ULL, 0x0000000000000000ULL,
0x4800004848484800ULL, 0xa30000a3a3a3a300ULL, 0xf70000f7f7f7f700ULL,
0x7500007575757500ULL, 0xdb0000dbdbdbdb00ULL, 0x8a00008a8a8a8a00ULL,
0x0300000303030300ULL, 0xe60000e6e6e6e600ULL, 0xda0000dadadada00ULL,
0x0900000909090900ULL, 0x3f00003f3f3f3f00ULL, 0xdd0000dddddddd00ULL,
0x9400009494949400ULL, 0x8700008787878700ULL, 0x5c00005c5c5c5c00ULL,
0x8300008383838300ULL, 0x0200000202020200ULL, 0xcd0000cdcdcdcd00ULL,
0x4a00004a4a4a4a00ULL, 0x9000009090909000ULL, 0x3300003333333300ULL,
0x7300007373737300ULL, 0x6700006767676700ULL, 0xf60000f6f6f6f600ULL,
0xf30000f3f3f3f300ULL, 0x9d00009d9d9d9d00ULL, 0x7f00007f7f7f7f00ULL,
0xbf0000bfbfbfbf00ULL, 0xe20000e2e2e2e200ULL, 0x5200005252525200ULL,
0x9b00009b9b9b9b00ULL, 0xd80000d8d8d8d800ULL, 0x2600002626262600ULL,
0xc80000c8c8c8c800ULL, 0x3700003737373700ULL, 0xc60000c6c6c6c600ULL,
0x3b00003b3b3b3b00ULL, 0x8100008181818100ULL, 0x9600009696969600ULL,
0x6f00006f6f6f6f00ULL, 0x4b00004b4b4b4b00ULL, 0x1300001313131300ULL,
0xbe0000bebebebe00ULL, 0x6300006363636300ULL, 0x2e00002e2e2e2e00ULL,
0xe90000e9e9e9e900ULL, 0x7900007979797900ULL, 0xa70000a7a7a7a700ULL,
0x8c00008c8c8c8c00ULL, 0x9f00009f9f9f9f00ULL, 0x6e00006e6e6e6e00ULL,
0xbc0000bcbcbcbc00ULL, 0x8e00008e8e8e8e00ULL, 0x2900002929292900ULL,
0xf50000f5f5f5f500ULL, 0xf90000f9f9f9f900ULL, 0xb60000b6b6b6b600ULL,
0x2f00002f2f2f2f00ULL, 0xfd0000fdfdfdfd00ULL, 0xb40000b4b4b4b400ULL,
0x5900005959595900ULL, 0x7800007878787800ULL, 0x9800009898989800ULL,
0x0600000606060600ULL, 0x6a00006a6a6a6a00ULL, 0xe70000e7e7e7e700ULL,
0x4600004646464600ULL, 0x7100007171717100ULL, 0xba0000babababa00ULL,
0xd40000d4d4d4d400ULL, 0x2500002525252500ULL, 0xab0000abababab00ULL,
0x4200004242424200ULL, 0x8800008888888800ULL, 0xa20000a2a2a2a200ULL,
0x8d00008d8d8d8d00ULL, 0xfa0000fafafafa00ULL, 0x7200007272727200ULL,
0x0700000707070700ULL, 0xb90000b9b9b9b900ULL, 0x5500005555555500ULL,
0xf80000f8f8f8f800ULL, 0xee0000eeeeeeee00ULL, 0xac0000acacacac00ULL,
0x0a00000a0a0a0a00ULL, 0x3600003636363600ULL, 0x4900004949494900ULL,
0x2a00002a2a2a2a00ULL, 0x6800006868686800ULL, 0x3c00003c3c3c3c00ULL,
0x3800003838383800ULL, 0xf10000f1f1f1f100ULL, 0xa40000a4a4a4a400ULL,
0x4000004040404000ULL, 0x2800002828282800ULL, 0xd30000d3d3d3d300ULL,
0x7b00007b7b7b7b00ULL, 0xbb0000bbbbbbbb00ULL, 0xc90000c9c9c9c900ULL,
0x4300004343434300ULL, 0xc10000c1c1c1c100ULL, 0x1500001515151500ULL,
0xe30000e3e3e3e300ULL, 0xad0000adadadad00ULL, 0xf40000f4f4f4f400ULL,
0x7700007777777700ULL, 0xc70000c7c7c7c700ULL, 0x8000008080808000ULL,
0x9e00009e9e9e9e00ULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp22000222[256] = {
0xe0e0000000e0e0e0ULL, 0x0505000000050505ULL, 0x5858000000585858ULL,
0xd9d9000000d9d9d9ULL, 0x6767000000676767ULL, 0x4e4e0000004e4e4eULL,
0x8181000000818181ULL, 0xcbcb000000cbcbcbULL, 0xc9c9000000c9c9c9ULL,
0x0b0b0000000b0b0bULL, 0xaeae000000aeaeaeULL, 0x6a6a0000006a6a6aULL,
0xd5d5000000d5d5d5ULL, 0x1818000000181818ULL, 0x5d5d0000005d5d5dULL,
0x8282000000828282ULL, 0x4646000000464646ULL, 0xdfdf000000dfdfdfULL,
0xd6d6000000d6d6d6ULL, 0x2727000000272727ULL, 0x8a8a0000008a8a8aULL,
0x3232000000323232ULL, 0x4b4b0000004b4b4bULL, 0x4242000000424242ULL,
0xdbdb000000dbdbdbULL, 0x1c1c0000001c1c1cULL, 0x9e9e0000009e9e9eULL,
0x9c9c0000009c9c9cULL, 0x3a3a0000003a3a3aULL, 0xcaca000000cacacaULL,
0x2525000000252525ULL, 0x7b7b0000007b7b7bULL, 0x0d0d0000000d0d0dULL,
0x7171000000717171ULL, 0x5f5f0000005f5f5fULL, 0x1f1f0000001f1f1fULL,
0xf8f8000000f8f8f8ULL, 0xd7d7000000d7d7d7ULL, 0x3e3e0000003e3e3eULL,
0x9d9d0000009d9d9dULL, 0x7c7c0000007c7c7cULL, 0x6060000000606060ULL,
0xb9b9000000b9b9b9ULL, 0xbebe000000bebebeULL, 0xbcbc000000bcbcbcULL,
0x8b8b0000008b8b8bULL, 0x1616000000161616ULL, 0x3434000000343434ULL,
0x4d4d0000004d4d4dULL, 0xc3c3000000c3c3c3ULL, 0x7272000000727272ULL,
0x9595000000959595ULL, 0xabab000000abababULL, 0x8e8e0000008e8e8eULL,
0xbaba000000bababaULL, 0x7a7a0000007a7a7aULL, 0xb3b3000000b3b3b3ULL,
0x0202000000020202ULL, 0xb4b4000000b4b4b4ULL, 0xadad000000adadadULL,
0xa2a2000000a2a2a2ULL, 0xacac000000acacacULL, 0xd8d8000000d8d8d8ULL,
0x9a9a0000009a9a9aULL, 0x1717000000171717ULL, 0x1a1a0000001a1a1aULL,
0x3535000000353535ULL, 0xcccc000000ccccccULL, 0xf7f7000000f7f7f7ULL,
0x9999000000999999ULL, 0x6161000000616161ULL, 0x5a5a0000005a5a5aULL,
0xe8e8000000e8e8e8ULL, 0x2424000000242424ULL, 0x5656000000565656ULL,
0x4040000000404040ULL, 0xe1e1000000e1e1e1ULL, 0x6363000000636363ULL,
0x0909000000090909ULL, 0x3333000000333333ULL, 0xbfbf000000bfbfbfULL,
0x9898000000989898ULL, 0x9797000000979797ULL, 0x8585000000858585ULL,
0x6868000000686868ULL, 0xfcfc000000fcfcfcULL, 0xecec000000ecececULL,
0x0a0a0000000a0a0aULL, 0xdada000000dadadaULL, 0x6f6f0000006f6f6fULL,
0x5353000000535353ULL, 0x6262000000626262ULL, 0xa3a3000000a3a3a3ULL,
0x2e2e0000002e2e2eULL, 0x0808000000080808ULL, 0xafaf000000afafafULL,
0x2828000000282828ULL, 0xb0b0000000b0b0b0ULL, 0x7474000000747474ULL,
0xc2c2000000c2c2c2ULL, 0xbdbd000000bdbdbdULL, 0x3636000000363636ULL,
0x2222000000222222ULL, 0x3838000000383838ULL, 0x6464000000646464ULL,
0x1e1e0000001e1e1eULL, 0x3939000000393939ULL, 0x2c2c0000002c2c2cULL,
0xa6a6000000a6a6a6ULL, 0x3030000000303030ULL, 0xe5e5000000e5e5e5ULL,
0x4444000000444444ULL, 0xfdfd000000fdfdfdULL, 0x8888000000888888ULL,
0x9f9f0000009f9f9fULL, 0x6565000000656565ULL, 0x8787000000878787ULL,
0x6b6b0000006b6b6bULL, 0xf4f4000000f4f4f4ULL, 0x2323000000232323ULL,
0x4848000000484848ULL, 0x1010000000101010ULL, 0xd1d1000000d1d1d1ULL,
0x5151000000515151ULL, 0xc0c0000000c0c0c0ULL, 0xf9f9000000f9f9f9ULL,
0xd2d2000000d2d2d2ULL, 0xa0a0000000a0a0a0ULL, 0x5555000000555555ULL,
0xa1a1000000a1a1a1ULL, 0x4141000000414141ULL, 0xfafa000000fafafaULL,
0x4343000000434343ULL, 0x1313000000131313ULL, 0xc4c4000000c4c4c4ULL,
0x2f2f0000002f2f2fULL, 0xa8a8000000a8a8a8ULL, 0xb6b6000000b6b6b6ULL,
0x3c3c0000003c3c3cULL, 0x2b2b0000002b2b2bULL, 0xc1c1000000c1c1c1ULL,
0xffff000000ffffffULL, 0xc8c8000000c8c8c8ULL, 0xa5a5000000a5a5a5ULL,
0x2020000000202020ULL, 0x8989000000898989ULL, 0x0000000000000000ULL,
0x9090000000909090ULL, 0x4747000000474747ULL, 0xefef000000efefefULL,
0xeaea000000eaeaeaULL, 0xb7b7000000b7b7b7ULL, 0x1515000000151515ULL,
0x0606000000060606ULL, 0xcdcd000000cdcdcdULL, 0xb5b5000000b5b5b5ULL,
0x1212000000121212ULL, 0x7e7e0000007e7e7eULL, 0xbbbb000000bbbbbbULL,
0x2929000000292929ULL, 0x0f0f0000000f0f0fULL, 0xb8b8000000b8b8b8ULL,
0x0707000000070707ULL, 0x0404000000040404ULL, 0x9b9b0000009b9b9bULL,
0x9494000000949494ULL, 0x2121000000212121ULL, 0x6666000000666666ULL,
0xe6e6000000e6e6e6ULL, 0xcece000000cececeULL, 0xeded000000edededULL,
0xe7e7000000e7e7e7ULL, 0x3b3b0000003b3b3bULL, 0xfefe000000fefefeULL,
0x7f7f0000007f7f7fULL, 0xc5c5000000c5c5c5ULL, 0xa4a4000000a4a4a4ULL,
0x3737000000373737ULL, 0xb1b1000000b1b1b1ULL, 0x4c4c0000004c4c4cULL,
0x9191000000919191ULL, 0x6e6e0000006e6e6eULL, 0x8d8d0000008d8d8dULL,
0x7676000000767676ULL, 0x0303000000030303ULL, 0x2d2d0000002d2d2dULL,
0xdede000000dededeULL, 0x9696000000969696ULL, 0x2626000000262626ULL,
0x7d7d0000007d7d7dULL, 0xc6c6000000c6c6c6ULL, 0x5c5c0000005c5c5cULL,
0xd3d3000000d3d3d3ULL, 0xf2f2000000f2f2f2ULL, 0x4f4f0000004f4f4fULL,
0x1919000000191919ULL, 0x3f3f0000003f3f3fULL, 0xdcdc000000dcdcdcULL,
0x7979000000797979ULL, 0x1d1d0000001d1d1dULL, 0x5252000000525252ULL,
0xebeb000000ebebebULL, 0xf3f3000000f3f3f3ULL, 0x6d6d0000006d6d6dULL,
0x5e5e0000005e5e5eULL, 0xfbfb000000fbfbfbULL, 0x6969000000696969ULL,
0xb2b2000000b2b2b2ULL, 0xf0f0000000f0f0f0ULL, 0x3131000000313131ULL,
0x0c0c0000000c0c0cULL, 0xd4d4000000d4d4d4ULL, 0xcfcf000000cfcfcfULL,
0x8c8c0000008c8c8cULL, 0xe2e2000000e2e2e2ULL, 0x7575000000757575ULL,
0xa9a9000000a9a9a9ULL, 0x4a4a0000004a4a4aULL, 0x5757000000575757ULL,
0x8484000000848484ULL, 0x1111000000111111ULL, 0x4545000000454545ULL,
0x1b1b0000001b1b1bULL, 0xf5f5000000f5f5f5ULL, 0xe4e4000000e4e4e4ULL,
0x0e0e0000000e0e0eULL, 0x7373000000737373ULL, 0xaaaa000000aaaaaaULL,
0xf1f1000000f1f1f1ULL, 0xdddd000000ddddddULL, 0x5959000000595959ULL,
0x1414000000141414ULL, 0x6c6c0000006c6c6cULL, 0x9292000000929292ULL,
0x5454000000545454ULL, 0xd0d0000000d0d0d0ULL, 0x7878000000787878ULL,
0x7070000000707070ULL, 0xe3e3000000e3e3e3ULL, 0x4949000000494949ULL,
0x8080000000808080ULL, 0x5050000000505050ULL, 0xa7a7000000a7a7a7ULL,
0xf6f6000000f6f6f6ULL, 0x7777000000777777ULL, 0x9393000000939393ULL,
0x8686000000868686ULL, 0x8383000000838383ULL, 0x2a2a0000002a2a2aULL,
0xc7c7000000c7c7c7ULL, 0x5b5b0000005b5b5bULL, 0xe9e9000000e9e9e9ULL,
0xeeee000000eeeeeeULL, 0x8f8f0000008f8f8fULL, 0x0101000000010101ULL,
0x3d3d0000003d3d3dULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp03303033[256] = {
0x0038380038003838ULL, 0x0041410041004141ULL, 0x0016160016001616ULL,
0x0076760076007676ULL, 0x00d9d900d900d9d9ULL, 0x0093930093009393ULL,
0x0060600060006060ULL, 0x00f2f200f200f2f2ULL, 0x0072720072007272ULL,
0x00c2c200c200c2c2ULL, 0x00abab00ab00ababULL, 0x009a9a009a009a9aULL,
0x0075750075007575ULL, 0x0006060006000606ULL, 0x0057570057005757ULL,
0x00a0a000a000a0a0ULL, 0x0091910091009191ULL, 0x00f7f700f700f7f7ULL,
0x00b5b500b500b5b5ULL, 0x00c9c900c900c9c9ULL, 0x00a2a200a200a2a2ULL,
0x008c8c008c008c8cULL, 0x00d2d200d200d2d2ULL, 0x0090900090009090ULL,
0x00f6f600f600f6f6ULL, 0x0007070007000707ULL, 0x00a7a700a700a7a7ULL,
0x0027270027002727ULL, 0x008e8e008e008e8eULL, 0x00b2b200b200b2b2ULL,
0x0049490049004949ULL, 0x00dede00de00dedeULL, 0x0043430043004343ULL,
0x005c5c005c005c5cULL, 0x00d7d700d700d7d7ULL, 0x00c7c700c700c7c7ULL,
0x003e3e003e003e3eULL, 0x00f5f500f500f5f5ULL, 0x008f8f008f008f8fULL,
0x0067670067006767ULL, 0x001f1f001f001f1fULL, 0x0018180018001818ULL,
0x006e6e006e006e6eULL, 0x00afaf00af00afafULL, 0x002f2f002f002f2fULL,
0x00e2e200e200e2e2ULL, 0x0085850085008585ULL, 0x000d0d000d000d0dULL,
0x0053530053005353ULL, 0x00f0f000f000f0f0ULL, 0x009c9c009c009c9cULL,
0x0065650065006565ULL, 0x00eaea00ea00eaeaULL, 0x00a3a300a300a3a3ULL,
0x00aeae00ae00aeaeULL, 0x009e9e009e009e9eULL, 0x00ecec00ec00ececULL,
0x0080800080008080ULL, 0x002d2d002d002d2dULL, 0x006b6b006b006b6bULL,
0x00a8a800a800a8a8ULL, 0x002b2b002b002b2bULL, 0x0036360036003636ULL,
0x00a6a600a600a6a6ULL, 0x00c5c500c500c5c5ULL, 0x0086860086008686ULL,
0x004d4d004d004d4dULL, 0x0033330033003333ULL, 0x00fdfd00fd00fdfdULL,
0x0066660066006666ULL, 0x0058580058005858ULL, 0x0096960096009696ULL,
0x003a3a003a003a3aULL, 0x0009090009000909ULL, 0x0095950095009595ULL,
0x0010100010001010ULL, 0x0078780078007878ULL, 0x00d8d800d800d8d8ULL,
0x0042420042004242ULL, 0x00cccc00cc00ccccULL, 0x00efef00ef00efefULL,
0x0026260026002626ULL, 0x00e5e500e500e5e5ULL, 0x0061610061006161ULL,
0x001a1a001a001a1aULL, 0x003f3f003f003f3fULL, 0x003b3b003b003b3bULL,
0x0082820082008282ULL, 0x00b6b600b600b6b6ULL, 0x00dbdb00db00dbdbULL,
0x00d4d400d400d4d4ULL, 0x0098980098009898ULL, 0x00e8e800e800e8e8ULL,
0x008b8b008b008b8bULL, 0x0002020002000202ULL, 0x00ebeb00eb00ebebULL,
0x000a0a000a000a0aULL, 0x002c2c002c002c2cULL, 0x001d1d001d001d1dULL,
0x00b0b000b000b0b0ULL, 0x006f6f006f006f6fULL, 0x008d8d008d008d8dULL,
0x0088880088008888ULL, 0x000e0e000e000e0eULL, 0x0019190019001919ULL,
0x0087870087008787ULL, 0x004e4e004e004e4eULL, 0x000b0b000b000b0bULL,
0x00a9a900a900a9a9ULL, 0x000c0c000c000c0cULL, 0x0079790079007979ULL,
0x0011110011001111ULL, 0x007f7f007f007f7fULL, 0x0022220022002222ULL,
0x00e7e700e700e7e7ULL, 0x0059590059005959ULL, 0x00e1e100e100e1e1ULL,
0x00dada00da00dadaULL, 0x003d3d003d003d3dULL, 0x00c8c800c800c8c8ULL,
0x0012120012001212ULL, 0x0004040004000404ULL, 0x0074740074007474ULL,
0x0054540054005454ULL, 0x0030300030003030ULL, 0x007e7e007e007e7eULL,
0x00b4b400b400b4b4ULL, 0x0028280028002828ULL, 0x0055550055005555ULL,
0x0068680068006868ULL, 0x0050500050005050ULL, 0x00bebe00be00bebeULL,
0x00d0d000d000d0d0ULL, 0x00c4c400c400c4c4ULL, 0x0031310031003131ULL,
0x00cbcb00cb00cbcbULL, 0x002a2a002a002a2aULL, 0x00adad00ad00adadULL,
0x000f0f000f000f0fULL, 0x00caca00ca00cacaULL, 0x0070700070007070ULL,
0x00ffff00ff00ffffULL, 0x0032320032003232ULL, 0x0069690069006969ULL,
0x0008080008000808ULL, 0x0062620062006262ULL, 0x0000000000000000ULL,
0x0024240024002424ULL, 0x00d1d100d100d1d1ULL, 0x00fbfb00fb00fbfbULL,
0x00baba00ba00babaULL, 0x00eded00ed00ededULL, 0x0045450045004545ULL,
0x0081810081008181ULL, 0x0073730073007373ULL, 0x006d6d006d006d6dULL,
0x0084840084008484ULL, 0x009f9f009f009f9fULL, 0x00eeee00ee00eeeeULL,
0x004a4a004a004a4aULL, 0x00c3c300c300c3c3ULL, 0x002e2e002e002e2eULL,
0x00c1c100c100c1c1ULL, 0x0001010001000101ULL, 0x00e6e600e600e6e6ULL,
0x0025250025002525ULL, 0x0048480048004848ULL, 0x0099990099009999ULL,
0x00b9b900b900b9b9ULL, 0x00b3b300b300b3b3ULL, 0x007b7b007b007b7bULL,
0x00f9f900f900f9f9ULL, 0x00cece00ce00ceceULL, 0x00bfbf00bf00bfbfULL,
0x00dfdf00df00dfdfULL, 0x0071710071007171ULL, 0x0029290029002929ULL,
0x00cdcd00cd00cdcdULL, 0x006c6c006c006c6cULL, 0x0013130013001313ULL,
0x0064640064006464ULL, 0x009b9b009b009b9bULL, 0x0063630063006363ULL,
0x009d9d009d009d9dULL, 0x00c0c000c000c0c0ULL, 0x004b4b004b004b4bULL,
0x00b7b700b700b7b7ULL, 0x00a5a500a500a5a5ULL, 0x0089890089008989ULL,
0x005f5f005f005f5fULL, 0x00b1b100b100b1b1ULL, 0x0017170017001717ULL,
0x00f4f400f400f4f4ULL, 0x00bcbc00bc00bcbcULL, 0x00d3d300d300d3d3ULL,
0x0046460046004646ULL, 0x00cfcf00cf00cfcfULL, 0x0037370037003737ULL,
0x005e5e005e005e5eULL, 0x0047470047004747ULL, 0x0094940094009494ULL,
0x00fafa00fa00fafaULL, 0x00fcfc00fc00fcfcULL, 0x005b5b005b005b5bULL,
0x0097970097009797ULL, 0x00fefe00fe00fefeULL, 0x005a5a005a005a5aULL,
0x00acac00ac00acacULL, 0x003c3c003c003c3cULL, 0x004c4c004c004c4cULL,
0x0003030003000303ULL, 0x0035350035003535ULL, 0x00f3f300f300f3f3ULL,
0x0023230023002323ULL, 0x00b8b800b800b8b8ULL, 0x005d5d005d005d5dULL,
0x006a6a006a006a6aULL, 0x0092920092009292ULL, 0x00d5d500d500d5d5ULL,
0x0021210021002121ULL, 0x0044440044004444ULL, 0x0051510051005151ULL,
0x00c6c600c600c6c6ULL, 0x007d7d007d007d7dULL, 0x0039390039003939ULL,
0x0083830083008383ULL, 0x00dcdc00dc00dcdcULL, 0x00aaaa00aa00aaaaULL,
0x007c7c007c007c7cULL, 0x0077770077007777ULL, 0x0056560056005656ULL,
0x0005050005000505ULL, 0x001b1b001b001b1bULL, 0x00a4a400a400a4a4ULL,
0x0015150015001515ULL, 0x0034340034003434ULL, 0x001e1e001e001e1eULL,
0x001c1c001c001c1cULL, 0x00f8f800f800f8f8ULL, 0x0052520052005252ULL,
0x0020200020002020ULL, 0x0014140014001414ULL, 0x00e9e900e900e9e9ULL,
0x00bdbd00bd00bdbdULL, 0x00dddd00dd00ddddULL, 0x00e4e400e400e4e4ULL,
0x00a1a100a100a1a1ULL, 0x00e0e000e000e0e0ULL, 0x008a8a008a008a8aULL,
0x00f1f100f100f1f1ULL, 0x00d6d600d600d6d6ULL, 0x007a7a007a007a7aULL,
0x00bbbb00bb00bbbbULL, 0x00e3e300e300e3e3ULL, 0x0040400040004040ULL,
0x004f4f004f004f4fULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp00444404[256] = {
0x0000707070700070ULL, 0x00002c2c2c2c002cULL, 0x0000b3b3b3b300b3ULL,
0x0000c0c0c0c000c0ULL, 0x0000e4e4e4e400e4ULL, 0x0000575757570057ULL,
0x0000eaeaeaea00eaULL, 0x0000aeaeaeae00aeULL, 0x0000232323230023ULL,
0x00006b6b6b6b006bULL, 0x0000454545450045ULL, 0x0000a5a5a5a500a5ULL,
0x0000edededed00edULL, 0x00004f4f4f4f004fULL, 0x00001d1d1d1d001dULL,
0x0000929292920092ULL, 0x0000868686860086ULL, 0x0000afafafaf00afULL,
0x00007c7c7c7c007cULL, 0x00001f1f1f1f001fULL, 0x00003e3e3e3e003eULL,
0x0000dcdcdcdc00dcULL, 0x00005e5e5e5e005eULL, 0x00000b0b0b0b000bULL,
0x0000a6a6a6a600a6ULL, 0x0000393939390039ULL, 0x0000d5d5d5d500d5ULL,
0x00005d5d5d5d005dULL, 0x0000d9d9d9d900d9ULL, 0x00005a5a5a5a005aULL,
0x0000515151510051ULL, 0x00006c6c6c6c006cULL, 0x00008b8b8b8b008bULL,
0x00009a9a9a9a009aULL, 0x0000fbfbfbfb00fbULL, 0x0000b0b0b0b000b0ULL,
0x0000747474740074ULL, 0x00002b2b2b2b002bULL, 0x0000f0f0f0f000f0ULL,
0x0000848484840084ULL, 0x0000dfdfdfdf00dfULL, 0x0000cbcbcbcb00cbULL,
0x0000343434340034ULL, 0x0000767676760076ULL, 0x00006d6d6d6d006dULL,
0x0000a9a9a9a900a9ULL, 0x0000d1d1d1d100d1ULL, 0x0000040404040004ULL,
0x0000141414140014ULL, 0x00003a3a3a3a003aULL, 0x0000dededede00deULL,
0x0000111111110011ULL, 0x0000323232320032ULL, 0x00009c9c9c9c009cULL,
0x0000535353530053ULL, 0x0000f2f2f2f200f2ULL, 0x0000fefefefe00feULL,
0x0000cfcfcfcf00cfULL, 0x0000c3c3c3c300c3ULL, 0x00007a7a7a7a007aULL,
0x0000242424240024ULL, 0x0000e8e8e8e800e8ULL, 0x0000606060600060ULL,
0x0000696969690069ULL, 0x0000aaaaaaaa00aaULL, 0x0000a0a0a0a000a0ULL,
0x0000a1a1a1a100a1ULL, 0x0000626262620062ULL, 0x0000545454540054ULL,
0x00001e1e1e1e001eULL, 0x0000e0e0e0e000e0ULL, 0x0000646464640064ULL,
0x0000101010100010ULL, 0x0000000000000000ULL, 0x0000a3a3a3a300a3ULL,
0x0000757575750075ULL, 0x00008a8a8a8a008aULL, 0x0000e6e6e6e600e6ULL,
0x0000090909090009ULL, 0x0000dddddddd00ddULL, 0x0000878787870087ULL,
0x0000838383830083ULL, 0x0000cdcdcdcd00cdULL, 0x0000909090900090ULL,
0x0000737373730073ULL, 0x0000f6f6f6f600f6ULL, 0x00009d9d9d9d009dULL,
0x0000bfbfbfbf00bfULL, 0x0000525252520052ULL, 0x0000d8d8d8d800d8ULL,
0x0000c8c8c8c800c8ULL, 0x0000c6c6c6c600c6ULL, 0x0000818181810081ULL,
0x00006f6f6f6f006fULL, 0x0000131313130013ULL, 0x0000636363630063ULL,
0x0000e9e9e9e900e9ULL, 0x0000a7a7a7a700a7ULL, 0x00009f9f9f9f009fULL,
0x0000bcbcbcbc00bcULL, 0x0000292929290029ULL, 0x0000f9f9f9f900f9ULL,
0x00002f2f2f2f002fULL, 0x0000b4b4b4b400b4ULL, 0x0000787878780078ULL,
0x0000060606060006ULL, 0x0000e7e7e7e700e7ULL, 0x0000717171710071ULL,
0x0000d4d4d4d400d4ULL, 0x0000abababab00abULL, 0x0000888888880088ULL,
0x00008d8d8d8d008dULL, 0x0000727272720072ULL, 0x0000b9b9b9b900b9ULL,
0x0000f8f8f8f800f8ULL, 0x0000acacacac00acULL, 0x0000363636360036ULL,
0x00002a2a2a2a002aULL, 0x00003c3c3c3c003cULL, 0x0000f1f1f1f100f1ULL,
0x0000404040400040ULL, 0x0000d3d3d3d300d3ULL, 0x0000bbbbbbbb00bbULL,
0x0000434343430043ULL, 0x0000151515150015ULL, 0x0000adadadad00adULL,
0x0000777777770077ULL, 0x0000808080800080ULL, 0x0000828282820082ULL,
0x0000ecececec00ecULL, 0x0000272727270027ULL, 0x0000e5e5e5e500e5ULL,
0x0000858585850085ULL, 0x0000353535350035ULL, 0x00000c0c0c0c000cULL,
0x0000414141410041ULL, 0x0000efefefef00efULL, 0x0000939393930093ULL,
0x0000191919190019ULL, 0x0000212121210021ULL, 0x00000e0e0e0e000eULL,
0x00004e4e4e4e004eULL, 0x0000656565650065ULL, 0x0000bdbdbdbd00bdULL,
0x0000b8b8b8b800b8ULL, 0x00008f8f8f8f008fULL, 0x0000ebebebeb00ebULL,
0x0000cececece00ceULL, 0x0000303030300030ULL, 0x00005f5f5f5f005fULL,
0x0000c5c5c5c500c5ULL, 0x00001a1a1a1a001aULL, 0x0000e1e1e1e100e1ULL,
0x0000cacacaca00caULL, 0x0000474747470047ULL, 0x00003d3d3d3d003dULL,
0x0000010101010001ULL, 0x0000d6d6d6d600d6ULL, 0x0000565656560056ULL,
0x00004d4d4d4d004dULL, 0x00000d0d0d0d000dULL, 0x0000666666660066ULL,
0x0000cccccccc00ccULL, 0x00002d2d2d2d002dULL, 0x0000121212120012ULL,
0x0000202020200020ULL, 0x0000b1b1b1b100b1ULL, 0x0000999999990099ULL,
0x00004c4c4c4c004cULL, 0x0000c2c2c2c200c2ULL, 0x00007e7e7e7e007eULL,
0x0000050505050005ULL, 0x0000b7b7b7b700b7ULL, 0x0000313131310031ULL,
0x0000171717170017ULL, 0x0000d7d7d7d700d7ULL, 0x0000585858580058ULL,
0x0000616161610061ULL, 0x00001b1b1b1b001bULL, 0x00001c1c1c1c001cULL,
0x00000f0f0f0f000fULL, 0x0000161616160016ULL, 0x0000181818180018ULL,
0x0000222222220022ULL, 0x0000444444440044ULL, 0x0000b2b2b2b200b2ULL,
0x0000b5b5b5b500b5ULL, 0x0000919191910091ULL, 0x0000080808080008ULL,
0x0000a8a8a8a800a8ULL, 0x0000fcfcfcfc00fcULL, 0x0000505050500050ULL,
0x0000d0d0d0d000d0ULL, 0x00007d7d7d7d007dULL, 0x0000898989890089ULL,
0x0000979797970097ULL, 0x00005b5b5b5b005bULL, 0x0000959595950095ULL,
0x0000ffffffff00ffULL, 0x0000d2d2d2d200d2ULL, 0x0000c4c4c4c400c4ULL,
0x0000484848480048ULL, 0x0000f7f7f7f700f7ULL, 0x0000dbdbdbdb00dbULL,
0x0000030303030003ULL, 0x0000dadadada00daULL, 0x00003f3f3f3f003fULL,
0x0000949494940094ULL, 0x00005c5c5c5c005cULL, 0x0000020202020002ULL,
0x00004a4a4a4a004aULL, 0x0000333333330033ULL, 0x0000676767670067ULL,
0x0000f3f3f3f300f3ULL, 0x00007f7f7f7f007fULL, 0x0000e2e2e2e200e2ULL,
0x00009b9b9b9b009bULL, 0x0000262626260026ULL, 0x0000373737370037ULL,
0x00003b3b3b3b003bULL, 0x0000969696960096ULL, 0x00004b4b4b4b004bULL,
0x0000bebebebe00beULL, 0x00002e2e2e2e002eULL, 0x0000797979790079ULL,
0x00008c8c8c8c008cULL, 0x00006e6e6e6e006eULL, 0x00008e8e8e8e008eULL,
0x0000f5f5f5f500f5ULL, 0x0000b6b6b6b600b6ULL, 0x0000fdfdfdfd00fdULL,
0x0000595959590059ULL, 0x0000989898980098ULL, 0x00006a6a6a6a006aULL,
0x0000464646460046ULL, 0x0000babababa00baULL, 0x0000252525250025ULL,
0x0000424242420042ULL, 0x0000a2a2a2a200a2ULL, 0x0000fafafafa00faULL,
0x0000070707070007ULL, 0x0000555555550055ULL, 0x0000eeeeeeee00eeULL,
0x00000a0a0a0a000aULL, 0x0000494949490049ULL, 0x0000686868680068ULL,
0x0000383838380038ULL, 0x0000a4a4a4a400a4ULL, 0x0000282828280028ULL,
0x00007b7b7b7b007bULL, 0x0000c9c9c9c900c9ULL, 0x0000c1c1c1c100c1ULL,
0x0000e3e3e3e300e3ULL, 0x0000f4f4f4f400f4ULL, 0x0000c7c7c7c700c7ULL,
0x00009e9e9e9e009eULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp02220222[256] = {
0x00e0e0e000e0e0e0ULL, 0x0005050500050505ULL, 0x0058585800585858ULL,
0x00d9d9d900d9d9d9ULL, 0x0067676700676767ULL, 0x004e4e4e004e4e4eULL,
0x0081818100818181ULL, 0x00cbcbcb00cbcbcbULL, 0x00c9c9c900c9c9c9ULL,
0x000b0b0b000b0b0bULL, 0x00aeaeae00aeaeaeULL, 0x006a6a6a006a6a6aULL,
0x00d5d5d500d5d5d5ULL, 0x0018181800181818ULL, 0x005d5d5d005d5d5dULL,
0x0082828200828282ULL, 0x0046464600464646ULL, 0x00dfdfdf00dfdfdfULL,
0x00d6d6d600d6d6d6ULL, 0x0027272700272727ULL, 0x008a8a8a008a8a8aULL,
0x0032323200323232ULL, 0x004b4b4b004b4b4bULL, 0x0042424200424242ULL,
0x00dbdbdb00dbdbdbULL, 0x001c1c1c001c1c1cULL, 0x009e9e9e009e9e9eULL,
0x009c9c9c009c9c9cULL, 0x003a3a3a003a3a3aULL, 0x00cacaca00cacacaULL,
0x0025252500252525ULL, 0x007b7b7b007b7b7bULL, 0x000d0d0d000d0d0dULL,
0x0071717100717171ULL, 0x005f5f5f005f5f5fULL, 0x001f1f1f001f1f1fULL,
0x00f8f8f800f8f8f8ULL, 0x00d7d7d700d7d7d7ULL, 0x003e3e3e003e3e3eULL,
0x009d9d9d009d9d9dULL, 0x007c7c7c007c7c7cULL, 0x0060606000606060ULL,
0x00b9b9b900b9b9b9ULL, 0x00bebebe00bebebeULL, 0x00bcbcbc00bcbcbcULL,
0x008b8b8b008b8b8bULL, 0x0016161600161616ULL, 0x0034343400343434ULL,
0x004d4d4d004d4d4dULL, 0x00c3c3c300c3c3c3ULL, 0x0072727200727272ULL,
0x0095959500959595ULL, 0x00ababab00abababULL, 0x008e8e8e008e8e8eULL,
0x00bababa00bababaULL, 0x007a7a7a007a7a7aULL, 0x00b3b3b300b3b3b3ULL,
0x0002020200020202ULL, 0x00b4b4b400b4b4b4ULL, 0x00adadad00adadadULL,
0x00a2a2a200a2a2a2ULL, 0x00acacac00acacacULL, 0x00d8d8d800d8d8d8ULL,
0x009a9a9a009a9a9aULL, 0x0017171700171717ULL, 0x001a1a1a001a1a1aULL,
0x0035353500353535ULL, 0x00cccccc00ccccccULL, 0x00f7f7f700f7f7f7ULL,
0x0099999900999999ULL, 0x0061616100616161ULL, 0x005a5a5a005a5a5aULL,
0x00e8e8e800e8e8e8ULL, 0x0024242400242424ULL, 0x0056565600565656ULL,
0x0040404000404040ULL, 0x00e1e1e100e1e1e1ULL, 0x0063636300636363ULL,
0x0009090900090909ULL, 0x0033333300333333ULL, 0x00bfbfbf00bfbfbfULL,
0x0098989800989898ULL, 0x0097979700979797ULL, 0x0085858500858585ULL,
0x0068686800686868ULL, 0x00fcfcfc00fcfcfcULL, 0x00ececec00ecececULL,
0x000a0a0a000a0a0aULL, 0x00dadada00dadadaULL, 0x006f6f6f006f6f6fULL,
0x0053535300535353ULL, 0x0062626200626262ULL, 0x00a3a3a300a3a3a3ULL,
0x002e2e2e002e2e2eULL, 0x0008080800080808ULL, 0x00afafaf00afafafULL,
0x0028282800282828ULL, 0x00b0b0b000b0b0b0ULL, 0x0074747400747474ULL,
0x00c2c2c200c2c2c2ULL, 0x00bdbdbd00bdbdbdULL, 0x0036363600363636ULL,
0x0022222200222222ULL, 0x0038383800383838ULL, 0x0064646400646464ULL,
0x001e1e1e001e1e1eULL, 0x0039393900393939ULL, 0x002c2c2c002c2c2cULL,
0x00a6a6a600a6a6a6ULL, 0x0030303000303030ULL, 0x00e5e5e500e5e5e5ULL,
0x0044444400444444ULL, 0x00fdfdfd00fdfdfdULL, 0x0088888800888888ULL,
0x009f9f9f009f9f9fULL, 0x0065656500656565ULL, 0x0087878700878787ULL,
0x006b6b6b006b6b6bULL, 0x00f4f4f400f4f4f4ULL, 0x0023232300232323ULL,
0x0048484800484848ULL, 0x0010101000101010ULL, 0x00d1d1d100d1d1d1ULL,
0x0051515100515151ULL, 0x00c0c0c000c0c0c0ULL, 0x00f9f9f900f9f9f9ULL,
0x00d2d2d200d2d2d2ULL, 0x00a0a0a000a0a0a0ULL, 0x0055555500555555ULL,
0x00a1a1a100a1a1a1ULL, 0x0041414100414141ULL, 0x00fafafa00fafafaULL,
0x0043434300434343ULL, 0x0013131300131313ULL, 0x00c4c4c400c4c4c4ULL,
0x002f2f2f002f2f2fULL, 0x00a8a8a800a8a8a8ULL, 0x00b6b6b600b6b6b6ULL,
0x003c3c3c003c3c3cULL, 0x002b2b2b002b2b2bULL, 0x00c1c1c100c1c1c1ULL,
0x00ffffff00ffffffULL, 0x00c8c8c800c8c8c8ULL, 0x00a5a5a500a5a5a5ULL,
0x0020202000202020ULL, 0x0089898900898989ULL, 0x0000000000000000ULL,
0x0090909000909090ULL, 0x0047474700474747ULL, 0x00efefef00efefefULL,
0x00eaeaea00eaeaeaULL, 0x00b7b7b700b7b7b7ULL, 0x0015151500151515ULL,
0x0006060600060606ULL, 0x00cdcdcd00cdcdcdULL, 0x00b5b5b500b5b5b5ULL,
0x0012121200121212ULL, 0x007e7e7e007e7e7eULL, 0x00bbbbbb00bbbbbbULL,
0x0029292900292929ULL, 0x000f0f0f000f0f0fULL, 0x00b8b8b800b8b8b8ULL,
0x0007070700070707ULL, 0x0004040400040404ULL, 0x009b9b9b009b9b9bULL,
0x0094949400949494ULL, 0x0021212100212121ULL, 0x0066666600666666ULL,
0x00e6e6e600e6e6e6ULL, 0x00cecece00cececeULL, 0x00ededed00edededULL,
0x00e7e7e700e7e7e7ULL, 0x003b3b3b003b3b3bULL, 0x00fefefe00fefefeULL,
0x007f7f7f007f7f7fULL, 0x00c5c5c500c5c5c5ULL, 0x00a4a4a400a4a4a4ULL,
0x0037373700373737ULL, 0x00b1b1b100b1b1b1ULL, 0x004c4c4c004c4c4cULL,
0x0091919100919191ULL, 0x006e6e6e006e6e6eULL, 0x008d8d8d008d8d8dULL,
0x0076767600767676ULL, 0x0003030300030303ULL, 0x002d2d2d002d2d2dULL,
0x00dedede00dededeULL, 0x0096969600969696ULL, 0x0026262600262626ULL,
0x007d7d7d007d7d7dULL, 0x00c6c6c600c6c6c6ULL, 0x005c5c5c005c5c5cULL,
0x00d3d3d300d3d3d3ULL, 0x00f2f2f200f2f2f2ULL, 0x004f4f4f004f4f4fULL,
0x0019191900191919ULL, 0x003f3f3f003f3f3fULL, 0x00dcdcdc00dcdcdcULL,
0x0079797900797979ULL, 0x001d1d1d001d1d1dULL, 0x0052525200525252ULL,
0x00ebebeb00ebebebULL, 0x00f3f3f300f3f3f3ULL, 0x006d6d6d006d6d6dULL,
0x005e5e5e005e5e5eULL, 0x00fbfbfb00fbfbfbULL, 0x0069696900696969ULL,
0x00b2b2b200b2b2b2ULL, 0x00f0f0f000f0f0f0ULL, 0x0031313100313131ULL,
0x000c0c0c000c0c0cULL, 0x00d4d4d400d4d4d4ULL, 0x00cfcfcf00cfcfcfULL,
0x008c8c8c008c8c8cULL, 0x00e2e2e200e2e2e2ULL, 0x0075757500757575ULL,
0x00a9a9a900a9a9a9ULL, 0x004a4a4a004a4a4aULL, 0x0057575700575757ULL,
0x0084848400848484ULL, 0x0011111100111111ULL, 0x0045454500454545ULL,
0x001b1b1b001b1b1bULL, 0x00f5f5f500f5f5f5ULL, 0x00e4e4e400e4e4e4ULL,
0x000e0e0e000e0e0eULL, 0x0073737300737373ULL, 0x00aaaaaa00aaaaaaULL,
0x00f1f1f100f1f1f1ULL, 0x00dddddd00ddddddULL, 0x0059595900595959ULL,
0x0014141400141414ULL, 0x006c6c6c006c6c6cULL, 0x0092929200929292ULL,
0x0054545400545454ULL, 0x00d0d0d000d0d0d0ULL, 0x0078787800787878ULL,
0x0070707000707070ULL, 0x00e3e3e300e3e3e3ULL, 0x0049494900494949ULL,
0x0080808000808080ULL, 0x0050505000505050ULL, 0x00a7a7a700a7a7a7ULL,
0x00f6f6f600f6f6f6ULL, 0x0077777700777777ULL, 0x0093939300939393ULL,
0x0086868600868686ULL, 0x0083838300838383ULL, 0x002a2a2a002a2a2aULL,
0x00c7c7c700c7c7c7ULL, 0x005b5b5b005b5b5bULL, 0x00e9e9e900e9e9e9ULL,
0x00eeeeee00eeeeeeULL, 0x008f8f8f008f8f8fULL, 0x0001010100010101ULL,
0x003d3d3d003d3d3dULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp30333033[256] = {
0x3800383838003838ULL, 0x4100414141004141ULL, 0x1600161616001616ULL,
0x7600767676007676ULL, 0xd900d9d9d900d9d9ULL, 0x9300939393009393ULL,
0x6000606060006060ULL, 0xf200f2f2f200f2f2ULL, 0x7200727272007272ULL,
0xc200c2c2c200c2c2ULL, 0xab00ababab00ababULL, 0x9a009a9a9a009a9aULL,
0x7500757575007575ULL, 0x0600060606000606ULL, 0x5700575757005757ULL,
0xa000a0a0a000a0a0ULL, 0x9100919191009191ULL, 0xf700f7f7f700f7f7ULL,
0xb500b5b5b500b5b5ULL, 0xc900c9c9c900c9c9ULL, 0xa200a2a2a200a2a2ULL,
0x8c008c8c8c008c8cULL, 0xd200d2d2d200d2d2ULL, 0x9000909090009090ULL,
0xf600f6f6f600f6f6ULL, 0x0700070707000707ULL, 0xa700a7a7a700a7a7ULL,
0x2700272727002727ULL, 0x8e008e8e8e008e8eULL, 0xb200b2b2b200b2b2ULL,
0x4900494949004949ULL, 0xde00dedede00dedeULL, 0x4300434343004343ULL,
0x5c005c5c5c005c5cULL, 0xd700d7d7d700d7d7ULL, 0xc700c7c7c700c7c7ULL,
0x3e003e3e3e003e3eULL, 0xf500f5f5f500f5f5ULL, 0x8f008f8f8f008f8fULL,
0x6700676767006767ULL, 0x1f001f1f1f001f1fULL, 0x1800181818001818ULL,
0x6e006e6e6e006e6eULL, 0xaf00afafaf00afafULL, 0x2f002f2f2f002f2fULL,
0xe200e2e2e200e2e2ULL, 0x8500858585008585ULL, 0x0d000d0d0d000d0dULL,
0x5300535353005353ULL, 0xf000f0f0f000f0f0ULL, 0x9c009c9c9c009c9cULL,
0x6500656565006565ULL, 0xea00eaeaea00eaeaULL, 0xa300a3a3a300a3a3ULL,
0xae00aeaeae00aeaeULL, 0x9e009e9e9e009e9eULL, 0xec00ececec00ececULL,
0x8000808080008080ULL, 0x2d002d2d2d002d2dULL, 0x6b006b6b6b006b6bULL,
0xa800a8a8a800a8a8ULL, 0x2b002b2b2b002b2bULL, 0x3600363636003636ULL,
0xa600a6a6a600a6a6ULL, 0xc500c5c5c500c5c5ULL, 0x8600868686008686ULL,
0x4d004d4d4d004d4dULL, 0x3300333333003333ULL, 0xfd00fdfdfd00fdfdULL,
0x6600666666006666ULL, 0x5800585858005858ULL, 0x9600969696009696ULL,
0x3a003a3a3a003a3aULL, 0x0900090909000909ULL, 0x9500959595009595ULL,
0x1000101010001010ULL, 0x7800787878007878ULL, 0xd800d8d8d800d8d8ULL,
0x4200424242004242ULL, 0xcc00cccccc00ccccULL, 0xef00efefef00efefULL,
0x2600262626002626ULL, 0xe500e5e5e500e5e5ULL, 0x6100616161006161ULL,
0x1a001a1a1a001a1aULL, 0x3f003f3f3f003f3fULL, 0x3b003b3b3b003b3bULL,
0x8200828282008282ULL, 0xb600b6b6b600b6b6ULL, 0xdb00dbdbdb00dbdbULL,
0xd400d4d4d400d4d4ULL, 0x9800989898009898ULL, 0xe800e8e8e800e8e8ULL,
0x8b008b8b8b008b8bULL, 0x0200020202000202ULL, 0xeb00ebebeb00ebebULL,
0x0a000a0a0a000a0aULL, 0x2c002c2c2c002c2cULL, 0x1d001d1d1d001d1dULL,
0xb000b0b0b000b0b0ULL, 0x6f006f6f6f006f6fULL, 0x8d008d8d8d008d8dULL,
0x8800888888008888ULL, 0x0e000e0e0e000e0eULL, 0x1900191919001919ULL,
0x8700878787008787ULL, 0x4e004e4e4e004e4eULL, 0x0b000b0b0b000b0bULL,
0xa900a9a9a900a9a9ULL, 0x0c000c0c0c000c0cULL, 0x7900797979007979ULL,
0x1100111111001111ULL, 0x7f007f7f7f007f7fULL, 0x2200222222002222ULL,
0xe700e7e7e700e7e7ULL, 0x5900595959005959ULL, 0xe100e1e1e100e1e1ULL,
0xda00dadada00dadaULL, 0x3d003d3d3d003d3dULL, 0xc800c8c8c800c8c8ULL,
0x1200121212001212ULL, 0x0400040404000404ULL, 0x7400747474007474ULL,
0x5400545454005454ULL, 0x3000303030003030ULL, 0x7e007e7e7e007e7eULL,
0xb400b4b4b400b4b4ULL, 0x2800282828002828ULL, 0x5500555555005555ULL,
0x6800686868006868ULL, 0x5000505050005050ULL, 0xbe00bebebe00bebeULL,
0xd000d0d0d000d0d0ULL, 0xc400c4c4c400c4c4ULL, 0x3100313131003131ULL,
0xcb00cbcbcb00cbcbULL, 0x2a002a2a2a002a2aULL, 0xad00adadad00adadULL,
0x0f000f0f0f000f0fULL, 0xca00cacaca00cacaULL, 0x7000707070007070ULL,
0xff00ffffff00ffffULL, 0x3200323232003232ULL, 0x6900696969006969ULL,
0x0800080808000808ULL, 0x6200626262006262ULL, 0x0000000000000000ULL,
0x2400242424002424ULL, 0xd100d1d1d100d1d1ULL, 0xfb00fbfbfb00fbfbULL,
0xba00bababa00babaULL, 0xed00ededed00ededULL, 0x4500454545004545ULL,
0x8100818181008181ULL, 0x7300737373007373ULL, 0x6d006d6d6d006d6dULL,
0x8400848484008484ULL, 0x9f009f9f9f009f9fULL, 0xee00eeeeee00eeeeULL,
0x4a004a4a4a004a4aULL, 0xc300c3c3c300c3c3ULL, 0x2e002e2e2e002e2eULL,
0xc100c1c1c100c1c1ULL, 0x0100010101000101ULL, 0xe600e6e6e600e6e6ULL,
0x2500252525002525ULL, 0x4800484848004848ULL, 0x9900999999009999ULL,
0xb900b9b9b900b9b9ULL, 0xb300b3b3b300b3b3ULL, 0x7b007b7b7b007b7bULL,
0xf900f9f9f900f9f9ULL, 0xce00cecece00ceceULL, 0xbf00bfbfbf00bfbfULL,
0xdf00dfdfdf00dfdfULL, 0x7100717171007171ULL, 0x2900292929002929ULL,
0xcd00cdcdcd00cdcdULL, 0x6c006c6c6c006c6cULL, 0x1300131313001313ULL,
0x6400646464006464ULL, 0x9b009b9b9b009b9bULL, 0x6300636363006363ULL,
0x9d009d9d9d009d9dULL, 0xc000c0c0c000c0c0ULL, 0x4b004b4b4b004b4bULL,
0xb700b7b7b700b7b7ULL, 0xa500a5a5a500a5a5ULL, 0x8900898989008989ULL,
0x5f005f5f5f005f5fULL, 0xb100b1b1b100b1b1ULL, 0x1700171717001717ULL,
0xf400f4f4f400f4f4ULL, 0xbc00bcbcbc00bcbcULL, 0xd300d3d3d300d3d3ULL,
0x4600464646004646ULL, 0xcf00cfcfcf00cfcfULL, 0x3700373737003737ULL,
0x5e005e5e5e005e5eULL, 0x4700474747004747ULL, 0x9400949494009494ULL,
0xfa00fafafa00fafaULL, 0xfc00fcfcfc00fcfcULL, 0x5b005b5b5b005b5bULL,
0x9700979797009797ULL, 0xfe00fefefe00fefeULL, 0x5a005a5a5a005a5aULL,
0xac00acacac00acacULL, 0x3c003c3c3c003c3cULL, 0x4c004c4c4c004c4cULL,
0x0300030303000303ULL, 0x3500353535003535ULL, 0xf300f3f3f300f3f3ULL,
0x2300232323002323ULL, 0xb800b8b8b800b8b8ULL, 0x5d005d5d5d005d5dULL,
0x6a006a6a6a006a6aULL, 0x9200929292009292ULL, 0xd500d5d5d500d5d5ULL,
0x2100212121002121ULL, 0x4400444444004444ULL, 0x5100515151005151ULL,
0xc600c6c6c600c6c6ULL, 0x7d007d7d7d007d7dULL, 0x3900393939003939ULL,
0x8300838383008383ULL, 0xdc00dcdcdc00dcdcULL, 0xaa00aaaaaa00aaaaULL,
0x7c007c7c7c007c7cULL, 0x7700777777007777ULL, 0x5600565656005656ULL,
0x0500050505000505ULL, 0x1b001b1b1b001b1bULL, 0xa400a4a4a400a4a4ULL,
0x1500151515001515ULL, 0x3400343434003434ULL, 0x1e001e1e1e001e1eULL,
0x1c001c1c1c001c1cULL, 0xf800f8f8f800f8f8ULL, 0x5200525252005252ULL,
0x2000202020002020ULL, 0x1400141414001414ULL, 0xe900e9e9e900e9e9ULL,
0xbd00bdbdbd00bdbdULL, 0xdd00dddddd00ddddULL, 0xe400e4e4e400e4e4ULL,
0xa100a1a1a100a1a1ULL, 0xe000e0e0e000e0e0ULL, 0x8a008a8a8a008a8aULL,
0xf100f1f1f100f1f1ULL, 0xd600d6d6d600d6d6ULL, 0x7a007a7a7a007a7aULL,
0xbb00bbbbbb00bbbbULL, 0xe300e3e3e300e3e3ULL, 0x4000404040004040ULL,
0x4f004f4f4f004f4fULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp44044404[256] = {
0x7070007070700070ULL, 0x2c2c002c2c2c002cULL, 0xb3b300b3b3b300b3ULL,
0xc0c000c0c0c000c0ULL, 0xe4e400e4e4e400e4ULL, 0x5757005757570057ULL,
0xeaea00eaeaea00eaULL, 0xaeae00aeaeae00aeULL, 0x2323002323230023ULL,
0x6b6b006b6b6b006bULL, 0x4545004545450045ULL, 0xa5a500a5a5a500a5ULL,
0xeded00ededed00edULL, 0x4f4f004f4f4f004fULL, 0x1d1d001d1d1d001dULL,
0x9292009292920092ULL, 0x8686008686860086ULL, 0xafaf00afafaf00afULL,
0x7c7c007c7c7c007cULL, 0x1f1f001f1f1f001fULL, 0x3e3e003e3e3e003eULL,
0xdcdc00dcdcdc00dcULL, 0x5e5e005e5e5e005eULL, 0x0b0b000b0b0b000bULL,
0xa6a600a6a6a600a6ULL, 0x3939003939390039ULL, 0xd5d500d5d5d500d5ULL,
0x5d5d005d5d5d005dULL, 0xd9d900d9d9d900d9ULL, 0x5a5a005a5a5a005aULL,
0x5151005151510051ULL, 0x6c6c006c6c6c006cULL, 0x8b8b008b8b8b008bULL,
0x9a9a009a9a9a009aULL, 0xfbfb00fbfbfb00fbULL, 0xb0b000b0b0b000b0ULL,
0x7474007474740074ULL, 0x2b2b002b2b2b002bULL, 0xf0f000f0f0f000f0ULL,
0x8484008484840084ULL, 0xdfdf00dfdfdf00dfULL, 0xcbcb00cbcbcb00cbULL,
0x3434003434340034ULL, 0x7676007676760076ULL, 0x6d6d006d6d6d006dULL,
0xa9a900a9a9a900a9ULL, 0xd1d100d1d1d100d1ULL, 0x0404000404040004ULL,
0x1414001414140014ULL, 0x3a3a003a3a3a003aULL, 0xdede00dedede00deULL,
0x1111001111110011ULL, 0x3232003232320032ULL, 0x9c9c009c9c9c009cULL,
0x5353005353530053ULL, 0xf2f200f2f2f200f2ULL, 0xfefe00fefefe00feULL,
0xcfcf00cfcfcf00cfULL, 0xc3c300c3c3c300c3ULL, 0x7a7a007a7a7a007aULL,
0x2424002424240024ULL, 0xe8e800e8e8e800e8ULL, 0x6060006060600060ULL,
0x6969006969690069ULL, 0xaaaa00aaaaaa00aaULL, 0xa0a000a0a0a000a0ULL,
0xa1a100a1a1a100a1ULL, 0x6262006262620062ULL, 0x5454005454540054ULL,
0x1e1e001e1e1e001eULL, 0xe0e000e0e0e000e0ULL, 0x6464006464640064ULL,
0x1010001010100010ULL, 0x0000000000000000ULL, 0xa3a300a3a3a300a3ULL,
0x7575007575750075ULL, 0x8a8a008a8a8a008aULL, 0xe6e600e6e6e600e6ULL,
0x0909000909090009ULL, 0xdddd00dddddd00ddULL, 0x8787008787870087ULL,
0x8383008383830083ULL, 0xcdcd00cdcdcd00cdULL, 0x9090009090900090ULL,
0x7373007373730073ULL, 0xf6f600f6f6f600f6ULL, 0x9d9d009d9d9d009dULL,
0xbfbf00bfbfbf00bfULL, 0x5252005252520052ULL, 0xd8d800d8d8d800d8ULL,
0xc8c800c8c8c800c8ULL, 0xc6c600c6c6c600c6ULL, 0x8181008181810081ULL,
0x6f6f006f6f6f006fULL, 0x1313001313130013ULL, 0x6363006363630063ULL,
0xe9e900e9e9e900e9ULL, 0xa7a700a7a7a700a7ULL, 0x9f9f009f9f9f009fULL,
0xbcbc00bcbcbc00bcULL, 0x2929002929290029ULL, 0xf9f900f9f9f900f9ULL,
0x2f2f002f2f2f002fULL, 0xb4b400b4b4b400b4ULL, 0x7878007878780078ULL,
0x0606000606060006ULL, 0xe7e700e7e7e700e7ULL, 0x7171007171710071ULL,
0xd4d400d4d4d400d4ULL, 0xabab00ababab00abULL, 0x8888008888880088ULL,
0x8d8d008d8d8d008dULL, 0x7272007272720072ULL, 0xb9b900b9b9b900b9ULL,
0xf8f800f8f8f800f8ULL, 0xacac00acacac00acULL, 0x3636003636360036ULL,
0x2a2a002a2a2a002aULL, 0x3c3c003c3c3c003cULL, 0xf1f100f1f1f100f1ULL,
0x4040004040400040ULL, 0xd3d300d3d3d300d3ULL, 0xbbbb00bbbbbb00bbULL,
0x4343004343430043ULL, 0x1515001515150015ULL, 0xadad00adadad00adULL,
0x7777007777770077ULL, 0x8080008080800080ULL, 0x8282008282820082ULL,
0xecec00ececec00ecULL, 0x2727002727270027ULL, 0xe5e500e5e5e500e5ULL,
0x8585008585850085ULL, 0x3535003535350035ULL, 0x0c0c000c0c0c000cULL,
0x4141004141410041ULL, 0xefef00efefef00efULL, 0x9393009393930093ULL,
0x1919001919190019ULL, 0x2121002121210021ULL, 0x0e0e000e0e0e000eULL,
0x4e4e004e4e4e004eULL, 0x6565006565650065ULL, 0xbdbd00bdbdbd00bdULL,
0xb8b800b8b8b800b8ULL, 0x8f8f008f8f8f008fULL, 0xebeb00ebebeb00ebULL,
0xcece00cecece00ceULL, 0x3030003030300030ULL, 0x5f5f005f5f5f005fULL,
0xc5c500c5c5c500c5ULL, 0x1a1a001a1a1a001aULL, 0xe1e100e1e1e100e1ULL,
0xcaca00cacaca00caULL, 0x4747004747470047ULL, 0x3d3d003d3d3d003dULL,
0x0101000101010001ULL, 0xd6d600d6d6d600d6ULL, 0x5656005656560056ULL,
0x4d4d004d4d4d004dULL, 0x0d0d000d0d0d000dULL, 0x6666006666660066ULL,
0xcccc00cccccc00ccULL, 0x2d2d002d2d2d002dULL, 0x1212001212120012ULL,
0x2020002020200020ULL, 0xb1b100b1b1b100b1ULL, 0x9999009999990099ULL,
0x4c4c004c4c4c004cULL, 0xc2c200c2c2c200c2ULL, 0x7e7e007e7e7e007eULL,
0x0505000505050005ULL, 0xb7b700b7b7b700b7ULL, 0x3131003131310031ULL,
0x1717001717170017ULL, 0xd7d700d7d7d700d7ULL, 0x5858005858580058ULL,
0x6161006161610061ULL, 0x1b1b001b1b1b001bULL, 0x1c1c001c1c1c001cULL,
0x0f0f000f0f0f000fULL, 0x1616001616160016ULL, 0x1818001818180018ULL,
0x2222002222220022ULL, 0x4444004444440044ULL, 0xb2b200b2b2b200b2ULL,
0xb5b500b5b5b500b5ULL, 0x9191009191910091ULL, 0x0808000808080008ULL,
0xa8a800a8a8a800a8ULL, 0xfcfc00fcfcfc00fcULL, 0x5050005050500050ULL,
0xd0d000d0d0d000d0ULL, 0x7d7d007d7d7d007dULL, 0x8989008989890089ULL,
0x9797009797970097ULL, 0x5b5b005b5b5b005bULL, 0x9595009595950095ULL,
0xffff00ffffff00ffULL, 0xd2d200d2d2d200d2ULL, 0xc4c400c4c4c400c4ULL,
0x4848004848480048ULL, 0xf7f700f7f7f700f7ULL, 0xdbdb00dbdbdb00dbULL,
0x0303000303030003ULL, 0xdada00dadada00daULL, 0x3f3f003f3f3f003fULL,
0x9494009494940094ULL, 0x5c5c005c5c5c005cULL, 0x0202000202020002ULL,
0x4a4a004a4a4a004aULL, 0x3333003333330033ULL, 0x6767006767670067ULL,
0xf3f300f3f3f300f3ULL, 0x7f7f007f7f7f007fULL, 0xe2e200e2e2e200e2ULL,
0x9b9b009b9b9b009bULL, 0x2626002626260026ULL, 0x3737003737370037ULL,
0x3b3b003b3b3b003bULL, 0x9696009696960096ULL, 0x4b4b004b4b4b004bULL,
0xbebe00bebebe00beULL, 0x2e2e002e2e2e002eULL, 0x7979007979790079ULL,
0x8c8c008c8c8c008cULL, 0x6e6e006e6e6e006eULL, 0x8e8e008e8e8e008eULL,
0xf5f500f5f5f500f5ULL, 0xb6b600b6b6b600b6ULL, 0xfdfd00fdfdfd00fdULL,
0x5959005959590059ULL, 0x9898009898980098ULL, 0x6a6a006a6a6a006aULL,
0x4646004646460046ULL, 0xbaba00bababa00baULL, 0x2525002525250025ULL,
0x4242004242420042ULL, 0xa2a200a2a2a200a2ULL, 0xfafa00fafafa00faULL,
0x0707000707070007ULL, 0x5555005555550055ULL, 0xeeee00eeeeee00eeULL,
0x0a0a000a0a0a000aULL, 0x4949004949490049ULL, 0x6868006868680068ULL,
0x3838003838380038ULL, 0xa4a400a4a4a400a4ULL, 0x2828002828280028ULL,
0x7b7b007b7b7b007bULL, 0xc9c900c9c9c900c9ULL, 0xc1c100c1c1c100c1ULL,
0xe3e300e3e3e300e3ULL, 0xf4f400f4f4f400f4ULL, 0xc7c700c7c7c700c7ULL,
0x9e9e009e9e9e009eULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
const u64 camellia_sp11101110[256] = {
0x7070700070707000ULL, 0x8282820082828200ULL, 0x2c2c2c002c2c2c00ULL,
0xececec00ececec00ULL, 0xb3b3b300b3b3b300ULL, 0x2727270027272700ULL,
0xc0c0c000c0c0c000ULL, 0xe5e5e500e5e5e500ULL, 0xe4e4e400e4e4e400ULL,
0x8585850085858500ULL, 0x5757570057575700ULL, 0x3535350035353500ULL,
0xeaeaea00eaeaea00ULL, 0x0c0c0c000c0c0c00ULL, 0xaeaeae00aeaeae00ULL,
0x4141410041414100ULL, 0x2323230023232300ULL, 0xefefef00efefef00ULL,
0x6b6b6b006b6b6b00ULL, 0x9393930093939300ULL, 0x4545450045454500ULL,
0x1919190019191900ULL, 0xa5a5a500a5a5a500ULL, 0x2121210021212100ULL,
0xededed00ededed00ULL, 0x0e0e0e000e0e0e00ULL, 0x4f4f4f004f4f4f00ULL,
0x4e4e4e004e4e4e00ULL, 0x1d1d1d001d1d1d00ULL, 0x6565650065656500ULL,
0x9292920092929200ULL, 0xbdbdbd00bdbdbd00ULL, 0x8686860086868600ULL,
0xb8b8b800b8b8b800ULL, 0xafafaf00afafaf00ULL, 0x8f8f8f008f8f8f00ULL,
0x7c7c7c007c7c7c00ULL, 0xebebeb00ebebeb00ULL, 0x1f1f1f001f1f1f00ULL,
0xcecece00cecece00ULL, 0x3e3e3e003e3e3e00ULL, 0x3030300030303000ULL,
0xdcdcdc00dcdcdc00ULL, 0x5f5f5f005f5f5f00ULL, 0x5e5e5e005e5e5e00ULL,
0xc5c5c500c5c5c500ULL, 0x0b0b0b000b0b0b00ULL, 0x1a1a1a001a1a1a00ULL,
0xa6a6a600a6a6a600ULL, 0xe1e1e100e1e1e100ULL, 0x3939390039393900ULL,
0xcacaca00cacaca00ULL, 0xd5d5d500d5d5d500ULL, 0x4747470047474700ULL,
0x5d5d5d005d5d5d00ULL, 0x3d3d3d003d3d3d00ULL, 0xd9d9d900d9d9d900ULL,
0x0101010001010100ULL, 0x5a5a5a005a5a5a00ULL, 0xd6d6d600d6d6d600ULL,
0x5151510051515100ULL, 0x5656560056565600ULL, 0x6c6c6c006c6c6c00ULL,
0x4d4d4d004d4d4d00ULL, 0x8b8b8b008b8b8b00ULL, 0x0d0d0d000d0d0d00ULL,
0x9a9a9a009a9a9a00ULL, 0x6666660066666600ULL, 0xfbfbfb00fbfbfb00ULL,
0xcccccc00cccccc00ULL, 0xb0b0b000b0b0b000ULL, 0x2d2d2d002d2d2d00ULL,
0x7474740074747400ULL, 0x1212120012121200ULL, 0x2b2b2b002b2b2b00ULL,
0x2020200020202000ULL, 0xf0f0f000f0f0f000ULL, 0xb1b1b100b1b1b100ULL,
0x8484840084848400ULL, 0x9999990099999900ULL, 0xdfdfdf00dfdfdf00ULL,
0x4c4c4c004c4c4c00ULL, 0xcbcbcb00cbcbcb00ULL, 0xc2c2c200c2c2c200ULL,
0x3434340034343400ULL, 0x7e7e7e007e7e7e00ULL, 0x7676760076767600ULL,
0x0505050005050500ULL, 0x6d6d6d006d6d6d00ULL, 0xb7b7b700b7b7b700ULL,
0xa9a9a900a9a9a900ULL, 0x3131310031313100ULL, 0xd1d1d100d1d1d100ULL,
0x1717170017171700ULL, 0x0404040004040400ULL, 0xd7d7d700d7d7d700ULL,
0x1414140014141400ULL, 0x5858580058585800ULL, 0x3a3a3a003a3a3a00ULL,
0x6161610061616100ULL, 0xdedede00dedede00ULL, 0x1b1b1b001b1b1b00ULL,
0x1111110011111100ULL, 0x1c1c1c001c1c1c00ULL, 0x3232320032323200ULL,
0x0f0f0f000f0f0f00ULL, 0x9c9c9c009c9c9c00ULL, 0x1616160016161600ULL,
0x5353530053535300ULL, 0x1818180018181800ULL, 0xf2f2f200f2f2f200ULL,
0x2222220022222200ULL, 0xfefefe00fefefe00ULL, 0x4444440044444400ULL,
0xcfcfcf00cfcfcf00ULL, 0xb2b2b200b2b2b200ULL, 0xc3c3c300c3c3c300ULL,
0xb5b5b500b5b5b500ULL, 0x7a7a7a007a7a7a00ULL, 0x9191910091919100ULL,
0x2424240024242400ULL, 0x0808080008080800ULL, 0xe8e8e800e8e8e800ULL,
0xa8a8a800a8a8a800ULL, 0x6060600060606000ULL, 0xfcfcfc00fcfcfc00ULL,
0x6969690069696900ULL, 0x5050500050505000ULL, 0xaaaaaa00aaaaaa00ULL,
0xd0d0d000d0d0d000ULL, 0xa0a0a000a0a0a000ULL, 0x7d7d7d007d7d7d00ULL,
0xa1a1a100a1a1a100ULL, 0x8989890089898900ULL, 0x6262620062626200ULL,
0x9797970097979700ULL, 0x5454540054545400ULL, 0x5b5b5b005b5b5b00ULL,
0x1e1e1e001e1e1e00ULL, 0x9595950095959500ULL, 0xe0e0e000e0e0e000ULL,
0xffffff00ffffff00ULL, 0x6464640064646400ULL, 0xd2d2d200d2d2d200ULL,
0x1010100010101000ULL, 0xc4c4c400c4c4c400ULL, 0x0000000000000000ULL,
0x4848480048484800ULL, 0xa3a3a300a3a3a300ULL, 0xf7f7f700f7f7f700ULL,
0x7575750075757500ULL, 0xdbdbdb00dbdbdb00ULL, 0x8a8a8a008a8a8a00ULL,
0x0303030003030300ULL, 0xe6e6e600e6e6e600ULL, 0xdadada00dadada00ULL,
0x0909090009090900ULL, 0x3f3f3f003f3f3f00ULL, 0xdddddd00dddddd00ULL,
0x9494940094949400ULL, 0x8787870087878700ULL, 0x5c5c5c005c5c5c00ULL,
0x8383830083838300ULL, 0x0202020002020200ULL, 0xcdcdcd00cdcdcd00ULL,
0x4a4a4a004a4a4a00ULL, 0x9090900090909000ULL, 0x3333330033333300ULL,
0x7373730073737300ULL, 0x6767670067676700ULL, 0xf6f6f600f6f6f600ULL,
0xf3f3f300f3f3f300ULL, 0x9d9d9d009d9d9d00ULL, 0x7f7f7f007f7f7f00ULL,
0xbfbfbf00bfbfbf00ULL, 0xe2e2e200e2e2e200ULL, 0x5252520052525200ULL,
0x9b9b9b009b9b9b00ULL, 0xd8d8d800d8d8d800ULL, 0x2626260026262600ULL,
0xc8c8c800c8c8c800ULL, 0x3737370037373700ULL, 0xc6c6c600c6c6c600ULL,
0x3b3b3b003b3b3b00ULL, 0x8181810081818100ULL, 0x9696960096969600ULL,
0x6f6f6f006f6f6f00ULL, 0x4b4b4b004b4b4b00ULL, 0x1313130013131300ULL,
0xbebebe00bebebe00ULL, 0x6363630063636300ULL, 0x2e2e2e002e2e2e00ULL,
0xe9e9e900e9e9e900ULL, 0x7979790079797900ULL, 0xa7a7a700a7a7a700ULL,
0x8c8c8c008c8c8c00ULL, 0x9f9f9f009f9f9f00ULL, 0x6e6e6e006e6e6e00ULL,
0xbcbcbc00bcbcbc00ULL, 0x8e8e8e008e8e8e00ULL, 0x2929290029292900ULL,
0xf5f5f500f5f5f500ULL, 0xf9f9f900f9f9f900ULL, 0xb6b6b600b6b6b600ULL,
0x2f2f2f002f2f2f00ULL, 0xfdfdfd00fdfdfd00ULL, 0xb4b4b400b4b4b400ULL,
0x5959590059595900ULL, 0x7878780078787800ULL, 0x9898980098989800ULL,
0x0606060006060600ULL, 0x6a6a6a006a6a6a00ULL, 0xe7e7e700e7e7e700ULL,
0x4646460046464600ULL, 0x7171710071717100ULL, 0xbababa00bababa00ULL,
0xd4d4d400d4d4d400ULL, 0x2525250025252500ULL, 0xababab00ababab00ULL,
0x4242420042424200ULL, 0x8888880088888800ULL, 0xa2a2a200a2a2a200ULL,
0x8d8d8d008d8d8d00ULL, 0xfafafa00fafafa00ULL, 0x7272720072727200ULL,
0x0707070007070700ULL, 0xb9b9b900b9b9b900ULL, 0x5555550055555500ULL,
0xf8f8f800f8f8f800ULL, 0xeeeeee00eeeeee00ULL, 0xacacac00acacac00ULL,
0x0a0a0a000a0a0a00ULL, 0x3636360036363600ULL, 0x4949490049494900ULL,
0x2a2a2a002a2a2a00ULL, 0x6868680068686800ULL, 0x3c3c3c003c3c3c00ULL,
0x3838380038383800ULL, 0xf1f1f100f1f1f100ULL, 0xa4a4a400a4a4a400ULL,
0x4040400040404000ULL, 0x2828280028282800ULL, 0xd3d3d300d3d3d300ULL,
0x7b7b7b007b7b7b00ULL, 0xbbbbbb00bbbbbb00ULL, 0xc9c9c900c9c9c900ULL,
0x4343430043434300ULL, 0xc1c1c100c1c1c100ULL, 0x1515150015151500ULL,
0xe3e3e300e3e3e300ULL, 0xadadad00adadad00ULL, 0xf4f4f400f4f4f400ULL,
0x7777770077777700ULL, 0xc7c7c700c7c7c700ULL, 0x8080800080808000ULL,
0x9e9e9e009e9e9e00ULL,
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
};
/* key constants */
#define CAMELLIA_SIGMA1L (0xA09E667FL)
#define CAMELLIA_SIGMA1R (0x3BCC908BL)
#define CAMELLIA_SIGMA2L (0xB67AE858L)
#define CAMELLIA_SIGMA2R (0x4CAA73B2L)
#define CAMELLIA_SIGMA3L (0xC6EF372FL)
#define CAMELLIA_SIGMA3R (0xE94F82BEL)
#define CAMELLIA_SIGMA4L (0x54FF53A5L)
#define CAMELLIA_SIGMA4R (0xF1D36F1CL)
#define CAMELLIA_SIGMA5L (0x10E527FAL)
#define CAMELLIA_SIGMA5R (0xDE682D1DL)
#define CAMELLIA_SIGMA6L (0xB05688C2L)
#define CAMELLIA_SIGMA6R (0xB3E6C1FDL)
/* macros */
#define ROLDQ(l, r, bits) ({ \
u64 t = l; \
l = (l << bits) | (r >> (64 - bits)); \
r = (r << bits) | (t >> (64 - bits)); \
})
#define CAMELLIA_F(x, kl, kr, y) ({ \
u64 ii = x ^ (((u64)kl << 32) | kr); \
y = camellia_sp11101110[(uint8_t)ii]; \
y ^= camellia_sp44044404[(uint8_t)(ii >> 8)]; \
ii >>= 16; \
y ^= camellia_sp30333033[(uint8_t)ii]; \
y ^= camellia_sp02220222[(uint8_t)(ii >> 8)]; \
ii >>= 16; \
y ^= camellia_sp00444404[(uint8_t)ii]; \
y ^= camellia_sp03303033[(uint8_t)(ii >> 8)]; \
ii >>= 16; \
y ^= camellia_sp22000222[(uint8_t)ii]; \
y ^= camellia_sp10011110[(uint8_t)(ii >> 8)]; \
y = ror64(y, 32); \
})
#define SET_SUBKEY_LR(INDEX, sRL) (subkey[(INDEX)] = ror64((sRL), 32))
static void camellia_setup_tail(u64 *subkey, u64 *subRL, int max)
{
u64 kw4, tt;
u32 dw, tl, tr;
/* absorb kw2 to other subkeys */
/* round 2 */
subRL[3] ^= subRL[1];
/* round 4 */
subRL[5] ^= subRL[1];
/* round 6 */
subRL[7] ^= subRL[1];
subRL[1] ^= (subRL[1] & ~subRL[9]) << 32;
/* modified for FLinv(kl2) */
dw = (subRL[1] & subRL[9]) >> 32,
subRL[1] ^= rol32(dw, 1);
/* round 8 */
subRL[11] ^= subRL[1];
/* round 10 */
subRL[13] ^= subRL[1];
/* round 12 */
subRL[15] ^= subRL[1];
subRL[1] ^= (subRL[1] & ~subRL[17]) << 32;
/* modified for FLinv(kl4) */
dw = (subRL[1] & subRL[17]) >> 32,
subRL[1] ^= rol32(dw, 1);
/* round 14 */
subRL[19] ^= subRL[1];
/* round 16 */
subRL[21] ^= subRL[1];
/* round 18 */
subRL[23] ^= subRL[1];
if (max == 24) {
/* kw3 */
subRL[24] ^= subRL[1];
/* absorb kw4 to other subkeys */
kw4 = subRL[25];
} else {
subRL[1] ^= (subRL[1] & ~subRL[25]) << 32;
/* modified for FLinv(kl6) */
dw = (subRL[1] & subRL[25]) >> 32,
subRL[1] ^= rol32(dw, 1);
/* round 20 */
subRL[27] ^= subRL[1];
/* round 22 */
subRL[29] ^= subRL[1];
/* round 24 */
subRL[31] ^= subRL[1];
/* kw3 */
subRL[32] ^= subRL[1];
/* absorb kw4 to other subkeys */
kw4 = subRL[33];
/* round 23 */
subRL[30] ^= kw4;
/* round 21 */
subRL[28] ^= kw4;
/* round 19 */
subRL[26] ^= kw4;
kw4 ^= (kw4 & ~subRL[24]) << 32;
/* modified for FL(kl5) */
dw = (kw4 & subRL[24]) >> 32,
kw4 ^= rol32(dw, 1);
}
/* round 17 */
subRL[22] ^= kw4;
/* round 15 */
subRL[20] ^= kw4;
/* round 13 */
subRL[18] ^= kw4;
kw4 ^= (kw4 & ~subRL[16]) << 32;
/* modified for FL(kl3) */
dw = (kw4 & subRL[16]) >> 32,
kw4 ^= rol32(dw, 1);
/* round 11 */
subRL[14] ^= kw4;
/* round 9 */
subRL[12] ^= kw4;
/* round 7 */
subRL[10] ^= kw4;
kw4 ^= (kw4 & ~subRL[8]) << 32;
/* modified for FL(kl1) */
dw = (kw4 & subRL[8]) >> 32,
kw4 ^= rol32(dw, 1);
/* round 5 */
subRL[6] ^= kw4;
/* round 3 */
subRL[4] ^= kw4;
/* round 1 */
subRL[2] ^= kw4;
/* kw1 */
subRL[0] ^= kw4;
/* key XOR is end of F-function */
SET_SUBKEY_LR(0, subRL[0] ^ subRL[2]); /* kw1 */
SET_SUBKEY_LR(2, subRL[3]); /* round 1 */
SET_SUBKEY_LR(3, subRL[2] ^ subRL[4]); /* round 2 */
SET_SUBKEY_LR(4, subRL[3] ^ subRL[5]); /* round 3 */
SET_SUBKEY_LR(5, subRL[4] ^ subRL[6]); /* round 4 */
SET_SUBKEY_LR(6, subRL[5] ^ subRL[7]); /* round 5 */
tl = (subRL[10] >> 32) ^ (subRL[10] & ~subRL[8]);
dw = tl & (subRL[8] >> 32), /* FL(kl1) */
tr = subRL[10] ^ rol32(dw, 1);
tt = (tr | ((u64)tl << 32));
SET_SUBKEY_LR(7, subRL[6] ^ tt); /* round 6 */
SET_SUBKEY_LR(8, subRL[8]); /* FL(kl1) */
SET_SUBKEY_LR(9, subRL[9]); /* FLinv(kl2) */
tl = (subRL[7] >> 32) ^ (subRL[7] & ~subRL[9]);
dw = tl & (subRL[9] >> 32), /* FLinv(kl2) */
tr = subRL[7] ^ rol32(dw, 1);
tt = (tr | ((u64)tl << 32));
SET_SUBKEY_LR(10, subRL[11] ^ tt); /* round 7 */
SET_SUBKEY_LR(11, subRL[10] ^ subRL[12]); /* round 8 */
SET_SUBKEY_LR(12, subRL[11] ^ subRL[13]); /* round 9 */
SET_SUBKEY_LR(13, subRL[12] ^ subRL[14]); /* round 10 */
SET_SUBKEY_LR(14, subRL[13] ^ subRL[15]); /* round 11 */
tl = (subRL[18] >> 32) ^ (subRL[18] & ~subRL[16]);
dw = tl & (subRL[16] >> 32), /* FL(kl3) */
tr = subRL[18] ^ rol32(dw, 1);
tt = (tr | ((u64)tl << 32));
SET_SUBKEY_LR(15, subRL[14] ^ tt); /* round 12 */
SET_SUBKEY_LR(16, subRL[16]); /* FL(kl3) */
SET_SUBKEY_LR(17, subRL[17]); /* FLinv(kl4) */
tl = (subRL[15] >> 32) ^ (subRL[15] & ~subRL[17]);
dw = tl & (subRL[17] >> 32), /* FLinv(kl4) */
tr = subRL[15] ^ rol32(dw, 1);
tt = (tr | ((u64)tl << 32));
SET_SUBKEY_LR(18, subRL[19] ^ tt); /* round 13 */
SET_SUBKEY_LR(19, subRL[18] ^ subRL[20]); /* round 14 */
SET_SUBKEY_LR(20, subRL[19] ^ subRL[21]); /* round 15 */
SET_SUBKEY_LR(21, subRL[20] ^ subRL[22]); /* round 16 */
SET_SUBKEY_LR(22, subRL[21] ^ subRL[23]); /* round 17 */
if (max == 24) {
SET_SUBKEY_LR(23, subRL[22]); /* round 18 */
SET_SUBKEY_LR(24, subRL[24] ^ subRL[23]); /* kw3 */
} else {
tl = (subRL[26] >> 32) ^ (subRL[26] & ~subRL[24]);
dw = tl & (subRL[24] >> 32), /* FL(kl5) */
tr = subRL[26] ^ rol32(dw, 1);
tt = (tr | ((u64)tl << 32));
SET_SUBKEY_LR(23, subRL[22] ^ tt); /* round 18 */
SET_SUBKEY_LR(24, subRL[24]); /* FL(kl5) */
SET_SUBKEY_LR(25, subRL[25]); /* FLinv(kl6) */
tl = (subRL[23] >> 32) ^ (subRL[23] & ~subRL[25]);
dw = tl & (subRL[25] >> 32), /* FLinv(kl6) */
tr = subRL[23] ^ rol32(dw, 1);
tt = (tr | ((u64)tl << 32));
SET_SUBKEY_LR(26, subRL[27] ^ tt); /* round 19 */
SET_SUBKEY_LR(27, subRL[26] ^ subRL[28]); /* round 20 */
SET_SUBKEY_LR(28, subRL[27] ^ subRL[29]); /* round 21 */
SET_SUBKEY_LR(29, subRL[28] ^ subRL[30]); /* round 22 */
SET_SUBKEY_LR(30, subRL[29] ^ subRL[31]); /* round 23 */
SET_SUBKEY_LR(31, subRL[30]); /* round 24 */
SET_SUBKEY_LR(32, subRL[32] ^ subRL[31]); /* kw3 */
}
}
static void camellia_setup128(const unsigned char *key, u64 *subkey)
{
u64 kl, kr, ww;
u64 subRL[26];
/**
* k == kl || kr (|| is concatenation)
*/
kl = get_unaligned_be64(key);
kr = get_unaligned_be64(key + 8);
/* generate KL dependent subkeys */
/* kw1 */
subRL[0] = kl;
/* kw2 */
subRL[1] = kr;
/* rotation left shift 15bit */
ROLDQ(kl, kr, 15);
/* k3 */
subRL[4] = kl;
/* k4 */
subRL[5] = kr;
/* rotation left shift 15+30bit */
ROLDQ(kl, kr, 30);
/* k7 */
subRL[10] = kl;
/* k8 */
subRL[11] = kr;
/* rotation left shift 15+30+15bit */
ROLDQ(kl, kr, 15);
/* k10 */
subRL[13] = kr;
/* rotation left shift 15+30+15+17 bit */
ROLDQ(kl, kr, 17);
/* kl3 */
subRL[16] = kl;
/* kl4 */
subRL[17] = kr;
/* rotation left shift 15+30+15+17+17 bit */
ROLDQ(kl, kr, 17);
/* k13 */
subRL[18] = kl;
/* k14 */
subRL[19] = kr;
/* rotation left shift 15+30+15+17+17+17 bit */
ROLDQ(kl, kr, 17);
/* k17 */
subRL[22] = kl;
/* k18 */
subRL[23] = kr;
/* generate KA */
kl = subRL[0];
kr = subRL[1];
CAMELLIA_F(kl, CAMELLIA_SIGMA1L, CAMELLIA_SIGMA1R, ww);
kr ^= ww;
CAMELLIA_F(kr, CAMELLIA_SIGMA2L, CAMELLIA_SIGMA2R, kl);
/* current status == (kll, klr, w0, w1) */
CAMELLIA_F(kl, CAMELLIA_SIGMA3L, CAMELLIA_SIGMA3R, kr);
kr ^= ww;
CAMELLIA_F(kr, CAMELLIA_SIGMA4L, CAMELLIA_SIGMA4R, ww);
kl ^= ww;
/* generate KA dependent subkeys */
/* k1, k2 */
subRL[2] = kl;
subRL[3] = kr;
ROLDQ(kl, kr, 15);
/* k5,k6 */
subRL[6] = kl;
subRL[7] = kr;
ROLDQ(kl, kr, 15);
/* kl1, kl2 */
subRL[8] = kl;
subRL[9] = kr;
ROLDQ(kl, kr, 15);
/* k9 */
subRL[12] = kl;
ROLDQ(kl, kr, 15);
/* k11, k12 */
subRL[14] = kl;
subRL[15] = kr;
ROLDQ(kl, kr, 34);
/* k15, k16 */
subRL[20] = kl;
subRL[21] = kr;
ROLDQ(kl, kr, 17);
/* kw3, kw4 */
subRL[24] = kl;
subRL[25] = kr;
camellia_setup_tail(subkey, subRL, 24);
}
static void camellia_setup256(const unsigned char *key, u64 *subkey)
{
u64 kl, kr; /* left half of key */
u64 krl, krr; /* right half of key */
u64 ww; /* temporary variables */
u64 subRL[34];
/**
* key = (kl || kr || krl || krr) (|| is concatenation)
*/
kl = get_unaligned_be64(key);
kr = get_unaligned_be64(key + 8);
krl = get_unaligned_be64(key + 16);
krr = get_unaligned_be64(key + 24);
/* generate KL dependent subkeys */
/* kw1 */
subRL[0] = kl;
/* kw2 */
subRL[1] = kr;
ROLDQ(kl, kr, 45);
/* k9 */
subRL[12] = kl;
/* k10 */
subRL[13] = kr;
ROLDQ(kl, kr, 15);
/* kl3 */
subRL[16] = kl;
/* kl4 */
subRL[17] = kr;
ROLDQ(kl, kr, 17);
/* k17 */
subRL[22] = kl;
/* k18 */
subRL[23] = kr;
ROLDQ(kl, kr, 34);
/* k23 */
subRL[30] = kl;
/* k24 */
subRL[31] = kr;
/* generate KR dependent subkeys */
ROLDQ(krl, krr, 15);
/* k3 */
subRL[4] = krl;
/* k4 */
subRL[5] = krr;
ROLDQ(krl, krr, 15);
/* kl1 */
subRL[8] = krl;
/* kl2 */
subRL[9] = krr;
ROLDQ(krl, krr, 30);
/* k13 */
subRL[18] = krl;
/* k14 */
subRL[19] = krr;
ROLDQ(krl, krr, 34);
/* k19 */
subRL[26] = krl;
/* k20 */
subRL[27] = krr;
ROLDQ(krl, krr, 34);
/* generate KA */
kl = subRL[0] ^ krl;
kr = subRL[1] ^ krr;
CAMELLIA_F(kl, CAMELLIA_SIGMA1L, CAMELLIA_SIGMA1R, ww);
kr ^= ww;
CAMELLIA_F(kr, CAMELLIA_SIGMA2L, CAMELLIA_SIGMA2R, kl);
kl ^= krl;
CAMELLIA_F(kl, CAMELLIA_SIGMA3L, CAMELLIA_SIGMA3R, kr);
kr ^= ww ^ krr;
CAMELLIA_F(kr, CAMELLIA_SIGMA4L, CAMELLIA_SIGMA4R, ww);
kl ^= ww;
/* generate KB */
krl ^= kl;
krr ^= kr;
CAMELLIA_F(krl, CAMELLIA_SIGMA5L, CAMELLIA_SIGMA5R, ww);
krr ^= ww;
CAMELLIA_F(krr, CAMELLIA_SIGMA6L, CAMELLIA_SIGMA6R, ww);
krl ^= ww;
/* generate KA dependent subkeys */
ROLDQ(kl, kr, 15);
/* k5 */
subRL[6] = kl;
/* k6 */
subRL[7] = kr;
ROLDQ(kl, kr, 30);
/* k11 */
subRL[14] = kl;
/* k12 */
subRL[15] = kr;
/* rotation left shift 32bit */
ROLDQ(kl, kr, 32);
/* kl5 */
subRL[24] = kl;
/* kl6 */
subRL[25] = kr;
/* rotation left shift 17 from k11,k12 -> k21,k22 */
ROLDQ(kl, kr, 17);
/* k21 */
subRL[28] = kl;
/* k22 */
subRL[29] = kr;
/* generate KB dependent subkeys */
/* k1 */
subRL[2] = krl;
/* k2 */
subRL[3] = krr;
ROLDQ(krl, krr, 30);
/* k7 */
subRL[10] = krl;
/* k8 */
subRL[11] = krr;
ROLDQ(krl, krr, 30);
/* k15 */
subRL[20] = krl;
/* k16 */
subRL[21] = krr;
ROLDQ(krl, krr, 51);
/* kw3 */
subRL[32] = krl;
/* kw4 */
subRL[33] = krr;
camellia_setup_tail(subkey, subRL, 32);
}
static void camellia_setup192(const unsigned char *key, u64 *subkey)
{
unsigned char kk[32];
u64 krl, krr;
memcpy(kk, key, 24);
memcpy((unsigned char *)&krl, key+16, 8);
krr = ~krl;
memcpy(kk+24, (unsigned char *)&krr, 8);
camellia_setup256(kk, subkey);
}
static int __camellia_setkey(struct camellia_ctx *cctx,
const unsigned char *key,
unsigned int key_len, u32 *flags)
{
if (key_len != 16 && key_len != 24 && key_len != 32) {
*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
return -EINVAL;
}
cctx->key_length = key_len;
switch (key_len) {
case 16:
camellia_setup128(key, cctx->key_table);
break;
case 24:
camellia_setup192(key, cctx->key_table);
break;
case 32:
camellia_setup256(key, cctx->key_table);
break;
}
return 0;
}
static int camellia_setkey(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len)
{
return __camellia_setkey(crypto_tfm_ctx(tfm), in_key, key_len,
&tfm->crt_flags);
}
static void camellia_decrypt_cbc_2way(void *ctx, u128 *dst, const u128 *src)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
u128 iv = *src;
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
camellia_dec_blk_2way(ctx, (u8 *)dst, (u8 *)src);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
u128_xor(&dst[1], &dst[1], &iv);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static void camellia_crypt_ctr(void *ctx, u128 *dst, const u128 *src, le128 *iv)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
be128 ctrblk;
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
if (dst != src)
*dst = *src;
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
le128_to_be128(&ctrblk, iv);
le128_inc(iv);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
camellia_enc_blk_xor(ctx, (u8 *)dst, (u8 *)&ctrblk);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static void camellia_crypt_ctr_2way(void *ctx, u128 *dst, const u128 *src,
le128 *iv)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
be128 ctrblks[2];
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
if (dst != src) {
dst[0] = src[0];
dst[1] = src[1];
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
le128_to_be128(&ctrblks[0], iv);
le128_inc(iv);
le128_to_be128(&ctrblks[1], iv);
le128_inc(iv);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
camellia_enc_blk_xor_2way(ctx, (u8 *)dst, (u8 *)ctrblks);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static const struct common_glue_ctx camellia_enc = {
.num_funcs = 2,
.fpu_blocks_limit = -1,
.funcs = { {
.num_blocks = 2,
.fn_u = { .ecb = GLUE_FUNC_CAST(camellia_enc_blk_2way) }
}, {
.num_blocks = 1,
.fn_u = { .ecb = GLUE_FUNC_CAST(camellia_enc_blk) }
} }
};
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
static const struct common_glue_ctx camellia_ctr = {
.num_funcs = 2,
.fpu_blocks_limit = -1,
.funcs = { {
.num_blocks = 2,
.fn_u = { .ctr = GLUE_CTR_FUNC_CAST(camellia_crypt_ctr_2way) }
}, {
.num_blocks = 1,
.fn_u = { .ctr = GLUE_CTR_FUNC_CAST(camellia_crypt_ctr) }
} }
};
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
static const struct common_glue_ctx camellia_dec = {
.num_funcs = 2,
.fpu_blocks_limit = -1,
.funcs = { {
.num_blocks = 2,
.fn_u = { .ecb = GLUE_FUNC_CAST(camellia_dec_blk_2way) }
}, {
.num_blocks = 1,
.fn_u = { .ecb = GLUE_FUNC_CAST(camellia_dec_blk) }
} }
};
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
static const struct common_glue_ctx camellia_dec_cbc = {
.num_funcs = 2,
.fpu_blocks_limit = -1,
.funcs = { {
.num_blocks = 2,
.fn_u = { .cbc = GLUE_CBC_FUNC_CAST(camellia_decrypt_cbc_2way) }
}, {
.num_blocks = 1,
.fn_u = { .cbc = GLUE_CBC_FUNC_CAST(camellia_dec_blk) }
} }
};
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
static int ecb_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
return glue_ecb_crypt_128bit(&camellia_enc, desc, dst, src, nbytes);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static int ecb_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
return glue_ecb_crypt_128bit(&camellia_dec, desc, dst, src, nbytes);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static int cbc_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
return glue_cbc_encrypt_128bit(GLUE_FUNC_CAST(camellia_enc_blk), desc,
dst, src, nbytes);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static int cbc_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
return glue_cbc_decrypt_128bit(&camellia_dec_cbc, desc, dst, src,
nbytes);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static int ctr_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
return glue_ctr_crypt_128bit(&camellia_ctr, desc, dst, src, nbytes);
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
}
static void encrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes)
{
const unsigned int bsize = CAMELLIA_BLOCK_SIZE;
struct camellia_ctx *ctx = priv;
int i;
while (nbytes >= 2 * bsize) {
camellia_enc_blk_2way(ctx, srcdst, srcdst);
srcdst += bsize * 2;
nbytes -= bsize * 2;
}
for (i = 0; i < nbytes / bsize; i++, srcdst += bsize)
camellia_enc_blk(ctx, srcdst, srcdst);
}
static void decrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes)
{
const unsigned int bsize = CAMELLIA_BLOCK_SIZE;
struct camellia_ctx *ctx = priv;
int i;
while (nbytes >= 2 * bsize) {
camellia_dec_blk_2way(ctx, srcdst, srcdst);
srcdst += bsize * 2;
nbytes -= bsize * 2;
}
for (i = 0; i < nbytes / bsize; i++, srcdst += bsize)
camellia_dec_blk(ctx, srcdst, srcdst);
}
struct camellia_lrw_ctx {
struct lrw_table_ctx lrw_table;
struct camellia_ctx camellia_ctx;
};
static int lrw_camellia_setkey(struct crypto_tfm *tfm, const u8 *key,
unsigned int keylen)
{
struct camellia_lrw_ctx *ctx = crypto_tfm_ctx(tfm);
int err;
err = __camellia_setkey(&ctx->camellia_ctx, key,
keylen - CAMELLIA_BLOCK_SIZE,
&tfm->crt_flags);
if (err)
return err;
return lrw_init_table(&ctx->lrw_table,
key + keylen - CAMELLIA_BLOCK_SIZE);
}
static int lrw_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct camellia_lrw_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
be128 buf[2 * 4];
struct lrw_crypt_req req = {
.tbuf = buf,
.tbuflen = sizeof(buf),
.table_ctx = &ctx->lrw_table,
.crypt_ctx = &ctx->camellia_ctx,
.crypt_fn = encrypt_callback,
};
return lrw_crypt(desc, dst, src, nbytes, &req);
}
static int lrw_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct camellia_lrw_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
be128 buf[2 * 4];
struct lrw_crypt_req req = {
.tbuf = buf,
.tbuflen = sizeof(buf),
.table_ctx = &ctx->lrw_table,
.crypt_ctx = &ctx->camellia_ctx,
.crypt_fn = decrypt_callback,
};
return lrw_crypt(desc, dst, src, nbytes, &req);
}
static void lrw_exit_tfm(struct crypto_tfm *tfm)
{
struct camellia_lrw_ctx *ctx = crypto_tfm_ctx(tfm);
lrw_free_table(&ctx->lrw_table);
}
struct camellia_xts_ctx {
struct camellia_ctx tweak_ctx;
struct camellia_ctx crypt_ctx;
};
static int xts_camellia_setkey(struct crypto_tfm *tfm, const u8 *key,
unsigned int keylen)
{
struct camellia_xts_ctx *ctx = crypto_tfm_ctx(tfm);
u32 *flags = &tfm->crt_flags;
int err;
/* key consists of keys of equal size concatenated, therefore
* the length must be even
*/
if (keylen % 2) {
*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
return -EINVAL;
}
/* first half of xts-key is for crypt */
err = __camellia_setkey(&ctx->crypt_ctx, key, keylen / 2, flags);
if (err)
return err;
/* second half of xts-key is for tweak */
return __camellia_setkey(&ctx->tweak_ctx, key + keylen / 2, keylen / 2,
flags);
}
static int xts_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct camellia_xts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
be128 buf[2 * 4];
struct xts_crypt_req req = {
.tbuf = buf,
.tbuflen = sizeof(buf),
.tweak_ctx = &ctx->tweak_ctx,
.tweak_fn = XTS_TWEAK_CAST(camellia_enc_blk),
.crypt_ctx = &ctx->crypt_ctx,
.crypt_fn = encrypt_callback,
};
return xts_crypt(desc, dst, src, nbytes, &req);
}
static int xts_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct camellia_xts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
be128 buf[2 * 4];
struct xts_crypt_req req = {
.tbuf = buf,
.tbuflen = sizeof(buf),
.tweak_ctx = &ctx->tweak_ctx,
.tweak_fn = XTS_TWEAK_CAST(camellia_enc_blk),
.crypt_ctx = &ctx->crypt_ctx,
.crypt_fn = decrypt_callback,
};
return xts_crypt(desc, dst, src, nbytes, &req);
}
static struct crypto_alg camellia_algs[6] = { {
.cra_name = "camellia",
.cra_driver_name = "camellia-asm",
.cra_priority = 200,
.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
.cra_blocksize = CAMELLIA_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct camellia_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
.cra_u = {
.cipher = {
.cia_min_keysize = CAMELLIA_MIN_KEY_SIZE,
.cia_max_keysize = CAMELLIA_MAX_KEY_SIZE,
.cia_setkey = camellia_setkey,
.cia_encrypt = camellia_encrypt,
.cia_decrypt = camellia_decrypt
}
}
}, {
.cra_name = "ecb(camellia)",
.cra_driver_name = "ecb-camellia-asm",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_blocksize = CAMELLIA_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct camellia_ctx),
.cra_alignmask = 0,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_u = {
.blkcipher = {
.min_keysize = CAMELLIA_MIN_KEY_SIZE,
.max_keysize = CAMELLIA_MAX_KEY_SIZE,
.setkey = camellia_setkey,
.encrypt = ecb_encrypt,
.decrypt = ecb_decrypt,
},
},
}, {
.cra_name = "cbc(camellia)",
.cra_driver_name = "cbc-camellia-asm",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_blocksize = CAMELLIA_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct camellia_ctx),
.cra_alignmask = 0,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_u = {
.blkcipher = {
.min_keysize = CAMELLIA_MIN_KEY_SIZE,
.max_keysize = CAMELLIA_MAX_KEY_SIZE,
.ivsize = CAMELLIA_BLOCK_SIZE,
.setkey = camellia_setkey,
.encrypt = cbc_encrypt,
.decrypt = cbc_decrypt,
},
},
}, {
.cra_name = "ctr(camellia)",
.cra_driver_name = "ctr-camellia-asm",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct camellia_ctx),
.cra_alignmask = 0,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_u = {
.blkcipher = {
.min_keysize = CAMELLIA_MIN_KEY_SIZE,
.max_keysize = CAMELLIA_MAX_KEY_SIZE,
.ivsize = CAMELLIA_BLOCK_SIZE,
.setkey = camellia_setkey,
.encrypt = ctr_crypt,
.decrypt = ctr_crypt,
},
},
}, {
.cra_name = "lrw(camellia)",
.cra_driver_name = "lrw-camellia-asm",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_blocksize = CAMELLIA_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct camellia_lrw_ctx),
.cra_alignmask = 0,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_exit = lrw_exit_tfm,
.cra_u = {
.blkcipher = {
.min_keysize = CAMELLIA_MIN_KEY_SIZE +
CAMELLIA_BLOCK_SIZE,
.max_keysize = CAMELLIA_MAX_KEY_SIZE +
CAMELLIA_BLOCK_SIZE,
.ivsize = CAMELLIA_BLOCK_SIZE,
.setkey = lrw_camellia_setkey,
.encrypt = lrw_encrypt,
.decrypt = lrw_decrypt,
},
},
}, {
.cra_name = "xts(camellia)",
.cra_driver_name = "xts-camellia-asm",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_blocksize = CAMELLIA_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct camellia_xts_ctx),
.cra_alignmask = 0,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_u = {
.blkcipher = {
.min_keysize = CAMELLIA_MIN_KEY_SIZE * 2,
.max_keysize = CAMELLIA_MAX_KEY_SIZE * 2,
.ivsize = CAMELLIA_BLOCK_SIZE,
.setkey = xts_camellia_setkey,
.encrypt = xts_encrypt,
.decrypt = xts_decrypt,
},
},
} };
static bool is_blacklisted_cpu(void)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return false;
if (boot_cpu_data.x86 == 0x0f) {
/*
* On Pentium 4, camellia-asm is slower than original assembler
* implementation because excessive uses of 64bit rotate and
* left-shifts (which are really slow on P4) needed to store and
* handle 128bit block in two 64bit registers.
*/
return true;
}
return false;
}
static int force;
module_param(force, int, 0);
MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
static int __init init(void)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
if (!force && is_blacklisted_cpu()) {
printk(KERN_INFO
"camellia-x86_64: performance on this CPU "
"would be suboptimal: disabling "
"camellia-x86_64.\n");
return -ENODEV;
}
return crypto_register_algs(camellia_algs, ARRAY_SIZE(camellia_algs));
}
static void __exit fini(void)
crypto: camellia - add assembler implementation for x86_64 Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of functions are provided. First set is regular 'one-block at time' encrypt/decrypt functions. Second is 'two-block at time' functions that gain performance increase on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way functions with in-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: AMD Phenom II 1055T (fam:16, model:10): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x 64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x 256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x 1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x 8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x 64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x 256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x 1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x 8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.15x 1.22x ecb-dec 1.16x 1.16x cbc-enc 0.85x 0.90x cbc-dec 1.20x 1.23x ctr-enc 1.28x 1.30x ctr-dec 1.27x 1.28x lrw-enc 1.12x 1.16x lrw-dec 1.08x 1.10x xts-enc 1.11x 1.15x xts-dec 1.14x 1.15x Intel Core2 T8100 (fam:6, model:23, step:6): camellia-asm vs camellia_generic: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x 64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x 256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x 1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x 8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x 64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x 256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x 1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x 8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x camellia-asm vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.17x 1.21x ecb-dec 1.17x 1.20x cbc-enc 0.80x 0.82x cbc-dec 1.22x 1.24x ctr-enc 1.25x 1.26x ctr-dec 1.25x 1.26x lrw-enc 1.14x 1.18x lrw-dec 1.13x 1.17x xts-enc 1.14x 1.18x xts-dec 1.14x 1.17x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-03-06 02:26:47 +08:00
{
crypto_unregister_algs(camellia_algs, ARRAY_SIZE(camellia_algs));
}
module_init(init);
module_exit(fini);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Camellia Cipher Algorithm, asm optimized");
MODULE_ALIAS("camellia");
MODULE_ALIAS("camellia-asm");