linux/drivers/block
Sergey Senozhatsky beca3ec71f zram: add multi stream functionality
Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released.  This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr).  Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel.  See TEST section later in commit message for
performance data.

Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access.  zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.

The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
  find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
  create and destroy zcomp_strm_multi

zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.

Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations performed:

- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
  unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to wait queue. it will be
  awaken by zcomp_strm_multi_release() caller.

zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper

Minchan Kim reported that spinlock-based locking scheme has demonstrated
a severe perfomance regression for single compression stream case,
comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16)

base                      spinlock                    mutex

==Initial write           ==Initial write             ==Initial  write
records:  5               records:  5                 records:   5
avg:      1642424.35      avg:      699610.40         avg:       1655583.71
std:      39890.95(2.43%) std:      232014.19(33.16%) std:       52293.96
max:      1690170.94      max:      1163473.45        max:       1697164.75
min:      1568669.52      min:      573429.88         min:       1553410.23
==Rewrite                 ==Rewrite                   ==Rewrite
records:  5               records:  5                 records:   5
avg:      1611775.39      avg:      501406.64         avg:       1684419.11
std:      17144.58(1.06%) std:      15354.41(3.06%)   std:       18367.42
max:      1641800.95      max:      531356.78         max:       1706445.84
min:      1593515.27      min:      488817.78         min:       1655335.73

When only one compression stream available, mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up().  This
is why single stream implemented as a special case with mutex locking.

Introduce and document zram device attribute max_comp_streams.  This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm).  Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).

max_comp_streams used during initialisation as follows:
-- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).

default max_comp_streams value is 1, meaning that zram with single stream
will be initialised.

Later patch will introduce configuration knob to change max_comp_streams
on already initialised and used zcomp.

TEST
iozone -t 3 -R -r 16K -s 60M -I +Z

       test           base       1 strm (mutex)     3 strm (spinlock)
-----------------------------------------------------------------------
 Initial write      589286.78       583518.39          718011.05
       Rewrite      604837.97       596776.38         1515125.72
  Random write      584120.11       595714.58         1388850.25
        Pwrite      535731.17       541117.38          739295.27
        Fwrite     1418083.88      1478612.72         1484927.06

Usage example:
set max_comp_streams to 4
        echo 4 > /sys/block/zram0/max_comp_streams

show current max_comp_streams (default value is 1).
        cat /sys/block/zram0/max_comp_streams

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:01 -07:00
..
aoe mm: close PageTail race 2014-03-04 07:55:47 -08:00
drbd drbd: Fix future possible NULL pointer dereference 2014-02-21 15:42:34 -08:00
mtip32xx Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-block 2014-04-01 19:43:53 -07:00
paride drivers/block/paride/pg.c: underflow bug in pg_write() 2014-01-21 20:16:56 -08:00
rsxx block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
xen-blkback xen-blkback: init persistent_purge_work work_struct 2014-02-11 20:34:03 -07:00
zram zram: add multi stream functionality 2014-04-07 16:36:01 -07:00
DAC960.c DAC960: remove sleep_on usage 2014-03-13 14:56:38 -06:00
DAC960.h
Kconfig zram: promote zram from staging 2014-01-30 16:56:55 -08:00
Makefile zram: promote zram from staging 2014-01-30 16:56:55 -08:00
amiflop.c tree-wide: use reinit_completion instead of INIT_COMPLETION 2013-11-15 09:32:21 +09:00
ataflop.c ataflop: fix sleep_on races 2014-03-13 14:56:38 -06:00
brd.c block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
cciss.c cciss: Fallback to MSI rather than to INTx if MSI-X failed 2014-03-13 14:56:39 -06:00
cciss.h cciss: Adds simple mode functionality 2011-08-08 11:40:15 +02:00
cciss_cmd.h cciss: use new doorbell-bit-5 reset method 2011-05-06 08:23:55 -06:00
cciss_scsi.c cciss: switch to ->show_info() 2013-04-09 14:13:19 -04:00
cciss_scsi.h cciss: add cciss_tape_cmds module paramter 2011-05-06 08:23:59 -06:00
cpqarray.c cpqarray: fix info leak in ida_locked_ioctl() 2013-09-24 17:00:26 -07:00
cpqarray.h
cryptoloop.c move linux/loop.h to drivers/block 2013-06-29 12:46:45 +04:00
floppy.c floppy: don't use PREPARE_[DELAYED_]WORK 2014-03-07 10:24:48 -05:00
hd.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ida_cmd.h
ida_ioctl.h
loop.c drivers/block/loop.c: fix comment typo in loop_config_discard 2014-01-21 20:16:56 -08:00
loop.h move linux/loop.h to drivers/block 2013-06-29 12:46:45 +04:00
mg_disk.c mg_disk: Spelling s/finised/finished/ 2014-01-21 20:34:58 -08:00
nbd.c block: Immutable bio vecs 2013-11-23 22:33:49 -08:00
null_blk.c null_blk: use blk_complete_request and blk_mq_complete_request 2014-02-10 09:27:31 -07:00
nvme-core.c Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-block 2014-04-01 19:43:53 -07:00
nvme-scsi.c NVMe: compat SG_IO ioctl 2013-12-16 15:49:40 -05:00
osdblk.c block: replace strict_strtoul() with kstrtoul() 2013-09-11 15:56:56 -07:00
pktcdvd.c pktcdvd: fix error return code 2014-01-03 10:05:34 +01:00
ps3disk.c block: Kill bio_segments()/bi_vcnt usage 2013-11-23 22:33:51 -08:00
ps3vram.c block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
rbd.c rbd: drop an unsafe assertion 2014-03-29 10:38:14 -07:00
rbd_types.h rbd: get rid of RBD_MAX_SEG_NAME_LEN 2012-12-17 08:37:29 -06:00
skd_main.c skd: Use pci_enable_msix_range() instead of pci_enable_msix() 2014-02-21 15:45:26 -08:00
skd_s1120.h skd: fix formatting in skd_s1120.h 2013-11-08 09:10:30 -07:00
smart1,2.h fix typos 'comamnd' -> 'command' in comments 2011-02-02 11:31:21 +01:00
sunvdc.c sunvdc: Fix off-by-one in generic_request(). 2013-02-14 11:49:01 -08:00
swim.c drivers/block/swim.c: remove unnecessary platform_set_drvdata() 2013-09-11 15:56:59 -07:00
swim3.c swim3: fix interruptible_sleep_on race 2014-03-13 14:56:38 -06:00
swim_asm.S
sx8.c drivers/block/sx8.c: remove unnecessary pci_set_drvdata() 2014-01-21 20:16:56 -08:00
umem.c block: Convert drivers to immutable biovecs 2013-11-23 22:33:51 -08:00
umem.h
virtio_blk.c Nothing exciting: virtio-blk users might see a bit of a boost from the 2014-04-02 14:43:17 -07:00
xen-blkfront.c Merge branch 'stable/for-jens-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-linus 2014-02-10 12:52:34 -07:00
xsysace.c xilinx systemace: Fix sparse warnings 2013-07-10 07:47:12 +02:00
z2ram.c block/z2ram: Remove duplicate external declarations 2013-11-26 11:09:10 +01:00