Change 130f315a (staging: zram: remove special handle of uncompressed page)
introduced a bug in the handling of incompressible pages which resulted in
memory allocation failure for such pages.
When a page expands on compression, say from 4K to 4K+30, we were trying to
do zsmalloc(pool, 4K+30). However, the maximum size which zsmalloc can
allocate is PAGE_SIZE (for obvious reasons), so such allocation requests
always return failure (0).
For a page that has compressed size larger than the original size (this may
happen with already compressed or random data), there is no point storing
the compressed version as that would take more space and would also require
time for decompression when needed again. So, the fix is to store any page,
whose compressed size exceeds a threshold (max_zpage_size), as-it-is i.e.
without compression. Memory required for storing this uncompressed page can
then be requested from zsmalloc which supports PAGE_SIZE sized allocations.
Lastly, the fix checks that we do not attempt to "decompress" the page which
we stored in the uncompressed form -- we just memcpy() out such pages.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: viechweg@gmail.com
Reported-by: paerley@gmail.com
Reported-by: wu.tommy@gmail.com
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch switches zcache and zram dependency to ZSMALLOC
rather than X86. There is no net change since ZSMALLOC
depends on X86, however, this prevent further changes to
these files as zsmalloc dependencies change.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using the __aligned() attribute in favor of __attribute__((aligned(size)))
Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Porting zram to use the pr_warn() function instead of the deprecated
pr_warning().
Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xvmalloc can't handle PAGE_SIZE page so that zram have to
handle it specially but zsmalloc can do it so let's remove
unnecessary special handling code.
Quote from Nitin
"I think page vs handle distinction was added since xvmalloc could not
handle full page allocation. Now that zsmalloc allows full page
allocation, we can just use it for both cases. This would also allow
removing the ZRAM_UNCOMPRESSED flag. The only downside will be slightly
slower code path for full page allocation but this event is anyways
supposed to be rare, so should be fine."
1. This patch reduces code very much.
drivers/staging/zram/zram_drv.c | 104 +++++--------------------------------
drivers/staging/zram/zram_drv.h | 17 +-----
drivers/staging/zram/zram_sysfs.c | 6 +--
3 files changed, 15 insertions(+), 112 deletions(-)
2. change pages_expand with bad_compress so it can count
bad compression(above 75%) ratio.
3. remove zobj_header which is for back-reference for defragmentation
because firstly, it's not used at the moment and zsmalloc can't handle
bigger size than PAGE_SIZE so zram can't do it any more without redesign.
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fd1a30de makes a bug that it uses (struct page *) as zsmalloc's handle
although it's a uncompressed page so that it can access random page,
return random data or even crashed by get_first_page in zs_map_object.
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should use unsigned long as handle instead of void * to avoid any
confusion. Without this, users may just treat zs_malloc return value as
a pointer and try to deference it.
This patch passed compile test(zram, zcache and ramster) and zram is
tested on qemu.
changelog
* from v2
- remove hval pointed out by Nitin
- based on next-20120607
* from v1
- change zcache's zv_create return value
- baesd on next-20120604
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
zram accepts number of devices to be created
as a module parameter. This was renamed from
num_devices to zram_num_devices (without updating
the documentation!) since num_devices was declared
as a non-static global variable, polluting the global
namespace. Now, we declare it as a static variable
and revert back the name change.
The documentation (zram.txt) already mentions
num_devices as the module parameter name.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
linux/vmalloc.h added to zsmalloc-main.c to resolve implicit
declaration errors.
X86 dependency added to zsmalloc and dependent drivers zcache and zram.
This X86 only requirement is not ideal. Working to find portable
functions for __flush_tlb_one and set_pte.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The allocation of zram->compress_buffer is misssing a GFP_* specifier.
This is equivalent to GFP_NOWAIT but it is more likely a omission.
Since the allocation just above it uses GFP_KERNEL, there is no reason
to use GFP_NOWAIT here. Therefore, add GFP_KERNEL.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As reported by checkpatch.pl strict_strtoX is obsolet and should be
replaced by kstrtoX.
Signed-off-by: Sergey Datsevich <srgdts@gmail.com>
Signed-off-by: Bjoern Meier <bjoernmeier@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Fixes sparse warning:
zram_drv.c:666:6: warning: symbol 'zram_slot_free_notify' was not
declared. Should it be static?
Also, max_zpage_size is now size_t just to be consistent with data-type
of other variables maintaining sizes of various kinds.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the allocation of zram->table fails, we set zram->disksize to zero
to prevent accessing the unallocated table entries during cleanup.
However, we currently don't take this precaution when the initialization
fails earlier.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently init_lock only prevents concurrent execution of zram_init_device()
and zram_reset_device() but not zram_make_request() nor sysfs store functions.
This patch changes init_lock into a rw_semaphore. A write lock is taken by
init, reset and store functions, a read lock is taken by zram_make_request().
Also, avoids to release the lock before calling __zram_reset_device() for
cleaning after a failed init, thus preventing any concurrent task to see an
inconsistent state of zram.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The global variable "num_devices" is too general to be
global. This patch switches the name to be "zram_num_devices".
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The global variable "devices" is too general to be global.
This patch switches the name to be "zram_devices".
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the unmapping order of KM_USER0/1 in
handle_uncompressed_page() and zram_read() so that kmap()/kunmap() calls
are correctly nested.
Reported-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, nothing protects zram table from concurrent access.
For instance, ZRAM_UNCOMPRESSED bit can be cleared by zram_free_page()
called from a concurrent write between the time ZRAM_UNCOMPRESSED has
been set and the time it is tested to unmap KM_USER0 in
zram_bvec_write(). This ultimately leads to kernel panic.
Also, a read request can occurs when the page has been freed by a
running write request and before it has been updated, leading to
zero filled block being incorrectly read and "Read before write"
error message.
This patch replace the current mutex by a rw_semaphore. It extends
the protection to zram table (currently, only compression buffers are
protected) and read requests (currently, only write requests are
protected).
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 7b19b8d45b (zram: Prevent overflow
in logical block size) introduced ZRAM_LOGICAL_BLOCK_SIZE constant to
prevent overflow of logical block size on 64k page kernel.
However, the current implementation of zram only allow operation on block
of the same size as a page. That makes theorically legit 4k requests fail
on 64k page kernel.
This patch makes zram allow operation on partial pages. Basically, it
means we still do operations on full pages internally, but only copy the
relevent segments from/to the user memory.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactor the code of zram_read/write() functions. It does
not removes a lot of duplicate code alone, but is mostly a helper for
the third patch of this series (Staging: zram: allow partial page
operations).
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The offset of uncompressed page is always zero: handle_uncompressed_page()
doesn't have to care about it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Both zram and zcache use xvmalloc allocator. If xvmalloc
is compiled separately for both of them, we will get linker
error if they are both selected as "built-in". We can also
get linker error regarding missing xvmalloc symbols if zram
is not built.
So, we now compile xvmalloc separately and export its symbols
which are then used by both of zram and zcache.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently the device is initialized when first write is done to the
device. Any read attempt before the first write would fail, including
"hidden" read the user may not know about (as for example if he tries
to write a partial block).
This patch initializes the device on first request, whether read or
write.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is to resolve a merge conflict with:
drivers/staging/zram/zram_drv.c
as pointed out by Stephen Rothwell
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In zram_read() and zram_write() we were not incrementing the
index number and thus were reading/writing values from/to
incorrect sectors on zram disk, resulting in data corruption.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch eliminates duplicate code. The remove_block_head function
is a special case of remove_block which can be contained in remove_block
without confusion.
The portion of code in remove_block_head which was noted as "DEBUG ONLY"
is now mandatory. Doing this provides consistent management of the double
linked list of blocks under a freelist and makes this consolidation
of delete block code safe. The first and last blocks will have NULL
pointers in their previous and next page pointers respectively.
Additionally, any time a block is removed from a free list the next and
previous pointers will be set to NULL to avoid misuse outside xvmalloc.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently zram will do nothing to the page in the bvec when that page
has not been previously written. This allows random data to leak to
user space. That can be seen by doing the following:
## Load the module and create a 256Mb zram device called /dev/zram0
# modprobe zram
# echo $((256*1024*1024)) > /sys/class/block/zram0/disksize
## Initialize the device by writing zero to the first block
# dd if=/dev/zero of=/dev/zram0 bs=512 count=1
## Read ~256Mb of memory into a file and hope for something interesting
# dd if=/dev/zram0 of=file
This patch will treat an unwritten page as a zero-filled page. If a
page is read before a write has occurred the data returned is all 0's.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
By swapping the total_pages statistic with the lock we close a
hole in the structure for 64-bit CPUs.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a debug config flag to enable debug printk output and future
debug code.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change is in a conditional block which is entered only when there is
an existing data block on the freelist where the insert has taken place.
The new block is pushed onto the freelist stack and this conditional block
is updating links in the prior stack head to point to the new stack head.
After this conditional block the first-/second-level indices are updated
to indicate that there is a free block at this location.
This patch adds an immediate return from the conditional block to avoid
setting bits again to indicate a free block on this freelist. The bits
would already be set because there was an existing free block on this
freelist.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On a 64K page kernel, the value PAGE_SIZE passed to
blk_queue_logical_block_size would overflow the logical block size
argument (resulting in setting it to 0).
This patch sets the logical block size to 4096, using a new
ZRAM_LOGICAL_BLOCK_SIZE constant.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
xvmalloc will not currently function with 64K pages. Newly allocated
pages will be inserted at an offset beyond the end of the first-level
index. This tuning is needed to properly size the allocator for 64K
pages.
The default 3 byte shift results in a second level list size which can not
be indexed using the 64 bits of the flbitmap in the xv_pool structure.
The shift must increase to 4 bytes between second level list entries to
fit the size of the first level bitmap.
Here are a few statistics for structure sizes on 32- and 64-bit CPUs
with 4KB and 64KB page sizes.
bits_per_long 32 64 64
page_size 4,096 4,096 65,535
xv_align 4 8 8
fl_delta 3 3 4
num_free_lists 508 508 4,094
xv_pool size 4,144b 8,216b 66,040b
per object overhead 32 64 64
zram struct 0.5GB disk 512KB 1024KB 64KB
This patch maintains the current tunings for 4K pages, adds an optimal
sizing for 64K pages and adds a safe tuning for any other page sizes.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
zram_read() and zram_write() always return zero, so make them return
void to simplify the code.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Not exactly sure if this is a typo or not, due to my search
results comming up with not that many hits. Either its dereferenceable
or dereferencable from the two I choose the later. if it's wrong let me know
and I'll resend.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make zram_read() return a bio error if the device is not initialized
instead of pretending nothing happened.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently disksize_store() round down the disk size provided by user.
This is probably not what one would expect, so round up instead.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We can not configure zram device without sysfs anyway, so make zram
depends on it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit 7e24cce38a because it
was never appropriate for mainline.
Do not check for init flag before starting I/O - zram module is unusable
without this fix.
The oops mentioned in the reverted commit message was actually a problem
only with the zram version as present in project's own repository where
we allocate struct zram_stats_cpu upon device initialization. OTOH, In
mainline/staging version of zram, we allocate struct stats upfront, so
this oops cannot happen in mainline version.
Checking for init_done flag in zram_make_request() results in a *no-op*
for any I/O operation since we simply always return success. This flag
is actually set when the first write occurs on a zram disk which
triggers its initialization.
Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=25722
Reported-by: Dennis Jansen <dennis.jansen@web.de>
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the warning generated by sparse: "Using plain integer
as NULL pointer" by replacing the offending 0s with NULL.
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This was done to handle a number of conflicts in the batman-adv
and winbond drivers properly. It also now allows us to fix up the sysfs
attributes properly that were not in the .37 release due to them being
only in this tree at the time.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They should be writable by root, not readable.
Doh, stupid me with the wrong flags.
Reported-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They should not be writable by any user
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>