sg uses a scheme to reallocate a single contiguous array of all its
pointers for lookup and management. This didn't matter too much when sg
could only attach 256 nodes, but now the maximum has been bumped up to
32k we're starting to push the limits of the maximum allocatable
contiguous memory. The solution to this is to eliminate the static
array and do everything via idr, which this patch does.
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
unsigned short is too small for sizeof(struct scatterlist) *
min(q->max_hw_segments, q->max_phys_segments).
This fixes memory leak with 4096 segments since 16 (likely sg size
with x86) * 4096 sets sglist_len to zero.
This might not happen without sg chaining support.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
coverity spotted this (cid #758). All callers dereference sfp, so we dont
need this check. In addition to this, we dereference it earlier in the
function.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (as857) modifies the SG_GET_RESERVED_SIZE and
SG_SET_RESERVED_SIZE ioctls in the sg driver, capping the values at
the device's request_queue's max_sectors value. This will permit
cdrecord to obtain a legal value for the maximum transfer length,
fixing Bugzilla #7026.
The patch also caps the initial reserved_size value. There's no
reason to have a reserved buffer larger than max_sectors, since it
would be impossible to use the extra space.
The corresponding ioctls in the block layer are modified similarly,
and the initial value for the reserved_size is set as large as
possible. This will effectively make it default to max_sectors.
Note that the actual value is meaningless anyway, since block devices
don't have a reserved buffer.
Finally, the BLKSECTGET ioctl is added to sg, so that there will be a
uniform way for users to determine the actual max_sectors value for
any raw SCSI transport.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For certain LLDs the sg driver can cause on oops
when the transfer length is large and not a
multiple of PAGE_SIZE.
ChangeLog:
- correct the length of the last scatter gather
list element.
- fix some printk()s that have the wrong function
name.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This sg driver patch addresses the problem with larger
page sizes reported by Brian King in this post:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115867718623631&w=2
Some other related matters are also addressed. Some of these
prevent oopses when the SG_SCATTER_SZ or scatter_elem_sz are
set to inappropriate values.
The scatter_elem_sz has been tested up to 4 MB which should
make the largest data transfer with one SCSI command, 32 MB
less one block, achievable with a relatively small number
of elements in the scatter gather list.
ChangeLog:
- add scatter_elem_sz boot time parameter and sysfs module
parameter that is initialized to SG_SCATTER_SZ
- the driver will then adjust scatter_elem_sz to be the
max(given(scatter_elem_sz), PAGE_SIZE)
It will also round it up, if necessary, to be a power
of two
- clean up sg.h header, correct bad urls and some statements
that are no longer valid
- make the def_reserved_size sysfs module attribute writable
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There's a problem where sg is executing a ->nopage operation on a
compound page, it actually calls get_page() on the first page in the
compound rather than the page which is being mapped. The fix is to
select the correct page by indexing into the compound.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Conflicts:
drivers/scsi/nsp32.c
drivers/scsi/pcmcia/nsp_cs.c
Removal of randomness flag conflicts with SA_ -> IRQF_ global
replacement.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I got a NULL derefrence in cdev_del+1 when called from sg_remove. By looking at
the code of sg_add, sg_alloc and sg_remove (all in drivers/scsi/sg.c) I found
out that sg_add is calling sg_alloc but if it fails afterwards it does not
deallocate the space that was allocated in sg_alloc and the redundant entry has
NULL in cdev. When sg_remove is being called, it tries to perform cdev_del to
this NULL cdev and fails.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove
duplicates of the macro.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
when the sg driver is unable to setup direct IO, free that scatter
gather list prior to falling back to indirect IO
Further to this thread started by Bryan Holty:
http://marc.theaimsgroup.com/?l=linux-scsi&m=114306885116728&w=2
Here is the reworked patch again. This time it has been
tested with a program provided by Bryan.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Doug found a bug where if scsi_execute_async fails, we are leaking
sg resources. scsi_do_req never failed so we did not have to handle
that case before.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg increments the refcount of constituent pages in its higher order memory
allocations when they are about to be mapped by userspace. This is done so
the subsequent get_page/put_page when doing the mapping and unmapping does not
free the page.
Move over to the preferred way, that is, using compound pages instead. This
fixes a whole class of possible obscure bugs where a get_user_pages on a
constituent page may outlast the user mappings or even the driver.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As devfs has been disabled from the kernel tree for a number of months
now (5 to be exact), here's a patch against 2.6.16-rc1-git1 that removes
support for it from the SCSI subsystem.
The patch also removes the scsi_disk devfs_name field as it's no longer
needed.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change the core SCSI code to use kzalloc rather than kmalloc+memset
where possible.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove a hack in the sg driver that alters the total buffer
length for SG_IO commands to ensure buffers are not odd byte
lengths. This breaks on the ipr driver since it requires the
request_bufflen to equal the length specified in the cdb.
The block layer SG_IO code does not appear to have this hack.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When the scsi_execute_async interface was added it ended up reducing
the flexibility of userspace to send arbitrary scsi commands through
sg using SG_IO. The SG_IO interface allows userspace to specify the
CDB length. This is now ignored in scsi_execute_async and it is
guessed using the COMMAND_SIZE macro, which is not always correct,
particularly for vendor specific commands. This patch adds a cmd_len
parameter to the scsi_execute_async interface to allow the caller
to specify the length of the CDB.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg_page_malloc should clear the data buffer, not that extent of mem_map.
This fixes Jesper's sg_page_free "Bad page states"
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert sg to always send scatterlists, and kill scsi_request usage.
TODO:
- move DIO code to common place or make block layers usable for ULDs.
- move buffer allocation code to common place for all ULDs to use. And
make buffer allocation code obey all queue limits so we can find
out about problems before calling scsi_execute_async. Currently, sg.c
could allocate a buffer that is too large, and send the request
to scsi_execute_async. scsi_execute_async will then check it against
all the queue limits and return a failure in this case. It would nicer
to know about the queue limit violation right away.
- move indirect (copy_to/from_user) paths commone place or make block
layers usable for ULDs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg's st_map_user_pages is modelled on an earlier version of st's
sgl_map_user_pages, and has the same bug: if get_user_pages got some but
not all of the pages, then those got were released, but the positive res
code returned implied that they were still to be freed.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2.6.15-rc1 made sg's st_unmap_user_pages and st's sgl_unmap_user_pages
BUG on a PageReserved page. But that's wrong: they could be unmapping
the ZERO_PAGE, which is marked PG_reserved; and perhaps others (while
get_user_pages is still permitted on VM_PFNMAP areas - that may change).
More change is needed here: sg claims to dirty even pages written from,
and st claims not to dirty even pages read into; and SetPageDirty is not
adequate for this nowadays. Fixes to those follow in a later patch: for
the moment just fix the 2.6.15 regression.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch removes almost all inclusions of linux/version.h. The 3
#defines are unused in most of the touched files.
A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunatly in linux/version.h.
There are also lots of #ifdef for long obsolete kernels, this was not
touched. In a few places, the linux/version.h include was move to where
the LINUX_VERSION_CODE was used.
quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`
search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the drivers/scsi/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in drivers/scsi/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PageReserved() calls from core code by tightening VM_RESERVED
handling in mm/ to cover PageReserved functionality.
PageReserved special casing is removed from get_page and put_page.
All setting and clearing of PageReserved is retained, and it is now flagged
in the page_alloc checks to help ensure we don't introduce any refcount
based freeing of Reserved pages.
MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
deprecated. We never completely handled it correctly anyway, and is be
reintroduced in future if required (Hugh has a proof of concept).
Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
be trivially removed.
Last real user of PageReserved is swsusp, which uses PageReserved to
determine whether a struct page points to valid memory or not. This still
needs to be addressed (a generic page_is_ram() should work).
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
thus mapcounted and count towards shared rss). These writes to the struct
page could cause excessive cacheline bouncing on big systems. There are a
number of ways this could be addressed if it is an issue.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Refcount bug fix for filemap_xip.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch uses sg_set_buf/sg_init_one in some places where it was
duplicated.
Signed-off-by: David Hardeman <david@2gen.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This should eliminate (at least in the mid layer) to make numeric
assumptions about any of the enumeration variables. As a side effect,
it will also make all the messages consistent and line us up nicely for
the error logging strategy (if it ever shows itself again).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create(). This patch
fixes up all in-kernel users of the function.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver core: pass interface to class intreface methods
Pass interface as argument to add() and remove() class interface
methods. This way a subsystem can implement generic add/remove
handlers and then call interface-specific ones.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A bunch of create_proc_dir_entry() calls creating directories had crept
in since the last sweep; converted to proc_mkdir().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We fix the oops by enforcing the host state model. There have also
been two extra states added: SHOST_CANCEL_RECOVERY and
SHOST_DEL_RECOVERY so we can take the model through host removal while
the recovery thread is active.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Further to the problem discussed in this post:
http://marc.theaimsgroup.com/?l=linux-scsi&m=112540053711489&w=2
It seems that the sg driver does not need to set the VM_IO flag
on pages that it memory maps to the user space since they are
not from the IO space. Ahmed Teirelbar <ahmed.teirelbar@adic.com>
wants the facility and has tested this patch as I have without
adverse effects.
The oops protection is still important. Some users really did
try and use dio transfers from the sg driver to memory mapped
IO space (on a video capture card if my memory serves) during the
lk 2.4 series. I'm not sure how successful it was but that will
now be politely refused in lk 2.6.13+ .
Changelog:
- set the page flags for sg's reserved buffer mmap-ed
to the user space to VM_RESERVED (rather than
VM_RESERVED | VM_IO )
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch adopts the same solution as proposed by Kai M. in
a post titled: "[PATCH] SCSI tape signed/unsigned fix".
The fix is in a function that the sg driver borrowed from
the st driver so its maintenance is a little easier if
the functions remain the same after the fix.
- change nr_pages type from unsigned to signed so errors
from get_user_pages() call are properly handled
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
I know that scsi procfs is legacy code but this is a fix for a memory leak.
While reading through sg.c I realized that the implementation of
/proc/scsi/sg/devices with seq_file is leaking memory due to freeing the
pointer returned by the next() iterator method. Since next() might return
NULL or an error this is wrong. This patch fixes it through using the
seq_files private field for holding the reference to the iterator object.
Here is a small bash script to trigger the leak. Use slabtop to watch
the size-32 usage grow and grow.
#!/bin/sh
while true; do
cat /proc/scsi/sg/devices > /dev/null
done
Signed-off-by: Jan Blunck <j.blunck@tu-harburg.de>
Acked-by: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Migrate the current SCSI host state model to a model like SCSI
device is using.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Rejections fixed up and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A problem exists todayin the sg driver that if an SG_IO request is
outstanding to a device when it is removed from the system. The
system may oops if that command completes later in time.
1. sg_remove gets called
2. sg_remove calls sg_finish_req_req on all pending requests
This removes the Sg_request's from the headrp list in the Sg_fd
3. The sleeping SG_IO ioctl is woken. It does nothing and returns.
4. The caller closes the fd, which invokes sg_release
5. sg_release calls sg_remove_sfp. It finds no outstanding commands
since the headrp list is empty, so it calls __sg_remove_sfp,
which frees the sfp.
6. Now when sg_cmd_done gets called, sg uses upper_private_data in
the Scsi_Request, which should point to the srp, which has been
freed, so it points to freed memory.
7. sg then dereferences the srp pointer to get the sfp, and we oops.
The fix is to NULL out the upper_private_data field in this path,
which sg_cmd_done already checks for, which will prevent the oops
from occurring.
cpu 0x1: Vector: 300 (Data Access) at [c00000000fff7aa0]
pc: d0000000002bbea8: .sg_cmd_done+0x70/0x394 [sg]
lr: d000000000073304: .scsi_finish_command+0x10c/0x130 [scsi_mod]
sp: c00000000fff7d20
msr: 8000000000009032
dar: 2f70726f63202f78
dsisr: 40000000
current = 0xc0000000024589b0
paca = 0xc0000000003da800
pid = 7, comm = events/1
[c00000000fff7dc0] d000000000073304 .scsi_finish_command+0x10c/0x130 [scsi_mod]
[c00000000fff7e50] d00000000007317c .scsi_softirq+0x140/0x168 [scsi_mod]
[c00000000fff7ef0] c0000000000634dc .__do_softirq+0xa0/0x17c
[c00000000fff7f90] c000000000018430 .call_do_softirq+0x14/0x24
[c00000000ed472e0] c0000000000142e0 .do_softirq+0x74/0x9c
[c00000000ed47370] c000000000013c9c .do_IRQ+0xe8/0x100
[c00000000ed473f0] c00000000000ae34 HardwareInterrupt_entry+0x8/0x54
c00000000003df28 .smp_call_function+0
x100/0x1d0
[c00000000ed47780] c0000000000ba99c .invalidate_bh_lrus+0x30/0x70
[c00000000ed47810] c0000000000b91a0 .invalidate_bdev+0x18/0x3c
[c00000000ed478a0] c0000000000da7b8 .__invalidate_device+0x70/0x94
[c00000000ed47930] c0000000001d40bc .invalidate_partition+0x4c/0x7c
[c00000000ed479c0] c00000000010a944 .del_gendisk+0x48/0x15c
[c00000000ed47a50] d00000000003d55c .sd_remove+0x34/0xe4 [sd_mod]
[c00000000ed47ae0] c0000000001c5d30 .device_release_driver+0x90/0xb4
[c00000000ed47b70] c0000000001c6130 .bus_remove_device+0xb0/0x12c
[c00000000ed47c00] c0000000001c4378 .device_del+0x120/0x198
[c00000000ed47ca0] d00000000007dcdc .scsi_remove_device+0xb4/0x194 [scsi_mod]
[c00000000ed47d30] d0000000000a5864 .ipr_worker_thread+0x1d4/0x27c [ipr]
[c00000000ed47dd0] c0000000000734c4 .worker_thread+0x238/0x2f4
[c00000000ed47ee0] c0000000000796c0 .kthread+0xcc/0x11c
[c00000000ed47f90] c000000000018ad0 .kernel_thread+0x4c/0x6c
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
these have been wrappers for the generic dma direction bits since 2.5.x.
This patch converts the few remaining drivers and removes the macros.
Arjan noticed there's some hunk in here that shouldn't. Updated patch
below:
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have the scsi_print_* functions in the proper namespace for a long
time now and there weren't a lot users left.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The attachment combines the most recent patch from
Yum Rayan <yum.rayan@gmail.com> (to reduce sg stack
usage), Adrian Bunk <bunk@stusta.de> (to fix check
after use) and me (fix elapsed time calculation
(duration) on ia64 machines).
I have modified the patch from Yum Rayan so kmalloc()
in sg_read() is only called for the (rare) code paths
that need them.
Changelog:
- reduce stack usage in sg_ioctl() and sg_read()
- fix check after use in sg_mmap()
- hold duration internally in milliseconds and
check current time later than held time
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!