Commit Graph

167826 Commits

Author SHA1 Message Date
Linus Torvalds 7fd83b47ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

1) GRE tunnel drivers don't set the transport header properly, they also
   blindly deref the inner protocol ipv4 and needs some checks.  Fixes
   from Isaku Yamahata.

2) Fix sleeps while atomic in netdevice rename code, from Eric Dumazet.

3) Fix double-spinlock in solos-pci driver, from Dan Carpenter.

4) More ARP bug fixes.  Fix lockdep splat in arp_solicit() and then the
   bug accidentally added by that fix.  From Eric Dumazet and Cong Wang.

5) Remove some __dev* annotations that slipped back in, as well as all
   HOTPLUG references.  From Greg KH

6) RDS protocol uses wrong interfaces to access scatter-gather elements,
   causing a regression.  From Mike Marciniszyn.

7) Fix build error in cpts driver, from Richard Cochran.

8) Fix arithmetic in packet scheduler, from Stefan Hasko.

9) Similarly, fix association during calculation of random backoff in
   batman-adv.  From Akinobu Mita.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
  ipv6/ip6_gre: set transport header correctly
  ipv4/ip_gre: set transport header correctly to gre header
  IB/rds: suppress incompatible protocol when version is known
  IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
  net/vxlan: Use the underlying device index when joining/leaving multicast groups
  tcp: should drop incoming frames without ACK flag set
  netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled
  cpts: fix a run time warn_on.
  cpts: fix build error by removing useless code.
  batman-adv: fix random jitter calculation
  arp: fix a regression in arp_solicit()
  net: sched: integer overflow fix
  CONFIG_HOTPLUG removal from networking core
  Drivers: network: more __dev* removal
  bridge: call br_netpoll_disable in br_add_if
  ipv4: arp: fix a lockdep splat in arp_solicit()
  tuntap: dont use a private kmem_cache
  net: devnet_rename_seq should be a seqcount
  ip_gre: fix possible use after free
  ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
  ...
2012-12-27 10:40:30 -08:00
Yan Burman af9b078e35 net/vxlan: Use the underlying device index when joining/leaving multicast groups
The socket calls from vxlan to join/leave multicast group aren't
using the index of the underlying device, as a result the stack uses
the first interface that is up. This results in vxlan being non functional
over a device which isn't the 1st to be up.
Fix this by providing the iflink field to the vxlan instance
to the multicast calls.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 15:09:55 -08:00
Richard Cochran ccb6e984a1 cpts: fix a run time warn_on.
This patch fixes a warning in clk_enable by calling clk_prepare_enable
instead.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 14:15:09 -08:00
Richard Cochran cbc44dbe1f cpts: fix build error by removing useless code.
The cpts driver tries to obtain the input clock frequency by calling the
clock's internal 'recalc' method. Since <plat/clock.h> has been removed,
this code can no longer compile.

However, the driver never makes use of the frequency value, so this patch
fixes the issue by removing the offending code altogether.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26 14:15:09 -08:00
Linus Torvalds 637704cbc9 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux
Pull i2c __dev* attribute removal from Wolfram Sang:
 "The squashed patches from Bill to get rid of the __dev* annotations in
  the i2c subsystem.  I couldn't include it in my previous pull request
  due to some dependency with the mfd subsystem.  I had this patch in
  linux-next for two days before rc1 and nothing popped up."

* 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux:
  i2c: remove __dev* attributes from subsystem
2012-12-23 09:48:33 -08:00
Rafael J. Wysocki 9c016d6109 Partly revert "[media] uvcvideo: Set error_idx properly for extended controls API failures"
Commit f0ed2ce840 ("[media] uvcvideo: Set error_idx properly for
extended controls API failures") causes user space to behave incorrectly
on one of my test machines (there is no sound under KDE 4.9.4 using
pulseaudio and there is a knotify4 process occupying one of the CPU
cores 100% of the time).  Reverting that commit entirely fixes the
problem for me.

However, commit f0ed2ce840 appears to do more than it follows from its
changelog, because the changelog only says about the changes related to
ctrls->error_idx, while the commit additionally changes error codes
returned by various functions in uvc_ctrl.c and uvc_v4l2.c.  It turns
out that the changes of the returned error codes confuse the user spce,
so it is sufficient to revert the part of commit f0ed2ce840 not
mentioned in its changelog to fix the problem.

[ 'ENOENT' is not a valid error return from an ioctl to begin with, and
  I don't understand how anybody ever even thought it would be.  - Linus ]

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-23 09:39:14 -08:00
Bill Pemberton 0b255e927d i2c: remove __dev* attributes from subsystem
CONFIG_HOTPLUG is going away as an option.  As result the __dev*
markings will be going away.

Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Peter Korsgaard <peter.korsgaard@barco.com> (for ocores and mux-gpio)
Acked-by: Havard Skinnemoen <hskinnemoen@gmail.com> (for i2c-gpio)
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> (for puf3)
Acked-by: Barry Song <baohua.song@csr.com> (for sirf)
Reviewed-by: Jean Delvare <khali@linux-fr.org>
[wsa: Fixed "foo* bar" flaws while we are here]
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
2012-12-22 20:13:45 +01:00
Greg KH 03ce758e56 Drivers: network: more __dev* removal
Remove some __dev* markings that snuck in the 3.8-rc1 merge window in
the drivers/net/* directory.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-22 00:03:00 -08:00
Linus Torvalds 4fe19a136a Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
 "This includes some fixes and code improvements (like
  clk_prepare_enable and clk_disable_unprepare), conversion from the
  omap_wdt and twl4030_wdt drivers to the watchdog framework, addition
  of the SB8x0 chipset support and the DA9055 Watchdog driver and some
  OF support for the davinci_wdt driver."

* git://www.linux-watchdog.org/linux-watchdog: (22 commits)
  watchdog: mei: avoid oops in watchdog unregister code path
  watchdog: Orion: Fix possible null-deference in orion_wdt_probe
  watchdog: sp5100_tco: Add SB8x0 chipset support
  watchdog: davinci_wdt: add OF support
  watchdog: da9052: Fix invalid free of devm_ allocated data
  watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER
  watchdog: remove depends on CONFIG_EXPERIMENTAL
  watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  watchdog: DA9055 Watchdog driver
  watchdog: omap_wdt: eliminate goto
  watchdog: omap_wdt: delete redundant platform_set_drvdata() calls
  watchdog: omap_wdt: convert to devm_ functions
  watchdog: omap_wdt: convert to new watchdog core
  watchdog: WatchDog Timer Driver Core: fix comment
  watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare
  watchdog: imx2_wdt: Select the driver via ARCH_MXC
  watchdog: cpu5wdt.c: add missing del_timer call
  watchdog: hpwdt.c: Increase version string
  watchdog: Convert twl4030_wdt to watchdog core
  davinci_wdt: preparation for switch to common clock framework
  ...
2012-12-21 17:10:29 -08:00
Linus Torvalds b49249d103 Miscellaneous device-mapper fixes, cleanups and performance improvements.
Of particular note:
 - Disable broken WRITE SAME support in all targets except linear and striped.
   Use it when kcopyd is zeroing blocks.
 - Remove several mempools from targets by moving the data into the bio's new
   front_pad area(which dm calls 'per_bio_data').
 - Fix a race in thin provisioning if discards are misused.
 - Prevent userspace from interfering with the ioctl parameters and
   use kmalloc for the data buffer if it's small instead of vmalloc.
 - Throttle some annoying error messages when I/O fails.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ1M5OAAoJEK2W1qbAHj1nrQcP/itnnAw8RNsSHBrFMrL9wVnB
 5dmZ1BXPZmEbG+ViU4wzVmRUPHuSHTwhIqH7UFPjyCgbWaz1jaXfpyIxBsxlJi4E
 zuGjv46akANMwH0o/aJDRuEIrCnMtjLrMiY2Oq00lJFvATurwYAKSIgmnwRVdAYy
 gDehJhaymNtHVjhymu33xEn/hqqkQtUbMDj9o+IZppmAw1aQyNuYnwQu3HvcETuz
 /JBcs8isXKIQMJdMLFdGg7lZjLO241UvSwCAeGycKkupHLaYfycumPywgdiNFVUg
 L6pQP9RtAQ+H2VBQ1OIVMJxqiXxQ0xHhyxUYIe3reTar+RXoMA0yK+FiJTwSY1cE
 Xk0s8x2DXwUyu3Vx7UmvgUXnMgd4TIPITYBYiOAanEF/8Xt0voZn8mzNyyzsyFXy
 0u1vMRK+ZK7+QPio9LRh7bgHNK1g5ZyShvwqTMDmtlp+uskaP4iHDDGtVUjFA+Wf
 r9Ms0CXPbXIN6laUIT/4L3LJZtyRWB6e8wuCrUWIWWRbjrMPaPnB+/NlckGJ0CHa
 P/5r1rmLdneTEZ8Vx/2g3fBJ+H2uNQKhYujjnE0HqtHP+tvjt7ernibyU2QhNBeE
 Zy0PXRatY0Xn7UFpn44uJ2qxkWaO5Dloaa4HkWdlWFdR3f/u5MzVjy5mDXLUxkGq
 wj2Z3YkjYjy948MViBhD
 =yzhS
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull dm update from Alasdair G Kergon:
 "Miscellaneous device-mapper fixes, cleanups and performance
  improvements.

  Of particular note:
   - Disable broken WRITE SAME support in all targets except linear and
     striped.  Use it when kcopyd is zeroing blocks.
   - Remove several mempools from targets by moving the data into the
     bio's new front_pad area(which dm calls 'per_bio_data').
   - Fix a race in thin provisioning if discards are misused.
   - Prevent userspace from interfering with the ioctl parameters and
     use kmalloc for the data buffer if it's small instead of vmalloc.
   - Throttle some annoying error messages when I/O fails."

* tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits)
  dm stripe: add WRITE SAME support
  dm: remove map_info
  dm snapshot: do not use map_context
  dm thin: dont use map_context
  dm raid1: dont use map_context
  dm flakey: dont use map_context
  dm raid1: rename read_record to bio_record
  dm: move target request nr to dm_target_io
  dm snapshot: use per_bio_data
  dm verity: use per_bio_data
  dm raid1: use per_bio_data
  dm: introduce per_bio_data
  dm kcopyd: add WRITE SAME support to dm_kcopyd_zero
  dm linear: add WRITE SAME support
  dm: add WRITE SAME support
  dm: prepare to support WRITE SAME
  dm ioctl: use kmalloc if possible
  dm ioctl: remove PF_MEMALLOC
  dm persistent data: improve improve space map block alloc failure message
  dm thin: use DMERR_LIMIT for errors
  ...
2012-12-21 17:08:06 -08:00
Linus Torvalds 184e251661 Second batch of InfiniBand/RDMA changes for 3.8:
- cxgb4 changes to fix lookup engine hash collisions
  - mlx4 changes to make flow steering usable
  - fix to IPoIB to avoid pinning dst reference for too long
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQ1NdhAAoJEENa44ZhAt0hjb0P/i0tL4Ux+PvqG/Phh2gaZaQR
 evoi3bw6tCYFzWEJPfur2AJ63svKjyrSrfpvgEZjhthDdYORjIv2Dw2Je1qkOSrf
 5tJtSp3r9D0y35SE/DxNnzlgua9heBqphPlOGpjcKdN83KP7XIyVG2SGyJiEeLza
 owefPx/48jZr8hsw7LB2DlZmNUbVWK00o+pXa/VsUQX/dlIU5hyihAzBjtkwJyT2
 xtiyu9oqXGuv1JW/a16ooPGDaETDLJ1G50NndadUZYWFWj36VrAwW6hOAK3oOikf
 Qa4z3gJVzpSdaC1kiuxERj7GxlRpVUJY0IgHEoMTVrexOz1IsFEP8KEfGLkAYwzB
 jjuXh+Z2+QU5OOO3un0nINRGxKZUSD8Scoa222GBwGnWuCCq68APx2UGTkVkhWon
 FyjhF13WJRbElg2oXzI1cg9lJNv2pf10hXhiy2qdO6tDElVXVQk+KRiDdcNtxS0H
 FYYh3og/DjFwp18j+FVLA9r5AiPuVV5DjNnlwBNjTTMc9RwlOrX/6oCK2kZN24rZ
 l+Nr0gv+h6MAjTPBPYLUP2bsY6wYt5n566mfLea/lir9YeI+Q3PL2sDdGZ7C6xHH
 S4pRW95leP4pEFpHqYg8z3QKPswYqzokgbUSHxg2TVYrV01RC75axXCh0Q90q3RE
 oLP8GDYod2okxhAU2AyN
 =lDXt
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull more infiniband changes from Roland Dreier:
 "Second batch of InfiniBand/RDMA changes for 3.8:
   - cxgb4 changes to fix lookup engine hash collisions
   - mlx4 changes to make flow steering usable
   - fix to IPoIB to avoid pinning dst reference for too long"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix bug for active and passive LE hash collision path
  RDMA/cxgb4: Fix LE hash collision bug for passive open connection
  RDMA/cxgb4: Fix LE hash collision bug for active open connection
  mlx4_core: Allow choosing flow steering mode
  mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
  mlx4_core: Fix error flow in the flow steering wrapper
  mlx4_core: Add QPN enforcement for flow steering rules set by VFs
  cxgb4: Add LE hash collision bug fix path in LLD driver
  cxgb4: Add T4 filter support
  IPoIB: Call skb_dst_drop() once skb is enqueued for sending
2012-12-21 16:40:26 -08:00
Eric Dumazet 9fdc6bef5f tuntap: dont use a private kmem_cache
Commit 96442e4242 (tuntap: choose the txq based on rxq)
added a per tun_struct kmem_cache.

As soon as several tun_struct are used, we get an error
because two caches cannot have same name.

Use the default kmalloc()/kfree_rcu(), as it reduce code
size and doesn't have performance impact here.

Reported-by: Paul Moore <pmoore@redhat.com>
Tested-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21 13:14:01 -08:00
Dan Carpenter 29042073e7 solos-pci: double lock in geos_gpio_store()
There is a typo here so we do a double lock instead of an unlock.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21 13:14:00 -08:00
Mike Snitzer 45e621d45e dm stripe: add WRITE SAME support
Rename stripe_map_discard to stripe_map_range and reuse it for WRITE
SAME bio processing.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Mikulas Patocka 7de3ee57da dm: remove map_info
This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Mikulas Patocka ee18026ac6 dm snapshot: do not use map_context
Eliminate struct map_info from dm-snap.

map_info->ptr was used in dm-snap to indicate if the bio was tracked.
If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash.

This patch removes the use of map_info->ptr. We determine if the bio was
tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true,
the bio is not tracked, if it is false, the bio is tracked.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Mikulas Patocka 59c3d2c6a1 dm thin: dont use map_context
This patch removes endio_hook_pool from dm-thin and uses per-bio data instead.

This patch removes any use of map_info in preparation for the next patch
that removes map_info from bio-based device mapper.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:40 +00:00
Mikulas Patocka 0045d61b5b dm raid1: dont use map_context
Don't use map_info any more in dm-raid1.

map_info was used for writes to hold the region number. For this purpose
we add a new field dm_bio_details to dm_raid1_bio_record.

map_info was used for reads to hold a pointer to dm_raid1_bio_record (if
the pointer was non-NULL, bio details were saved; if the pointer was
NULL, bio details were not saved). We use
dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is
NULL, details were not saved, if bi_bdev is non-NULL, details were
saved.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:40 +00:00
Mikulas Patocka c7cfdf5973 dm flakey: dont use map_context
Replace map_info with a per-bio structure "struct per_bio_data" in dm-flakey.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:39 +00:00
Mikulas Patocka 89c7cd8974 dm raid1: rename read_record to bio_record
Rename struct read_record to bio_record in dm-raid1.

In the following patch, the structure will be used for both read and
write bios, so rename it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:39 +00:00
Mikulas Patocka ddbd658f64 dm: move target request nr to dm_target_io
This patch moves target_request_nr from map_info to dm_target_io and
makes it accessible with dm_bio_get_target_request_nr.

This patch is a preparation for the next patch that removes map_info.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:39 +00:00
Mikulas Patocka 42bc954f2a dm snapshot: use per_bio_data
Replace tracked_chunk_pool with per_bio_data in dm-snap.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:38 +00:00
Mikulas Patocka e42c3f914d dm verity: use per_bio_data
Replace io_mempool with per_bio_data in dm-verity.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:38 +00:00
Mikulas Patocka 39cf0ed27e dm raid1: use per_bio_data
Replace read_record_pool with per_bio_data in dm-raid1.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:38 +00:00
Mikulas Patocka c0820cf5ad dm: introduce per_bio_data
Introduce a field per_bio_data_size in struct dm_target.

Targets can set this field in the constructor. If a target sets this
field to a non-zero value, "per_bio_data_size" bytes of auxiliary data
are allocated for each bio submitted to the target. These data can be
used for any purpose by the target and help us improve performance by
removing some per-target mempools.

Per-bio data is accessed with dm_per_bio_data. The
argument data_size must be the same as the value per_bio_data_size in
dm_target.

If the target has a pointer to per_bio_data, it can get a pointer to
the bio with dm_bio_from_per_bio_data() function (data_size must be the
same as the value passed to dm_per_bio_data).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:38 +00:00
Mike Snitzer 70d6c400ac dm kcopyd: add WRITE SAME support to dm_kcopyd_zero
Add WRITE SAME support to dm-io and make it accessible to
dm_kcopyd_zero().  dm_kcopyd_zero() provides an asynchronous interface
whereas the blkdev_issue_write_same() interface is synchronous.

WRITE SAME is a SCSI command that can be leveraged for more efficient
zeroing of a specified logical extent of a device which supports it.
Only a single zeroed logical block is transfered to the target for each
WRITE SAME and the target then writes that same block across the
specified extent.

The dm thin target uses this.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:37 +00:00
Mike Snitzer 4f0b70b047 dm linear: add WRITE SAME support
The linear target can already support WRITE SAME requests so signal
this by setting num_write_same_requests to 1.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:37 +00:00
Mike Snitzer 23508a96cd dm: add WRITE SAME support
WRITE SAME bios have a payload that contain a single page.  When
cloning WRITE SAME bios DM has no need to modify the bi_io_vec
attributes (and doing so would be detrimental).  DM need only alter the
start and end of the WRITE SAME bio accordingly.

Rather than duplicate __clone_and_map_discard, factor out a common
function that is also used by __clone_and_map_write_same.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:37 +00:00
Mike Snitzer d54eaa5a0f dm: prepare to support WRITE SAME
Allow targets to opt in to WRITE SAME support by setting
'num_write_same_requests' in the dm_target structure.

A dm device will only advertise WRITE SAME support if all its
targets and all its underlying devices support it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:36 +00:00
Mikulas Patocka 9c5091f2ee dm ioctl: use kmalloc if possible
If the parameter buffer is small enough, try to allocate it with kmalloc()
rather than vmalloc().

vmalloc is noticeably slower than kmalloc because it has to manipulate
page tables.

In my tests, on PA-RISC this patch speeds up activation 13 times.
On Opteron this patch speeds up activation by 5%.

This patch introduces a new function free_params() to free the
parameters and this uses new flags that record whether or not vmalloc()
was used and whether or not the input buffer must be wiped after use.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:36 +00:00
Mikulas Patocka 5023e5cf58 dm ioctl: remove PF_MEMALLOC
When allocating memory for the userspace ioctl data, set some
appropriate GPF flags directly instead of using PF_MEMALLOC.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:36 +00:00
Joe Thornber 7960123f2d dm persistent data: improve improve space map block alloc failure message
Improve space map error message when unable to allocate a new
metadata block.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:36 +00:00
Mike Snitzer c397741c76 dm thin: use DMERR_LIMIT for errors
Throttle all errors logged from the IO path by dm thin.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:34 +00:00
Mike Snitzer 89ddeb8cb1 dm persistent data: use DMERR_LIMIT for errors
Nearly all of persistent-data is in the IO path so throttle error
messages with DMERR_LIMIT to limit the amount logged when
something has gone wrong.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:34 +00:00
Mike Snitzer a5bd968aeb dm block manager: reinstate message when validator fails
Reinstate a useful error message when the block manager buffer validator fails.
This was mistakenly eliminated when the block manager was converted to use
dm-bufio.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:34 +00:00
Jonathan Brassow 3a0f9aaee0 dm raid: round region_size to power of two
If the user does not supply a bitmap region_size to the dm raid target,
a reasonable size is computed automatically.  If this is not a power of 2,
the md code will report an error later.

This patch catches the problem early and rounds the region_size to the
next power of two.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:33 +00:00
Joe Thornber 2aab38502d dm thin: cleanup dead code
Remove unused @data_block parameter from cell_defer.
Change thin_bio_map to use many returns rather than setting a variable.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:33 +00:00
Joe Thornber f286ba0eed dm thin: rename cell_defer_except to cell_defer_no_holder
Rename cell_defer_except() to cell_defer_no_holder() which describes
its function more clearly.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:33 +00:00
Mikulas Patocka 9aa0c0e60f dm snapshot: optimize track_chunk
track_chunk is always called with interrupts enabled. Consequently, we
do not need to save and restore interrupt state in "flags" variable.
This patch changes spin_lock_irqsave to spin_lock_irq and
spin_unlock_irqrestore to spin_unlock_irq.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:33 +00:00
Mikulas Patocka 19cbbc60c6 dm raid: use DM_ENDIO_INCOMPLETE
Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:32 +00:00
Mikulas Patocka 7c27213b20 dm raid1: remove impossible mempool_alloc error test
mempool_alloc can't fail if __GFP_WAIT is specified, so the condition
that tests if read_record is non-NULL is always true.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:32 +00:00
Mike Snitzer 018debea8d dm thin: emit ignore_discard in status when discards disabled
If "ignore_discard" is specified when creating the thin pool device then
discard support is disabled for that device.  The pool device's status
should reflect this fact rather than stating "no_discard_passdown"
(which implies discards are enabled but passdown is disabled).

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:32 +00:00
Joe Thornber e3cbf94513 dm persistent data: fix nested btree deletion
When deleting nested btrees, the code forgets to delete the innermost
btree.  The thin-metadata code serendipitously compensates for this by
claiming there is one extra layer in the tree.

This patch corrects both problems.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:32 +00:00
Joe Thornber 563af186df dm thin: wake worker when discard is prepared
When discards are prepared it is best to directly wake the worker that
will process them.  The worker will be woken anyway, via periodic
commit, but there is no reason to not wake_worker here.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:31 +00:00
Joe Thornber e8088073c9 dm thin: fix race between simultaneous io and discards to same block
There is a race when discard bios and non-discard bios are issued
simultaneously to the same block.

Discard support is expensive for all thin devices precisely because you
have to be careful to quiesce the area you're discarding.  DM thin must
handle this conflicting IO pattern (simultaneous non-discard vs discard)
even though a sane application shouldn't be issuing such IO.

The race manifests as follows:

1. A non-discard bio is mapped in thin_bio_map.
   This doesn't lock out parallel activity to the same block.

2. A discard bio is issued to the same block as the non-discard bio.

3. The discard bio is locked in a dm_bio_prison_cell in process_discard
   to lock out parallel activity against the same block.

4. The non-discard bio's mapping continues and its all_io_entry is
   incremented so the bio is accounted for in the thin pool's all_io_ds
   which is a dm_deferred_set used to track time locality of non-discard IO.

5. The non-discard bio is finally locked in a dm_bio_prison_cell in
   process_bio.

The race can result in deadlock, leaving the block layer hanging waiting
for completion of a discard bio that never completes, e.g.:

INFO: task ruby:15354 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ruby            D ffffffff8160f0e0     0 15354  15314 0x00000000
 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900
 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900
 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0
Call Trace:
 [<ffffffff814e5a19>] schedule+0x29/0x70
 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220
 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod]
 [<ffffffff814e589e>] wait_for_common+0x11e/0x190
 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0
 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20
 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260
 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0
 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40
 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340
 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0
 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0
 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0
 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b

The thinp-test-suite's test_discard_random_sectors reliably hits this
deadlock on fast SSD storage.

The fix for this race is that the all_io_entry for a bio must be
incremented whilst the dm_bio_prison_cell is held for the bio's
associated virtual and physical blocks.  That cell locking wasn't
occurring early enough in thin_bio_map.  This patch fixes this.

Care is taken to always call the new function inc_all_io_entry() with
the relevant cells locked, but they are generally unlocked before
calling issue() to try to avoid holding the cells locked across
generic_submit_request.

Also, now that thin_bio_map may lock bios in a cell, process_bio() is no
longer the only thread that will do so.  Because of this we must be sure
to use cell_defer_except() to release all non-holder entries, that
were added by the other thread, because they must be deferred.

This patch depends on "dm thin: replace dm_cell_release_singleton with
cell_defer_except".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@vger.kernel.org
2012-12-21 20:23:31 +00:00
Joe Thornber b7ca9c9273 dm thin: replace dm_cell_release_singleton with cell_defer_except
Change existing users of the function dm_cell_release_singleton to share
cell_defer_except instead, and then remove the now-unused function.

Everywhere that calls dm_cell_release_singleton, the bio in question
is the holder of the cell.

If there are no non-holder entries in the cell then cell_defer_except
behaves exactly like dm_cell_release_singleton.  Conversely, if there
*are* non-holder entries then dm_cell_release_singleton must not be used
because those entries would need to be deferred.

Consequently, it is safe to replace use of dm_cell_release_singleton
with cell_defer_except.

This patch is a pre-requisite for "dm thin: fix race between
simultaneous io and discards to same block".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:31 +00:00
Mike Snitzer c1a94672a8 dm: disable WRITE SAME
WRITE SAME bios are not yet handled correctly by device-mapper so
disable their use on device-mapper devices by setting
max_write_same_sectors to zero.

As an example, a ciphertext device is incompatible because the data
gets changed according to the location at which it written and so the
dm crypt target cannot support it.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:30 +00:00
Alasdair G Kergon e910d7ebec dm ioctl: prevent unsafe change to dm_ioctl data_size
Abort dm ioctl processing if userspace changes the data_size parameter
after we validated it but before we finished copying the data buffer
from userspace.

The dm ioctl parameters are processed in the following sequence:
 1. ctl_ioctl() calls copy_params();
 2. copy_params() makes a first copy of the fixed-sized portion of the
    userspace parameters into the local variable "tmp";
 3. copy_params() then validates tmp.data_size and allocates a new
    structure big enough to hold the complete data and copies the whole
    userspace buffer there;
 4. ctl_ioctl() reads userspace data the second time and copies the whole
    buffer into the pointer "param";
 5. ctl_ioctl() reads param->data_size without any validation and stores it
    in the variable "input_param_size";
 6. "input_param_size" is further used as the authoritative size of the
    kernel buffer.

The problem is that userspace code could change the contents of user
memory between steps 2 and 4.  In particular, the data_size parameter
can be changed to an invalid value after the kernel has validated it.
This lets userspace force the kernel to access invalid kernel memory.

The fix is to ensure that the size has not changed at step 4.

This patch shouldn't have a security impact because CAP_SYS_ADMIN is
required to run this code, but it should be fixed anyway.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
2012-12-21 20:23:30 +00:00
Mikulas Patocka 550929faf8 dm persistent data: rename node to btree_node
This patch fixes a compilation failure on sparc32 by renaming struct node.

struct node is already defined in include/linux/node.h. On sparc32, it
happens to be included through other dependencies and persistent-data
doesn't compile because of conflicting declarations.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:30 +00:00
Linus Torvalds 54e37b8dbe vfio pull for v3.8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx/6fAAoJECObm247sIsi6BgP/1rgXeiDsc4nGcGHHow1h2fS
 aYbKgNQ9gdRD01fVVeNbkg7XTvpSvfMJ3BK6AxUT2DIT/yAQeSE+To0YtLan6UPq
 PV1zfShlxEqys3QyCShhOGFZgNZ6b0Auy+7KzKDz1ROBhgbEuteXFuZU3QPQ25AR
 TWNTTVzcmvdTSvNAUAiyhF5WWuo8yJe0MU4LhXOTRJ2eaMvuNJF+LmRqUCn9c+q4
 dUO4eYPQv1YUoVR7RLqJKJ7tvkEI6bcLkOAFKz05v0XBrTdJcIUOIfMSWRlogvKD
 egwySDVHt0myhno4VS4rV0Pk2KtwlN1isxYmKBe3yuxjwEha6d3SyjaK3MIEtVB+
 SU8lllwy7pmk/oUyeaQldxPTFpmKBjWmgyeuU1fJZ57v+tK/M3MZiHP9kTClYIoN
 TFor87QWIAieDYSIDOH/OrU3nLFrXCR978OpkaceT2guCTP2wxQr8GDfjq/9h42L
 3wHFQKdxPezFF468nKIZTU9G+em+bSsDdTCaZvRMMPQrnF4D+sgeUcSOSuWiYzLK
 Z39Ms6kfsaWssXfks7xHPWQPDw8S2jjwK+V+liBadFgqeiqtwzXfh+R8b1c8YvpX
 o0uFhQswTbWrp8linNGbdxjRpODVhICEIcDK4UcXTskO1W4J4Y4j2uZrUXLjF9tG
 PsrsKVGBr541NyvCRKu/
 =vfdy
 -----END PGP SIGNATURE-----

Merge tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio

Pull vfio update from Alex Williamson.

* tag 'vfio-for-v3.8-v2' of git://github.com/awilliam/linux-vfio:
  vfio-pci: Enable device before attempting reset
  VFIO: fix out of order labels for error recovery in vfio_pci_init()
  VFIO: use ACCESS_ONCE() to guard access to dev->driver
  VFIO: unregister IOMMU notifier on error recovery path
  vfio-pci: Re-order device reset
  vfio: simplify kmalloc+copy_from_user to memdup_user
2012-12-20 21:30:12 -08:00