Fix TCP FRTO logic so that it always notices when snd_una advances,
indicating that any RTO after that point will be a new and distinct
loss episode.
Previously there was a very specific sequence that could cause FRTO to
fail to notice a new loss episode had started:
(1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
(2) receiver ACKs packet 1
(3) FRTO sends 2 more packets
(4) RTO timer fires again (should start a new loss episode)
The problem was in step (3) above, where tcp_process_loss() returned
early (in the spot marked "Step 2.b"), so that it never got to the
logic to clear icsk_retransmits. Thus icsk_retransmits stayed
non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
icsk_retransmits, decide that this RTO is not a new episode, and
decide not to cut ssthresh and remember the current cwnd and ssthresh
for undo.
There were two main consequences to the bug that we have
observed. First, ssthresh was not decreased in step (4). Second, when
there was a series of such FRTO (1-4) sequences that happened to be
followed by an FRTO undo, we would restore the cwnd and ssthresh from
before the entire series started (instead of the cwnd and ssthresh
from before the most recent RTO). This could result in cwnd and
ssthresh being restored to values much bigger than the proper values.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.
During moving a socket to time-wait state we already verify if timestamps
where seen on a connection. Only if that was the case we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.
Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behing NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regullary get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
In general, use of tcp_tw_recycle is disadvised.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().
Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d057 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"),
when looking up a tunnel, tunnel's underlying interface (t->parms.link)
is verified to match incoming traffic's ingress device.
However the comparison was incorrectly based on skb->dev->iflink.
Instead, dev->ifindex should be used, which correctly represents the
interface from which the IP stack hands the ipip6 packets.
This allows setting up sit tunnels bound to vlan interfaces (otherwise
incoming ipip6 traffic on the vlan interface was dropped due to
ipip6_tunnel_lookup match failure).
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the Makefile, ehea_phyp.o is included twice in the list of
object files compile into ehea.o.
This change removes one instance.
Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
xgene_enet_get_ring_size() returns a negative value in case of an error,
but its only caller in xgene_enet_create_desc_ring() currently uses the
return value directly as u32. Instead, check for a negative value first and
error out in case. Also move the call to xgene_enet_get_ring_size() before
devm_kzalloc() so we don't need to free anything in the error path.
This fixes the following issue reported by the Coverity Scanner:
** CID 1231336: Improper use of negative value (NEGATIVE_RETURNS)
/drivers/net/ethernet/apm/xgene/xgene_enet_main.c: 596 in xgene_enet_create_desc_ring()
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to init .owner field.
Based on the patch from Peter Griffin <peter.griffin@linaro.org>
"mmc: remove .owner field for drivers using module_platform_driver"
This patch removes the superflous .owner field for drivers which
use the module_platform_driver API, as this is overriden in
platform_driver_register anyway."
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull misc kbuild updates from Michal Marek:
"This is the non-critical part of kbuild for 3.17-rc1:
- make help hint to use make -s with make kernelrelease et al.
- moved a kbuild document to Documentation/kbuild where it belongs
- four new Coccinelle scripts, one dropped and one fixed
- new make kselftest target to run various tests on the kernel"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: kselftest - new make target to build and run kernel selftests
Coccinelle: Script to replace if and BUG with BUG_ON
Coccinelle: Script to detect incorrect argument to sizeof
Coccinelle: Script to use ARRAY_SIZE instead of division of two sizeofs
Coccinelle: Script to detect cast after memory allocation
coccinelle/null: solve parse error
Documentation: headers_install.txt is part of kbuild
kbuild: make -s should be used with kernelrelease/kernelversion/image_name
Pull kbuild updates from Michal Marek:
- make clean also considers $(extra-m) and $(extra-) to be consistent
- cleanup and fixes in scripts/Makefile.host
- allow to override the name of the Python 2 executable with make
PYTHON=... (only needed for ia64 in practice)
- option to split debugingo into *.dwo files to save disk space if the
compiler supports it (CONFIG_DEBUG_INFO_SPLIT)
- option to use dwarf4 debuginfo if the compiler supports it
(CONFIG_DEBUG_INFO_DWARF4)
- fix for disabling certain warnings with clang
- fix for unneeded rebuild with dash when a command contains
backslashes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Fix handling of backslashes in *.cmd files
kbuild, LLVMLinux: Supress warnings unless W=1-3
Kbuild: Add a option to enable dwarf4 v2
kbuild: Support split debug info v4
kbuild: allow to override Python command name
kbuild: clean-up and bug fix of scripts/Makefile.host
kbuild: clean up scripts/Makefile.host
kbuild: drop shared library support from Makefile.host
kbuild: fix a bug of C++ host program handling
kbuild: fix a typo in scripts/Makefile.host
scripts/Makefile.clean: clean also $(extra-m) and $(extra-)
- MR reregistration support
- MAD support for RMPP in userspace
- iSER and SRP initiator updates
- ocrdma hardware driver updates
- other fixes...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJT7N2GAAoJEENa44ZhAt0hUiIQAKBqYIjpB3QY6Z/B19mxDxku
I81B3OkirumbAaCLoLckvq4gnwQ+BAD+YUmXgP08TCIgrABZAYYw+WvMEY9WNyQB
x3Pv+BzX+wKKNaQkSnB9JVdku+BSI76eW0YYrIX0F0x1o0Jq9JpSkia91KmRvqmX
YSFy2R7BjEZ4lfo/uscydHT26Q6EdT3od4iv48K8qq5rKdjtyYNgD/75DLCN599Z
uI1f98e6Tl+7nHWaioQB61zlYPkNPLAnZtMrY2j4tarTwYwX1KhF3eV7z39L1l81
nhMIXr+qBXtYuZw3I9rKqw3VVCCqB9e6E8FIA5K/d2jWqO+0TqIMUYOuZXuCezWg
o+uGgwbDOweBqwrKRmiR2M0lk2I1Z16jBxYuaUBbLImG0/NPtuUB22t6RbPAojTa
EjDkb9XBA7uFUMrYYnou+HxEzmJUYkin6wgGxtklYEKUqvh8G9ccGt6httWrCSrV
mpjwJv+S4LdFM49wdP993lYthpCoZ42yxEzZ7zJ/KTt17/Wb5F4RtHIROGVFkHdT
mo8RUz5DGDznfNQgn0m/jC3woFnMNpLOduI+CivhrwrWCwMpSUxIjqWu3263pIJ7
+H0kNOKDAbp6+F27j+AVznlyXnaEtYyM8EZnysG1Hkz24gCSWBYo5Ep8eSH8LgY4
9VPc75KVG6uddx5mhg5h
=wOm0
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma updates from Roland Dreier:
"Main set of InfiniBand/RDMA updates for 3.17 merge window:
- MR reregistration support
- MAD support for RMPP in userspace
- iSER and SRP initiator updates
- ocrdma hardware driver updates
- other fixes..."
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits)
IB/srp: Fix return value check in srp_init_module()
RDMA/ocrdma: report asic-id in query device
RDMA/ocrdma: Update sli data structure for endianness
RDMA/ocrdma: Obtain SL from device structure
RDMA/uapi: Include socket.h in rdma_user_cm.h
IB/srpt: Handle GID change events
IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]
IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]
RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf()
IPoIB: Remove unnecessary test for NULL before debugfs_remove()
IB/mad: Add user space RMPP support
IB/mad: add new ioctl to ABI to support new registration options
IB/mad: Add dev_notice messages for various umad/mad registration failures
IB/mad: Update module to [pr|dev]_* style print messages
IB/ipoib: Avoid multicast join attempts with invalid P_key
IB/umad: Update module to [pr|dev]_* style print messages
IB/ipoib: Avoid flushing the workqueue from worker context
IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
IB/ipath: Add P_Key change event support
mlx4_core: Add support for secure-host and SMP firewall
...
Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64. This causes issues on powerpc32, which expects a 32bit
timespec.
This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.
[ Thomas is currently on vacation, but reviewed it and wanted me to send
this fix on to you directly. ]
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge leftovers from Andrew Morton:
"A few leftovers.
I have a bunch of OCFS2 patches which are still out for review and
which I might sneak along after -rc1. Partly my fault - I should send
my review pokes out earlier"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix CROSS_MEMORY_ATTACH help text grammar
drivers/mfd/rtsx_usb.c: export device table
mm, hugetlb_cgroup: align hugetlb cgroup limit to hugepage size
The rtsx_usb driver contains the table for the devices it supports but
doesn't export it. As a result, no alias is generated and it doesn't
get loaded automatically.
Via https://bugzilla.novell.com/show_bug.cgi?id=890096
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: Marcel Witte <wittemar@googlemail.com>
Cc: Roger Tseng <rogerable@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memcg aligns memory.limit_in_bytes to PAGE_SIZE as part of the resource
counter since it makes no sense to allow a partial page to be charged.
As a result of the hugetlb cgroup using the resource counter, it is also
aligned to PAGE_SIZE but makes no sense unless aligned to the size of
the hugepage being limited.
Align hugetlb cgroup limit to hugepage size.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are firmwares which don't support scheduled scan.
Disable it for now.
Linus's system encoutered this issue.
Thanks to David Spinadel for his help.
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Pull more powerpc updates from Ben Herrenschmidt:
"Here are some more powerpc bits for 3.17, essentially fixes.
The biggest series, also aimed at -stable, is from Aneesh and is the
result of weeks and weeks of debugging to find out why the heck or THP
implementation was occasionally triggering multi-hit errors in our
level 1 TLB. It ended up being a combination of issues including
subtleties as to how we should invalidate those special 'MPSS' pages
we use to allow the use of 16M pages inside 4K/64K "base page size"
segments (you really have to love our MMU !)
Another interesting one in the "OMG" category is the series from
Michael adding memory barriers to spin_is_locked(). That's also the
result of many days of debugging to figure out why the semaphore code
would occasionally crash in ways that made no sense. It ended up
being some creative lock stacking that was defeated by the fact that
our locks allow a load inside the locked section to be re-ordered with
the load of the lock value itself (I'm still of two mind about whether
to kill that once and for all by putting a heavier barrier back into
our lock implementation...). The fixes come with a long explanation
in the cset comments, feel free to read it if you feel like having a
headache today"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc/thp: Add tracepoints to track hugepage invalidate
powerpc/mm: Use read barrier when creating real_pte
powerpc/thp: Use ACCESS_ONCE when loading pmdp
powerpc/thp: Invalidate with vpn in loop
powerpc/thp: Handle combo pages in invalidate
powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
powerpc/thp: Add write barrier after updating the valid bit
powerpc: reorder per-cpu NUMA information's initialization
powerpc/perf/hv-24x7: Use kmem_cache_free
powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
powerpc: Hard disable interrupts in xmon
powerpc: remove duplicate definition of TEXASR_FS
powerpc/pseries: Avoid deadlock on removing ddw
powerpc/pseries: Failure on removing device node
powerpc/boot: Use correct zlib types for comparison
powerpc/powernv: Interface to register/unregister opal dump region
printk: Add function to return log buffer address and size
powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
powerpc/ppc476: Disable BTAC
...
Pull seccomp fix from James Morris.
BUG(!spin_is_locked()) really doesn't work very well in UP
configurations without any actual spinlock state. Which is very much
why we have that "assert_spin_lock()" function for this.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock
In case of error, the function create_workqueue() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
window:
Group changes to the device tree. In preparation for adding device tree
overlay support, OF_DYNAMIC is reworked so that a set of device tree
changes can be prepared and applied to the tree all at once. OF_RECONFIG
notifiers see the most significant change here so that users always get
a consistent view of the tree. Notifiers generation is moved from before
a change to after it, and notifiers for a group of changes are emitted
after the entire block of changes have been applied
Automatic console selection from DT. Console drivers can now use
of_console_check() to see if the device node is specified as a console
device. If so then it gets added as a preferred console. UART devices
get this support automatically when uart_add_one_port() is called.
DT unit tests no longer depend on pre-loaded data in the device tree.
Data is loaded dynamically at the start of unit tests, and then unloaded
again when the tests have completed.
Also contains a few bugfixes for reserved regions and early memory setup.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT6M7XAAoJEMWQL496c2LNTTgP/2rXyrTTGZpK/qrLKHWKYHvr
XL7tcTkhA0OLU64E37fB+xtDXyBYnLsharuwUFSd1LlL1Wnc1cZN5ORRlMJbmjUR
Wvwl0A8/mkhGl4tzzKgJ4z4rMJXvlZfJRpnVoRB5FOn90LI7k/jsf5rIwF/6S90B
6D6II0r4RG9ku1m7g70cToxcIFCzp0V+eu2tym9GnhsyGKlunPT9iNiTpwfVhPAj
QUvMPKIQXReOv6xDU5q6E07839IMf7SdAvciBTHGnCDD7sGziHvnBIShj/2vTDAF
27sGRKrWUnnuxRUMOoIudiYyeHXIdt1WXp6FsS/ztVI37Ijh9YPShJwWGhQDppnp
4tcoSdefqw9IRUajAVWsB0RUW/tCL4a7tggWofylOA6itDi+HZnw6YafE1G1YzxF
q8OFo9uqLcmFQfHDJpk+sdtXoMZzOgrxlEscsGsQ8kd2Uoe8+chgR9EY1sqPkWVF
Zw0FJCTB6spBlsnXeboBGrnvpbPkacwhvesIFO0IANy4j4R55xlEeTcs1fe3ubUf
UhNyyFdnCAUA7e5CAabcAQYsdbEKG6v0Il3H6xts36c8lXCSFXVgNcw5mdCpFCcQ
DZ3/1FGSVzkYNXX8hlzIn1W0dqvwn4x0ZbnpXUJrGUBpSmjt9qkXx0XbdeFwartU
7X4tNGS1jfNYhOdP87jF
=8IFz
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree updates from Grant Likely:
"The branch contains the following device tree changes the v3.17 merge
window:
Group changes to the device tree. In preparation for adding device
tree overlay support, OF_DYNAMIC is reworked so that a set of device
tree changes can be prepared and applied to the tree all at once.
OF_RECONFIG notifiers see the most significant change here so that
users always get a consistent view of the tree. Notifiers generation
is moved from before a change to after it, and notifiers for a group
of changes are emitted after the entire block of changes have been
applied
Automatic console selection from DT. Console drivers can now use
of_console_check() to see if the device node is specified as a console
device. If so then it gets added as a preferred console. UART
devices get this support automatically when uart_add_one_port() is
called.
DT unit tests no longer depend on pre-loaded data in the device tree.
Data is loaded dynamically at the start of unit tests, and then
unloaded again when the tests have completed.
Also contains a few bugfixes for reserved regions and early memory
setup"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: (21 commits)
of: Fixing OF Selftest build error
drivers: of: add automated assignment of reserved regions to client devices
of: Use proper types for checking memory overflow
of: typo fix in __of_prop_dup()
Adding selftest testdata dynamically into live tree
of: Add todo tasklist for Devicetree
of: Transactional DT support.
of: Reorder device tree changes and notifiers
of: Move dynamic node fixups out of powerpc and into common code
of: Make sure attached nodes don't carry along extra children
of: Make devicetree sysfs update functions consistent.
of: Create unlocked versions of node and property add/remove functions
OF: Utility helper functions for dynamic nodes
of: Move CONFIG_OF_DYNAMIC code into a separate file
of: rename of_aliases_mutex to just of_mutex
of/platform: Fix of_platform_device_destroy iteration of devices
of: Migrate of_find_node_by_name() users to for_each_node_by_name()
tty: Update hypervisor tty drivers to use core stdout parsing code.
arm/versatile: Add the uart as the stdout device.
of: Enable console on serial ports specified by /chosen/stdout-path
...
- Enable support for bus reset on device release
- Fixes for EEH support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT64V6AAoJECObm247sIsimLkQAKL8sdGu2xwjujB/JFQrTXq0
9jN42aNhv50fVE9p2FKoqet/6VlQmRxLUvA9By94iMIKsUb3w0Rziti1FI3iyKtb
7lPzhCY9GWFnMD7Ero0nx+ljYrDiMlDdjeJ1ke25f1WZKMv524FJvA3cbiCvnAm0
3CRV0buq69deiiy5j1dbJdwYfPTJU7Jepqg8/OkjgJYokKRwqng9QfUgRbNwfL5d
FuVWaiGKvkxj4Sil0M3m4Ogf2yqqweX0am/rC3pGoO9woc9XFV560ABoEvXe3PI/
TPq3X63eqiBtu+t0qnYdi9jAfV62NMQEOwgb1z3IrIRoR9pwCOR2UFVQSr9X+rMp
EGrGUNkgU9cTzGeiSPmTII/ccYgsFpivy2UV+lNeDs4kraU++Q3QoJ2jwbzx8unF
nzyEfW8BizUenBZHsLvBDrlDddn5uEgao9ivdc9yBhjtOQshA8xMUKFG1PHzqtYg
Vis9/K2TKrZBcQTBD7nwCWWPLUg11N7stD6Ui0DEAEqJtGWxCNSqcKGTen1NvsOp
hpFdlySYORNvPQDMMBmjMc+YkngBwcwDaITE5FFMbyDJYymKA28MtkYaDVLZp5kZ
yYua3afxxDDaAzh7ikzUMpBlsSxbRpanAeTtEMQ0s2+ZfkzdvKztmTe9S26swQXG
p9hb4pIJNflk014lnNt1
=ADuE
-----END PGP SIGNATURE-----
Merge tag 'vfio-v3.17-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- enable support for bus reset on device release
- fixes for EEH support
* tag 'vfio-v3.17-rc1' of git://github.com/awilliam/linux-vfio:
drivers/vfio: Enable VFIO if EEH is not supported
drivers/vfio: Allow EEH to be built as module
drivers/vfio: Fix EEH build error
vfio-pci: Attempt bus/slot reset on release
vfio-pci: Use mutex around open, release, and remove
vfio-pci: Release devices with BusMaster disabled
- Fix ARM build.
- Fix boot crash with PVH guests.
- Improve reliability of resume/migration.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJT6y4zAAoJEFxbo/MsZsTRqI4H/3c+TXRjeB5787wkm3Lya1JO
5+DsbbtlM13PT+Oi0Zop7xUYMPya7Xkwix0FvFvJ/q0/NFMf6RDJREtRy7atRkkp
IRKccsGA/OH0ETcmQJOGHErnK51cAjMrm+ilJHIIYd6btEVCuP212G9f8CVGMyIK
r0U6sVUxTJ5TPOLjvv5e8Rz+iTtP+OXvxaf2+rcqAf1QAJJ6S6+ipJZjZemX8OR/
IoUPiee5Ou/ogQekt0f473vJrxX5nP7wLdJ2F7YA/A6F/HQHNbYDgQXf2IXrqFmb
DpkpXxaowZGKXwnwqHUr3pVDdeMpvYJKZDPZmTQq2htrXufHIiY/E3qr9wGjqxk=
=iVy1
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen bugfixes from David Vrabel:
- fix ARM build
- fix boot crash with PVH guests
- improve reliability of resume/migration
* tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: use vmap() to map grant table pages in PVH guests
x86/xen: resume timer irqs early
arm/xen: remove duplicate arch_gnttab_init() function
allow thin snapshots to be larger than the external origin.
. Add support for quickly loading a repetitive pattern into the
dm-switch target.
. Use per-bio data in the dm-crypt target instead of always using a
mempool for each allocation. Required switching to kmalloc alignment
for the bio slab.
. Fix DM core to properly stack the QUEUE_FLAG_NO_SG_MERGE flag
. Fix the dm-cache and dm-thin targets' export of the minimum_io_size to
match the data block size -- this fixes an issue where mkfs.xfs would
improperly infer raid striping was in place on the underlying storage.
. Small cleanups in dm-io, dm-mpath and dm-cache
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJT64yEAAoJEMUj8QotnQNatjQH/2mqm8EtPuZas70zHVDzjMlE
ZyV8xgHpU0MBmiBi+JhUBv9iKX4sVa+C25559WkKtxRVMnZmI1WDry4TagiqrhnK
9o/uvdWigJMR+uwahwe4UErEtKscOQJD30a8taN/suJ6Z2C7XJJRUZPsyL4a3Vov
w+UIi7aYDEGp/2VQ8mvTTxjdF5x5km4wKsjBTs03uTrrkEJ+bIUndl2I1X+X4bsw
kiWYOQwmcnD8GwYkSrthJYLsS3Hjur/J/My7KZwXc00ANLOexqHdKfRDwH8b36+m
olKXv3swCd8vi+jJYEYzuW9213ACsSEGP7h8NFVZ/+2FeDsSzB/C7zjW9okIUIw=
=y/3r
-----END PGP SIGNATURE-----
Merge tag 'dm-3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper changes from Mike Snitzer:
- Allow the thin target to paired with any size external origin; also
allow thin snapshots to be larger than the external origin.
- Add support for quickly loading a repetitive pattern into the
dm-switch target.
- Use per-bio data in the dm-crypt target instead of always using a
mempool for each allocation. Required switching to kmalloc alignment
for the bio slab.
- Fix DM core to properly stack the QUEUE_FLAG_NO_SG_MERGE flag
- Fix the dm-cache and dm-thin targets' export of the minimum_io_size
to match the data block size -- this fixes an issue where mkfs.xfs
would improperly infer raid striping was in place on the underlying
storage.
- Small cleanups in dm-io, dm-mpath and dm-cache
* tag 'dm-3.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: propagate QUEUE_FLAG_NO_SG_MERGE
dm switch: efficiently support repetitive patterns
dm switch: factor out switch_region_table_read
dm cache: set minimum_io_size to cache's data block size
dm thin: set minimum_io_size to pool's data block size
dm crypt: use per-bio data
block: use kmalloc alignment for bio slab
dm table: make dm_table_supports_discards static
dm cache metadata: use dm-space-map-metadata.h defined size limits
dm cache: fail migrations in the do_worker error path
dm cache: simplify deferred set reference count increments
dm thin: relax external origin size constraints
dm thin: switch to an atomic_t for tracking pending new block preparations
dm mpath: eliminate pg_ready() wrapper
dm io: simplify dec_count and sync_io
- Forward compatibility for eMMC.
- Fix some blacklisted cards with broken secure discard.
MMC host:
- mmci: Add support for Qualcomm variant.
- mmci: Fix regression for arm_variant.
- sdhci: Various fixes and cleanups.
- sdhci: Improve external VDD regulator support.
- sdhci: Support for DDR50 1.8V mode for BayTrail.
- sdhci-st: Add driver for ST SDHCI controller.
- sh-mmcif: DMA fixes.
- omap_hsmmc: Add support for SDIO interrupts.
- sdhci-pci: Add support for Intel Quark X1000.
- dw_mmc: Update the reset sequence.
- s3cmci: port DMA code to dmaengine API.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT6K/QAAoJEP4mhCVzWIwpj5UP/3KWAsUCz5PlPaTxWgh/U+aN
lwO8aryFGjxfengCtJaw4cVksg3Lc5EvaSweg7BUqTucHoq/epdAU+Doqf5OVZ9Y
kicfx+WOAwIGTTuY9Wu7W884WgF+litZW4O4LtQnr6hHzx+xQhPabTc6zKJVS7WD
/Fx1Qcc0WFdGA3D0VdpCqGa7Y8IYk8cEP16MOA4s0HxZxONkjFbPdpQRF/zywGOA
tqJC7kAhlY+Rx5bsrr5Lvbhu2Ut+cfLBPdPYuJz50wnymQ15IrLj5n4rSd5GLY+f
HjMeCJzDHi1gk26wn3kYyWwz3JkJGWhtf6fC1hDqvkWEmHd872v7EuU9V4D3ehV1
xPVVVyUYqMgs0Vi+PXAlMhoajrwgnlM1Ox0sheOOefcaRoFVpV6PZOuAsAOfKRfC
ySOuEhOKn5OOvjH2TB4dNH1a3hegR9B3fJqjFe4UmUQP4FBicOfUKi+n6HwC+Omb
VDCuVvwmMCVPi/Lop6thpPepJBfR8Ncb3vanLpK64LcIjEfoPTwoAyNI32Qnry52
XMO4Tppl+bPzglGpV427KXlFbBwccp+8sCMRc/4lRiIMH9QhbZfVA1FcxwpvZ419
6SKXmqhA62nHLcWlQrRlvdnNyr+Ow2o5gJ0iVffByfmzcgPmja9YZYyDfx+DxWOo
W1oItq+qMd25rnnOPTLB
=kGSY
-----END PGP SIGNATURE-----
Merge tag 'mmc-v3.17-1' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC updates from Ulf Hansson:
"Me and Chris Ball decided to try out using my MMC tree as the primary
one, to simplify handling of patches.
This pull does thus contains all the MMC patches for 3.17 rc1, no pull
from Chris this time.
Details:
MMC core:
- forward compatibility for eMMC
- fix some blacklisted cards with broken secure discard
MMC host:
- mmci: Add support for Qualcomm variant
- mmci: Fix regression for arm_variant
- sdhci: Various fixes and cleanups
- sdhci: Improve external VDD regulator support
- sdhci: Support for DDR50 1.8V mode for BayTrail
- sdhci-st: Add driver for ST SDHCI controller
- sh-mmcif: DMA fixes
- omap_hsmmc: Add support for SDIO interrupts
- sdhci-pci: Add support for Intel Quark X1000
- dw_mmc: Update the reset sequence
- s3cmci: port DMA code to dmaengine API"
* tag 'mmc-v3.17-1' of git://git.linaro.org/people/ulf.hansson/mmc: (67 commits)
mmc: dw_mmc: modify the dt-binding for removing slot-node and supports-highspeed
mmc: dw_mmc: Slot quirk "disable-wp" is deprecated.
mmc: mmci: Reverse IRQ handling for the arm_variant
mmc: mmci: Move all CMD irq handling to mmci_cmd_irq()
mmc: mmci: Remove redundant check of status for DATA irq
mmc: dw_mmc: change to use recommended reset procedure
mmc: sdhci-pxav3: Use devm_* managed helpers
mmc: tmio: Configure DMA slave bus width
mmc: sh_mmcif: Configure DMA slave bus width
mmc: sh_mmcif: Fix DMA slave address configuration
mmc: sh_mmcif: Document DT bindings
mmc: sdhci-pci: remove PCI PM functions in suspend/resume callback
mmc: Do not advertise secure discard if it is blacklisted
mmc: sdhci-msm: Get COMPILE_TEST support
mmc: sdhci-msm: Remove unnecessary header file inclusion
mmc: sdhci-msm: Fix the binding example
mmc: sdhci: add DDR50 1.8V mode support for BayTrail eMMC Controller
mmc: sdhci: Preset value not supported in Baytrail eMMC
mmc: MMC_USDHI6ROL0 should depend on HAS_DMA
mmc: MMC_SH_MMCIF should depend on HAS_DMA
...
Pull block driver changes from Jens Axboe:
"Nothing out of the ordinary here, this pull request contains:
- A big round of fixes for bcache from Kent Overstreet, Slava Pestov,
and Surbhi Palande. No new features, just a lot of fixes.
- The usual round of drbd updates from Andreas Gruenbacher, Lars
Ellenberg, and Philipp Reisner.
- virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei
has taken it one step further and added support for actually using
more than one queue.
- Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to
compliment the the default behavior of adding to the tail of the
queue. From Douglas Gilbert"
* 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits)
bcache: Drop unneeded blk_sync_queue() calls
bcache: add mutex lock for bch_is_open
bcache: Correct printing of btree_gc_max_duration_ms
bcache: try to set b->parent properly
bcache: fix memory corruption in init error path
bcache: fix crash with incomplete cache set
bcache: Fix more early shutdown bugs
bcache: fix use-after-free in btree_gc_coalesce()
bcache: Fix an infinite loop in journal replay
bcache: fix crash in bcache_btree_node_alloc_fail tracepoint
bcache: bcache_write tracepoint was crashing
bcache: fix typo in bch_bkey_equal_header
bcache: Allocate bounce buffers with GFP_NOWAIT
bcache: Make sure to pass GFP_WAIT to mempool_alloc()
bcache: fix uninterruptible sleep in writeback thread
bcache: wait for buckets when allocating new btree root
bcache: fix crash on shutdown in passthrough mode
bcache: fix lockdep warnings on shutdown
bcache allocator: send discards with correct size
bcache: Fix to remove the rcu_sched stalls.
...
Pull block core bits from Jens Axboe:
"Small round this time, after the massive blk-mq dump for 3.16. This
pull request contains:
- Fixes for max_sectors overflow in ioctls from Akinoby Mita.
- Partition off-by-one bug fix in aix partitions from Dan Carpenter.
- Various small partition cleanups from Fabian Frederick.
- Fix for the block integrity code sometimes returning the wrong
vector count from Gu Zheng.
- Cleanup an re-org of the blk-mq queue enter/exit percpu counters
from Tejun. Dependent on the percpu pull for 3.17 (which was in
the block tree too), that you have already pulled in.
- A blkcg oops fix, also from Tejun"
* 'for-3.17/core' of git://git.kernel.dk/linux-block:
partitions: aix.c: off by one bug
blkcg: don't call into policy draining if root_blkg is already gone
Revert "bio: modify __bio_add_page() to accept pages that don't start a new segment"
bio: modify __bio_add_page() to accept pages that don't start a new segment
block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge
block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX
block/partitions/efi.c: kerneldoc fixing
block/partitions/msdos.c: code clean-up
block/partitions/amiga.c: replace nolevel printk by pr_err
block/partitions/aix.c: replace count*size kzalloc by kcalloc
bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload
blk-mq: use percpu_ref for mq usage count
blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue()
blk-mq: decouble blk-mq freezing from generic bypassing
block, blk-mq: draining can't be skipped even if bypass_depth was non-zero
blk-mq: fix a memory ordering bug in blk_mq_queue_enter()
There's no need to call locks_free_lock here while still holding the
i_lock. Defer that until the lock has been dropped.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
In commit 72f98e7255 (locks: turn lock_flocks into a spinlock), we
moved from using the BKL to a global spinlock. With this change, we lost
the ability to block in the fl_release_private operation.
This is problematic for NFS (and probably some other filesystems as
well). Add a new list_head argument to locks_delete_lock. If that
argument is non-NULL, then queue any locks that we want to free to the
list instead of freeing them.
Then, add a new locks_dispose_list function that will walk such a list
and call locks_free_lock on them after the i_lock has been dropped.
Finally, change all of the callers of locks_delete_lock to pass in a
list_head, except for lease_modify. That function can be called long
after the i_lock has been acquired. Deferring the freeing of a lease
after unlocking it in that function is non-trivial until we overhaul
some of the spinlocking in the lease code.
Currently though, no filesystem that sets fl_release_private supports
leases, so this is not currently a problem. We'll eventually want to
make the same change in the lease code, but it needs a lot more work
before we can reasonably do so.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Currently in the case where a new file lock completely replaces the old
one, we end up overwriting the existing lock with the new info. This
means that we have to call fl_release_private inside i_lock. Change the
code to instead copy the info to new_fl, insert that lock into the
correct spot and then delete the old lock. In a later patch, we'll defer
the freeing of the old lock until after the i_lock has been dropped.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Conexnat HD-audio driver has a workaround for cx5051 (aka CX20561)
chip to add fake mute controls to each amp (commit 3868137e). This
implies the minimum-as-mute TLV bit in TLV for each corresponding
control. Meanwhile we build the virtual master from these, but the
TLV bit is missing, even though the slaves have it.
This patch simply adds the missing TLV_DB_SCALE_MUTE bit for vmaster,
as already done in patch_sigmatel.c.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Perform a pci_claim_resource() on all valid resources discovered
during the OF device tree scan.
Based almost entirely upon the PCI OF bus probing code which does
the same thing there.
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that when a PCI Express bridge is not in use and has no devices
behind it, the ranges property is bogus. Specifically the size property
is of the form [0xffffffff:...], and if you add this size to the resource
start address the 64-bit calculation will overflow.
Just check specifically for this size value signature and skip them.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump the various aspects of the PCI bridge probed at boot time, most
importantly the bridge number ranges, and the ranges property.
This helps diagnose PCI resource issues and other problems by giving
ofpci_debug=1 on the boot command line.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-08-12
This series contains updates to i40e and e1000e.
Lucas provides a fix for i40e to resolve a compile issue where a header
was missing in the #includes.
Wei Yongjun provides a fix for i40e to resolve a sparse warning, where
a non-static function should be static.
Julia Lawall provides a fix for i40e which was found using Coccinelle,
where there was a typo in the name of the type given to sizeof().
Rickard Strandqvist provides a fix for i40e to replace the use of
strncpy() with strlcpy() to avoid strings that lack null termination.
Jean Sacren provides two e1000e fixes, first is a comment fix and second
removes an excessive space character in a debug message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu says:
====================
xen-netback: synchronisation between core driver and netback
The zero-copy netback has far more interactions with core network driver than
the old copying backend. One significant thing is that netback now relies on
a callback from core driver to correctly release resources.
However correct synchronisation between core driver and netback is missing.
Currently netback relies on a loop to wait for core driver to release
resources. This is proven not enough and erroneous recently, partly due to code
structure, partly due to missing synchronisation. Short-live domains like
OpenMirage unikernels can easily trigger race in backend, rendering backend
unresponsive.
This patch series aims to slove this issue by introducing proper
synchronisation between core driver and netback.
Chagges in v4:
* avoid using wait queue
* remove dedicated loop for netif_napi_del
* remove unnecessary check on callback
Change in v3: improve commit message in patch 1
Change in v2: fix Zoltan's email address in commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The original implementation relies on a loop to check if all inflight
packets are freed. Now we have proper reference counting, there's no
need to use loop anymore.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reference count the number of packets in host stack, so that we don't
stop the deallocation thread too early. If not, we can end up with
xenvif_free permanently waiting for deallocation thread to unmap grefs.
Reported-by: Thomas Leonard <talex5@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally netif_napi_add was in xenvif_init_queue and netif_napi_del
was in xenvif_deinit_queue, while kthreads were handled in
xenvif_connect and xenvif_disconnect. Move netif_napi_add and
netif_napi_del to xenvif_connect and xenvif_disconnect so that they
reside together with kthread operations.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu says:
====================
xen-netback: fix debugfs code
This small series fixes two problems in xen-netback debugfs code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The original code is bogus. The function gets called in a loop which
leaks entries created in previous rounds.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enlarge buffer size and check input length properly, so that we don't
misuse -ENOSPC.
Note that command like "kickXXXX" is still allowed, that's one patch for
another day if we really want to be very strict on this.
Reported-by: SeeChen Ng <seechen81@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bytestream timestamps are correlated with a single byte in the skbuff,
recorded in skb_shinfo(skb)->tskey. When fragmenting skbuffs, ensure
that the tskey is set for the fragment in which the tskey falls
(seqno <= tskey < end_seqno).
The original implementation did not address fragmentation in
tcp_fragment or tso_fragment. Add code to inspect the sequence numbers
and move both tskey and the relevant tx_flags if necessary.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>