Commit Graph

677596 Commits

Author SHA1 Message Date
Masahiro Yamada 05d8cba4a1 kbuild: skip install/check of headers right under uapi directories
Since commit 61562f981e ("uapi: export all arch specifics
directories"), "make INSTALL_HDR_PATH=$root/usr headers_install"
deletes standard glibc headers and others in $(root)/usr/include.

The cause of the issue is that headers_install now starts descending
from arch/$(hdr-arch)/include/uapi with $(root)/usr/include for its
destination when installing asm headers.  So, headers already there
are assumed to be unwanted.

When headers_install starts descending from include/uapi with
$(root)/usr/include for its destination, it works around the problem
by creating an dummy destination $(root)/usr/include/uapi, but this
is tricky.

To fix the problem in a clean way is to skip headers install/check
in include/uapi and arch/$(hdr-arch)/include/uapi because we know
there are only sub-directories in uapi directories.  A good side
effect is the empty destination $(root)/usr/include/uapi will go
away.

I am also removing the trailing slash in the headers_check target to
skip checking in arch/$(hdr-arch)/include/uapi.

Fixes: 61562f981e ("uapi: export all arch specifics directories")
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
2017-05-18 02:17:45 +09:00
Ihar Hrachyshka 77d7123342 neighbour: update neigh timestamps iff update is effective
It's a common practice to send gratuitous ARPs after moving an
IP address to another device to speed up healing of a service. To
fulfill service availability constraints, the timing of network peers
updating their caches to point to a new location of an IP address can be
particularly important.

Sometimes neigh_update calls won't touch neither lladdr nor state, for
example if an update arrives in locktime interval. The neigh->updated
value is tested by the protocol specific neigh code, which in turn
will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
call to neigh_update() or not. As a result, we may effectively ignore
the update request, bailing out of touching the neigh entry, except that
we still bump its timestamps inside neigh_update.

This may be a problem for updates arriving in quick succession. For
example, consider the following scenario:

A service is moved to another device with its IP address. The new device
sends three gratuitous ARP requests into the network with ~1 seconds
interval between them. Just before the first request arrives to one of
network peer nodes, its neigh entry for the IP address transitions from
STALE to DELAY.  This transition, among other things, updates
neigh->updated. Once the kernel receives the first gratuitous ARP, it
ignores it because its arrival time is inside the locktime interval. The
kernel still bumps neigh->updated. Then the second gratuitous ARP
request arrives, and it's also ignored because it's still in the (new)
locktime interval. Same happens for the third request. The node
eventually heals itself (after delay_first_probe_time seconds since the
initial transition to DELAY state), but it just wasted some time and
require a new ARP request/reply round trip. This unfortunate behaviour
both puts more load on the network, as well as reduces service
availability.

This patch changes neigh_update so that it bumps neigh->updated (as well
as neigh->confirmed) only once we are sure that either lladdr or entry
state will change). In the scenario described above, it means that the
second gratuitous ARP request will actually update the entry lladdr.

Ideally, we would update the neigh entry on the very first gratuitous
ARP request. The locktime mechanism is designed to ignore ARP updates in
a short timeframe after a previous ARP update was honoured by the kernel
layer. This would require tracking timestamps for state transitions
separately from timestamps when actual updates are received. This would
probably involve changes in neighbour struct. Therefore, the patch
doesn't tackle the issue of the first gratuitous APR ignored, leaving
it for a follow-up.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 11:41:38 -04:00
Ihar Hrachyshka 23d268eb24 arp: honour gratuitous ARP _replies_
When arp_accept is 1, gratuitous ARPs are supposed to override matching
entries irrespective of whether they arrive during locktime. This was
implemented in commit 56022a8fdd ("ipv4: arp: update neighbour address
when a gratuitous arp is received and arp_accept is set")

There is a glitch in the patch though. RFC 2002, section 4.6, "ARP,
Proxy ARP, and Gratuitous ARP", defines gratuitous ARPs so that they can
be either of Request or Reply type. Those Reply gratuitous ARPs can be
triggered with standard tooling, for example, arping -A option does just
that.

This patch fixes the glitch, making both Request and Reply flavours of
gratuitous ARPs to behave identically.

As per RFC, if gratuitous ARPs are of Reply type, their Target Hardware
Address field should also be set to the link-layer address to which this
cache entry should be updated. The field is present in ARP over Ethernet
but not in IEEE 1394. In this patch, I don't consider any broadcasted
ARP replies as gratuitous if the field is not present, to conform the
standard. It's not clear whether there is such a thing for IEEE 1394 as
a gratuitous ARP reply; until it's cleared up, we will ignore such
broadcasts. Note that they will still update existing ARP cache entries,
assuming they arrive out of locktime time interval.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 11:32:44 -04:00
Colin Ian King 7e1b9521f5 dm cache: handle kmalloc failure allocating background_tracker struct
Currently there is no kmalloc failure check on the allocation of
the background_tracker struct in btracker_create(), and so a NULL return
will lead to a NULL pointer dereference.  Add a NULL check.

Detected by CoverityScan, CID#1416587 ("Dereference null return value")

Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-17 09:44:53 -04:00
Tin Huynh 83345d51a4 i2c: xgene: Set ACPI_COMPANION_I2C
With ACPI, i2c-core requires ACPI companion to be set in order for it
to create slave device.
This patch sets the ACPI companion accordingly.

Signed-off-by: Tin Huynh <tnhuynh@apm.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-05-17 09:21:06 +02:00
Thomas Petazzoni 88ad60c23a i2c: mv64xxx: don't override deferred probing when getting irq
There is no reason to use platform_get_irq() for non-DT probing and
irq_of_parse_and_map() for DT probing. Indeed, platform_get_irq()
works fine for both.

In addition, using platform_get_irq() properly returns -EPROBE_DEFER
when the interrupt controller is not yet available, so instead of
inventing our own error code (-ENXIO), return the one provided by
platform_get_irq().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-05-16 23:19:00 +02:00
Linus Torvalds b23afd3848 - fix bad EFI vars iterator usage
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJZG0rAAAoJEIly9N/cbcAmfigP/iaKLzuVid4PC3Pzf7hAwmmg
 cWrH8sPMZF6elSSKHguRWqUpyHgK6crsJYnMB8c+4+/upXMorBpX6GFK3CEZG2UO
 1vrVkJ06q0N41qNnJc3TpDV3RxecdKOs26Jtc9gm+ZMScJxZBLXBu3bfwkwE8V/c
 1RFYhFciFg2lZBwIUZb73nAtvBtMdkLACLA4isa1Gn/Q0Ah50s+MFrePNjBm2Xen
 mjhp7lP1w57X6Is3ZlbNwVEWA66qUP3PiyTasA0RiCNfQzJtT71NSCkh3849w1rX
 61cink78ZLtokdWMs510PiiUNiwivpkewftkUj33QAWkCvr7mh5rTleYNDcule3S
 GWHcq7I4HNJSf2aoAUMG6A21KGT1Rg3+EmdH6Ci1/sTOIRkitFdWTjdx/JJb6qH8
 WRmFh+lF3v5eB4pFu5z4sgqBbHFEa3cXsHqYw5rRrJuFJM/FLzVrOPTGCEfaQzj0
 Nnl53MeQ3PNK5djpJtNQ0JFkvnylgbed2E5XUQbqm5nNGWMuviBHNPKhyBroSoEZ
 LHk9EW8NRFUXZMjPEbi+2vPigz4/UAXBcc/MOPqdbZCBughTdq7xicIdil31hPxn
 WeJzhXGs0rVEdzZJaUDYT+yKuTbTBeYwl/ujAGtqSB45lk/VLdZ0vWRz7PL83MwL
 BwOer++6MGmiO0JnkS85
 =bdBo
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
 "Fix bad EFI vars iterator usage"

* tag 'pstore-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  efi-pstore: Fix read iter after pstore API refactor
2017-05-16 13:29:07 -07:00
Arnd Bergmann 2432a3fb5c mlx5e: add CONFIG_INET dependency
We now reference the arp_tbl, which requires IPv4 support to be
enabled in the kernel, otherwise we get a link error:

drivers/net/built-in.o: In function `mlx5e_tc_update_neigh_used_value':
(.text+0x16afec): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_neigh_init':
en_rep.c:(.text+0x16c16d): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_netevent_event':
en_rep.c:(.text+0x16cbb5): undefined reference to `arp_tbl'

This adds a Kconfig dependency for it.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 15:48:01 -04:00
Mikulas Patocka 13840d3801 dm bufio: make the parameter "retain_bytes" unsigned long
Change the type of the parameter "retain_bytes" from unsigned to
unsigned long, so that on 64-bit machines the user can set more than
4GiB of data to be retained.

Also, change the type of the variable "count" in the function
"__evict_old_buffers" to unsigned long.  The assignment
"count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
could result in unsigned long to unsigned overflow and that could result
in buffers not being freed when they should.

While at it, avoid division in get_retain_buffers().  Division is slow,
we can change it to shift because we have precalculated the log2 of
block size.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-16 15:12:08 -04:00
David Ahern f6c5775ff0 net: Improve handling of failures on link and route dumps
In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.

netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is <= 0 so userspace is notified of the failure. The missing
piece is the rtnetlink dump functions returning the error.

Fix route and link dump functions to return the errors if no object is
added to an skb (detected by skb->len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be ajusted as well.

Reported-by: Jan Moskyto Matejka <mq@ucw.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 14:54:11 -04:00
Christoph Hellwig 19a0f7e37c net/smc: Add warning about remote memory exposure
The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 14:49:43 -04:00
Ursula Braun 263eec9b2a smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY
Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 14:49:42 -04:00
Kees Cook 6f61dd3aa3 efi-pstore: Fix read iter after pstore API refactor
During the internal pstore API refactoring, the EFI vars read entry was
accidentally made to update a stack variable instead of the pstore
private data pointer. This corrects the problem (and removes the now
needless argument).

Fixes: 125cc42baf ("pstore: Replace arguments for read() API")
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-16 11:46:49 -07:00
Wolfram Sang fc9d0cd9ca Merge branch 'i2c-mux/for-current' of https://github.com/peda-r/i2c-mux into i2c/for-current
Pull bugfixes from the i2c mux subsubsystem:

This fixes an old bug in resource cleanup on failure in i2c-mux-reg and
a new log spamming bug from this merge window in the i2c-mux core.
2017-05-16 18:57:39 +02:00
Thomas Winter bcfc7d3311 ipmr: vrf: Find VIFs using the actual device
The skb->dev that is passed into ip_mr_input is
the loX device for VRFs. When we lookup a vif
for this dev, none is found as we do not create
vifs for loopbacks. Instead lookup a vif for the
actual device that the packet was received on,
eg the vlan.

Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
cc: David Ahern <dsa@cumulusnetworks.com>
cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
cc: roopa <roopa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 12:52:17 -04:00
Soheil Hassas Yeganeh bafbb9c732 tcp: eliminate negative reordering in tcp_clean_rtx_queue
tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than
sysctl_tcp_max_reordering.

Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.

Fixes: c7caf8d3ed ("[TCP]: Fix reord detection due to snd_una covered holes")
Reported-by: Rebecca Isaacs <risaacs@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 12:45:21 -04:00
Linus Torvalds 2b6b38b04c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:

 - convert the debug feature to refcount_t

 - reduce the copy size for strncpy_from_user

 - 8 bug fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/virtio: change virtio_feature_desc:features type to __le32
  s390: convert debug_info.ref_count from atomic_t to refcount_t
  s390: move _text symbol to address higher than zero
  s390/qdio: increase string buffer size
  s390/ccwgroup: increase string buffer size
  s390/topology: let topology_mnest_limit() return unsigned char
  s390/uaccess: use sane length for __strncpy_from_user()
  s390/uprobes: fix compile for !KPROBES
  s390/ftrace: fix compile for !MODULES
  s390/cputime: fix incorrect system time
2017-05-16 09:24:44 -07:00
Linus Torvalds bec6cd63aa One amd64_edac fix correcting chip select sizes reporting on F17h
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlkbE7oACgkQEsHwGGHe
 VUqV7w//fDM2YrUOUQp5lEYXxe+LCKT7i59aL649xlpeZ7/aw2Fd6rMICKvuH4LG
 FYiH2d3Gop6EJqOrVhw91C3JKhqNJCc59x72WdCV0/W/EBJxg9PSrGv0XXZw8YCO
 HAt6aktxqWqFmIKMjxuWdounrjFLKyd7dD0N9Lnw/1OUL/vJ6L5C+2oZu+rtZtva
 2Z3rHVhpOroTI9DmvCUNkCSv0txxBtP9te8yKmMBMqO3MjEBDs7Wfza4/PlDF7TL
 RVu3Hb1AzX04NC9OD62Z49RcBpy7o7ljU9OFbQu9mbobkSncTayBk9jQkQit7lG5
 WLsK3iCNYszldFFhvKAzloohyERXxmUxqjSNmulKcEN24eQaBZWqzPFsTGk9Kir2
 VENB4bJ8KnOKp7P6zKJswzaMwCbR5kK87gMhHdciiwGsbD0HenOn6iw/znaXKsc4
 Ca4qS4juOXecdGZvx6znOAckw0g4KkKlsJ7Z3FLfU30kDwOIVzmEMHQb4M4rr4AF
 k0fGkiA8vfgvo8H2dT6DireeUIqD0nqrGFdaFYSoHH1pHWZCLTEnbmSDzZ/Im0L2
 P6YzZD4kgKPGtgEKDr32sjyq/KcVxwn7+Agnbl8XV+OeezQSM++QXo9Yrj+j2kgj
 Zya6eKtrrnS0Js8L0/WfSCBjc2SWuveYVq6lC0/75xftULz5UBc=
 =j2Fs
 -----END PGP SIGNATURE-----

Merge tag 'edac_fix_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
 "A single amd64_edac fix correcting chip select sizes reporting on
  F17h"

* tag 'edac_fix_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h
2017-05-16 09:18:18 -07:00
Gao Feng c953d63548 ebtables: arpreply: Add the standard target sanity check
The info->target comes from userspace and it would be used directly.
So we need to add the sanity check to make sure it is a valid standard
target, although the ebtables tool has already checked it. Kernel needs
to validate anything coming from userspace.

If the target is set as an evil value, it would break the ebtables
and cause a panic. Because the non-standard target is treated as one
offset.

Now add one helper function ebt_invalid_target, and we would replace
the macro INVALID_TARGET later.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-16 10:24:27 +02:00
Linus Torvalds a95cfad947 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Track alignment in BPF verifier so that legitimate programs won't be
    rejected on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.

 2) Make tail calls work properly in arm64 BPF JIT, from Deniel
    Borkmann.

 3) Make the configuration and semantics Generic XDP make more sense and
    don't allow both generic XDP and a driver specific instance to be
    active at the same time. Also from Daniel.

 4) Don't crash on resume in xen-netfront, from Vitaly Kuznetsov.

 5) Fix use-after-free in VRF driver, from Gao Feng.

 6) Use netdev_alloc_skb_ip_align() to avoid unaligned IP headers in
    qca_spi driver, from Stefan Wahren.

 7) Always run cleanup routines in BPF samples when we get SIGTERM, from
    Andy Gospodarek.

 8) The mdio phy code should bring PHYs out of reset using the shared
    GPIO lines before invoking bus->reset(). From Florian Fainelli.

 9) Some USB descriptor access endian fixes in various drivers from
    Johan Hovold.

10) Handle PAUSE advertisements properly in mlx5 driver, from Gal
    Pressman.

11) Fix reversed test in mlx5e_setup_tc(), from Saeed Mahameed.

12) Cure netdev leak in AF_PACKET when using timestamping via control
    messages. From Douglas Caetano dos Santos.

13) netcp doesn't support HWTSTAMP_FILTER_ALl, reject it. From Miroslav
    Lichvar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  ldmvsw: stop the clean timer at beginning of remove
  ldmvsw: unregistering netdev before disable hardware
  net: netcp: fix check of requested timestamping filter
  ipv6: avoid dad-failures for addresses with NODAD
  qed: Fix uninitialized data in aRFS infrastructure
  mdio: mux: fix device_node_continue.cocci warnings
  net/packet: fix missing net_device reference release
  net/mlx4_core: Use min3 to select number of MSI-X vectors
  macvlan: Fix performance issues with vlan tagged packets
  net: stmmac: use correct pointer when printing normal descriptor ring
  net/mlx5: Use underlay QPN from the root name space
  net/mlx5e: IPoIB, Only support regular RQ for now
  net/mlx5e: Fix setup TC ndo
  net/mlx5e: Fix ethtool pause support and advertise reporting
  net/mlx5e: Use the correct pause values for ethtool advertising
  vmxnet3: ensure that adapter is in proper state during force_close
  sfc: revert changes to NIC revision numbers
  net: ch9200: add missing USB-descriptor endianness conversions
  net: irda: irda-usb: fix firmware name on big-endian hosts
  net: dsa: mv88e6xxx: add default case to switch
  ...
2017-05-15 15:50:49 -07:00
Linus Torvalds 1319a2856d Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "A set of minor cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] Minor cleanup of xattr query function
  fs: cifs: transport: Use time_after for time comparison
  SMB2: Fix share type handling
  cifs: cifsacl: Use a temporary ops variable to reduce code length
  Don't delay freeing mids when blocked on slow socket write of request
  CIFS: silence lockdep splat in cifs_relock_file()
2017-05-15 15:27:02 -07:00
David S. Miller 66f4bc819d Merge branch 'ldmsw-fixes'
Shannon Nelson says:

====================
ldmvsw: port removal stability

Under heavy reboot stress testing we found a couple of timing issues
when removing the device that could cause the kernel great heartburn,
addressed by these two patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 15:36:09 -04:00
Shannon Nelson 8b671f906c ldmvsw: stop the clean timer at beginning of remove
Stop the clean timer earlier to be sure there's no asynchronous
interference while stopping the port.

Orabug: 25748241

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 15:36:08 -04:00
Thomas Tai b18e5e86b4 ldmvsw: unregistering netdev before disable hardware
When running LDom binding/unbinding test, kernel may panic
in ldmvsw_open(). It is more likely that because we're removing
the ldc connection before unregistering the netdev in vsw_port_remove(),
we set up a window of time where one process could be removing the
device while another trying to UP the device. This also sometimes causes
vio handshake error due to opening a device without closing it completely.
We should unregister the netdev before we disable the "hardware".

Orabug: 25980913, 25925306

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 15:36:08 -04:00
Miroslav Lichvar ca9df7ede4 net: netcp: fix check of requested timestamping filter
The driver doesn't support timestamping of all received packets and
should return error when trying to enable the HWTSTAMP_FILTER_ALL
filter.

Cc: WingMan Kwok <w-kwok2@ti.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 15:21:03 -04:00
Christoph Hellwig f98e0eb680 dm mpath: multipath_clone_and_map must not return -EIO
Since 412445ac ("dm: introduce a new DM_MAPIO_KILL return value"), the
clone_and_map_rq methods must not return errno values, so fix it up
to properly return DM_MAPIO_KILL, instead of the -EIO value that snuck
in due to a conflict between two patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15 15:09:53 -04:00
Christoph Hellwig 18a482f524 dm mpath: don't return -EIO from dm_report_EIO
Instead just turn the macro into a helper for the warning message.
This removes an unnecessary assignment and will allow the next commit to
fix a place where -EIO is the wrong return value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15 15:09:52 -04:00
Christoph Hellwig ece0728037 dm rq: add a missing break to map_request
We don't want to bug when receiving a DM_MAPIO_KILL value..

Fixes: 412445ac ("dm: introduce a new DM_MAPIO_KILL return value")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15 15:09:51 -04:00
Joe Thornber 0377a07c7a dm space map disk: fix some book keeping in the disk space map
When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15 15:09:50 -04:00
Joe Thornber 91bcdb92d3 dm thin metadata: call precommit before saving the roots
These calls were the wrong way round in __write_initial_superblock.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15 15:09:49 -04:00
David S. Miller 42a928ced3 mlx5-fixes-2017-05-12
Misc fixes for mlx5 driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZGDM8AAoJEEg/ir3gV/o+svMH/1lAl+FIGCWgZ82/UbFHCZCW
 SIXEnP2id+Nic7JxQSa/RVQ43I75wkM8pN929hFoyz8p/pPLhZkpo7vX6yLWG2SL
 fTx/4Qn5jR/eow/D+fdrlyKMPg0A7fijY+ZnvDPsQkjtakIedCgc/A1xDufpKi7+
 I7nOJa4yACuZK0gzy32VGgpJw02q32eRTJjKHRiEYdmNQSIJpmRbG2m4e0z/me2s
 hrMt358/llPOZNwkAPD2SHZxH68oSq5EbSRmz5jDwXfTVFkjWNQVowwm3pCFjsQn
 3xDtkjakVPVBUagR3hngtLPcsDpUR4GU3tHr4l/UPp0wFBXVKgdupC7B+mCkSp4=
 =ubJG
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2017-05-12-V2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-05-12

This series contains some mlx5 fixes for net.
Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix ethtool pause support and advertise reporting") kernels >= 4.8
("net/mlx5e: Use the correct pause values for ethtool advertising") kernels >= 4.8

v1->v2:
 Dropped statistics spinlock patch, it needs some extra work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:38:04 -04:00
Mahesh Bandewar 66eb9f86e5 ipv6: avoid dad-failures for addresses with NODAD
Every address gets added with TENTATIVE flag even for the addresses with
IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process
we realize it's an address with NODAD and complete the process without
sending any probe. However the TENTATIVE flags stays on the
address for sometime enough to cause misinterpretation when we receive a NS.
While processing NS, if the address has TENTATIVE flag, we mark it DADFAILED
and endup with an address that was originally configured as NODAD with
DADFAILED.

We can't avoid scheduling dad_work for addresses with NODAD but we can
avoid adding TENTATIVE flag to avoid this racy situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:31:51 -04:00
Mintz, Yuval aa4ad88cfc qed: Fix uninitialized data in aRFS infrastructure
Current memset is using incorrect type of variable, causing the
upper-half of the strucutre to be left uninitialized and causing:

  ethernet/qlogic/qed/qed_init_fw_funcs.c: In function 'qed_set_rfs_mode_disable':
  ethernet/qlogic/qed/qed_init_fw_funcs.c:993:3: error: '*((void *)&ramline+4)' is used uninitialized in this function [-Werror=uninitialized]

Fixes: d51e4af5c2 ("qed: aRFS infrastructure support")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:31:27 -04:00
Julia Lawall 8c977f5a85 mdio: mux: fix device_node_continue.cocci warnings
Device node iterators put the previous value of the index variable, so an
explicit put causes a double put.

In particular, of_mdiobus_register can fail before doing anything
interesting, so one could view it as a no-op from the reference count
point of view.

Generated by: scripts/coccinelle/iterators/device_node_continue.cocci

CC: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:28:10 -04:00
Douglas Caetano dos Santos d19b183cdc net/packet: fix missing net_device reference release
When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not
released.

Fixes c14ac9451c ("sock: enable timestamping using control messages")
Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:22:12 -04:00
yuval.shaia@oracle.com 4762010f09 net/mlx4_core: Use min3 to select number of MSI-X vectors
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:20:26 -04:00
Vlad Yasevich 70957eaecc macvlan: Fix performance issues with vlan tagged packets
Macvlan always turns on offload features that have sofware
fallback (NETIF_GSO_SOFTWARE).  This allows much higher guest-guest
communications over macvtap.

However, macvtap does not turn on these features for vlan tagged traffic.
As a result, depending on the HW that mactap is configured on, the
performance of guest-guest communication over a vlan is very
inconsistent.  If the HW supports TSO/UFO over vlans, then the
performance will be fine.  If not, the the performance will suffer
greatly since the VM may continue using TSO/UFO, and will force the host
segment the traffic and possibly overlow the macvtap queue.

This patch adds the always on offloads to vlan_features.  This
makes sure that any vlan tagged traffic between 2 guest will not
be segmented needlessly.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:18:11 -04:00
Peter Rosin 9fce894d03 i2c: mux: only print failure message on error
As is, a failure message is printed unconditionally, which is confusing.
And noisy.

Fixes: 8d4d159f25 ("i2c: mux: provide more info on failure in i2c_mux_add_adapter")
Signed-off-by: Peter Rosin <peda@axentia.se>
2017-05-15 18:49:11 +02:00
Peter Rosin a36d4637e4 i2c: mux: reg: rename label to indicate what it does
That maintains sanity if it is ever called from some other spot, and
also makes the label names coherent.

Signed-off-by: Peter Rosin <peda@axentia.se>
2017-05-15 18:49:10 +02:00
Peter Rosin 68118e0e73 i2c: mux: reg: put away the parent i2c adapter on probe failure
It is only prudent to let go of resources that are not used.

Fixes: b3fdd32799 ("i2c: mux: Add register-based mux i2c-mux-reg")
Signed-off-by: Peter Rosin <peda@axentia.se>
2017-05-15 18:44:58 +02:00
Niklas Cassel 66c25f6e31 net: stmmac: use correct pointer when printing normal descriptor ring
There are two pointers in sysfs_display_ring,
one that increments if using normal dma descriptors,
another if using extended dma descriptors.

When printing the normal dma descriptors, the wrong pointer is used,
thus the printed descriptor addresses are incorrect.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 10:02:19 -04:00
Pablo Neira Ayuso 591054469b netfilter: nf_tables: revisit chain/object refcounting from elements
Andreas reports that the following incremental update using our commit
protocol doesn't work.

 # nft -f incremental-update.nft
 delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
 delete chain ip filter CIn_1
 ... Error: Could not process rule: Device or resource busy

The existing code is not well-integrated into the commit phase protocol,
since element deletions do not result in refcount decrement from the
preparation phase. This results in bogus EBUSY errors like the one
above.

Two new functions come with this patch:

* nft_set_elem_activate() function is used from the abort path, to
  restore the set element refcounting on objects that occurred from
  the preparation phase.

* nft_set_elem_deactivate() that is called from nft_del_setelem() to
  decrement set element refcounting on objects from the preparation
  phase in the commit protocol.

The nft_data_uninit() has been renamed to nft_data_release() since this
function does not uninitialize any data store in the data register,
instead just releases the references to objects. Moreover, a new
function nft_data_hold() has been introduced to be used from
nft_set_elem_activate().

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:51:41 +02:00
Pablo Neira Ayuso 71df14b0ce netfilter: nf_tables: missing sanitization in data from userspace
Do not assume userspace always sends us NFT_DATA_VALUE for bitwise and
cmp expressions. Although NFT_DATA_VERDICT does not make any sense, it
is still possible to handcraft a netlink message using this incorrect
data type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:51:40 +02:00
Liping Zhang fa803605ee netfilter: nf_tables: can't assume lock is acquired when dumping set elems
When dumping the elements related to a specified set, we may invoke the
nf_tables_dump_set with the NFNL_SUBSYS_NFTABLES lock not acquired. So
we should use the proper rcu operation to avoid race condition, just
like other nft dump operations.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:51:39 +02:00
Eric Leblond 87e94dbc21 netfilter: synproxy: fix conntrackd interaction
This patch fixes the creation of connection tracking entry from
netlink when synproxy is used. It was missing the addition of
the synproxy extension.

This was causing kernel crashes when a conntrack entry created by
conntrackd was used after the switch of traffic from active node
to the passive node.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:51:39 +02:00
Willem de Bruijn 324318f024 netfilter: xtables: zero padding in data_to_user
When looking up an iptables rule, the iptables binary compares the
aligned match and target data (XT_ALIGN). In some cases this can
exceed the actual data size to include padding bytes.

Before commit f77bc5b23f ("iptables: use match, target and data
copy_to_user helpers") the malloc()ed bytes were overwritten by the
kernel with kzalloced contents, zeroing the padding and making the
comparison succeed. After this patch, the kernel copies and clears
only data, leaving the padding bytes undefined.

Extend the clear operation from data size to aligned data size to
include the padding bytes, if any.

Padding bytes can be observed in both match and target, and the bug
triggered, by issuing a rule with match icmp and target ACCEPT:

  iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
  iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT

Fixes: f77bc5b23f ("iptables: use match, target and data copy_to_user helpers")
Reported-by: Paul Moore <pmoore@redhat.com>
Reported-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:51:38 +02:00
Pablo Neira Ayuso ff1e4300cf Merge tag 'ipvs-fixes-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:

====================
IPVS Fixes for v4.12

please consider this fix to IPVS for v4.12.

* It is a fix from Julian Anastasov to only SNAT SNAT packet replies only for
  NATed connections

My understanding is that this fix is appropriate for 4.9.25, 4.10.13, 4.11
as well as the nf tree. Julian has separately posted backports for other
-stable kernels; please see:

* [PATCH 3.2.88,3.4.113 -stable 1/3] ipvs: SNAT packet replies only for
        NATed connections
* [PATCH 3.10.105,3.12.73,3.16.43,4.1.39 -stable 2/3] ipvs: SNAT packet
        replies only for NATed connections
* [PATCH 4.4.65 -stable 3/3] ipvs: SNAT packet replies only for NATed
        connections
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:50:12 +02:00
Liping Zhang 9338d7b441 netfilter: nfnl_cthelper: reject del request if helper obj is in use
We can still delete the ct helper even if it is in use, this will cause
a use-after-free error. In more detail, I mean:
  # nfct helper add ssdp inet udp
  # iptables -t raw -A OUTPUT -p udp -j CT --helper ssdp
  # nfct helper delete ssdp //--> oops, succeed!
  BUG: unable to handle kernel paging request at 000026ca
  IP: 0x26ca
  [...]
  Call Trace:
   ? ipv4_helper+0x62/0x80 [nf_conntrack_ipv4]
   nf_hook_slow+0x21/0xb0
   ip_output+0xe9/0x100
   ? ip_fragment.constprop.54+0xc0/0xc0
   ip_local_out+0x33/0x40
   ip_send_skb+0x16/0x80
   udp_send_skb+0x84/0x240
   udp_sendmsg+0x35d/0xa50

So add reference count to fix this issue, if ct helper is used by
others, reject the delete request.

Apply this patch:
  # nfct helper delete ssdp
  nfct v1.4.3: netlink error: Device or resource busy

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:42:29 +02:00
Liping Zhang d91fc59cd7 netfilter: introduce nf_conntrack_helper_put helper function
And convert module_put invocation to nf_conntrack_helper_put, this is
prepared for the followup patch, which will add a refcnt for cthelper,
so we can reject the deleting request when cthelper is in use.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:42:29 +02:00
Liping Zhang d110a3942a netfilter: don't setup nat info for confirmed ct
We cannot setup nat info if the ct has been confirmed already, else,
different cpu may race to handle the same ct. In extreme situation,
we may hit the "BUG_ON(nf_nat_initialized(ct, maniptype))" in the
nf_nat_setup_info.

Also running the following commands will easily hit NF_CT_ASSERT in
nf_conntrack_alter_reply:
  # nft flush ruleset
  # ping -c 2 -W 1 1.1.1.111 &
  # nft add table t
  # nft add chain t c {type nat hook postrouting priority 0 \;}
  # nft add rule t c snat to 4.5.6.7
  WARNING: CPU: 1 PID: 10065 at net/netfilter/nf_conntrack_core.c:1472
  nf_conntrack_alter_reply+0x9a/0x1a0 [nf_conntrack]
  [...]
  Call Trace:
   nf_nat_setup_info+0xad/0x840 [nf_nat]
   ? deactivate_slab+0x65d/0x6c0
   nft_nat_eval+0xcd/0x100 [nft_nat]
   nft_do_chain+0xff/0x5d0 [nf_tables]
   ? mark_held_locks+0x6f/0xa0
   ? __local_bh_enable_ip+0x70/0xa0
   ? trace_hardirqs_on_caller+0x11f/0x190
   ? ipt_do_table+0x310/0x610
   ? trace_hardirqs_on+0xd/0x10
   ? __local_bh_enable_ip+0x70/0xa0
   ? ipt_do_table+0x32b/0x610
   ? __lock_acquire+0x2ac/0x1580
   ? ipt_do_table+0x32b/0x610
   nft_nat_do_chain+0x65/0x80 [nft_chain_nat_ipv4]
   nf_nat_ipv4_fn+0x1ae/0x240 [nf_nat_ipv4]
   nf_nat_ipv4_out+0x4a/0xf0 [nf_nat_ipv4]
   nft_nat_ipv4_out+0x15/0x20 [nft_chain_nat_ipv4]
   nf_hook_slow+0x2c/0xf0
   ip_output+0x154/0x270

So for the confirmed ct, just ignore it and return NF_ACCEPT.

Fixes: 9a08ecfe74 ("netfilter: don't attach a nat extension by default")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15 12:42:28 +02:00