Commit Graph

61154 Commits

Author SHA1 Message Date
Andrew Lunn 74c950c966 net: llc: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn b51cd7c834 net: ipv6: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn 3628e3cbf9 net: ipv4: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Cc: Paul Moore <paul@paul-moore.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn aff53b23a9 net: decnet: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn d0b1101bb5 net: dccp: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn 8842500dd0 net: core: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn e0a7f1fe0c net: can: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Andrew Lunn 15e522a7b1 net: 9p: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:39 -07:00
Alexander A. Klimov 2be53e0e46 AX.25 Kconfig: Replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 12:34:06 -07:00
Alexander A. Klimov 266f312845 dccp: Replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 11:54:07 -07:00
Nikolay Aleksandrov 528ae84a34 net: bridge: fix undefined br_vlan_can_enter_range in tunnel code
If bridge vlan filtering is not defined we won't have
br_vlan_can_enter_range and thus will get a compile error as was
reported by Stephen and the build bot. So let's define a stub for when
vlan filtering is not used.

Fixes: 9433944368 ("net: bridge: notify on vlan tunnel changes done via the old api")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 11:22:55 -07:00
Nikolay Aleksandrov 9433944368 net: bridge: notify on vlan tunnel changes done via the old api
If someone uses the old vlan API to configure tunnel mappings we'll only
generate the old-style full port notification. That would be a problem
if we are monitoring the new vlan notifications for changes. The patch
resolves the issue by adding vlan notifications to the old tunnel netlink
code. As usual we try to compress the notifications for as many vlans
in a range as possible, thus a vlan tunnel change is considered able
to enter the "current" vlan notification range if:
 1. vlan exists
 2. it has actually changed (curr_change == true)
 3. it passes all standard vlan notification range checks done by
    br_vlan_can_enter_range() such as option equality, id continuity etc

Note that vlan tunnel changes (add/del) are considered a part of vlan
options so only RTM_NEWVLAN notification is generated with the relevant
information inside.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-12 15:18:24 -07:00
David S. Miller 71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Linus Torvalds 5a764898af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
    BPF programs, from Maciej Żenczykowski.

 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
    Mariappan.

 3) Slay memory leak in nl80211 bss color attribute parsing code, from
    Luca Coelho.

 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.

 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
    Dumazet.

 6) xsk code dips too deeply into DMA mapping implementation internals.
    Add dma_need_sync and use it. From Christoph Hellwig

 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.

 8) Check for disallowed attributes when loading flow dissector BPF
    programs. From Lorenz Bauer.

 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
    Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it. From Alex Elder.

11) Resolve several issues in TCP MD5 signature support. Missing memory
    barriers, bogus options emitted when using syncookies, and failure
    to allow md5 key changes in established states. All from Eric
    Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast code,
    from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation. Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols. From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  mlxsw: pci: Fix use-after-free in case of failed devlink reload
  mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
  net: macb: fix call to pm_runtime in the suspend/resume functions
  net: macb: fix macb_suspend() by removing call to netif_carrier_off()
  net: macb: fix macb_get/set_wol() when moving to phylink
  net: macb: mark device wake capable when "magic-packet" property present
  net: macb: fix wakeup test in runtime suspend/resume routines
  bnxt_en: fix NULL dereference in case SR-IOV configuration fails
  libbpf: Fix libbpf hashmap on (I)LP32 architectures
  net/mlx5e: CT: Fix memory leak in cleanup
  net/mlx5e: Fix port buffers cell size value
  net/mlx5e: Fix 50G per lane indication
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix usage of rcu-protected pointer
  net/mxl5e: Verify that rpriv is not NULL
  net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  net/mlx5: Fix eeprom support for SFP module
  cgroup: Fix sock_cgroup_data on big-endian.
  selftests: bpf: Fix detach from sockmap tests
  ...
2020-07-10 18:16:22 -07:00
Kuniyuki Iwashima a594920f87 inet: Remove an unnecessary argument of syn_ack_recalc().
Commit 0c3d79bce4 ("tcp: reduce SYN-ACK
retrans for TCP_DEFER_ACCEPT") introduces syn_ack_recalc() which decides
if a minisock is held and a SYN+ACK is retransmitted or not.

If rskq_defer_accept is not zero in syn_ack_recalc(), max_retries always
has the same value because max_retries is overwritten by rskq_defer_accept
in reqsk_timer_handler().

This commit adds three changes:
- remove redundant non-zero check for rskq_defer_accept in
   reqsk_timer_handler().
- remove max_retries from the arguments of syn_ack_recalc() and use
   rskq_defer_accept instead.
- rename thresh to max_syn_ack_retries for readability.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
CC: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:37:57 -07:00
Vladyslav Tarasiuk 15c724b997 devlink: Add devlink health port reporters API
In order to use new devlink port health reporters infrastructure, add
corresponding constructor and destructor functions.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:32:02 -07:00
Vladyslav Tarasiuk f4f5416601 devlink: Implement devlink health reporters on per-port basis
Add devlink-health reporter support on per-port basis.
The main difference existing devlink-health is that port reporters are
stored in per-devlink_port lists. Upon creation of such health reporter the
reference to a port it belongs to is stored in reporter struct.

Fill the port index attribute in devlink-health response to
allow devlink userspace utility to distinguish between device and port
reporters.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:32:02 -07:00
Vladyslav Tarasiuk bd8210055c devlink: Create generic devlink health reporter search function
Add a generic __devlink_health_reporter_find_by_name() that can be used
with arbitrary devlink health reporter list.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:32:02 -07:00
Vladyslav Tarasiuk 3c5584bf0a devlink: Rework devlink health reporter destructor
Devlink keeps its own reference to every reporter in a list and inits
refcount to 1 upon reporter's creation. Existing destructor waits to
free the memory indefinitely using msleep() until all references except
devlink's own are put.

Rework this mechanism by moving memory free routine to a separate
function, which is called when the last reporter reference is put.

Besides, it allows to call __devlink_health_reporter_destroy() while
locked on a reporters list mutex in symmetry to
__devlink_health_reporter_create(), which is required in follow-up
patch.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:32:02 -07:00
Vladyslav Tarasiuk c57544b3de devlink: Refactor devlink health reporter constructor
Prepare a common routine in devlink_health_reporter_create() for usage
in similar functions for devlink port health reporters.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 14:32:02 -07:00
Jakub Kicinski c7d759eb7b ethtool: add tunnel info interface
Add an interface to report offloaded UDP ports via ethtool netlink.

Now that core takes care of tracking which UDP tunnel ports the NICs
are aware of we can quite easily export this information out to
user space.

The responsibility of writing the netlink dumps is split between
ethtool code and udp_tunnel_nic.c - since udp_tunnel module may
not always be loaded, yet we should always report the capabilities
of the NIC.

$ ethtool --show-tunnels eth0
Tunnel information for eth0:
  UDP port table 0:
    Size: 4
    Types: vxlan
    No entries
  UDP port table 1:
    Size: 4
    Types: geneve, vxlan-gpe
    Entries (1):
        port 1230, vxlan-gpe

v4:
 - back to v2, build fix is now directly in udp_tunnel.h
v3:
 - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET
   not set.
v2:
 - fix string set count,
 - reorder enums in the uAPI,
 - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset
   in docs and comments.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 13:54:00 -07:00
Jakub Kicinski cc4e3835ef udp_tunnel: add central NIC RX port offload infrastructure
Cater to devices which:
 (a) may want to sleep in the callbacks;
 (b) only have IPv4 support;
 (c) need all the programming to happen while the netdev is up.

Drivers attach UDP tunnel offload info struct to their netdevs,
where they declare how many UDP ports of various tunnel types
they support. Core takes care of tracking which ports to offload.

Use a fixed-size array since this matches what almost all drivers
do, and avoids a complexity and uncertainty around memory allocations
in an atomic context.

Make sure that tunnel drivers don't try to replay the ports when
new NIC netdev is registered. Automatic replays would mess up
reference counting, and will be removed completely once all drivers
are converted.

v4:
 - use a #define NULL to avoid build issues with CONFIG_INET=n.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 13:54:00 -07:00
Linus Torvalds b1b11d0063 cleanup in-kernel read and write operations
Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
 all users of in-kernel file I/O use them if they don't use iov_iter
 based methods already.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl8Ij8gLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOcpBAAn157ooLqRrqQisEA6j59rTgkHUuqZMUx+8XjiivX
 baHQPmgctza1Xzjc4PjJ1owtLpt4ywcTpY8IDj3vZF1PpffeeuWVzxMTk/aIvhNN
 zPK2SJpRlDQHErKEhkTTOfOYoFTgc7vPa5Hvm6AEMaJs8oPtGZ2rnQHzPXENl/TY
 TgcLd1ou3iuw19UIAfB+EfuC9uhq7pCPu9+tryNyT2IfM7fqdsIhRESpcodg1ve+
 1k6leFIBrXa3MWiBGVUGCrSmlpP9xd22Zl8D/w60WeYWeg7szZoUK2bjhbdIEDZI
 tTwkdZ73IKpcxOyzUVbfr2hqNa94zrXCKQGfEGVS/7arV7QH4yvhg9NU9lqVXZKV
 ruPoyjsmJkHW52FfEEv1Gfrd6v4H6qZ6iyJEm3ZYNGul85O97t1xA/kKxAIwMuPa
 nFhhxHIooT/We3Ao77FROhIob4D5AOfOI4gvkTE15YMzsNxT/yjilQjdDFR5An6A
 ckzqb+VyDvcTx2gxR/qaol7b4lzmri4S/8Jt7WXjHOtNe9eXC4kl44leitK5j31H
 fHZNyMLJ2+/JF5pGB2rNRNnTeQ7lXKob4Y+qAjRThddDxtdsf5COdZAiIiZbRurR
 Ogl2k3sMDdHgNfycK2Bg5Fab9OIWePQlpcGU14afUSPviuNkIYKLGrx92ZWef53j
 loI=
 =eYsI
 -----END PGP SIGNATURE-----

Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc

Pull in-kernel read and write op cleanups from Christoph Hellwig:
 "Cleanup in-kernel read and write operations

  Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
  all users of in-kernel file I/O use them if they don't use iov_iter
  based methods already.

  The new WARN_ONs in combination with syzcaller already found a missing
  input validation in 9p. The fix should be on your way through the
  maintainer ASAP".

[ This is prep-work for the real changes coming 5.9 ]

* tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc:
  fs: remove __vfs_read
  fs: implement kernel_read using __kernel_read
  integrity/ima: switch to using __kernel_read
  fs: add a __kernel_read helper
  fs: remove __vfs_write
  fs: implement kernel_write using __kernel_write
  fs: check FMODE_WRITE in __kernel_write
  fs: unexport __kernel_write
  bpfilter: switch to kernel_write
  autofs: switch to kernel_write
  cachefiles: switch to kernel_write
2020-07-10 09:45:15 -07:00
Danielle Ratson 82901ad169 devlink: Move input checks from driver to devlink
Currently, all the input checks are done in driver.

After adding the split capability to devlink port, move the checks to
devlink.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:30 -07:00
Danielle Ratson a0f49b5486 devlink: Add a new devlink port split ability attribute and pass to netlink
Add a new attribute that indicates the split ability of devlink port.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:30 -07:00
Danielle Ratson a21cf0a833 devlink: Add a new devlink port lanes attribute and pass to netlink
Add a new devlink port attribute that indicates the port's number of lanes.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

The attribute is not passed to user space in case the number of lanes is
invalid (0).

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Danielle Ratson 71ad8d55f8 devlink: Replace devlink_port_attrs_set parameters with a struct
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.

Use the devlink_port_attrs struct to replace the relevant parameters.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Danielle Ratson 46737a1949 devlink: Move switch_port attribute of devlink_port_attrs to devlink_port
The struct devlink_port_attrs holds the attributes of devlink_port.

Similarly to the previous patch, 'switch_port' attribute is another
exception.

Move 'switch_port' to be devlink_port's field.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Danielle Ratson 10a429bab4 devlink: Move set attribute of devlink_port_attrs to devlink_port
The struct devlink_port_attrs holds the attributes of devlink_port.

The 'set' field is not devlink_port's attribute as opposed to most of the
others.

Move 'set' to be devlink_port's field called 'attrs_set'.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Linus Torvalds ce69fb3b39 Refactor kallsyms_show_value() users for correct cred
Several users of kallsyms_show_value() were performing checks not
 during "open". Refactor everything needed to gain proper checks against
 file->f_cred for modules, kprobes, and bpf.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8GUbMWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJohnD/9VsPsAMV+8lhsPvkcuW/DkTcAY
 qUEzsXU3v06gJ0Z/1lBKtisJ6XmD93wWcZCTFvJ0S8vR3yLZvOVfToVjCMO32Trc
 4ZkWTPwpvfeLug6T6CcI2ukQdZ/opI1cSabqGl79arSBgE/tsghwrHuJ8Exkz4uq
 0b7i8nZa+RiTezwx4EVeGcg6Dv1tG5UTG2VQvD/+QGGKneBlrlaKlI885N/6jsHa
 KxvB7+8ES1pnfGYZenx+RxMdljNrtyptbQEU8gyvoV5YR7635gjZsVsPwWANJo+4
 EGcFFpwWOAcVQaC3dareLTM8nVngU6Wl3Rd7JjZtjvtZba8DdCn669R34zDGXbiP
 +1n1dYYMSMBeqVUbAQfQyLD0pqMIHdwQj2TN8thSGccr2o3gNk6AXgYq0aYm8IBf
 xDCvAansJw9WqmxErIIsD4BFkMqF7MjH3eYZxwCPWSrKGDvKxQSPV5FarnpDC9U7
 dYCWVxNPmtn+unC/53yXjEcBepKaYgNR7j5G7uOfkHvU43Bd5demzLiVJ10D8abJ
 ezyErxxEqX2Gr7JR2fWv7iBbULJViqcAnYjdl0y0NgK/hftt98iuge6cZmt1z6ai
 24vI3X4VhvvVN5/f64cFDAdYtMRUtOo2dmxdXMid1NI07Mj2qFU1MUwb8RHHlxbK
 8UegV2zcrBghnVuMkw==
 =ib5Q
 -----END PGP SIGNATURE-----

Merge tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kallsyms fix from Kees Cook:
 "Refactor kallsyms_show_value() users for correct cred.

  I'm not delighted by the timing of getting these changes to you, but
  it does fix a handful of kernel address exposures, and no one has
  screamed yet at the patches.

  Several users of kallsyms_show_value() were performing checks not
  during "open". Refactor everything needed to gain proper checks
  against file->f_cred for modules, kprobes, and bpf"

* tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests: kmod: Add module address visibility test
  bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
  kprobes: Do not expose probe addresses to non-CAP_SYSLOG
  module: Do not expose section addresses to non-CAP_SYSLOG
  module: Refactor section attr into bin attribute
  kallsyms: Refactor kallsyms_show_value() to take cred
2020-07-09 13:09:30 -07:00
Christoph Paasch ce69e563b3 tcp: make sure listeners don't initialize congestion-control state
syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
just copies all the memory, the allocated pointer will be copied as
well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
If now the socket will be destroyed before the congestion-control
has properly been initialized (through a call to tcp_init_transfer), we
will end up freeing memory that does not belong to that particular
socket, opening the door to a double-free:

[   11.413102] ==================================================================
[   11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
[   11.415329]
[   11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
[   11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   11.418148] Call Trace:
[   11.418534]  <IRQ>
[   11.418834]  dump_stack+0x7d/0xb0
[   11.419297]  print_address_description.constprop.0+0x1a/0x210
[   11.422079]  kasan_report_invalid_free+0x51/0x80
[   11.423433]  __kasan_slab_free+0x15e/0x170
[   11.424761]  kfree+0x8c/0x230
[   11.425157]  tcp_cleanup_congestion_control+0x58/0xd0
[   11.425872]  tcp_v4_destroy_sock+0x57/0x5a0
[   11.426493]  inet_csk_destroy_sock+0x153/0x2c0
[   11.427093]  tcp_v4_syn_recv_sock+0xb29/0x1100
[   11.427731]  tcp_get_cookie_sock+0xc3/0x4a0
[   11.429457]  cookie_v4_check+0x13d0/0x2500
[   11.433189]  tcp_v4_do_rcv+0x60e/0x780
[   11.433727]  tcp_v4_rcv+0x2869/0x2e10
[   11.437143]  ip_protocol_deliver_rcu+0x23/0x190
[   11.437810]  ip_local_deliver+0x294/0x350
[   11.439566]  __netif_receive_skb_one_core+0x15d/0x1a0
[   11.441995]  process_backlog+0x1b1/0x6b0
[   11.443148]  net_rx_action+0x37e/0xc40
[   11.445361]  __do_softirq+0x18c/0x61a
[   11.445881]  asm_call_on_stack+0x12/0x20
[   11.446409]  </IRQ>
[   11.446716]  do_softirq_own_stack+0x34/0x40
[   11.447259]  do_softirq.part.0+0x26/0x30
[   11.447827]  __local_bh_enable_ip+0x46/0x50
[   11.448406]  ip_finish_output2+0x60f/0x1bc0
[   11.450109]  __ip_queue_xmit+0x71c/0x1b60
[   11.451861]  __tcp_transmit_skb+0x1727/0x3bb0
[   11.453789]  tcp_rcv_state_process+0x3070/0x4d3a
[   11.456810]  tcp_v4_do_rcv+0x2ad/0x780
[   11.457995]  __release_sock+0x14b/0x2c0
[   11.458529]  release_sock+0x4a/0x170
[   11.459005]  __inet_stream_connect+0x467/0xc80
[   11.461435]  inet_stream_connect+0x4e/0xa0
[   11.462043]  __sys_connect+0x204/0x270
[   11.465515]  __x64_sys_connect+0x6a/0xb0
[   11.466088]  do_syscall_64+0x3e/0x70
[   11.466617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   11.467341] RIP: 0033:0x7f56046dc469
[   11.467844] Code: Bad RIP value.
[   11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[   11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
[   11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
[   11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[   11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
[   11.474321]
[   11.474527] Allocated by task 4884:
[   11.475031]  save_stack+0x1b/0x40
[   11.475548]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   11.476182]  tcp_cdg_init+0xf0/0x150
[   11.476744]  tcp_init_congestion_control+0x9b/0x3a0
[   11.477435]  tcp_set_congestion_control+0x270/0x32f
[   11.478088]  do_tcp_setsockopt.isra.0+0x521/0x1a00
[   11.478744]  __sys_setsockopt+0xff/0x1e0
[   11.479259]  __x64_sys_setsockopt+0xb5/0x150
[   11.479895]  do_syscall_64+0x3e/0x70
[   11.480395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   11.481097]
[   11.481321] Freed by task 4872:
[   11.481783]  save_stack+0x1b/0x40
[   11.482230]  __kasan_slab_free+0x12c/0x170
[   11.482839]  kfree+0x8c/0x230
[   11.483240]  tcp_cleanup_congestion_control+0x58/0xd0
[   11.483948]  tcp_v4_destroy_sock+0x57/0x5a0
[   11.484502]  inet_csk_destroy_sock+0x153/0x2c0
[   11.485144]  tcp_close+0x932/0xfe0
[   11.485642]  inet_release+0xc1/0x1c0
[   11.486131]  __sock_release+0xc0/0x270
[   11.486697]  sock_close+0xc/0x10
[   11.487145]  __fput+0x277/0x780
[   11.487632]  task_work_run+0xeb/0x180
[   11.488118]  __prepare_exit_to_usermode+0x15a/0x160
[   11.488834]  do_syscall_64+0x4a/0x70
[   11.489326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Wei Wang fixed a part of these CDG-malloc issues with commit c120144407
("tcp: memset ca_priv data to 0 properly").

This patch here fixes the listener-scenario: We make sure that listeners
setting the congestion-control through setsockopt won't initialize it
(thus CDG never allocates on listeners). For those who use AF_UNSPEC to
reuse a socket, tcp_disconnect() is changed to cleanup afterwards.

(The issue can be reproduced at least down to v4.4.x.)

Cc: Wei Wang <weiwan@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Fixes: 2b0a8c9eee ("tcp: add CDG congestion control")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:07:45 -07:00
Paolo Abeni ac3b45f609 mptcp: add MPTCP socket diag interface
exposes basic inet socket attribute, plus some MPTCP socket
fields comprising PM status and MPTCP-level sequence numbers.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:38:41 -07:00
Paolo Abeni 96d890daad mptcp: add msk interations helper
mptcp_token_iter_next() allow traversing all the MPTCP
sockets inside the token container belonging to the given
network namespace with a quite standard iterator semantic.

That will be used by the next patch, but keep the API generic,
as we plan to use this later for PM's sake.

Additionally export mptcp_token_get_sock(), as it also
will be used by the diag module.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:38:41 -07:00
Paolo Abeni 3f935c75eb inet_diag: support for wider protocol numbers
After commit bf9765145b ("sock: Make sk_protocol a 16-bit value")
the current size of 'sdiag_protocol' is not sufficient to represent
the possible protocol values.

This change introduces a new inet diag request attribute to let
user space specify the relevant protocol number using u32 values.

The attribute is parsed by inet diag core on get/dump command
and the extended protocol value, if available, is preferred to
'sdiag_protocol' to lookup the diag handler.

The parse attributed are exposed to all the diag handlers via
the cb->data.

Note that inet_diag_dump_one_icsk() is left unmodified, as it
will not be used by protocol using the extended attribute.

Suggested-by: David S. Miller <davem@davemloft.net>
Co-developed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:38:41 -07:00
Michal Kubecek 365f9ae4ee ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()
If the genlmsg_put() call in ethnl_default_dumpit() fails, we bail out
without checking if we already have some messages in current skb like we do
with ethnl_default_dump_one() failure later. Therefore if existing messages
almost fill up the buffer so that there is not enough space even for
netlink and genetlink header, we lose all prepared messages and return and
error.

Rather than duplicating the skb->len check, move the genlmsg_put(),
genlmsg_cancel() and genlmsg_end() calls into ethnl_default_dump_one().
This is also more logical as all message composition will be in
ethnl_default_dump_one() and only iteration logic will be left in
ethnl_default_dumpit().

Fixes: 728480f124 ("ethtool: default handlers for GET requests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:35:33 -07:00
Cong Wang 306381aec7 net_sched: fix a memory leak in atm_tc_init()
When tcf_block_get() fails inside atm_tc_init(),
atm_tc_put() is called to release the qdisc p->link.q.
But the flow->ref prevents it to do so, as the flow->ref
is still zero.

Fix this by moving the p->link.ref initialization before
tcf_block_get().

Fixes: 6529eaba33 ("net: sched: introduce tcf block infractructure")
Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 12:31:28 -07:00
Kees Cook 6396026045 bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file->f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c0 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08 16:01:21 -07:00
Hamish Martin a34f829164 tipc: fix retransmission on unicast links
A scenario has been observed where a 'bc_init' message for a link is not
retransmitted if it fails to be received by the peer. This leads to the
peer never establishing the link fully and it discarding all other data
received on the link. In this scenario the message is lost in transit to
the peer.

The issue is traced to the 'nxt_retr' field of the skb not being
initialised for links that aren't a bc_sndlink. This leads to the
comparison in tipc_link_advance_transmq() that gates whether to attempt
retransmission of a message performing in an undesirable way.
Depending on the relative value of 'jiffies', this comparison:
    time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)
may return true or false given that 'nxt_retr' remains at the
uninitialised value of 0 for non bc_sndlinks.

This is most noticeable shortly after boot when jiffies is initialised
to a high value (to flush out rollover bugs) and we compare a jiffies of,
say, 4294940189 to zero. In that case time_before returns 'true' leading
to the skb not being retransmitted.

The fix is to ensure that all skbs have a valid 'nxt_retr' time set for
them and this is achieved by refactoring the setting of this value into
a central function.
With this fix, transmission losses of 'bc_init' messages do not stall
the link establishment forever because the 'bc_init' message is
retransmitted and the link eventually establishes correctly.

Fixes: 382f598fb6 ("tipc: reduce duplicate packets for unicast traffic")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:39:50 -07:00
Linus Walleij efd7fe68f0 net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag
This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.

It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.

The tag and switch chip lacks public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.

Excessive attempts to figure out the egress tag format
was made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.

Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.

Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:36:19 -07:00
Meir Lichtinger 065e0d42a0 ethtool: Add support for 100Gbps per lane link modes
Define 100G, 200G and 400G link modes using 100Gbps per lane

LR, ER and FR are defined as a single link mode because they are
using same technology and by design are fully interoperable.
EEPROM content indicates if the module is LR, ER, or FR, and the
user space ethtool decoder is planned to support decoding these
modes in the EEPROM.

Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:30:42 -07:00
Xin Long 27d5332366 l2tp: remove skb_dst_set() from l2tp_xmit_skb()
In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
skb's dst. However, it will eventually call inet6_csk_xmit() or
ip_queue_xmit() where skb's dst will be overwritten by:

   skb_dst_set_noref(skb, dst);

without releasing the old dst in skb. Then it causes dst/dev refcnt leak:

  unregister_netdevice: waiting for eth0 to become free. Usage count = 1

This can be reproduced by simply running:

  # modprobe l2tp_eth && modprobe l2tp_ip
  # sh ./tools/testing/selftests/net/l2tp.sh

So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
should be dropped. This patch is to fix it by removing skb_dst_set()
from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().

Fixes: 3557baabf2 ("[L2TP]: PPP over L2TP driver core")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: James Chapman <jchapman@katalix.com>
Tested-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 15:24:33 -07:00
David S. Miller e80a07b244 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Support for rejecting packets from the prerouting chain, from
   Laura Garcia Liebana.

2) Remove useless assignment in pipapo, from Stefano Brivio.

3) On demand hook registration in IPVS, from Julian Anastasov.

4) Expire IPVS connection from process context to not overload
   timers, also from Julian.

5) Fallback to conntrack TCP tracker to handle connection reuse
   in IPVS, from Julian Anastasov.

6) Several patches to support for chain bindings.

7) Expose enum nft_chain_flags through UAPI.

8) Reject unsupported chain flags from the netlink control plane.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:42:40 -07:00
Florian Fainelli 17809516a0 net: phy: Uninline PHY ethtool statistics operations
Now that we have moved the PHY ethtool statistics to be dynamically
registered, we no longer need to inline those for ethtool. This used to
be done to avoid cross symbol referencing and allow ethtool to be
decoupled from PHYLIB entirely.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:39:05 -07:00
Ursula Braun fb4f79264c net/smc: tolerate future SMCD versions
CLC proposal messages of future SMCD versions could be larger than SMCD
V1 CLC proposal messages.
To enable toleration in SMC V1 the receival of CLC proposal messages
is adapted:
* accept larger length values in CLC proposal
* check trailing eye catcher for incoming CLC proposal with V1 length only
* receive the whole CLC proposal even in cases it does not fit into the
  V1 buffer

Fixes: e7b7a64a84 ("smc: support variable CLC proposal messages")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:35:15 -07:00
Ursula Braun 82087c0330 net/smc: switch smcd_dev_list spinlock to mutex
The similar smc_ib_devices spinlock has been converted to a mutex.
Protecting the smcd_dev_list by a mutex is possible as well. This
patch converts the smcd_dev_list spinlock to a mutex.

Fixes: c6ba7c9ba4 ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:35:15 -07:00
Ursula Braun 92f3cb0e11 net/smc: fix sleep bug in smc_pnet_find_roce_resource()
Tests showed this BUG:
[572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
[572555.252879] INFO: lockdep is turned off.
[572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
[572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
[572555.252887] Call Trace:
[572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
[572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
[572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
[572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
[572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
[572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
[572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
[572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
[572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
[572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
[572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
[572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
[572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
[572555.253030] INFO: lockdep is turned off.

Function smc_pnet_find_rdma_dev() might be called from
smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
spinlock while calling infiniband op get_netdev(). At least for mlx5
the get_netdev operation wants mutex serialization, which conflicts
with the smc_ib_devices spinlock.
This patch switches the smc_ib_devices spinlock into a mutex to
allow sleeping when calling get_netdev().

Fixes: a4cf0443c4 ("smc: introduce SMC as an IB-client")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:35:15 -07:00
Karsten Graul b7eede7578 net/smc: fix work request handling
Wait for pending sends only when smc_switch_conns() found a link to move
the connections to. Do not wait during link freeing, this can lead to
permanent hang situations. And refuse to provide a new tx slot on an
unusable link.

Fixes: c6f02ebeea ("net/smc: switch connections to alternate link")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:35:15 -07:00
Karsten Graul 6778a6bed0 net/smc: separate LLC wait queues for flow and messages
There might be races in scenarios where both SMC link groups are on the
same system. Prevent that by creating separate wait queues for LLC flows
and messages. Switch to non-interruptable versions of wait_event() and
wake_up() for the llc flow waiter to make sure the waiters get control
sequentially. Fine tune the llc_flow_lock to include the assignment of
the message. Write to system log when an unexpected message was
dropped. And remove an extra indirection and use the existing local
variable lgr in smc_llc_enqueue().

Fixes: 555da9af82 ("net/smc: add event-based llc_flow framework")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08 12:35:15 -07:00
Christoph Hellwig 6955a76fbc bpfilter: switch to kernel_write
While pipes don't really need sb_writers projection, __kernel_write is an
interface better kept private, and the additional rw_verify_area does not
hurt here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08 08:27:56 +02:00
Gustavo A. R. Silva 964201de69 net/sched: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-07 15:47:46 -07:00