Commit Graph

94 Commits

Author SHA1 Message Date
Linus Torvalds b2fe5fa686 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Significantly shrink the core networking routing structures. Result
    of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf

 2) Add netdevsim driver for testing various offloads, from Jakub
    Kicinski.

 3) Support cross-chip FDB operations in DSA, from Vivien Didelot.

 4) Add a 2nd listener hash table for TCP, similar to what was done for
    UDP. From Martin KaFai Lau.

 5) Add eBPF based queue selection to tun, from Jason Wang.

 6) Lockless qdisc support, from John Fastabend.

 7) SCTP stream interleave support, from Xin Long.

 8) Smoother TCP receive autotuning, from Eric Dumazet.

 9) Lots of erspan tunneling enhancements, from William Tu.

10) Add true function call support to BPF, from Alexei Starovoitov.

11) Add explicit support for GRO HW offloading, from Michael Chan.

12) Support extack generation in more netlink subsystems. From Alexander
    Aring, Quentin Monnet, and Jakub Kicinski.

13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
    Russell King.

14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.

15) Many improvements and simplifications to the NFP driver bpf JIT,
    from Jakub Kicinski.

16) Support for ipv6 non-equal cost multipath routing, from Ido
    Schimmel.

17) Add resource abstration to devlink, from Arkadi Sharshevsky.

18) Packet scheduler classifier shared filter block support, from Jiri
    Pirko.

19) Avoid locking in act_csum, from Davide Caratti.

20) devinet_ioctl() simplifications from Al viro.

21) More TCP bpf improvements from Lawrence Brakmo.

22) Add support for onlink ipv6 route flag, similar to ipv4, from David
    Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
  tls: Add support for encryption using async offload accelerator
  ip6mr: fix stale iterator
  net/sched: kconfig: Remove blank help texts
  openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
  tcp_nv: fix potential integer overflow in tcpnv_acked
  r8169: fix RTL8168EP take too long to complete driver initialization.
  qmi_wwan: Add support for Quectel EP06
  rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
  ipmr: Fix ptrdiff_t print formatting
  ibmvnic: Wait for device response when changing MAC
  qlcnic: fix deadlock bug
  tcp: release sk_frag.page in tcp_disconnect
  ipv4: Get the address of interface correctly.
  net_sched: gen_estimator: fix lockdep splat
  net: macb: Handle HRESP error
  net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
  ipv6: addrconf: break critical section in addrconf_verify_rtnl()
  ipv6: change route cache aging logic
  i40e/i40evf: Update DESC_NEEDED value to reflect larger value
  bnxt_en: cleanup DIM work on device shutdown
  ...
2018-01-31 14:31:10 -08:00
Linus Torvalds 1ed2d76e02 Merge branch 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull kern_recvmsg reduction from Al Viro:
 "kernel_recvmsg() is a set_fs()-using wrapper for sock_recvmsg(). In
  all but one case that is not needed - use of ITER_KVEC for ->msg_iter
  takes care of the data and does not care about set_fs(). The only
  exception is svc_udp_recvfrom() where we want cmsg to be store into
  kernel object; everything else can just use sock_recvmsg() and be done
  with that.

  A followup converting svc_udp_recvfrom() away from set_fs() (and
  killing kernel_recvmsg() off) is *NOT* in here - I'd like to hear what
  netdev folks think of the approach proposed in that followup)"

* 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  tipc: switch to sock_recvmsg()
  smc: switch to sock_recvmsg()
  ipvs: switch to sock_recvmsg()
  mISDN: switch to sock_recvmsg()
  drbd: switch to sock_recvmsg()
  lustre lnet_sock_read(): switch to sock_recvmsg()
  cfs2: switch to sock_recvmsg()
  ncpfs: switch to sock_recvmsg()
  dlm: switch to sock_recvmsg()
  svc_recvfrom(): switch to sock_recvmsg()
2018-01-30 18:59:03 -08:00
Gustavo A. R. Silva a8fbf8e7ec net/smc: return booleans instead of integers
Return statements in functions returning bool should use
true/false instead of 1/0.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 127f497058 net/smc: release clcsock from tcp_listen_worker
Closing a listen socket may hit the warning
WARN_ON(sock_owned_by_user(sk)) of tcp_close(), if the wake up of
the smc_tcp_listen_worker has not yet finished.
This patch introduces smc_close_wait_listen_clcsock() making sure
the listening internal clcsock has been closed in smc_tcp_listen_work(),
before the listening external SMC socket finishes closing.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 51f1de79ad net/smc: replace sock_put worker by socket refcounting
Proper socket refcounting makes the sock_put worker obsolete.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 8dce2786a2 net/smc: smc_poll improvements
Increase the socket refcount during poll wait.
Take the socket lock before checking socket state.
For a listening socket return a mask independent of state SMC_ACTIVE and
cover errors or closed state as well.
Get rid of the accept_q loop in smc_accept_poll().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun da05bf2981 net/smc: handle device, port, and QP error events
RoCE device changes cause an IB event, processed in the global event
handler for the ROCE device. Problems for a certain Queue Pair cause a QP
event, processed in the QP event handler for this QP.
Among those events are port errors and other fatal device errors. All
link groups using such a port or device must be terminated in those cases.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26 10:41:56 -05:00
Ursula Braun 1a0a04c7a8 net/smc: check for healthy link group resp. connections
If a problem for at least one connection of a link group is detected,
the whole link group and all its connections are terminated.
This patch adds a check for healthy link group when trying to reserve
a work request, and checks for healthy connections before starting
a tx worker.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:10:42 -05:00
Ursula Braun 732720fafd net/smc: wake up wr_reg_wait when terminating a link group
If a new connection with a new rmb is added to a link group, its
memory region is registered. If a link group is terminated, a pending
registration requires a wake up.

And consolidate setting of tx_flag peer_conn_abort in smc_lgr_terminate().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:10:42 -05:00
Ursula Braun 610db66f37 net/smc: do not reuse a linkgroup with setup problems
Once a linkgroup is created successfully, it stays alive for a
certain time to service more connections potentially created.
If one of the initialization steps for a new linkgroup fails,
the linkgroup should not be reused by other connections following.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:10:42 -05:00
Ursula Braun b4772b3a87 net/smc: terminate link group for ib_post_send problems
If ib_post_send() fails, terminate all connections of this
link group.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:10:42 -05:00
Ursula Braun 5ac92a00aa net/smc: handle state SMC_PEERFINCLOSEWAIT correctly
A state transition from closing state SMC_PEERFINCLOSEWAIT to closing
state SMC_APPFINCLOSEWAIT is not allowed. Once a closing indication
from the peer has been received, the socket reaches state SMC_CLOSED.

And receiving a peer_conn_abort just changes the state of the socket
into one of the states SMC_PROCESSABORT or SMC_CLOSED;
sending a peer_conn_abort occurs in smc_close_active() for state
SMC_PROCESSABORT only.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:10:42 -05:00
Ursula Braun 611b63a127 net/smc: cancel tx worker in case of socket aborts
If an SMC socket is aborted, the tx worker should be cancelled.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:10:42 -05:00
Ursula Braun aa377e682d net/smc: continue waiting if peer signals write_shutdown
If the peer sends a shutdown WRITE, this should not affect sending
in general, and waiting for send buffer space in particular.
Stop waiting of the local socket for send buffer space only, if peer
signals closing, but not if peer signals just shutdown WRITE.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:52:57 -05:00
Ursula Braun bbb96bf236 net/smc: improve state change handling after close wait
When a socket is closed or shutdown, smc waits for data being transmitted
in certain states. If the state changes during this wait, the close
switch depending on state should be reentered.
In addition, state change is avoided if sending of close or shutdown fails.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:52:57 -05:00
Ursula Braun 86e780d3a3 net/smc: make wait for work request uninterruptible
Work requests are needed for every ib_post_send(), among them the
ib_post_send() to signal closing. If an smc socket program is cancelled,
the smc connections should be cleaned up, and require sending of closing
signals to the peer. This may fail, if a wait for
a free work request is needed, but is cancelled immediately due to the
cancel interrupt. To guarantee notification of the peer, the wait for
a work request is changed to uninterruptible.

And the area to receive work request completion info with
ib_poll_cq() is cleared first.
And _tx_ variable names are used in the _tx_routines for the
demultiplexing common type in the header.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:52:57 -05:00
Ursula Braun 8429c13435 net/smc: get rid of tx_pend waits in socket closing
There is no need to wait for confirmation of pending tx requests
for a closing connection, since pending tx slots are dismissed
when finishing a connection.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:52:57 -05:00
Ursula Braun 35a6b17847 net/smc: simplify function smc_clcsock_accept()
Cleanup to avoid duplicate code in smc_clcsock_accept().
No functional change.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:52:57 -05:00
Ursula Braun 3163c5071f net/smc: use local struct sock variables consistently
Cleanup to consistently exploit the local struct sock definitions.
No functional change.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:52:57 -05:00
Ursula Braun e7b7a64a84 smc: support variable CLC proposal messages
According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating
systems.

This patch makes sure, SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 15:03:12 -05:00
Ursula Braun 6b5771aa3c smc: no consumer update in tasklet context
The SMC protocol requires to send a separate consumer cursor update,
if it cannot be piggybacked to updates of the producer cursor.
When receiving a blocked signal from the sender, this update is sent
already in tasklet context. In addition consumer cursor updates are
sent after data receival.
Sending of cursor updates is controlled by sequence numbers.
Assuming receiving stray messages the receiver drops updates with older
sequence numbers than an already received cursor update with a higher
sequence number.
Sending consumer cursor updates in tasklet context may result in
wrong order sends and its corresponding drops at the receiver. Since
it is sufficient to send consumer cursor updates once the data is
received, this patch gets rid of the consumer cursor update in tasklet
context to guarantee in-sequence arrival of cursor updates.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 15:03:12 -05:00
Ursula Braun 71c125c3f2 smc: cleanup close checking during data receival
When waiting for data to be received it must be checked if the
peer signals shutdown. The SMC code uses two different checks
for this purpose, even though just one check is sufficient.
This patch removes the superfluous test for SOCK_DONE.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 15:03:12 -05:00
Ursula Braun 4bd3e7fbfa smc: no update for unused sk_write_pending
The smc code never checks the sk_write_pending sock field.
Thus there is no need to update it.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 15:03:12 -05:00
Ursula Braun 0c9f1515aa smc: improve smc_clc_send_decline() error handling
Let smc_clc_send_decline() return with an error, if the amount
sent is smaller than the length of an smc decline message.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 15:03:12 -05:00
Ursula Braun a8ae890b9c smc: make smc_close_active_abort() static
smc_close_active_abort() is used in smc_close.c only.
Make it static.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 15:03:12 -05:00
Al Viro d63d271ce2 smc: switch to sock_recvmsg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-12-02 20:38:09 -05:00
Al Viro ade994f4f6 net: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:20:04 -05:00
Al Viro e6c8adca20 anntotate the places where ->poll() return values go
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:19:53 -05:00
Geert Uytterhoeven 6887037025 net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
With gcc-4.1.2:

    net/smc/smc_core.c: In function ‘__smc_buf_create’:
    net/smc/smc_core.c:567: warning: ‘bufsize’ may be used uninitialized in this function

Indeed, if the for-loop is never executed, bufsize is used
uninitialized.  In addition, buf_desc is stored for later use, while it
is still a NULL pointer.

Before, error handling was done by checking if buf_desc is non-NULL.
The cleanup changed this to an error check, but forgot to update the
preinitialization of buf_desc to an error pointer.

Update the preinitializatin of buf_desc to fix this.

Fixes: b33982c3a6 ("net/smc: cleanup function __smc_buf_create()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24 01:33:34 +09:00
Ursula Braun 4e1061f4a2 net/smc: use sk_rcvbuf as start for rmb creation
Commit 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
merged handling of SMC receive and send buffers. It introduced sk_buf_size
as merged start value for size determination. But since sk_buf_size is not
used at all, sk_sndbuf is erroneously used as start for rmb creation.
This patch makes sure, sk_buf_size is really used as intended, and
sk_rcvbuf is used as start value for rmb creation.

Fixes: 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24 01:33:34 +09:00
David S. Miller 2a171788ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Files removed in 'net-next' had their license header updated
in 'net'.  We take the remove from 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:26:51 +09:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Ursula Braun c5c1cc9c52 smc: add SMC rendezvous protocol
The SMC protocol [1] uses a rendezvous protocol to negotiate SMC
capability between peers. The current Linux implementation does not yet
use this rendezvous protocol and, thus, is not compliant to RFC7609 and
incompatible with other SMC implementations like in zOS.
This patch adds support for the SMC rendezvous protocol. It uses a new
TCP experimental option. With this option, SMC capabilities are
exchanged between the peers during the TCP three way handshake.

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 18:00:29 +09:00
Ursula Braun 145686baab smc: fix mutex unlocks during link group creation
Link group creation is synchronized with the smc_create_lgr_pending
lock. In smc_listen_work() this mutex is sometimes unlocked, even
though it has not been locked before. This issue will surface in
presence of the SMC rendezvous code.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 18:00:29 +09:00
Gustavo A. R. Silva 7f6b437e9b net: smc_close: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:29:39 +09:00
Ursula Braun 43e2ada3e0 net/smc: dev_put for netdev after usage of ib_query_gid()
For RoCEs ib_query_gid() takes a reference count on the net_device.
This reference count must be decreased by the caller.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Fixes: 0cfdd8f92c ("smc: connection and link group creation")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 12:20:27 -07:00
Ursula Braun d921c420d2 net/smc: replace function pointer get_netdev()
SMC should not open code the function pointer get_netdev of the
IB device. Replacing ib_query_gid(..., NULL) with
ib_query_gid(..., gid_attr) allows access to the netdev.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 12:20:26 -07:00
David S. Miller 1f8d31d189 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-23 10:16:53 -07:00
Ursula Braun 51957bc53a net/smc: parameter cleanup in smc_cdc_get_free_slot()
Use the smc_connection as first parameter with smc_cdc_get_free_slot().
This is just a small code cleanup, no functional change.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:33:03 -07:00
Ursula Braun 8c96feeeb3 net/smc: no close wait in case of process shut down
Usually socket closing is delayed if there is still data available in
the send buffer to be transmitted. If a process is killed, the delay
should be avoided.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun 18e537cd58 net/smc: introduce a delay
The number of outstanding work requests is limited. If all work
requests are in use, tx processing is postponed to another scheduling
of the tx worker. Switch to a delayed worker to have a gap for tx
completion queue events before the next retry.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun bfbedfd383 net/smc: terminate link group if out-of-sync is received
An out-of-sync condition can just be detected by the client.
If the server receives a CLC DECLINE message indicating an out-of-sync
condition for the link groups, the server must clean up the out-of-sync
link group.
There is no need for an extra third parameter in smc_clc_send_decline().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun 5bc11ddbdf net/smc: longer delay for client link group removal
Client link group creation always follows the server linkgroup creation.
If peer creates a new server link group, client has to create a new
client link group. If peer reuses a server link group for a new
connection, client has to reuse its client link group as well. This
patch introduces a longer delay for client link group removal to make
sure this link group still exists, once the peer decides to reuse a
server link group. This avoids out-of-sync conditions for link groups.
If already scheduled, modify the delay.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun 8301fa44b4 net/smc: adapt send request completion notification
The solicited flag is meaningful for the receive completion queue.
Ask for next work completion of any type on the send queue.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:03 -07:00
Ursula Braun a6832c3acd net/smc: adjust net_device refcount
smc_pnet_fill_entry() uses dev_get_by_name() adding a refcount to ndev.
The following smc_pnet_enter() has to reduce the refcount if the entry
to be added exists already in the pnet table.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Ursula Braun 731b008560 net/smc: take RCU read lock for routing cache lookup
smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires
protection by an RCU read lock.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Hans Wippel 846e344eb7 net/smc: add receive timeout check
The SMC receive function currently lacks a timeout check under the
condition that no data were received and no data are available. This
patch adds such a check.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Hans Wippel 09579ac803 net/smc: add missing dev_put
In the infiniband part, SMC currently uses get_netdev which calls
dev_hold on the returned net device. However, the SMC code never calls
dev_put on that net device resulting in a wrong reference count.

This patch adds a dev_put after the usage of the net device to fix the
issue.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 15:31:02 -07:00
Ursula Braun 10428dd835 net/smc: synchronize buffer usage with device
Usage of send buffer "sndbuf" is synced
(a) before filling sndbuf for cpu access
(b) after filling sndbuf for device access

Usage of receive buffer "RMB" is synced
(a) before reading RMB content for cpu access
(b) after reading RMB content for device access

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00
Ursula Braun b33982c3a6 net/smc: cleanup function __smc_buf_create()
Split function __smc_buf_create() for better readability.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 11:22:58 -07:00