Use a more typical logging style.
Miscellanea:
o Obsolete the c4iw_debug module parameter
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Convert printks to pr_<level>
Miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Clean up send_connect() and make use of t6 specific
active open request struct.
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bharat Teja <bharat@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Elsewhere the sin_family field holds a value with a name of the form
AF_..., so it seems reasonable to do so here as well. Also the values
of PF_INET and AF_INET are the same.
The semantic patch that makes this change is as follows:
//</smpl>
@@
struct sockaddr_in sip;
@@
(
sip.sin_family ==
- PF_INET
+ AF_INET
|
sip.sin_family !=
- PF_INET
+ AF_INET
|
sip.sin_family =
- PF_INET
+ AF_INET
)
//</smpl>
Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Function rx_data(), which handles ingress CPL_RX_DATA messages, was
always sending an RX_DATA_ACK with the goal of updating the credits.
However, if the RDMA connection is moved out of FPDU mode abruptly,
then it is possible for iw_cxgb4 to process queued RX_DATA CPLs after HW
has aborted the connection. These CPLs should not trigger RX_DATA_ACKS.
If they do, HW can see a READ after DELETE of the DB_LE hash entry for
the tid and post a LE_DB HashTblMemCrcError.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Updates to mlx5
- Updates to mlx4 (two conflicts, both minor and easily resolved)
- Updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper
resolution is to keep the code in cxgb4_main.c as it is in Linus'
tree as attach_uld was refactored and moved into cxgb4_uld.c)
- Improvements to uAPI (moved vendor specific API elements to uAPI area)
- Add hns-roce driver and hns and hns-roce ACPI reset support
- Conversion of all rdma code away from deprecated
create_singlethread_workqueue
- Security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJX+AwSAAoJELgmozMOVy/d0WkQAKxPzVccMWwHv28iZI4ey13u
JwE+VoCNpCAZAVuEgzK5zzFdNHPvAk2jU93H4apA7dfXJBXPatVuj9Lnk+ieEEnW
tbFwJjBpbQ3Zol3+SPfAHnsVMbtax+xmd6WDKExPXXEDl1L6rutwL3KKfmgWEitg
ysX7XOJCiSdyM0hcg4T6UPB9a3jGPff9NLu0oGamV+yoUk5Y0WGoVFxHZ4MKcw8t
OkFBYIxGz4SGwq2tulStuH03HteURX594KngtrA8dyq6l1R2GlGRv+bkJAUEIWUv
aA0ow3VWusOM6fT+jLXPCv8iUwIXM8tR/U6F7X+cmORUUtWvCl+uCUVid113j/aN
BK+Af2nJnfoJ5cDBPsD+bC76l5gQycNZO/Qh8op2kmgJtD+6OpGM3cBXsHx53+kk
0wloJ2lKCGShWxNj+ig8n8rR/rhhs/x3vV3ouCVWNMbOUgOSN3eYHxmK3wGFW4nd
Qx+WYCjj9Yi/J6nmUDcfEQ4NWPR22Q2+0ENAabfhLhV6mDloAO5ILHd4GDqC3IA9
UtxlVjf4ZonaiLnTQQzCnDMGVVk6tT8FJ9D42s0ScwjbdYwjyCW9/rs/g2EhcprR
Cc+AmjqLviCWGtzBSFO0SijqQon8lcQOwdLw61CdFFvPa/mlLdf1rbx9ArIyNVKn
JSrbr3CGyoqyYj6qaEO5
=LC+S
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull main rdma updates from Doug Ledford:
"This is the main pull request for the rdma stack this release. The
code has been through 0day and I had it tagged for linux-next testing
for a couple days.
Summary:
- updates to mlx5
- updates to mlx4 (two conflicts, both minor and easily resolved)
- updates to iw_cxgb4 (one conflict, not so obvious to resolve,
proper resolution is to keep the code in cxgb4_main.c as it is in
Linus' tree as attach_uld was refactored and moved into
cxgb4_uld.c)
- improvements to uAPI (moved vendor specific API elements to uAPI
area)
- add hns-roce driver and hns and hns-roce ACPI reset support
- conversion of all rdma code away from deprecated
create_singlethread_workqueue
- security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
staging/lustre: Disable InfiniBand support
iw_cxgb4: add fast-path for small REG_MR operations
cxgb4: advertise support for FR_NSMR_TPTE_WR
IB/core: correctly handle rdma_rw_init_mrs() failure
IB/srp: Fix infinite loop when FMR sg[0].offset != 0
IB/srp: Remove an unused argument
IB/core: Improve ib_map_mr_sg() documentation
IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
IB/mthca: Move user vendor structures
IB/nes: Move user vendor structures
IB/ocrdma: Move user vendor structures
IB/mlx4: Move user vendor structures
IB/cxgb4: Move user vendor structures
IB/cxgb3: Move user vendor structures
IB/mlx5: Move and decouple user vendor structures
IB/{core,hw}: Add constant for node_desc
ipoib: Make ipoib_warn ratelimited
IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
IB/ipoib: Remove deprecated create_singlethread_workqueue
...
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "workq" queues work item &skb_work. It has been
identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add cxgb_mk_rx_data_ack() to remove duplicate
code to form CPL_RX_DATA_ACK hardware command.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_mk_abort_rpl() to remove duplicate
code to form CPL_ABORT_RPL hardware command.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_mk_abort_req() to remove duplicate code
to form CPL_ABORT_REQ hardware command.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_mk_close_con_req() to remove duplicate
code to form CPL_CLOSE_CON_REQ hardware command.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_mk_tid_release() to remove duplicate code
to form CPL_TID_RELEASE hardware command.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_compute_wscale() in libcxgb_cm.h to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_best_mtu() in libcxgb_cm.h to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_is_neg_adv() in libcxgb_cm.h to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_find_route6() in libcxgb_cm.c to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_find_route() in libcxgb_cm.c to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add cxgb_get_4tuple() in libcxgb_cm.c to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise an endpoint can be still closing down causing a touch
after free crash. Also WARN_ON if ulps have failed to destroy
various resources during device removal.
Fixes: ad61a4c7a9 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
The i40iw initiator sends an MPA-request with ird=16 and ord=16. The cxgb4
responder sends an MPA-reply with ord = 32 causing i40iw to terminate
due to insufficient resources.
The logic to reduce the ORD to <= peer's IRD was wrong.
Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The i40iw initiator sends an MPA-request with ird = 63, ord = 63. The
cxgb4 responder sends a RST. Since the inbound ord=63 and it exceeds
the max_ird/c4iw_max_read_depth (=32 default), chelsio decides to abort.
Instead, cxgb4 should adjust the ord/ird down before presenting it to
the ULP.
Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There exists a race where the application can setup a connection
and then disconnect it before iw_cxgb4 processes the fw4_ack
message. For passive side connections, the fw4_ack message is
used to know when to stop the ep timer for MPA_REPLY messages.
If the application disconnects before the fw4_ack is handled then
c4iw_ep_disconnect() needs to clean up the timer state and stop the
timer before restarting it for the disconnect timer. Failure to do this
results in a "timer already started" message and a premature stopping
of the disconnect timer.
Fixes: e4b76a2 ("RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pre-allocate buffers for sending various control messages to close
connection, abort connection, etc so that we gracefully handle
connections when system is running out of memory.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Get rid of unneeded code, and refactor things a bit.
For MPA version 0 we abort the connection. For > 0, we attempt to send
an MPA_START/REJECT Reply, and then disconnect gracefully. If the send
of the MPA message fails, then we abort the connection. We can ignore
c4iw_ep_disconnect() errors here because it will clean up the endpoint
if there are failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
markers_enabled should be read only once during MPA negotiation.
The present code does read markers_enabled twice during negotiation
which results in setting wrong recv/xmit markers if the markers_enabled is
changed in the middle of negotiation.
With this change the markers_enabled is read only once during MPA
negotiation. recv markers are set based on markers enabled module
parameter and xmit markers are set based on markers flag from the
MPA_START_REQ/MPA_START_REP.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
__force casts should be avoided if there is a better alternative.
Hence modify the comparison of s_addr with INADDR_ANY such that the
__force cast is no longer necessary.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Vipul Pandya <vipul@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
These handlers when called print error message to the kernel log,
but the actual handling is done by _c4iw_free_ep() and process_timeout().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently c4iw_peer_abort_intr() does not wake up the waiter if the
endpoint state indicates we're using MPAv2 and we're currently trying to
connect. This was introduced with commit 7c0a33d611 ("RDMA/cxgb4:
Don't wakeup threads for MPAv2")
However, this original fix is flawed because it introduces a race that
can cause a deadlock of the iwarp stack. Here is the race:
->local side sets up an active offload connection.
->local side sends MPA_START request.
->peer sends MPA_START response.
->local side ingress cpl thread begins processing the MPA_START response,
but before it changes the state from MPA_REQ_SENT to FPDU_MODE:
->peer sends a RST which results in a ABORT_REQ_RSS. This triggers
peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev
is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the
stack will re-attempt the connection using MPAv1.
->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls
c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR
to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT
WR.
->So the cpl thread is stuck waiting for a reply and cannot process the
ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a
halt because no more ingress cpls are processed by the stack...
The correct fix for the issue is to always do the wake up in
c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect().
Fixes: 7c0a33d611 ("RDMA/cxgb4: Don't wakeup threads for MPAv2")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add get_ep_from_stid() which will atomically find and reference the
endpoint struct if found. This avoids touch-after-free races between
threads destroying listening endpoints and the CPL processing thread
processing an incoming PASS_ACCEPT_REQ CPL.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
c4iw_reject() and c4iw_accept() need to handle the case where the
endpoint has timed out and is in the middle of ABORTING the connection.
Here is the flow that causes the BUG_ON() to fire on the server side:
1) offload connection setup and endpoint timer started
2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP
3) endpoint timer fires, and process_timeout() aborts the connection,
this moves the endpoint state to ABORTING until HW sends up the
ABORT_RPL_RSS.
4) application exits closing the CONNECT_REQUEST cm_id. The IWCM calls
c4iw_reject_cr() to destroy this connection request.
5) WHAMO: BUG_ON() because the state is ABORTING.
The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the
operation if the state is not in MPA_REQ_RCVD vs in DEAD.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
ARP failure may also happen when ep in FPDU_MODE and these failures need
to be handled by process_timeout(). process_timeout() also has to handle
case MPA_REQ_RCVD, setting abort to 1, leading to ep resource release.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Arp failure for send_mpa_reply/reject() is handled by freeing the
mpa_skb in c4iw_free_ep() before releasing ep.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a race between ULP threads calling c4iw_ep_disconnect() via
c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread
can free the endpoint just after the ingress CPL thread finds the ep
pointer in the tid table. To avoid this, we now use the hwtid_idr table
for lookups instead of the LLD tid table so we can lock around insert,
remove, and lookup+get_ep to avoid the race. The CPL handlers now will
either find the ep ptr and have a ref on it, or not find it and they
can discard the CPL. Callers of get_ep_from_tid() will have a ref
on the ep if found, and thus must deref when they are done.
Negative advice in peer_abort_intr() need to dereference the ep.
therefore peer_abort() is scheduled to dereference the ep later.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In abort_arp_failure(), the return value from c4iw_ofld_send() is
ignored and thus if the CPL isn't sent, the endpoint is stuck and never
gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and
the ep resources are released in a safer context through process_work().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moving the state to ABORTING causes the ep to get stuck because
c4iw_ep_timeout() thinks the ABORT has already been done. So leave the
state alone and let c4iw_ep_disconnect() do the right thing given the
ep state.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
->In act_open_rpl(), CPL_ERR_TCAM_FULL error handling branch, there is
no handling of the return value of send_fw_act_open_req().
->In send_fw_act_open_req(), there is no handling of return value of
c4iw_l2t_send(), which may cause a ep leak and won't notify upper layers
on connection establish failure.
->send_mpa_req() should act on the return from c4iw_l2t_send() and
return the error to the caller.
->In case of c4iw_l2t_send() failure in send_mpa_req(), returns without
starting the timer and not changing the ep state, which is further
handled by act_establish()
-> In act_establish()?if send_mpa_request's get_skb returns an error,
may cause an ep leak. So handle return value of send_mpa_req()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
->Stop the ep timer after MPA negotiation so that the arp failures
during send_mpa_reply/reject will be handled by process_timeout() after
the ep timer expires.
->Added case MPA_REP_SENT in process_timeout().
->For MPA reject, c4iw_ep_disconnect tries to start an already started
timer, which leads to warning message "timer already started".
-> In case of mpa reject stop the timer and call send_mpa_reject().
-> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer
only for send_mpa_reply(), which is set in c4iw_accept_cr().
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In case of incomplete mpa messages we should not stop timer as it
results in return with timeout for the next mpa message
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-> On passive side of connection parent_ep referenced during connection
request has to be dereferenced during the passive accept failure.
-> As passive accept failure error handlinglogic runs in atomic context,
the parent ep is dereferenced by scheduling work request.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- add EP_DISC_FAIL history bit
- add QP_REFED/DEREFED history bits
- Add functions to ref/deref the cm_id and add history bit for the same
- add CLOSE_CON_RPL history
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use c4iw_ep_disconnect() instead. This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
This is the last user of abort_connection(), so remove it too.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In c4iw_ep_disconnect(), if we fail to initiate a close operation, then
move the qp to ERROR to disassociate the ep from the qp. Failure to do
this will leak the ep resources.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead return whether the caller needs to disconnect. This is part of
getting rid of abort_connection() altogether so we properly clean up on
send_abort() failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use c4iw_ep_disconnect() instead. This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead, have the caller, rx_data() handle the close/abort like
it does for process_mpa_request(). This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In rx_data(), with the ep in FPDU_MODE, refcnt=2, if we get unexpected
streaming data, we call c4iw_modify_rc_qp() and move the qp from
RTS -> TERMINATE. In c4iw_modify_rc_qp(), if rdma_fini() returns
an error, the ep will be dereferenced (refcnt=1). Then rx_data()
calls c4iw_ep_disconnect() which starts the close operation.
But if send_halfclose() fails in c4iw_ep_disconnect(), we will call
release_ep_resources() derefing the ep which reduces the refcnt to 0 and
and frees the ep. However we still has the ep mutex at that point, so we
have a touch-after-free bug. There is a similar issue where
peer_close() calls c4iw_ep_disconnect().
The solution is to add a reference to the ep in c4iw_ep_disconnect()
after acquiring the mutex, and release it after releasing the mutex.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In c4iw_ep_disconnect(), if we start the ep timer to begin a close,
but send_halfclose() fails, we need to stop the timer and send a CLOSE
event up to the IWCM before releasing the resources. Otherwise, we can
crash when the ep timer fires if the ep is referencing a previous instance
of the device. This can happen as part of adapter reset/recovery, for
instance.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid
will be stuck in SYN_PEND and never released. So create an arp failure
handler specifically for this message to release the endpoint resources.
In pass_accept_rpl_arp_failure(), put the parent endpoint so it will
be freed when destroyed. Also we don't need to call release_tid() here
because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the
hwtid.
If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the
peer's ACK to our SYN is never received), then put the parent as well
in peer_abort().
Treat accept_cr() failures just like arp failures: put the parent ep
and release the ep resources destroying the tid
The ARP failure handlers are called in an atomic context, so we need to
schedule some of the processing which might block. Namely _c4iw_free_ep()
which needs a mutex. So create a "special" CPL opcode and handler and
schedule it via sched() to be run by process_work() in a blockable context.
Also rework the active open arp failure handler to make use of
release_ep_resources(). This allows both the active and passive arp
failure handlers to use the same deferred cleanup function.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Now that most of the port mapper code been moved to iwcm, we can remove
it from iw_cxgb4.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This series provides support for iWARP applications to specify a TOS
value and have that map to a VLAN Priority for iw_cxgb4 iWARP connections.
In iw_cxgb4, when allocating an L2T entry, pass the skb_priority based
on the tos value in the cm_id. Also pass the correct tos value during
connection setup so the passive side gets the client's desired tos.
When sending the FLOWC work request to FW, if the egress device is
in a vlan, then use the vlan priority bits as the scheduling class.
This allows associating RDMA connections with scheduling classes to
provide traffic shaping per flow.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don't log errors if a listening endpoint is going away when procesing a
PASS_ACCEPT_REQ message. This can happen. Change the error printk to
a PDBG() debug log entry
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed through the
RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to dependencies,
acknowledged by Bruce)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWoSygAAoJELgmozMOVy/dDjsP/2vbTda2MvQfkfkGEZBQdJSg
095RN0gQgCJdg78lAl8yuaK8r4VN/7uefpDtFdudH1I/Pei7X0wxN9R1UzFNG4KR
AD53lz92IVPs15328SbPR2kvNWISR9aBFQo3rlElq3Grqlp0EMn2Ou1vtu87rekF
aMllxr8Nl0uZhP+eWusOsYpJUUtwirLgRnrAyfqo2UxZh/TMIroT0TCx1KXjVcAg
dhDARiZAdu3OgSc6OsWqmH+DELEq6dFVA5F+DDBGAb8bFZqlJc7cuMHWInwNsNXT
so4bnEQ835alTbsdYtqs5DUNS8heJTAJP4Uz0ehkTh/uNCcvnKeUTw1c2P/lXI1k
7s33gMM+0FXj0swMBw0kKwAF2d9Hhus9UAN7NwjBuOyHcjGRd5q7SAnfWkvKx000
s9jVW19slb2I38gB58nhjOh8s+vXUArgxnV1+kTia1+bJSR5swvVoWRicRXdF0vh
TvLX/BjbSIU73g1TnnLNYoBTV3ybFKQ6bVdQW7fzSTDs54dsI1vvdHXi3bYZCpnL
HVwQTZRfEzkvb0AdKbcvf8p/TlaAHem3ODqtO1eHvO4if1QJBSn+SptTEeJVYYdK
n4B3l/dMoBH4JXJUmEHB9jwAvYOpv/YLAFIvdL7NFwbqGNsC3nfXFcmkVORB1W3B
KEMcM2we4bz+uyKMjEAD
=5oO7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take clip reference
before creating IPv6 listening servers, and then if we fail to
create server, release the clip entry.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
floppy: make local variable non-static
exynos: fixes an incorrect header guard
dt-bindings: fixes some incorrect header guards
cpufreq-dt: correct dead link in documentation
cpufreq: ARM big LITTLE: correct dead link in documentation
treewide: Fix typos in printk
Documentation: filesystem: Fix typo in fs/eventfd.c
fs/super.c: use && instead of & for warn_on condition
Documentation: fix sysfs-ptp
lib: scatterlist: fix Kconfig description
This patch fix multiple spelling typos found in
various part of kernel.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The ESTABLISHED event should have the peer's ord/ird so
swap the values in the event before the upcall.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When calculating the minimum ird in c4iw_accept_cr(), we need to always
have a value of at least 1 if the RTR message is a 0B read. The code
was
incorrectly using ep->ord for this logic which was incorrectly adjusting
the ird and causing incorrect ord/ird negotiation when using MPAv2 to
negotiate these values.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This allows client ULPs to get the negotiated ord/ird which is useful
to avoid stalling the SQ due to exceeding the ORD.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In c4iw_create_listen(), if we're using listen filters, then bail out
of the busy loop if the device becomes fatally dead
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This fixes an if statement checking the return value of the function
get_lladdr for success in the function pick_local_ip6addrs to instead
of directly checking the return value of this call check the opposite
as get_lladdr returns zero for success which would incorrectly make
this if statement block not execute with the current if statement
check.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add support for ipv6 address handling clip api provided by lld
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This enables ORD/IRD negotiation and its about time to enable it by
default
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
For listening endpoints bound to the wildcard address, we need to pass
the wildcard address mapping to iwpm_get_remote_info() instead of the
mapped address of the new child connection.
Without this fix, and with iwarp port mapping enabled, each iw_cxgb4
connection that is spawned from a listening endpoint bound to the wildcard
address, will generate an annoying dmesg entry about failing to find
the remote address mapping info, and the connection state displayed in
debugfs under /sys/kernel/debug/iw_cxgb4/<pci-slot-no>/eps will not have
the peer's address/port mapping info. The connection still works though.
Fixes: 5b6b8fe ("RDMA/cxgb4: Report the actual address of the remote connecting peer")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove these log messages in favor of per-endpoint counters as well as
device-global counters that can be inspected via debugfs.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Get the actual (non-mapped) ip/tcp address of the connecting peer from
the port mapper
Also setup the passive side endpoint to correctly display the actual
and mapped addresses for the new connection.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- get_dma_mr() was using ~0UL which is should be ~0ULL. This causes the
DMA MR to get setup incorrectly in hardware.
- wr_log_show() needed a 64b divide function div64_u64() instead of
doing
division directly.
- fixed warnings about recasting a pointer to a u64
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Cleanup all the MACROS that are defined in t4fw_ri_api.h and affected files
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleanups all other macros/register define related to
CPL messages that are defined in t4_msg.h and the affected files
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleanups all macros/register define related to connection management
CPL messages that are defined in t4_msg.h and the affected files
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4_create_server() and cxgb4_create_server6() return NET_XMIT_*
values or a negative errno. iw_cxgb4 need to handle this correctly.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch cleanups PF/VF and LDST related macros/register defines that are
defined in t4fw_api.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleanups all filter related macros/register defines that are defined
in t4fw_api.h and the affected files.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactored all macros used in cxgb4i as part of previously started cxgb4 macro
names cleanup. Makes them more uniform and avoids namespace collision.
Minor changes in other drivers where required as some of these macros are used
by multiple drivers, affected drivers are iw_cxgb4, cxgb4(vf) & csiostor
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various patches have ended up changing the style of the symbolic macros/register
defines to different style.
As a result, the current kernel.org files are a mix of different macro styles.
Since this macro/register defines is used by different drivers a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess and we want to make them clean and consistent. This patch cleans up a part
of it.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes ntuple calculation for IPv6 active open request for T5
adapter. And also removes an duplicate line which got added in commit
92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for
iWARP connections")
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
best_mtu and set_emss were not considering ipv6 header for ipv6 case.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/infiniband/hw/cxgb4/device.c
The cxgb4 conflict was simply overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Advertise a larger max read queue depth for qps, and gather the resource limits
from fw and use them to avoid exhaustinq all the resources.
Design:
cxgb4:
Obtain the max_ordird_qp and max_ird_adapter device params from FW
at init time and pass them up to the ULDs when they attach. If these
parameters are not available, due to older firmware, then hard-code
the values based on the known values for older firmware.
iw_cxgb4:
Fix the c4iw_query_device() to report these correct values based on
adapter parameters. ibv_query_device() will always return:
max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
max_res_rd_atom = max_ird_adapter
Bump up the per qp max module option to 32, allowing it to be increased
by the user up to the device max of max_ordird_qp. 32 seems to be
sufficient to maximize throughput for streaming read benchmarks.
Fail connection setup if the negotiated IRD exhausts the available
adapter ird resources. So the driver will track the amount of ird
resource in use and not send an RI_WR/INIT to FW that would reduce the
available ird resources below zero.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to only register with the iwpm core once. Currently it is
being done for every adapter, which causes a failure for each adapter
but the first, making multiple adapters unusable.
Fixes: 9eccfe109b ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Based on origninal work by Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Based on origninal work by Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Change logic which determines our Physical Function at PCI Probe time.
Now we read the PL_WHOAMI register and get the Physical Function.
Pass Physical Function to Upper Layer Drivers in lld_info structure in the
new field "pf" added to lld_info. This is useful for the cases where the
PF, say PF4, is attached to a Virtual Machine via some form of "PCI
Pass Through" technology and the PCI Function shows up as PF0 in the VM.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
Fixed a bug that shows up with recv window sizes that exceed the size of
the RCV_BUFSIZ field in opt0 (>= 1024K). If the recv window exceeds
this, then we specify the max possible in opt0, add add the rest in via
a RX_DATA_ACK credits.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Select the appropriate hw mtu index and initial sequence number to optimize
hw memory performance.
Add new cxgb4_best_aligned_mtu() which allows callers to provide enough
information to be used to [possibly] select an MTU which will result in the
TCP Data Segment Size (AKA Maximum Segment Size) to be an aligned value.
If an RTR message exhange is required, then align the ISS to 8B - 1 + 4, so
that after the SYN the send seqno will align on a 4B boundary. The RTR
message exchange will leave the send seqno aligned on an 8B boundary.
If an RTR is not required, then align the ISS to 8B - 1. The goal is
to have the send seqno be 8B aligned when we send the first FPDU.
Based on original work by Casey Leedom <leeedom@chelsio.com> and
Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>