Commit Graph

799740 Commits

Author SHA1 Message Date
Michael Chan b16b689186 bnxt_en: Add SR-IOV support for 57500 chips.
There are some minor differences when assigning VF resources on the
new chips.  The MSIX (NQ) resource has to be assigned and ring group
is not needed on the new chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan 36d65be9a8 bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.
When bringing up a device, the code checks to see if the number of
MSIX has changed.  pci_disable_msix() should be called first before
changing the number of reserved NQs/CMPL rings.  This ensures that
the MSIX vectors associated with the NQs/CMPL rings are still
properly mapped when pci_disable_msix() masks the vectors.

This patch will prevent errors when RDMA support is added for the new
57500 chips.  When the RDMA driver shuts down, the number of NQs is
decreased and we must use the new sequence to prevent MSIX errors.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam 780baad44f bnxt_en: Reserve 1 stat_ctx for RDMA driver.
bnxt_en requires same number of stat_ctxs as CP rings but RDMA
requires only 1 stat_ctx.  Also add a new parameter resv_stat_ctxs
to better keep track of stat_ctxs reserved including resources used
by RDMA.  Add a stat_ctxs parameter to all the relevant resource
reservation functions so we can reserve the correct number of
stat_ctxs.

Prior to this patch, we were not reserving the extra stat_ctx for
RDMA and RDMA would not work on the new 57500 chips.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam f4e896142d bnxt_en: Do not modify max_stat_ctxs after RDMA driver requests/frees stat_ctxs
Calling bnxt_set_max_func_stat_ctxs() to modify max stat_ctxs requested
or freed by the RDMA driver is wrong. After introducing reservation of
resources recently, the driver has to keep track of all stat_ctxs
including the ones used by the RDMA driver.  This will provide a better
foundation for accurate accounting of the stat_ctxs.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam c027c6b4e9 bnxt_en: get rid of num_stat_ctxs variable
For bnxt_en driver, stat_ctxs created will always be same as
cp_nr_rings. Remove extra variable that duplicates the value.
Also introduce bnxt_get_avail_stat_ctxs_for_en() helper to get
available stat_ctxs and bnxt_get_ulp_stat_ctxs() helper to return
number of stat_ctxs used by RDMA.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan e916b0815a bnxt_en: Add bnxt_get_avail_cp_rings_for_en() helper function.
The available CP rings are calculated differently on the new 57500
chips, so add this helper to do this calculation correctly.  The
VFs will be assigned these available CP rings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan f7588cd893 bnxt_en: Store the maximum NQs available on the PF.
The PF has a pool of NQs and MSIX vectors assigned to it based on
NVRAM configurations.  The number of usable MSIX vectors on the PF
is the minimum of the NQs and MSIX vectors.  Any excess NQs without
associated MSIX may be used for the VFs, so we need to store this
max_nqs value.  max_nqs minus the NQs used by the PF will be the
available NQs for the VFs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:52 -08:00
Stefano Brivio 11789039da fou: Prevent unbounded recursion in GUE error handler
Handling exceptions for direct UDP encapsulation in GUE (that is,
UDP-in-UDP) leads to unbounded recursion in the GUE exception handler,
syzbot reported.

While draft-ietf-intarea-gue-06 doesn't explicitly forbid direct
encapsulation of UDP in GUE, it probably doesn't make sense to set up GUE
this way, and it's currently not even possible to configure this.

Skip exception handling if the GUE proto/ctype field is set to the UDP
protocol number. Should we need to handle exceptions for UDP-in-GUE one
day, we might need to either explicitly set a bound for recursion, or
implement a special iterative handling for these cases.

Reported-and-tested-by: syzbot+43f6755d1c2e62743468@syzkaller.appspotmail.com
Fixes: b8a51b38e4 ("fou, fou6: ICMP error handlers for FoU and GUE")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 21:38:04 -08:00
Joakim Tjernlund a28777f250 ucc_geth: Add change_carrier() for Fixed PHYs
This allows to control carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Joakim Tjernlund 6211d46713 gianfar: Add change_carrier() for Fixed PHYs
This allows to control carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Joakim Tjernlund 6e8b0ff1ba dpaa_eth: Add change_carrier() for Fixed PHYs
This allows to control carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Joakim Tjernlund b3e5464e36 Fixed PHY: Add fixed_phy_change_carrier()
Drivers can use this as .ndo_change_carrier() to change carrier
via /sys/class/net/ethX/carrier.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Eric Dumazet 4beaacc6fe net/mlx4_en: remove fallback after kzalloc_node()
kzalloc_node(..., GFP_KERNEL, node) will attempt to allocate
memory as close as possible to the node.

There is no need to fallback to kzalloc() if this has failed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:18:02 -08:00
Paolo Abeni c03b0358ab net: unbreak CONFIG_RETPOLINE=n builds
The kbuild bot reported a build breakage with CONFIG_RETPOLINE=n
due to commit aaa5d90b39 ("net: use indirect call wrappers at
GRO network layer").
I screwed the wrapper implementation for such config.
Fix the issue properly ignoring the builtin symbols arguments,
when retpoline is not enabled.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: aaa5d90b39 ("net: use indirect call wrappers at GRO network layer")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 09:19:49 -08:00
David S. Miller ae6750e0a5 Merge branch 'mlxsw-spectrum_acl-Add-Bloom-filter-support'
Ido Schimmel says:

====================
mlxsw: spectrum_acl: Add Bloom filter support

Nir says:

Spectrum-2 uses Bloom filter to reduce the number of lookups in the
algorithmic TCAM (A-TCAM). HW performs multiple exact match lookups in a
given region using a key composed of { packet & mask, mask ID, region ID }.
The masks which are used in a region are called rule patterns or RP.
When such multiple masks are used, the A-TCAM region uses an eRP
(extended RP) table that describes which rule patterns are in use and
defines the order of the lookup. When eRP table is used in a region, one
way to reduce the number of the lookups is to consult a Bloom filter
before doing the lookup.

A Bloom filter is a space-efficient probabilistic data structure, on
which a query returns either "possibly in set" or "definitely not in
set". HW can skip a lookup if a query on the Bloom filter results a
"definitely not set" response. The mlxsw driver implements a "counting
filter" and when either a new entry is marked or the last entry is
removed it will update the HW. Update of this counting filter occurs
when rule is configured or deleted from a region.

Patch #1 adds PEABFE register which is used for setting Bloom filter
entries.

Patch #2 adds Bloom filter resources.

Patch #3 and patch #4 provide Bloom filter handling within mlxsw, by
adding initialization and logic for updating the Bloom bit vector in HW.

Patch #5 and patch #6 add required calls for Bloom filter update as part
of rule configuration flow.

Patch #7 handles transitions to and from eRP table. It uses a list to
keep A-TCAM rules in order to update rules in Bloom filter, in cases of
transitions from master mask based A-TCAM region to an eRP table based
region and vice versa.

Patch #8 removes a trick done on master RP index to a remaining RP,
since Bloom filter is updated on eRP transitions.

Finally, patch #9 activates Bloom filter mechanism in HW, by cancelling
the bypass that was configured before and the remaining three patches
are selftests that exercise the new code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:35 -08:00
Nir Dotan 5d06a76d9e selftests: mlxsw: Add Bloom delta test
The eRP table is active when there is more than a single rule
pattern. It may be that the patterns are close enough and use delta
mechanism. Bloom filter index computation is based on the values of
{rule & mask, mask ID, region ID} where the rule delta bits must be
cleared.

Add a test that exercises Bloom filter with delta mechanism.
Configure rules within delta range and pass a packet which is
supposed to hit the correct rule.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 5118ca4edf selftests: mlxsw: Add Bloom filter complex test
Bloom filter index computation is based on the values of
{rule & mask, mask ID, region ID} and the computation also varies
according to the region key size.

Add a test that exercises the possible combinations by creating
multiple chains using different key sizes and then pass a frame that
is supposed to to produce a hit on all of the regions.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 095c720807 selftests: mlxsw: Add Bloom filter simple test
Add a test that exercises Bloom filter code.
Activate eRP table in the region by adding multiple rule patterns which
with very high probability use different entries in the Bloom filter.
Then send packets in order to check lookup hits on all relevant rules.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 03ce5bd187 mlxsw: reg: Activate Bloom filter
Now that mlxsw driver handles all aspects of updating
the Bloom filter mechanism, set bf_bypass value to false
and allow HW to use Bloom filter.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan dd97d85f1e mlxsw: spectrum_acl: Set master RP index on transition to eRP
Bloom filter is updated on transitions from a single rule pattern,
also called master RP, to eRP table and vice versa. Since rules are
being written to or deleted from the Bloom filter on such transitions,
it is not required to keep the same eRP bank ID for the master RP.

Change master RP index assignment so it will be assigned with zero.
This is consistent with the assignment of the first available spot
that is used for allocating eRP's indices.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 135fd95728 mlxsw: spectrum_acl: Update Bloom filter on eRP transitions
Bloom filter update is required only for rules which reside on an
eRP. When the region has only a single rule pattern then eRP table
is not used, however insertion of another pattern would trigger a
move to an active eRP table so it is imperative to update the Bloom
filter with all previously configured rules.

Add a method that updates Bloom filter entries for all rules
currently configured in the region, on the event of a transition
from master mask to eRP, or vice versa. For that purpose, maintain
a list of all A-TCAM rules within mlxsw_sp_acl_atcam_region.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 8c81b7438b mlxsw: spectrum_acl: Set A-TCAM rules in Bloom filter
Add calls to eRP module for updating Bloom filter when a rule is
added or removed from the A-TCAM. eRP module will update the Bloom
filter only for cases in which the region has an active eRP table.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan f5a2852ed0 mlxsw: spectrum_acl: Add Bloom filter update
Add Bloom filter update for rule insertion and rule removal scenarios.
This is done within eRP module in order to assure that Bloom filter
updates are done only for rules which are part of an eRP, as HW does not
consult Bloom filter for entries when there is a single (master) mask in
the region.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 7585cacdb9 mlxsw: spectrum_acl: Add Bloom filter handling
Spectrum-2 HW uses Bloom filter in order to skip lookups on specific
eRPs. It uses crc-16-Msbit-first calculation over a specific layout
of a rule's key fields combined with eRP ID as well as region ID.
Per potential lookup, iff the Bloom filter entry of the calculated
index is empty, then the lookup can be skipped. Hence, the mlxsw
driver should update the Bloom filter entry per each rule insertion
or deletion when rules are part of an eRP.

Add functions for adding and deleting entries in the Bloom filter.
In order to do so also add crc-16 computation based on the specific
Spectrum-2 polynomial and a function for encoding the crc-16 input
in the manner dictated by HW implementation.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan 0487cfba86 mlxsw: spectrum_acl: Introduce Bloom filter
Lay the foundations for Bloom filter handling. Introduce a new file for
Bloom filter actions.

Add struct mlxsw_sp_acl_bf to struct mlxsw_sp_acl_erp_core and initialize
the Bloom filter data structure. Also take care of proper destruction when
terminating.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:33 -08:00
Nir Dotan 944068582f mlxsw: resources: Add Spectrum-2 Bloom filter resource
Add the maximum Bloom filter logarithmic size per eRP table bank.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:33 -08:00
Nir Dotan 418089a850 mlxsw: reg: Add Policy Engine Algorithmic Bloom Filter Entries Register
Bloom filter is a bit vector which allows the HW a fast lookup on a
small size bit vector, that may reduce the number of lookups on the
A-TCAM memory. PEABFE register allows setting values to the bits of
the bit vector mentioned above.
Add the register to be later used in A-TCAM optimizations.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:33 -08:00
David S. Miller 0634d694b0 Merge branch 'rtnl-fdb-get'
Roopa Prabhu says:

====================
rtnl fdb get

This series adds support for rtnl fdb get similar to
route get.

v2: add nda_policy, fixes to exact msgs, strict nlmsg parsing

v3: remove unnecessary attribute length checks + simplify code
as pointed out by david
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:42:35 -08:00
Roopa Prabhu 31d31951d0 selftests: net: rtnetlink.sh: add fdb get test
tests the below three cases of bridge fdb get:
[bridge, mac, vlan]
[bridge_port, mac, vlan, flags=[NTF_MASTER]]
[vxlandev, mac, flags=NTF_SELF]

depends on iproute2 support for bridge fdb get.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:42:34 -08:00
Roopa Prabhu 474c3c896f vxlan: support for ndo_fdb_get
This patch implements ndo_fdb_get for a vxlan device.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:42:34 -08:00
Roopa Prabhu 4767456212 bridge: support for ndo_fdb_get
This patch implements ndo_fdb_get for the bridge
fdb.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:42:34 -08:00
Roopa Prabhu 5b2f94b276 net: rtnetlink: support for fdb get
This patch adds support for fdb get similar to
route get. arguments can be any of the following (similar to fdb add/del/dump):
[bridge, mac, vlan] or
[bridge_port, mac, vlan, flags=[NTF_MASTER]] or
[dev, mac, [vni|vlan], flags=[NTF_SELF]]

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:42:34 -08:00
David S. Miller 5312b93b04 Merge branch 'dsa-tag-cleanups'
Marek Vasut says:

====================
net: dsa: ksz: Clean up the tag code in prep for more switches

Clean up the KSZ DSA tag code in preparation for adding more switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:23:33 -08:00
Marek Vasut 8a75b9d4c9 net: dsa: ksz: Add STP multicast handling
In case the destination address is link local, add override bit into the
switch tag to let such a packet through the switch even if the port is
blocked.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:23:33 -08:00
Tristram Ha bafe9ba7d9 net: dsa: ksz: Factor out common tag code
Factor out common code from the tag_ksz , so that the code can be used
with other KSZ family switches which use differenly sized tags.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:23:33 -08:00
Tristram Ha 39d6b96f9f net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477
Rename the tag Kconfig option and related macros in preparation for
addition of new KSZ family switches with different tag formats.

Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 14:23:33 -08:00
Jakub Kicinski 036b9e7cae nfp: abm: allow to opt-out of RED offload
FW team asks to be able to not support RED even if NIC is capable
of buffering for testing and experimentation.  Add an opt-out flag.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:41:42 -08:00
David S. Miller 9c46ae0ea1 Revert "net: dccp: initialize (addr,port) listening hashtable"
This reverts commit ec49d83f24.

Cause build failures when DCCP is modular.

ERROR: "inet_hashinfo2_init" [net/dccp/dccp.ko] undefined!

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:36:41 -08:00
David Ahern df9b0e30d4 neighbor: Add protocol attribute
Similar to routes and rules, add protocol attribute to neighbor entries
for easier tracking of how each was created.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:15:25 -08:00
Peter Oskolkov 11fb60d108 selftests: net: reuseport_addr_any: add DCCP
This patch adds coverage of DCCP to reuseport_addr_any selftest.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:14:29 -08:00
Peter Oskolkov ec49d83f24 net: dccp: initialize (addr,port) listening hashtable
Commit d9fbc7f643 "net: tcp: prefer listeners bound to an address"
removes port-only listener lookups. This caused segfaults in DCCP
lookups because DCCP did not initialize the (addr,port) hashtable.

This patch adds said initialization.

The only non-trivial issue here is the size of the new hashtable.
It seemed reasonable to make it match the size of the port-only
hashtable (= INET_LHTABLE_SIZE) that was used previously. Other
parameters to inet_hashinfo2_init() match those used in TCP.

Tested: syzcaller issues fixed; the second patch in the patchset
        tests that DCCP lookups work correctly.

Fixes: d9fbc7f643 "net: tcp: prefer listeners bound to an address"
Reported-by: syzcaller <syzkaller@googlegroups.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:14:29 -08:00
Sam Protsenko c151acc6e9 l2tp: Add protocol field decompression
When Protocol Field Compression (PFC) is enabled, the "Protocol" field
in PPP packet will be received without leading 0x00. See section 6.5 in
RFC 1661 for details. So let's decompress protocol field if needed, the
same way it's done in drivers/net/ppp/pptp.c.

In case when "nopcomp" pppd option is not enabled, PFC (pcomp) can be
negotiated during LCP handshake, and L2TP driver in kernel will receive
PPP packets with compressed Protocol field, which in turn leads to next
error:

    Protocol Rejected (unsupported protocol 0x2145)

because instead of Protocol=0x0021 in PPP packet there will be
Protocol=0x21. This patch unwraps it back to 0x0021, which fixes the
issue.

Sending the compressed Protocol field will be implemented in subsequent
patch, this one is self-sufficient.

Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 23:28:19 -08:00
David S. Miller 63de273f34 mlx5e-updates-2018-12-14 (VF Lag)
From Aviv Heller,
 
 Subsequent patches introduce VF LAG, which provdies load-balancing and
 high-availability capabilities for VFs associated with different
 physical ports of the same Connect-X card.
 
 This series consists of the following:
  - mlx5 devcom, driver infrastructure that facilitates operations that involve
    both core devices (physical functions) of the same card, to synchronize and
    communicate between two driver instances of the same card.
  - Infrastructure for TC rule duplication.
  - Changes to LAG logic to enable its use when SR-IOV is enabled
  - PFs in switchdev mode is the only mode currently supported.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcFCCzAAoJEEg/ir3gV/o+VzIH/3xOcSuDIB7rM/mNoMtXVvvu
 ZBZGUlUh7PoX/+fLjKYuQhosCvx1kpecPeNTa/R4yFy3kUl70CsKT6Cqtw2qGiPU
 2hfr75MobM5nMwOU1QDXYyfSaXZsfUj25DE25OSvtMygR6YjeU0Lm4peWBnd9MD0
 Oj5lilggS6hiNyCTI0s1D4tR0Uh2roj2tAGW7JRkd1Et34u9KZbBitCWvYkTtqcp
 XauuCdjlDNZenOg0Lue9fAc4Jo9AVEaB2w/4dZDsarZyvabQ4DxVrmrVOgLmWBjc
 Tr10JCuBvgdX/66JikJQ08Fkx/pHGby6jxcI4ux3QEoHe4/umm3Te0+5/pIxe58=
 =wDuV
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-updates-2018-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-14 (VF Lag)

From Aviv Heller,

Subsequent patches introduce VF LAG, which provdies load-balancing and
high-availability capabilities for VFs associated with different
physical ports of the same Connect-X card.

This series consists of the following:
 - mlx5 devcom, driver infrastructure that facilitates operations that involve
   both core devices (physical functions) of the same card, to synchronize and
   communicate between two driver instances of the same card.
 - Infrastructure for TC rule duplication.
 - Changes to LAG logic to enable its use when SR-IOV is enabled
 - PFs in switchdev mode is the only mode currently supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:29:56 -08:00
David S. Miller bedf3b3320 Merge branch 'net-mitigate-retpoline-overhead'
Paolo Abeni says:

====================
net: mitigate retpoline overhead

The spectre v2 counter-measures, aka retpolines, are a source of measurable
overhead[1]. We can partially address that when the function pointer refers to
a builtin symbol resorting to a list of tests vs well-known builtin function and
direct calls.

Experimental results show that replacing a single indirect call via
retpoline with several branches and a direct call gives performance gains
even when multiple branches are added - 5 or more, as reported in [2].

This may lead to some uglification around the indirect calls. In netconf 2018
Eric Dumazet described a technique to hide the most relevant part of the needed
boilerplate with some macro help.

This series is a [re-]implementation of such idea, exposing the introduced
helpers in a new header file. They are later leveraged to avoid the indirect
call overhead in the GRO path, when possible.

Overall this gives > 10% performance improvement for UDP GRO benchmark and
smaller but measurable for TCP syn flood.

The added infra can be used in follow-up patches to cope with retpoline overhead
in other points of the networking stack (e.g. at the qdisc layer) and possibly
even in other subsystems.

v2  -> v3:
 - fix build error with CONFIG_IPV6=m

v1  -> v2:
 - list explicitly the builtin function names in INDIRECT_CALL_*(),
   as suggested by Ed Cree
 - expand the recipients list

rfc -> v1:
 - use branch prediction hints, as suggested by Eric

[1] http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf
[2] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:23:03 -08:00
Paolo Abeni 4f24ed77de udp: use indirect call wrappers for GRO socket lookup
This avoids another indirect call for UDP GRO. Again, the test
for the IPv6 variant is performed first.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:23:02 -08:00
Paolo Abeni 028e0a4766 net: use indirect call wrappers at GRO transport layer
This avoids an indirect call in the receive path for TCP and UDP
packets. TCP takes precedence on UDP, so that we have a single
additional conditional in the common case.

When IPV6 is build as module, all gro symbols except UDPv6 are
builtin, while the latter belong to the ipv6 module, so we
need some special care.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes
v2 -> v3:
 - fix build issue with CONFIG_IPV6=m

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:23:02 -08:00
Paolo Abeni aaa5d90b39 net: use indirect call wrappers at GRO network layer
This avoids an indirect calls for L3 GRO receive path, both
for ipv4 and ipv6, if the latter is not compiled as a module.

Note that when IPv6 is compiled as builtin, it will be checked first,
so we have a single additional compare for the more common path.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:23:02 -08:00
Paolo Abeni 283c16a2df indirect call wrappers: helpers to speed-up indirect calls of builtin
This header define a bunch of helpers that allow avoiding the
retpoline overhead when calling builtin functions via function pointers.
It boils down to explicitly comparing the function pointers to
known builtin functions and eventually invoke directly the latter.

The macros defined here implement the boilerplate for the above schema
and will be used by the next patches.

rfc -> v1:
 - use branch prediction hint, as suggested by Eric
v1  -> v2:
 - list explicitly the builtin function names in INDIRECT_CALL_*(),
   as suggested by Ed Cree

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:23:02 -08:00
Ilias Apalodimas 35e07d2347 net: socionext: remove mmio reads on Tx
Currently the driver issues 2 mmio reads to figure out the number of
transmitted packets and clean them. We can get rid of the expensive
reads since BIT 31 of the Tx descriptor can be used for that.
We can also remove the budget counting of Tx completions since all of
the descriptors are not deliberately processed.

Performance numbers using pktgen are:
size  pre-patch(pps)  post-patch(pps)
64       362483           427916
128      358315           411686
256      352725           389683
512      215675           216464
1024     113812           114442

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:15:19 -08:00
Ilias Apalodimas 17a12eaaf0 net: socionext: correctly recover txq after being full
Running pktgen with packets sizes > 512b ends up in the interface Txq
getting stuck.
"netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!"
appears on dmesg but the interface never recovers. It requires an
ifconfig down/up to make the interface usable again.

The reason that triggers this, is a race condition between
.ndo_start_xmit and the napi completion. The available budget is
calculated first and indicates the queue is full. Due to a costly
netif_err() the queue is not stopped in time while the napi completion
runs, clears the irq and frees up descriptors, thus the queue never wakes
up again.

Fix this by moving the print after stopping the queue, make the print
ratelimited, add barriers and check for cleaned descriptors..

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:15:19 -08:00