Commit Graph

2396 Commits

Author SHA1 Message Date
Or Gerlitz abbdf4bd7d mlxsw: spectrum: Align the matchall default case returned value
Align the default case for matchall offload with what's there
for flower.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 17:16:10 -07:00
Arkadi Sharshevsky bf95233e20 mlxsw: spectrum: Cosmetic naming change
Currently the struct representing router interface "mlxsw_sp_rif"
is reffered as "r" in various places in the driver. Furthermore it
contains a member which specify the index which is called "rif".
This patch change "r" to "rif" and "rif" to "rif_index".

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 17:16:10 -07:00
Ido Schimmel c7f6e6658b mlxsw: spectrum_router: Don't abort on l3mdev rules
Now that port netdevs can be enslaved to a VRF master we need to make
sure the device's routing tables won't be flushed upon the insertion of
a l3mdev rule.

Note that we assume the notified l3mdev rule is a simple rule as used by
the VRF master. We don't check for the presence of other selectors such
as 'iif' and 'oif'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:35 -07:00
Ido Schimmel 3d70e458be mlxsw: spectrum_router: Add support for VRFs on top of bridges
In a similar fashion to the previous patch, allow bridges and VLAN
devices on top of bridges to be enslaved to a VRF master device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:35 -07:00
Ido Schimmel 7179eb5acd mlxsw: spectrum_router: Add support for VRFs
Allow port netdevs, LAG and VLAN devices stacked on top of these to be
enslaved to a VRF master device.

Upon enslavement, create a router interface (RIF) for the enslaved
netdev and associate it with a virtual router (VR) based on the VRF's
table ID.

If a RIF already exists for the netdev (f.e., due to the existence of an
IP address), then it's deleted and a new one is created with the
appropriate VR binding.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:35 -07:00
Ido Schimmel 9db032bb1e mlxsw: spectrum_router: Don't destroy RIF if L3 slave
We usually destroy the netdev's router interface (RIF) when the last IP
address is removed from it.

However, we shouldn't do that if it's enslaved to an L3 master device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:34 -07:00
Ido Schimmel 57837885e3 mlxsw: spectrum_router: Associate RIFs with correct VR
When a router interface (RIF) is created due to a netdev being enslaved
to a VRF master, then it should be associated with the appropriate
virtual router (VR) and not the default one.

If netdev is a VRF slave, lookup the VR based on the VRF's table ID.
Otherwise default to the MAIN table.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:34 -07:00
Ido Schimmel 5d7bfd1419 ipv4: fib_rules: Dump FIB rules when registering FIB notifier
In commit c3852ef7f2 ("ipv4: fib: Replay events when registering FIB
notifier") we dumped the FIB tables and replayed the events to the
passed notification block.

However, we merely sent a RULE_ADD notification in case custom rules
were in use. As explained in previous patches, this approach won't work
anymore. Instead, we should notify the caller about all the FIB rules
and let it act accordingly.

Upon registration to the FIB notification chain, replay a RULE_ADD
notification for each programmed FIB rule, custom or not. The integrity
of the dump is ensured by the mechanism introduced in the above
mentioned commit.

Prevent regressions by making sure current listeners correctly sanitize
the notified rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 10:18:34 -07:00
Amritha Nambiar 56f36acd21 mqprio: Modify mqprio to pass user parameters via ndo_setup_tc.
The configurable priority to traffic class mapping and the user specified
queue ranges are used to configure the traffic class, overriding the
hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
option is non-zero, the hardware QOS defaults are used.

This patch makes it so that we can pass the data the user provided to
ndo_setup_tc. This allows us to pull in the queue configuration if the
user requested it as well as any additional hardware offload type
requested by using a value other than 1 for the hw value.

Finally it also provides a means for the device driver to return the level
supported for the offload type via the qopt->hw value. Previously we were
just always assuming the value to be 1, in the future values beyond just 1
may be supported.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15 15:20:27 -07:00
David S. Miller 101c431492 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/genet/bcmgenet.c
	net/core/sock.c

Conflicts were overlapping changes in bcmgenet and the
lockdep handling of sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15 11:59:10 -07:00
Jiri Pirko e9093b1183 mlxsw: reg: Fix SPVMLR max record count
The num_rec field is 8 bit, so the maximal count number is 255.
This fixes vlans learning not being enabled for wider ranges than 255.

Fixes: a4feea74cd ("mlxsw: reg: Add Switch Port VLAN MAC Learning register definition")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-14 11:35:11 -07:00
Jiri Pirko f004ec065b mlxsw: reg: Fix SPVM max record count
The num_rec field is 8 bit, so the maximal count number is 255. This
fixes vlans not being enabled for wider ranges than 255.

Fixes: b2e345f9a4 ("mlxsw: reg: Add Switch Port VID and Switch Port VLAN Membership registers definitions")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-14 11:35:11 -07:00
Arkadi Sharshevsky 7c1b8eb175 mlxsw: spectrum: Add support for TC flower offload statistics
Add support for TC flower offload statistics including number of packets,
bytes and last use timestamp. Currently the statistics are gathered on a
per-rule basis.

Signed-off-by: Arkadi Sharshvesky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:15 -07:00
Arkadi Sharshevsky 4817072950 mlxsw: spectrum: Add support for counters on TCAM entries
Add support for packets and byte statistics on TCAM entries. The counters
are allocated from the generic flow counters pool.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:14 -07:00
Arkadi Sharshevsky 938ab60860 mlxsw: spectrum: Add support for Policing and Counting action block
Add support for Policing and Counting action block. This action block
will be used to bind counter to TCAM entries.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:14 -07:00
Arkadi Sharshevsky 446a154187 mlxsw: spectrum: Add periodic ACL rule activity update
Introduce periodic task for dumping the activity status for the ACL
rule TCAM entries. This is done in order to emulate last use statistics.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.comi>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:14 -07:00
Arkadi Sharshevsky 096e914f69 mlxsw: spectrum: Add support for direct rule access
Currently the ACL rules can be accessed only by hashing. In order to
dump the activity the rules are also placed in a list.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:14 -07:00
Arkadi Sharshevsky 7fd056c2ef mlxsw: spectrum_acl_tcam: Add support for retrieving TCAM entry activity
Add support for retrieving TCAM entry activity. In order to support ACL
rule activity corresponding TCAM entry should be queried.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:14 -07:00
Arkadi Sharshevsky 1abcbcc292 mlxsw: spectrum: Add support for generic flow counter allocation
Add support for allocating generic flow counter. Generic flow counter
can count packets or packets and bytes and can be assigned to different
hardware processes. First use will be for counting packets and bytes of
ACL rules, and will be introduced in the following patches.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:13 -07:00
Arkadi Sharshevsky 5766532abc mlxsw: reg: Add Monitoring General Purpose Counter Set register
The MGPC register retrieves generic flow counter value. It will be
used to query ACL counters.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:13 -07:00
Arkadi Sharshevsky ff7b0d2720 mlxsw: spectrum: Add support for counter allocator
Add implementation for counter allocator. The ASIC has special memory
pool for various counting purposes. Counter memory is distributed between
equal size banks.

The static sub-pool configuration should specify the following parameters
for each sub-pool:
- Number of required banks.
- Maximum entry size.

Each module can add dedicated sub-pool or use existing one.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:50:13 -07:00
Eugenia Emantayev ea29bd304d net/mlx5e: Fix loopback selftest
Change packet type handler to ETH_P_IP instead of ETH_P_ALL
since we are already expecting an IP packet.

Also, using ETH_P_ALL will cause the loopback test packet type handler
to be called on all outgoing packets, especially our own self loopback
test SKB, which will be validated on xmit as well, and we don't want that.

Tested with:
ethtool -t ethX
validated that the loopback test passes.

Fixes: 0952da791c ('net/mlx5e: Add support for loopback selftest')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 10:03:26 -08:00
Or Gerlitz 65ba8fb7d5 net/mlx5e: Avoid wrong identification of rules on deletion
When deleting offloaded TC flows, we must correctly identify E-switch
rules. The current check could get us wrong w.r.t to rules set on the
PF. Since it's possible to set NIC rules on the PF, switch to SRIOV
offloads mode and then attempt to delete a NIC rule.

To solve that, we add a flags field to offloaded rules, set it on
creation time and use that over the code where currently needed.

Fixes: 8b32580df1 ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 10:03:26 -08:00
Huy Nguyen 33e21c5952 net/mlx5e: remove IEEE/CEE mode check when setting DCBX mode
Currently, the function setdcbx fails if the request dcbx mode
is either IEEE or CEE. We remove the IEEE/CEE mode check because
we support both IEEE and CEE interfaces.

Fixes: 3a6a931dfb ("net/mlx5e: Support DCBX CEE API")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 10:03:26 -08:00
Daniel Jurgens 5d47f6c89d net/mlx5: Don't save PCI state when PCI error is detected
When a PCI error is detected the PCI state could be corrupt, don't save
it in that flow. Save the state after initialization. After restoring the
PCI state during slot reset save it again, restoring the state destroys
the previously saved state info.

Fixes: 05ac2c0b74 ('net/mlx5: Fix race between PCI error handlers and
health work')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 10:03:26 -08:00
Paul Blakey af36370569 net/mlx5: Fix create autogroup prev initializer
The autogroups list is a list of non overlapping group boundaries
sorted by their start index. If the autogroups list wasn't empty
and an empty group slot was found at the start of the list,
the new group was added to the end of the list instead of the
beginning, as the prev initializer was incorrect.
When this was repeated, it caused multiple groups to have
overlapping boundaries.

Fixed that by correctly initializing the prev pointer to the
start of the list.

Fixes: eccec8da3b ('net/mlx5: Keep autogroups list ordered')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 10:03:26 -08:00
Ido Schimmel b5d90e6d6b mlxsw: spectrum_router: Make abort mechanism VR-aware
When the abort mechanism is invoked it binds the first virtual router
(VR) to an LPM tree and inserts a default route to direct packets to the
CPU.

With VRFs, we can have router interfaces (RIFs) bound to multiple VRs,
so we need to make sure packets are trapped from all VRs and not just
the first one.

Upon abort invocation, bind all active VRs to the same LPM tree and
insert a default route in each.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 6913229eea mlxsw: spectrum_router: Explicitly Associate RIFs with VRs
Up until now we implicitly associated all the router interfaces (RIFs)
with the first virtual router (VR). This must be changed in order to
enable VRF offload. Otherwise, a packet received via a VRF slave would
do a FIB lookup in the same table used by other VRFs.

Instead, bind the RIF to a VR according to the table where FIB lookup
should be performed for packets received via the RIF.

Currently, we only care about the MAIN and LOCAL tables (which we squash
together).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 76610ebbde mlxsw: spectrum_router: Refactor virtual router handling
A virtual router (VR) is an entity within the device to which routing
tables and interfaces can be bound to. It can be used to implement VRFs.

In the initial implementation we associated the VR with a specific
protocol (e.g., IPv4) and an LPM tree. However, this isn't really
accurate, as the same VR can be used for both IPv4 and IPv6 traffic, by
binding a different LPM tree to a {VR, Proto} pair.

This patch aims to restructure the VR code according to the above logic,
so that VRs are more accurately represented by the driver's data
structures. The main motivation behind this change is to prepare the
driver for VRF offload.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 382dbb4014 mlxsw: spectrum_router: Simplify LPM tree allocation
When looking for a new LPM tree we should always consider all the unused
trees. It doesn't matter if the new tree is required due to changes in
currently used prefixes inside an existing routing table or because a
route was inserted into an empty table.

Both cases are functionally identical and therefore should be treated
the same.

When looking for a new LPM tree, consider all unused trees and don't
reserve trees for specific cases.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 4724ba561a mlxsw: spectrum_router: Place RIF related code with router code
The inetaddr notification block is currently implemented in the main
driver file, but this isn't really appropriate, as it mainly creates and
destroys router interfaces (RIFs) which belong with the rest of the
router code.

This will become even more apparent later on when we'll need to bind
these RIFs to virtual routers according to the VRF's table.

Structure the driver better and prevent unnecessary function exports by
moving the RIF related code with the rest of the router code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 97989ee0f5 mlxsw: spectrum_router: Allow more route types to be programmed
Allow 'unreachable', 'blackhole' and 'prohibit' route types to be
programmed into the device by sending any packet hitting them to the
CPU.

This is needed so that users will be able to program a default route
into the VRF's table, thereby preventing lookup from leaking to other
tables.

Audit the code paths to make sure we don't rely on the presence of a
nexthop netdev, as it doesn't exist for above mentioned route types.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel f4a761d203 mlxsw: spectrum: Destroy RIFs based on last removed address
We only use the RIF reference count to determine when the last IP
address was removed, but instead we can just test 'in_dev->ifa_list'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 186962ebb7 mlxsw: spectrum: Associate PVID vPort with appropriate netdev
When a VLAN device is configured on top of a LAG device (f.e.,
bond0.10), a vPort is created on top of each of the LAG's slaves and its
'dev' pointer is set to the VLAN device.

This is in contrast to the implicit PVID vPort (representing 'bond0'),
whose 'dev' pointer keeps pointing to the port netdev itself (f.e.,
'sw1p1').

Make both cases consistent by setting their 'dev' pointer to the actual
netdev they represent. Either the LAG device itself (in the case of the
PVID vPort) or the VLAN device on top of it.

This will later allow us to more easily understand for which netdev we
should create the router interface (RIF) upon enslavement to a VRF
master.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel 1f88061ee4 mlxsw: spectrum: Don't assume upper device's type
When an upper device is configured on top of a vPort we make sure it's a
bridge master during PRECHANGEUPPER and fail otherwise. Therefore, when
CHANGEUPPER is later received we don't bother checking the upper's type.

Make the code more extendable in preparation for VRF uppers, by checking
the upper's type.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Ido Schimmel b414970e21 mlxsw: spectrum: Sanitize bridge's upper devices
We're going to allow bridges stacked on top of port netdevs to be
enslaved to a VRF, but for now, only VLAN uppers of the VLAN-aware
bridge are supported.

Sanitize any other bridge upper. This is consistent with the way we
sanitize port netdevs' uppers.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-10 09:36:06 -08:00
Petr Machata 9caab08a76 mlxsw: spectrum: Add support for flower matches on VLAN ID, PCP
Introduce MLXSW_AFK_ELEMENT_VID, PCP and declare them in afk_element
infos that contain them.  Use the elements when VLAD ID or priority are
used in the flow.

Also add MLXSW_AFK_ELEMENT_VID, PCP to mlxsw_sp_acl_tcam_pattern_ipv4.
Both items are included in mlxsw_sp_afk_element_info_l2_dmac,
resp. _smac, and both MLXSW_AFK_ELEMENT_SMAC and _DMAC are already in
the pattern.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:35:35 -08:00
Petr Machata a150201a70 mlxsw: spectrum: Add support for vlan modify TC action
Add VLAN action offloading. Invoke it from Spectrum flower handler for
"vlan modify" actions.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:35:35 -08:00
Eric Dumazet 68b8df4644 mlx4: remove duplicate code in mlx4_en_process_rx_cq()
We should keep one way to build skbs, regardless of GRO being on or off.

Note that I made sure to defer as much as possible the point we need to
pull data from the frame, so that future prefetch() we might add
are more effective.

These skb attributes derive from the CQE or ring :
 ip_summed, csum
 hash
 vlan offload
 hwtstamps
 queue_mapping

As a bonus, this patch removes mlx4 dependency on eth_get_headlen()
which is very often broken enough to give us headaches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 6969cf0fdb mlx4: make validate_loopback() more generic
Testing a boolean in fast path is not worth duplicating
the code allocating packets, when GRO is on or off.

If this proves to be a problem, we might later use a jump label.

Next patch will remove this duplicated code and ease code review.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 02e6fd3e55 mlx4: factorize page_address() calls
We need to compute the frame virtual address at different points.
Do it once.

Following patch will use the new va address for validate_loopback()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 9e8c0395a7 mlx4: do not access rx_desc from mlx4_en_process_rx_cq()
Instead of fetching dma address from rx_desc->data[0].addr,
prefer using frags[0].dma + frags[0].page_offset to avoid
a potential cache line miss.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 7d7bfc6a3f mlx4: add rx_alloc_pages counter in ethtool -S
This new counter tracks number of pages that we allocated for one port.

lpaa24:~# ethtool -S eth0 | egrep 'rx_alloc_pages|rx_packets'
     rx_packets: 306755183
     rx_alloc_pages: 932897

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 34db548bfb mlx4: add page recycling in receive path
Same technique than some Intel drivers, for arches where PAGE_SIZE = 4096

In most cases, pages are reused because they were consumed
before we could loop around the RX ring.

This brings back performance, and is even better,
a single TCP flow reaches 30Gbit on my hosts.

v2: added full memset() in mlx4_en_free_frag(), as Tariq found it was needed
if we switch to large MTU, as priv->log_rx_info can dynamically be changed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet b5a54d9a31 mlx4: use order-0 pages for RX
Use of order-3 pages is problematic in some cases.

This patch might add three kinds of regression :

1) a CPU performance regression, but we will add later page
recycling and performance should be back.

2) TCP receiver could grow its receive window slightly slower,
   because skb->len/skb->truesize ratio will decrease.
   This is mostly ok, we prefer being conservative to not risk OOM,
   and eventually tune TCP better in the future.
   This is consistent with other drivers using 2048 per ethernet frame.

3) Because we allocate one page per RX slot, we consume more
   memory for the ring buffers. XDP already had this constraint anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 60c7f5ae54 mlx4: removal of frag_sizes[]
We will soon use order-0 pages, and frag truesize will more precisely
match real sizes.

In the new model, we prefer to use <= 2048 bytes fragments, so that
we can use page-recycle technique on PAGE_SIZE=4096 arches.

We will still pack as much frames as possible on arches with big
pages, like PowerPC.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet acd7628de0 mlx4: reduce rx ring page_cache size
We only need to store the page and dma address.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet d85f6c14e9 mlx4: rx_headroom is a per port attribute
No need to duplicate it per RX queue / frags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet aaca121dd6 mlx4: get rid of frag_prefix_size
Using per frag storage for frag_prefix_size is really silly.

mlx4_en_complete_rx_desc() has all needed info already.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00
Eric Dumazet 159ddfd2ca mlx4: remove order field from mlx4_en_frag_info
This is really a port attribute, no need to duplicate it per
RX queue and per frag.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 09:54:46 -08:00