Commit Graph

918672 Commits

Author SHA1 Message Date
Arthur Kiyanovski 4bb7f4cf60 net: ena: reduce driver load time
This commit reduces the driver load time by using usec resolution
instead of msec when polling for hardware state change.

Also add back-off mechanism to handle cases where minimal sleep
time is not enough.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski b0ae3ac484 net: ena: cosmetic: minor code changes
1. Use BIT macro instead of shift operator for code clarity
2. Replace multiple flag assignments to a single assignment of multiple
   flags in ena_com_add_single_rx_desc()
3. Move ENA_HASH_KEY_SIZE from ena_netdev.h to ena_com.h

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 6d0862e0ec net: ena: cosmetic: fix spacing issues
1. Add leading and trailing spaces to several comments for better
   readability
2. Make tabs and spaces uniform in enum defines in ena_admin_defs.h

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 0a39a35f3f net: ena: cosmetic: code reorderings
1. Reorder sanity checks in get_comp_ctxt() to make more sense
2. Reorder variables in ena_com_fill_hash_function() and
   ena_calc_io_queue_size() in reverse christmas tree.
3. Move around member initializations.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski f302044747 net: ena: cosmetic: remove unnecessary code
1. Remove unused definition of DRV_MODULE_VERSION
2. Remove {} from single line-of-code ifs
3. Remove unnecessary comments from ena_get/set_coalesce()
4. Remove unnecessary extra spaces and newlines

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 46143e5888 net: ena: cosmetic: fix line break issues
1. Join unnecessarily broken short lines in ena_com.c ena_netdev.c
2. Fix Indentations of broken lines

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 13830937cc net: ena: cosmetic: fix spelling and grammar mistakes in comments
fix spelling and grammar mistakes in comments in ena_com.h,
ena_com.c and ena_netdev.c

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski ba6f6b4191 net: ena: cosmetic: set queue sizes to u32 for consistency
Make all types of variables that convey the number and sizeof queues to
be u32, for consistency with the API between the driver and device via
ena_admin_defs.h:ena_admin_get_feat_resp.max_queue_ext fields. Current
code sometimes uses int and there are multiple assignments between these
variables with different types.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 95d0fcb570 net: ena: cosmetic: rename ena_update_tx/rx_rings_intr_moderation()
Rename ena_update_tx/rx_rings_intr_moderation() to
ena_update_tx/rx_rings_nonadaptive_intr_moderation()
to distinguish between adaptive and non adaptive interrupt moderaion.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski da447b3b54 net: ena: simplify ena_com_update_intr_delay_resolution()
Initialize prev_intr_delay_resolution with ena_dev->intr_delay_resolution
unconditionally, since it is initialized with
ENA_DEFAULT_INTR_DELAY_RESOLUTION in ena_probe(). This approach makes much
more sense than handling errors of not initializing it.

Also added unlikely to if condition.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski adb3fb3889 net: ena: fix ena_com_comp_status_to_errno() return value
Default return value should be -EINVAL since the input
in this case was unexpected.
Also remove the now redundant check in the beginning
of the function.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski f391503b7a net: ena: use explicit variable size for clarity
Use u64 instead of unsigned long long for clarity

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 7cfe9a5593 net: ena: rename ena_com_free_desc to make API more uniform
Rename ena_com_free_desc to ena_com_free_q_entries to match
the LLQ mode.

In non-LLQ mode, an entry in an IO ring corresponds to a
a descriptor. In LLQ mode an entry may correspond to several
descriptors (per LLQ definition).

Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
Arthur Kiyanovski 68f236df93 net: ena: add support for the rx offset feature
Newer ENA devices can write data to rx buffers with an offset
from the beginning of the buffer.

This commit adds support for this feature in the driver.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:12:48 -07:00
David S. Miller b79f91f157 Merge branch 'net-atlantic-QoS-implementation'
Igor Russkikh says:

====================
net: atlantic: QoS implementation

This patch series adds support for mqprio rate limiters and multi-TC:
 * max_rate is supported on both A1 and A2;
 * min_rate is supported on A2 only;

This is a joint work of Mark and Dmitry.

To implement this feature, a couple of rearrangements and code
improvements were done, in areas of TC/ring management, allocation
control, etc.

One of the problems we faced is conflicting ptp functionality, which
consumes a whole traffic class due to hardware limitations.
Patches below have a more detailed description on how PTP and multi-TC
co-exist right now.

v2:
 * accommodated review comments (-Wmissing-prototypes and
   -Wunused-but-set-variable findings);
 * added user notification in case of conflicting multi-TC<->PTP
   configuration;
 * added automatic PTP disabling, if a conflicting configuration is
   detected;
 * removed module param, which was used for PTP disabling in v1;

v1: https://patchwork.ozlabs.org/cover/1294380/
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:29 -07:00
Mark Starovoytov 40f05e5b0d net: atlantic: proper rss_ctrl1 (54c0) initialization
This patch fixes an inconsistency between code and spec, which
was found while working on the QoS implementation.

When 8TCs are used, 2 is the maximum supported number of index bits.
In a 4TC mode, we do support 3, but we shouldn't really use the bytes,
which are intended for the 8TC mode.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:29 -07:00
Mark Starovoytov 2deac71ac4 net: atlantic: QoS implementation: min_rate
This patch adds support for mqprio min_rate limiters.

A2 HW supports Weighted Strict Priority (WSP) arbitration for Tx Descriptor
Queue scheduling among TCs, which can be used for min_rate shaping.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:29 -07:00
Mark Starovoytov b64f2ac995 net: atlantic: change the order of arguments for TC weight/credit setters
This patch changes the order of arguments for TC weight/credit setter
functions.
Having the "value to be set" on the right is slightly more robust in
a sense that it's more natural for the humans, so it's a bit more
error-proof this way.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:29 -07:00
Mark Starovoytov 5479e8436f net: atlantic: always use random TC-queue mapping for TX on A2.
This patch changes the TC-queue mapping mechanism used on A2.
Configure the A2 HW in such a way that we can keep queue index mapping
exactly as it was on A1.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Mark Starovoytov 14ef766b13 net: atlantic: automatically downgrade the number of queues if necessary
This patch adds support for automatic queue number downgrade.

On A2: this is a must have, because only TC0/TC1 support more than 4Q.
Other TCs support 4Qs maximum.
Thus, on A2 we must downgrade the number of queues per TC to 4, if more
than 2 TCs are requested.

On A1: this allows using 8TCs even on systems with cpu count >= 8, when
we have 8 queues by default.
We will just automatically switch to 8TCx4Q mode in this case.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Mark Starovoytov 7327699f35 net: atlantic: QoS implementation: max_rate
This patch adds initial support for mqprio rate limiters (max_rate only).

Atlantic HW supports Rate-Shaping for time-sensitive traffic at per
Traffic Class (TC) granularity.
Target rate is defined by:
* nominal link rate (always 10G);
* rate factor (ratio between nominal rate and max allowed).

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Mark Starovoytov b9e989262a net: atlantic: make TCVEC2RING accept nic_cfg
This patch updates TCVEC2RING to accept nic_cfg, which is needed to be able
to use it from hw_atl.
The name is updated to reflect the changes.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Mark Starovoytov 4272ba8b11 net: atlantic: per-TC queue statistics
This patch adds support for per-TC queue statistics.

By default (single TC), the output is the same as it used to be, e.g.:
     Queue[0] InPackets: 2
     Queue[0] OutPackets: 8
     Queue[0] Restarts: 0
     Queue[0] InJumboPackets: 0
     Queue[0] InLroPackets: 0
     Queue[0] InErrors: 0

If several TCs are enabled, then each queue statistics line is prefixed
with TC number, e.g.:
     TC0 Queue[0] InPackets: 6
     TC0 Queue[0] OutPackets: 11
Queue numbering is end-to-end, so:
     TC1 Queue[4] InPackets: 0
     TC1 Queue[4] OutPackets: 22

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Dmitry Bezrukov a83fe6b6ad net: atlantic: QoS implementation: multi-TC support
This patch adds multi-TC support.

PTP is automatically disabled when the user enables more than 2 TCs,
otherwise traffic on TC2 won't quite work, because it's reserved for PTP.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Dmitry Bezrukov 0aa7bc3ee4 net: atlantic: changes for multi-TC support
This patch contains the following changes:
* add cfg->is_ptp (used for PTP enable/disable switch, which
  is described in more details below);
* add cfg->tc_mode (A1 supports 2 HW modes only);
* setup queue to TC mapping based on TC mode on A2;
* remove hw_tx_tc_mode_get / hw_rx_tc_mode_get hw_ops.

In the first generation of our hardware (A1), a whole traffic class is
consumed for PTP handling in FW (FW uses it to send the ptp data and to
send back timestamps).
The 'is_ptp' flag introduced in this patch will be used in to automatically
disable PTP when a conflicting configuration is detected, e.g. when
multiple TCs are enabled.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Dmitry Bezrukov 593dd0fc20 net: atlantic: move PTP TC initialization to a separate function
This patch moves the PTP TC initialization into a separate function.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
Dmitry Bezrukov 8ce8427169 net: atlantic: changes for multi-TC support
This patch contains the following changes:
* access cfg via aq_nic_get_cfg() in aq_nic_start() and aq_nic_map_skb();
* call aq_nic_get_dev() just once in aq_nic_map_skb();
* move ring allocation/deallocation out of aq_vec_alloc()/aq_vec_free();
* add the missing aq_nic_deinit() in atl_resume_common();
* rename 'tcs' field to 'tcs_max' in aq_hw_caps_s to differentiate it from
  the 'tcs' field in aq_nic_cfg_s, which is used for the current number of
  TCs;
* update _TC_MAX defines to the actual number of supported TCs;
* move tx_tc_mode register defines slightly higher (just to keep the order
  of definitions);
* separate variables for TX/RX buff_size in hw_atl*_hw_qos_set();
* use AQ_HW_*_TC instead of hardcoded magic numbers;
* actually use the 'ret' value in aq_mdo_add_secy();

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:08:28 -07:00
David S. Miller 59b8d27705 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to ice driver only.  Several of the changes
are fixes, which could be backported to stable, of which, only one was
marked for stable because of the memory leak potential.

Jake exposes the information in the flash memory used for link
management, which is called the netlist module.

Henry and Tony add support for tunnel offloads.

Brett adds promiscuous support in VF's which is based on VF trust and
the new vf-true-promisc flag.

Avinash fixes an issue where a transmit timeout for a queue that belongs
to a PFC enabled TC is not a true transmit timeout, but because the PFC
is in action.

Dave fixes the check for contiguous TCs to allow for various UP2TC
mapping configurations.  Also fixed an issue when changing the pause
parameters would could multiple link drop/down's in succession, which in
turn caused the firmware to not generate a link interrupt for the driver
to respond to.

Anirudh (Ani) fixed a potential race condition in probe/open due to a
bit being cleared too early.

Lihong updates an error message to make it more meaningful instead of
just printing out the numerical value of the status/error code.  Also
fixed an incorrect return value if deleting a filter does not find a
match to delete or when adding a filter that already exists.

Karol fixes casting issues and precision loss in the driver.

Jesse make the sign usage more consistent in the driver by making sure
all instances of vf_id are unsigned, since it can never be negative.

Eric fixes a potential memory leak in ice_add_prof_id_vsig() where was
not cleaning up resources properly when an error occurs.

Michal to help organize the filtering code in the driver, refactor the
code into a separate file and add functions to prepare the filter
information.

Bruce cleaned up a conditional statement that always resulted in true
and provided a comment to make it more obvious.  Also cleaned up
redundant code checks.

Tony helps with potential namespace issues by renaming a 'ice' specific
function with the driver name prepended.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:05:05 -07:00
David S. Miller 4001f1f02e Merge branch 'Support-for-fdb-ECMP-nexthop-groups'
Roopa Prabhu says:

====================
Support for fdb ECMP nexthop groups

This series introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
The patches make sure that routes dont reference such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 -
    - fix error path free_skb in vxlan_xmit_nh
    - fix atomic notifier initialization issue
      (Reported-by: kernel test robot <rong.a.chen@intel.com>)
      The reported error was easy to locate and fix, but i was not
      able to re-test with the robot reproducer script due to some
      other issues with running the script on my test system.

v3 - fix wording in selftest print as pointed out by davidA

v2 -
	- dropped nikolays fixes for nexthop multipath null pointer deref
	  (he will send those separately)
	- added negative tests for route add with fdb nexthop + a few more
	- Fixes for a few  fdb replace conditions found during more testing
	- Moved to rcu_dereference_rtnl in vxlan_fdb_info and consolidate rcu
	  dereferences
	- Fixes to build failures Reported-by: kbuild test robot <lkp@intel.com>
	- DavidA, I am going to send a separate patch for the neighbor code validation
	  for NDA_NH_ID if thats ok.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:39 -07:00
Roopa Prabhu 0534c5489c selftests: net: add fdb nexthop tests
This commit adds ipv4 and ipv6 fdb nexthop api tests to fib_nexthops.sh.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu c7cdbe2efc vxlan: support for nexthop notifiers
vxlan driver registers for nexthop add/del notifiers to
cleanup fdb entries pointing to such nexthops.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu 8590ceedb7 nexthop: add support for notifiers
This patch adds nexthop add/del notifiers. To be used by
vxlan driver in a later patch. Could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu 1274e1cc42 vxlan: ecmp support for mac fdb entries
Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
Roopa Prabhu 38428d6871 nexthop: support for fdb ecmp nexthops
This patch introduces ecmp nexthops and nexthop groups
for mac fdb entries. In subsequent patches this is used
by the vxlan driver fdb entries. The use case is
E-VPN multihoming [1,2,3] which requires bridged vxlan traffic
to be load balanced to remote switches (vteps) belonging to
the same multi-homed ethernet segment (This is analogous to
a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops
referenced by fdb entries. These nexthops only have ip.
This patch includes appropriate checks to avoid routes
referencing such nexthops.

example:
$ip nexthop add id 12 via 172.16.1.2 fdb
$ip nexthop add id 13 via 172.16.1.3 fdb
$ip nexthop add id 102 group 12/13 fdb

$bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp
http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 14:00:38 -07:00
David S. Miller 7b1b843a1e Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to igc and e1000.

Andre cleans up code that was left over from the igb driver that handled
MAC address filters based on the source address, which is not currently
supported.  Simplifies the MAC address filtering code and prepare the
igc driver for future source address support.  Updated the MAC address
filter internal APIs to support filters based on source address.  Added
support for Network Flow Classification (NFC) rules based on source MAC
address.  Cleaned up the 'cookie' field which is not used anywhere in
the code and cleaned up a wrapper function that was not needed.
Simplified the filtering code for readability and aligned the ethtool
functions, so that function names were consistent.

Alex provides a fix for e1000 to resolve a deadlock issue when NAPI is
being disabled.

Sasha does additional cleanup of the igc driver of dead code that is not
used or needed.

v2: Fix the function header comment in patch 3 of the series, based on
    the feedback from Jakub Kicinski.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 13:48:50 -07:00
Tony Nguyen 5757cc7c8b ice: Rename build_ctob to ice_build_ctob
To make the function easier to identify as being part of the ice driver,
prepend ice to the function name.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Bruce Allan c522d1f686 ice: remove unnecessary backslash
Self-explanatory.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Bruce Allan 86a2e00d20 ice: remove unnecessary check
The variable status cannot be zero due to a prior check of it; remove this
check.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Bruce Allan 92ace4824c ice: remove unnecessary expression that is always true
The else conditional expression is always true due to the if conditional
expression; remove it and add a comment to make it obvious still.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Lihong Yang 757976ab16 ice: Fix check for removing/adding mac filters
In function ice_set_mac_address, we will remove old dev_addr before
adding the new MAC. In the removing and adding process of the MAC,
there is no need to return error if the check finds the to-be-removed
dev_addr does not exist in the MAC filter list or the to-be-added mac
already exists, keep going or return success accordingly.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Michal Swiatkowski 1b8f15b64a ice: refactor filter functions
Move filter functions to separate file.

Add functions that prepare suitable ice_fltr_info struct
depending on the filter type and add this struct to earlier created
list:
- ice_fltr_add_mac_to_list
- ice_fltr_add_vlan_to_list
- ice_fltr_add_eth_to_list
This functions are used in adding and removing filters.

Create wrappers for functions mentioned above that alloc list,
add suitable ice_fltr_info to it and call add or remove function.
- ice_fltr_prepare_mac
- ice_fltr_prepare_mac_and_broadcast
- ice_fltr_prepare_vlan
- ice_fltr_prepare_eth

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Eric Joyner 857a4f0e9f ice: Fix resource leak on early exit from function
Memory allocated in the ice_add_prof_id_vsig() function wasn't being
properly freed if an error occurred inside the for-loop in the function.

In particular, 'p' wasn't being freed if an error occurred before it was
added to the resource list at the end of the for-loop.

Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Jesse Brandeburg 53bb66983f ice: cleanup vf_id signedness
The vf_id variable is dealt with in the code in inconsistent
ways of sign usage, preventing compilation with -Werror=sign-compare.
Fix this problem in the code by always treating vf_id as unsigned, since
there are no valid values of vf_id that are negative.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Karol Kolacinski 88865fc4bb ice: Fix casting issues
Change min() macros to min_t() which has compare type specified and it
helps avoid precision loss.

In some cases there was precision loss during calls or assignments.
Some fields in structs were unnecessarily large and gave multiple
warnings.

There were also some minor type differences which are now fixed as well as
some cases where a simple cast was needed.

Callers were were passing data that is a u16 to
ice_sched_cfg_node_bw_alloc() but the function was truncating that to a u8.
Fix that by changing the function to take a u16.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Lihong Yang 0fee35774d ice: Provide more meaningful error message
When printing the ice status or AQ error codes, instead of printing out the
numerical value, provide the description of the error code. This provides
more info about the issue than a number.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Anirudh Venkataramanan de75135b5c ice: Fix probe/open race condition
As soon as the driver registers the PF netdev, userspace utilities
like NetworkManager try to bring up the associated interface. When
this happens, the driver may not have finished initializing fully,
resulting in a bunch of errors in the interface up flow.

The driver already has a mechanism to indicate if it's not up yet;
by setting the __ICE_DOWN bit in pf->state, but this bit gets
cleared too early in the current flow. So clear this bit only when
the driver is fully up. Also check for the same bit in the ice_open
flow, and return -EBUSY if the bit is set.

Also in ice_open, replace references of vsi->back with a local
variable.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Dave Ertman 46a316500e ice: only drop link once when setting pauseparams
Currently, the ice driver is setting a PHY configuration,
which causes a link drop, and then additionally it calls
for a nway_reset, which restarts auto-negotiation on the
link, which also causes a link drop.  These two link
events in such close timing is causing the FW to not be
able to generate a link interrupt for the driver to
respond to.

Remove the unnecessary auto-negotiation restart from the
set pauseparams flow.  Also remove error path that
would have performed an ice_down/ice_up as that is
also unnecessary.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Dave Ertman 891540024b ice: Fix check for contiguous TCs
The current implementation for contiguous TC check
is assuming that the UPs will be mapped to TCs in
a linear progressing fashion.  This is obviously
not always true.

Change the check to allow for various UP2TC mapping
configurations.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Avinash JD 610ed0e93e ice: Don't reset and rebuild for Tx timeout on PFC enabled queue
When there's a Tx timeout for a queue which belongs to a PFC enabled TC,
then it's not because the queue is hung but because PFC is in action.

In PFC, peer sends a pause frame for a specified period of time when its
buffer threshold is exceeded (due to congestion). Netdev on the other
hand checks if ACK is received within a specified time for a TX packet, if
not, it'll invoke the tx_timeout routine.

Signed-off-by: Avinash JD <avinash.dayanand@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Brett Creeley 01b5e89aab ice: Add VF promiscuous support
Implement promiscuous support for VF VSIs. Behaviour of promiscuous support
is based on VF trust as well as the, introduced, vf-true-promisc flag.

A trusted VF with vf-true-promisc disabled will be the default VSI, which
means that all traffic without a matching destination MAC address in the
device's internal switch will be forwarded to this VF VSI.

A trusted VF with vf-true-promisc enabled will go into "true promiscuous
mode". This amounts to the VF receiving all ingress and egress traffic
that hits the device's internal switch.

An untrusted VF will only receive traffic destined for that VF.

The vf-true-promisc-support flag cannot be toggled while any VF is in
promiscuous mode. This flag should be set prior to loading the iavf driver
or spawning VF(s).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00