This patch is mostly prep-work for replacing the current approach to
programming the dynamic aka adaptive ITR. Specifically here what we are
doing is splitting the Tx and Rx ITR each into two separate values.
The first value current_itr represents the current value of the register.
The second value target_itr represents the desired value of the register.
The general plan by doing this is to allow for deferring the update of the
ITR value under certain circumstances. For now we will work with what we
have, but in the future I hope to change the behavior so that we always
only update one ITR at a time using some simple logic to determine which
ITR requires an update.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While testing code for the recent ITR changes I found that updating the Tx
ITR appeared to have no effect with everything defaulting to the Rx ITR. A
bit of digging narrowed it down the fact that we were asking the PF to
associate all causes with ITR 0 as we weren't populating the itr_idx values
for either Rx or Tx.
To correct it I have added the configuration for these values to this
patch. In addition I did some minor clean-up to just add a local pointer
for the vector map instead of dereferencing it based off of the index
repeatedly. In my opinion this makes the resultant code a bit more readable
and saves us a few characters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Instead of using the register value for the defines when setting up the
ring ITR we can just use the actual values and avoid the use of shifts and
macros to translate between the values we have and the values we want.
This helps to make the code more readable as we can quickly translate from
one value to the other.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The CLEARPBA bit in the dynamic interrupt control register actually has
no effect either way on the hardware. As per errata 28 in the XL710
specification update the interrupt is actually cleared any time the
register is written with the INTENA_MSK bit set to 0. As such the act of
toggling the enable bit actually will trigger the interrupt being
cleared and could lead to potential lost events if auto-masking is
not enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is a further clean-up related to the change over to using
q_vector->reg_idx when accessing the ITR registers. Specifically the code
appears to have several other spots where we were computing the register
offset manually and this resulted in errors in a few spots.
Specifically in the i40evf functions for mapping queues to vectors it
appears we may have had an off by 1 error since (v_idx - 1) for the first
q_vector with an index of 0 would result in us returning -1 if I am not
mistaken.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently in i40e_set_priv_flags we use new_flags to check for the
I40E_FLAG_DISABLE_FW_LLDP flag. This is an issue for a few a reasons.
DISABLE_FW_LLDP is persistent across reboots/driver reloads. This means
we need some way to detect if FW LLDP is enabled on init. We do this by
trying to init_dcb and if it fails with EPERM we know LLDP is disabled
in FW.
This could be a problem on older FW versions or NPAR enabled PFs because
there are situations where the FW could disable LLDP, but they do _not_
support using this flag to change it. If we do end up in this
situation, the flag will be set, then when the user tries to change any
priv flags, the driver thinks the user is trying to disable FW LLDP on a
FW that doesn't support it and essentially forbids any priv flag
changes.
The fix is simple, instead of checking if this flag is set, we should be
checking if the user is trying to _change_ the flag on unsupported FW
versions.
This patch also adds a comment explaining that the cmpxchg is the point
of no return. Once we put the new flags into pf->flags we can't back
out.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a warning message when the link-down-on-close flag is
setting on. The warning is printed only on MFP devices
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds necessary delay for 4.33 firmware to recover after
EMP reset. Without this patch driver occasionally reinitializes
structures too quickly to communicate with firmware after EMP reset
causing AdminQ to timeout.
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The logic for dynamic ITR update is confusing at best as there were odd
paths chosen for how to find the rings associated with a given queue based
on the vector index and other inconsistencies throughout the code.
This patch is an attempt to clean up the logic so that we can more easily
understand what is going on. Specifically if there is a Rx or Tx ring that
is enabled in dynamic mode on the q_vector it is allowed to override the
other side of the interrupt moderation. While it isn't correct all this
patch is doing is cleaning up the logic for now so that when we come
through and fix it we can more easily identify that this is wrong.
The other big change made here is that we replace references to:
vsi->rx_rings[q_vector->v_idx]->itr_setting
with:
q_vector->rx.ring->itr_setting
The general idea is we can avoid the long pointer chase since just
accessing q_vector->rx.ring is a single pointer access versus having to
chase down vsi->rx_rings, and then finding the pointer in the array, and
finally chasing down the itr_setting from there.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The rings are already split out into Tx and Rx rings so it doesn't make
sense to have any single ring store both a Tx and Rx itr_setting value.
Since that is the case drop the pair in favor of storing just a single ITR
value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
'bufer' should be 'buffer'
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix the number of queues per enabled TC and report available queues
to the kernel without having to limit them to the max RSS limit so
they are available to be mapped for XPS. This allows a queue per
processing thread available for handling traffic for the given
traffic class.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
w/EV1JWs36WXWaxFk8wd
=ttq9
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- skip AER driver error recovery callbacks for correctable errors
reported via ACPI APEI, as we already do for errors reported via the
native path (Tyler Baicar)
- fix DPC shared interrupt handling (Alex Williamson)
- print full DPC interrupt number (Keith Busch)
- enable DPC only if AER is available (Keith Busch)
- simplify DPC code (Bjorn Helgaas)
- calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
Helgaas)
- enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
Helgaas)
- move ASPM internal interfaces out of public header (Bjorn Helgaas)
- allow hot-removal of VGA devices (Mika Westerberg)
- speed up unplug and shutdown by assuming Thunderbolt controllers
don't support Command Completed events (Lukas Wunner)
- add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
Jay Cornwall)
- expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
- clean up PCI DMA interface usage (Christoph Hellwig)
- remove PCI pool API (replaced with DMA pool) (Romain Perier)
- deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
Kaya)
- move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
- add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
- remove warnings on sysfs mmap failure (Bjorn Helgaas)
- quiet ROM validation messages (Alex Deucher)
- remove redundant memory alloc failure messages (Markus Elfring)
- fill in types for compile-time VGA and other I/O port resources
(Bjorn Helgaas)
- make "pci=pcie_scan_all" work for Root Ports as well as Downstream
Ports to help AmigaOne X1000 (Bjorn Helgaas)
- add SPDX tags to all PCI files (Bjorn Helgaas)
- quirk Marvell 9128 DMA aliases (Alex Williamson)
- quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
- fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
Cassel)
- use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
- fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
- fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
- add support for ARTPEC-7 SoC (Niklas Cassel)
- add endpoint-mode support for ARTPEC (Niklas Cassel)
- add Cadence PCIe host and endpoint controller driver (Cyrille
Pitchen)
- handle multiple INTx status bits being set in dra7xx (Vignesh R)
- translate dra7xx hwirq range to fix INTD handling (Vignesh R)
- remove deprecated Exynos PHY initialization code (Jaehoon Chung)
- fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
- fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
- fix Keystone interrupt-controller-node lookup (Johan Hovold)
- constify qcom driver structures (Julia Lawall)
- rework Tegra config space mapping to increase space available for
endpoints (Vidya Sagar)
- simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
- remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
- add support for Global Fabric Manager Server (GFMS) event to
Microsemi Switchtec switch driver (Logan Gunthorpe)
- add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
PCI: endpoint: Fix EPF device name to support multi-function devices
PCI: endpoint: Add the function number as argument to EPC ops
PCI: cadence: Add host driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
PCI: Add vendor ID for Cadence
PCI: Add generic function to probe PCI host controllers
PCI: generic: fix missing call of pci_free_resource_list()
PCI: OF: Add generic function to parse and allocate PCI resources
PCI: Regroup all PCI related entries into drivers/pci/Makefile
PCI/DPC: Reformat DPC register definitions
PCI/DPC: Add and use DPC Status register field definitions
PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
PCI/DPC: Remove unnecessary RP PIO register structs
PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
PCI/DPC: Make RP PIO log size check more generic
PCI/DPC: Rename local "status" to "dpc_status"
PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
...
When compared to ixgbe and other previous Intel drivers the i40e and i40evf
drivers actually reserve 2 additional descriptors in maybe_stop_tx for
cache line alignment. We need to update DESC_NEEDED to reflect this as
otherwise we are more likely to return TX_BUSY which will cause issues with
things like xmit_more.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-01-26
This series contains updates to i40e and i40evf.
Michal updates the driver to pass critical errors from the firmware to
the caller.
Patryk fixes an issue of creating multiple identical filters with the
same location, by simply moving the functions so that we remove the
existing filter and then add the new filter.
Paweł adds back in the ability to turn off offloads when VLAN is set for
the VF driver. Fixed an issue where the number of TC queue pairs was
exceeding MSI-X vectors count, causing messages about invalid TC mapping
and wrong selected Tx queue.
Alex cleans up the i40e/i40evf_set_itr_per_queue() by dropping all the
unneeded pointer chases. Puts to use the reg_idx value, which was going
unused, so that we can avoid having to compute the vector every time
throughout the driver.
Upasana enable the driver to display LLDP information on the vSphere Web
Client by exposing DCB parameters.
Alice converts our flags from 32 to 64 bit size, since we have added
more flags.
Dave implements a private ethtool flag to disable the processing of LLDP
packets by the firmware, so that the firmware will not consume LLDPDU
and cause them to be sent up the stack.
Alan adds a mechanism for detecting/storing the flag for processing of
LLDP packets by the firmware, so that its current state is persistent
across reboots/reloads of the driver.
Avinash fixes kdump with i40e due to resource constraints. We were
enabling VMDq and iWARP when we just have a single CPU, which was
starving kdump for the lack of IRQs.
Jake adds support to program the fragmented IPv4 input set PCTYPE.
Fixed the reported masks to properly report that the entire field is
masked, since we had accidentally swapped the mask values for the IPv4
addresses with the L4 port numbers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch suppresses the message about invalid TC mapping and wrong
selected TX queue. The root cause of this bug was setting too many
TC queue pairs on huge multiprocessor machines. When quantity of the
TC queue pairs is exceeding MSI-X vectors count then TX queue number
can be selected beyond actual TX queues amount.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The drivers for i40e and i40evf had a reg_idx value stored in the q_vector
that was going completely unused. I can only assume this was copied over
from ixgbe and nobody knew how to use it.
I'm going to make use of the value to avoid having to compute the vector
and thus the register index for multiple paths throughout the drivers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In commit 36777d9fa2 ("i40e: check current configured input set when
adding ntuple filters") some code was added to report the input set
mask for a given filter when reporting it to the user.
This code is necessary so that the reported filter correctly displays
that it is or is not masking certain fields.
Unfortunately the code was incorrect. Development error accidentally
swapped the mask values for the IPv4 addresses with the L4 port numbers.
The port numbers are only 16bits wide while IPv4 addresses are 32 bits.
Unfortunately we assigned only 16 bits to the IPv4 address masks.
Additionally we assigned 32bit value 0xFFFFFFF to the TCP port numbers.
This second part does not matter as the value would be truncated to
16bits regardless, but it is unnecessary.
Fix the reported masks to properly report that the entire field is
masked.
Fixes: 36777d9fa2 ("i40e: check current configured input set when adding ntuple filters")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Our hardware does not allow situations where two filters might conflict
when matching. Essentially hardware only programs one filter for each
set of matching criteria. We don't support filters with overlapping
input sets, because each flow type can only use a single input set.
Additionally, different flow types will never have overlapping matches,
because of how the hardware parses the flow type before checking
matching criteria.
For this reason, we do not need or use the location number when
programming filters to hardware.
In order to avoid confusing scenarios with filters that match the same
criteria but program the flow to different queues, do not allow multiple
filters that match identical criteria to be programmed.
This ensures that we avoid odd scenarios when deleting filters, and when
programming new filters that match the same criteria.
Instead, users that wish to update the criteria for a filter must use
the same location id, or must delete all the matching filters first.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When implementing support for IP_USER_FLOW filters, we correctly
programmed a filter for both the non fragmented IPv4/Other filter, as
well as the fragmented IPv4 filters. However, we did not properly
program the input set for fragmented IPv4 PCTYPE. This meant that the
filters would almost certainly not match, unless the user specified all
of the flow types.
Add support to program the fragmented IPv4 filter input set. Since we
always program these filters together, we'll assume that the two input
sets must match, and will thus always program the input sets to the same
value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
kdump fails in the system when used in conjunction with Ethernet driver
X722/X710. This is mainly because when we are resource constrained i.e.
when we have just one online_cpus, we are enabling VMDq and iWARP. It
doesn't make sense to enable them with just one CPU and starve kdump
for lack of IRQs.
So don't enable VMDq or iWARP when we just have a single CPU.
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Using ethtool --set-priv-flags disable-fw-lldp <on/off> is persistent
across reboots/reloads so we need some mechanism in the driver to detect
if it's on or off on init so we can set the ethtool private flag
appropriately. Without this, every time the driver is reloaded the flag
will default to off regardless of whether it's on or off in FW.
We detect this by first attempting to program DCB and if AQ fails
returning I40E_AQ_RC_EPERM, we know that LLDP is disabled in FW.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement the private flag disable-fw-lldp for ethtool
to disable the processing of LLDP packets by the FW.
This will stop the FW from consuming LLDPDU and cause
them to be sent up the stack.
The FW is also being configured to apply a default DCB
configuration on link up.
Toggling the value of this flag will also cause a PF reset.
Disabling FW DCB will also disable DCBx.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As we have added more flags, we need to now use more
bits and have over flooded the 32 bit size. So
make it 64.
Also change all the existing bits to unsigned long long
bits.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch enables driver to display LLDP information on the vSphere Web
Client with Intel adapters (X710, XL710) and Distributed Virtual Switch.
Signed-off-by: Upasana Menon <upasana.menon@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the i40e/i40evf_set_itr_per_queue function by
dropping all the unneeded pointer chases. Instead we can just pull out the
pointers for the Tx and Rx rings and use them throughout the function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds back the capability to turn off offloads when VF has
VLAN set. The commit 0a3b4f702f ("i40evf: enable support for VF VLAN
tag stripping control") adds the i40evf_set_features function and
changes the 'turn off' flow for offloads. This patch adds that
capability back by moving checking the VLAN option for VF to the
next statement.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch reorders i40e_add_del_fdir and i40e_update_ethtool_fdir_entry
calls so that we first remove an already existing filter (inside
i40e_update_ethtool_fdir_entry using i40e_add_del_fdir) and then
we add a new one with i40e_add_del_fdir.
After applying this patch, creating multiple identical filters (with
the same location) one after another doesn't revert their behavior
but behaves correctly.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The FW has the ability to return a critical error on every AQ command.
When this critical error occurs then we need to send the correct response
to the caller.
Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
commit 2de6aa3a66 ("ixgbe: Add support for padding packet")
Uses RXDCTL.RLPML to limit the maximum frame size on Rx when using
build_skb. Unfortunately that register does not work on 82599.
Added an explicit check to avoid setting this register on 82599 MAC.
Extended the comment related to the setting of RXDCTL.RLPML to better
explain its purpose.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
"offset" can't be both 0x0 and 0xFFFF so presumably || was intended
instead of &&. That matches with how this check is done in other
functions.
Fixes: 73834aec71 ("ixgbe: extend firmware version support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since 5G link speed is supported by some devices, add reporting of 5G link
speed.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current code enables on X550 timestamping of all packets for any
filter, which means ethtool should not report any PTP-specific filters
as unsupported.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use the ARRAY_SIZE macro on array buf to determine size of the array.
Improvement suggested by coccinelle.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use the ARRAY_SIZE macro on various arrays to determine
size of the arrays. Improvement suggested by coccinelle.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the case of the Tx rings we need to only clear the Tx buffer_info when
we are resetting the rings. Ideally we do this when we configure the ring
to bring it back up instead of when we are taking it down in order to avoid
dirtying pages we don't need to.
In addition we don't need to clear the Tx descriptor ring since we will
fully repopulate it when we begin transmitting frames and next_to_watch can
be cleared to prevent the ring from being cleaned beyond that point instead
of needing to touch anything in the Tx descriptor ring.
Finally with these changes we can avoid having to reset the skb member of
the Tx buffer_info structure in the cleanup path since the skb will always
be associated with the first buffer which has next_to_watch set.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on commit ec718254cb
("ixgbe: Improve performance and reduce size of ixgbe_tx_map")
This change is meant to both improve the performance and reduce the size of
ixgbevf_tx_map().
Expand the work done in the main loop by pushing first into tx_buffer.
This allows us to pull in the dma_mapping_error check, the tx_buffer value
assignment, and the initial DMA value assignment to the Tx descriptor.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on commit d2bead576e
("igb: Clear Rx buffer_info in configure instead of clean")
This change makes it so that instead of going through the entire ring on Rx
cleanup we only go through the region that was designated to be cleaned up
and stop when we reach the region where new allocations should start.
In addition we can avoid having to perform a memset on the Rx buffer_info
structures until we are about to start using the ring again.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We already had placehloders for failed page and buffer allocations.
Added alloc_rx_page and made sure the stats are properly updated and
exposed in ethtool.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on commit bd4171a5d4
("igb: update code to better handle incrementing page count")
Update the driver code so that we do bulk updates of the page reference
count instead of just incrementing it by one reference at a time. The
advantage to doing this is that we cut down on atomic operations and
this in turn should give us a slight improvement in cycles per packet.
In addition if we eventually move this over to using build_skb the gains
will be more noticeable.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on commit 5be5955425
("igb: update driver to make use of DMA_ATTR_SKIP_CPU_SYNC")
and
commit 7bd1759282 ("igb: Add support for DMA_ATTR_WEAK_ORDERING")
Convert the calls to dma_map/unmap_page() to the attributes version
and add DMA_ATTR_SKIP_CPU_SYNC/WEAK_ORDERING which should help
improve performance on some platforms.
Move sync_for_cpu call before we perform a prefetch to avoid
invalidating the first 128 bytes of the packet on architectures where
that call may invalidate the cache.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on:
commit 7ec0116c91 ("igb: Use length to determine if descriptor is done")
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.
In addition we only reset the Rx descriptor length for descriptor zero when
resetting a ring instead of having to do a memset with 0 over the entire
ring. By doing this we can save some time on initialization.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on commit 64f2525ca4 ("igb: Only DMA sync frame length")
On some architectures synching a buffer for DMA may be expensive.
Instead of the entire 2K receive buffer only synchronize the length of
the frame, which will typically be the MTU or smaller.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Introduce ixgbevf_can_reuse_page() similar to the change in ixgbe from
commit af43da0dba
("ixgbe: Add function for checking to see if we can reuse page")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2018-01-24
This series contains updates to fm10k only.
Alex fixes MACVLAN offload for fm10k, where we were not seeing unicast
packets being received because we did not correctly configure the
default VLAN ID for the port and defaulting to 0.
Jake cleans up unnecessary parenthesis in a couple of "if" statements.
Fixed the driver to stop adding VLAN 0 into the VLAN table, since it
would cause the VLAN table to be inconsistent between the PF and VF.
Also fixed an issue where we were assuming that VLAN 1 is enabled when
the default VLAN ID is not set, so resolve by not requesting any filters
for the default_vid if it has not yet been assigned.
Ngai fixes an issue which was generating a dmesg regarding unbale to
kill a particular VLAN ID for the device. This is due to
ndo_vlan_rx_kill_vid() exits with an error and the handler for this ndo
is fm10k_update_vid() which exits prematurely under PF VLAN management.
So to resolve, we must check the VLAN update action type before exiting
fm10k_update_vid(), and act appropriately based on the action type.
Also corrected code comment typos.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify the comment for when entering promiscuous mode that we update
the VLAN table. Add a comment distinguishing the case where we're
exiting promiscuous mode and need to clear the entire VLAN table.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@gmail.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since commit 856dfd69e84f ("fm10k: Fix multicast mode synch issues",
2016-03-03) we've incorrectly assumed that VLAN 1 is enabled when the
default VID is not set.
This occurs because we check the default_vid and if it's zero, start
several loops over the active_vlans bitmask at 1, instead of checking to
ensure that that bit is active.
This happened because of commit d9ff3ee8efe9 ("fm10k: Add support for
VLAN 0 w/o default VLAN", 2014-08-07) which mistakenly assumed that we
should send requests for MAC and VLAN filters with VLAN 0 when the
default_vid isn't set.
However, the switch generally considers this an invalid configuration,
so the only time we'd have a default_vid of 0 is when the switch is
down.
Instead, lets just not request any filters for the default_vid if it's
not yet been assigned.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, when the driver loads, it sends a request to add VLAN 0 to the
VLAN table. For the PF, this is honored, and VLAN 0 is indeed set. For
the VF, this request is silently converted into a request for the
default VLAN as defined by either the switch vid or the PF vid.
This results in the odd behavior that the VLAN table doesn't appear
consistent between the PF and the VF.
Furthermore, setting a MAC filter with VLAN 0 is generally considered an
invalid configuration by the switch, and since commit 856dfd69e84f
("fm10k: Fix multicast mode synch issues", 2016-03-03) we've had code
which prevents us from ever sending such a request.
Since there's not really a good reason to keep VLAN 0 in the VLAN table,
stop requesting it in fm10k_restore_rx_state().
This might seem to indicate that we would no longer properly configure
the MAC and VLAN tables for the default vid. However, due to the way
that fm10k_find_next_vlan() behaves, it will always return the
default_vid as enabled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a VF is under PF VLAN assignment:
ip link set <pf> vf <#> vlan <vid>
This will remove all previous entries in the VLAN table including those
generated by VLAN interfaces created on the VF. The issue arises when
the VF is under PF VLAN assignment and one or more of these VLAN
interfaces of the VF are deleted. When deleting these VLAN interfaces,
the following message will be generated in "dmesg":
failed to kill vid 0081/<vid> for device <vf>
This is due to the fact that "ndo_vlan_rx_kill_vid" exits with an error.
The handler for this ndo is "fm10k_update_vid". Any calls to this
function while under PF VLAN management will exit prematurely and, thus,
it will generate the failure message.
Additionally, since "fm10k_update_vid" exits prematurely, none of the
VLAN update is performed. So, even though the actual VLAN interfaces of
the VF will be deleted, the active_vlans bitmask is not cleared. When
the VF is no longer under PF VLAN assignment, the driver mistakenly
restores the previous entries of the VLAN table based on an
unsynchronized list of active VLANs.
The solution to this issue involves checking the VLAN update action type
before exiting "fm10k_update_vid". If the VLAN update action type is to
"add", this action will not be permitted while the VF is under PF VLAN
assignment and the VLAN update is abandoned like before.
However, if the VLAN update action type is to "kill", then we need to
also clear the active_vlans bitmask. However, we don't need to actually
queue any messages to the PF, because the MAC and VLAN tables have
already been cleared, and the PF would silently ignore these requests
anyways.
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This fixes a few warnings found by checkpatch.pl --strict
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since TC block changes drivers are required to check if
the TC hw offload flag is set on the interface themselves.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Fixes: 44ae12a768 ("net: sched: move the can_offload check from binding phase to rule insertion phase")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fm10k driver didn't work correctly when macvlan offload was enabled.
Specifically what would occur is that we would see no unicast packets being
received. This was traced down to us not correctly configuring the default
VLAN ID for the port and defaulting to 0.
To correct this we either use the default ID provided by the switch or
simply use 1. With that we are able to pass and receive traffic without any
issues.
In addition we were not repopulating the filter table following a reset. To
correct that I have added a bit of code to fm10k_restore_rx_state that will
repopulate the Rx filter configuration for the macvlan interfaces.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Problem description:
After ethernet cable connect and disconnect for several iterations on a
device with i210, tx timestamp will stop being put into the socket.
Steps to reproduce:
1. Setup a device with i210 and wire it to a 802.1AS capable switch (
Extreme Networks Summit x440 is used in our case)
2. Have the gptp daemon running on the device and make sure it is synced
with the switch
3. Have the switch disable and enable the port, wait for the device gets
resynced with the switch
4. Iterates step 3 until the device is not albe to get resynced
5. Review the log in dmesg and you will see warning message "igb : clearing
Tx timestamp hang"
Root cause:
If ptp_tx_work() gets scheduled just before the port gets disabled, a LINK
DOWN event will be processed before ptp_tx_work(), which may cause timeout
in ptp_tx_work(). In the timeout logic, the TSYNCTXCTL's TXTT bit (Transmit
timestamp valid bit) is not cleared, causing no new timestamp loaded to
TXSTMP register. Consequently therefore, no new interrupt is triggerred by
TSICR.TXTS bit and no more Tx timestamp send to the socket.
Signed-off-by: Daniel Hua <daniel.hua@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
I personally spent a long time trying to decypher why my CPU would not
reach deeper C-states. Let's just tell the next user what's going on.
Signed-off-by: Matt Turner <matt.turner@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
By design, the idleslope increments are restricted to 16.384kbps steps.
Add a comment to igb_main.c making that explicit and add one example
that illustrates the impact of that.
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
According to section 12.0.3.4.13 "Receive Descriptor Control - RXDCTL"
of the Intel® 82579 Gigabit Ethernet PHY Datasheet v2.1:
"HTHRESH should be given a non zero value when ever PTHRESH is
used."
In RXDCTL(0), PTHRESH lives at bits 5:0, and HTHREST lives at bits 13:8.
Set only bit 8 of HTHREST as is done in e1000_flush_rx_ring(). Found by
inspection.
Signed-off-by: Matt Turner <matt.turner@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a new function igb_get_max_rss_queues() to get maximum
RSS queues, this will reduce duplicate code and facilitate future
maintenance.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
for use by a virtual machine (either using vfio device assignment or
macvtap passthru mode), it saves the current MAC address and vlan tag
so that it can reset them to their original value when the guest is
done. Libvirt can't leave the VF MAC set to the value used by the
now-defunct guest since it may be started again later using a
different VF, but it certainly shouldn't just pick any random value,
either. So it saves the state of everything prior to using the VF, and
resets it to that.
The igb driver initializes the MAC addresses of all VFs to
00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
netlink message, also visible in the list of VFs in the output of "ip
link show"). But when libvirt attempts to restore the MAC address back
to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
responds with "Invalid argument".
Forbidding a reset back to the original value leaves the VF MAC at the
value set for the now-defunct virtual machine. Especially on a system
with NetworkManager enabled, this has very bad consequences, since
NetworkManager forces all interfacess to be IFF_UP all the time - if
the same virtual machine is restarted using a different VF (or even on
a different host), there will be multiple interfaces watching for
traffic with the same MAC address.
To allow libvirt to revert to the original state, we need a way to
remove the administrative set MAC on a VF, to allow normal host
operation again, and to reset/overwrite the VF MAC via VF netdev.
This patch implements the outlined scenario by allowing to set the
VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
so it's possible to reset the VF MAC back to the original value via
the VF netdev.
Note: Recent patches to libvirt allow for a workaround if the NIC
isn't capable of resetting the administrative MAC back to all 0, but
in theory the NIC should allow resetting the MAC in the first place.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <arron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-01-23
This series contains updates to i40e and i40evf only.
Pawel enables FlatNVM support on x722 devices by allowing nvmupdate tool
to configure the preservation flags in the AdminQ command.
Mitch fixes a potential divide by zero error when DCB is enabled and
the firmware fails to configure the VSI, so check for this state.
Fixed a bug where the driver could fail to adhere to ETS bandwidth
allocations if 8 traffic classes were configured on the switch.
Sudheer fixes a potential deadlock by avoiding to call
flush_schedule_work() in i40evf_remove(), since cancel_work_sync()
and cancel_delayed_work_sync() already cleans up necessary work items.
Fixed an issue with the problematic detection and recovery from
hung queues in the PF which was causing lost interrupts. This is done
by triggering a software interrupt so that interrupts are forced on
and if we are already in napi_poll and an interrupt fires, napi_poll
will not be rescheduled and the interrupt is lost.
Avinash fixes an issue in the VF where is was possible to issue a
reset_task while the device is currently being removed.
Michal fixes an issue occurring while calling i40e_led_set() with
the blink parameter set to true, which was causing the activity LED
instead of the link LED to blink for port identification.
Shiraz changes the client interface to not call client close/open on
netdev down/up events, since this causes a lot of thrash that is
not needed. Instead, disable the PE TCP-ENA flag during a netdev
down event and re-enable on a netdev up event, since this blocks all
TCP traffic to the RDMA protocol engine.
Alan fixes an issue which was causing a potential transmit hang by
ignoring the PF link up message if the VF state is not yet in the
RUNNING state.
Amritha fixes the channel VSI recreation during the reset flow to
reconfigure the transmit rings and the queue context associated with
the channel VSI.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix recreating the channel VSIs during the reset flow to reconfigure
the Tx rings and the queue context associated with the channel VSI.
Also update the next_base_queue for the VSI while rebuilding the
channel VSIs after a reset.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If we receive the link status message from PF with link up before queues
are actually enabled, it will trigger a TX hang. This fixes the issue
by ignoring a link up message if the VF state is not yet in RUNNING
state.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Client close is overloaded to handle both un-registration and
netdev down event. On netdev down, i40iw client close is called
which unregisters the RDMA dev and this is too destructive
since the netdev is still registered.
Do not call client close/open on netdev down/up events. Instead
disable the PE TCP_ENA flag during a netdev down event. This
blocks all TCP traffic to the RDMA Protocol Engine. On netdev up,
re-enable the flag.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that i40e_vsi_config_tc() has the pf and hw variable defined, use
them, instead of dereferencing vsi->back. Much easier to read.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver (and the entire netdev layer for that matter) assumes
that TC0 will always be present in our DCB configuration.
Unfortunately, this isn't always the case. Rather than fail to
configure the VSI, let's go ahead and try to make it work, even
though DCB will end up being disabled by the kernel.
If the driver fails to configure DCB, the driver queries what's
valid, then writes that back to the hardware, always forcing TC0.
This fixes a bug where the driver could fail to adhere to ETS BW
allocations if 8 TCs were configured on the switch.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In VFs, there is a known issue which can cause writebacks
to not occur when interrupts are disabled and there are
less than 4 descriptors resulting in TX timeout. Timeout
can also occur due to lost interrupt.
The current implementation for detecting and recovering
from hung queues in the PF is problematic because it actually
actively encourages lost interrupts. By triggering a SW
interrupt, interrupts are forced on. If we are already in
napi_poll and an interrupt fires, napi_poll will not be
rescheduled and the interrupt is effectively lost; thereby
potentially *causing* hung queues.
This patch checks whether packets are being processed between
every watchdog cycle and determine potential hung queue and
fires triggers SW interrupt only for that particular queue.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This fix solves an issue occurring while calling i40e_led_set function
from the driver with "blink" parameter set as TRUE. This call resulted
in Activity LED blinking instead of Link LED, which may lead to errors
in physically identifying the port, since Activity LED may be blinking
for different reasons as well.
Signed-off-by: Michal Kuchta <michal.kuchta@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a host disables and enables a PF device, all the associated
VFs are removed and added back in. It also generates a PFR which in turn
resets all the connected VFs. This behaviour is different from that of
Linux guest on Linux host. Hence we end up in a situation where there's
a PFR and device removal at the same time. And watchdog doesn't have a
clue about this and schedules a reset_task. This patch adds code to send
signal to reset_task that the device is currently being removed.
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
flush_schedule_work blocks until completion of all scheduled
work items in global work-queue. This can cause deadlock in some
cases. i40evf_remove() cleans up necessary work items with
cancel_delayed_work_sync and cancel_work_sync. This fix removes
flush_schedule_work call inside i40evf_remove().
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In some weird circumstances with DCB enabled, the firmware can fail to
configure the VSI, leaving us with zero traffic classes. Check for this
state when we configure RSS to avoid a panic.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds new I40E_NVMUPD_GET_AQ_EVENT state to allow
retrieval of AdminQ events as a result of AdminQ commands sent
to firmware.
Add preservation flags support on X722 devices for NVM update
AdminQ function wrapper. Add new parameter and handling to
nvmupdate admin queue function intended to allow nvmupdate tool
to configure the preservation flags in the AdminQ command.
This is required to implement FlatNVM on X722 devices.
Signed-off-by: Pawel Jablonski <pawel.jablonski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
With all the support code in place we can now link in the ipsec
offload operations and set the ESP feature flag for the XFRM
subsystem to see.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add a simple statistic to count the ipsec offloads.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the skb has a security association referenced in the skb, then
set up the Tx descriptor with the ipsec offload bits. While we're
here, we fix an oddly named field in the context descriptor struct.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the chip sees and decrypts an ipsec offload, set up the skb
sp pointer with the ralated SA info. Since the chip is rude
enough to keep to itself the table index it used for the
decryption, we have to do our own table lookup, using the
hash for speed.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On a chip reset most of the table contents are lost, so must be
restored. This scans the driver's ipsec tables and restores both
the filled and empty table slots to their pre-reset values.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add the functions for setting up and removing offloaded SAs (Security
Associations) with the x540 hardware. We set up the callback structure
but we don't yet set the hardware feature bit to be sure the XFRM service
won't actually try to use us for an offload yet.
The software tables are made up to mimic the hardware tables to make it
easier to track what's in the hardware, and the SA table index is used
for the XFRM offload handle. However, there is a hashing field in the
Rx SA tracking that will be used to facilitate faster table searches in
the Rx fast path.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Set up the data structures to be used by the ipsec offload.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add in the code for running and stopping the hardware ipsec
encryption/decryption engine. It is good to keep the engine
off when not in use in order to save on the power draw.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add a few routines to make access to the ipsec registers just a little
easier, and throw in the beginnings of an initialization.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Clean up the ipsec/macsec descriptor bit definitions to match the rest
of the defines and file organization. Also recognise the bit-definition
overlap in the error mask macro.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
A cleanup of the PM code left an incorrect #ifdef in place, leading
to a harmless build warning:
drivers/net/ethernet/intel/fm10k/fm10k_pci.c:2502:12: error: 'fm10k_suspend' defined but not used [-Werror=unused-function]
drivers/net/ethernet/intel/fm10k/fm10k_pci.c:2475:12: error: 'fm10k_resume' defined but not used [-Werror=unused-function]
It's easier to use __maybe_unused attributes here, since you
can't pick the wrong one.
Fixes: 8249c47c6b ("fm10k: use generic PM hooks instead of legacy PCIe power hooks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent checks added for formatting kernel-doc comments are causing warnings
if W= is run with a non-zero value. This patch fixes function comments to
resolve warnings when W=1 is used.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recent checks added for formatting kernel-doc comments are causing warnings
if W= is run with a non-zero value. This patch fixes function comments to
resolve warnings when W=1 is used.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This update makes it so that we report the actual number of Tx queues via
real_num_tx_queues but are still restricted to RSS on only the first pool
by setting num_tc equal to 1. Doing this locks us into only having the
ability to setup XPS on the queues in that pool, and only those queues
should be used for transmitting anything other than macvlan traffic.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that instead of bringing rings up/down for various
we just update the netdev pointer for the Rx ring and set or clear the MAC
filter for the interface. By doing it this way we can avoid a number of
races and issues in the code as things were getting messy with the macvlan
clean-up racing with the interface clean-up to bring the rings down on
shutdown.
With this change we opt to leave the rings owned by the PF interface for
both Tx and Rx and just direct the packets once they are received to the
macvlan netdev.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We should not be stopping/starting the upper devices Tx queues when
handling a macvlan offload. Instead we should be stopping and starting
traffic on our own queues.
In order to prevent us from doing this I am updating the code so that we no
longer change the queue configuration on the upper device, nor do we update
the queue_index on our own device. Instead we can just use the queue index
for our local device and not update the netdev in the case of the transmit
rings.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We shouldn't be recording the Rx queue on macvlan offloaded frames since
the macvlan is normally brought up as a single queue device, and it will
trigger warnings for RPS if we have recorded queue IDs larger than the
"real_num_rx_queues" value recorded for the device.
Instead we should be recording the macvlan statistics since we are
bypassing the normal macvlan statistics that would have been generated by
the receive path.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code throughout ixgbe was assuming that dev->num_tc was populated and
configured with the driver, when in fact this can be configured via mqprio
without any hardware coordination other than restricting us to the real
number of Tx queues we advertise.
Instead of handling things this way we need to keep a local copy of the
number of TCs in use so that we don't accidentally pull in the TC
configuration from mqprio when it is configured in software mode.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We might as well configure the limit to default to 1 pool always for the
interface. This accounts for the fact that the PF counts as 1 pool if
SR-IOV is enabled, and in general we are always running in 1 pool mode when
RSS or DCB is enabled as well, though we don't need to actually evaluate
any of the VMDq features in those cases.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The macvlan driver itself will validate the MAC address that is configured
for a given interface. There is no need for us to verify it again.
Instead we should be checking to verify that we actually allocate the filter
and have not run out of resources to configure a MAC rule in our filter
table.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
track_id == 0 is valid for “read only” profiles when
profile does not have any “write” commands.
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
PPP name was going to be confusing since PPP already means point
to point protocol. It is decided to change pipeline personalization
profile(ppp) to dynamic device personalization(ddp).
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Having the interrupts firing while we are polling causes extra overhead and
isn't needed for most systems out there. If an interrupt is lost us
experiencing a 2s latency spike before recovering is still not acceptable
and masks the issue. We are better off just identifying systems that lose
interrupts and instead enable workarounds for those systems.
To that end I am dropping the code that was strobing the interrupts as
there is a narrow window where having them enabled can actually cause
race issues anyway where a few stray packets might get misses if the
interrupt is re-enabled and fires before we call napi_complete.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If you enabled and disabled promiscuous mode on a VF you could easily put
it into a state where it would start firing interrupts on all queues at a
rate of 50+ interrupts per second even though there was no traffic present.
The issue seems to have been a stray admin queue feature flag set that was
leaving us in a high polling rate for the adminq task.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We should not be clearing the pending bit array for each vector manually.
The documentation for the hardware states that when in MSI-X mode the
pending bit array will be cleared automatically. Us clearing it ourselves
just results in multiple opportunities for us to drop an interrupt.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Variable read_size is initialized and this value is never read, it is
instead set inside the do-loop, hence the initialization is redundant
and can be removed. Cleans up clang warning:
drivers/net/ethernet/intel/i40e/i40e_nvm.c:390:6: warning: Value stored
to 'read_size' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bump the i40e driver from 2.1.14 to 2.3.2.
Bump the i40evf driver from 3.0.1 to 3.2.2
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We introduced the virtchnl interface in order to have an interface for
talking to a virtual device driver which was host-driver agnostic. This
interface has its own definitions, including one for link speed.
The host driver has to talk to the virtchnl interface using these new
definitions in order to remain compatible. Today, the i40e link_speed
enumerations are value-exact matches for the virtchnl interface, so it
was originally decided to simply use a typecast.
However, this is unsafe, and makes it easier for future drivers to
continue this unsafe practice. There is nothing guaranteeing these
values are exact, and the type-cast would hide any compiler warning
which indicates the problem.
Rather than rely on this type cast, introduce a helper function which
can convert the AdminQ link speed definition into a virtchnl
definition. This can then be used by host driver implementations in
order to safely convert to the interface recognized by the virtual
functions.
If the link speed is not able to be represented by the virtchnl
definitions we'll report UNKNOWN which is the safest result.
This will ensure that should the driver specific link_speeds actual bit
definitions change, we do not report them incorrectly according to the
VF.
Additionally, this provides a better pattern for future drivers to copy,
as it is more likely a future device may not use the exact same bit-wise
definition as the current virtchnl interface.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We currently notify a VF of the link state after ENABLE_QUEUES, which is
the last thing a VF does after being configured. Guests may not actually
ENABLE_QUEUES until they get configured, and thus between driver load
and device configuration the VF may show inaccurate link status.
Fix this by also sending the link state after GET_VF_RESOURCES. Although
we could remove the message following ENABLE_QUEUES, it's not that
significant of a loss, so this patch just keeps both to ensure maximum
compatibility with guests on various OSes.
Specifically, without this patch guests running FreeBSD will display
inaccurate link state until the device is brought up. This is mostly
a cosmetic issue but can be confusing to system administrators.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If i40evf_open() is called quickly at the same time as a reset occurs
(such as via ethtool) it is possible for the device to attempt to open
while a reset is in progress. This occurs because the driver was not
holding the critical task bit lock during i40evf_open, nor was it
holding it around the call to i40evf_up_complete() in
i40evf_reset_task().
We didn't hold the lock previously because calls to i40evf_down() would
take the bit lock directly, and this would have caused a deadlock.
To avoid this, we'll move the bit lock handling out of i40evf_down() and
into the callers of this function. Additionally, we'll now hold the bit
lock over the entire set of steps when going up or down, to ensure that
we remain consistent.
Ultimately this causes us to serialize the transitions between down and
up properly, and avoid changing status while we're resetting.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Although not strictly necessary, it is customary to reverse the order in
which we release locks that we acquire. This helps preserve lock
ordering during future refactors, which can help avoid potential
deadlock situations.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stop overloading the __I40EVF_IN_CRITICAL_TASK bit lock to protect the
mac_filter_list and vlan_filter_list. Instead, implement a spinlock to
protect these two lists, similar to how we protect the hash in the i40e
PF code.
Ensure that every place where we access the list uses the spinlock to
ensure consistency, and stop holding the critical section around blocks
of code which only need access to the macvlan filter lists.
This refactor helps simplify the locking behavior, and is necessary as
a future refactor to the __I40EVF_IN_CRITICAL_TASK would cause
a deadlock otherwise.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In i40evf_reset_task we use netif_running() to determine whether or not
the device is currently up. This allows us to properly free queue memory
and shut down things before we request the hardware reset.
It turns out that we cannot be guaranteed of netif_running() returning
false until the device is fully up, as the kernel core code sets
__LINK_STATE_START prior to calling .ndo_open. Since we're not holding
the rtnl_lock(), it's possible that the driver's i40evf_open handler
function is currently being called while we're resetting.
We can't simply hold the rtnl_lock() while checking netif_running() as
this could cause a deadlock with the i40evf_open() function.
Additionally, we can't avoid the deadlock by holding the rtnl_lock()
over the whole reset path, as this essentially serializes all resets,
and can cause massive delays if we have multiple VFs on a system.
Instead, lets just check our own internal state __I40EVF_RUNNING state
field. This allows us to ensure that the state is correct and is only
set after we've finished bringing the device up.
Without this change we might free data structures about device queues
and other memory before they've been fully allocated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Display some more stats that were already being counted, to help users
understand when priority xon/xoff packets are being sent/received
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-09
This series contains updates to ixgbe and ixgbevf only.
Emil fixes an issue with "wake on LAN"(WoL) where we need to ensure we
enable the reception of multicast packets so that WoL works for IPv6
magic packets. Cleaned up code no longer needed with the update to
adaptive ITR.
Paul update the driver to advertise the highest capable link speed
when a module gets inserted. Also extended the displaying of firmware
version to include the iSCSI and OEM block in the EEPROM to better
identify firmware versions/images.
Tonghao Zhang cleans up a code comment that no longer applies since
InterruptThrottleRate has been removed from the driver.
Alex fixes SR-IOV and MACVLAN offload interaction, where the MACVLAN
offload was incorrectly configuring several filters with the wrong
pool value which resulted in MACLVAN interfaces not being able to
receive traffic that had to pass over the physical interface. Fixed
transmit hangs and dropped receive frames when the number of VFs
changed. Added support for RSS on MACVLAN pools for X550 devices.
Fixed up the MACVLAN limitations so we can now support 63 offloaded
devices. Cleaned up MACVLAN code that is no longer needed with the
recent changes and fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The l2 acceleration private pointer isn't needed in the ring struct. It
isn't really used anywhere other than to test and see if we are supporting
an offloaded macvlan netdev, and it is much easier to test netdev for not
being ixgbe based to verify that.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch simplifies the check for Tx pending traffic and makes it more
holistic as there being any difference between next_to_use and
next_to_clean is much more informative than if head and tail are equal, as
it is possible for us to either not update tail, or not be notified of
completed work in which case next_to_clean would not be equal to head.
In addition the simplification makes it so that we don't have to read
hardware which allows us to drop a number of variables that were previously
being used in the call.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is a fix of the macvlan offload so that we correctly handle
macvlan offloaded devices. Specifically we were configuring our limits based
on the assumption that we were going to max out the RSS indices for every
mode. As a result when we went to 15 or more macvlan interfaces we were
forced into the 2 queue RSS mode on VFs even though they could have still
supported 4.
This change splits the logic up so that we limit either the total number of
macvlan instances if DCB is enabled, or limit the number of RSS queues used
per macvlan (instead of per pool) if SR-IOV is enabled. By doing this we
can make best use of the part.
In addition I have increased the maximum number of supported interfaces to
63 with one queue per offloaded interface as this more closely reflects the
actual values supported by the interface.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The num_rx_pools value is overwritten when we reinitialize the queue
configuration. In reality we shouldn't need to be updating the value since
it is redone every time we call into ixgbe_setup_tc so for now just drop
the spots where we were incrementing or decrementing the value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In order for RSS to work on the macvlan pools of the X550 we need to
populate the MRQC, RETA, and RSS key values for each pool. This patch makes
it so that we now take care of that.
In addition I have dropped the macvlan specific configuration of psrtype
since it is redundant with the code that already exists for configuring
this value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the number of VFs are changed we need to reinitialize the part since the
offset for the device and the number of pools will be incorrect. Without
this change we can end up seeing Tx hangs and dropped Rx frames for
incoming traffic.
In addition we should drop the code that is arbitrarily changing the
default pool and queue configuration. Instead we should wait until the port
is reset and reconfigured via ixgbe_sriov_reinit.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When SR-IOV was enabled the macvlan offload was configuring several filters
with the wrong pool value. This would result in the macvlan interfaces not
being able to receive traffic that had to pass over the physical interface.
To fix it wrap the pool argument in the VMDQ_P macro which will add the
necessary offset to get to the actual VMDq pool
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Removed leftover assignment of xcast_mode.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The InterruptThrottleRate has been removed from ixgbe. Then Update
the comment.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Extend FW version reporting by displaying information from the iSCSI
or OEM block in the EEPROM.
This will allow us to more accurately identify the FW.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On module insert advertise highest capable link speed. If module is
capable of 10G, then advertise 10G, else advertise modules capable
link speeds.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This enum is no longer needed after
commit: b4ded8327f ("ixgbe: Update adaptive ITR algorithm")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Previously we only enabled the reception of multicast packets when
wake on multicast is set, but we also need this to allow waking with
IPv6 magic packets.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e driver has a special "FDIR" RX-ring (I40E_VSI_FDIR) which is
a sideband channel for configuring/updating the flow director tables.
This (i40e_vsi_)type does not invoke XDP-ebpf code.
As suggested by Björn (V2): Instead of marking this I40E_VSI_FDIR RX-ring
a special case, reverse the logic and only select RX-rings of type
I40E_VSI_MAIN to register xdp_rxq_info's for.
Driver hook points for xdp_rxq_info:
* reg : i40e_setup_rx_descriptors (via i40e_vsi_setup_rx_resources)
* unreg: i40e_free_rx_resources (via i40e_vsi_free_rx_resources)
Tested on actual hardware with samples/bpf program.
V2: Fixed bug in i40e_set_ringparam (memset zero) + match on I40E_VSI_MAIN.
V4: Update patch desc that got out-of-sync with code.
Cc: intel-wired-lan@lists.osuosl.org
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-01-03
This series contains fixes for i40e and i40evf.
Amritha removes the UDP support for big buffer cloud filters since it is
not supported and having UDP enabled is a bug.
Alex fixes a bug in the __i40e_chk_linearize() which did not take into
account large (16K or larger) fragments that are split over 2 descriptors,
which could result in a transmit hang.
Jake fixes an issue where a devices own MAC address could be removed from
the unicast address list, so force a check on every address sync to ensure
removal does not happen.
Jiri Pirko fixes the return value when a filter configuration is not
supported, do not return "invalid" but return "not supported" so that
the core can react correctly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When filter configuration is not supported, drivers should return
-EOPNOTSUPP so the core can react correctly.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In some circumstances, such as with bridging, it is possible that the
stack will add a devices own MAC address to its unicast address list.
If, later, the stack deletes this address, then the i40e driver will
receive a request to remove this address.
The driver stores its current MAC address as part of the MAC/VLAN hash
array, since it is convenient and matches exactly how the hardware
expects to be told which traffic to receive.
This causes a problem, since for more devices, the MAC address is stored
separately, and requests to delete a unicast address should not have the
ability to remove the filter for the MAC address.
Fix this by forcing a check on every address sync to ensure we do not
remove the device address.
There is a very narrow possibility of a race between .set_mac and
.set_rx_mode, if we don't change netdev->dev_addr before updating our
internal MAC list in .set_mac. This might be possible if .set_rx_mode is
going to remove MAC "XYZ" from the list, at the same time as .set_mac
changes our dev_addr to MAC "XYZ", we might possibly queue a delete,
then an add in .set_mac, then queue a delete in .set_rx_mode's
dev_uc_sync and then update netdev->dev_addr. We can avoid this by
moving the copy into dev_addr prior to the changes to the MAC filter
list.
A similar race on the other side does not cause problems, as if we're
changing our MAC form A to B, and we race with .set_rx_mode, it could
queue a delete from A, we'd update our address, and allow the delete.
This seems like a race, but in reality we're about to queue a delete of
A anyways, so it would not cause any issues.
A race in the initialization code is unlikely because the netdevice has
not yet been fully initialized and the stack should not be adding or
removing addresses yet.
Note that we don't (yet) need similar code for the VF driver because it
does not make use of __dev_uc_sync and __dev_mc_sync, but instead roles
its own method for handling updates to the MAC/VLAN list, which already
has code to protect against removal of the hardware address.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original code for __i40e_chk_linearize didn't take into account the
fact that if a fragment is 16K in size or larger it has to be split over 2
descriptors and the smaller of those 2 descriptors will be on the trailing
edge of the transmit. As a result we can get into situations where we didn't
catch requests that could result in a Tx hang.
This patch takes care of that by subtracting the length of all but the
trailing edge of the stale fragment before we test for sum. By doing this
we can guarantee that we have all cases covered, including the case of a
fragment that spans multiple descriptors. We don't need to worry about
checking the inner portions of this since 12K is the maximum aligned DMA
size and that is larger than any MSS will ever be since the MTU limit for
jumbos is something on the order of 9K.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since UDP based filters are not supported via big buffer cloud
filters, remove UDP support. Also change a few return types to
indicate unsupported vs invalid configuration.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The PCI pool API is deprecated. Replace the PCI pool old API by the
appropriate function with the DMA pool API.
Tested-by: Peter Senna Tschudin <peter.senna@collabora.com>
Signed-off-by: Romain Perier <romain.perier@collabora.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
e1000e_check_for_copper_link() and e1000_check_for_copper_link_ich8lan()
are the two functions that may be assigned to mac.ops.check_for_link when
phy.media_type == e1000_media_type_copper. Commit 19110cfbb3 ("e1000e:
Separate signaling for link check/link up") changed the meaning of the
return value of check_for_link for copper media but only adjusted the first
function. This patch adjusts the second function likewise.
Reported-by: Christian Hesse <list@eworm.de>
Reported-by: Gabriel C <nix.or.die@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198047
Fixes: 19110cfbb3 ("e1000e: Separate signaling for link check/link up")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Christian Hesse <list@eworm.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.
If we would support moving target netdev across netns, using
pointer would be better than ifindex.
This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding cloud filters could fail for a number of reasons,
unsupported filter fields for example, which fails during
validation of fields itself. This will not result in admin
command errors and converting the admin queue status to posix
error code using i40e_aq_rc_to_posix would result in incorrect
error values. If the failure was due to AQ error itself,
reporting that correctly is handled in the inner function.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is a follow on to commit b10effb92e ("fix buffer overrun while the
I219 is processing DMA transactions") to address David Laights concerns
about the use of "magic" numbers. So define masks as well as add
additional code comments to give a better understanding of what needs to
be done to avoid a buffer overrun.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Alexander H Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
sizeof when applied to a pointer typed expression gives the size of
the pointer.
The proper fix in this particular case is to code sizeof(*vfres)
instead of sizeof(vfres).
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40evf as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with fm10k as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with igb as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with igbvf as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with ixgbevf as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40e as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>