It is not safe to have the string table for statistics change order or
size over the lifetime of a given netdevice. This is because of the
nature of the 3-step process for obtaining stats. First, user space
performs a request for the size of the strings table. Second it performs
a separate request for the strings themselves, after allocating space
for the table. Third, it requests the stats themselves, also allocating
space for the table.
If the size decreased, there is potential to see garbage data or stats
values. In the worst case, we could potentially see stats values become
mis-aligned with their strings, so that it looks like a statistic is
being reported differently than it actually is.
Even worse, if the size increased, there is potential that the strings
table or stats table was not allocated large enough and the stats code
could access and write to memory it should not, potentially resulting in
undefined behavior and system crashes.
It isn't even safe if the size always changes under the RTNL lock. This
is because the calls take place over multiple user space commands, so it
is not possible to hold the RTNL lock for the entire duration of
obtaining strings and stats. Further, not all consumers of the ethtool
API are the user space ethtool program, and it is possible that one
assumes the strings will not change (valid under the current contract),
and thus only requests the stats values when requesting stats in a loop.
Finally, it's not possible in the general case to detect when the size
changes, because it is quite possible that one value which could impact
the stat size increased, while another decreased. This would result in
the same total number of stats, but reordering them so that stats no
longer line up with the strings they belong to. Since only size changes
aren't enough, we would need some sort of hash or token to determine
when the strings no longer match. This would require extending the
ethtool stats commands, but there is no more space in the relevant
structures.
The real solution to resolve this would be to add a completely new API
for stats, probably over netlink.
In the ice driver, the only thing impacting the stats that is not
constant is the number of queues. Instead of reporting stats for each
used queue, report stats for each allocated queue. We do not change the
number of queues allocated for a given netdevice, as we pass this into
the alloc_etherdev_mq() function to set the num_tx_queues and
num_rx_queues.
This resolves the potential bugs at the slight cost of displaying many
queue statistics which will not be activated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch implements multiple pieces of functionality:
1. Added ice_vsi_sync_filters, which is called through the service task
to push filter updates to the hardware.
2. Add support to enable/disable promiscuous mode on an interface.
Enabling/disabling promiscuous mode on an interface results in
addition/removal of a promisc filter rule through ice_vsi_sync_filters.
3. Implement handlers for ndo_set_mac_address, ndo_change_mtu,
ndo_poll_controller and ndo_set_rx_mode.
This patch also marks the end of the driver addition by bumping up the
driver version.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Link events are posted to a PF's admin receive queue (ARQ). This patch
adds the ability to detect and process link events.
This patch also adds the ability to process resets.
The driver can process the following resets:
1) EMP Reset (EMPR)
2) Global Reset (GLOBR)
3) Core Reset (CORER)
4) Physical Function Reset (PFR)
EMPR is the largest level of reset that the driver can handle. An EMPR
resets the manageability block and also the data path, including PHY and
link for all the PFs. The affected PFs are notified of this event through
a miscellaneous interrupt.
GLOBR is a subset of EMPR. It does everything EMPR does except that it
doesn't reset the manageability block.
CORER is a subset of GLOBR. It does everything GLOBR does but doesn't
reset PHY and link.
PFR is a subset of CORER and affects only the given physical function.
In other words, PFR can be thought of as a CORER for a single PF. Since
only the issuing PF is affected, a PFR doesn't result in the miscellaneous
interrupt being triggered.
All the resets have the following in common:
1) Tx/Rx is halted and all queues are stopped.
2) All the VSIs and filters programmed for the PF are lost and have to be
reprogrammed.
3) Control queue interfaces are reset and have to be reprogrammed.
In the rebuild flow, control queues are reinitialized, VSIs are reallocated
and filters are restored.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds the ability for a VSI to use multiple Tx queues. More
specifically, the patch
1) Provides the ability to update the Tx scheduler tree in the
firmware. The driver can configure the Tx scheduler tree by
adding/removing multiple Tx queues per TC per VSI.
2) Allows a VSI to reconfigure its Tx queues during runtime.
3) Synchronizes the Tx scheduler update operations using locks.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for VLANs. When a VLAN is created a switch filter
is added to direct the VLAN traffic to the corresponding VSI. When a VLAN
is deleted, the filter is deleted as well.
This patch also adds support for the following hardware offloads.
1) VLAN tag insertion/stripping
2) Receive Side Scaling (RSS)
3) Tx checksum and TCP segmentation
4) Rx checksum
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch implements ice_start_xmit (the handler for ndo_start_xmit) and
related functions. ice_start_xmit ultimately calls ice_tx_map, where the
Tx descriptor is built and posted to the hardware by bumping the ring tail.
This patch also implements ice_napi_poll, which is invoked when there's an
interrupt on the VSI's queues. The interrupt can be due to either a
completed Tx or an Rx event. In case of a completed Tx/Rx event, resources
are reclaimed. Additionally, in case of an Rx event, the skb is fetched
and passed up to the network stack.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch configures the VSIs to be able to send and receive
packets by doing the following:
1) Initialize flexible parser to extract and include certain
fields in the Rx descriptor.
2) Add Tx queues by programming the Tx queue context (implemented in
ice_vsi_cfg_txqs). Note that adding the queues also enables (starts)
the queues.
3) Add Rx queues by programming Rx queue context (implemented in
ice_vsi_cfg_rxqs). Note that this only adds queues but doesn't start
them. The rings will be started by calling ice_vsi_start_rx_rings on
interface up.
4) Configure interrupts for VSI queues.
5) Implement ice_open and ice_stop.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces data structures and functions to alloc/free
VSIs. The driver represents a VSI using the ice_vsi structure.
Some noteworthy points about VSI allocation:
1) A VSI is allocated in the firmware using the "add VSI" admin queue
command (implemented as ice_aq_add_vsi). The firmware returns an
identifier for the allocated VSI. The VSI context is used to program
certain aspects (loopback, queue map, etc.) of the VSI's configuration.
2) A VSI is deleted using the "free VSI" admin queue command (implemented
as ice_aq_free_vsi).
3) The driver represents a VSI using struct ice_vsi. This is allocated
and initialized as part of the ice_vsi_alloc flow, and deallocated
as part of the ice_vsi_delete flow.
4) Once the VSI is created, a netdev is allocated and associated with it.
The VSI's ring and vector related data structures are also allocated
and initialized.
5) A VSI's queues can either be contiguous or scattered. To do this, the
driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with
the firmware's VSI queue allocation imap. If the VSI can't get a
contiguous queue allocation, it will fallback to scatter. This is
implemented in ice_vsi_get_qs which is called as part of the VSI setup
flow. In the release flow, the VSI's queues are released and the bitmap
is updated to reflect this by ice_vsi_put_qs.
CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch continues the initialization flow as follows:
1) Allocate and initialize necessary fields (like vsi, num_alloc_vsi,
irq_tracker, etc) in the ice_pf instance.
2) Setup the miscellaneous interrupt handler. This also known as the
"other interrupt causes" (OIC) handler and is used to handle non
hotpath interrupts (like control queue events, link events,
exceptions, etc.
3) Implement a background task to process admin queue receive (ARQ)
events received by the driver.
CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds code to continue the initialization flow as follows:
1) Get PHY/link information and store it
2) Get default scheduler tree topology and store it
3) Get the MAC address associated with the port and store it
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds to the initialization flow by getting switch
configuration, scheduler configuration and device capabilities.
Switch configuration:
On boot, an L2 switch element is created in the firmware per physical
function. Each physical function is also mapped to a port, to which its
switch element is connected. In other words, this switch can be visualized
as an embedded vSwitch that can connect a physical function's virtual
station interfaces (VSIs) to the egress/ingress port. Egress/ingress
filters will be eventually created and applied on this switch element.
As part of the initialization flow, the driver gets configuration data
from this switch element and stores it.
Scheduler configuration:
The Tx scheduler is a subsystem responsible for setting and enforcing QoS.
As part of the initialization flow, the driver queries and stores the
default scheduler configuration for the given physical function.
Device capabilities:
As part of initialization, the driver has to determine what the device is
capable of (ex. max queues, VSIs, etc). This information is obtained from
the firmware and stored by the driver.
CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch implements multiple pieces of the initialization flow
as follows:
1) A reset is issued to ensure a clean device state, followed
by initialization of admin queue interface.
2) Once the admin queue interface is up, clear the PF config
and transition the device to non-PXE mode.
3) Get the NVM configuration stored in the device's non-volatile
memory (NVM) using ice_init_nvm.
CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A control queue is a hardware interface which is used by the driver
to interact with other subsystems (like firmware, PHY, etc.). It is
implemented as a producer-consumer ring. More specifically, an
"admin queue" is a type of control queue used to interact with the
firmware.
This patch introduces data structures and functions to initialize
and teardown control/admin queues. Once the admin queue is initialized,
the driver uses it to get the firmware version.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a basic driver framework for the Intel(R) E800 Ethernet
Series of network devices. There is no functionality right now other than
the ability to load.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>