Remove a for loop that does nothing in ixgbe_probe().
This is a remnant from when we had IO bars (compare to the ixgb code).
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Introduce an inline function pci_pcie_type(dev) to extract PCIe
device type from pci_dev->pcie_flags_reg field, and prepare for
removing pci_dev->pcie_type.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This change updates the code related to configuring the transmit frame
checksum. Specifically I have updated the code so that we can only skip
inserting the checksum in the case that we are not performing some other
offload that will modify the frame data.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This change moves the RSC code into the non-EOP descriptor handling
function. The main motivation behind this change is to help reduce the
overhead in the non-RSC case. Previously the non-RSC path code would
always be checking for append count even if RSC had been disabled. Now
this code is completely skipped in a single conditional check instead of
having to make two separate checks.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This patch creates a function named ixgbe_fetch_rx_buffer. The sole
purpose of this function is to retrieve a single buffer off of the ring and
to place it in an skb.
The advantage to doing this is that it helps improve the readability since
I can decrease the indentation and for the code in this section.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This change makes it so that if only the first 256 bytes of a buffer are
used we just copy the data out and leave the offset and page count
unchanged. There are multiple advantages to this. First it allows us to
reuse the page much more in the case of pages larger than 4K. It also
allows us to avoid some expensive atomic operations in the form of
get_page/put_page. In perf I have seen CPU utilization for put_page drop
from 3.5% to 1.8% as a result of this patch when doing small packet routing,
and packet rates increased by about 3%.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This change creates a separate function for functionality similar to
pskb_pull_tail. The main motivation for moving it to a separate function
is so that later I can just skip this function in the case where we have
already copied the buffer into skb->head.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This patch makes it so that we will always have ownership of the buffers by
the time we get to ixgbe_add_rx_frag. This is necessary as I am planning to
add a copy-break to ixgbe_add_rx_frag and in order for that to function
correctly we need the CPU to have ownership of the buffer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This change makes it so that we do not use double buffering if the page
size is larger than 4K. Instead we will simply walk through the page using
up to 3K per receive, and if we receive less than we only move the offset
by that amount. We will free the page when there is no longer any space
left that we can use instead of checking the page count to see if we can
cycle back to the start.
The main motivation behind this is to avoid the unnecessary truesize cost
for using a half page when most packets are 2K or smaller. With this new
approach the largest possible truesize for a page fragment will be 3K when
PAGE_SIZE is larger than 4K.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This patch combines ixgbe_add_rx_frag and ixgbe_can_reuse_page into a
single function. The main motivation behind this is to make better use of
the values so that we don't have to load them from memory and into
registers twice.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This change reverts an earlier patch that introduced
ixgbe_init_rx_page_offset. The idea behind the function was to provide
some variation in the starting offset for the page in order to reduce
hot-spots in the cache. However it doesn't appear to provide any
significant benefit in the testing I have done. It has however been a
source of several bugs, and it blocks us from being able to use 2K
fragments on larger page sizes. So the decision I made was to remove it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
This patch adds missing braces around the 10gig link check to include the check for KR support.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reported-by: Sascha Wildner <saw@online.de>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb->pfmemalloc flag gets set to true iff during the slab allocation
of data in __alloc_skb that the the PFMEMALLOC reserves were used. If
page splitting is used, it is possible that pages will be allocated from
the PFMEMALLOC reserve without propagating this information to the skb.
This patch propagates page->pfmemalloc from pages allocated for fragments
to the skb.
It works by reintroducing and expanding the skb_alloc_page() API to take
an skb. If the page was allocated from pfmemalloc reserves, it is
automatically copied. If the driver allocates the page before the skb, it
should call skb_propagate_pfmemalloc() after the skb is allocated to
ensure the flag is copied properly.
Failure to do so is not critical. The resulting driver may perform slower
if it is used for swap-over-NBD or swap-over-NFS but it should not result
in failure.
[davem@davemloft.net: API rename and consistency]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch resolves a "BUG: unable to handle kernel paging request at ..."
oops while dumping packet data. The issue occurs with IOMMU enabled due to
the address provided by phys_to_virt().
This patch makes use of skb->data on Tx and the virtual address of the pages
allocated for Rx.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that we can use 1TC DCB in the case of MSI and
legacy interrupts. The advantage to this is that it allows us to fully
support FCoE w/ DCB instead of having to drop to link flow control only
when using these interrupt modes.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for a new 82599 device that supports WoL.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
With DCB and FCoE configured extra queues may be allocated and
never used. After this patch we calculate the max correctly.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Do RAR entry accounting correctly so that errors are reported and
promisc mode is set correctly when the number of entries exceeds
the hardware limits.
This can happen with many macvlan devices attached to the PF or
by adding many fdb entries in SR-IOV modes.
Also this includes a small refactor to fdb_add() to avoid having so
many nested if/else statements after adding a check for the number
or RAR entries.
The max entries for the PF is currently 16 we allow 15 additional
entries to account for the defined MAC.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so the function ixgbe_dcb_get_tc_from_up will use the
num_tcs.pg_tcs to determine the starting value for determining a traffic
class based on a user priority. The main motivation for this change is to
address possible bad configurations in which more TCs worth of data are
populated then there are actual TCs. By limiting this value we can at
least make certain we are not providing a map with values that are out of
range.
As a result any user priorities that are setup in the configuration with a
traffic class mapping higher than what the hardware supports will be
reported as being on TC 0.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The recent changes to netdev_alloc_skb actually make it so that the size of
the buffer now actually has a more direct input on the truesize. So in
order to make best use of the piece of a page we are allocated I am
reducing the IXGBE_RX_HDR_SIZE to 256 so that our truesize will be reduced
by 256 bytes as well.
This should result in performance improvements since the number of uses per
page should increase from 4 to 6 in the case of a 4K page. In addition we
should see socket performance improvements due to the truesize dropping
to less than 1K for buffers less than 256 bytes.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we can use the atr_sample_rate to determine if
we are capable of supporting ATR. The advantage to this approach is that it
allows us to now determine the setting of the IXGBE_FLAG_FDIR_HASH_CAPABLE
based on the queueing scheme, instead of the queueing scheme being based on
the flag.
Using this approach there are essentially 5 conditions that must be checked
prior to trying to enable ATR:
1. Is SR-IOV disabled?
2. Are the number of TCs <= 1?
3. Is RSS queueing limit greater than 1?
4. Is atr_sample_rate set?
5. Is Flow Director perfect filtering disabled?
If any of these conditions are enabled they should disable ATR filtering.
Note that in the case of conditions 1 through 4 being met we will set
things up for ATR queueing, however if test 5 fails we will still leave the
queues allocated for use by perfect filters. The reason for this is to
allow for us to switch back and forth between ntuple and ATR without
needing to reallocate the descriptor rings.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch does two things. First it drops the unnecessary work of
searching for enabled VFs when we first bring up the adapter and instead
just uses pci_num_vf to determine how many VFs are enabled on the adapter.
The second thing it does is drop the use of vfdev from the vf_data_storage
structure. Instead we just search the entire system for a VF that has us
as it's PF, and then if that VF is assigned we indicate that the VFs are
assigned. This allows us to still check for assigned VFs even if the
vfinfo allocation has failed, or vfinfo has been freed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is meant to fix a bug in which we were not checking for pre-existing
VFs if we were not setting the max_vfs value at driver load. What happens
now is that we always call ixgbe_enable_sriov and this checks for
pre-existing VFs ore requested VFs prior to deciding on no SR-IOV.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jerr Kirsher says:
====================
This series contains updates to ixgbe.
...
Alexander Duyck (9):
ixgbe: Use VMDq offset to indicate the default pool
ixgbe: Fix memory leak when SR-IOV VFs are direct assigned
ixgbe: Drop references to deprecated pci_ DMA api and instead use
dma_ API
ixgbe: Cleanup configuration of FCoE registers
ixgbe: Merge all FCoE percpu values into a single structure
ixgbe: Make FCoE allocation and configuration closer to how rings
work
ixgbe: Correctly set SAN MAC RAR pool to default pool of PF
ixgbe: Only enable anti-spoof on VF pools
ixgbe: Enable FCoE FSO and CRC offloads based on CAPABLE instead of
ENABLED flag
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own
vendor ID #define.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of only setting the FCOE segmentation offload and CRC offload flags
if we enable FCoE, we could just set them always since there are no
modifications needed to the hardware or adapter FCoE structure in order to
use these features.
The advantage to this is that if FCoE enablement fails, for example because
SR-IOV was enabled on 82599, we will still have use of the FCoE
segmentation offload and Tx/Rx CRC offloads which should still help to
improve the FCoE performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current logic is enabling anti-spoof on all pools and then clearing
anti-spoof on just the first PF pool. The correct approach is to only set
anti-spoof on the VF pools and to leave all of the PF pools unchecked.
This allows for items such as FCoE to use adjacent pools within the PF for
transmit and receive queues without the traffic being blocked by this
security feature.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change corrects an issue in which an FCoE enabled adapter was always
setting the FCoE SAN MAC MPSAR register to 0x1. This results in the first
VF being assigned the SAN MAC address in the case of SR-IOV and as such is
incorrect. To resolve this I am adding a new function that will update the
SAN MAC pool address after reset.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch changes the behavior of the FCoE configuration so that it is
much closer to how the main body of the ixgbe driver works for ring
allocation.
The first piece is the ixgbe_fcoe_ddp_enable/disable calls. These allocate
the percpu values and if successful set the fcoe_ddp_xid value indicating
that we can support DDP.
The next piece is the ixgbe_setup/free_ddp_resources calls. These are
called on open/close and will allocate and free the DMA pools.
Finally ixgbe_configure_fcoe is now just register configuration. It can go
through and enable the registers for the FCoE redirection offload, and FIP
configuration without any interference from the DDP pool allocation.
The net result of all this is two fold. First it adds a certain amount of
exception handling. So for example if ixgbe_setup_fcoe_resources fails we
will actually generate an error in open and refuse to bring up the
interface.
Secondly it provides a much more graceful failure case than the previous
model which would skip setting up the registers for FCoE on failure to
allocate DDP resources leaving no Rx functionality enabled instead of just
disabling DDP.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change merges the 2 statistics values for noddp and noddp_ext_buff
and the dma_pool into a single structure that can be allocated per CPU.
The advantages to this are several fold. First we only need to do one
alloc_percpu call now instead of 3, so that means less overhead for
handling memory allocation failures. Secondly in the case of
ixgbe_fcoe_ddp_setup we only need to call get_cpu once which makes things a
bit cleaner since we can drop a put_cpu() from the exception path.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so we always use the FCoE redirection table. We just
set all 8 entries to the same value in the case of only having one queue
for FCoE.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The networking side of the code had already been updated to use dma_ calls
instead of the old pci_ calls. However it looks like the FCoE code was
never updated. This change goes through and moves everything from the pci
APIs to the dma APIs.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The VF driver had a memory leak that would occur if VFs were assigned to a
guest. The amount of leak would vary with the number of VFs but could max
out at about 14K per PF. To reproduce the leak all you would need to do is
enable all the VFs on the first PF. Then start a loop of loading and
unloading the driver with max_vfs=63 for the first port.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we can use the VMDq ring feature offset value
to determine the default pool instead of using num_vfs. The reason for
this change is to avoid issues should we fail to allocate vfinfo but have
pre-existing VFs. What should happen in this case is that num_vfs will go
to 0, but the VMDq offset will contain the location of the first PF pool.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <Sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is just meant to defragment the flags as there are several hole
that have been introduced since several features, or the flags for them,
have been removed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
All of our hardware supports RSS even if it is only for a single queue. So
instead of toting around the RSS enable flag I am updating the code so that
all devices are enabled and if we want to disable RSS it is indicated via
the RSS mask.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change essentially makes it so that we can enable almost all of the
features all at once. This patch allows for the combination of SR-IOV,
DCB, and FCoE in the case of the x540. It also beefs up the SR-IOV by
adding support for RSS to the PF.
The testing matrix gets to be very complex for this patch as there are a
number of different features and subsets for queueing options. I tried to
narrow these down a bit by restricting the PF to only supporting 4TC DCB
when it is enabled in addition to SR-IOV.
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change allows all pools from the default pool forward to be enabled vi
ixgbe_configure_virtualization. This is needed as we are planning to use
queues belonging to adjacent pools for FCoE when SR-IOV and FCoE are both
enabled.
In addition this patch contains some minor formatting changes as there were
a few spots that seemed to be in need of some cleanup.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to make the code much more readable for MTQC and MRQC
configuration.
The big change is that I simplified much of the logic so that we are
essentially handling just 4 cases and their variants. In the cases where
RSS is disabled we are actually just programming the RETA table with all
1s resulting in a single queue RSS. In the case of SR-IOV I am treating
that as a subset of VMDq. This all results int he following configuration
for the hardware:
DCB
En Dis
VMDq En VMDQ/DCB VMDq/RSS
Dis DCB/RSS RSS
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up some of the logic in an attempt to try and simplify
things for how we are configuring DCB w/ RSS.
In this patch I basically did 3 things. I updated the logic for getting
the first register index. I applied the fact that all TCs get the same
number of queues to simplify the looping logic in caching the DCB ring
register. Finally I updated how we configure the RQTC register to match
the fact that all TCs are assigned the same number of queues.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It makes much more sense for us to configure the real number of Tx and Rx
queues in the ixgbe_open call than it does in ixgbe_set_num_queues. By
setting the number in ixgbe_open we can avoid a number of unecessary
updates and only have to make the calls once.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Previously we were exiting without cleaning up the memory internally on the
ixgbe_setup_rx_resources and ixgbe_setup_tx_resources calls. Instead of
forcing the caller to clean things up for us we should instead just unwind
the rings and free the memory as we go. This way we can more gracefully
clean up the rings in the event of an allocation failure.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the link status changes on the PF we need to notify the VFs. In order
to do this we should ping all of the VFs in order to trigger a link status
change on them as well.
This fixes issues in which the PF would reset, but the VF didn't because the
NAK flag was not set in the VF mailbox.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jett Kirsher says:
====================
This series contains updates to e1000e and ixgbe.
...
Alexander Duyck (5):
ixgbe: Simplify logic for getting traffic class from user priority
ixgbe: Cleanup unpacking code for DCB
ixgbe: Populate the prio_tc_map in ixgbe_setup_tc
ixgbe: Add function for obtaining FCoE TC based on FCoE user priority
ixgbe: Merge FCoE set_num and cache_ring calls into RSS/DCB config
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the existing uses of random_ether_addr to
the new eth_random_addr.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change merges the ixgbe_cache_ring_fcoe and ixgbe_set_fcoe_queues
logic into the DCB and RSS initialization calls.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In upcoming patches it will become increasingly common to need to determine
the FCoE traffic class in order to determine the correct queues for FCoE.
In order to make this easier I am adding a function for obtaining the FCoE
traffic class based on the user priority.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There were cases where the prio_tc_map was not populated when we were
calling open. This will result in us incorrectly configuring the traffic
classes when DCB is enabled. In order to correct this I have updated the
code so that we now populate the values prior to allocating the q_vectors
and calling ixgbe_open.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is meant to be a generic clean-up of the remaining functions for
unpacking data from the DCB structures. The only real changes are:
replaced the variable i with tc for functions that were looping through the
traffic classes, and added a pointer for tc_class instead of path since
that way we only need to pull the pointer once instead of once per loop.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is meant to help simplify the logic for getting traffic classes
from user priorities. To do this I am adding a function named
ixgbe_dcb_get_tc_from_up that will go through the traffic classes in
reverse order in order to determine which traffic class contains a bit for
a given user priority.
Adding a declaration for this new function to the header so that
we have a centralized means for sorting out traffic classes belonging to
features such as FCoE.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are really only 3 modes that can control the number of queues. Those
are RSS, DCB, and VMDq/SR-IOV. Currently we have things much more broken
up than they need to be for how we are configuring the rings. In order to
try and straiten some of this out I am going to start merging similar
functionality into single functions. To start with I am merging the Flow
Director ring configuration into the RSS ring configuration since Flow
Director cannot function with DCB or SR-IOV.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch replaces a switch statement for an 82598 workaround with an if
statement that only applies to 82598. In addition I am pulling out several
dead pieces of code and instead of reading the SRRCTL register and then
modifying it we are just writing a value which we generate from scratch.
Finally I am also removing any drop enable related code since that was
moved to a function of its own.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The mask value for ring features was overloaded for FCoE which can lead to
some confusion. In order to avoid any confusion I am splitting the mask
value and adding an offset value. This can be used for the start of the
FCoE rings, and in the future I hope to use it to store the start of the
registers for SR-IOV.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We are currently using indices to indicate the upper limit on a ring
feature. However since we can switch back and forth on features such as
DCB and that has effects on other features such as RSS it is preferable to
instead store the upper limit separate from the current value for the
number of rings related to the feature.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It makes much more sense for us to count q_vectors instead of MSI-X
vectors. We were using num_msix_vectors to find the number of q_vectors in
multiple places. This was wasteful since we only had one place that
actually needs the number of MSI-X vectors and that is in slow path.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches. Delete
a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DCB and SR-IOV cannot currently be enabled at the same time as the queueing
schemes are incompatible. If they are both enabled it will result in Tx
hangs since only the first Tx queue will be able to transmit any traffic.
This simple fix for this is to block us from enabling TCs in ixgbe_setup_tc
if SR-IOV is enabled. This change will be reverted once we can support
SR-IOV and DCB coexistence.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/caif/caif_hsi.c
drivers/net/usb/qmi_wwan.c
The qmi_wwan merge was trivial.
The caif_hsi.c, on the other hand, was not. It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.
I did my best with that one and will ask Sjur to check it out.
Signed-off-by: David S. Miller <davem@davemloft.net>
FCoE target mode was experiencing issues due to the fact that we were
sending up data frames that were padded to 60 bytes after the DDP logic had
already stripped the frame down to 52 or 56 depending on the use of VLANs.
This was resulting in the FCoE DDP logic having issues since it thought the
frame still had data in it due to the padding.
To resolve this, adding code so that we do not pad FCoE frames prior to
handling them to the stack.
CC: <stable@vger.kernel.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/qmi_wwan.c
net/batman-adv/translation-table.c
net/ipv6/route.c
qmi_wwan.c resolution provided by Bjørn Mork.
batman-adv conflict is dealing merely with the changes
of global function names to have a proper subsystem
prefix.
ipv6's route.c conflict is merely two side-by-side additions
of network namespace methods.
Signed-off-by: David S. Miller <davem@davemloft.net>
The check for length <= 0 is bogus because length is unsigned, and network
stack never sends zero length packets (unless it is totally broken).
The check for really small packets can be optimized (using unlikely)
and calling skb_pad directly.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up the method used for determining the link speed of
devices. The old method re-wrote some logic already existing in a mac.ops
function which should be used instead. The result is much simpler to
understand and removes a strange double-check of logic, as well as reducing
code redundancy.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for 1G Fiber PHY modules (SFP+ modules). This support
comes along side support for 1G Copper PHY modules, but uses a different PHY
type (ixgbe_sfp_type_1g_sx_core).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a memory leak that was introduced in the 3.4 kernel. The
leak occurred when FCoE was enabled and traffic was passed over the FCoE
rings reserved for FCoE. The memory leak was due to us not populating the
compound page information on the order 1 pages needed for FCoE.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a potential hole when configuring the cycle counter used to
generate the nanosecond time clock. This clock is based off of the SYSTIME
registers along with the TIMINCA registers. The TIMINCA register determines
the increment to be added to the SYSTIME registers every DMA clock tick. This
register needs to be reconfigured whenever the link-speed changes. However,
the value calculated stays the same when link is down and when link is up.
Misconfiguration can occur if the link status changes due to a reset, which
causes the TIMINCA register to be reset. This reset puts the device in an
unstable state where the SYSTIME registers stop incrementing and the PTP
protocol does not function.
The solution is to double check the TIMINCA value and always reset the value
if the register is zero. This prevents a misconfiguration bug that halts the
PHC.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a potential Rx timestamp deadlock that causes the Rx
timestamping to stall indefinitely. The issue could occur when a PTP packet is
timestamped by hardware but never reaches the Rx queue. In order to prevent a
permanent loss of timestamping, the RXSTMP(L/H) registers have to be read to
unlock them. (This used to only occur when a packet that was timestamped
reached the software.) However the registers can't be read early otherwise
there is no way to correlate them to the packet.
This patch introduces a filter function which can be used to determine if a
packet should have been timestamped. Supplied with the filter setup by the
hwtstamp ioctl, check to make sure the PTP protocol and message type match the
expected values. If so, then read the timestamp registers (to free them.) At
this point check the descriptor bit, if the bit is set then we know this
packet correlates to the timestamp stored in the RXTSTAMP registers.
Otherwise, assume that packet was dropped by the hardware, and ignore this
timestamp value. However, we have at least unlocked the rxtstamp registers for
future timestamping.
Due to the way the driver handles skb data, it cannot be directly accessed. In
order to work around this, a copy of the skb data into a linear buffer is
made. From this buffer it becomes possible to read the data correctly
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When enabling the hwtstamp mode for Rx timestamping the V2 ptp event type
specific modes (Delay Request and Sync) have been rolled into the V2 all event
packet modes, in order to more accurately represent what hardware is doing.
Hardware always timestamps the Path delay packets when a V2 mode is selected,
regardless of what type was selected (in order to always support Path delay
mode). However this means the user selected modes of timestamping only Sync or
Delay Request is not truly supported. This patch correctly sets the mode for
the hwtstamp config and returns to the user that all V2 event packets will be
timestamped.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes two minor nits from Richard Cochran. The first is a case of
ambitious line wrapping that wasn't necessary. The second is to re-order the
flag checks for PPS support. Previously, the hardware test was done first, and
the interrupt flag test was done second. Now, test the interrupt flag and use
the unlikely macro.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ixgbe_sysfs.c is only needed when CONFIG_IXGBE_HWMON is configured in the
kernel.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Don Skidmore <Donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The flow control DV macros are used to calculate the flow control
high and low thresholds. This patch annotates these macros slightly
better and fixes the issues below.
The macro variables are renamed LINK to _max_frame_link and TC to
_max_frame_tc. This was to avoid confusion and make them more
readable. It was found that people auditing the code read TC to be
'traffic class' in the 802.1Q definition instead of the max frame
size of the tc. Hopefully it is clear now.
This audit also found the following real deviations from the
theoretical values. Fixed in this patch.
* I multiplied the DV calculations by (36/25) which always
evaluates to 1. This does not match the intended theoretical
value of 1.44.
* IXGBE_BT2KB added 1023 to account for rounding however this
really should be 8 * 1023 - 1 to account for division by 8k.
* x2 multiplication of max frame in DV calculations to account
for updated hardware recommendations.
With this patch the DV values are inline with the recommendations
in the 82599 and 82598 data sheets. Its worth noting I did not
see any dropped frames with flow control on in my experiments without
this patch. However aligning with the hardware specs and
recommendations seems like a good idea here to account for worst
case scenarios.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The hardware bit IXGBE_RXD_STAT_VP appears to be set even when Rx
stripping is disabled. This results in passing frames up the stack
which do not have the 802.1Q tag stripped but have the tci bits
set as if it was.
Working around this with a check for the feature flag bit. I
would welcome any better ideas or a pointer to exactly which
bits in the hardware register need to be cleared to get the
IXGBE_RXD_STAT_VP bit to be set per data sheet.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
DCB can be used independent of if RX VLAN stripping is enabled
or disabled so remove erroneous check.
Also enable or disable VLAN stripping when features are applied so
hardware and feature flags are in sync.
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update version number to better match the version of the out of tree
driver with similar functionality.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the hwmon code was initially added it was with the assumption that a
sysfs patch would be also coming soon. Since that isn't the case some
clean up needs to be done. This patch does that.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kernel software timestamping requires that the driver calls skb_tx_timestamp
just before passing the skb to the MAC, in order to provide the best software
timestamps. This patch adds this call for that support.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for the ethtool get_ts_info operation, which enables
access of available timestamp/timesync support for that device. It can query
which ptp clock device is associated with the particular port.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current value of the udelay timeout for ixgbe_disable_rx_buff is too
short. This causes the security path to not not be properly disabled during
the section that is meant to have it turned off. The end result causes a race
condition that results in RX issues.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch enables the PPS system in the PHC framework, by enabling
the clock-out feature on the X540 device. Causes the SDP0 to be set as
a 1Hz clock. Also configures the timesync interrupt cause in order to
report each pulse to the PPS via the PHC framework, which can be used
for general system clock synchronization. (This allows a stable method
for tuning the general system time via the on-board SYSTIM register
based clock.)
Signed-off-by: Jacob E Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch enables hardware timestamping for use with PTP software by
extracting a ns counter from an arbitrary fixed point cycles counter.
The hardware generates SYSTIME registers using the DMA tick which
changes based on the current link speed. These SYSTIME registers are
converted to ns using the cyclecounter and timecounter structures
provided by the kernel. Using the SO_TIMESTAMPING api, software can
enable and access timestamps for PTP packets.
The SO_TIMESTAMPING API has space for 3 different kinds of timestamps,
SYS, RAW, and SOF. SYS hardware timestamps are hardware ns values that
are then scaled to the software clock. RAW hardware timestamps are the
direct raw value of the ns counter. SOF software timestamps are the
software timestamp calculated as close as possible to the software
transmit, but are not offloaded to the hardware. This patch only
supports the RAW hardware timestamps due to inefficiency of the SYS
design.
This patch also enables the PHC subsystem features for atomically
adjusting the cycle register, and adjusting the clock frequency in
parts per billion. This frequency adjustment works by slightly
adjusting the value added to the cycle registers each DMA tick. This
causes the hardware registers to overflow rapidly (approximately once
every 34 seconds, when at 10gig link). To solve this, the timecounter
structure is used, along with a timer set for every 25 seconds. This
allows for detecting register overflow and converting the cycle
counter registers into ns values needed for providing useful
timestamps to the network stack.
Only the basic required clock functions are supported at this time,
although the hardware supports some ancillary features and these could
easily be enabled in the future.
Note that use of this hardware timestamping requires modifying daemon
software to use the SO_TIMESTAMPING API for timestamps, and the
ptp_clock PHC framework for accessing the clock. The timestamps have
no relation to the system time at all, so software must use the posix
clock generated by the PHC framework instead.
Signed-off-by: Jacob E Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the VF sends a MACVLAN request with index of zero then it is not
actually trying to add a filter. Check the index value and only
indicate that operation is not allowed when the VF is actually trying
to add a filter.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The drop enable bit can be used to improve the performance of the adapter
in the case of multiple queues being present. This performance gain is due
to the fact that some slower CPUs can cause the FIFO to backfill preventing
faster CPUs from receiving additional work. By setting the drop enable bit
we prevent this and instead just drop the packets that would have been
bound for the slower CPU.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change cleans up the logic in the priority based flow control
configuration routines. Both the 82599 and 82598 based routines perform
similar functions however they are both arranged completely differently.
This patch goes over both of them to clean up the code.
In addition I am dropping the ixgbe_fc_pfc flow control mode and instead
just replacing it with checks for if priority flow control is enabled.
This allows us to maintain some of the link flow control information which
allows for an easier transition between link and priority flow control.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Previously we would get a mailbox error and still process the message.
Instead we should exit on error.
In addition we should also be flushing the ACK of the message so that we
can guarantee that the other end is aware we have received the message
while we are processing it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conflicts:
drivers/net/ethernet/intel/e1000e/param.c
drivers/net/wireless/iwlwifi/iwl-agn-rx.c
drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
drivers/net/wireless/iwlwifi/iwl-trans.h
Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell. In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.
In e1000e a conflict arose in the validation code for settings of
adapter->itr. 'net-next' had more sophisticated logic so that
logic was used.
Signed-off-by: David S. Miller <davem@davemloft.net>
PFC stats are only tabulated when PFC is enabled. However in IEEE
mode the ieee_pfc pfc_tc bits were not checked and the calculation
was aborted.
This results in statistics not being reported through ethtool and
possible a false Tx hang occurring when receiving pause frames.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
igb and ixgbe incorrectly call netdev_tx_reset_queue() from
i{gb|xgbe}_clean_tx_ring() this sort of works in most cases except
when the number of real tx queues changes. When the number of real
tx queues changes netdev_tx_reset_queue() only gets called on the
new number of queues so when we reduce the number of queues we risk
triggering the watchdog timer and repeated device resets.
So this is not only a cosmetic issue but causes real bugs. For
example enabling/disabling DCB or FCoE in ixgbe will trigger this.
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: John Bishop <johnx.bishop@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change updates the link flow control configuration so that we
correctly set the link flow control settings for DCB. Previously we would
have to call the fc_enable call 8 times, once for each packet buffer. If
we move that logic into the fc_enable call itself we can avoid multiple
unnecessary register writes.
This change also corrects an issue in which we were only shifting the water
marks for 82599 parts by 6 instead of 10. This was resulting in us only
using 1/16 of the packet buffer when flow control was enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We can avoid many of the forward declarations found in ixgbe_common.c by
just reordering things so this patch does that to help cleanup the code.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change replaces the calls to put_page with calls to __free_page.
Since the FCoE code is able to access order 1 pages I thought it would be a
good idea to change things over to using __free_pages since that is the
preferred approach for freeing pages.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that ixgbe_fc_autoneg is a void and always sets the
current_mode. Previously if the link was down we would return an error,
however there is no harm in simply treating a link down case as a case in
which autoneg simply failed. This allows us to rely on the return value of
the ixgbe_fc_enable call now since there should be no cases where it
returns an error that would normally be ignored.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change reorders the mapping of rings to q_vectors in the case that the
number of rings exceeds the number of q_vectors. Previously we would
allocate the first R/N queues to the first q_vector where R is the number
of rings and N is the number of q_vectors. Instead of doing this we can do
a better job of interleaving the rings to the CPUs by assigning every Nth
ring to the q_vector.
The below tables illustrate this change for the R = 16 N = 4 case.
Before patch After patch
q_vector: 0 1 2 3 0 1 2 3
Rings: 0 4 8 12 0 1 2 3
1 5 9 13 4 5 6 7
3 6 10 14 8 9 10 11
4 7 11 15 12 13 14 15
This should improve the performance for both DCB or ATR when the number of
rings exceeds the number of q_vectors allocated by the adapter.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we can track instances of where a packet was
dropped due to a packet being received when there are no DMA buffers
available in the ring.
For some reason this was only being enabled with RSC, however it makes
more sense to always have this feature on so that we can track any cases
where we might drop a buffer due to an Rx ring being full.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
After this commit:
commit aacc1bea19
Author: Multanen, Eric W <eric.w.multanen@intel.com>
Date: Wed Mar 28 07:49:09 2012 +0000
ixgbe: driver fix for link flap
The BIT_APP_UPCHG bit is no longer set when ixgbe_dcbnl_set_all() is
called. This results in the FCoE app user priority never getting set
and the driver will not configure the tx_rings correctly for FCoE
packets which use the SAN MTU and FCoE offloads.
We resolve this regression by fixing ixgbe_copy_dcb_cfg() to also
check for FCoE application changes. Additionally, we can drop the
IEEE variants of get_dcb_app() because this path is never called
with the IEEE mode enabled.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It was possible for shutdown to pull the rug out from other driver entry
points. Now we just grab the rtnl lock before taking everything apart.
Thanks to Hariharan for noticing this tight race condition.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Hariharan Nagarajan <hanagara@cisco.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While testing the TCP changes I had to fix an issue in order to be able to
load and unload the module.
The recent patch that added thermal sensor support added a use after free
bug on module unload with an 82598 adapter in the system. To resolve the
issue I have updated the code so that when we free the info_kobj we set it
back to NULL.
I suspect there are likely other bugs present, but I will leave that for
another patch that can undergo more testing.
I am submitting this directly to net-next since this fixes a fairly serious
bug that will lock up the ixgbe module until the system is rebooted.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the user request for the number of VFs in the max_vfs parameter is
out of range then reset the value to the default value of zero. This
makes the behavior of the ixgbe driver the same as for the igb driver.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the host VMM administrator has set the virtual function device's
MAC address then also deny VF requests for MACVLAN filters.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Garrett, Robert <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Some of our adapters have thermal data available, this patch exports
this data via hwmon sysfs interface.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Some 82599 adapters contain thermal data that we can get to via
an i2c interface. These functions provide support to get at that
data. A following patch will export this data.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
With the support to bounce buffer added, the skb is coming as nonlinear in the
case of non-DDPed data frames for FCoE, which is mostly ok as the FCoE stack
would take care of that. However, for target mode, we have to set the FC CRC
and FC EOF field to allow the protocol stack to not drop the frame for the last
data frame of that sequence. So fix this by linearizing the skb first before
doing skb_put().
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver was freeing memory in shutdown instead of remove. As a result
we were leaking memory if IEEE DCB was enabled and we loaded/unloaded the
driver. This change moves the freeing of the memory into the remove
routine where it belongs.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch consolidates the case logic for checking whether a device supports
WoL into a single place. Previously ethtool and probe used similar logic that
was copied and maintained separately. This patch encapsulates the core logic
into a function so that a user only has to update one place.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")
The former moved around the sysctl register/unregister calls, the
later simply removed them.
With help from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes it so that we identify FCoE rings earlier than
ixgbe_set_rx_buffer_len. Instead we identify the Rx FCoE rings at
allocation time in ixgbe_alloc_q_vector.
The motivation behind this change is to avoid memory corruption when FCoE
is enabled. Without this change we were initializing the rings at 0, and
2K on systems with 4K pages, then when we bumped the buffer size to 4K with
order 1 pages we were accessing offsets 2K and 6K instead of 0 and 4K.
This was resulting in memory corruptions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Upon resume from standby, ixgbe may trigger the ASSERT_RTNL() in
netif_set_real_num_tx_queues(). The call stack is:
netif_set_real_num_tx_queues
ixgbe_set_num_queues
ixgbe_init_interrupt_scheme
ixgbe_resume
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conflicts:
drivers/net/ethernet/atheros/atlx/atl1.c
drivers/net/ethernet/atheros/atlx/atl1.h
Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
The UTA table was being set to the functional equivalent of promiscuous
mode. This was resulting in traffic from the virtual function being
flooded onto the wire and the PF device. This resulted in additional
overhead for VF traffic sent to the network and in the case of traffic
sent to the PF or another VF resulted in unwanted packets on the wire.
This was actually not the intended behavior. Now that we can program
the embedded switch correctly we can remove this snippit of code. Users
who want to support this should configure the FDB correctly using the
FDB ops.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows RAR table updates while in promiscuous. With
SR-IOV enabled it is valuable to allow the RAR table to
be updated even when in promisc mode to configure forwarding
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable FDB ops on ixgbe when in SR-IOV mode.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for I2C clock stretching which is required per
SFF-8636. Customers with passive DA cables implement clock stretching
would fail without this patch.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are times we turn of the laser before shutdown. This is a bad thing
if we want to wake on lan to work so now we make sure the laser is on
before shutdown if we support WoL.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch modifies ixgbe_get_pcie_msix_count_generic() to support
all current HW and removes the 82598 specific function.
- change the type of ixgbe_get_pcie_msix_count_generic() to u16
- include a check to make sure the maximum allowed number of vectors
is not exceeded.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix up code so that changes in DCB settings
are detected only when ixgbe_dcbnl_set_all is called.
Previously, a series of 'change' commands followed by
a call to ixgbe_dcbnl_set_all() would always be handled
as a HW change - even if the net change was zero.
This patch checks for this case of no actual change and
skips going through the HW set process.
Without this fix, the link could reset and result in
a link flap.
The core change in this patch is to check for changes
in the ixgbe_copy_dcb_cfg() routine - and return
a bitmask of detected changes. The other
places where changes were detected previously can be removed.
Signed-off-by: Eric Multanen <eric.w.multanen@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update the driver version number to better match version of out of tree
driver that has similar functionality.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This was pointed out to me by Xiaojun Zhang on Source Forge.
CC: Xiaojun Zhang <zhangxiaojun@sourceforge.net>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dan Carpenter noticed that ixgbevf initial default was different than
the rest. But the problem is broader than that, only one Intel driver (ixgb)
was doing it almost right.
The convention for default debug level should be consistent among
Intel drivers and follow established convention.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch allows us to avoid a Tx hang when SR-IOV is enabled. This hang
can be triggered by sending small packets at a rate that was triggering Rx
missed errors from the adapter while the internal Tx switch and at least
one VF are enabled.
This was all due to the fact that under heavy stress the Rx FIFO never
drained below the flow control high water mark. This resulted in the Tx
FIFO being head of line blocked due to the fact that it relies on the flow
control high water mark to determine when it is acceptable for the Tx to
place a packet in the Rx FIFO.
The resolution for this is to set the FCRTH value to the RXPBSIZE - 32 so
that even if the ring is almost completely full we can still place Tx
packets on the Rx ring and drop incoming Rx traffic if we do not have
sufficient space available in the Rx FIFO.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve namespace issues when FCoE or DCB is not enabled.
The issue is with certain configurations we end up with namespace
problems. A simple example:
ixgbe_main.c
- defines func A()
- uses func A()
ixgbe_fcoe.c
- uses func A()
ixgbe.h
- has prototype for func A()
For default (FCoE included) all is good. But when it isn't the namespace
checker complains about how func A() could be static.
To resolve this, created a ixgbe_lib file to contain functions used
by DCB/FCoE and their helper functions so that they are always in
namespace whether or not DCB/FCoE is enabled.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
This patch replaces the variable name data with the variable name features
for ixgbe_fix_features and ixgbe_set_features. This helps to make some
issues more obvious such as the fact that we were disabling Rx VLAN tag
stripping when we should have been forcing it to be enabled when DCB is
enabled.
In addition there was deprecated code present that was disabling the LRO
flag if we had the itr value set too low. I have updated this logic so
that we will now allow the LRO flag to be set, but will not enable RSC
until the rx-usecs value is high enough to allow enough time for Rx packet
coalescing.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for enabling or disabling UDP RSS via the
ethtool -N rx-flow-hash command.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch contains several fixes for formatting in regards to whitespace.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change fixes two minor issues. The first was the fact that we were
setting the return value to false twice in the set_rss_queues function.
The second is the fact that we should have been using "min_t(int," instead
of "min((int)" in set_fdir_queues.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The err_eeprom and err_sw_init tags both go to the same location. So
instead of maintaining two tags this patch combines them so we only use
err_sw_init.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change relocates the ixgbe_poll routine so it is right next to the
interrupt routine that schedules and calls it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change just cleans up some of the logic in the service_timer function
so that we can avoid unnecessary swapping of the ready value between true to
false and back to true.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that only the 2nd cache line in the ring structure
should see frequent updates. The advantage to this is that it should
reduce the amount of cross CPU cache bouncing since only the 2nd cache line
will be changing between most network transactions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we store the tx_flags and protocol information
to the tx_buffer_info structure sooner. This allows us to avoid unnecessary
read/write transactions since we are placing the data in the final location
earlier.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we always write the DMA address for the skb
itself on the same tx_buffer struct that the skb is written on. This way
we don't need the MAPPED_AS_PAGE flag and we always know it will be the
first DMA value that we will have to unmap.
In addition I have found an issue in which we were leaking a DMA mapping if
the value happened to be 0 which is possible on some platforms. In order
to resolve that I have updated the transmit path to use the length instead
of the DMA mapping in order to determine if a mapping is actually present.
One other tweak in this patch is that it only writes the olinfo information
on the first descriptor. As it turns out it isn't necessary to write it
for anything but the first descriptor so there is no need to carry it
forward.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that gso_segs and bytecount are written to the ring
sooner. This helps to simplify the logic for the two since segmentation
offloads can now update them within their own function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Instead of keeping a local copy of the skb on the stack for as long as long
as we do it makes sense to instead just place it on the first tx_buffer
structure so that we can save space on the stack and avoid unnecessary
read/write operations copying the pointer out of the stack and onto the
ring later.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A separate value was added to track Tx completions in order to determine if
the Tx unit was hung. However we can do the same thing using the number of
packets completed without having to add another stat to the Tx ring.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it more likely that the descriptor flags setup will use
cmov instructions instead of conditional jumps when setting up the flags.
The advantage to this is that the code should just flow a bit more
smoothly.
To do this it is necessary to set the TX_FLAGS_CSUM bit in tx_flags when
doing TSO so that we also do the checksum in addition to the segmentation
offload.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes certain that any packet we attempt to transmit will meet
minimum size requirements for the hardware.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to just cleanup the logic in ixgbe_change_mtu since we
are making it unnecessarily complex due to a workaround required for 82599
when SR-IOV is enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch replaces the existing Rx hot-path in the ixgbe driver with a new
implementation that is based on performing a double buffered receive. The
ixgbe driver already had something similar in place for its' packet split
path, however in that case we were still receiving the header for the
packet into the sk_buff. The big change here is the entire receive path
will receive into pages only, and then pull the header out of the page and
copy it into the sk_buff data. There are several motivations behind this
approach.
First, this allows us to avoid several cache misses as we were taking a
set of cache misses for allocating the sk_buff and then another set for
receiving data into the sk_buff. We are able to avoid these misses on
receive now as we allocate the sk_buff when data is available.
Second we are able to see a considerable performance gain when an IOMMU is
enabled because we are no longer unmapping every buffer on receive.
Instead we can delay the unmap until we are unable to use the page, and
instead we can simply call sync_single_range on the half of the page that
contains new data.
Finally we are able to drop a considerable amount of code from the driver
as we no longer have to support 2 different receive modes, packet split and
one buffer. This allows us to optimize the Rx path further since less
branching is required.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This allows the NIC to receive all frames available, including
those with bad FCS, ethernet control frames, and more.
Tested by sending frames with bad FCS.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Including bad FCS, used generate frames with bad FCS
to test other system's handling of RX of bad packets.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Disabling and enabling DCB can cause FCoE hardware initialization to
occur on the incorrect traffic class when the up2tc mapping has not
yet been reconfigured.
Fix this by using the DCB configuration maps that are correct
and will be pushed at mqprio after DCB driver setup completes
successfully.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There was a race condition in the reset path where the RX buffer
could become corrupted during Fdir configuration.This is due to
a HW bug.The fix right now is to lock the buffer while we do the
fdir configuration.Since we were using similar workaround for another bug,
I moved the existing code to a function and reused it.HW team also recommended
that IXGBE_MAX_SECRX_POLL value be changed from 30 to 40.The erratum for this
bug will be published in the next release 82599 Spec Update
Signed-off-by: Atita Shirwaikar <atita.shirwaikar@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
using the form min((int)var, ver)) is replaced by min_t(int, ...)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is clearly a typeo where we are not checking the return value from
get_link_capabilities but should. This patch corrects that.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There isn't much point in using variables to store the values of eitr_low
and eitr_high since they are not user changeable. As such I am replacing
them with the constants 10 and 20 in order to avoid any confusion on what
the values actually are.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A previous fix had gone though and disabled relaxed ordering for Rx
descriptor read fetching. This was not necessary as this functions
correctly and has no ill effects on the system.
In addition several of the defines used for the DCA control registers were
incorrect in that they indicated descriptor effects when they actually had
an impact on either data or header write back. As such I have update these
to correctly reflect either DATA or HEAD.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it a bit easier to do the loopback frame creating and
testing. Previously we were doing an and to drop the last bit, and then
dividing the frame_size by 2 in order to get locations for frame bytes and
testing. Instead we can simplify it by just shifting the register one bit
to the right and using that for the frame offsets.
This change also replaces all instances of rx_buffer_info with just
rx_buffer since that is closer to the name of the actual structure being
used and can save a few extra characters.
In addition I have updated the logic for cleaning up a test frame so that
we pass an rx_buffer instead of the sk_buff. The main motivation behind
this is changes that will replace the sk_buff with just a page in the
future.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since there are multiple spots where we have to cycle through all of the
rings on a q_vector it makes sense to just add a function for iterating
through all of them.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>