A future patch needs these functions defined earlier in the file. Move
them closer to above where they will be called.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It is possible that under rare circumstances the device is undergoing
a reset, such as when a PFLR occurs, and the device may be transmitting
simultaneously. In this case, we might attempt to divide by zero when
finding the proper r_idx. Instead, lets read the num_tx_queues once,
and make sure it's non-zero.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We've always had a really weird looping construction for resetting VFs.
We read the VFLRE register and reset the VF if the corresponding bit is
set, which makes sense. However we loop continuously until we no longer
have any bits left unset. At first this makes sense, as a sort of "keep
trying until we succeed" concept.
Unfortunately this causes a problem if we happen to surprise remove
while this code is executing, because in this case we'll always read all
1s for the VFLRE register. This results in a hard lockup on the CPU
because the loop will never terminate.
Because our own reset function will clear the VFLR event register
always, (except when we've lost PCIe link obviously) there is no real
reason to loop. In practice, we'll loop over once and find that no VFs
are pending anymore.
Lets just check once. Since we're clear the notification when we reset
there's no benefit to the loop. Additionally, there shouldn't be a race
as future VLFRE events should trigger an interrupt. Additionally, we
didn't warn or do anything in the looped case anyways.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We're doing a really convoluted bitshift and read for the PFVFLRE
register. Just reading the PFVFLRE(1), shifting it by 32, then reading
PFVFLRE(0) should be sufficient.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we load the driver, we set the last_reset to be in the future,
which delays the initial driver reset. Additionally, the service task
isn't scheduled to run automatically until the timer runs out. This
causes a needless delay of the first reset to begin talking to the
switch manager.
We can avoid this by simply not setting last_reset and immediately
scheduling the service task while in probe. This allows the device to
wake up faster, and avoids this delay.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Newer versions of GCC starting with 7 now additionally warn when a case
statement may fall through without an explicit comment mentioning it.
Add such a comment to silence the warning, as this is expected.
Unfortunately the comment must come directly before the next case
statement, so we put it outside the #ifdef. Otherwise, the compiler
cannot properly detect it and thus the warning is displayed regardless.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
New versions of GCC since version 7 began warning about possible
truncation of calls to snprintf. We can fix this and avoid false
positives. First, we should pass the full buffer size to snprintf,
because it guarantees a NULL character as part of its passed length, so
passing len-1 is simply wasting a byte of possible storage.
Second, if we make the ri and ti variables unsigned, the compiler is
able to correctly reason that the value never gets larger than 256, so
it doesn't need to warn about the full space required to print a signed
integer.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Newer versions of GCC since version 7 now warn when a case statement may
fall through without an explicit comment. "Fallthough" does not count as
it is misspelled. Fix the typos for these comments to appease the new
warnings.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In fm10k_get_host_state_generic, we check the mailbox tx_read() function
to ensure that the mailbox is still open. This function also checks to
make sure we have space to transmit another message. Unfortunately, if
we just recently sent a bunch of messages (such as enabling hundreds of
VLANs on a VF) this can result in a race where the watchdog task thinks
the link went down just because we haven't had time to process all these
messages yet.
Instead, lets just check whether the mailbox is still open. This ensures
that we don't race with the Tx FIFO, and we only link down once the
mailbox is not open.
This is safe, because if the FIFO fills up and we're unable to send
a message for too long, we'll end up triggering the timeout detection
which results in a reset. Additionally, since we still check to ensure
the mailbox state is OPEN, we'll transition to link down whenever the
mailbox closes as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Two single characters should be put into a sequence.
Thus use the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we are handling PF<->VF mailbox messages, it is possible that the
VF will send us so many messages that the PF<->SM FIFO will fill up. In
this case, we stop the loop and wait until the service event is
rescheduled.
Normally this should happen due to an interrupt. But it is possible that
we don't get another interrupt for a while and it isn't until the
service timer actually reschedules us. Instead, simply reschedule
immediately which will cause the service event to be run again as soon
as we exit.
This ensures that we promptly handle all of the PF<->VF messages with
minimal delay, while still giving time for the SM mailbox to drain.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we process VF mailboxes, the driver is likely going to also queue
up messages to the switch manager. This process merely queues up the
FIFO, but doesn't actually begin the transmission process. Because we
hold the mailbox lock during this VF processing, the PF<->SM mailbox is
not getting processed at this time. Ensure that we actually process the
PF<->SM mailbox in between each PF<->VF mailbox.
This should ensure prompt transmission of the messages queued up after
each VF message is received and handled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Make local functions static to fix
HOSTCC samples/bpf/xdp_monitor_user.o
samples/bpf/xdp_monitor_user.c:64:7: warning: no previous prototype for ‘gettime’ [-Wmissing-prototypes]
__u64 gettime(void)
^~~~~~~
samples/bpf/xdp_monitor_user.c:209:6: warning: no previous prototype for ‘print_bpf_prog_info’ [-Wmissing-prototypes]
void print_bpf_prog_info(void)
^~~~~~~~~~~~~~~~~~~
Fixes: 3ffab54602 ("samples/bpf: xdp_monitor tool based on tracepoints")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely event that mfc->mfc_un.res.ttls[i] is 255 for all
values of i from 0 to MAXIVS-1, the err is not set at all and hence
has a garbage value on the error return at the end of the function,
so initialize it to 0. Also, the error return check on err and goto
to err: inside the for loop makes it impossible for err to be zero
at the end of the for loop, so we can remove the redundant err check
at the end of the loop.
Detected by CoverityScan CID#1457207 ("Unitialized scalar value")
Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Gorski says:
====================
bcm63xx_enet: small fixes and cleanups
This patch set fixes a few theoretical issues and cleans up the code a
bit. It also adds a bit more managed function usage to simplify clock
and iomem usage.
Based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't use anyhing from that file, so drop it.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clk_disable and clk_unprepare are NULL-safe, so need to duplicate the
NULL check of the functions.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use managed functions where possible to reduce the amount of resource
handling on error and remove paths.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not rely on the shared device being probed before the enet(sw)
devices. This makes it easier to eventually move out the shared
device as a dma controller driver (what it should be).
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA controller regs actually point to DMA channel 0, so the write to
ENETDMA_CFG_REG will actually modify a random DMA channel.
Since DMA controller registers do not exist on BCM6345, guard the write
with the usual check for dma_has_sram.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the return code of prepare_enable and change one last instance of
enable only to prepare_enable. Also properly disable and release the
clock in error paths and on remove for enetsw.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is unused, and furthermore it is buggy since it suffers
from the same issue that requires IP6_ECN_set_ce() to take a pointer
to the skb so that it may (in case of CHECKSUM_COMPLETE) update skb->csum
Instead of fixing it, let's just outright remove it.
Tested: builds, and 'git grep IP6_ECN_clear' comes up empty
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different namespace application might require different time period in
second to disable Fastopen on active TCP sockets.
Tested:
Simulate following similar situation that the server's data gets dropped
after 3WHS.
C ---- syn-data ---> S
C <--- syn/ack ----- S
C ---- ack --------> S
S (accept & write)
C? X <- data ------ S
[retry and timeout]
And then print netstat of TCPFastOpenBlackhole, the counter increased as
expected when the firewall blackhole issue is detected and active TFO is
disabled.
# cat /proc/net/netstat | awk '{print $91}'
TCPFastOpenBlackhole
1
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different namespace application might require different tcp_fastopen_key
independently of the host.
David Miller pointed out there is a leak without releasing the context
of tcp_fastopen_key during netns teardown. So add the release action in
exit_batch path.
Tested:
1. Container namespace:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
2817fff2-f803cf97-eadfd1f3-78c0992b
cookie key in tcp syn packets:
Fast Open Cookie
Kind: TCP Fast Open Cookie (34)
Length: 10
Fast Open Cookie: 1e5dd82a8c492ca9
2. Host:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
107d7c5f-68eb2ac7-02fb06e6-ed341702
cookie key in tcp syn packets:
Fast Open Cookie
Kind: TCP Fast Open Cookie (34)
Length: 10
Fast Open Cookie: e213c02bf0afbc8a
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'publish' logic is not necessary after commit dfea2aa654 ("tcp:
Do not call tcp_fastopen_reset_cipher from interrupt context"), because
in tcp_fastopen_cookie_gen,it wouldn't call tcp_fastopen_init_key_once.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different namespace application might require enable TCP Fast Open
feature independently of the host.
This patch series continues making more of the TCP Fast Open related
sysctl knobs be per net-namespace.
Reported-by: Luca BRUNO <lucab@debian.org>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot says:
====================
net: dsa: change dsa_ptr for a dsa_port
With DSA, a master net_device is physically wired to a dedicated CPU
switch port. For interaction with the DSA layer, the struct net_device
contains a dsa_ptr, which currently points to a dsa_switch_tree object.
This is only valid for a switch fabric with a single CPU port. In order
to support switch fabrics with multiple CPU ports, we first need to
change the type of dsa_ptr to what it really is: a dsa_port object.
This is what this patchset does. The first patches adds a
dsa_master_get_slave helper and cleans up portions of DSA core to make
the next patches more readable. These next patches prepare the xmit and
receive hot paths and finally change dsa_ptr.
Changes in v2:
- introduce dsa_master_get_slave helper to simplify patch 6
- keep hot path data at beginning of dsa_port for cacheline 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the dsa_ptr is a dsa_port instance, there is no need to keep
the tag operations in the dsa_switch_tree structure. Remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With DSA, a master net device (CPU facing interface) has a dsa_ptr
pointer to which hangs a dsa_switch_tree. This is not correct because a
master interface is wired to a dedicated switch port, and because we can
theoretically have several master interfaces pointing to several CPU
ports of the same switch fabric.
Change the master interface's dsa_ptr for the CPU dsa_port pointer.
This is a step towards supporting multiple CPU ports.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to make DSA master devices point to their corresponding
CPU port instead of the whole tree, add copies of dst and rcv in the
dsa_port structure so that we keep fast access in the receive hot path.
Also keep the copies at the beginning of the dsa_port structure in order
to ensure they are available in cacheline 1.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA tagging protocol operations are specific to each CPU port,
thus the dsa_device_ops pointer belongs to the dsa_port structure.
>From now on assign a slave's xmit copy from its CPU port tagging
operations. This will ease the future support for multiple CPU ports.
Also keep the tag_ops at the beginning of the dsa_port structure so that
we ensure copies for hot path are in cacheline 1.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When resolving the DSA tagging protocol used by a CPU switch, use a
temporary "tag_ops" variable to store the dsa_device_ops instead of
using directly dst->tag_ops. This will make the future patches moving
this pointer around easier to read.
There is no functional changes.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it clear that the master device is linked to a CPU port by using
"cpu_dp" for the dsa_port variable in master.c instead of "port", then
use a "port" variable to describe the port index, as usually seen in
other places of DSA core.
This will make the future patch touching dsa_ptr more readable. There is
no functional changes.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA tagging code does not need to know about the DSA architecture,
it only needs to return the slave device corresponding to the source
port index (and eventually the source device index for cascade-capable
switches) parsed from the frame received on the master device.
For this purpose, provide an inline dsa_master_get_slave helper which
validates the device and port indexes and look up the slave device.
This makes the tagging rcv functions more concise and robust, and also
makes dsa_get_cpu_port obsolete.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pointer ndev is being dereferenced with the call to netif_running
before it is being null checked. Re-order the code to only dereference
ndev after it has been null checked.
Detected by CoverityScan, CID#1457206 ("Dereference before null check")
Fixes: 9df8f79a4d ("net: hns3: Add DCB support when interacting with network stack")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the numbers of events for stop_queue and wake_queue in
ethtool stats.
Example:
ethtool -S eth0
NIC statistics:
...
stop_queue: 7
wake_queue: 7
...
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses u64_to_user_ptr() to cast info.map_ids to a userspace ptr.
It also tags the user_map_ids with '__user' for sparse check.
Fixes: cb4d2b3f03 ("bpf: Add name, load_time, uid and map_ids to bpf_prog_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assignment of -EINVAL to variable ret is redundant as it
is being overwritten on the following error exit paths or
to the return value from the following call to basic_set_parms.
Fix this up by removing it. Cleans up clang warning message:
net/sched/cls_basic.c:185:2: warning: Value stored to 'err' is never read
Fixes: 1d8134fea2 ("net_sched: use idr to allocate basic filter handles")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ipmr_notifier_init is local to the source and does
not need to be in global scope, so make it static.
Cleans up sparse warning:
warning: symbol 'ipmr_notifier_init' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
State is initially reported as UNKNOWN. Before register call
netif_carrier_off(). Once the device is opened, call netif_carrier_on() in
order to set the state to UP.
Signed-off-by: Mick Tarsel <mjtarsel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e85ec74ace ("net: dsa: bcm_sf2: Defer port
enabling to calling port_enable") because this now makes an unbind
followed by a bind to fail connecting to the ingrated PHY.
What this patch missed is that we need the PHY to be enabled with
bcm_sf2_gphy_enable_set() before probing it on the MDIO bus. This is
correctly done in the ops->setup() function, but by the time
ops->port_enable() runs, this is too late. Upon unbind we would power
down the PHY, and so when we would bind again, the PHY would be left
powered off.
Fixes: e85ec74ace ("net: dsa: bcm_sf2: Defer port enabling to calling port_enable")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-09-29
This series contains updates to i40e and i40evf only.
Jake provides several of the changes starting with the renaming of a
variable to clarify what the value is actually calculating. Found we
were misusing the __I40E_RECOVERY_PENDING bit to determine when we
should actually request a new IRQ in i40e_setup_misc_vector(), which
lead to a design mistake, so to resolve the issue, use a separate
state bit for miscellaneous IRQ setup and fix up the design while we
are at it. Cleaned up the old legacy PM support in the driver since
we support the newer generic PM callbacks. Fixed a failure to
hibernate issue, where on some platforms with a large number of CPUs,
we would allocate many IRQ vectors which we would try to migrate to
CPU0 when hibernating.
Sudheer cleans up a check for unqualified module inside i40e_up_complete()
because the link state information is in flux at time, so log messages
are getting logged with incorrect link state information. Also provided
additional log message cleanups and simplify member variable access in
the printing of the link messages.
Mariusz relaxes the firmware check since Fortville and Fort Park NICs
can and do have different firmware versions, so only warn for older
Fortville firmware. Fixed an errata with a flow director statistic that
was not wrapping as expected, simply reset after reading.
Mitch prevents consternation by lowering the log level to debug on a
message seen regularly on VF reset or unload, which is meaningless under
normal circumstances. Refactor the firmware version checking since
Fortville and Fort Park devices can have different firmware versions.
Alan fixes a ring to vector mapping, where the past implementation
attempted to map each Tx and Rx ring to its own vector, however we use
combined queues so we should be mapping the Tx/Rx rings together on one
vector. Adds the ability for the VF to request a different number of
queues allocated to it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The check on len is redundant as it is always greater than 1,
so just remove it and make the printk less complex.
Detected by CoverityScan, CID#1226729 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable
even on IPv6 sockets.
However, it turns out it is perfectly reasonable to want to set freebind
on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP
socket option on such a socket (they're all blindly errored out).
One use case for this is to allow spoofing src ip on a raw socket
via sendmsg cmsg.
Tested:
built, and booted
# python
>>> import socket
>>> SOL_IP = socket.SOL_IP
>>> SOL_IPV6 = socket.IPPROTO_IPV6
>>> IP_FREEBIND = 15
>>> IPV6_FREEBIND = 78
>>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
>>> s.getsockopt(SOL_IP, IP_FREEBIND)
0
>>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
0
>>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1)
>>> s.getsockopt(SOL_IP, IP_FREEBIND)
1
>>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
1
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NS for DAD are sent on admin up as long as a valid qdisc is found.
A race condition exists by which these packets will not egress the
interface if the operational state of the lower device is not yet up.
The solution is to delay DAD until the link is operationally up
according to RFC2863. Rather than only doing this, follow the existing
code checks by deferring IPv6 device initialization altogether. The fix
allows DAD on devices like tunnels that are controlled by userspace
control plane. The fix has no impact on regular deployments, but means
that there is no IPv6 connectivity until the port has been opened in
the case of port-based network access control, which should be
desirable.
Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e driver now supports two different devices with two different
firmware versions. So be smart about how we handle these. Move the FW
version macros to the appropriate header file, and add a convenience
macro that checks the version based on the device. Then use this macro
to check whether or not the driver can use the new link info API.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently the PF allocates a default number of queues for each VF and
cannot be changed. This patch enables the VF to request a different
number of queues allocated to it. This patch also adds a new virtchnl
op and capability flag to facilitate this negotiation.
After the PF receives a request message, it will set a requested number
of queues for that VF. Then when the VF resets, its VSI will get a new
number of queues allocated to it.
This is a best effort request and since we only allocate a guaranteed
default number, if the VF tries to ask for more than the guaranteed
number, there may not be enough in HW to accommodate it unless other
queues for other VFs are freed. It should also be noted decreasing the
number queues allocated to a VF to below the default will NOT enable the
allocation of more than 32 VFs per PF and will not free queues guaranteed
to each VF by default.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current implementation for mapping queues to vectors is broken
because it attempts to map each Tx and Rx ring to its own vector,
however we use combined queues so we should actually be mapping the
Tx/Rx rings together on one vector.
Also in the current implementation, in the case where we have more
queues than vectors, we attempt to group the queues together into
'chunks' and map each 'chunk' of queues to a vector. Chunking them
together would be more ideal if, and only if, we only had RSS because of
the way the hashing algorithm works but in the case of a future patch
that enables VF ADq, round robin assignment is better and still works
with RSS.
This patch resolves both those issues and simplifies the code needed to
accomplish this. Instead of treating the case where we have more queues
than vectors as special, if we notice our vector index is greater than
vectors, reset the vector index to zero and continue mapping. This
should ensure that in both cases, whether we have enough vectors for
each queue or not, the queues get appropriately mapped.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On some platforms with a large number of CPUs, we will allocate many IRQ
vectors. When hibernating, the system will attempt to migrate all of the
vectors back to CPU0 when shutting down all the other CPUs. It is
possible that we have so many vectors that it cannot re-assign them to
CPU0. This is even more likely if we have many devices installed in one
platform.
The end result is failure to hibernate, as it is not possible to
shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our
interrupt scheme when the device is suspended. A more ideal solution
would be some method for the stack to properly handle this for all
drivers, rather than on a case-by-case basis for each driver to fix
itself.
However, until this more ideal solution exists, we can do our part and
shutdown our IRQs during suspend, which should allow systems with
a large number of CPUs to safely suspend or hibernate.
It may be worth investigating if we should shut down even further when
we suspend as it may make the path cleaner, but this was the minimum fix
for the hibernation issue mentioned here.
Testing-hints:
This affects systems with a large number of CPUs, and with multiple
devices enabled. Without this change, those platforms are unable to
hibernate at all.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Although the service task does check the suspended status before
running, it might already be part way through running when we go to
suspend. Lets ensure that the service task is stopped and will not be
restarted again until we finish resuming. This ensures that service task
code does not cause strange interactions with the suspend/resume
handlers.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>