If vPort has VLAN_RESTRICT flag, VLAN tagged traffic will not be
delivered without corresponding Rx filters which may be proxied to and
moderated by hypervisor.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If should be done after net_dev->hw_features initialization, to keep the
feature there to be able to enable it later using ethtool.
VLAN filtering is enforced and fixed if vPort requires usage of VLAN
filters to receive tagged traffic.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If it is not supported we simply disable the feature.
For the feature to work we need firmware filter support for
OUTER_VID + LOC_MAC and for OUTER_VID + LOC_MAC_IG.
The low-latency firmware can match on OUTER_VID + LOC_MAC but not on
OUTER_VID + LOC_MAC_IG.
For the capture packet firmware it is the other way around.
Only the full-feature variant can match on both combinations.
Incorporates a fix by Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
in the net_dev->[hw_]features handling.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Filter match flags are not unique criteria to be mapped to priority
because of both unknown unicast and unknown multicast are mapped to
LOC_MAC_IG. So, local MAC is required to map filter to priority.
MCDI filter flags is unique criteria to find filter priority.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nearly every time we call efx_ef10_filter_remove_unsafe, we first check
for EFX_EF10_FILTER_ID_INVALID, in which case we do nothing. So move
that check into the function, simplifying all the call sites.
Also, change the return type to void, since none of the callers check it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When trying to enslave an SFC interface to a bond the following BUG_ON was
hit:
kernel BUG [in ef10.c]!
CPU: 0 PID: 4383 Comm: ifenslave Tainted: G
...
Call Trace:
efx_ef10_filter_add_vlan+0x121/0x180 [sfc]
efx_ef10_filter_table_probe+0x2a2/0x4f0 [sfc]
efx_ef10_set_mac_address+0x370/0x6d0 [sfc]
efx_set_mac_address+0x7d/0x120 [sfc]
dev_set_mac_address+0x43/0xa0
bond_enslave+0x337/0xea0 [bonding]
This comes from function efx_ef10_filter_vlan_sync_rx_mode.
To solve the bug we ensure the mac_lock is taken before calling
efx_ef10_filter_add_vlan. But to avoid a priority inversion mac_lock must
be taken before filter_sem.
To satisfy these requirements we end up taking mac_lock in
efx_ef10_vport_set_mac_address, efx_ef10_set_mac_address,
efx_ef10_sriov_set_vf_vlan and efx_probe_filters.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Supports HW VLAN filtering, en/disabled using ethtool.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now it contains dummy VLAN entry with unspecified VID only.
The entry is used for the case when HW VLAN filtering is not used.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These flags are built when address cache is updated.
The information will be required when VLAN filtering is added and address
cache is used without re-sync.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a step to support VLAN filtering in HW.
Until then, there is only one struct efx_ef10_filter_vlan per struct
efx_ef10_filter_table, with no VLAN information yet.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is required to remove setting of filter IDs to invalid from multicast
and unicast addresses caching functions.
Add initialization to invalid when filter table is created.
Add paranoid checks to track consistency.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on a patch by Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It allows to change set of fixed features on datapath reset.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is used for EF10 only and logically belongs to EF10 filter table state.
It is OK that it is reset to false on filter table recreation since all
filters are removed on destruction.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7000-series SFC NICs connected with an SFP+ module currently fail to
report any supported link speeds.
Reported-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we get confused when two flows on different channels get the
same flow ID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
use the old ones, which aren't there any more.
Fixes: 183233bec8 "sfc: Allocate and link PIO buffers; map them with write-combining"
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When certain firmware variants are selected (via the sfboot utility) the
SFC7000 and SFC8000 series NICs don't support RSS. The driver still
tries (and fails) to insert filters with the RSS flag, and the NIC fails
to pass traffic.
When the firmware reports RSS_LIMITED suppress allocating a default RSS
context. The absence of an RSS context is picked up in filter insertion
and RSS flags are discarded.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I added this check in setup_tc to multiple drivers,
if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
Unfortunately restricting to TC_H_ROOT like this breaks the old
instantiation of mqprio to setup a hardware qdisc. This patch
relaxes the test to only check the type to make it equivalent
to the check before I broke it. With this the old instantiation
continues to work.
A good smoke test is to setup mqprio with,
# tc qdisc add dev eth4 root mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7
Fixes: e4c6734eaa ("net: rework ndo tc op to consume additional qdisc handle paramete")
Reported-by: Singh Krishneil <krishneil.k.singh@intel.com>
Reported-by: Jake Keller <jacob.e.keller@intel.com>
CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.
This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.
This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.
CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on configuration the NIC may return errors for unprivileged
functions and/or VFs. Where these are expected and handled, reduce the
level of any output.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running in an unprivileged function we expect some MC commands
to fail with permission errors. To avoid log spew downgrade these to
debug only.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are situations - mostly reset related - where our view of the
filter table differs from the hardware. In this case we may try and
remove filters that aren't actually installed. This isn't that
interesting in most situations, so downgrade the logging.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For unprivileged functions operations can be authorised by an admin
function. Extra steps are introduced to the MCDI protocol in this
situation - the initial response from the MCDI tells us that the
operation has been deferred, and we must retry when told. We then
receive an event telling us to retry.
Note that this provides only the functionality for the unprivileged
functions, not the handling of the administrative side.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After reboot the vswitch configuration from the PF may not be
complete before the VF attempts to restore filters. In that
case we see NO_EVB_PORT errors from the MC. Retry up to a time
limit or until a different result is seen.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/geneve.c
Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.
Signed-off-by: David S. Miller <davem@davemloft.net>
These netif flags are unnecessary convolutions. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.
This patch also:
- Cleans up can_checksum_protocol
- Simplifies netdev_intersect_features
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the
set of features for offloading all checksums. This is a mask of the
checksum offload related features bits. It is incorrect to set both
NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for
features of a device.
This patch:
- Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where
NETIF_F_ALL_CSUM is being used as a mask).
- Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to
use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We test for if "tries" is zero at the end but "tries--" is a post-op so
it will end with "tries" set to -1. I have changed it to a pre-op
instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this, filter insertion on a VF would fail if only one channel
was in use. This would include the unicast station filter and therefore
no traffic would be received.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A change in MCFW behaviour means that the net driver must update its record
of the warm_boot_count by reading it from the ER_DZ_BIU_MC_SFT_STATUS
register.
On v4.6.x MCFW the global boot count was incremented when some functions
needed to be reset to enable multicast chaining, so all functions saw the
same value. In that case, the driver needed to increment its
warm_boot_count when other functions were reset, to avoid noticing it later
and then trying to reset itself to recover unnecessarily.
With v4.7+ MCFW, the boot count in firmware doesn't change as that is
unnecessary since the PFs that have been reset will each receive an MC
reboot notification. In that case, the driver re-reads the unchanged
value.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't open-code it.
CC: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also add support for 7000 series 40G NIC VF.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Solarflare 8000 series NIC will use a new TSO scheme. The current
driver refuses to load if the current TSO scheme is not found. Remove
that check and instead make the TSO version a per-queue parameter.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI drivers no longer need to observe a particular protocol
to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
napi_hash_add() and napi_hash_del() are automatically called
from core networking stack, respectively from
netif_napi_add() and netif_napi_del()
This patch depends on free_netdev() and netif_napi_del() being
called from process context, which seems to be the norm.
Drivers might still prefer to call napi_hash_del() on their
own, since they might combine all the rcu grace periods into
a single one, knowing their NAPI structures lifetime, while
core networking stack has no idea of a possible combining.
Once this patch proves to not bring serious regressions,
we will cleanup drivers to either remove napi_hash_del()
or provide appropriate rcu grace periods combining.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.
skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().
Few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to drivers
maintainers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This pci_error_handlers structure is never modified, like all the other
pci_error_handlers structures, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
dma_set_mask already checks for a supported DMA mask before updating it,
the call to dma_supported is redundant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
Changes of note:
1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.
2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
David Ahern.
3) Allow the user to ask for the statistics to be filtered out of
ipv4/ipv6 address netlink dumps. From Sowmini Varadhan.
4) More work to pass the network namespace context around deep into
various packet path APIs, starting with the netfilter hooks. From
Eric W Biederman.
5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
Richter.
6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.
7) Support Very High Throughput in wireless MESH code, from Bob
Copeland.
8) Allow setting the ageing_time in switchdev/rocker. From Scott
Feldman.
9) Properly autoload L2TP type modules, from Stephen Hemminger.
10) Fix and enable offload features by default in 8139cp driver, from
David Woodhouse.
11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
Jiri Benc.
12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
Opstad.
13) Fix IPSEC flowcache overflows on large systems, from Steffen
Klassert.
14) Convert bridging to track VLANs using rhashtable entries rather than
a bitmap. From Nikolay Aleksandrov.
15) Make TCP listener handling completely lockless, this is a major
accomplishment. Incoming request sockets now live in the
established hash table just like any other socket too.
From Eric Dumazet.
15) Provide more bridging attributes to netlink, from Nikolay
Aleksandrov.
16) Use hash based algorithm for ipv4 multipath routing, this was very
long overdue. From Peter Nørlund.
17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann.
18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.
19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This
influences the port binding selection logic used by SO_REUSEPORT.
20) Add ipv6 support to VRF, from David Ahern.
21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.
22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.
23) Implement RACK loss recovery in TCP, from Yuchung Cheng.
24) Support multipath routes in MPLS, from Roopa Prabhu.
25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
Dumazet.
26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
Sudarsana Kalluru.
27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
Sowa.
28) Support ipv6 geneve tunnels, from John W Linville.
29) Add flood control support to switchdev layer, from Ido Schimmel.
30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
Hannes Frederic Sowa.
31) Support persistent maps and progs in bpf, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
sh_eth: use DMA barriers
switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
net: sched: kill dead code in sch_choke.c
irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
net: dsa: mv88e6xxx: include DSA ports in VLANs
net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
net/core: fix for_each_netdev_feature
vlan: Invoke driver vlan hooks only if device is present
arcnet/com20020: add LEDS_CLASS dependency
bpf, verifier: annotate verbose printer with __printf
dp83640: Only wait for timestamps for packets with timestamping enabled.
ptp: Change ptp_class to a proper bitmask
dp83640: Prune rx timestamp list before reading from it
dp83640: Delay scheduled work.
dp83640: Include hash in timestamp/packet matching
ipv6: fix tunnel error handling
net/mlx5e: Fix LSO vlan insertion
net/mlx5e: Re-eanble client vlan TX acceleration
net/mlx5e: Return error in case mlx5e_set_features() fails
net/mlx5e: Don't allow more than max supported channels
...
Pull locking changes from Ingo Molnar:
"The main changes in this cycle were:
- More gradual enhancements to atomic ops: new atomic*_read_ctrl()
ops, synchronize atomic_{read,set}() ordering requirements between
architectures, add atomic_long_t bitops. (Peter Zijlstra)
- Add _{relaxed|acquire|release}() variants for inc/dec atomics and
use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
This enables weakly ordered architectures (such as arm64) to make
use of more locking related optimizations. (Davidlohr Bueso)
- Implement atomic[64]_{inc,dec}_relaxed() on ARM. (Will Deacon)
- Futex kernel data cache footprint micro-optimization. (Rasmus
Villemoes)
- pvqspinlock runtime overhead micro-optimization. (Waiman Long)
- misc smaller fixlets"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
locking/rwsem: Use acquire/release semantics
locking/mcs: Use acquire/release semantics
locking/rtmutex: Use acquire/release semantics
locking/mutex: Use acquire/release semantics
locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
atomic: Implement atomic_read_ctrl()
atomic, arch: Audit atomic_{read,set}()
atomic: Add atomic_long_t bitops
futex: Force hot variables into a single cache line
locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
locking/osq: Relax atomic semantics
locking/qrwlock: Rename ->lock to ->wait_lock
locking/Documentation/lockstat: Fix typo - lokcing -> locking
locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the IP stack passes SKBs the sfc driver puts them in 2 different TX
queues (called partners), one for checksummed and one for not checksummed.
If the SKB has xmit_more set the driver will delay pushing the work to the
NIC.
When later it does decide to push the buffers this patch ensures it also
pushes the partner queue, if that also has any delayed work. Before this
fix the work in the partner queue would be left for a long time and cause
a netdev watchdog.
Fixes: 70b33fb ("sfc: add support for skb->xmit_more")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reduces the overhead of locking for busy poll.
Previously the state was protected by a lock, whereas now
it's manipulated solely with atomic operations.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On EF10, MC_CMD_VPORT_RECONFIGURE can cause a CODE_MC_REBOOT event
to be sent to a function without incrementing the (adapter-wide)
warm_boot_count. In this case, the reboot is not detected by the
loop on efx_mcdi_poll_reboot(), so prepare for recovery from an MC
reboot anyway. When this codepath is run, the MC has always just
rebooted, so this recovery is valid.
The loop on efx_mcdi_poll_reboot() is still required for other MC
reboot cases, so that actions in response to an MC reboot are
performed, such as clearing locally calculated statistics.
Siena NICs are unaffected by this change as the above scenario
does not apply.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sfc driver internally uses a time format based on 32-bit (unsigned)
seconds and 32-bit nanoseconds. This means it will overflow in 2106,
but the value we pass into it is a signed 32-bit tv_sec that already
overflows in 2038 to a negative value.
This patch changes the logic to use the lower 32 bits of the timespec64
tv_sec in efx_ptp_ns_to_s_ns, which will have the correct value beyond the overflow.
While this does not change any of the register values, it lets us
keep using the driver after we deprecate the use of the timespec type
in the kernel.
In the efx_ptp_process_times function, the change to use timespec64
is similar, in that the tv_sec portion is ignored anyway and we only
care about the nanosecond portion that remains unchanged.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The pps_event_time uses two 'timespec' structures internally, which
suffer from the y2038 problem. The uses of this structure are
fairly self-contained in the pps code, so this replaces them all at
once.
Unfortunately, this includes the sfc ethernet driver aside from the
pps subsystem, so we change that one as well. Both touch the
same data structure, and there probably is no good way to split
the patch into smaller units.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
After commit:
654672d4ba ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
Architectures may only provide {cmp,}xchg_relaxed definitions in
asm/cmpxchg.h. Other variants, such as {cmp,}xchg, may be built in
linux/atomic.h, which means simply including asm/cmpxchg.h may not get
the definitions of all the{cmp,}xchg variants.
Therefore, we should privatize the inclusions of asm/cmpxchg.h to
keep it only included in arch/* and replace the inclusions outside
with linux/atomic.h
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aybuke Ozdemir <aybuke.147@gmail.com>
Cc: Chris Brannon <chris@the-brannons.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirk Reiser <kirk@reisers.ca>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Shradha Shah <sshah@solarflare.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Hubbs <w.d.hubbs@gmail.com>
Cc: devel@driverdev.osuosl.org
Cc: linux-net-drivers@solarflare.com
Cc: speakup@linux-speakup.org
Link: http://lkml.kernel.org/r/1440589966-26280-1-git-send-email-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Previously, the driver would refuse to load if it couldn't secure
enough VIs from the MC to fulfill its RSS requirements.
This was causing probe to fail on later functions in
configurations where we'd run out of VIs, such as having many
VFs.
This change allows the driver to load with fewer VIs, down to a
minimum of 2. A warning will be printed saying that RSS
requirements were not met, possibly affecting performance.
efx->max_tx_channels needs to be set to avoid going down the
failure path in efx_probe_nic() immediately in the loop after the
probe() NIC-type function.
Also, Set rc=ENOSPC when bombing out of efx_probe_nic due to lack
of VIs.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the stats handling code differs based on SR-IOV support,
and SRIOV support is only available if full-featured firmware is
used.
Do not use vadaptor stats if firmware mode is not set to
full-featured.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The policy in the net driver is to attempt MCDI commands and
then handle any EPERM error codes appropriately when returned
by unprivileged functions.
The ethtool selftest contains some tests which are useful on
an unprivileged function, such as the event queue interrupt
tests, but other tests cannot be performed as the function
does not have the required permissions.
If a test returns -EPERM, act as though the test was not run
and continue.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate functions for inserting individual and promisc filters; explicit
fallback logic in efx_ef10_filter_sync_rx_mode(), in order not to overload
the 'promisc' flag as also meaning "fall back to promisc".
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the workaround to support cascaded multicast filters ("workaround_26807") is
enabled, the broadcast filter and individual multicast filters are not inserted
when in promiscuous or allmulti mode.
There is a race while inserting and removing filters when entering and leaving
promiscuous mode. When changing promiscuous state with cascaded multicast
filters, the old multicast filters are removed before inserting the new filters
to avoid duplicating packets; this can lead to dropped packets until all
filters have been inserted.
The efx_nic:mc_promisc flag is added to record the presence of a multicast
promiscuous filter; this gives a simple way to tell if the promiscuous state is
changing.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is only re-factoring; there are no changes to functionality
except for a slight elaboration of an error message (on mismatch filter
insertion failure).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a function is in promiscuous mode and another function has a broadcast or
multicast filter inserted, the function in promiscuous mode won't see that
broadcast or multicast traffic.
Most notably this breaks broadcast, which means ARP doesn't work. Less
show-stoppingly, a function listening on a multicast address that's also in
promiscuous mode will not see that multicast traffic if another function is
also listening on that multicast address.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When enabling the workaround for cascaded multicast filters, the MC
can reset other functions if they have already inserted filters.
In that case, the workaround has been enabled, but print an info
message in the log recording that other functions had to be reset.
As other functions were reset, the MC will have incremented its boot
count, so also increment the warm_boot_count on the function which
enabled the workaround, as that function won't have received an MC
reboot event and does not need to reset.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial use of this will be to check a flag reporting if an FLR was
performed on other functions when enabling cascaded multicast filters.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GET_WORKAROUNDS was only introduced in May 2014, not all firmware
will have it. So call sites need to handle ENOSYS.
In this case we're probing the bug26807 workaround, which is not
implemented in any firmware that doesn't have GET_WORKAROUNDS.
So interpret ENOSYS as 'false'.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After creating event queue 0, check to see if the workaround is enabled,
and enable it if necessary. This will be called during PCI probe and
also when coming back up after a reset. The nic_data->workaround_26807
will be used in the future to control the filter insertion behaviour
based on this workaround.
Only the primary PF can enable this workaround, so tolerate an EPERM
error and continue. Otherwise, if any step in the checking and enabling
of the workaround fails, the event queue must be removed.
We check that workaround is implemented before trying to enable it,
and store the current workaround setting before trying to change it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
CC: Shradha Shah <sshah@solarflare.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The limit for BQL is updated each time we call
netdev_tx_completed_queue.
Without this patch the BQL limit was updated for every TX event we
see.
The issue was that this only updated the limit to handle the data
we complete in two events as the first event wouldn't show that
enough traffic had been processed between them.
This was OK when interrupt moderation was off but not when it was
on as more data had to be completed in a single interrupt.
The patch changes this so that we do report the completion to BQL
only when all the TX events in the interrupt have been processed.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch avoids the double up_write to filter_sem if
efx_net_open() fails.
Resolves: 2d432f20d2
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some versions of MCFW do not support the MC_CMD_VADAPTOR_SET_MAC
command, and ENOSYS will be returned.
If the PF created its own vport, the function's datapath must be
stopped and the vport can be reconfigured to reflect the new MAC
address.
If the MCFW created the vport for the PF (which is the case when
the nic_data->vport_mac is blank), nothing further needs to be
done as the vport is not under the control of the PF.
This only applies to PFs because the MCFW in question does not
support VFs.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-organize the structure of error handling to avoid having
to duplicate the netif_err() around the ifdefs.
The only change to the behaviour of the error-handling is that
the PF's data structure to record VF details should only be
updated if the original command succeeded.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
Without this change, modprobe -r sfc hits the BUG_ON() in
efx_pci_remove_main().
Fixes: e7fef9b45a ("sfc: add sysfs entry to control MCDI tracing")
Reported-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If any VF is assigned as the PF is unloaded, do not attempt to
remove its vport or the vswitch. These will be removed if the
driver binds to the PF again, as an entity reset occurs during
probe.
A 'force' flag is added to efx_ef10_pci_sriov_disable() to
distinguish between disabling SR-IOV and driver unload.
SR-IOV cannot be disabled if VFs are assigned to guests.
If the PF driver is unloaded while VFs are assigned, the driver
may try to bind to the VF again at a later point if the driver
has been reloaded and the VF returns to the same domain as the PF.
In this case, the PF will not have a VF data structure, so the VF
can check this and drop out of probe early.
In this case, efx->vf_count will be zero but VFs will be present.
The user is advised to remove the VF and re-create it. The check
at the beginning of efx_ef10_pci_sriov_disable() that
efx->vf_count is non-zero is removed to allow SR-IOV to be
disabled in this case. Also, if the PF driver is unloaded, it
will disable SR-IOV to remove these unknown VFs.
By not disabling bus-mastering if VFs are still assigned, the VF
will continue to pass traffic after the PF has been removed.
When using the max_vfs module parameter, if VFs are already
present do not try to initialise any more.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver unloads, force the unbind and removal of any
VFs in the host with the PF. The PF cannot remove vports and
vswitches if they are still being used by a VF driver, and when
unloading the sfc driver the removal order is not guaranteed,
so the instruction from the PF to the VF to unbind enforces a
suitable ordering so that vswitches and vports can be removed.
As a result of this, manually unbinding the driver from a single
PF will result in all of its VFs in the host also being removed.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ifenslave command to set up a bond runs in an atomic
context, and it queries the stats on the devices that are
being enslaved. A VF needs to make an MCDI call to update
its stats, which is not allowed in atomic context.
The releasing of the stats_lock is moved to the beginning of
the VF stats update function so that in_interrupt() can be
used; it must be taken again before returning from this
function.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The raw_mask array is not initialised, so it needs to be
explicitly set to zero in the 'else' branch.
If the EVB capability is not present, a port cannot have multiple
functions so the per-port MAC stats are correct and should match
the corresponding vadaptor stats, so this redundancy can be
removed from the ethtool stats output.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MC_CMD_MAC_STATS can be called on a function before a
vadaptor has been created, as the kernel can call into this
through ndo_get_stats/ndo_get_stats64.
If MC_CMD_MAC_STATS is called before the DMA queues have been
setup, so that a vadaptor has not been created yet, firmware
will return ENOENT. This is expected, so suppress the MCDI
error message in this case.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netdevice statistics (in /proc/net/dev) are per-function
stats so they must use the vadaptor stats. Change the use of
MAC stats to vadaptor stats, and remove any statistics that
can only be measured per-port. All stats that are removed
will be shown as zeroes when these statistics are displayed.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware does not support a periodic DMA of vadaptor-stats
on VFs, so only update the stats buffer when stats are
requested (when running "ethtool -S" or an ip/ifconfig
command that reports stats).
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All interfaces will display vadaptor statistics, so set all the
relevant bits in the stats bitmask. Only functions with the
LINKCTRL flag will see other stats, including (per-port) MAC stats.
The vadaptor stats are from rx_unicast to tx_overflow.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port-id must be known so that the RMON level can be
set for the collection of vadapter stats.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC stats are per-port and will only be displayed on the PF
with control of the link (one per physical port). Vadapter stats
will also be displayed for this PF, so distinguish the MAC stats
by adding a prefix of "port_".
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On every adapter there will be one primary PF per adaptor and
one link control PF per port.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is a stylistic change and does not affect
functionality.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case where we have multiple functions (PFs and VFs), this
sysfs entry is useful to identify the physical port corresponding
to the function we are interested in.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/phy/amd-xgbe-phy.c
drivers/net/wireless/iwlwifi/Kconfig
include/net/mac80211.h
iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.
The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
When Rx packet data must be dropped, all the buffers
associated with that Rx packet must be freed. Extend
and rename efx_free_rx_buffer() to efx_free_rx_buffers()
and loop through all the fragments.
By doing so this patch fixes a possible memory leak.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As many issues are encountered at probe time, where MCDI logging can't be
enabled through the sysfs node, this change adds a module parameter
'mcdi_logging_default', which defaults to false. When set to true, newly-
probed functions will have MCDI logging enabled. The setting can
subsequently be changed as normal through the sysfs node.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MCDI tracing is enabled per-function with a sysfs file
/sys/class/net/<NET_DEV>/device/mcdi_logging
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MCDI tracing is conditional on CONFIG_SFC_MCDI_LOGGING, which is enabled
by default.
Each MCDI command will produce a console line like
sfc dom🚌dev:fn ifname: MCDI RPC REQ: xxxxxxxx [yyyyyyyy...]
where xxxxxxxx etc. are the raw MCDI payload in 32-bit hex chunks.
The response will then produce a similar line with "RESP" instead of "REQ",
and containing the MCDI response payload (if any).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a set_mac_address() NIC-type function for EF10 only, and
use this to set the MAC address on the vadaptor. For Siena and
earlier, the MAC address continues to be set by MC_CMD_SET_MAC;
this is still called on EF10, and including a MAC address in
this command has no effect.
The sriov_mac_address_changed() NIC-type function is no longer
needed on EF10, but it is needed for Siena where it is used to
update the peer address of the PF for VFDI. Change this to use
the new set_mac_address function pointer.
efx_ef10_sriov_mac_address_changed() is no longer called, as VFs
will try to change the MAC address on their vadaptor rather than
trying to change to the context of the PF to alter the vport.
When a VF is running in direct passthrough mode with MAC spoofing
enabled, it will be able to change the MAC address on its vadaptor.
In this case, there is a link to the PF, so find the correct VF in
its ef10_vf array and update the MAC address.
ndo_set_mac_address() can be called during driver unload while
bonding, and in this case the device has already been stopped, so
don't call efx_net_open() to restart it after reconfiguration.
efx->port_enabled is set to false in efx_stop_port(), so it is
indicator of whether the device needs to be restarted.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Exercised with
"ip link set <PF intf> vf <vf_i> state {auto|enable|disable}"
Sets the reporting policy for VF link state to either
- mirror physical link state
- always up
- always down
get VF link state mode in efx_ef10_sriov_get_vf_config
Exercised by
"ip link show <PF intf>";
output will include a line like
vf 0 MAC 12:34:56:78:9a:bc, link-state auto
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>