Convert the normal transmit completion path from dev_kfree_skb_any()
to dev_consume_skb_any() to help keep dropped packet profiling
meaningful.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to
meet kernel coding style guidelines. This issue was reported by checkpatch.
A simplified version of the semantic patch that makes this change is as
follows (http://coccinelle.lip6.fr/):
// <smpl>
@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@
- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;
// </smpl>
[bhelgaas: add semantic patch]
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
__iowrite64_copy() isn't quite the same as efx_memcpy_64(), but
it looks close enough:
- The length is in units of qwords not bytes
- It never byte-swaps, but that doesn't make a difference now as PIO
is only enabled for x86_64
- It doesn't include any memory barriers, but that's OK as there is a
barrier just before pushing the doorbell
- mlx4_en uses it for the same purpose
Compile-tested only.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the sfc driver code for implementing busy polling.
It adds ndo_busy_poll method and locking between it and napi poll.
It also adds each napi to the napi_hash right after netif_napi_add().
Uses efx_start_eventq and efx_stop_eventq in the self tests.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement per channel software TX and RX packet counters
accessed as ethtool statistics.
This allows confirmation with MAC statistics.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a counter rx_noskb_drop for failure to allocate an skb.
Summed the per-channel rx_nodesc_trunc counters earlier so that they can
be included in rx_dropped.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to select 40G mode on a 10G/40G capable card.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/core/rtnetlink.c
net/core/skbuff.c
Both conflicts were very simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes:ee45fd92c739
("sfc: Use TX PIO for sufficiently small packets")
The linux net driver uses memcpy_toio() in order to copy into
the PIO buffers.
Even on a 64bit machine this causes 32bit accesses to a write-
combined memory region.
There are hardware limitations that mean that only 64bit
naturally aligned accesses are safe in all cases.
Due to being write-combined memory region two 32bit accesses
may be coalesced to form a 64bit non 64bit aligned access.
Solution was to open-code the memory copy routines using pointers
and to only enable PIO for x86_64 machines.
Not tested on platforms other than x86_64 because this patch
disables the PIO feature on other platforms.
Compile-tested on x86 to ensure that works.
The WARN_ON_ONCE() code in the previous version of this patch
has been moved into the internal sfc debug driver as the
assertion was unnecessary in the upstream kernel code.
This bug fix applies to v3.13 and v3.14 stable branches.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
Pull request: Fixes for new ethtool RSS commands
This addresses several problems I previously identified with the new
ETHTOOL_{G,S}RSSH commands:
1. Missing validation of reserved parameters
2. Vague documentation
3. Use of unnamed magic number
4. No consolidation with existing driver operations
I don't currently have access to suitable network hardware, but have
tested these changes with a dummy driver that can support various
combinations of operations and sizes, together with (a) Debian's ethtool
3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH
and minor adjustment for fixes 1 and 3.
v2: Update RSS operations in vmxnet3 too
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
to have a bandwidth of at least this value.
max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
of up to this value.
o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
which takes 4 arguments:
netdev, VF number, min_tx_rate, max_tx_rate
o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.
o Drivers that currently implement ndo_set_vf_tx_rate should now call
ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
implemented by driver.
o If user enters only one of either min_tx_rate or max_tx_rate, then,
userland should read back the other value from driver and set both
for IFLA_VF_RATE.
Drivers that have not yet implemented IFLA_VF_RATE should always
return min_tx_rate as 0 when read from ip tool.
o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
IFLA_VF_RATE should override.
o Idea is to have consistent display of rate values to user.
o Usage example: -
./ip link set p4p1 vf 0 rate 900
./ip link show p4p1
32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT qlen 1000
link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
vf 1 MAC f6:c6:7c:3f:3d:6c
vf 2 MAC 56:32:43:98:d7:71
vf 3 MAC d6:be:c3:b5:85:ff
vf 4 MAC ee:a9:9a:1e:19:14
vf 5 MAC 4a:d0:4c:07:52:18
vf 6 MAC 3a:76:44:93:62:f9
vf 7 MAC 82:e9:e7:e3:15:1a
./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200
./ip link show p4p1
32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT qlen 1000
link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
min_tx_rate 200Mbps
vf 1 MAC f6:c6:7c:3f:3d:6c
vf 2 MAC 56:32:43:98:d7:71
vf 3 MAC d6:be:c3:b5:85:ff
vf 4 MAC ee:a9:9a:1e:19:14
vf 5 MAC 4a:d0:4c:07:52:18
vf 6 MAC 3a:76:44:93:62:f9
vf 7 MAC 82:e9:e7:e3:15:1a
./ip link set p4p1 vf 0 max_tx_rate 600 rate 300
./ip link show p4p1
32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT qlen 1000
link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff
vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps,
min_tx_rate 200Mbps
vf 1 MAC f6:c6:7c:3f:3d:6c
vf 2 MAC 56:32:43:98:d7:71
vf 3 MAC d6:be:c3:b5:85:ff
vf 4 MAC ee:a9:9a:1e:19:14
vf 5 MAC 4a:d0:4c:07:52:18
vf 6 MAC 3a:76:44:93:62:f9
vf 7 MAC 82:e9:e7:e3:15:1a
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: get rid of SET_ETHTOOL_OPS
Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.
Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
- SET_ETHTOOL_OPS(dev, ops);
+ dev->ethtool_ops = ops;
Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the sfc driver is in legacy interrupt mode (either explicitly by
using interrupt_mode module param or by falling back to it) it will
hit a warning at kernel/irq/manage.c because it will try to free an irq
which wasn't allocated by it in the first place because the MSI(X) irqs are
zero and it'll try to free them unconditionally. So fix it by checking if
we're in legacy mode and freeing the appropriate irqs.
CC: Zenghui Shi <zshi@redhat.com>
CC: Ben Hutchings <ben@decadent.org.uk>
CC: <linux-net-drivers@solarflare.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: David S. Miller <davem@davemloft.net>
Fixes: 1899c111a5 ("sfc: Fix IRQ cleanup in case of a probe failure")
Reported-by: Zenghui Shi <zshi@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an MCDI command times out (whether or not we find it
completed when we poll), call efx_mcdi_abandon(), which tells
all subsequent MCDI calls to fail-fast, and queues up an FLR.
Because an FLR doesn't lead to receiving any reboot even from
the MC (unlike most other types of reset), we have to call
efx_ef10_reset_mc_allocations.
In efx_start_all(), if a reset (of any kind) is pending, we
bail out.
Without this, attempts to reconfigure (e.g. change mtu) can
cause driver/mc state inconsistency if the first MCDI call
triggers an FLR.
For similar reasons, on EF10, in
efx_reset_down(method=RESET_TYPE_MCDI_TIMEOUT), set the number
of active queues to zero before calling efx_stop_all().
And, on farch, in efx_reset_up(method=RESET_TYPE_MCDI_TIMEOUT),
set active_queues and flushes pending & outstanding to zero.
efx_mcdi_mode_{poll,event}() should not take us out of fail-fast
mode. Instead, this is done by efx_mcdi_reset() after the FLR
completes.
The new FLR reset_type RESET_TYPE_MCDI_TIMEOUT doesn't really
fit into the hierarchy of reset 'scopes' whereby efx_reset()
decides some resets subsume others. Thus, it uses separate logic.
Also, fixed up some inconsistency around RESET_TYPE_MC_BIST,
which was in the wrong place in that hierarchy.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using the "separate_tx_channels=1" module parameter, the TX queues are
initially numbered starting from the first TX-only channel number (after all the
RX-only channels). efx_set_channels() renumbers the queues so that they are
indexed from zero.
On EF10, the TX queues need to be relabelled in this way before calling the
dimension_resources NIC type operation, otherwise the TX queue PIO buffers can be
linked to the wrong VIs when using "separate_tx_channels=1".
Added comments to explain UC/WC mappings for PIO buffers
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the many PTP Hardware Clock drivers with the
newly introduced field that advertises the number of programmable
pins. Some of these devices do have programmable pins, but the
implementation will have to wait for follow on patches.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.
This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Faster than memcpy/memset on some architectures.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we receive a PTP event from the NIC when we haven't set up PTP state
in the driver, we attempt to read through a NULL pointer efx->ptp_data,
triggering a panic.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Shradha Shah <sshah@solarflare.com>
Cc: linux-net-drivers@solarflare.com
Cc: netdev@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Acked-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove trailing blank lines in several files.
Use only one blank line between functions.
Add a blank line as a separator in a few places.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't use 'Solarstorm' or 'Solarflare Communications' in full
any more.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions such as is_valid_ether_addr() expect u8 *, so use that
instead of char *.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We allocate efx_channel structures with kzalloc() so we don't need to
zero-initialise individual fields in efx_probe_channel(). Further,
this function will be called again during DMA ring resizing and we
should not reset any statistics then.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EF10 implements option descriptors to switch TX checksum offload
on and off between packets. We could therefore use a single
hardware TX queue per kernel TX queue, although we don't yet.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These DMA descriptor types will only be used by the userland
networking stack.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined then NET_IP_ALIGN
will be defined as 0, so this macro is redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is substantial latency in generation and handling of PPS events
from the NIC, which we have to correct for before passing a host
timestamp to the PPS subsystem. We compare clocks with the MC,
giving us two offsets to subtract from the timestamp generated by
pps_get_ts():
(a) Time from the last good sync (where we got host and NIC timestamps
for nearly the same instant) to the time we called pps_get_ts()
(b) Time from NIC top of second to the last good sync
We currently calculate (a) + (b) in a quite confusing way.
Instead, calculate (a) completely, then add (b) to it.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use conventional net_ratelimit() instead.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ee45fd92c7 ("sfc: Use TX PIO
for sufficiently small packets") introduced the following warning:
drivers/net/ethernet/sfc/tx.c: In function 'efx_enqueue_skb':
drivers/net/ethernet/sfc/tx.c:432:1: warning: label 'finish_packet' defined but not used
Stick the label inside the same #ifdef that the code which calls
it uses. Note that this is only seen for arch that do not set
ARCH_HAS_IOREMAP_WC, such as arm, mips, sparc, ..., as the others
enable the write combining code and hence use the label.
Cc: Jon Cooper <jcooper@solarflare.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of a workaround for a hardware erratum in the SFC9100 family
(SF bug 35388), the TX_DESC_UPD_DWORD register address is also used
for communicating with the event block, and only descriptor pointer
values < 2048 are valid.
If the TX DMA ring size is increased to 4096 descriptors (which the
firmware still allows) then we may write a descriptor pointer
value >= 2048, which has entirely different and undesirable effects!
Limit the TX DMA ring size correctly when this workaround is in
effect.
Fixes: 8127d661e7 ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Somehow I transposed these two while bringing the original statistics
support upstream.
Fixes: 99691c4ac1 ('sfc: Add PTP counters to ethtool stats')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PORT_RESET MC command was NOP in the ef10 firmware hence we are using
ENTITY_RESET to make sure all resource allocations are reset.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/sfc/falcon.c:2601:6: warning:
symbol 'falcon_pull_nic_stats' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers should call skb_set_hash to set the hash and its type
in an skbuff.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
Miscellaneous changes for 3.14:
1. Add more information to some WARN messages.
2. Refactor pushing of RSS configuration, from Andrew Rybchenko.
3. Refactor handling of automatic (device address list) vs manual (RX
NFC) MAC filters.
4. Implement clearing of manual RX filters on EF10 when ntuple offload
is disabled.
5. Remove definitions that are unused since the RX buffer allocation
changes, from Andrew Rybchenko.
6. Improve naming of some statistics, from Shradha Shah.
7. Add statistics for PTP support code.
8. Fix insertion of RX drop filters on EF10.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When we insert an filter, the firmware checks that the given RX queue
index is in range even if it will not be used. In case we're
inserting a drop filter, pass the value 0.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings says:
====================
1. Change PTP clock name to 'sfc'.
2. Complete support for hardware timestamping and PTP clock on the
SFC9100 family.
3. Various cleanups for the PTP code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
An assortment of changes for Linux 3.14:
1. Merge the sfc fixes that you have already merged into net.git.
(The branch point for those was such that this does not bring in any
other changes.)
2. Reduce log level for a generally useless warning message, from
Robert Stonehouse.
3. Include BISTs in ethtool offline self-test for EF10 and recover from
BISTs initiated through other functions, from Jon Cooper.
4. Improve a sanity check on RX completions.
5. Avoid incrementing RX dropped count while the interface is down, from
Jon Cooper.
6. Improve hardware sensor naming and log messages, from Edward Cree.
7. Log all unexpected errors returned by firmware, from Edward Cree.
8. Expose another NVRAM partition to userland.
9. Some refactoring of the PTP code in preparation for EF10 support.
10. Various minor cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These were implemented by Andrew Jackson and Laurence Evans but not
previously included in-tree.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The operation can now fail, so change its return type to int.
Remove the inline wrapper while we're changing the signature.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently a higher priority client can remove a lower priority
client's filter with equal match-expression. This might happen if (a)
the higher priority client has a double-free bug, or (b) another
client with sufficient priority replaced and then removed an equal
filter, allowing the low priority client to insert an equal filter.
In neither case does it actually make sense to carry out the removal;
we should say the filter doesn't exist, as the filter currently
present is not the one that the high-priority client is referring to.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Change all the 'stack' naming to 'auto' (or other meaningful term);
the device address list is based on more than just what the network
stack wants, and the no-match filters aren't really what the stack
wants at all.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
MAC filters inserted automatically by the driver, based on the device
address list (EF10) or no-match filters (Siena), should be overridable
at MANUAL or REQUIRED priority. Currently they themselves have
REQUIRED priority and this requires some odd special-casing.
We also can't reliably tell whether such a MAC filter has or has
not been overridden. We just remember that it is wanted by the
stack (RX_STACK flag).
Add another priority level, AUTO, between HINT and MANUAL, and
use this for the automatic filters while they have not been
overridden. Remove the RX_STACK flag. Add an RX_OVER_AUTO
flag which is set only when an AUTO filter has been overridden
(or was requested to be inserted while a higher-priority filter
existed).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 implementation already does this, and it makes more logical
sense to group the RSS hash key and indirection table together.
Rename the operation to rx_push_rss_config.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
In case of certain hardware and firmware errors it can be useful to
have more context than just the file and line number.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The SFC9100 family has only one clock per controller, shared by all
functions. Therefore only create a clock device under the primary
function, and make all other functions refer to the primary's clock
device.
Since PTP functionality is limited to port 0 and PF 0 on the earlier
SFN[56]322F boards, and we also set the primary flag for that
function, we can make the creation of a clock device conditional only
on this flag.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The primary function of an EF10 controller will share its clock
device with other functions in the same domain (which we call
secondary functions). To this end, we need to associate functions
on the same controller.
We do not control probe order, so allow primary and secondary
functions to appear in any order. Maintain global lists of all
primary functions and of unassociated secondary functions,
and a list of secondary functions on each primary function.
Use the VPD serial number to tell whether functions are part of the
same controller. VPD will not be readable by virtual functions, so
this may need to be revisited later.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 firmware can optionally insert RX timestamps in the packet
prefix. These only include the clock minor value. We must also
enable periodic time sync events on each event queue which provide
the high bits of the clock value.
[bwh: Combined and rebased several changes.
Added the above description and some sanity checks for inline vs
separate timestamps.
Changed efx_rx_skb_attach_timestamp() to read the packet prefix
from the skb head area.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We can potentially pull the entire packet contents into the head area
and then free the page it was in. In order to read an inline
timestamp safely, we need to copy the prefix into the head area as
well.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
I added efx_ptp_get_mode() to avoid moving the definition for
efx_ptp_data, since the current PTP mode is needed for
siena.c:siena_set_ptp_hwtstamp.
[bwh: Also move the rx_filters mask, and add kernel-doc]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The clock minor tick on the SFC9100 family is 2^-27 s, not 1 ns.
There are also various pipeline delays which we need to correct for
when interpreting timestamps.
We query the firmware for the clock format and corrections at run-time.
[bwh: Combined and rebased several changes]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We'll be sharing clocks between multiple functions with their own MAC
addresses. The name field is now documented as 'A short "friendly
name" to identify the clock ...' and '... not meant to be a unique
id.' So use the name 'sfc'.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We need a dedicated channel on Siena to ensure we can match up
the separate RX and timestamp events for each PTP packet. We won't
do this for EF10 as timestamps are delivered inline.
Pass a channel index of 0 to MC_CMD_PTP_OP_ENABLE when there is no
dedicated channel.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The MC firmware will return error MC_CMD_ERR_ENOSPC if filter
insertion fails due to lack of resources. The net driver's filter
implementation for Falcon-architecture returns EBUSY. They should
behave consistently, so for EF10 change ENOSPC to EBUSY.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_flush_all() is a really misleading name - it has nothing to do
with e.g. flushing DMA queues. Since it's called immediately after
efx_stop_port() and is highly dependent on what that does, combine
the two functions.
Update comments to explain what this is doing a little better.
Also update an related and erroneous comment in efx_start_port().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Split each of efx_mcdi_rpc, efx_mcdi_rpc_finish, and efx_mcdi_rpc_async into
a normal and a _quiet version; made the former log MCDI errors with
netif_err (and include the raw MCDI error code), and the latter never log
them at all. Changed various callers; any where some errors are expected
(but others are not) call the _quiet version and then if necessary log the
MCDI error themselves. Said logging is done by new efx_mcdi_display_error.
Callers of efx_mcdi_rpc*_quiet functions which may want to log the error
need to ensure that their outbuf is big enough to hold an MCDI error; to
this end, they now use MCDI_DECLARE_BUF_OUT_OR_ERR, which always allocates
at least 8 bytes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We don't directly control RX ingress on Siena or any later
controllers, and so we cannot prevent packets from entering the RX
datapath while the RX queues are not set up. This results in
the hardware incrementing RX_NODESC_DROP_CNT, but it's not an
error and we should not include it in error stats.
When bringing an interface up or down, pull (or wait for) stats and
count the number of packets that were dropped while the interface was
down. Subtract this from the reported RX dropped count.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The addition of RX event merging support means we don't reliably
detect dropped RX events now. Currently we will only detect them if
the previous event for the RX queue had the CONT bit set.
Only accept RX completion events as merged if the
GET_CAPABILITIES_OUT_RX_BATCHING bit is set in datapath_caps (which it
won't be for the low-latency datapath) and the CONT bit is not set on
the event.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
To run BISTs the MC goes down in to a special mode where it will only
respond to MCDI from the testing PF, and TX, RX and event queues are
torn down. Other PFs get a message as it goes down to tell them it's
going down.
When the other PFs get this message, they check the soft status
register to tell when the MC has rebooted after BIST mode and they can
start recovery.
[bwh: Convert the test result to 1 or -1 as for earlier NICs]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MC firmware is cooperatively multitasking and its scheduler will
send an event when a task yields after running for more than the
expected maximum time. This can be useful for firmware development
but does not usually indicate a serious error and does not help to
detect a lockup (there is a hardware watchdog that does that).
Change the message and reduce log level accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There is an as-yet unexplained bug that sometimes prevents (or delays)
the driver seeing the completion event for a completed MCDI request on
the SFC9120. The requested configuration change will have happened
but the driver assumes it to have failed, and this can result in
further failures. We can mitigate this by polling for completion
after unsuccessfully waiting for an event.
Fixes: 8127d661e7 ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
rx_prefix_size is 4-bytes aligned on Falcon/Siena (16 bytes), but it is equal
to 14 on EF10. So, it should be taken into account if arch requires IP header
to be 4-bytes aligned (via NET_IP_ALIGN).
Fixes: 8127d661e7 ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There is a single MCDI PTP operation for setting the frequency
adjustment and applying a time offset to the hardware clock. When
applying a time offset we should not change the frequency adjustment.
These two operations can now be requested separately but this requires
a flash firmware update. Keep using the single operation, but
remember and repeat the previous frequency adjustment.
Fixes: 7c236c43b8 ('sfc: Add support for IEEE-1588 PTP')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This disables PTP when we bring the interface down to avoid getting
unmatched RX timestamp events, and tries to re-enable it when bringing
the interface up.
[bwh: Make efx_ptp_stop() safe on Falcon. Introduce
efx_ptp_{start,stop}_datapath() functions; we'll expand them later.]
Fixes: 7c236c43b8 ('sfc: Add support for IEEE-1588 PTP')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
In case of a flood of PTP packets, the timestamp peripheral and MC
firmware on the SFN[56]322F boards may not be able to provide
timestamp events for all packets. Don't complain too much about this.
Fixes: 7c236c43b8 ('sfc: Add support for IEEE-1588 PTP')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Limit syslog flood if a PTP packet storm occurs.
Fixes: 7c236c43b8 ('sfc: Add support for IEEE-1588 PTP')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_ptp_is_ptp_tx() must be robust against skbs from raw sockets that
have invalid IPv4 and UDP headers.
Add checks that:
- the transport header has been found
- there is enough space between network and transport header offset
for an IPv4 header
- there is enough space after the transport header offset for a
UDP header
Fixes: 7c236c43b8 ('sfc: Add support for IEEE-1588 PTP')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings says:
====================
SIOCGHWTSTAMP ioctl
1. Add the SIOCGHWTSTAMP ioctl and update the timestamping
documentation.
2. Implement SIOCGHWTSTAMP in most drivers that support SIOCSHWTSTAMP.
3. Add a test program to exercise SIOC{G,S}HWTSTAMP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code. Avoid race conditions caused by attributes
being created after hwmon device registration. Implicitly
(through hwmon API) add mandatory 'name' sysfs attribute.
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull DMA mask updates from Russell King:
"This series cleans up the handling of DMA masks in a lot of drivers,
fixing some bugs as we go.
Some of the more serious errors include:
- drivers which only set their coherent DMA mask if the attempt to
set the streaming mask fails.
- drivers which test for a NULL dma mask pointer, and then set the
dma mask pointer to a location in their module .data section -
which will cause problems if the module is reloaded.
To counter these, I have introduced two helper functions:
- dma_set_mask_and_coherent() takes care of setting both the
streaming and coherent masks at the same time, with the correct
error handling as specified by the API.
- dma_coerce_mask_and_coherent() which resolves the problem of
drivers forcefully setting DMA masks. This is more a marker for
future work to further clean these locations up - the code which
creates the devices really should be initialising these, but to fix
that in one go along with this change could potentially be very
disruptive.
The last thing this series does is prise away some of Linux's addition
to "DMA addresses are physical addresses and RAM always starts at
zero". We have ARM LPAE systems where all system memory is above 4GB
physical, hence having DMA masks interpreted by (eg) the block layers
as describing physical addresses in the range 0..DMAMASK fails on
these platforms. Santosh Shilimkar addresses this in this series; the
patches were copied to the appropriate people multiple times but were
ignored.
Fixing this also gets rid of some ARM weirdness in the setup of the
max*pfn variables, and brings ARM into line with every other Linux
architecture as far as those go"
* 'for-linus-dma-masks' of git://git.linaro.org/people/rmk/linux-arm: (52 commits)
ARM: 7805/1: mm: change max*pfn to include the physical offset of memory
ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations
ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations
ARM: 7795/1: mm: dma-mapping: Add dma_max_pfn(dev) helper function
ARM: 7794/1: block: Rename parameter dma_mask to max_addr for blk_queue_bounce_limit()
ARM: DMA-API: better handing of DMA masks for coherent allocations
ARM: 7857/1: dma: imx-sdma: setup dma mask
DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks
DMA-API: dcdbas: update DMA mask handing
DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
DMA-API: usb: musb: use platform_device_register_full() to avoid directly messing with dma masks
DMA-API: crypto: remove last references to 'static struct device *dev'
DMA-API: crypto: fix ixp4xx crypto platform device support
DMA-API: others: use dma_set_coherent_mask()
DMA-API: staging: use dma_set_coherent_mask()
DMA-API: usb: use new dma_coerce_mask_and_coherent()
DMA-API: usb: use dma_set_coherent_mask()
DMA-API: parport: parport_pc.c: use dma_coerce_mask_and_coherent()
DMA-API: net: octeon: use dma_coerce_mask_and_coherent()
DMA-API: net: nxp/lpc_eth: use dma_coerce_mask_and_coherent()
...
When using firmware assisted TSO, we use a single DMA mapping for
the linear area of a TSO skb.
We still have to segment the super-packet and insert a descriptor
containing the original headers before each segment of payload, so we
can unmap the linear area only after the last segment is completed.
The unmapping information for the linear area is therefore associated
with the last header descriptor.
We calculate the DMA address to unmap from using the map length and
the invariant that the end of the DMA mapping matches the end of
the data referenced by the last descriptor. But this invariant is
broken when there is TCP payload in the linear area.
Fix this by adding and using an explicit dma_offset field.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Conflicts:
include/linux/netdevice.h
net/core/sock.c
Trivial merge issues.
Removal of "extern" for functions declaration in netdevice.h
at the same time "const" was added to an argument.
Two parallel line additions in net/core/sock.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Although we do not yet enable multiple PFs per port, it is possible
that a board will be reconfigured to enable them while the driver has
not yet been updated to fully support this.
The most obvious problem is that multiple functions may try to set
conflicting link settings. But we will also run into trouble if the
firmware doesn't consider us fully trusted. So, abort probing unless
both the LinkCtrl and Trusted flags are set for this function.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Recognise the new Packet Memory and RX Data Path counters.
The following counters are added:
rx_pm_{trunc,discard}_bb_overflow - burst buffer overflowed. This should not
occur if BB correctly configured.
rx_pm_{trunc,discard}_vfifo_full - not enough space in packet memory. May
indicate RX performance problems.
rx_pm_{trunc,discard}_qbb - dropped by 802.1Qbb early discard mechanism.
Since Qbb is not supported at present, this should not occur.
rx_pm_discard_mapping - 802.1p priority configured to be dropped. This should
not occur in normal operation.
rx_dp_q_disabled_packets - packet was to be delivered to a queue but queue is
disabled. May indicate misconfiguration by the driver.
rx_dp_di_dropped_packets - parser-dispatcher indicated that a packet should be
dropped.
rx_dp_streaming_packets - packet was sent to the RXDP streaming bus, ie. a
filter directed the packet to the MCPU.
rx_dp_emerg_{fetch,wait} - RX datapath had to wait for descriptors to be
loaded. Indicates performance problems but not drops.
These are only provided if the MC firmware has the
PM_AND_RXDP_COUNTERS capability. Otherwise, mask them out.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Previously, efx_ef10_stat_mask returned a static const unsigned long[], which
meant that each possible mask had to be declared statically with
STAT_MASK_BITMAP. Since adding a condition would double the size of the
decision tree, we now create the bitmask dynamically.
To do this, we have two functions efx_ef10_raw_stat_mask, which returns a u64,
and efx_ef10_get_stat_mask, which fills in an unsigned long * argument.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The indices in nic_data->stats need to match the EF10_STAT_whatever
enum values. In efx_nic_update_stats, only mask; gaps are removed in
efx_ef10_update_stats.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/usb/qmi_wwan.c
drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
include/net/netfilter/nf_conntrack_synproxy.h
include/net/secure_seq.h
The conflicts are of two varieties:
1) Conflicts with Joe Perches's 'extern' removal from header file
function declarations. Usually it's an argument signature change
or a function being added/removed. The resolutions are trivial.
2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
a new value, another changes an existing value. That sort of
thing.
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Replace the following sequence:
dma_set_mask(dev, mask);
dma_set_coherent_mask(dev, mask);
with a call to the new helper dma_set_mask_and_coherent().
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Extend efx_filter_rfs() to map TCP/IPv6 and UDP/IPv6 flows into
efx_filter_spec. These are only supported on EF10; on Falcon and
Siena they will be rejected by efx_farch_filter_from_gen_spec().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Sufficiently small linear packets can be copied into the PIO buffer
with a single call to memcpy_toio(). Non-linear packets require an
intermediate cache-line-sized buffer.
[bwh: I wrote the first version of this, but Jon did the hard work to
handle non-linear packets.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Try to allocate a segment of PIO buffer to each TX channel. If
allocation fails, log an error but continue.
PIO buffers must be mapped separately from the NIC registers, with
write-combining enabled. Where the host page size is 4K, we could
potentially map each VI's registers and PIO buffer separately.
However, this would add significant complexity, and we also need to
support architectures such as POWER which have a greater page size.
So make a single contiguous write-combining mapping after the
uncacheable mapping, aligned to the host page size, and link PIO
buffers there. Where necessary, allocate additional VIs within
the write-combining mapping purely for access to PIO buffers.
Link all TX buffers to TX queues and the additional VIs in
efx_ef10_dimension_resources() and in efx_ef10_init_nic() after
an MC reboot.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Segmentation remains in the driver, but we generate option descriptors
describing the required packet editing rather than making our own
copies.
Reduce tso_state::ipv4_id to 16 bits, so it doesn't overflow into the
TCP_FLAGS field of the option descriptor.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The SFC9120 MC firmware often takes longer than 20ms to reboot and
update the warm boot count in BIU_MC_SFT_STATUS_REG. A timeout of
250ms is very generous for an MC reboot.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Scheduling a reset following an MC reboot event before waiting for
reboot to complete results in a race that can lead to a state where
must_realloc_vis is false in efx_ef10_fini_dmaq() but the VIs have
been destroyed during the MC reboot.
To avoid MC errors when trying to remove VIs that do not exist, wait
for the MC reboot to complete before scheduling the reset.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings says:
====================
Some bug fixes and future-proofing for the recently added SFC9120
support:
1. Minimal support for the 40G configuration.
2. Disable the incomplete PTP/hardware timestamping support.
3. Reset MAC stats properly after a firmware upgrade.
4. Re-check the datapath firmware capabilities after the controller is
reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After an MC reboot, the datapath may be running a different firmware
variant and have different capabilities. It is critical that we know
the current capabilities so that we can pass valid flags to
MC_CMD_INIT_EVQ.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Rename efx_ef10_init_capabilities() to the more specific
efx_ef10_init_datapath_caps().
Stop accepting short responses to MC_CMD_GET_CAPABILITIES; we
don't need to support pre-production firmware.
Move the check for RX prefix support from efx_ef10_probe() into
efx_ef10_init_datapath_caps() and use consistent error messages
for missing TSO support and missing RX prefix support.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
If the MC reboots then the stats it reports to us will have been
reset. We need to reset ours to get efx_update_diff_stat() working
properly.
(This is a re-run of commit 876be083b6 'sfc: Reset driver's
MAC stats after MC reboot seen'.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Unlike Siena where timestamping is provided by a peripheral, EF10
delivers RX timestamps in the packet prefix. However the driver
doesn't yet support this.
We are also creating a PHC device for each EF10 function, even though
the clock is really shared between all of them.
Disable hardware PTP/timestamping support until it's complete.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Accept and handle 40G link events.
Accept ethtool link settings of speed == 40000 && duplex, and set the
appropriate MCDI PHY capability.
This does not include reporting of 40G media types, as those have not
yet been assigned numbers in the MCDI protocol.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
It upsets static analyzers when we don't check for allocation failure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
1. A little more refactoring.
2. Remove the unnecessary use of atomic_t that you pointed out.
3. Add support for starting or queueing firmware requests from atomic
context.
4. Add hwmon support for additional sensors found on some new boards.
5. Add support for the EF10 controller architecture, the SFC9100 family
and specifically the SFC9120 controller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
__GFP_ZERO is an uncommon flag and perhaps is better
not used. static inline dma_zalloc_coherent exists
so convert the uses of dma_alloc_coherent with __GFP_ZERO
to the more common kernel style with zalloc.
Remove memset from the static inline dma_zalloc_coherent
and add just one use of __GFP_ZERO instead.
Trivially reduces the size of the existing uses of
dma_zalloc_coherent.
Realign arguments as appropriate.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the dates for files that have been added to in 2012-2013.
Drop the 'Solarstorm' brand name that's still lingering here.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This adds support for the EF10 network controller architecture and the
SFC9100 family, starting with SFC9120 'Farmingdale', and bumps the
driver version to 4.0.
New features in the SFC9100 family include:
- Flexible allocation of internal resources to PCIe physical and virtual
functions under firmware control
- RX event merging to reduce DMA writes at high packet rates
- Integrated RX timestamping
- PIO buffers for lower TX latency
- Firmware-driven data path that supports additional offload features
and filter types
- Delivery of packets between functions and to multiple recipients,
allowing firmware to implement a vswitch
- Multiple RX flow hash (RSS) contexts with their own hash keys and
indirection tables
- 40G MAC (single port only)
...not all of which are enabled in this initial driver or the initial
firmware release.
Much of the new code is by Jon Cooper.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Also update comments and assertions in io.h:
- EF10 does not have a general BIU collector and does not have the
bug affecting TIMER_COMMAND_REG[0] on Falcon/Siena
- The WPTR field moved within RX_DESC_UPD_REG and TX_DESC_UPD_REG.
Adjust efx_writed_page() accordingly
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The TX path firmware for EF10 supports 'option descriptors' to control
offloads and various other features. Add a flag and field for these
in struct efx_tx_buffer, and don't treat them as DMA descriptors on
completion.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On EF10, the firmware will initiate a queue flush in certain
error cases. We need to accept that flush events might appear
at any time after a queue has been initialised, not just when
we try to flush them.
We can handle Falcon-architecture in just the same way.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
RX DMA scatter is always enabled on EF10. Adjust the common RX
completion handling to allow for this.
RX completion events on EF10 include the length used from a single
descriptor, not the cumulative length used. Add a field to struct
efx_rx_queue to hold the cumulative length.
[bwh: Also fix a related comment]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add the efx_filter_is_mc_recip() function to decide whether a filter
is for a multicast recipient and can coexist with other filters with
the same match values. Update efx_filter_insert_filter() kernel-doc
to explain the conditions for this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We can set, get and compare-and-exchange without using atomic_t.
Change efx_mcdi_iface::state to the enum type we really wanted it to
be.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add support for power and current sensors, which need to be named
differently in sysfs. Power sensors also require values to be scaled
between MCDI and sysfs, and have no minimum value.
Add definitions of the power, current, fan, and additional temperature
and voltage sensors found on SFA6902F, SFN7022F and SFN7122F.
(Includes a bug fix from Andrew Jackson.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Define a flag for struct efx_rx_buffer and efx_rx_packet() that
indicates packet length must be read from the prefix. If this
is set, read the length in __efx_rx_packet() (when the prefix
should have arrived in cache).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add a counter for TX merged completion events.
This is implemented in the common TX path, because the NIC event
handlers only know how many descriptors were completed, not how many
packets.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
EF10 uses an entirely different RX prefix format from Falcon-arch.
Extend struct efx_nic_type to describe this.
[bwh: Also replace the magic numbers used for the Falcon-arch RX prefix]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_reset_up() calls efx_nic_type::reconfigure_mac once directly,
then again through efx_start_all() -> efx_start_port() ->
efx->type->reconfigure_mac().
This first call is also made too early to work properly on EF10.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The Huntington MC will reject all MCDI requests after an MC reboot until it sees
one with the NOT_EPOCH flag clear. This flag is set by default for all requests,
and then cleared on the first request after we detect that an MC reboot has
occurred.
The old MCDI_STATUS_DELAY_COUNT gave a timeout of 10ms, which was not long enough
for the driver to detect that a reboot had occurred based on the warm boot count
while calling efx_mcdi_poll_reboot() from the loop in efx_mcdi_ev_death().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Also, since we handle all DMA errors in the same way, merge
RESET_TYPE_(RX|TX)_DESC_FETCH into RESET_TYPE_DMA_ERROR.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Various hardware statistics that are available for Siena are
unavailable or meaningless for Falcon. Huntington adds further to the
NIC-type-specific statistics, as it has different MAC blocks from
Falcon/Siena.
All NIC types still provide most statistics by DMA, and use
little-endian byte order.
Therefore:
1. Add some general utility functions for reporting hardware statistics,
efx_nic_describe_stats() and efx_nic_update_stats().
2. Add an efx_nic_type::describe_stats operation to get the number and
names of statistics, implemented using efx_nic_describe_stats()
3. Change efx_nic_type::update_stats to store the core statistics
(struct rtnl_link_stats64) or full statistics (array of u64) in a
caller-provided buffer. Use efx_nic_update_stats() to aid in the
implementation.
4. Rename struct efx_ethtool_stat to struct efx_sw_stat_desc and
EFX_ETHTOOL_NUM_STATS to EFX_ETHTOOL_SW_STAT_COUNT.
5. Remove efx_nic::mac_stats and struct efx_mac_stats.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Merge the per-NIC-type MTD probe selection and struct efx_mtd_ops into
struct efx_nic_type. Move the implementations into the appropriate
source files.
Several NVRAM functions are now only called from MTD operations which
are now implemented in the same file (falcon.c or mcdi.c). There is no
need for them to be extern, or to be defined at all if CONFIG_SFC_MTD
is not enabled, so move them into the #ifdef CONFIG_SFC_MTD sections
in those files.
Most of the SPI-related definitions are also only used in falcon.c,
so move them there. Put the remainder of spi.h into nic.h (which
previously included it).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently we use struct efx_mtd to represent a physical NVRAM device
and struct efx_mtd_partition to represent a partition on that device.
But this only really makes sense for Falcon, as we don't know or care
whether MC-managed NVRAM partitions are on one or more physical
devices. It complicates iteration and provides little benefit.
Therefore:
- Replace the pointer to efx_mtd in mtd_info::priv with a pointer to efx_nic
- Move the falcon_spi_device pointer into the union in struct efx_mtd_partition
- Move the device name to efx_mtd_partition::dev_type_name
- Move the efx_mtd_ops pointer to efx_nic::mtd_ops
- Make efx_nic::mtd_list a list of partitions
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On Falcon we implement MAC filtering requested by the stack using the
MAC wrapper's single unicast filter and multicast hash filter. Siena
is very similar, though MAC configuration is mediated by the MC.
Since MCDI operations may sleep, reconfiguration is deferred from
ndo_set_rx_mode to a work item. However, it still updates the private
variables describing the filter state synchronously. Contrary to
comments, the later use of these variables is not protected using the
address lock, resulting in race conditions.
Move the state update to a new function
efx_farch_filter_sync_rx_mode() and make the Falcon-arch MAC
configuration functions call that, so that its use is consistently
serialised by the mac_lock.
Invert and rename the promiscuous flag to the more accurate
unicast_filter, and comment that both this and multicast_hash are
not used on EF10.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
MAC filters inserted on request from the stack (ndo_set_rx_mode)
should allow manual steering but not removal. Currently we have a
special case for Siena's all-multicast and all-unicast MAC filters,
but on EF10 we need to allow for steering of precise MAC filters as
well.
The EFX_FILTER_FLAG_RX_STACK flag changes the behaviour of replacement
and removal requests:
- Replacement *of* a filter with this flag never clears the flag but
does change steering and saved priority
- Replacement *by* a filter with this flag only sets the flag but does
not change steering
- Removal with priority < EFX_FILTER_PRI_REQUIRED really resets RX
steering and saved priority
This could support precise MAC filtering on Siena in future.
As a side-benefit, the default MAC filters are hidden from ethtool
until they are steered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Move the special case for removal of default filters from
efx_farch_filter_table_clear_entry() into a wrapper function,
efx_farch_filter_table_remove(). Move the existence and priority
checks into the latter and use it where appropriate.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Aside from accelerated RFS, there is almost nothing that can be shared
between the filter table implementations for the Falcon architecture
and EF10.
Move the few shared functions into efx.c and rx.c and the rest into
farch.c. Introduce efx_nic_type operations for the implementation and
inline wrapper functions that call these.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently every call to efx_farch_filter_table_clear_entry() is
shortly followed by a conditional reset of the table limits. The new
limits (0) are not pushed to hardware until the next filter insertion.
Move both the reset and the hardware reconfiguration into
efx_farch_filter_table_clear_entry(), and add an explanatory comment.
Also, make consistent use of the term 'search limit' for the maximum
number of probes the NIC must make when searching for a filter of a
particular type.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Move the common state from struct efx_filter_state into struct efx_nic.
Rename struct efx_filter_state to efx_farch_filter_state and change
the type of efx_nic::filter_state to void *.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Replace type field with match_flags. Add rss_context and match values
covering of most of what is now in the MCDI protocol.
Change some fields into bitfields so that the structure size doesn't grow
beyond 64 bytes.
Ditch the filter decoding functions as it is now easier to pick apart
the abstract structure.
Rewrite ethtool NFC rule functions to set/get filter match flags and
values directly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The filter table(s) on EF10 are managed by firmware and will need
almost entirely separate code. Rename the types and functions used
within the existing implementation. The current definition of struct
efx_filter_spec is really implementation-specific, so we need to keep
it. For now, define a separate structure for the internal
representation but leave them identical.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The workarounds that currently use EFX_WORKAROUND_ALWAYS are in
Falcon-specific or Falcon-arch-specific code, so get rid of the
conditions altogether. Add/move comments as appropriate.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
EF10 functions don't have a fixed BAR size, and the minimum is not
large enough for all the queues we might want to allocate. We have to
find out the BAR size at run-time, and therefore phys_addr_channels
and mem_map_size cannot be defined per-NIC-type.
Change efx_nic_type::mem_map_size to a function pointer which is
called to find the wanted memory map size (before probe).
Replace efx_nic_type::phys_addr_channels with efx_nic::max_channels,
to be initialised by the probe function.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
When we poll for MCDI request completion, we don't hold the interface
lock while setting the response fields in struct efx_mcdi_iface.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
MCDI v2 adds a second header dword with wider command and length
fields. It also defines extra error codes.
Change the fallback error number for unknown MCDI error codes from EIO
to EPROTO. EIO is treated as indicating the MCDI transport has failed
and we need to reset the function, which is rather drastic.
v2 error codes and lengths don't fit into completion events, so for a
v2-capable transport, always read the response header rather then
using the event fields.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
EF10 controllers do not have shared memory for communication with the
MC; instead it reads requests and writes responses in host memory,
which allows for longer messages. It is also responsible for all
datapath control operations and hardware resource allocation, which
requires a large number of new commands and adds more possible error
cases. MCDI v2 extends the message header to support this.
Update the MCDI protocol definition header to include v2 lengths,
errors and messages, and a few definitions specific to the
SFC9100 family (codenames Farmingdale and Huntington) which is
the first generation of EF10.
Some messages have been extended, so adjust the code accordingly:
- The request for MC_CMD_DRV_ATTACH now includes a datapath firmware
ID. This is ignored by Siena but we should fill it in anyway,
initially always specifying low-latency datapath.
- The response for MC_CMD_GET_LOOPBACK_MODES now includes a 40G
field. Accept shorter responses that don't include it.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently we only translate error codes in efx_mcdi_poll(), but we
also need to do so in efx_mcdi_ev_cpl().
The reason we didn't notice before is that the MC firmware error codes
are mostly taken from Unix/Linux and no translation is necessary on
most architectures. Make sure we notice any future failure by
changing the sign of resprc (matching the kernel convention) and BUG
if it's ever positive at command completion.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add efx_nic_type operations for the many efx_nic functions that need
to be implemented different on EF10. For now, change most of the
existing efx_nic_*() functions into inline wrappers. As a later step,
we may be able to improve branch prediction for operations used on the
fast path by copying the pointers into each queue/channel structure.
Move the Falcon/Siena implementations to new file farch.c and rename
the functions and static data to use a prefix of 'efx_farch_'.
Move efx_may_push_tx_desc() to nic.h, as the EF10 TX code will also
use it.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently efx_stop_datapath() will try to flush our DMA queues (if DMA
is enabled), then finalise software and hardware state for each queue.
However, for EF10 we must ask the MC to finalise each queue, which
implicitly starts flushing it, and then wait for the flush events.
We therefore need to delegate more of this to the NIC type.
Combine all the hardware operations into a new NIC-type operation
efx_nic_type::fini_dmaq, and call this before tearing down the
software state and buffers for all the DMA queues.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_unregister_netdev() should not call efx_release_tx_buffers()
directly, as it is already done when closing the device:
efx_net_stop() -> efx_stop_all() -> efx_stop_datapath() ->
efx_fini_tx_queue() -> efx_release_tx_buffers().
(This was presumably a workaround for a race between efx_stop_all()
and the data path that has since been properly fixed.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
rx_queue::enabled guards refill, so rename it to reflect that. Clear
it at the start of the queue teardown process rather than waiting for
the RX queue to be flushed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We unconditionally acknowledge legacy interrupts just before disabling
them. This workaround is needed on Falcon A1 but probably not on
later chips where the legacy interrupt mechanism is different. It was
also originally done after the IRQ handler was removed, not before.
Restore the original behaviour for Falcon A1 only by doing this
acknowledgement in the efx_nic_type::fini operation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There are many problems with the current efx_stop_interrupts() and
efx_start_interrupts():
1. On Siena, it is unsafe to disable the master IRQ enable bit
(DRV_INT_EN_KER) while any IRQ sources are enabled.
2. On EF10 there is no master IRQ enable bit, so we cannot expect to
defer IRQs without tearing down event queues. (Though I don't think
we will need to keep any event queues around while the device is down,
as we do for VFDI on Siena.)
3. synchronize_irq() only waits for a running IRQ handler to finish,
not for any propagation through IRQ controllers. Therefore an IRQ may
still be received and handled after efx_stop_interrupts() returns.
IRQ handlers can then race with channel reallocation.
To fix this:
a. Introduce a software IRQ enable flag. So long as this is clear,
IRQ handlers will only acknowledge IRQs and not touch the channel
structures.
b. Define a new struct efx_msi_context as the context for MSIs. This
is never reallocated and is sufficient to find the software enable
flag and the channel structure. It also includes the channel/IRQ
name, which was previously separated out as it must also not be
reallocated.
c. Split efx_{start,stop}_interrupts() into
efx_{,soft_}_{enable,disable}_interrupts(). The 'soft' functions
don't touch the hardware master enable flag (if it exists) and don't
reinitialise or tear down channels with the keep_eventq flag set.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_process_channel_now() is unneeded since self-tests can rely on
normal NAPI polling. Remove it and all calls to it.
efx_channel::work_pending and efx_channel_processed() are also
unneeded (the latter being the same as efx_nic_eventq_read_ack()).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 architecture has a very different register layout from
previous controllers, so we'll use separate files for the two sets of
register definitions. Use 'farch' as an abbreviation for
Falcon-architecture.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On EF10, the firmware is in charge of allocating buffer table entries.
Change struct efx_special_buffer to use a struct efx_buffer member,
so that it can be used with efx_nic_{alloc,free}_buffer() in that
case.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Most call sites for efx_nic_alloc_buffer() are part of the probe or
reconfiguration paths and can allocate with GFP_KERNEL. A few others
should use GFP_NOIO (I think). Only one is in atomic context and
must use the current GFP_ATOMIC.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Move the lowest layer (transport) of the current MCDI code to
per-NIC-type operations.
Introduce a new structure and efx_nic member for MCDI-specific data.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This should probably be done during MCDI initialisation for any NIC.
Change efx_mcdi_init() to return an error code.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Collect together MCDI port functions from mcdi.c, mcdi_mac.c,
mcdi_phy.c and siena.c. Rename the 'siena' functions accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We currently require that MCDI request and response lengths are
multiples of 4 bytes, because we will copy dwords in and out of shared
memory and we want to be sure we won't read or write out of bounds.
But all we really need to know is that there is sufficient padding for
that. Also, we should ensure that buffers are dword-aligned, as on
some architectures misaligned access will result in data corruption or
a crash.
Change the buffer type to array-of-efx_dword_t and remove the
requirement that the lengths are multiples of 4.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
A few functions are using heap buffers; change them to use stack
buffers as we really don't need to resort to the heap for a 252
byte buffer in process context.
MC_CMD_MEMCPY is quite weird in that it can use inline data placed in
the request buffer after the array of records. Thus there are two
variable-length arrays and we can't use the normal accessors for
the second. So we have to use _MCDI_PTR() in efx_sriov_memcpy().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We need to access arrays of 16-bit words and 32-bit dwords in MCDI
buffers based on the MCDI protocol definitions.
We should also be able to read and write fields within structures,
without specifying an array index each time. So add MCDI_FIELD()
and make MCDI_ARRAY_FIELD() use it. Also add MCDI_SET_FIELD().
Split MCDI_ARRAY_PTR() into MCDI_ARRAY_STRUCT_PTR() and
_MCDI_ARRAY_PTR(), which are currently identical but will diverge in
later changes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add _MCDI_DWORD() which yields an lvalue for the given dword field
and change MCDI_DWORD(), MCDI_SET_DWORD() and MCDI_QWORD() to use it.
Fold the rather trivial MCDI_PTR2() into MCDI_PTR() and _MCDI_DWORD().
Remove MCDI_SET_DWORD2() and MCDI_QWORD2(). MCDI_DWORD2() should also
go, but it still has one user which we'll get rid of later.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
MCDI_DECLARE_BUF declares a variable as an MCDI buffer of the
requested length, adding any necessary padding.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
commit 385904f819 ('sfc: Don't use
efx_filter_{build,hash,increment}() for default MAC filters') used the
wrong name to find the index of default RX MAC filters at insertion/
update time. This could result in memory corruption and would in any
case silently fail to update the filter.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Received packets are only scattered if this is enabled in both the
matching filter and the receiving queue. This was not being done for
filters inserted for RFS, so any packet requiring more than a single
descriptor was dropped.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2768935a46 ('sfc: reuse pages to avoid DMA mapping/unmapping
costs') did not fully take account of DMA scattering which was
introduced immediately before. If a received packet is invalid and
must be discarded, we only drop a reference to the first buffer's
page, but we need to drop a reference for each buffer the packet
used.
I think this bug was missed partly because efx_recycle_rx_buffers()
was not renamed and so no longer does what its name says. It does not
change the state of buffers, but only prepares the underlying pages
for recycling. Rename it accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/renesas/sh_eth.c
net/ipv4/gre.c
The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.
The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.
Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.
Signed-off-by: David S. Miller <davem@davemloft.net>
The device::iommu_group field may be set even if no IOMMU is in use.
iommu_present() is still a better indicator, although it doesn't tell
us whether *our* device is affected.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The lifetime of an irq_cpu_rmap is odd: we have to allocate it before
installing IRQ handlers and free it before removing the IRQ handlers.
As a result of this asymmetry, it was omitted from some failure paths.
On another failure path, we could try to remove IRQ handlers we
had not yet installed.
Move the irq_cpu_rmap allocation and freeing alongside IRQ handler
installation and removal, in efx_nic_{init,fini}_interrupts().
Count the number of IRQ handlers successfully installed and only
remove those on the failure path.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
GRO can handle non-TCP packets and pass them up without coalescing,
but it has to do some extra work to parse the packet which we can
bypass using the hardware parse result. (This condition yields a
false negative for TCP/IPv6 packets received by Falcon, but its
performance is already poor in that case due to lack of checksum
offload.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
As far as I know, the hardware doesn't support matching on both IP
fields and vlan tag, but it can at least match on the IP fields.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The kernel can generate software receive timestamps and we should
report those for all ports regardless of hardware capabilities.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
PCI legacy interrupts are level-triggered, and we cannot mask them up
on an isolated device. Instead, disable the IRQ at the controller
until we have recovered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Driver probe currently results in:
WARNING: at drivers/base/core.c:576 device_create_file+0x57/0x7e()
Attribute phy_type: write permission without 'store'
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not use net_device::dev_id to indicate the port number, as
this affects the way the local part of IPv6 addresses is normally
generated.
This field was intended for use where multiple devices may share a
single assigned MAC address and need to have different IPv6 addresses.
Siena's two ports each have their own MAC addresses.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_start_datapath() asserts that we can fit 2 RX scatter buffers plus
a software structure, each appropriately aligned, into a single page.
Where L1_CACHE_BYTES == 256 and PAGE_SIZE == 4096, which is the case
on s390, this assertion fails.
The current scatter buffer size is also not a multiple of 64 or 128,
which are more common cache line sizes. If we can make both the start
and end of a scatter buffer cache-aligned, this will reduce the need
for read-modify-write operations on inter- processor links.
Fix the alignment by reducing EFX_RX_USR_BUF_SIZE to 2048 - 256 ==
1792. (We could use 2048 - L1_CACHE_BYTES, but EFX_RX_USR_BUF_SIZE
also affects user-level networking where a larger amount of
housekeeping data may be needed. Although this version of the driver
does not support user-level networking, I prefer to keep scattering
behaviour consistent with the out-of-tree version.)
This still doesn't fix the s390 build because like most architectures
it has NET_IP_ALIGN == 2. When NET_IP_ALIGN != 0 we cannot achieve
cache line alignment at either the start or end of a scatter buffer,
so there is actually no point in padding the buffers to a multiple of
the cache line size. All we need is 4-byte alignment of the network
header, so do that.
Adjust the assertions accordingly.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The two architectures that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
(powerpc and x86) now both define NET_IP_ALIGN as 0, so there is no
need for this optimisation any more.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function ptp_clock_register() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull i2c changes from Wolfram Sang:
- an arbitration driver. While the driver is quite simple, it caused
discussion if we need additional arbitration on top of the one
specified in the I2C standard. Conclusion is that I accept a few
generic mechanisms, but not very specific ones.
- the core lost the detach_adapter() call. It has no users anymore and
was in the way for other cleanups. attach_adapter() is sadly still
there since there are users waiting to be converted.
- the core gained a bus recovery infrastructure. I2C defines a way to
recover if the data line is stalled. This mechanism is now in the
core and drivers can now pass some data to make use of it.
- bigger driver cleanups for designware, s3c2410
- removing superfluous refcounting from drivers
- removing Ben Dooks as second maintainer due to inactivity. Thanks
for all your work so far, Ben!
- bugfixes, feature additions, devicetree fixups, simplifications...
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
i2c: xiic: must always write 16-bit words to TX_FIFO
i2c: octeon: use HZ in timeout value
i2c: octeon: Fix i2c fail problem when a process is terminated by a signal
i2c: designware-pci: drop superfluous {get|put}_device
i2c: designware-plat: drop superfluous {get|put}_device
i2c: davinci: drop superfluous {get|put}_device
MAINTAINERS: Ben Dooks is inactive regarding I2C
i2c: mux: Add i2c-arb-gpio-challenge 'mux' driver
i2c: at91: convert to dma_request_slave_channel_compat()
i2c: mxs: do error checking and handling in PIO mode
i2c: mxs: remove races in PIO code
i2c-designware: switch to use runtime PM autosuspend
i2c-designware: use usleep_range() in the busy-loop
i2c-designware: enable/disable the controller properly
i2c-designware: use dynamic adapter numbering on Lynxpoint
i2c-designware-pci: use managed functions pcim_* and devm_*
i2c-designware-pci: use dev_err() instead of printk()
i2c-designware: move to managed functions (devm_*)
i2c: remove CONFIG_HOTPLUG ifdefs
i2c: s3c2410: Add SMBus emulation for block read
...
Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):
1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.
2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.
3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.
4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.
5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.
6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.
Use it to support wake-on-lan in mv643xx_eth.
From Michael Stapelberg.
7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.
8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.
9) Add support for T5 cxgb4 chips, from Santosh Rastapur.
10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configurating various aspects of the endpoints.
From David Stevens.
11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
from Dmitry Kravkov.
12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
Neira Ayuso.
13) Start adding networking selftests.
14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.
15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.
16) Convert several drivers over to mdoule_platform_driver(), from
Sachin Kamat.
17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.
18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.
19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.
20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.
21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.
22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.
23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.
24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.
25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.
26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.
27) Convert IPVS schedulers to RCU, also from Julian Anastasov.
28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.
29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.
30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.
31) Support several new r8169 chips, from Hayes Wang.
32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.
33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.
34) Add 802.1ad vlan offload support, from Patrick McHardy.
35) Support mmap() based netlink communication, also from Patrick
McHardy.
36) Support HW timestamping in mlx4 driver, from Amir Vadai.
37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.
38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.
39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/net/ethernet/emulex/benet/be.h
include/net/tcp.h
net/mac802154/mac802154.h
Most conflicts were minor overlapping stuff.
The be2net driver brought in some fixes that added __vlan_put_tag
calls, which in net-next take an additional argument.
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_mcdi_get_board_cfg() uses a buffer for the firmware response that
is only large enough to hold subtypes for the originally defined set
of NVRAM partitions. Longer responses are truncated, and we may read
off the end of the buffer when copying out subtypes for additional
partitions. In particular, this can result in the MTD partition for
an FPGA bitfile being named e.g. 'eth5 sfc_fpga:00' when it should be
'eth5 sfc_fpga:01'. This means the firmware update tool (sfupdate)
can't tell which bitfile should be written to the partition.
Correct the response buffer size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
i2c_del_adapter() always returns 0. So all checks testing whether it will be
non zero will always evaluate to false and the conditional code is dead code.
This patch updates all callers of i2c_del_mux_adapter() to ignore the return
value and assume that it will always succeed (which it will). In a subsequent
patch the return type of i2c_del_adapter() will be made void.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Trivial sparse detected functions that should be static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the number of calls required to alloc
a zeroed block of memory.
Trivially reduces overall object size.
Other changes around these removals
o Neaten call argument alignment
o Remove an unnecessary OOM message after dma_alloc_coherent failure
o Remove unnecessary gfp_t stack variable
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using TX push when notifying the NIC of multiple new descriptors in
the ring will very occasionally cause the TX DMA engine to re-use an
old descriptor. This can result in a duplicated or partly duplicated
packet (new headers with old data), or an IOMMU page fault. This does
not happen when the pushed descriptor is the only one written.
TX push also provides little latency benefit when a packet requires
more than one descriptor.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Allocating 2 buffers per page is insanely inefficient when MTU is 1500
and PAGE_SIZE is 64K (as it usually is on POWER). Allocate as many as
we can fit, and choose the refill batch size at run-time so that we
still always use a whole page at once.
[bwh: Fix loop condition to allow for compound pages; rebase]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On POWER systems, DMA mapping/unmapping operations are very expensive.
These changes reduce these costs by trying to reuse DMA mapped pages.
After all the buffers associated with a page have been processed and
passed up, the page is placed into a ring (if there is room). For
each page that is required for a refill operation, a page in the ring
is examined to determine if its page count has fallen to 1, ie. the
kernel has released its reference to these packets. If this is the
case, the page can be immediately added back into the RX descriptor
ring, without having to re-map it for DMA.
If the kernel is still holding a reference to this page, it is removed
from the ring and unmapped for DMA. Then a new page, which can
immediately be used by RX buffers in the descriptor ring, is allocated
and DMA mapped.
The time a page needs to spend in the recycle ring before the kernel
has released its page references is based on the number of buffers
that use this page. As large pages can hold more RX buffers, the RX
recycle ring can be shorter. This reduces memory usage on POWER
systems, while maintaining the performance gain achieved by recycling
pages, following the driver change to pack more than two RX buffers
into large pages.
When an IOMMU is not present, the recycle ring can be small to reduce
memory usage, since DMA mapping operations are inexpensive.
With a small recycle ring, attempting to refill the descriptor queue
with more buffers than the equivalent size of the recycle ring could
ultimately lead to memory leaks if page entries in the recycle ring
were overwritten. To prevent this, the check to see if the recycle
ring is full is changed to check if the next entry to be written is
NULL.
[bwh: Combine and rebase several commits so this is complete
before the following buffer-packing changes. Remove module
parameter.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Enable RX DMA scattering iff an RX buffer large enough for the current
MTU will not fit into a single page and the NIC supports DMA
scattering for kernel-mode RX queues.
On Falcon and Siena, the RX_USR_BUF_SIZE field is used as the DMA
limit for both all RX queues with scatter enabled. Set it to 1824,
matching what Onload uses now.
Maintain a statistic for frames truncated due to lack of descriptors
(rx_nodesc_trunc). This is distinct from rx_frm_trunc which may be
incremented when scattering is disabled and implies an over-length
frame.
Whenever an MTU change causes scattering to be turned on or off,
update filters that point to the PF queues, but leave others
unchanged, as VF drivers assume scattering is off.
Add n_frags parameters to various functions, and make them iterate:
- efx_rx_packet()
- efx_recycle_rx_buffers()
- efx_rx_mk_skb()
- efx_rx_deliver()
Make efx_handle_rx_event() responsible for updating
efx_rx_queue::removed_count.
Change the RX pipeline state to a starting ring index and number of
fragments, and make __efx_rx_packet() responsible for clearing it.
Based on earlier versions by David Riddoch and Jon Cooper.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Adjust rx_buf->page_offset when we eat the RX hash prefix. Remove
efx_rx_buf_offset(), which is now redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>