Add efx_nic_type operations for the many efx_nic functions that need
to be implemented differently on EF10. For now, change most of the
existing efx_nic_*() functions into inline wrappers. As a later step,
we may be able to improve branch prediction for operations used on the
fast path by copying the pointers into each queue/channel structure.
Move the Falcon/Siena implementations to new file farch.c and rename
the functions and static data to use a prefix of 'efx_farch_'.
Move efx_may_push_tx_desc() to nic.h, as the EF10 TX code will also
use it.
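As a rough illustration of the wrapper pattern described above, here is
a minimal userspace sketch. The single operation, its simplified
signature, the 'acks' counter and the two example type tables are
assumptions made for the example, not the driver's actual definitions.

```c
#include <assert.h>

struct efx_nic;

/* Per-NIC-type operations. Only one operation is shown, and its
 * signature is simplified for this sketch (assumption). */
struct efx_nic_type {
	void (*ev_read_ack)(struct efx_nic *efx);
};

struct efx_nic {
	const struct efx_nic_type *type;
	unsigned int acks;	/* hypothetical stand-in for hardware state */
};

/* Falcon/Siena implementation, using the 'efx_farch_' prefix */
static void efx_farch_ev_read_ack(struct efx_nic *efx)
{
	efx->acks += 1;
}

/* EF10 implementation; the behaviour is invented for the example */
static void efx_ef10_ev_read_ack(struct efx_nic *efx)
{
	efx->acks += 100;
}

static const struct efx_nic_type example_farch_type = {
	.ev_read_ack = efx_farch_ev_read_ack,
};

static const struct efx_nic_type example_ef10_type = {
	.ev_read_ack = efx_ef10_ev_read_ack,
};

/* The existing entry point becomes an inline wrapper that dispatches
 * through the NIC type. */
static inline void efx_nic_ev_read_ack(struct efx_nic *efx)
{
	assert(efx->type && efx->type->ev_read_ack);
	efx->type->ev_read_ack(efx);
}
```

Callers keep using the efx_nic_*() name and pay one indirect call,
which is the cost the branch prediction remark above refers to.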
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently efx_stop_datapath() will try to flush our DMA queues (if DMA
is enabled), then finalise software and hardware state for each queue.
However, for EF10 we must ask the MC to finalise each queue, which
implicitly starts flushing it, and then wait for the flush events.
We therefore need to delegate more of this to the NIC type.
Combine all the hardware operations into a new NIC-type operation
efx_nic_type::fini_dmaq, and call this before tearing down the
software state and buffers for all the DMA queues.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_unregister_netdev() should not call efx_release_tx_buffers()
directly, as it is already done when closing the device:
efx_net_stop() -> efx_stop_all() -> efx_stop_datapath() ->
efx_fini_tx_queue() -> efx_release_tx_buffers().
(This was presumably a workaround for a race between efx_stop_all()
and the data path that has since been properly fixed.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
rx_queue::enabled guards refill, so rename it to reflect that. Clear
it at the start of the queue teardown process rather than waiting for
the RX queue to be flushed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We unconditionally acknowledge legacy interrupts just before disabling
them. This workaround is needed on Falcon A1 but probably not on
later chips where the legacy interrupt mechanism is different. It was
also originally done after the IRQ handler was removed, not before.
Restore the original behaviour for Falcon A1 only by doing this
acknowledgement in the efx_nic_type::fini operation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There are many problems with the current efx_stop_interrupts() and
efx_start_interrupts():
1. On Siena, it is unsafe to disable the master IRQ enable bit
(DRV_INT_EN_KER) while any IRQ sources are enabled.
2. On EF10 there is no master IRQ enable bit, so we cannot expect to
defer IRQs without tearing down event queues. (Though I don't think
we will need to keep any event queues around while the device is down,
as we do for VFDI on Siena.)
3. synchronize_irq() only waits for a running IRQ handler to finish,
not for any propagation through IRQ controllers. Therefore an IRQ may
still be received and handled after efx_stop_interrupts() returns.
IRQ handlers can then race with channel reallocation.
To fix this:
a. Introduce a software IRQ enable flag. So long as this is clear,
IRQ handlers will only acknowledge IRQs and not touch the channel
structures.
b. Define a new struct efx_msi_context as the context for MSIs. This
is never reallocated and is sufficient to find the software enable
flag and the channel structure. It also includes the channel/IRQ
name, which was previously separated out as it must also not be
reallocated.
c. Split efx_{start,stop}_interrupts() into
efx_{,soft_}{enable,disable}_interrupts(). The 'soft' functions
don't touch the hardware master enable flag (if it exists) and don't
reinitialise or tear down channels with the keep_eventq flag set.
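Points (a) and (b) can be sketched roughly as follows. This is a
userspace model, not the driver code: the structure layouts, the field
names other than efx_msi_context, and the counters are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct efx_channel {
	unsigned int events_handled;
};

struct efx_nic {
	atomic_bool irq_soft_enabled;	/* software IRQ enable flag */
	struct efx_channel *channel[4];	/* may be reallocated while flag is clear */
	unsigned int irqs_acked;	/* hypothetical stand-in for the hardware ack */
};

/* Context passed to the MSI handler. It is never reallocated, so the
 * handler can always dereference it safely. */
struct efx_msi_context {
	struct efx_nic *efx;
	unsigned int index;
	char name[32];
};

/* Returns true if the channel was touched. */
static bool example_msi_interrupt(struct efx_msi_context *ctx)
{
	struct efx_nic *efx = ctx->efx;

	/* Always acknowledge the interrupt */
	efx->irqs_acked++;

	/* While the flag is clear, do not touch channel structures */
	if (!atomic_load_explicit(&efx->irq_soft_enabled,
				  memory_order_acquire))
		return false;

	efx->channel[ctx->index]->events_handled++;
	return true;
}
```

With this shape a late IRQ that arrives after the flag is cleared is
acknowledged but cannot race with channel reallocation.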
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_process_channel_now() is unneeded since self-tests can rely on
normal NAPI polling. Remove it and all calls to it.
efx_channel::work_pending and efx_channel_processed() are also
unneeded (the latter being the same as efx_nic_eventq_read_ack()).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The EF10 architecture has a very different register layout from
previous controllers, so we'll use separate files for the two sets of
register definitions. Use 'farch' as an abbreviation for
Falcon-architecture.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On EF10, the firmware is in charge of allocating buffer table entries.
Change struct efx_special_buffer to use a struct efx_buffer member,
so that it can be used with efx_nic_{alloc,free}_buffer() in that
case.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Most call sites for efx_nic_alloc_buffer() are part of the probe or
reconfiguration paths and can allocate with GFP_KERNEL. A few others
should use GFP_NOIO (I think). Only one is in atomic context and
must use the current GFP_ATOMIC.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Move the lowest layer (transport) of the current MCDI code to
per-NIC-type operations.
Introduce a new structure and efx_nic member for MCDI-specific data.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This should probably be done during MCDI initialisation for any NIC.
Change efx_mcdi_init() to return an error code.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Collect together MCDI port functions from mcdi.c, mcdi_mac.c,
mcdi_phy.c and siena.c. Rename the 'siena' functions accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We currently require that MCDI request and response lengths are
multiples of 4 bytes, because we will copy dwords in and out of shared
memory and we want to be sure we won't read or write out of bounds.
But all we really need to know is that there is sufficient padding for
that. Also, we should ensure that buffers are dword-aligned, as on
some architectures misaligned access will result in data corruption or
a crash.
Change the buffer type to array-of-efx_dword_t and remove the
requirement that the lengths are multiples of 4.
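To illustrate why padding is sufficient, here is a small userspace
sketch. The efx_dword_t stand-in and the helper name
example_mcdi_copy_in() are assumptions for the example. The copy loop
works in whole dwords, so it reads up to 3 bytes past the stated
length, and the array-of-dword type guarantees both the padding and
the alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the driver's dword type (assumption) */
typedef union {
	uint32_t u32;
	uint8_t u8[4];
} efx_dword_t;

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Copy a request of 'len' bytes into (simulated) shared memory one
 * dword at a time. 'len' need not be a multiple of 4, but 'buf' must
 * hold DIV_ROUND_UP(len, 4) dwords. Returns the dword count copied. */
static size_t example_mcdi_copy_in(uint32_t *shmem, const efx_dword_t *buf,
				   size_t len)
{
	size_t ndwords = DIV_ROUND_UP(len, 4);
	size_t i;

	for (i = 0; i < ndwords; i++)
		shmem[i] = buf[i].u32;
	return ndwords;
}
```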
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
A few functions are using heap buffers; change them to use stack
buffers as we really don't need to resort to the heap for a 252
byte buffer in process context.
MC_CMD_MEMCPY is quite weird in that it can use inline data placed in
the request buffer after the array of records. Thus there are two
variable-length arrays and we can't use the normal accessors for
the second. So we have to use _MCDI_PTR() in efx_sriov_memcpy().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We need to access arrays of 16-bit words and 32-bit dwords in MCDI
buffers based on the MCDI protocol definitions.
We should also be able to read and write fields within structures,
without specifying an array index each time. So add MCDI_FIELD()
and make MCDI_ARRAY_FIELD() use it. Also add MCDI_SET_FIELD().
Split MCDI_ARRAY_PTR() into MCDI_ARRAY_STRUCT_PTR() and
_MCDI_ARRAY_PTR(), which are currently identical but will diverge in
later changes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add _MCDI_DWORD() which yields an lvalue for the given dword field
and change MCDI_DWORD(), MCDI_SET_DWORD() and MCDI_QWORD() to use it.
Fold the rather trivial MCDI_PTR2() into MCDI_PTR() and _MCDI_DWORD().
Remove MCDI_SET_DWORD2() and MCDI_QWORD2(). MCDI_DWORD2() should also
go, but it still has one user which we'll get rid of later.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
MCDI_DECLARE_BUF declares a variable as an MCDI buffer of the
requested length, adding any necessary padding.
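One plausible shape for such a macro is sketched below. The
efx_dword_t stand-in, the example length and the helper function are
assumptions for illustration and may differ from the driver's actual
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the driver's dword type (assumption) */
typedef union {
	uint32_t u32;
	uint8_t u8[4];
} efx_dword_t;

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Declare a dword-aligned buffer of at least _len bytes, rounding the
 * size up to a whole number of dwords. */
#define MCDI_DECLARE_BUF(_name, _len) \
	efx_dword_t _name[DIV_ROUND_UP(_len, 4)]

#define EXAMPLE_OUT_LEN 10	/* hypothetical response length */

/* Returns the number of bytes actually reserved for the buffer */
static size_t example_buf_size(void)
{
	MCDI_DECLARE_BUF(outbuf, EXAMPLE_OUT_LEN);

	return sizeof(outbuf);
}
```

A 10-byte response therefore gets a 12-byte buffer, so dword copies
never run off the end.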
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Commit 385904f819 ('sfc: Don't use
efx_filter_{build,hash,increment}() for default MAC filters') used the
wrong name to find the index of default RX MAC filters at insertion/
update time. This could result in memory corruption and would in any
case silently fail to update the filter.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Received packets are only scattered if this is enabled in both the
matching filter and the receiving queue. This was not being done for
filters inserted for RFS, so any packet requiring more than a single
descriptor was dropped.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2768935a46 ('sfc: reuse pages to avoid DMA mapping/unmapping
costs') did not fully take account of DMA scattering which was
introduced immediately before. If a received packet is invalid and
must be discarded, we only drop a reference to the first buffer's
page, but we need to drop a reference for each buffer the packet
used.
I think this bug was missed partly because efx_recycle_rx_buffers()
was not renamed and so no longer does what its name says. It does not
change the state of buffers, but only prepares the underlying pages
for recycling. Rename it accordingly.
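The fix amounts to walking every buffer of the scattered packet rather
than only the first. A minimal userspace sketch, in which the buffer
layout and the function name are assumptions for illustration:

```c
#include <assert.h>

/* Hypothetical RX buffer: just a pointer to its page's reference count */
struct example_rx_buffer {
	unsigned int *page_refcount;
};

/* Drop one page reference for each of the n_frags buffers used by a
 * discarded packet, starting at 'index' and wrapping around the ring.
 * ring_mask is the ring size minus one (size is a power of two). */
static void example_discard_rx_packet(struct example_rx_buffer *ring,
				      unsigned int ring_mask,
				      unsigned int index,
				      unsigned int n_frags)
{
	for (; n_frags; n_frags--) {
		assert(*ring[index].page_refcount > 0);
		(*ring[index].page_refcount)--;
		index = (index + 1) & ring_mask;
	}
}
```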
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/renesas/sh_eth.c
net/ipv4/gre.c
The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into separate files.
The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.
Finally the sh_eth.c conflict was between one commit adding bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.
Signed-off-by: David S. Miller <davem@davemloft.net>
The device::iommu_group field may be set even if no IOMMU is in use.
iommu_present() is still a better indicator, although it doesn't tell
us whether *our* device is affected.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The lifetime of an irq_cpu_rmap is odd: we have to allocate it before
installing IRQ handlers and free it before removing the IRQ handlers.
As a result of this asymmetry, it was omitted from some failure paths.
On another failure path, we could try to remove IRQ handlers we
had not yet installed.
Move the irq_cpu_rmap allocation and freeing alongside IRQ handler
installation and removal, in efx_nic_{init,fini}_interrupts().
Count the number of IRQ handlers successfully installed and only
remove those on the failure path.
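The counting pattern can be sketched as below. The fake_request_irq()
and fake_free_irq() helpers, the channel count and the 'fail_at' test
hook are all hypothetical stand-ins for the real IRQ API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define N_CHANNELS 4

static bool irq_installed[N_CHANNELS];
static int fail_at = -1;	/* test hook: channel whose request fails */

static int fake_request_irq(unsigned int i)
{
	if ((int)i == fail_at)
		return -EBUSY;
	irq_installed[i] = true;
	return 0;
}

static void fake_free_irq(unsigned int i)
{
	/* Freeing a handler that was never installed is the bug to avoid */
	assert(irq_installed[i]);
	irq_installed[i] = false;
}

static int example_init_interrupts(void)
{
	unsigned int n_irqs = 0;
	unsigned int i;
	int rc;

	for (i = 0; i < N_CHANNELS; i++) {
		rc = fake_request_irq(i);
		if (rc)
			goto fail;
		n_irqs++;
	}
	return 0;

fail:
	/* Remove only the handlers that were successfully installed */
	while (n_irqs--)
		fake_free_irq(n_irqs);
	return rc;
}
```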
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
GRO can handle non-TCP packets and pass them up without coalescing,
but it has to do some extra work to parse the packet which we can
bypass using the hardware parse result. (This condition yields a
false negative for TCP/IPv6 packets received by Falcon, but its
performance is already poor in that case due to lack of checksum
offload.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
As far as I know, the hardware doesn't support matching on both IP
fields and vlan tag, but it can at least match on the IP fields.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The kernel can generate software receive timestamps and we should
report those for all ports regardless of hardware capabilities.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
PCI legacy interrupts are level-triggered, and we cannot mask them
on an isolated device. Instead, disable the IRQ at the controller
until we have recovered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Driver probe currently results in:
WARNING: at drivers/base/core.c:576 device_create_file+0x57/0x7e()
Attribute phy_type: write permission without 'store'
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not use net_device::dev_id to indicate the port number, as
this affects the way the local part of IPv6 addresses is normally
generated.
This field was intended for use where multiple devices may share a
single assigned MAC address and need to have different IPv6 addresses.
Siena's two ports each have their own MAC addresses.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
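The idea can be modelled in userspace as follows. The struct
net_device stand-in, the example event number and the
example_change_info structure are simplified assumptions; only the
general shape (a generic info structure that carries the device, with
event-specific structures embedding it as their first member) is
taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

struct net_device {
	const char *name;	/* simplified stand-in */
};

/* Generic info passed to every listener */
struct netdev_notifier_info {
	struct net_device *dev;
};

static inline struct net_device *
netdev_notifier_info_to_dev(const struct netdev_notifier_info *info)
{
	return info->dev;
}

#define EXAMPLE_EVENT_CHANGE 1UL	/* hypothetical event number */

/* Event-specific structure: generic info first, extra data after */
struct example_change_info {
	struct netdev_notifier_info info;	/* must be first */
	unsigned int flags_changed;
};

static const char *last_name;
static unsigned int last_flags;

/* Listener: always gets the device, and casts to the richer structure
 * only for events known to provide it. */
static int example_notifier_call(unsigned long event, void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);

	last_name = dev->name;
	if (event == EXAMPLE_EVENT_CHANGE)
		last_flags = ((struct example_change_info *)ptr)->flags_changed;
	return 0;
}
```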
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_start_datapath() asserts that we can fit 2 RX scatter buffers plus
a software structure, each appropriately aligned, into a single page.
Where L1_CACHE_BYTES == 256 and PAGE_SIZE == 4096, which is the case
on s390, this assertion fails.
The current scatter buffer size is also not a multiple of 64 or 128,
which are more common cache line sizes. If we can make both the start
and end of a scatter buffer cache-aligned, this will reduce the need
for read-modify-write operations on inter-processor links.
Fix the alignment by reducing EFX_RX_USR_BUF_SIZE to 2048 - 256 ==
1792. (We could use 2048 - L1_CACHE_BYTES, but EFX_RX_USR_BUF_SIZE
also affects user-level networking where a larger amount of
housekeeping data may be needed. Although this version of the driver
does not support user-level networking, I prefer to keep scattering
behaviour consistent with the out-of-tree version.)
This still doesn't fix the s390 build because like most architectures
it has NET_IP_ALIGN == 2. When NET_IP_ALIGN != 0 we cannot achieve
cache line alignment at either the start or end of a scatter buffer,
so there is actually no point in padding the buffers to a multiple of
the cache line size. All we need is 4-byte alignment of the network
header, so do that.
Adjust the assertions accordingly.
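The arithmetic behind the first part of the fix can be checked with a
short userspace sketch. The ALIGN_UP() helper, the function names and
the 16-byte size used for the software structure are assumptions; the
buffer size, page size and cache line sizes come from the text above,
and the 14-byte Ethernet header is the standard untagged one.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

#define EFX_RX_USR_BUF_SIZE (2048 - 256)	/* == 1792 */
#define EXAMPLE_ETH_HLEN 14			/* untagged Ethernet header */

/* Nonzero if two scatter buffers plus a software structure of
 * 'state_size' bytes, each cache-line aligned, fit in one page. */
static int example_two_bufs_fit(size_t page_size, size_t cache_line,
				size_t state_size)
{
	size_t used = ALIGN_UP(state_size, cache_line) +
		      2 * ALIGN_UP((size_t)EFX_RX_USR_BUF_SIZE, cache_line);

	return used <= page_size;
}

/* Offset of the network header from the start of the buffer */
static size_t example_ip_header_offset(size_t net_ip_align)
{
	return net_ip_align + EXAMPLE_ETH_HLEN;
}
```

With a 256-byte cache line this uses 256 + 2 * 1792 == 3840 bytes of a
4096-byte page, and NET_IP_ALIGN == 2 puts the network header at
offset 16, which is 4-byte aligned.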
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The two architectures that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
(powerpc and x86) now both define NET_IP_ALIGN as 0, so there is no
need for this optimisation any more.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function ptp_clock_register() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
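A userspace model of the error-pointer convention shows why the NULL
test never fires. The ERR_PTR(), PTR_ERR() and IS_ERR() definitions
below are simplified re-creations of the kernel helpers, and
fake_ptp_clock_register() and example_probe() are hypothetical
stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top of the pointer range */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct ptp_clock {
	int index;
};

static struct ptp_clock example_clock;

/* Stand-in for ptp_clock_register(): never returns NULL on failure */
static struct ptp_clock *fake_ptp_clock_register(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &example_clock;
}

static int example_probe(int fail)
{
	struct ptp_clock *clock = fake_ptp_clock_register(fail);

	/* A '!clock' test would treat the error pointer as success */
	if (IS_ERR(clock))
		return (int)PTR_ERR(clock);
	return 0;
}
```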
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull i2c changes from Wolfram Sang:
- an arbitration driver. While the driver is quite simple, it caused
discussion if we need additional arbitration on top of the one
specified in the I2C standard. Conclusion is that I accept a few
generic mechanisms, but not very specific ones.
- the core lost the detach_adapter() call. It has no users anymore and
was in the way for other cleanups. attach_adapter() is sadly still
there since there are users waiting to be converted.
- the core gained a bus recovery infrastructure. I2C defines a way to
recover if the data line is stalled. This mechanism is now in the
core and drivers can now pass some data to make use of it.
- bigger driver cleanups for designware, s3c2410
- removing superfluous refcounting from drivers
- removing Ben Dooks as second maintainer due to inactivity. Thanks
for all your work so far, Ben!
- bugfixes, feature additions, devicetree fixups, simplifications...
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
i2c: xiic: must always write 16-bit words to TX_FIFO
i2c: octeon: use HZ in timeout value
i2c: octeon: Fix i2c fail problem when a process is terminated by a signal
i2c: designware-pci: drop superfluous {get|put}_device
i2c: designware-plat: drop superfluous {get|put}_device
i2c: davinci: drop superfluous {get|put}_device
MAINTAINERS: Ben Dooks is inactive regarding I2C
i2c: mux: Add i2c-arb-gpio-challenge 'mux' driver
i2c: at91: convert to dma_request_slave_channel_compat()
i2c: mxs: do error checking and handling in PIO mode
i2c: mxs: remove races in PIO code
i2c-designware: switch to use runtime PM autosuspend
i2c-designware: use usleep_range() in the busy-loop
i2c-designware: enable/disable the controller properly
i2c-designware: use dynamic adapter numbering on Lynxpoint
i2c-designware-pci: use managed functions pcim_* and devm_*
i2c-designware-pci: use dev_err() instead of printk()
i2c-designware: move to managed functions (devm_*)
i2c: remove CONFIG_HOTPLUG ifdefs
i2c: s3c2410: Add SMBus emulation for block read
...
Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):
1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.
2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.
3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.
4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.
5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.
6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.
Use it to support wake-on-lan in mv643xx_eth.
From Michael Stapelberg.
7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.
8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.
9) Add support for T5 cxgb4 chips, from Santosh Rastapur.
10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configuring various aspects of the endpoints.
From David Stevens.
11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver,
from Dmitry Kravkov.
12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
Neira Ayuso.
13) Start adding networking selftests.
14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.
15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.
16) Convert several drivers over to module_platform_driver(), from
Sachin Kamat.
17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.
18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.
19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.
20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.
21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.
22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.
23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.
24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.
25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.
26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.
27) Convert IPVS schedulers to RCU, also from Julian Anastasov.
28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.
29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.
30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.
31) Support several new r8169 chips, from Hayes Wang.
32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.
33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.
34) Add 802.1ad vlan offload support, from Patrick McHardy.
35) Support mmap() based netlink communication, also from Patrick
McHardy.
36) Support HW timestamping in mlx4 driver, from Amir Vadai.
37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.
38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.
39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/net/ethernet/emulex/benet/be.h
include/net/tcp.h
net/mac802154/mac802154.h
Most conflicts were minor overlapping stuff.
The be2net driver brought in some fixes that added __vlan_put_tag
calls, which in net-next take an additional argument.
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_mcdi_get_board_cfg() uses a buffer for the firmware response that
is only large enough to hold subtypes for the originally defined set
of NVRAM partitions. Longer responses are truncated, and we may read
off the end of the buffer when copying out subtypes for additional
partitions. In particular, this can result in the MTD partition for
an FPGA bitfile being named e.g. 'eth5 sfc_fpga:00' when it should be
'eth5 sfc_fpga:01'. This means the firmware update tool (sfupdate)
can't tell which bitfile should be written to the partition.
Correct the response buffer size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>