When we have a large number of static label mappings that spill across
the netlink message boundary, we fail to properly save our state in the
netlink_callback struct, which causes us to repeat the same listings.
This patch fixes this problem by saving the state correctly between
calls to the NetLabel static label netlink "dumpit" routines.
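The fix follows the usual dumpit pattern: store the resume point in the callback's scratch space and skip past it on the next call. The following user-space sketch models that. `struct dump_cb` and `dump_entries()` are hypothetical stand-ins for `struct netlink_callback` and the NetLabel dump routine, and `cap` stands in for the space left in one netlink message.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_callback: args[0] holds
 * the index of the next entry to emit across successive calls. */
struct dump_cb {
	long args[6];
};

/* Copy up to 'cap' entries (one message's worth) starting where the
 * previous call stopped. Returns the number of entries emitted; 0
 * means the dump is complete. */
static size_t dump_entries(const int *table, size_t n, struct dump_cb *cb,
			   int *out, size_t cap)
{
	size_t idx = (size_t)cb->args[0];
	size_t emitted = 0;

	while (idx < n && emitted < cap)
		out[emitted++] = table[idx++];

	/* Save the resume point; forgetting this repeats the listing. */
	cb->args[0] = (long)idx;
	return emitted;
}
```

Each call picks up exactly where the previous message filled up, so a table larger than one message is listed once, in order.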
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When autoneg has been disabled in the PHY (Marvell 88E1118 here),
autonegotiation between the MAC and PHY no longer seems to work. The
only workaround I found is to manually configure the MAC with the
settings sent to the PHY earlier.
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if we set up netconsole over bonding and release a slave,
netconsole will stop logging on the whole bonding device. Change the
behavior to stop the netconsole only when the last slave is released.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following script will produce a kernel oops:
sudo ip netns add v
sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
sudo ip netns exec v ip link set lo up
sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
sudo ip netns exec v ip link set vxlan0 up
sudo ip netns del v
Inspecting it with gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 107]
0xffffffffa0289e33 in ?? ()
(gdb) bt
#0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
#1 vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
#2 0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
#3 0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
#4 0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
#5 0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
#6 0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
#7 0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
#8 0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
#9 0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
#10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
#11 <signal handler called>
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) fr 0
#0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
533 struct sock *sk = vn->sock->sk;
(gdb) l
528 static int vxlan_leave_group(struct net_device *dev)
529 {
530 struct vxlan_dev *vxlan = netdev_priv(dev);
531 struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
532 int err = 0;
533 struct sock *sk = vn->sock->sk;
534 struct ip_mreqn mreq = {
535 .imr_multiaddr.s_addr = vxlan->gaddr,
536 .imr_ifindex = vxlan->link,
537 };
(gdb) p vn->sock
$4 = (struct socket *) 0x0
The kernel calls `vxlan_exit_net` when deleting the netns, before the vxlan
interfaces are shut down. The subsequent removal of all vxlan interfaces then
runs after `vn->sock` is already gone, which causes the oops. So we should
manually shut down all vxlan interfaces before deleting `vn->sock`, as this
patch does.
Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux is free to call ethtool ops as soon as a netdev exists when probe
finishes. However, we only allocate vmxnet3 tx/rx queues and initialize the
rx_buf_per_pkt field in struct vmxnet3_adapter when the interface is
opened (UP).
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few small things here and there, nothing major here really. The
conversion of twl4030ldo_ops to get_voltage_sel is a fix, as covered in
the commit log it fixes inconsistency in handling of the IS_UNSUP()
feature in the driver.
Merge tag 'regulator-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A few small things here and there, nothing major here really. The
conversion of twl4030ldo_ops to get_voltage_sel is a fix, as covered
in the commit log it fixes inconsistency in handling of the IS_UNSUP()
feature in the driver."
* tag 'regulator-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fixed regulator_bulk_enable unwinding code
regulator: twl: Convert twl4030ldo_ops to get_voltage_sel
regulator: palmas: fix number of SMPS voltages
regulator: core: fix documentation error in regulator_allow_bypass
regulator: core: update kernel documentation for regulator_desc
regulator: db8500-prcmu - remove incorrect __exit markup
A simple fix to stop us leaking a runtime PM reference in the case where
we fail to enable a device.
Merge tag 'regmap-v3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap PM fix from Mark Brown:
"A simple fix to stop us leaking a runtime PM reference in the case
where we fail to enable a device."
* tag 'regmap-v3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: call pm_runtime_put in pm_runtime_get_sync failed case
Or Gerlitz says:
====================
Here's a batch of fixes to the mlx4 core and ethernet drivers for 3.9.
The commit that disables RFS when running in SR-IOV mode fixes a regression which was
introduced in 3.9-rc1 but is also present in the 3.8 -stable series. It turns out
that a slightly different fix is needed there; we will generate and submit it separately.
Patches are against net commit 66d29cbc59 "benet: Wait f/w POST until timeout".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 37706996 "mlx4_en: fix allocation of CPU affinity reverse-map" fixed
a bug when mlx4_dev->caps.comp_pool is larger than the number of device rx
rings, but introduced a regression.
When mlx4_core activates its "legacy mode" with respect to EQ/IRQ usage
(e.g. when running in SR-IOV mode), comp_pool becomes zero and we crash on
a divide by zero in alloc_cpu_rmap.
Fix that by enabling RFS only when running in non-legacy mode.
Reported-by: Yan Burman <yanb@mellanox.com>
Cc: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we clean up all MAC-related resources (entries in the port MAC
table and steering rules) when stopping a port or when the driver is unloaded.
The leak was introduced by commit 07cb4b0a "net/mlx4_en: Manage hash of MAC
addresses per port".
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the unnecessary use of a workqueue for the device MAC address setting
flow, and fix a race when setting the MAC address which was introduced by
commit c07cb4b0a "net/mlx4_en: Manage hash of MAC addresses per port".
The race happened when mlx4_en_replace_mac was executing in parallel
with a subsequent call to ndo_set_mac_address; e.g. with an A/B/A MAC
setting configuration test, the third set fails.
With this change we also properly report an error if set MAC fails.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The set_param_l function assumes that casting a u64 pointer to a u32 pointer
gives access to the lower 32 bits, but on big-endian systems it results in
writing the upper 32 bits.
The fixed function reads the upper 32 bits of the 64-bit argument and ORs
them with the 32-bit value passed to the function.
Since this is now a read-modify-write operation, we got many
"uninitialized variable" warnings, which needed to be fixed as well.
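A minimal sketch of the endian-independent version, using `<stdint.h>` types in place of the kernel's `u32`/`u64`. It keeps the upper half by masking and ORs in the new lower half arithmetically instead of aliasing the u64 through a u32 pointer, so the result is the same on big- and little-endian CPUs. Treat it as an illustration of the technique rather than the exact patched source.

```c
#include <stdint.h>

/* Replace the lower 32 bits of *arg with 'val', keeping the upper 32.
 * This is a read-modify-write, so *arg must already be initialized,
 * which is why callers now need their parameters zeroed first. */
static inline void set_param_l(uint64_t *arg, uint32_t val)
{
	*arg = (*arg & 0xffffffff00000000ULL) | (uint64_t)val;
}
```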
Reported-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older kernels detect DMFS (device-managed flow steering) from the HCA
device capability directly, regardless of whether the capability was
enabled in INIT_HCA. This was fixed by commit 7b8157bed "mlx4_core: Adjustments
to Flow Steering activation logic for SR-IOV".
To protect against guests running kernels without this fix, the host driver
should turn off the DMFS capability bit in mlx4_QUERY_DEV_CAP_wrapper.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guest kernels may not correctly detect whether DMFS (device-managed flow steering) is
activated by the host. If DMFS is activated, the master should return an error to guests
that try to use the B0-steering flow calls (mlx4_QP_ATTACH).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code cleanups fix up W=1 compiler warnings and some unnecessary checks. The
new Kconfig option, defaulting to N, allows the rarely used eCryptfs kernel to
userspace communication channel to be compiled out. This may be the first step
in it being eventually removed.
Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Minor code cleanups and new Kconfig option to disable /dev/ecryptfs
The code cleanups fix up W=1 compiler warnings and some unnecessary
checks. The new Kconfig option, defaulting to N, allows the rarely
used eCryptfs kernel to userspace communication channel to be compiled
out. This may be the first step in it being eventually removed."
Hmm. I'm not sure whether these should be called "fixes", and it
probably should have gone in the merge window. But I'll let it slide.
* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: allow userspace messaging to be disabled
eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
eCryptfs: remove unneeded checks in virt_to_scatterlist()
eCryptfs: Fix -Wmissing-prototypes warnings
eCryptfs: Fix -Wunused-but-set-variable warnings
eCryptfs: initialize payload_len in keystore.c
Nithin Nayak says:
====================
For the 57766 devices with no NVRAM, there is not enough space for the complete
boot code with EEE support. On these devices, the tg3 driver has to download
a service patch firmware to the scratchpad for the boot code to execute. This
patchset adds support to do the above.
A major portion of this patchset is refactoring the existing firmware download
section to allow a cleaner merge of the 57766 download. The 57766 firmware
differs from previous firmware in that it's not written to a contiguous area in
memory. It consists of multiple fragments to be written to different locations.
The patchset attempts to make the new firmware format an extension of
the existing format.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch downloads the EEE service patch firmware and enables the necessary
EEE flags.
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This lays the groundwork to download the 57766 fragmented firmware. We
loop until we've written data equal to tp->fw->size minus the headers.
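A rough user-space model of that loop, assuming a hypothetical image layout in which each fragment is a 12-byte big-endian header (version, base address, payload length) followed by its payload. `write_fragments()`, `rd_be32()`, and the flat `mem` buffer (standing in for the CPU scratchpad) are illustrative names, not tg3 code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAG_HDR_LEN 12u /* hypothetical: version, base_addr, len (BE32) */

static uint32_t rd_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Copy each fragment's payload to its base address in 'mem', looping
 * until the payload written equals the image size minus all headers.
 * Returns the number of payload bytes written, or -1 if malformed. */
static long write_fragments(const uint8_t *img, size_t size,
			    uint8_t *mem, size_t mem_size)
{
	size_t off = 0, written = 0, hdr_bytes = 0;

	while (written < size - hdr_bytes) {
		if (size - off < FRAG_HDR_LEN)
			return -1; /* truncated header */

		uint32_t base = rd_be32(img + off + 4);
		uint32_t len = rd_be32(img + off + 8);

		off += FRAG_HDR_LEN;
		hdr_bytes += FRAG_HDR_LEN;
		if (len > size - off || base > mem_size || len > mem_size - base)
			return -1; /* payload overruns image or target */

		memcpy(mem + base, img + off, len);
		off += len;
		written += len;
	}
	return (long)written;
}
```

Because every fragment carries its own header, the loop cannot know the number of fragments up front; it simply stops once the bytes written account for everything in the image that is not a header.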
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current firmware header parsing is complicated due to interpreting it as a
u32 array and accessing header members via array offsets. Add tg3_firmware_hdr
structure to access the firmware fields instead of hardcoding offsets. The same
header format will be used for individual firmware fragments in the 57766.
The fw_hdr and tg3 structures have all the information required for
loading the fw. Remove the redundant fw_info structure and pass fw_hdr
instead.
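A sketch of what a named header buys over array offsets. The three big-endian fields (version, base_addr, len) are assumed here for illustration, and `tg3_parse_fw_hdr()` is a hypothetical helper. Bytes are decoded explicitly so the example stays endian- and alignment-safe in user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout; all fields are big-endian in the image. */
struct tg3_firmware_hdr {
	uint32_t version;
	uint32_t base_addr;
	uint32_t len;
};

#define TG3_FW_HDR_LEN 12u /* three 32-bit fields on the wire */

static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a header from the start of 'buf'. Returns 0 on success,
 * or -1 if the buffer is too short to hold one. */
static int tg3_parse_fw_hdr(const uint8_t *buf, size_t size,
			    struct tg3_firmware_hdr *hdr)
{
	if (size < TG3_FW_HDR_LEN)
		return -1;
	hdr->version = be32_at(buf);
	hdr->base_addr = be32_at(buf + 4);
	hdr->len = be32_at(buf + 8);
	return 0;
}
```

Callers then read `hdr.base_addr` instead of something like `fw_data[1]`, and the same parser can be reused for each 57766 fragment header.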
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For completeness and consistency, add the common function
tg3_pause_cpu_and_set_pc(). This is only for existing fw and not used for the
57766.
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 57766 rxcpu needs to be paused/resumed when we download the firmware just
like we do for existing firmware. Refactor the pause/resume code to be
reusable.
This patch also renames the "offset" argument of tg3_halt_cpu to "cpu_base"
since that's what it really is.
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3 used the fw_needed member loosely as a synonym for firmware TSO. Now
that the 57766 needs firmware download support, fw_needed can no longer be
used like this. This patch creates a new FW_TSO flag and changes the
code to use it.
Also rearrange all the TSO flags together in the enum.
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich says:
====================
This is a short series that now allows MAC filter programming on any
card that supports IFF_UNICAST_FLT by using the existing FDB interface.
Some existing drivers that had FDB functionality usually supported
it only in SR-IOV mode. Since that's not always enabled, and
we want to take advantage of IFF_UNICAST_FLT support, these drivers
have been converted to call the default handler when not in SRIOV mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow qlcnic to use the generic fdb handler when the driver options
are not enabled. Until the driver is fully fixed, this allows
the use of the FDB interface with the qlogic driver, but simply puts
the driver into promisc mode, since the driver currently does not
support IFF_UNICAST_FLT.
CC: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
CC: Sony Chacko <sony.chacko@qlogic.com>
CC: linux-driver@qlogic.com
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove driver-specific fdb handlers since they are the same as
the default ones.
CC: Amir Vadai <amirv@mellanox.com>
CC: Yan Burman <yanb@mellanox.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For fdb_add, use the default handler in the non-SRIOV case.
For the other fdb handlers, just remove them and use the
default ones.
CC: John Fastabend <john.r.fastabend@intel.com>
Acked-By: John Fastabend <john.r.fastabend@intel.com>
CC: Gregory Rose <gregory.v.rose@intel.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the driver does not support the ndo_op, use the generic
handler for it. This should work in the majority of cases.
Eventually the fdb_dflt_add call gets translated into a
__dev_set_rx_mode() call, which should handle hardware
support for filtering via the IFF_UNICAST_FLT flag.
Namely, IFF_UNICAST_FLT indicates whether the hardware can do
unicast address filtering. If no support is available,
the device is put into promisc mode.
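A simplified model of that dispatch: prefer the driver's op, otherwise fall back to a generic handler whose rx-mode sync either relies on hardware unicast filtering or turns on promiscuous mode. All types and functions here (`fake_dev`, `fdb_add()`, `sync_rx_mode()`) are hypothetical, and `unicast_flt` stands in for the IFF_UNICAST_FLT flag.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_dev;

/* Hypothetical driver ops table; NULL fdb_add means "not implemented". */
struct fake_dev_ops {
	int (*fdb_add)(struct fake_dev *dev, const unsigned char *addr);
};

/* Hypothetical, heavily simplified device state. */
struct fake_dev {
	const struct fake_dev_ops *ops;
	bool unicast_flt; /* stands in for IFF_UNICAST_FLT */
	bool promisc;
	size_t uc_count;  /* entries on the device's unicast address list */
};

/* Stand-in for the rx-mode update: extra unicast addresses are filtered
 * in hardware or, without filtering support, force promisc mode. */
static void sync_rx_mode(struct fake_dev *dev)
{
	if (dev->uc_count > 0 && !dev->unicast_flt)
		dev->promisc = true;
}

/* Generic handler: add the address to the unicast list, then resync. */
static int default_fdb_add(struct fake_dev *dev, const unsigned char *addr)
{
	(void)addr; /* a real list would store the address itself */
	dev->uc_count++;
	sync_rx_mode(dev);
	return 0;
}

/* Use the driver's own op when present, otherwise the generic one. */
static int fdb_add(struct fake_dev *dev, const unsigned char *addr)
{
	if (dev->ops && dev->ops->fdb_add)
		return dev->ops->fdb_add(dev, addr);
	return default_fdb_add(dev, addr);
}
```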
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating 2 buffers per page is insanely inefficient when MTU is 1500
and PAGE_SIZE is 64K (as it usually is on POWER). Allocate as many as
we can fit, and choose the refill batch size at run-time so that we
still always use a whole page at once.
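The arithmetic behind the change, as a sketch. The 2048-byte buffer size used in the example and the `min_batch` target are hypothetical values, not the driver's actual sizing. With 64K pages, 65536 / 2048 = 32 buffers fit where only 2 were used before, and rounding the refill batch up to a multiple of buffers-per-page keeps each refill consuming whole pages.

```c
#include <stddef.h>

/* How many RX buffers of 'buf_size' bytes fit in one page. */
static size_t rx_bufs_per_page(size_t page_size, size_t buf_size)
{
	return buf_size ? page_size / buf_size : 0;
}

/* Round a desired refill batch up to a whole number of pages' worth of
 * buffers, so each refill uses complete pages. 'min_batch' is a
 * hypothetical tuning value, not the driver's constant. */
static size_t rx_refill_batch(size_t bufs_per_page, size_t min_batch)
{
	if (bufs_per_page == 0)
		return 0;

	size_t pages = (min_batch + bufs_per_page - 1) / bufs_per_page;

	if (pages == 0)
		pages = 1; /* always refill at least one page */
	return pages * bufs_per_page;
}
```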
[bwh: Fix loop condition to allow for compound pages; rebase]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On POWER systems, DMA mapping/unmapping operations are very expensive.
These changes reduce these costs by trying to reuse DMA mapped pages.
After all the buffers associated with a page have been processed and
passed up, the page is placed into a ring (if there is room). For
each page that is required for a refill operation, a page in the ring
is examined to determine whether its page count has fallen to 1, i.e. the
kernel has released its reference to these packets. If this is the
case, the page can be immediately added back into the RX descriptor
ring, without having to re-map it for DMA.
If the kernel is still holding a reference to this page, it is removed
from the ring and unmapped for DMA. Then a new page, which can
immediately be used by RX buffers in the descriptor ring, is allocated
and DMA mapped.
The time a page needs to spend in the recycle ring before the kernel
has released its page references is based on the number of buffers
that use this page. As large pages can hold more RX buffers, the RX
recycle ring can be shorter. This reduces memory usage on POWER
systems, while maintaining the performance gain achieved by recycling
pages, following the driver change to pack more than two RX buffers
into large pages.
When an IOMMU is not present, the recycle ring can be small to reduce
memory usage, since DMA mapping operations are inexpensive.
With a small recycle ring, attempting to refill the descriptor queue
with more buffers than the equivalent size of the recycle ring could
ultimately lead to memory leaks if page entries in the recycle ring
were overwritten. To prevent this, the check to see if the recycle
ring is full is changed to check if the next entry to be written is
NULL.
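A user-space model of the recycle ring described above. `struct fake_page`, its `refcount`/`dma_mapped` fields, and the ring size are hypothetical. The sketch encodes the two rules from the text: reuse a page only if its count is back to 1, and only write a ring entry that is currently NULL, so a parked page is never overwritten and leaked.

```c
#include <stdbool.h>
#include <stddef.h>

#define RECYCLE_RING_SIZE 4 /* hypothetical; must be a power of two */

/* Hypothetical page: a reference count and a "DMA mapped" flag. */
struct fake_page {
	int refcount;
	bool dma_mapped;
};

struct recycle_ring {
	struct fake_page *slot[RECYCLE_RING_SIZE];
	unsigned int put, get;
};

/* Park a page after its buffers were passed up. Only write if the next
 * slot is empty; false means the caller must unmap/release it itself. */
static bool recycle_put(struct recycle_ring *r, struct fake_page *p)
{
	unsigned int i = r->put & (RECYCLE_RING_SIZE - 1);

	if (r->slot[i])
		return false;
	r->slot[i] = p;
	r->put++;
	return true;
}

/* Take the next parked page. If only our reference remains, reuse it
 * without re-mapping; otherwise drop it from the ring and "unmap" it.
 * NULL tells the caller to allocate and map a fresh page. */
static struct fake_page *recycle_get(struct recycle_ring *r)
{
	unsigned int i = r->get & (RECYCLE_RING_SIZE - 1);
	struct fake_page *p = r->slot[i];

	if (!p)
		return NULL;
	r->slot[i] = NULL;
	r->get++;
	if (p->refcount == 1)
		return p;
	p->dma_mapped = false; /* stands in for the DMA unmap */
	p->refcount--;         /* drop the driver's reference */
	return NULL;
}
```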
[bwh: Combine and rebase several commits so this is complete
before the following buffer-packing changes. Remove module
parameter.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Enable RX DMA scattering iff an RX buffer large enough for the current
MTU will not fit into a single page and the NIC supports DMA
scattering for kernel-mode RX queues.
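The enabling condition written out as a predicate. `overhead` is a hypothetical parameter lumping together the prefix, headers, and padding that the real buffer-size calculation adds on top of the MTU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scatter iff a buffer big enough for the MTU does not fit in one page
 * AND the NIC can scatter on kernel-mode RX queues. */
static bool rx_scatter_needed(size_t mtu, size_t overhead, size_t page_size,
			      bool nic_supports_kernel_scatter)
{
	size_t buf_len = mtu + overhead;

	return buf_len > page_size && nic_supports_kernel_scatter;
}
```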
On Falcon and Siena, the RX_USR_BUF_SIZE field is used as the DMA
limit for all RX queues with scatter enabled. Set it to 1824,
matching what Onload uses now.
Maintain a statistic for frames truncated due to lack of descriptors
(rx_nodesc_trunc). This is distinct from rx_frm_trunc which may be
incremented when scattering is disabled and implies an over-length
frame.
Whenever an MTU change causes scattering to be turned on or off,
update filters that point to the PF queues, but leave others
unchanged, as VF drivers assume scattering is off.
Add n_frags parameters to various functions, and make them iterate:
- efx_rx_packet()
- efx_recycle_rx_buffers()
- efx_rx_mk_skb()
- efx_rx_deliver()
Make efx_handle_rx_event() responsible for updating
efx_rx_queue::removed_count.
Change the RX pipeline state to a starting ring index and number of
fragments, and make __efx_rx_packet() responsible for clearing it.
Based on earlier versions by David Riddoch and Jon Cooper.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Adjust rx_buf->page_offset when we eat the RX hash prefix. Remove
efx_rx_buf_offset(), which is now redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Currently we prefetch from the Ethernet header, but we will also read
the hash prefix. In practice they should be in the same cache line
and this won't hurt, but it is still pointless to add on the hash
prefix size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_rx_buf_va() returns the virtual address of the current start of
the buffer. The callers must add the hash prefix size themselves.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The pipeline mechanism will need to change a bit for scattered
packets. Add a wrapper to insulate efx_process_channel() from this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The Linux side of EEH is triggered by MMIO reads, but this
driver's data path does not issue any MMIO reads (except in
legacy interrupt mode). Therefore add a monitor function
to poll EEH periodically.
When preparing to reset the device based on our own error
detection, also poll EEH and defer to its recovery mechanism
if appropriate.
[bwh: Use a separate condition for the initial link poll; fix some
style errors]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
On Siena, VFs share RSS configuration with the PF. We attempted to
support configurations where the PF only uses 1 RX queue and VFs use
multiple RX queues, by (1) setting up RSS for the number of RX queues
per VF (2) disabling RSS in the PF's RX default filters.
Unfortunately commit cd2d5b529c ('sfc: Add SR-IOV back-end support
for SFC9000 family') only included (1). This is (2).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_filter_insert_filter() uses the first table entry in the hash chain
that either has the same match values or is empty. This means that
replacement doesn't always work correctly:
1. Insert filter F1 with match values M1, hashing to H1, at first
possible entry E1.
2. Insert filter F2 with match values M2, hashing to H1, at second
possible entry E2.
3. Remove filter F1.
4. Insert filter F3 with match values M2, hashing to H1, at first
possible entry E1.
F3 should have either replaced F2 or been rejected (depending on
priority and the replace_equal parameter).
Instead, search for both a matching filter that the inserted filter
would replace, and an available insertion point, up to the applicable
maximum search depths. If we insert at lower depth than a replaced
filter, clear the replaced filter.
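A sketch of the revised search on a toy open-addressed table. `TABLE_SIZE`, `MAX_DEPTH`, `struct slot`, and `filter_insert()` are hypothetical, and the priority/replace_equal checks are left out. The loop scans the full permitted depth, remembering both the first free slot and any slot holding the same match values, and clears the old entry when the new filter lands at a shallower depth.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 8u /* hypothetical table size, a power of two */
#define MAX_DEPTH 4u  /* hypothetical maximum search depth */

/* Hypothetical table entry; 'match' stands in for the match values. */
struct slot {
	bool used;
	uint32_t match;
	int id;
};

/* Insert filter 'id' with 'match' on the chain hash, hash+incr, ...
 * Returns the index written, or -1 if no slot is usable. */
static int filter_insert(struct slot *t, uint32_t hash, uint32_t incr,
			 uint32_t match, int id)
{
	int ins = -1, ins_depth = -1; /* first free slot on the chain */
	int old = -1, old_depth = -1; /* existing filter, same match */

	for (uint32_t d = 0; d < MAX_DEPTH; d++) {
		int i = (int)((hash + d * incr) & (TABLE_SIZE - 1));

		if (!t[i].used) {
			if (ins < 0) {
				ins = i;
				ins_depth = (int)d;
			}
		} else if (t[i].match == match && old < 0) {
			old = i;
			old_depth = (int)d;
		}
	}

	if (ins < 0 && old < 0)
		return -1; /* chain full and nothing to replace */

	if (ins >= 0 && (old < 0 || ins_depth < old_depth)) {
		if (old >= 0)
			t[old].used = false; /* clear the replaced filter */
		t[ins] = (struct slot){ true, match, id };
		return ins;
	}
	t[old] = (struct slot){ true, match, id }; /* replace in place */
	return old;
}
```

Replaying steps 1 to 4 with `hash = 1` and `incr = 1`, F3 now lands in E1 and F2 is cleared instead of being left behind as a duplicate.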
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
efx_filter_search() is only called from efx_filter_insert(), and
neither function is very long. The following bug fix requires a more
sophisticated search with a third result, which is going to be easier
to implement as part of the same function.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
These functions happen to work for default MAC filters: they generate
an initial index of 1/0 for unicast/multicast respectively and an
increment of 1 for either, so a search succeeds at depth 2. But this
is a matter of luck rather than design, and it really won't work well
with the bug fix we're about to do.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The 'for_insert' parameter is redundant since there are no longer
any other operations that need to search based on a filter spec.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The 'replace' flag to efx_filter_insert_filter() controls whether the
new filter may replace *any* filter, and is checked even before
priority comparison. But lower-priority filters should never
block insertion of higher-priority filters.
Change the priority checking so that lower-priority filters are
replaced regardless of the value of the flag, and rename the
flag to 'replace_equal'.
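The new rule as a small predicate. It assumes, for illustration only, that a numerically larger value means higher priority, and `may_replace()` is a hypothetical helper rather than the sfc function.

```c
#include <stdbool.h>

/* A lower-priority existing filter is always replaced; an equal-priority
 * one only when replace_equal is set; a higher-priority one never. */
static bool may_replace(int new_prio, int old_prio, bool replace_equal)
{
	if (new_prio > old_prio)
		return true;
	if (new_prio == old_prio)
		return replace_equal;
	return false;
}
```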
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Remove more dead code, and make efx_ptp_rx() pull the data it
needs into the header area.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
There is a long-standing problem with the packet-timestamp matching in
the driver. When a PTP packet is received by the MC, the FPGA
timestamps the packet and the MC sends the timestamp and 6 bytes of
the UUID to the driver. The driver then matches the timestamp against
received packets using the same 6 bytes of UUID.
The problem comes from the choice of which 6 bytes to use. The PTP
spec is slightly contradictory and misleading in one of the two places
where the UUIDs are discussed. From section 7.2.2.2 of the spec, a
PTPD2 UUID can be either a EUI-64 or a EUI-64 constructed from a
EUI-48. The typical ethernet based implementation uses a EUI-64
constructed from a EUI-48. This works by taking the first 3 bytes of
the MAC address of the NIC being used for PTP (the OUI), then
inserting 0xFF, 0xFE, then taking the last 3 bytes of the MAC address
giving
MAC[0], MAC[1], MAC[2], 0xFF, 0xFE, MAC[3], MAC[4], MAC[5]
The current MC firmware and driver discard the first two bytes of this
UUID, and packets are matched against timestamps using bytes 2 to 7. So
in a deployment where Solarflare PTP NICs are used alongside other
vendors' NICs, there is a small risk that a PTP packet could be matched
against the wrong timestamp. This applies to NICs from all other
organisations whose OUI has 0x53 as its third byte. It's a long list,
but I notice that it includes Cisco.
The necessary modifications to match against bytes 0-2 and 5-7 of the
UUID are quite small, but they introduce an incompatibility between
older versions of the firmware and driver.
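The two matching keys can be sketched as follows. The helper names are hypothetical and the test uses made-up MAC addresses. Two MACs that differ only in their first two OUI bytes produce the same key under the old bytes 2 to 7 scheme, but different keys under the bytes 0-2 and 5-7 scheme, which simply recovers the original MAC.

```c
#include <stdint.h>
#include <string.h>

/* EUI-64 from an EUI-48 MAC: OUI, 0xFF, 0xFE, then the NIC-specific part. */
static void eui64_from_mac(const uint8_t mac[6], uint8_t uuid[8])
{
	uuid[0] = mac[0];
	uuid[1] = mac[1];
	uuid[2] = mac[2];
	uuid[3] = 0xFF;
	uuid[4] = 0xFE;
	uuid[5] = mac[3];
	uuid[6] = mac[4];
	uuid[7] = mac[5];
}

/* Old scheme: bytes 2..7, which include the constant 0xFF 0xFE and drop
 * the first two OUI bytes entirely. */
static void match_key_old(const uint8_t uuid[8], uint8_t key[6])
{
	memcpy(key, uuid + 2, 6);
}

/* Enhanced scheme: bytes 0..2 and 5..7, i.e. exactly the original MAC. */
static void match_key_new(const uint8_t uuid[8], uint8_t key[6])
{
	memcpy(key, uuid, 3);
	memcpy(key + 3, uuid + 5, 3);
}
```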
When PTP is enabled via SO_TIMESTAMPING specifying PTP V2, the driver
will try to enable PTP in the firmware using the enhanced mode
(above). If the firmware returns an error, the driver will enable PTP
in the firmware using the old mode.
[bwh: Fix some style errors; remove private ioctl bits]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>