With the merging of 2 sub flows, a new 'merge' flow will be created and
written to FW. The TC layer is unaware that the merge flow exists and will
request stats from the sub flows. Conversely, the FW treats a merge rule
the same as any other rule and sends stats updates to the NFP driver.
Add links between merge flows and their sub flows. Use these links to pass
merge flow stats updates from FW to the underlying sub flows, ensuring TC
stats requests are handled correctly. The sub flow stats are updated on
(the less time critical) TC stats requests rather than on FW stats
updates.
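The link idea above can be sketched in plain C. This is a simplified illustration, not the driver code: `struct flow`, `link_merge_flow()`, `fw_stats_update()` and `tc_stats_request()` are hypothetical names, each sub flow is assumed to belong to at most one merge flow, and a packet counted on the merge flow is assumed to count for both sub flows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified flow record; not the driver's payload struct. */
struct flow_stats {
	uint64_t pkts;
	uint64_t bytes;
};

struct flow {
	struct flow_stats stats; /* stats accumulated for this flow */
	struct flow *sub[2];     /* merge flow: its two sub flows, else NULL */
	struct flow *merge;      /* sub flow: the merge flow it is part of, else NULL */
};

/* Link a merge flow with the two sub flows it was built from. */
static void link_merge_flow(struct flow *merge, struct flow *a, struct flow *b)
{
	merge->sub[0] = a;
	merge->sub[1] = b;
	a->merge = merge;
	b->merge = merge;
}

/* Fast path: a FW stats update only accumulates on the flow it names. */
static void fw_stats_update(struct flow *f, uint64_t pkts, uint64_t bytes)
{
	f->stats.pkts += pkts;
	f->stats.bytes += bytes;
}

/*
 * Slow path: on a TC stats request for a sub flow, fold any pending
 * merge flow stats into both sub flows, then clear the merge flow
 * counters so they are not counted twice.
 */
static struct flow_stats tc_stats_request(struct flow *sub)
{
	struct flow *m = sub->merge;
	size_t i;

	if (m) {
		for (i = 0; i < 2; i++) {
			m->sub[i]->stats.pkts += m->stats.pkts;
			m->sub[i]->stats.bytes += m->stats.bytes;
		}
		m->stats.pkts = 0;
		m->stats.bytes = 0;
	}
	return sub->stats;
}
```

In this sketch the FW update path touches a single counter pair, while the cost of walking the links is paid only when TC asks for stats.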
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When combining 2 sub_flows to a single 'merge flow' (assuming the merge is
valid), the merge flow should contain the same match fields as sub_flow 1
with actions derived from a combination of sub_flows 1 and 2. This action
list should have all actions from sub_flow 1 with the exception of the
output action that triggered the 'implicit recirculation' by sending to
an internal port, followed by all actions of sub_flow 2. Any pre-actions
in either sub_flow should feature at the start of the action list.
Add code to generate a new merge flow and populate the match and actions
fields based on the sub_flows. The offloading of the flow is left to
future patches.
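The ordering rule for the merged action list can be sketched as follows. The types and the `merge_actions()` helper are hypothetical and only model the ordering; as a simplification, every output to an internal port in sub_flow 1 is dropped, whereas the text only drops the one that triggered the recirculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action model used only for this sketch. */
enum act_kind { ACT_PRE, ACT_SET, ACT_OUTPUT };

struct action {
	enum act_kind kind;
	int arg;            /* opaque argument, e.g. the port for ACT_OUTPUT */
	bool internal_port; /* ACT_OUTPUT only: target is an internal port */
};

/* Append one action; returns false when the output array is full. */
static bool push(struct action *out, size_t cap, size_t *n, struct action a)
{
	if (*n == cap)
		return false;
	out[(*n)++] = a;
	return true;
}

/* Returns the number of merged actions, or -1 if 'out' is too small. */
static int merge_actions(const struct action *f1, size_t n1,
			 const struct action *f2, size_t n2,
			 struct action *out, size_t cap)
{
	size_t n = 0, i;

	/* Pre-actions of either sub_flow go to the start of the list. */
	for (i = 0; i < n1; i++)
		if (f1[i].kind == ACT_PRE && !push(out, cap, &n, f1[i]))
			return -1;
	for (i = 0; i < n2; i++)
		if (f2[i].kind == ACT_PRE && !push(out, cap, &n, f2[i]))
			return -1;

	/* sub_flow 1 actions, minus the output to an internal port. */
	for (i = 0; i < n1; i++) {
		if (f1[i].kind == ACT_PRE)
			continue;
		if (f1[i].kind == ACT_OUTPUT && f1[i].internal_port)
			continue;
		if (!push(out, cap, &n, f1[i]))
			return -1;
	}

	/* Followed by the remaining sub_flow 2 actions. */
	for (i = 0; i < n2; i++)
		if (f2[i].kind != ACT_PRE && !push(out, cap, &n, f2[i]))
			return -1;

	return (int)n;
}
```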
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two flows can be merged if the second flow (after recirculation) matches
on bits that are either matched on or explicitly set by the first flow.
This means that if a packet hits flow 1 and recirculates then it is
guaranteed to hit flow 2.
Add a 'can_merge' function that determines if 2 sub_flows in a merge hint
can be validly merged to a single flow.
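As a rough illustration of the rule, the check can be reduced to a bitmask test. This sketch assumes, hypothetically, that all header bits are flattened into one 64-bit mask per flow; it compares masks only, not the matched or set values, and `struct sub_flow` here is not the driver's structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view: one bit per header bit of interest. */
struct sub_flow {
	uint64_t match_mask; /* bits the flow matches on */
	uint64_t set_mask;   /* bits the flow's actions explicitly set */
};

/*
 * Flow 2 may only match on bits whose value is known after flow 1,
 * i.e. bits flow 1 matched on or explicitly set.
 */
static bool can_merge(const struct sub_flow *f1, const struct sub_flow *f2)
{
	uint64_t known = f1->match_mask | f1->set_mask;

	return (f2->match_mask & ~known) == 0;
}
```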
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a merge hint is received containing 2 flows that are matched via an
implicit recirculation (sending to and matching on an internal port), fw
reports that the flows (called sub_flows) may be able to be combined to a
single flow.
Add infrastructure to accept and process merge hint messages. The actual
merging of the flows is left as a stub call.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each flow is given a context ID that the fw uses (along with its cookie)
to identify the flow. The flow's stats are updated by the fw via this ID,
which is a reference to a pre-allocated array entry.
In preparation for flow merge code, enable the nfp_fl_payload structure to
be accessed via this stats context ID. Rather than increasing the memory
requirements of the pre-allocated array, add a new rhashtable to associate
each active stats context ID with its rule payload.
While adding new code to the compile metadata functions, slightly
restructure the existing function to allow for cleaner, easier to read
error handling.
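The association between a stats context ID and its rule payload can be pictured with a small hash table. The patch uses the kernel rhashtable; the sketch below is a plain chained table with hypothetical names (`ctx_table`, `ctx_insert()`, `ctx_lookup()`, `ctx_remove()`) and is meant only to show that memory is used per active ID rather than per pre-allocated array slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_BUCKETS 64u

struct payload {
	int rule_id; /* stand-in for the real rule payload */
};

/* One entry per active stats context ID. */
struct ctx_entry {
	uint32_t ctx_id;
	struct payload *payload;
	struct ctx_entry *next;
};

struct ctx_table {
	struct ctx_entry *bucket[CTX_BUCKETS];
};

/* Insert a caller-owned entry at the head of its bucket chain. */
static void ctx_insert(struct ctx_table *t, struct ctx_entry *e)
{
	uint32_t b = e->ctx_id % CTX_BUCKETS;

	e->next = t->bucket[b];
	t->bucket[b] = e;
}

/* Return the payload for ctx_id, or NULL if the ID is not active. */
static struct payload *ctx_lookup(const struct ctx_table *t, uint32_t ctx_id)
{
	const struct ctx_entry *e;

	for (e = t->bucket[ctx_id % CTX_BUCKETS]; e; e = e->next)
		if (e->ctx_id == ctx_id)
			return e->payload;
	return NULL;
}

/* Unlink the entry for ctx_id, if present. */
static void ctx_remove(struct ctx_table *t, uint32_t ctx_id)
{
	struct ctx_entry **pp = &t->bucket[ctx_id % CTX_BUCKETS];

	while (*pp) {
		if ((*pp)->ctx_id == ctx_id) {
			*pp = (*pp)->next;
			return;
		}
		pp = &(*pp)->next;
	}
}
```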
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbour table in the FW only accepts next hop entries if the egress
port is an nfp repr. Modify this to allow the next hop to be an internal
port. This means that if a packet is to egress to that port, it will
recirculate back into the system with the internal port becoming its
ingress port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW may receive a packet with its ingress port marked as an internal port.
If a rule does not exist to match on this port, the packet will be sent to
the NFP driver. Modify the flower app to detect packets from such internal
ports and convert the ingress port to the correct kernel space netdev.
At this point, it is assumed that fallback packets from internal ports are
to be sent out said port. Therefore, set the redir_egress bool to true on
detection of these ports.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is assumed that fallback packets will be from reprs. Modify
this to allow an app to receive non-repr ports from the fallback channel -
e.g. from an internal port. If such a packet is received, do not update
repr stats.
Change the naming of function calls so as not to imply it will always be a
repr netdev returned. Add the option to set a bool value to redirect a
fallback packet out the returned port rather than RXing it. Setting of
this bool in subsequent patches allows the handling of packets falling
back when they are due to egress an internal port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent FW modifications allow the offloading of non repr ports. These
ports exist internally on the NFP. So if a rule outputs to an 'internal'
port, then the packet will recirculate back into the system but will now
have this internal port as its incoming port. These ports are indicated
by a specific type field combined with an 8 bit port id.
Add private app data to assign additional port ids for use in offloads.
Provide functions to lookup or create new ids when a rule attempts to
match on an internal netdev - the only internal netdevs currently
supported are of type openvswitch. Have a netdev notifier to release
port ids on netdev unregister.
OvS offloads rules that match on internal ports as TC egress filters.
Ensure that such rules are accepted by the driver.
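A port value that combines a type field with an 8 bit port id could be encoded along these lines. The bit positions and the type value below are assumptions made for the sketch; the message above does not state the actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: type in the top 4 bits, id in the low 8 bits. */
#define PORT_TYPE_SHIFT    28
#define PORT_TYPE_INTERNAL 0x3u /* assumed value, for illustration only */
#define PORT_ID_MASK       0xffu

/* Build the port value for an internal port with the given 8 bit id. */
static uint32_t internal_port_encode(uint8_t id)
{
	return ((uint32_t)PORT_TYPE_INTERNAL << PORT_TYPE_SHIFT) | id;
}

/* True if the type field marks this port as internal. */
static bool port_is_internal(uint32_t port)
{
	return (port >> PORT_TYPE_SHIFT) == PORT_TYPE_INTERNAL;
}

/* Extract the 8 bit port id. */
static uint8_t port_id(uint32_t port)
{
	return (uint8_t)(port & PORT_ID_MASK);
}
```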
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Write to a FW symbol to indicate that the driver supports flow merging. If
this symbol does not exist then flow merging and recirculation are not
supported on the FW. If support is available, add a stub to deal with FW
to kernel merge hint messages.
Full flow merging requires the firmware to support flow mods. If it
does not, then do not attempt to 'turn on' flow merging.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull fsdax fix from Dan Williams:
"A single filesystem-dax fix. It has been lingering in -next for a long
while and there are no other fsdax fixes on the horizon:
- Avoid a crash scenario with architectures like powerpc that require
'pgtable_deposit' for the zero page"
* tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
fs/dax: Deposit pagetable even when installing zero page
Huazhong Tan says:
====================
net: hns3: fixes sparse: warning and type error
This patchset fixes a sparse warning and an overflow problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting vport->bw_limit to hdev->tm_info.pg_info[0].bw_limit
in hclge_tm_vport_tc_info_update, vport->bw_limit can be as big as
HCLGE_ETHER_MAX_RATE (100000), which can not fit into u16 (65535).
So this patch fixes it by using u32 for vport->bw_limit.
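The truncation is easy to reproduce in isolation. In this sketch `MAX_RATE` mirrors the value quoted above for HCLGE_ETHER_MAX_RATE, and the two helper functions are hypothetical stand-ins for storing the limit in a 16 bit versus a 32 bit field.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RATE 100000u /* value quoted above for HCLGE_ETHER_MAX_RATE */

/* Storing the rate in a 16 bit field keeps only the low 16 bits. */
static uint16_t bw_limit_as_u16(uint32_t rate)
{
	return (uint16_t)rate;
}

/* A 32 bit field holds the full value. */
static uint32_t bw_limit_as_u32(uint32_t rate)
{
	return rate;
}
```

With these helpers, `bw_limit_as_u16(MAX_RATE)` yields 34464 (100000 modulo 65536), while `bw_limit_as_u32(MAX_RATE)` keeps 100000.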
Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameter "proto" of function hclge_set_vlan_filter_hw()
is expected to be __be16, but a u16 is passed when it is called from
function hclge_update_port_base_vlan_cfg().
This patch fixes it by converting the value with htons().
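For readers less familiar with the byte order annotation, this user-space sketch shows what htons() does to a protocol value. It uses 0x8100 (the 802.1Q ethertype) as the example value; `proto_wire_bytes()` is a hypothetical helper, not part of the driver.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert a host-order protocol value to network (big-endian) order
 * and expose the two bytes as they would be laid out in memory.
 */
static void proto_wire_bytes(uint16_t host_proto, uint8_t out[2])
{
	uint16_t be = htons(host_proto);

	memcpy(out, &be, sizeof(be));
}
```

Regardless of host endianness, the first byte is the most significant one after the conversion, which is the layout a __be16 parameter promises.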
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 21e043cd81 ("net: hns3: fix set port based VLAN for PF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: fully support memory accounting
sctp memory accounting is added in this patchset by using
these kernel APIs on send side:
- sk_mem_charge()
- sk_mem_uncharge()
- sk_wmem_schedule()
- sk_under_memory_pressure()
- sk_mem_reclaim()
and these on receive side:
- sk_mem_charge()
- sk_mem_uncharge()
- sk_rmem_schedule()
- sk_under_memory_pressure()
- sk_mem_reclaim()
With sctp memory accounting, we can limit the memory allocation by
either sysctl:
# sysctl -w net.sctp.sctp_mem="10 20 50"
or cgroup:
# echo $((8<<14)) > \
/sys/fs/cgroup/memory/sctp_mem/memory.kmem.tcp.limit_in_bytes
When the socket is under memory pressure, the send side will block
and wait, while the receive side will renege or drop.
v1->v2:
- add the missing Reported/Tested/Acked/-bys.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_forward_alloc's updating is also done on rx path, but to be consistent
we change to use sk_mem_charge() in sctp_skb_set_owner_r().
In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
which doesn't work for mem_cgroup_sockets_enabled, so we change to use
sk_under_memory_pressure().
When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
should be called on both the RENEGE and CHUNK DELIVERY paths to exit the
memory pressure status as soon as possible.
Note that sk_rmem_schedule() is using datalen to make things easy there.
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been
used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to
check if the allocated memory should be raised, and call sk_mem_reclaim()
to check if it should be reduced when it's under memory pressure.
If sk_wmem_schedule() returns false, which means no memory is allowed to
be allocated, it will block and wait for memory to become available.
Note that, unlike tcp, sctp's wait_for_buf happens before allocating any
skb, so the memory accounting check is done with the whole msg_len before
it too.
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Add neighbour offload indication
Neighbour entries are programmed to the device's table so that the
correct destination MAC will be specified in a packet after it was
routed.
Despite being programmed to the device and unlike routes and FDB
entries, neighbour entries are currently not marked as offloaded. This
patchset changes that.
Patch #1 is a preparatory patch to make sure we only mark a neighbour as
offloaded in case it was successfully programmed to the device.
Patch #2 sets the offload indication on neighbours.
Patch #3 adds a test to verify above mentioned functionality.
Patched iproute2 version that prints the offload indication is available
here [1].
[1] https://github.com/idosch/iproute2/tree/idosch-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that neighbour entries are marked as offloaded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar fashion to routes and FDB entries, the neighbour table is
reflected to the device.
Set an offload indication on the neighbour in case it was programmed to
the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next patch will add offload indication to neighbours, but the indication
should only be altered in case the neighbour was successfully added to /
deleted from the device.
Propagate neighbour update errors, so that they could be taken into
account by the next patch.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAINTAINERS contains a lower-case and upper-case variant of
Woojung Huh's email address.
Only keep the lower-case variant in MAINTAINERS.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.
This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.
Reproducer:
ip link add bondL type bond
ip link add bondU type bond
ip link set bondL master bondU
ip link del bondL
No "Fixes:" tag, this code is older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Remove the broute pseudo hook, implement this from the bridge
prerouting hook instead. Now broute becomes a real table in ebtables,
from Florian Westphal. This also includes a size reduction patch for the
bridge control buffer area via squashing booleans into bitfields and
a selftest.
2) Add OS passive fingerprint version matching, from Fernando Fernandez.
3) Support for gue encapsulation for IPVS, from Jacky Hu.
4) Add support for NAT to the inet family, from Florian Westphal.
This includes support for masquerade, redirect and nat extensions.
5) Skip interface lookup in flowtable, use device in the dst object.
6) Add jiffies64_to_msecs() and use it, from Li RongQing.
7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King.
8) Statify several functions, patches from YueHaibing and Florian Westphal.
9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing.
10) Merge route extension to core, also from Florian.
11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian.
12) Merge ip/ip6 masquerade extensions, from Florian. This includes
netdevice notifier unification.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'wireless-drivers-for-davem-2019-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 5.1
Second set of fixes for 5.1.
iwlwifi
* add some new PCI IDs (plus a struct name change they depend on)
* fix crypto with new devices, namely 22560 and above
* fix for a potential deadlock in the TX path
* a fix for offloaded rate-control
* support new PCI HW IDs which use a new FW
mt76
* fix lock initialisation and a possible deadlock
* aggregation fixes
rt2x00
* fix sequence numbering during retransmits
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After merging the netfilter-next tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:
In file included from net/bridge/br_input.c:19:
include/net/netfilter/nf_queue.h:16:23: error: field 'state' has incomplete type
struct nf_hook_state state;
^~~~~
Fixes: 971502d77f ("bridge: netfilter: unroll NF_HOOK helper in bridge input path")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Merge page ref overflow branch.
Jann Horn reported that he can overflow the page ref count with
sufficient memory (and a filesystem that is intentionally extremely
slow).
Admittedly it's not exactly easy. To have more than four billion
references to a page requires a minimum of 32GB of kernel memory just
for the pointers to the pages, much less any metadata to keep track of
those pointers. Jann needed a total of 140GB of memory and a specially
crafted filesystem that leaves all reads pending (in order to not ever
free the page references and just keep adding more).
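The 32GB figure follows from simple arithmetic, sketched here under the assumption of 8-byte pointers and reading "four billion" as 2^32; `pointer_table_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Memory needed just to hold 'refs' pointers of 'ptr_size' bytes each. */
static uint64_t pointer_table_bytes(uint64_t refs, uint64_t ptr_size)
{
	return refs * ptr_size;
}
```

For 2^32 references and 8-byte pointers this gives 2^35 bytes, that is 32 GiB, before counting any metadata.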
Still, we have a fairly straightforward way to limit the two obvious
user-controllable sources of page references: direct-IO like page
references gotten through get_user_pages(), and the splice pipe page
duplication. So let's just do that.
* branch page-refs:
fs: prevent page refcount overflow in pipe_buf_get
mm: prevent get_user_pages() from overflowing page refcount
mm: add 'try_get_page()' helper function
mm: make page ref count overflow check tighter and more explicit
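The general shape of a "try" style reference helper can be sketched in user space. This is not the kernel implementation: `struct page_sketch` and `try_get_ref()` are hypothetical, the counter is a plain signed 32 bit integer, and the sketch simply refuses a new reference when the count is not positive or has no headroom left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an object with a signed reference count. */
struct page_sketch {
	int32_t refcount;
};

/*
 * Take a reference only if it is safe to do so. A count that is zero
 * or negative is treated as invalid (freed or already overflowed), and
 * a count at INT32_MAX has no room for another reference.
 */
static bool try_get_ref(struct page_sketch *p)
{
	if (p->refcount <= 0 || p->refcount == INT32_MAX)
		return false;
	p->refcount++;
	return true;
}
```

Callers that can be driven by user space would check the return value and fail the operation instead of letting the counter wrap.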
Merge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-04-09
This series provides some fixes to mlx5 driver.
I've cc'ed some of the checksum fixes to Eric Dumazet and I would like to get
his feedback before you pull.
For -stable v4.19
('net/mlx5: FPGA, tls, idr remove on flow delete')
('net/mlx5: FPGA, tls, hold rcu read lock a bit longer')
For -stable v4.20
('net/mlx5e: Rx, Check ip headers sanity')
('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"')
('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding')
For -stable v5.0
('net/mlx5e: Switch to Toeplitz RSS hash by default')
('net/mlx5e: Protect against non-uplink representor for encap')
('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub forgot to either use nlmsg_len() or nlmsg_msg_size(),
allowing KMSAN to detect a possible uninit-value in rtnl_stats_get
BUG: KMSAN: uninit-value in rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
CPU: 0 PID: 10428 Comm: syz-executor034 Not tainted 5.1.0-rc2+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:619
__msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
rtnl_stats_get+0x6d9/0x11d0 net/core/rtnetlink.c:4997
rtnetlink_rcv_msg+0x115b/0x1550 net/core/rtnetlink.c:5192
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2485
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5210
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1925
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
___sys_sendmsg+0xdb3/0x1220 net/socket.c:2137
__sys_sendmsg net/socket.c:2175 [inline]
__do_sys_sendmsg net/socket.c:2184 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2182
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2182
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
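One way such a length check can go wrong is sketched below with hypothetical stand-in structures (not the real netlink definitions): comparing the total message length against the size of the fixed request counts the header bytes as payload, so a short message passes and part of the request is read uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a netlink-style header and fixed request. */
struct msg_hdr {
	uint32_t len; /* total message length, header included */
};

struct stats_req {
	uint8_t  family;
	uint8_t  pad1;
	uint16_t pad2;
	uint32_t ifindex;
	uint32_t filter_mask;
};

/* Payload length: total length minus the header. */
static size_t msg_payload_len(const struct msg_hdr *h)
{
	return h->len < sizeof(*h) ? 0 : h->len - sizeof(*h);
}

/* Flawed: total length is compared, so header bytes count as payload. */
static bool req_len_ok_flawed(const struct msg_hdr *h)
{
	return h->len >= sizeof(struct stats_req);
}

/* Correct: only the payload must be large enough for the request. */
static bool req_len_ok(const struct msg_hdr *h)
{
	return msg_payload_len(h) >= sizeof(struct stats_req);
}
```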
Fixes: 51bc860d4a ("rtnetlink: stats: validate attributes in get as well as dumps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis Bolotin says:
====================
qed: Fix the Doorbell Overflow Recovery mechanism
This patch series fixes and improves the doorbell recovery mechanism.
The main goals of this series are to fix missing attentions from the
doorbells block (DORQ) or not handling them properly, and execute the
recovery from periodic handler instead of the attention handler.
Please consider applying the series to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the overflow handling from the hardware interrupt status analysis.
The interrupt status is a single register and is common for all PFs. The
first PF reading the register is not necessarily the one who overflowed.
All PFs must check their overflow status on every attention.
In this change we clear the sticky indication in the attention handler to
allow doorbells to be processed again as soon as possible, but running
the doorbell recovery is scheduled for the periodic handler to reduce the
time spent in the attention handler.
Checking the need for DORQ flush was changed to "db_bar_no_edpm" because
qed_edpm_enabled()'s result could change dynamically and might have
prevented a needed flush.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the DORQ (doorbell block) is overflowed, all PFs get attentions at the
same time. If one PF finished handling the attention before another PF even
started, the second PF might miss the DORQ's attention bit and not handle
the attention at all.
If the DORQ attention is missed and the issue is not resolved, another
attention will not be sent, therefore each attention is treated as a
potential DORQ attention.
As a result, the attention callback is called more frequently so the debug
print was moved to reduce its quantity.
The number of periodic doorbell recovery handler schedules was reduced
because it was the previous way of mitigating the missed attention issue.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the condition which verifies that doorbell address is inside the
doorbell bar by checking that the end of the address is within range
as well.
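A range check of this kind can be sketched as follows. The function name and parameters are hypothetical; the point is that both the start and the end of the access must lie inside the bar, and that the comparison is written so that `offset + width` cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the 'width' bytes starting at 'offset' lie entirely inside a
 * bar of 'bar_size' bytes. Checking only 'offset < bar_size' would
 * accept an access whose last bytes fall outside the bar.
 */
static bool db_addr_in_bar(uint64_t bar_size, uint64_t offset, uint64_t width)
{
	return width <= bar_size && offset <= bar_size - width;
}
```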
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DB_REC_DRY_RUN (running doorbell recovery without sending doorbells) is
never used. DB_REC_ONCE (send a single doorbell from the doorbell recovery)
is not needed anymore because by running the periodic handler we make sure
we check the overflow status later instead.
This patch is needed because in the next patches, the only doorbell
recovery type being used is DB_REC_REAL_DEAL, and the fixes are much
cleaner without this enum.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This check isn't really needed and we can simplify the code and save
some CPU cycles by removing it. Only in case of an error none of these
bits are set, and calling the NAPI callback doesn't hurt in this case.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
r8169: create function pointer arrays for PHY and chip hw init functions
Using function pointer arrays makes the code easier to read and
maintain. AFAIK function pointer arrays cause some performance
drawback due to Spectre mitigation, but we're not in a hot path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a function pointer array makes this easier to read and
maintain.
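The pattern can be illustrated with a small table indexed by chip version. The enum values, `struct chip` and the init functions below are hypothetical and do not correspond to the driver's actual versions or callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip versions; the last entry is the table size. */
enum chip_ver { CHIP_VER_01, CHIP_VER_02, CHIP_VER_03, CHIP_VER_NUM };

struct chip {
	enum chip_ver ver;
	int configured; /* records which init routine ran */
};

typedef void (*hw_init_fn)(struct chip *);

static void hw_init_01(struct chip *c) { c->configured = 1; }
static void hw_init_03(struct chip *c) { c->configured = 3; }

/* Dispatch through a table instead of a long switch statement. */
static void chip_hw_init(struct chip *c)
{
	static const hw_init_fn hw_init[CHIP_VER_NUM] = {
		[CHIP_VER_01] = hw_init_01,
		/* CHIP_VER_02 needs no setup: its entry stays NULL. */
		[CHIP_VER_03] = hw_init_03,
	};

	if (c->ver < CHIP_VER_NUM && hw_init[c->ver])
		hw_init[c->ver](c);
}
```

Adding support for a new version then means adding one table entry rather than another case label.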
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a function pointer array makes this easier to read and
maintain. AFAIK function pointer arrays cause some performance
drawback due to Spectre mitigation, but we're not in a hot path here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan says:
====================
code optimizations & bugfixes for HNS3 driver
This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.
[patch 1/12 - 4/12] optimizes the VLAN feature and adds support for port
based VLAN, and fixes some related bugs in the current implementation.
[patch 5/12 - 12/12] includes some other code optimizations for the HNS3
ethernet controller driver.
Change log:
V1->V2: modifies some patches' commit log and code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes some redundant BH disabling when initializing
and uninitializing the command queue.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is a pending skb in the RX flow when the port is closed and
the pending buffer is not cleaned, the new packet will be added to
the pending skb when the port opens again, and the first new
packet will contain erroneous data.
This patch cleans the pending skb when cleaning the RX ring.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, the PHY may not be connected to the MDIO bus, and
the driver then fails to initialize because MDIO bus initialization
fails.
This patch fixes it by skipping the MDIO bus initialization
when the PHY does not exist.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the hardware description, the reset level that should be
triggered is not consistent within a module. For example, in SSU
common errors, the first two bits do not require a reset,
but the other bits need a global reset.
This patch sets a separate reset level for all RAS and MSI-X
interrupts by adding a reset_level field in struct hclge_hw_error,
and fixes some incorrect reset levels.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the hardware may not have enough buffer to receive packets
when it has used more than two MPS (maximum packet size) of
buffer, even though a lot of shared buffer is still left unused
when the TC num is small.
This patch divides the shared buffer between TCs when
the port supports DCB, and adjusts the waterline and threshold
according to the user manual for ports that do not support
DCB.
This patch also changes hclge_get_tc_num's return type to u32
to avoid a signed-unsigned mix in division.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the RX shared buffer's threshold size for a specific TC is
set to a smaller value when the TC's PFC is not enabled, which may
cause performance problems because the hardware may not have enough
buffer when PFC is not enabled.
This patch sets the same threshold size for all TCs regardless of whether
the specific TC's PFC is enabled.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a GRO packet is received by the driver, the cwr field in
struct tcphdr needs to be checked to decide whether to set
SKB_GSO_TCP_ECN in skb_shinfo(skb)->gso_type.
So this patch adds hns3_gro_complete to do that, and adds the
hns3_handle_bdinfo to handle the hns3_gro_complete and
hns3_rx_checksum.
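The decision described above amounts to setting one extra flag when CWR is seen. This sketch uses a hypothetical flag value and helper name rather than the kernel's definitions, and leaves the rest of the gso_type to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSO_TCP_ECN_SKETCH (1u << 1) /* hypothetical flag value */

/*
 * Return the gso_type for a GRO packet: the caller's base type, plus
 * the ECN flag when the TCP header had CWR set.
 */
static uint32_t gro_gso_type(uint32_t base_type, bool cwr)
{
	return cwr ? (base_type | GSO_TCP_ECN_SKETCH) : base_type;
}
```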
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>