Current APIs only allow looking for a FID and creating it in case it
does not exist.
With VxLAN, in case the bridge to which the VxLAN device was enslaved
does not already have a corresponding FID, then it means that something
went wrong that we need to be aware of.
Add an API to look up a FID, but without creating it in order to catch
above-mentioned situation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the device, the VNI and the list of remote VTEPs a packet should be
flooded to is a property of the filtering identifier (FID).
During encapsulation, the VNI is taken from the FID the packet was
classified to. During decapsulation, the overlay packet is injected into
a bridge and classified to a FID based on the VNI it came with.
Allow NVE configuration for a FID. Currently, this is only supported
with 802.1D FIDs which are used for VLAN-unaware bridges. However, NVE
configuration is going to be supported with 802.1Q FIDs which is why the
related fields are placed in the common FID struct.
Since the device requires a 1:1 mapping between FID and VNI, the driver
maintains a hashtable keyed by VNI and checks if the VNI is already
associated with an existing FID.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we fail when user specify a non-zero chain, this patch adds the
support for it and tc priorities. To get to a new chain, use the tc
goto action.
Currently we support a fixed prio range 1-16, and chain range 0-3.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When adding a vxlan tc rule, and a neighbour isn't available, we
don't insert any rule to hardware. Once we enable offloading flows
with multiple priorities, a packet that should have matched this rule
will continue in hardware pipeline and might match a wrong one.
This is unlike in tc software path where it will be matched and
forwarded to the vxlan device (which will cause a ARP lookup
eventually) and stop processing further tc filters.
To address that, when when a neighbour isn't available (EAGAIN from
attach_encap), or gets deleted, change the original action to be a
forward to slow path instead. Neighbour update will restore the original
action once the neighbour becomes available. This will be done atomically
so at any given time we will have a the correct match.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A pre-step for the tc offloads code to use this when a neigh is
not available for encap rules.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The code for adding/deleting fdb flow is repeated when
user-space does flow add/del and when we add/del from
the neigh update path - unify them to avoid the duplication.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When replacing a tc flower rule, flower first requests to add the
new rule (new action), then deletes the old one.
But currently when asked to add a new tc flower flow, we append the
actions (and counters to it).
This can result in a fte with two flow counters or conflicting
actions (drop and encap action) which firmware complains/errs
about and isn't achieving what the user aimed for.
Instead, insert the flow using the new no-append flag which will add a
new HW rule, the old flow and rule will be deleted later by flower
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If no-append flag is set, we will add a new FTE, instead of appending
the actions of the inserted rule when the same match already exists.
While here, move the has_flow_tag boolean indicator to be a flag too.
This patch doesn't change any functionality.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A chain is a group of priorities, so use the fdb parallel
sub namespaces to implement chains, and a flow table for each
priority in them.
Because these namespaces are parallel and in series to the slow path
fdb, the chains aren't connected to one another (but to the slow path),
and one must use a explicit goto action to reach a different chain.
Flow tables for the priorities will be created on demand and destroyed
once not used.
The Firmware has four pools of tables for sizes S/XS/M/L (4k, 64k, 1m, 4m).
We maintain ghost copies of the pools occupancy.
When a new table is to be created, we scan the pools from large to small
and find the 1st table size which can be now created. When a table is
destroyed, we update the relevant pool.
Multi chain/prio isn't enabled yet by this patch, for now all flows
will use the default chain 0, and prio 1.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Be symmetric with the e-switch API to add rules which has a
specific function to add fwd rules which are used as part of
vport mirroring.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Towards supporting multi-chains and priorities, split the FDB fast path
to multiple namespaces (sub namespaces), each with multiple priorities.
This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is
like current FS_TYPE_PRIO, but may contain only namespaces, and those
will be in parallel to one another in terms of managing of the flow
tables connections inside them. Meaning, while searching for the next
or previous flow table to connect for a new table inside such namespace
we skip the parallel namespaces in the same level under the
FS_TYPE_PRIO_CHAINS prio we originated from.
We use this new type for splitting the fast path prio into multiple
parallel namespaces, each containing normal prios.
The prios inside them (and their tables) will be connected to one
another, but not from one parallel namespace to another, instead the
last prio in each namespace will be connected to the next prio in
the containing FDB namespace, which is the slow path prio.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move to have clear separation on the code path to add nic vs e-switch
flows. While here we break the code that deals with adding offloaded
TC tool to few smaller stages, each on helper function.
Besides getting us simpler and readable code, these are pre-steps
for being able to have two HW flows serving one SW TC flow for some
e-switch use cases.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refactor the flow add utility functions to return err code instead of rule
pointers. This will allow for simpler logic when one tc rule is
duplicated to two HW rules in downstream patches.
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, when a flow rule is created using the FS core layer, the caller
has to pass the entire flow counter object and not just the counter HW
handle (ID). This requires both the FS core and the caller to have
knowledge about the inner implementation of the FS layer flow counters
cache and limits the possible users.
Move to use the counter ID across the place when dealing with flows.
Doing this decoupling, now can we privatize the inner implementation
of the flow counters.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There's no real reason for the e-switch logic to manage the creation of
counters for offloaded flows. The API already has the directive for the
caller to denote they want to attach a counter to the created flow.
As such, we go and move the management of flow counters to the mlx5e
tc offload logic. This also lets us remove an inelegant interface where
the FS layer had to provide a way to retrieve a counter from a flow rule.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 updates for both net-next and rdma-next
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits)
net/mlx5: Expose DC scatter to CQE capability bit
net/mlx5: Update mlx5_ifc with DEVX UID bits
net/mlx5: Set uid as part of DCT commands
net/mlx5: Set uid as part of SRQ commands
net/mlx5: Set uid as part of SQ commands
net/mlx5: Set uid as part of RQ commands
net/mlx5: Set uid as part of QP commands
net/mlx5: Set uid as part of CQ commands
net/mlx5: Rename incorrect naming in IFC file
net/mlx5: Export packet reformat alloc/dealloc functions
net/mlx5: Pass a namespace for packet reformat ID allocation
net/mlx5: Expose new packet reformat capabilities
{net, RDMA}/mlx5: Rename encap to reformat packet
net/mlx5: Move header encap type to IFC header file
net/mlx5: Break encap/decap into two separated flow table creation flags
net/mlx5: Add support for more namespaces when allocating modify header
net/mlx5: Export modify header alloc/dealloc functions
net/mlx5: Add proper NIC TX steering flow tables support
net/mlx5: Cleanup flow namespace getter switch logic
net/mlx5: Add memic command opcode to command checker
...
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add link establishment methods
Add auto negotiation methods
Add read MAC address method
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add PHY's ID support
Add support for initialization, acquire and release of PHY
Enable register access
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add code for NVM support and get MAC address, complete probe
method.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add code for hardware initialization and reset
Add code for semaphore handling
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for allocating, configuring, and freeing Tx/Rx ring
resources. With these changes in place the descriptor queues are in a
state where they are ready to transmit or receive if provided buffers.
This also adds the transmit and receive fastpath and interrupt handlers.
With this code in place the network device is now able to send and receive
frames over the network interface using a single queue.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change adds the defines and structures necessary to support both Tx
and Rx descriptor rings.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch set adds interrupt support for the igc interfaces.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that we have the ability to configure the basic settings on the device
we can start allocating and configuring a netdev for the interface.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds the basic defines and structures needed by the PF for
operation. With this it is possible to bring up the interface,
but without being able to configure any of the filters on
the interface itself.
Add skeleton for a function pointers.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds the beginning framework onto which I am going to add
the igc driver which supports the Intel(R) I225-LM/I225-V 2.5G
Ethernet Controller.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update cxgb4 to send No. of Tx Queue created in lldinfo struct
and use the same ntxq in chcr driver.
This patch depends on following commit
commit add92a817e
"Fix memory corruption in DMA Mapped buffers"
v2:
Free txq_info in error case as pointed by Lino Sanfilippo.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Program translation stage checks that program can be offloaded to
the netdev which was passed during the load (bpf_attr->prog_ifindex).
After program sharing was introduced, however, the netdev on which
program is loaded can theoretically be different, and therefore
we should recheck the program size and max stack size at load time.
This was found by code inspection, AFAIK today all vNICs have
identical caps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Atomic operations on the NFP are currently always in big endian.
The driver keeps track of regions of memory storing atomic values
and byte swaps them accordingly. There are corner cases where
the map values may be initialized before the driver knows they
are used as atomic counters. This can happen either when the
datapath is performing the update and the stack contents are
unknown or when map is updated before the program which will
use it for atomic values is loaded.
To avoid situation where user initializes the value to 0 1 2 3
and then after loading a program which uses the word as an atomic
counter starts reading 3 2 1 0 - only allow atomic counters to be
initialized to endian-neutral values.
For updates from the datapath the stack information may not be
as precise, so just allow initializing such values to 0.
Example code which would break:
struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__u32),
.value_size = sizeof(__u64),
.max_entries = 1,
};
int xdp_prog1()
{
__u64 nonzeroval = 3;
__u32 key = 0;
__u64 *value;
value = bpf_map_lookup_elem(&rxcnt, &key);
if (!value)
bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY);
else
__sync_fetch_and_add(value, 1);
return XDP_PASS;
}
$ offload bpftool map dump
key: 00 00 00 00 value: 00 00 00 03 00 00 00 00
should be:
$ offload bpftool map dump
key: 00 00 00 00 value: 03 00 00 00 00 00 00 00
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When sending a big fragment using multiple buffer descriptor,
hns3 does one maping, but do multiple unmapping when tx is done,
which may cause unmapping problem.
To fix it, this patch makes sure the value of desc_cb.length of
the non-first bd is zero. If desc_cb.length is zero, we do not
unmap the buffer.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep symmetrical, this patch renames hns_nic_dma_unmap to
hns3_clear_desc.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch unifies big tx fragment handling for tso and non-tso
case.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To solve the L3 checksum error problem which happens when driver
does not clear L3 checksum, DMA map should be done after calling
skb_cow_head.
This patch moves DMA map into hns3_fill_desc to ensure that DMA
map is done after calling skb_cow_head.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes hns3_fill_desc_tso in preparation for
fixing some desc filling bug, because for tso or non-tso
case, we will use the unified hns3_fill_desc.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newly added link modes are required to be added
during setting link modes. If the new link mode
is not available during qed_set_link, it may cause
link getting down due to empty supported capability,
being passed to MFW, after setting autoneg off/on
with current/supported speed.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set link mode after checking available "supported" link caps
of the port.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added transceiver type, speed capability and board types
in HSI, are utilizing to display the accurate link
information in ethtool.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added transceiver modes with different speed and media type,
speed capability and supported board types in HSI, which
will be utilizing to display correct specification of link
modes and speed type.
Signed-off-by: Rahul Verma <Rahul.Verma@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Align the use of local PTT to propagate through the qed_mcp* API's.
Global ptt should not be used.
Register access should be done through layers. Register address is
mapped into a PTT, PF translation table. Several interface functions
require a PTT to direct read/write into register. There is a pool of
PTT maintained, and several PTT are used simultaneously to access
device registers in different flows. Same PTT should not be used in
flows that can run concurrently.
To avoid running out of PTT resources, too many PTT should not be
acquired without releasing them. Every PF has a global PTT, which is
used throughout the life of PF, in most important flows for register
access. Generic functions acquire the PTT locally and release after
the use. This patch aligns the use of Global PTT and Local PTT
accordingly.
Signed-off-by: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c:282:5: warning:
symbol 'aq_fw2x_update_stats' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously when populating the set ipv6 address action, we incorrectly
made use of pedit's key index to determine which 32bit word should be
set. We now calculate which word has been selected based on the offset
provided by the pedit action.
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we only allowed a single header key per pedit action to
change the header. This used to result in the last header key in the
pedit action to overwrite previous headers. We now keep track of them
and allow multiple header keys per pedit action.
Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Fixes: f8b7b0a6b1 ("nfp: add set tcp and udp header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we did not correctly change headers when using multiple
pedit actions with partial masks. We now take this into account and
no longer just commit the last pedit action.
Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit db65f35f50 ("net: fec: add support of ethtool get_regs") introduce
ethool "--register-dump" interface to dump all FEC registers.
But not all silicon implementations of the Freescale FEC hardware module
have the FRBR (FIFO Receive Bound Register) and FRSR (FIFO Receive Start
Register) register, so we should not be trying to dump them on those that
don't.
To fix it we create a quirk flag, FEC_QUIRK_HAS_RFREG, and check it before
dump those RX FIFO registers.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new poll function that polls for NQ events. If the NQ event is
a CQ notification, we locate the CP ring from the cq_handle and call
__bnxt_poll_work() to handle RX/TX events on the CP ring.
Add a new has_more_work field in struct bnxt_cp_ring_info to indicate
budget has been reached. __bnxt_poll_cqs_done() is called to update or
ARM the CP rings if budget has not been reached or not. If budget
has been reached, the next bnxt_poll_p5() call will continue to poll
from the CQ rings directly. Otherwise, the NQ will be ARMed for the
next IRQ.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the CP ring polling logic in bnxt_poll_work() into 2 separate
functions __bnxt_poll_work() and __bnxt_poll_work_done(). Since the logic
is separated, we need to add tx_pkts and events fields to struct bnxt_napi
to keep track of the events to handle between the 2 functions. We also
add had_work_done field to struct bnxt_cp_ring_info to indicate whether
some work was performed on the CP ring.
This is needed to better support the 57500 chips. We need to poll up to
2 separate CP rings before we update or ARM the CP rings on the 57500 chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On legacy chips, the CP ring may be shared between RX and TX and so only
setup the RX coalescing parameters in such a case. On 57500 chips, we
always have a dedicated CP ring for TX so we can always set up the
TX coalescing parameters in bnxt_hwrm_set_coal().
Also, the min_timer coalescing parameter applies to the NQ on the new
chips and a separate firmware call needs to be made to set it up.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the RX code path, we current use the bnxt_napi struct pointer to
identify the associated RX/CP rings. Change it to use the struct
bnxt_cp_ring_info pointer instead since there are now up to 2
CP rings per MSIX.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RSS context allocation and RSS indirection table setup are very different
on the new chip. Refactor bnxt_setup_vnic() to call 2 different functions
to set up RSS for the vnic based on chip type. On the new chip, the
number of RSS contexts and the indirection table size depends on the
number of RX rings. Each indirection table entry is also different
on the new chip since ring groups are no longer used.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the new 57500 chips, we need to allocate one RSS context for every
64 RX rings. In previous chips, only one RSS context per vnic is
required regardless of the number of RX rings. So increase the max
RSS context array count to 8.
Hardware ring groups are not used on the new chips. Note that the
software ring group structure is still maintained in the driver to
keep track of the rings associated with the vnic.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the new 57500 chips, we allocate/free one CP ring for each RX ring or
TX ring separately. Using separate CP rings for RX/TX is an improvement
as TX events will no longer be stuck behind RX events.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware ring allocation semantics are slightly different for most
ring types on 57500 chips. Allocation/deallocation for NQ rings are
also added for the new chips.
A CP ring handle is also added so that from the NQ interrupt event,
we can locate the CP ring.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the new 57500 chips, getting the associated CP ring ID associated with
an RX ring or TX ring is different than before. On the legacy chips,
we find the associated ring group and look up the CP ring ID. On the
57500 chips, each RX ring and TX ring has a dedicated CP ring even if
they share the MSIX. Use these helper functions at appropriate places
to get the CP ring ID.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 57500 chips, the original bnxt_cp_ring_info struct now refers to the
NQ. bp->cp_nr_rings refer to the number of NQs on 57500 chips. There
are now 2 pointers for the CP rings associated with RX and TX rings.
Modify bnxt_alloc_cp_rings() and bnxt_free_cp_rings() accordingly.
With multiple CP rings per NAPI, we need to add a pointer in
bnxt_cp_ring_info struct to point back to the bnxt_napi struct.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ring reservation functions have to be modified for P5 chips in the
following ways:
- bnxt_cp_ring_info structs map to internal NQs as well as CP rings.
- Ring groups are not used.
- 1 CP ring must be available for each RX or TX ring.
- number of RSS contexts to reserve is multiples of 64 RX rings.
- RFS currently not supported.
Also, RX AGG rings are only used for jumbo frames, so we need to
unconditionally call bnxt_reserve_rings() in __bnxt_open_nic()
to see if we need to reserve AGG rings in case MTU has changed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store the maximum MSIX capability in PCIe config. space earlier. When
we call firmware to query capability, we need to compare the PCIe
MSIX max count with the firmware count and use the smaller one as
the MSIX count for 57500 (P5) chips.
The new chips don't use ring groups. But previous chips do and
the existing logic limits the available rings based on resource
calculations including ring groups. Setting the max ring groups to
the max rx rings will work on the new chips without changing the
existing logic.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 57500 series chips have a new 64-bit doorbell format. Use a new
bnxt_db_info structure to unify the new and the old 32-bit doorbells.
Add a new bnxt_set_db() function to set up the doorbell addreses and
doorbell keys ahead of time. Modify and introduce new doorbell
helpers to help abstract and unify the old and new doorbells.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
57500 series is a new chip class (P5) that requires some driver changes
in the next several patches. This adds basic chip ID, doorbells, and
the notification queue (NQ) structures. Each MSIX is associated with an
NQ instead of a CP ring in legacy chips. Each NQ has up to 2 associated
CP rings for RX and TX. The same bnxt_cp_ring_info struct will be used
for the NQ.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call firmware to configure the DMA addresses of all context memory
pages on new devices requiring context memory.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New device requires host context memory as a backing store. Call
firmware to check for context memory requirements and store the
parameters. Allocate host pages accordingly.
We also need to move the call bnxt_hwrm_queue_qportcfg() earlier
so that all the supported hardware queues and the IDs are known
before checking and allocating context memory.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer chips require the PTU_PTE_VALID bit to be set for every page
table entry for context memory and rings. Additional bits are also
required for page table entries for all rings. Add a flags field to
bnxt_ring_mem_info struct to specify these additional bits to be used
when setting up the pages tables as needed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the DMA page table and vmem fields in bnxt_ring_struct to a new
bnxt_ring_mem_info struct. This will allow context memory management
for a new device to re-use some of the existing infrastructure.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New firmware spec. allows interrupt coalescing parameters, such as
maximums, timer units, supported features to be queried. Update
the driver to make use of the new call to query these parameters
and provide the legacy defaults if the call is not available.
Replace the hard-coded values with these parameters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support the max_ext_req_len field from the HWRM_VER_GET_RESPONSE.
If this field is valid and greater than the mailbox size, use the
short command format to send firmware messages greater than the
mailbox size. Newer devices use this method to send larger messages
to the firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Latest firmware spec. has some additional rx extended port stats and new
tx extended port stats added. We now need to check the size of the
returned rx and tx extended stats and determine how many counters are
valid. New counters added include CoS byte and packet counts for rx
and tx.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Among the new changes are trusted VF support, 200Gbps support, and new
API to dump ring information on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in DP_INFO message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_device_detach() stops all tx queues already, so we don't need
this call.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify this function, no functional change intended.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly added driver causes a warning about a function that is
not used anywhere:
drivers/net/ethernet/marvell/octeontx2/af/cgx.c:320:12: error: 'cgx_fwi_link_change' defined but not used [-Werror=unused-function]
Remove it for now, until a user gets added. If we want to use this
function from another module, we also need a declaration in a header
file, which is currently missing, so it would have to change anyway.
Fixes: 1463f382f5 ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit makes it possible to use devlink to split the 100G CXP
Netronome into two 40G interfaces. Currently when you ask for 2
interfaces, the math in src/nfp_devlink.c:nfp_devlink_port_split
calculates that you want 5 lanes per port because for some reason
eth_port.port_lanes=10 (shouldn't this be 12 for CXP?). What we really
want when asking for 2 breakout interfaces is 4 lanes per port. This
commit makes that happen by calculating based on 8 lanes if 10 are
present.
Signed-off-by: Ryan C Goodfellow <rgoodfel@isi.edu>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Greg Weeks <greg.weeks@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the hardware ArchDef, the PTV1 field in FD[CTRL]
is ignored by WRIOP, so setting it for Tx FDs is pointless.
Remove all references to it from the code.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ch parameter is never used in the dpaa2_eth_tx_conf function but
since its prototype must match the type defined in the consume field of
struct dpaa2_eth_fq, just mark it as __always_unused.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The priv parameter is never used in the build_linear_skb and
drain_channel function. Remove it from the function definitions.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All 3 cases of possible uninitialized variables are false
positives since they are used only as output parameters.
Nonetheless, fix the warnings.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dpaa2_eth_set_dist_key function is only used in a single file.
Make it static.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both ARCH_LAYERSCAPE and COMPILE_TEST dependencies are already implied
through the FSL_MC_BUS dep, so there's no need to state it explicitly.
Also, the fsl-mc bus depends on COMPILE_TEST only for some
architectures (arm, arm64, ppc, x86), so it's not correct to
claim build support unconditionally.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dual-emac mode the cpsw driver sends directed packets, that means
that packets go to the directed port, but an ALE lookup is performed
to determine untagged egress only. It means that on tx side no need
to add port bit for ALE mcast entry mask, and basically ALE entry
for port identification is needed only on rx side.
So, add only host port in dual_emac mode as used directed
transmission, and no need in one more port. For single port boards
and switch mode all ports used, as usual, so no changes for them.
Also it simplifies farther changes.
In other words, mcast entries for dual-emac should behave exactly
like unicast. It also can help avoid leaking packets between ports
with same vlan on h/w level if ports could became members of same vid.
So now, for instance, if mcast address 33:33:00:00:00:01 is added then
entries in ALE table:
vid = 1, addr = 33:33:00:00:00:01, port_mask = 0x1
vid = 2, addr = 33:33:00:00:00:01, port_mask = 0x1
Instead of:
vid = 1, addr = 33:33:00:00:00:01, port_mask = 0x3
vid = 2, addr = 33:33:00:00:00:01, port_mask = 0x5
With the same considerations, set only host port for unregistered
mcast for dual-emac mode in case of IFF_ALLMULTI is set, exactly like
it's done in cpsw_ale_set_allmulti().
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever kernel or user decides to call rx mode update, it clears
every multicast entry from forwarding table and in some time adds
it again. This time can be enough to drop incoming multicast packets.
That's why clear only staled multicast entries and update or add new
one afterwards.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It allows to use function under callbacks with same const qualifier of
mac address for farther changes.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On GENETv5, there is a hardware issue which prevents the GENET hardware
from generating a link UP interrupt when the link is operating at
10Mbits/sec. Since we do not have any way to configure the link
detection logic, fallback to polling in that case.
Fixes: 421380856d ("net: bcmgenet: add support for the GENETv5 hardware")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbvqcMAAoJEEg/ir3gV/o+eFsH/2TbJH+i1BuGMVwCB8o+U1Rz
C01pJmR7Lb7WwQZ8ZKTOqQkS7BkGX1hNGyIlc4i6ZnP+4gsVJAbP6LKPjTvyD7e6
TNb8bvxTUCOovknrevKkGba8tzoTTsC4wwwbHLGHd1hkKSY1P5hXg8R7vpear+n6
/PFJwzpIXDAa8AHqeORCNYj7MneUm3kaahcmSOxOhvDbRx3UG9cgy7tEhPjZbRn5
jPFsxFCSPcGedtI+g8bzodmpneTcu1KF6QCunrl2bGt5EzgDrbaw1UUoctxD2CJR
Ch45W807EvBJoFiJXXCNf9N+p5020F/Q+mTmK7khPirUjdtoLdcT9Goswpjfbtk=
=vJIA
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2018-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-10-10
This pull request includes some fixes to mlx5 driver,
Please pull and let me know if there's any problem.
For -stable v4.11:
('net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type')
For -stable v4.17:
('net/mlx5: Fix memory leak when setting fpga ipsec caps')
For -stable v4.18:
('net/mlx5: WQ, fixes for fragmented WQ buffers API')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IPoIB netlink support and mlx5e pre-allocated netdevice initialization
IP link was broken due to the changes in IPoIB for the rdma_netdev
support after commit cd565b4b51
("IB/IPoIB: Support acceleration options callbacks").
This patchset fixes IPoIB pkey creation and removal using rtnetlink by
adding support in both IPoIB ULP layer and mlx5 layer:
From Jason and Denis:
1) Introduces changes in the RDMA netdev code in order to
allow allocation of the netdev to be done by the rtnl netdev code.
2) Reworks IPoIB initialization to use the two step rdma_netdev
creation.
From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting
pre-allocated netdevs:
3) Adds support to initialize/cleanup netdevs that are not created
by mlx5 driver.
4) Change mlx5e netdevice layer to accept the pre-allocated netdevice
queue number.
5) Initialize mlx5e generic structures in one place to be used for all
netdevs types NIC/representors/IPoIB (both mlx5 allocated and
pre-allocted).
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbvqHhAAoJEEg/ir3gV/o+TRQH+wYWgvi5AbqYlGpT8xSho8Dg
5tuFE9AbYDGLLWptZgtYXsBsTaltGVqKnwWy//dAuuSI7LvqtjeKqq0G1JKau70L
/49+vv8NTQ6vov2yY95yzzEo5B1/zVcYEl9dmJl4y6Xlwc+YIteewReVqRj+tucR
xO7ghLWc1o4Pl7gWBAcOULyOsZc+CtG8ElwEaBROXTtYRpsR7FKn7vN+WdWZ2VOG
MjtCUaG0vZlK1vaF76J3P9bL2V3y6o6i6S8RZwg1kPxO4jnrzyGrkxWMOX6f32KB
JAUcLVlJW5wckcRAKfTwLxsFMrc9vLBPHyj4XneXVWGKh8/dGUGY5gdGIJV5Jlc=
=m0tM
-----END PGP SIGNATURE-----
Merge tag 'mlx5e-updates-2018-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-updates-2018-10-10
IPoIB netlink support and mlx5e pre-allocated netdevice initialization
IP link was broken due to the changes in IPoIB for the rdma_netdev
support after commit cd565b4b51
("IB/IPoIB: Support acceleration options callbacks").
This patchset fixes IPoIB pkey creation and removal using rtnetlink by
adding support in both IPoIB ULP layer and mlx5 layer:
From Jason and Denis:
1) Introduces changes in the RDMA netdev code in order to
allow allocation of the netdev to be done by the rtnl netdev code.
2) Reworks IPoIB initialization to use the two step rdma_netdev
creation.
From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting
pre-allocated netdevs:
3) Adds support to initialize/cleanup netdevs that are not created
by mlx5 driver.
4) Change mlx5e netdevice layer to accept the pre-allocated netdevice
queue number.
5) Initialize mlx5e generic structures in one place to be used for all
netdevs types NIC/representors/IPoIB (both mlx5 allocated and
pre-allocted).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally, we have an issue where r8169 MSI-X interrupt is broken after
S3 suspend/resume on RTL8106e of ASUS X441UAR.
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8169
Kernel modules: r8169
We found the all of the values in PCI BAR=4 of the ethernet adapter
become 0xFF after system resumes. That breaks the MSI-X interrupt.
Therefore, we can only fall back to MSI interrupt to fix the issue at
that time.
However, there is a commit which resolves the drivers getting nothing in
PCI BAR=4 after system resumes. It is 04cb3ae895d7 "PCI: Reprogram
bridge prefetch registers on resume" by Daniel Drake.
After apply the patch, the ethernet adapter works fine before suspend
and after resume. So, we can revert the workaround after the commit
"PCI: Reprogram bridge prefetch registers on resume" is merged into main
tree.
This patch reverts commit 7bb05b85bc
"r8169: don't use MSI-X on RTL8106e".
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201181
Fixes: 7bb05b85bc ("r8169: don't use MSI-X on RTL8106e")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts were easy to resolve using immediate context mostly,
except the cls_u32.c one where I simply too the entire HEAD
chunk.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch resumes promisc mode and vlan filter status after
loopback test.
Fixes: 3b75c3df59 ("net: hns3: net: hns3: Add support for IFF_ALLMULTI flag")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch resumes promisc mode and vlan filter status after reset.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver does nothing when mac vlan table is full.
In this case, the packet with new mac address will be dropped
by hardware. This patch adds check for the result of sync mac
address, and enable promisc mode when mac vlan table is full.
Furtherly, disable vlan filter when enable promisc by user
command.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the repeated license text with SDPX identifiers.
While at it bump the Copyright dates for files we touched
this year.
Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been reported that since
commit 05212ba813 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
at least RTL_GIGA_MAC_VER_38 NICs work erratically after a resume from
suspend.
The problem has been traced to a missing RX_MULTI_EN bit in the RxConfig
register.
We already set this bit for RTL_GIGA_MAC_VER_35 NICs of the same 8168F
chip family so let's do it also for its other siblings: RTL_GIGA_MAC_VER_36
and RTL_GIGA_MAC_VER_38.
Curiously, the NIC seems to work fine after a system boot without having
this bit set as long as the system isn't suspended and resumed.
Fixes: 05212ba813 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 63ae7949e9 ("net: socionext: Use descriptor info instead of MMIO reads on Rx")
removed constant mmio reads from the driver and started using a descriptor
field to check if packet should be processed.
This lead the napi rx handler being constantly called while no packets
needed processing and ksoftirq getting 100% cpu usage. Issue one mmio read
to clear the irq correcty after processing packets
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
During boot, mlx4_core sets the driverinit configuration parameters and
updates the devlink module on the initial values calling
devlink_param_driverinit_value_set().
If devlink_param_driverinit_value_set() returns an error mlx4_core
reports kernel module warning.
This caused false alarm during boot in case kernel was compiled with
CONFIG_NET_DEVLINK off.
Fix by removing warning reported in case
devlink_param_driverinit_value_set() fails.
This actually makes the function mlx4_devlink_set_init_value()
redundant to using directly devlink_param_driverinit_value_set() and so
removed.
It fixes the following kernel trace:
mlx4_core 0000:00:06.0: devlink set parameter 0 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 1 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 4 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 5 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 3 value failed (err = -95)
Fixes: bd1b51dc66 ("mlx4: Add mlx4 initial parameters table and register it")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With CONFIG_THERMAL=m, we get a build error:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c: In function 'cxgb4_thermal_get_trip_type':
drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c:48:11: error: 'struct adapter' has no member named 'ch_thermal'
Once that is fixed by using IS_ENABLED() checks, we get a link error
against the thermal subsystem when cxgb4 is built-in:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o: In function `cxgb4_thermal_init':
cxgb4_thermal.c:(.text+0x180): undefined reference to `thermal_zone_device_register'
drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o: In function `cxgb4_thermal_remove':
cxgb4_thermal.c:(.text+0x1e0): undefined reference to `thermal_zone_device_unregister'
Finally, since CONFIG_THERMAL can be =m, the Makefile fails to pick up the
extra file into built-in.a, and we get another link failure against the
cxgb4_thermal_init/cxgb4_thermal_remove files, so the Makefile has to
be adapted as well to work for both CONFIG_THERMAL=y and =m.
Fixes: b187191577 ("cxgb4: Add thermal zone support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove redundant spinlock acquire parameter from ena_com_admin_init()
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improves socket memory utilization when receiving packets larger
than 128 bytes (the previous rx copybreak) and smaller than 256 bytes.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently Rx refill is done when the number of required descriptors is
above 1/8 queue size. With a default of 1024 entries per queue the
threshold is 128 descriptors.
There is intention to increase the queue size to 8196 entries.
In this case threshold of 1024 descriptors is too large and can hurt
latency.
Add another limitation to Rx threshold to be at most 256 descriptors.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set skb->ip_summed to the correct value as reported by the device.
Add counter for the case where rx csum offload is enabled but
device didn't check it.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch includes all code changes necessary in ena_netdev to enable
packet sending via the LLQ placemnt mode.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces APIs for detection, initialization, configuration
and actual usage of low latency queues(LLQ). It extends transmit API with
creation of LLQ descriptors in device memory (which include host buffers
descriptors as well as packet header)
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Low Latency Queues(LLQ) allow usage of device's memory for descriptors
and headers. Such queues decrease processing time since data is already
located on the device when driver rings the doorbell.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new fields and definitions to host info and fill them
according to the latest ENA spec version.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce fastpath overhead by making ena_com_tx_comp_req_id_get() inline.
Also move it to ena_eth_com.h file with its dependency function
ena_com_cq_inc_head().
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DECAP_ECN0 trap will be used to trap packets where the overlay
packet is marked with Non-ECT, but the underlay packet is marked with
either ECT(0), ECT(1) or CE. When trapped, such packets will be counted
as errors by the VxLAN driver and thus provide better visibility.
The NVE_ENCAP_ARP trap will be used to trap ARP packets undergoing NVE
encapsulation. This is needed in order to support E-VPN ARP suppression,
where the Linux bridge does not flood ARP packets through tunnel ports
in case it can answer the ARP request itself.
Note that all the packets trapped via these traps are marked with
'offload_fwd_mark', so as to not be re-flooded by the Linux bridge
through the ASIC ports.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following resources to be used by the NVE code:
* Number of IPv4 underlay destination IPs in a single TNUMT record
* Number of IPv6 underlay destination IPs in a single TNUMT record
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register is used for setting up the parsing for hash, policy-engine
and routing.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Will be used to program the device with FDB records pointing to a NVE
tunnel.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TNQDR register configures the default QoS settings for NVE
encapsulation.
It will be used to set the default DSCP of each port to 0, so that when
DSCP is set to inherit and the overlay packet does not have an IP header
the outer DSCP will be set to 0, in accordance with the software data
path.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register configures how QoS is set in Encapsulation into the
underlay network.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register configures the actions that are done during NVE
decapsulation based on the ECN bits.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register performs mapping from overlay ECN to underlay ECN during
NVE encapsulation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register builds the linked list of underlay destination IPs used
for BUM traffic on the overlay.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register enables / disables learning on different types of tunnel
ports (e.g., NVE, VPLS).
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register configures global NVE configuration such as source IP of
the NVE tunnel and UDP source port calculation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the seed of the LAG hash function is always set to 0, which
means it is identical across all switches. Instead, use a random number.
This is especially important now that VxLAN is supported, as the LAG
hash function is used to calculate the UDP source port of the
encapsulated packet.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device has the ability to flush all the FDB records that perform NVE
encapsulation or only a subset of these with a specific filtering
identifier (FID).
Expose these types so that they could be used by subsequent patches
where we need to flush the FDB records when an NVE device is unlinked
from a bridge (FID).
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the device needs to flood an overlay packet to remote VTEPs it
retrieves a pointer to the head of a linked-list of records that store
the IP addresses of these VTEPs.
These records are stored in the KVD linear memory and configured via the
Tunneling NVE Underlay Multicast Table (TNUMT) register.
Add a new KVD linear entry type for these records, so that we will be
able to allocate and free them.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The L3 protocol and address definitions are going to be used by the NVE
code, so move them to the global header file from the one private to the
router.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VxLAN notifications are going to use a different notifier information
type, so cast to the correct type based on the received event.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VxLAN FDB updates are sent with the VxLAN device which is not our upper
and will therefore be ignored by current code.
Solve this by checking whether the upper device (bridge) is our upper.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VxLAN FDB notifications need to be handled differently than bridge FDB
notifications, so initialize the work item based on the received
notification and rename the invoked function accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The spectrum_router.h header file is private to the router block and
should only be included by direct consumers of it, such as dpipe and the
multicast routing code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/marvell/octeontx2/af/cgx.c: In function 'cgx_fwi_event_handler':
drivers/net/ethernet/marvell/octeontx2/af/cgx.c:257:17: warning:
variable 'dev' set but not used [-Wunused-but-set-variable]
It never be used since introduction in
commit 1463f382f5 ("octeontx2-af: Add support for CGX link management")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the NETIF_F_RXFCS feature in the Mscc
Ethernet driver. This feature is disabled by default and allow a user
to request the driver not to drop the FCS and to extract it into the skb
for debugging purposes.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers should call skb_set_hash to set the hash and its type
in an skbuff.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds RSS tuple support for VF in revision
0x21.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds RSS key, hash algorithm configuration support
for VF in revision 0x21.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ETH_RSS_HASH_XOR hash algorithm supports, which
is supported by hw revision 0x21.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the host context count symbols provided by firmware and use
it to determine the number of allocated stats ids. Previously it
won't be possible to offload more than 2^17 filter even if FW was
able to do so.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of an array stats instead of storing stats per flow which
would require a hash lookup at critical times.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of relativistic hash tables for tracking flows instead
of fixed sized hash tables.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several variables being initialized that are being set later
and hence the initialization is redundant and can be removed. Remove
then.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5e netdevice used to calculate fragment edges by a call to
mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct
indication for queues smaller than a PAGE_SIZE, (broken by default on
PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new
calls/API.
Since (TX/RX) Work Queues buffers are fragmented, here we introduce
changes to the API in core driver, so that it gets a stride index and
returns the index of last stride on same fragment, and an additional
wrapping function that returns the number of physically contiguous
strides that can be written contiguously to the work queue.
This obsoletes the following API functions, and their buggy
usage in EN driver:
* mlx5_wq_cyc_get_frag_size()
* mlx5_wq_cyc_ctr2fragix()
The new API improves modularity and hides the details of such
calculation for mlx5e netdevice and mlx5_ib rdma drivers.
New calculation is also more efficient, and improves performance
as follows:
Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size.
Before: 16,477,619 pps
After: 17,085,793 pps
3.7% improvement
Fixes: 3a2f703312 ("net/mlx5: Use order-0 allocations for all WQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The HW spec defines only bits 24-26 of pftype_wq as the page fault type,
use the required mask to ensure that.
Fixes: d9aaed8387 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allocated memory for context should be freed once
finished working with it.
Fixes: d6c4f0298c ("net/mlx5: Refactor accel IPSec code")
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current design of mlx5e driver ignores the netdevice TX/RX queues
number for netdevices that RDMA IPoIB ULP creates. Instead, the queue
number is initialized to the maximum number that mlx5 thinks best for
performance. As a result, ULP drivers that choose to create a netdevice
with queue number that is less than the maximum channels mlx5 creates,
will get a memory corruption.
This fix changes the mlx5e netdev logic to respect ULP netdevices TX/RX
queue number and use it when creating resources instead of the maximum
channel number.
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Convert mlx5e update stats work to a normal work structure, since it is
never used delayed.
Add a helper function to queue update stats work on demand which checks
for some conditions and reduce code duplication to have a better
abstraction.
Fixes: ed56c5193a ("net/mlx5e: Update NIC HW stats on demand only")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move all mlx5e generic structures initializations to mlx5e_netdev_init.
The common structure new initializer function will be used to initialize
mlx5 context for netlink created netdevs such as IPoIB mlx5 accelerated
child netdevs.
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
mlx5e_detach_netdev cancels update_stats work which was not initialized
in ipoib netdevice profile, as a result, the following assert occurs:
ODEBUG: assert_init not available (active state 0) object type:
timer_list hint:(null)
This change moves the update stats work to be initialized for all
mlx5e netdevices.
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce a helper init/cleanup function that initializes mlx5e generic
netdev private structure, and use them from all profiles init/cleanup
callbacks.
This patch will also be helpful to initialize/cleanup netdevs that are
not created by mlx5 driver, e.g: accelerated ipoib child netdevs.
Fixes: 26e59d8077 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
netdev has several interfaces that expect to call alloc_netdev_mqs from
the core code, with the driver only providing the arguments. This is
incompatible with the rdma_netdev interface that returns the netdev
directly.
Thus re-organize the API used by ipoib so that the verbs core code calls
alloc_netdev_mqs for the driver. This is done by allowing the drivers to
provide the allocation parameters via a 'get_params' callback and then
initializing an allocated netdev as a second step.
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Added support in RVU AF driver to register for
CGX LMAC link status change events from firmware
and managing them. Processing part will be added
in followup patches.
- Introduced eventqueue for posting events from cgx lmac.
Queueing mechanism will ensure that events can be posted
and firmware can be acked immediately and hence event
reception and processing are decoupled.
- Events gets added to the queue by notification callback.
Notification callback is expected to be atomic, since it
is called from interrupt context.
- Events are dequeued and processed in a worker thread.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CGX LMAC initialization, link status polling etc is done
by low level secure firmware. For link management this patch
adds a interface or communication mechanism between firmware
and this kernel CGX driver.
- Firmware interface specification is defined in cgx_fw_if.h.
- Support to send/receive commands/events to/form firmware.
- events/commands implemented
* link up
* link down
* reading firmware version
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Nithya Mani <nmani@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each of the enabled CGX LMAC is considered a physical
interface and RVU PFs are mapped to these. VFs of these
SRIOV PFs will be virtual interfaces and share CGX LMAC
along with PF.
This mapping info will be used later on for Rx/Tx pkt steering.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds basic template for Marvell OcteonTX2's
CGX ethernet interface driver. Just the probe.
RVU AF driver will use APIs exported by this driver
for various things like PF to physical interface mapping,
loopback mode, interface stats etc. Hence marged both
drivers into a single module.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HW interprets RVU_AF_MSIXTR_BASE address as an IOVA, hence
create a IOMMU mapping for the physcial address configured by
firmware and reconfig RVU_AF_MSIXTR_BASE with IOVA.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware configures a certain number of MSIX vectors to each of
enabled RVU PF/VF. When a block LF is attached to a PF/VF, number
of MSIX vectors needed by that LF are set aside (out of PF/VF's
total MSIX vectors) and LF's msix_offset is configured in HW.
Also added support for a RVU PF/VF to retrieve that block LF's
MSIX vector offset information from AF via mbox.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for a RVU PF/VF to request AF via mailbox
to attach or detach NPA/NIX/SSO/SSOW/TIM/CPT block LFs.
Also supports partial detachment and modifying current
LF attached count of a certian block type.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scan all RVU blocks to find any 'LF to RVU PF/VF' mapping done by
low level firmware. If found any, mark them as used in respective
block's LF bitmap and also save mapped PF/VF's PF_FUNC info.
This is done to avoid reattaching a block LF to a different RVU PF/VF.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With 10's of mailbox messages expected to be handled in future,
checking for message id could become a lengthy switch case. Hence
added a macro to auto generate the switch case for each msg id.
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for mailbox interrupt and message
handling. Mapped mailbox region and registered a workqueue
for message handling. Enabled mailbox IRQ of RVU PFs
and registered a interrupt handler. When IRQ is triggered
work is added to the mbox workqueue for msgs to get processed.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds mailbox support infrastructure APIs.
Each RVU device has a dedicated 64KB mailbox region
shared with it's peer for communication. RVU AF has
a separate mailbox region shared with each of RVU PFs
and a RVU PF has a separate region shared with each of
it's VF.
These set of APIs are used by this driver (RVU AF) and
other RVU PF/VF drivers eg netdev, crypto e.t.c.
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch gathers NPA/NIX/SSO/SSOW/TIM/CPT RVU blocks's
HW info like number of LFs. Important register offsets
saved for later use to avoid code duplication for each block.
A bitmap is allocated for each of the blocks which later
on will be used to allocate a LF for a RVU PF/VF.
Also added RVU NIX/NPA block registers and few registers
of other blocks.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Go through all BLKADDRs and check which ones are implemented
on this silicon and do a HW reset of each implemented block.
Also added all RVU AF and PF register offsets.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds basic template for Marvell OcteonTX2's
resource virtualization unit (RVU) admin function (AF)
driver. Just the driver registration and probe.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently driver registers to physical link notifications (of the device)
from Management firmware (MFW). Driver doesn't get notified if there's a
change in the virtual link e.g., link-flap on the peer PF interface.
Virtual link indication from MFW reflects the per PF link status instead
of the physical link.
The patch adds driver support for,
- Advertising the virtual link support to MFW.
- Handling the virtual link notification from MFW.
Please consider applying it to 'net-next'.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add thermal zone support to monitor ASIC's temperature.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When memory is limited (on kdump kernel), reduce size of rx and tx rings.
Also reduce the number of rx rings.
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate potential auto casting compilation error.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi poll functions should be initialized before running request_irq(),
to handle a rare condition where there is a pending interrupt, causing
the ISR to fire immediately while the poll function wasn't set yet,
causing a NULL dereference.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a rare scenario when ena_device_restore() fails, followed by device
remove, an FLR will not be issued. In this case, the device will keep
sending asynchronous AENQ keep-alive events, even after driver removal,
leading to memory corruption.
Fixes: 8c5c7abdeb ("net: ena: add power management ops to the ENA driver")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory mapped with devm_ioremap is automatically freed when the driver
is disconnected from the device. Therefore there is no need to
explicitly call devm_iounmap.
Fixes: 0857d92f71 ("net: ena: add missing unmap bars on device removal")
Fixes: 411838e7b4 ("net: ena: fix rare kernel crash when bar memory remap fails")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-10-08
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.
2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c
3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.
6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf
7) various AF_XDP fixes from Björn and Magnus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, both Rx and Tx confirmation frames handled during
NAPI poll were counted toward the NAPI budget. However, Tx
confirmations are lighter to process than Rx frames, which can
skew the amount of work actually done inside one NAPI cycle.
Update the code to only count Rx frames toward the NAPI budget
and set a separate threshold on how many Tx conf frames can be
processed in one poll cycle.
The NAPI poll routine stops when either the budget is consumed
by Rx frames or when Tx confirmation frames reach this threshold.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/mscc/ocelot_board.c: In function 'mscc_ocelot_probe':
drivers/net/ethernet/mscc/ocelot_board.c:262:17: warning:
variable 'phy_mode' set but not used [-Wunused-but-set-variable]
enum phy_mode phy_mode;
It never used since introduction in
commit 71e32a20cf ("net: mscc: ocelot: make use of SerDes PHYs for handling their configuration")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to fix and improve dpaa2-ptp driver
in some places.
- Fixed the return for some functions.
- Replaced kzalloc with devm_kzalloc.
- Removed dev_set_drvdata(dev, NULL).
- Made ptp_dpaa2_caps const.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to removed unused code for dprtc.
This code will be re-added along with more features
of dpaa2-ptp added.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dpaa2-ptp driver, it's odd to use rtc in names of
some functions and structures except these dprtc APIs.
This patch is to use ptp instead of rtc in names.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NETDEVICES dependency and ETHERNET dependency hadn't
been required since dpaa2-eth was moved out of staging.
Also allowed COMPILE_TEST for dpaa2-eth.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to move DPAA2 PTP driver out of staging/
since the dpaa2-eth had been moved out.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark instructions that use pointers to areas in the stack outside of the
current stack frame, and process them accordingly in mem_op_stack().
This way, we also support BPF-to-BPF calls where the caller passes a
pointer to data in its own stack frame to the callee (typically, when
the caller passes an address to one of its local variables located in
the stack, as an argument).
Thanks to Jakub and Jiong for figuring out how to deal with this case,
I just had to turn their email discussion into this patch.
Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When pre-processing the instructions, it is trivial to detect what
subprograms are using R6, R7, R8 or R9 as destination registers. If a
subprogram uses none of those, then we do not need to jump to the
subroutines dedicated to saving and restoring callee-saved registers in
its prologue and epilogue.
This patch introduces detection of callee-saved registers in subprograms
and prevents the JIT from adding calls to those subroutines whenever we
can: we save some instructions in the translated program, and some time
at runtime on BPF-to-BPF calls and returns.
If no subprogram needs to save those registers, we can avoid appending
the subroutines at the end of the program.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
goes to the start of the callee next. In order to do so, the caller must
pass to the subroutine the address of the NFP instruction to jump to at
the end of that subroutine. This cannot be reliably implemented when
translated the caller, as we do not always know the start offset of the
callee yet.
This patch implement the required fixup step for passing the start
offset in the callee via the register used by the subroutine to hold its
return address.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Relocation for targets of BPF-to-BPF calls are required at the end of
translation. Update the nfp_fixup_branches() function in that regard.
When checking that the last instruction of each bloc is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute maximum stack usage for the program, we
cannot get the value from the kernel.
In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.
The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.
For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction number,
this is done through dedicated subroutines. Jumping to the callee
actually consists in jumping to the subroutine, that "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists in jumping to the subroutine, which
pops registers and then return directly to the caller (but no fixup is
needed here).
Return to the caller is performed with the RTN instruction newly added
to the JIT.
For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.
Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.
We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The checks related to eBPF helper calls are performed each time the nfp
driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks
are not relevant for BPF-to-BPF call (same instruction code, different
value in source register), so just skip the checks for such calls.
While at it, rename the function that runs those checks to make it clear
they apply to _helper_ calls only.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.
Subprogram start indexes are not collected. Instead, meta instructions
associated to the start of a subprogram will be marked with a flag in a
later patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for BPF-to-BPF calls in offloaded programs, add a new
function attribute to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.
Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.
This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There is no need to compare *val.vu32* with < 0 because
such variable is of type u32 (32 bits, unsigned), making it
impossible to hold a negative value. Fix this by removing
such comparison.
Also, initialize variable *max_val* to -1, just in case
it is not initialized to either BNXT_MSIX_VEC_MAX or
BNXT_MSIX_VEC_MIN_MAX before using it in a comparison
with val.vu32 at line 159:
if (val.vu32 > max_val)
Addresses-Coverity-ID: 1473915 ("Unsigned compared against 0")
Addresses-Coverity-ID: 1473920 ("Uninitialized scalar variable")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When offloading the L3 and L4 csum computation on TX, we need to extract
the l3_proto from the ethtype, independently of the presence of a vlan
tag.
The actual driver uses skb->protocol as-is, resulting in packets with
the wrong L4 checksum being sent when there's a vlan tag in the packet
header and checksum offloading is enabled.
This commit makes use of vlan_protocol_get() to get the correct ethtype
regardless the presence of a vlan tag.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, the SerDes muxing was hardcoded to a given mode in the MAC
controller driver. Now, the SerDes muxing is configured within the
Device Tree and is enforced in the MAC controller driver so we can have
a lot of different SerDes configurations.
Make use of the SerDes PHYs in the MAC controller to set up the SerDes
according to the SerDes<->switch port mapping and the communication mode
with the Ethernet PHY.
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since HSIO address space can be accessed by different drivers, let's
simplify the register address definitions so that it can be easily used
by all drivers and put the register address definition in the
include/soc/mscc/ocelot_hsio.h header file.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since HSIO address space can be used by different drivers (PLL, SerDes
muxing, temperature sensor), let's move it somewhere it can be included
by all drivers.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HSIO address space was moved to a syscon, hence we need to get the
regmap of this address space from there and no more from the device
node.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In revision 0x20, we use vlan id != 0 to check whether a vlan tag
has been offloaded, so vlan id 0 is not supported.
In revision 0x21, rx buffer descriptor adds two bits to indicate
whether one or more vlan tags have been offloaded, so vlan id 0
is valid now.
This patch seperates the handle for vlan id 0, add vlan id 0 support
for revision 0x21.
Fixes: 5b5455a9ed ("net: hns3: Add STRP_TAGP field support for hardware revision 0x21")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In revision 0x21, hw supports both ingress and egress vlan filter.
This patch adds support for it.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For mta table support has been dropped, remove the code for mta table.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In previously implement for unicast mac vlan table, the space is
shared by all the functions, driver does nothing when the space is
exhausted. This patch preallocates the space of unicast mac vlan
table for each function by software. Each function can only use its
private space and available shared space, avoiding single function
exhausts too much space, and other functions are unable to add
unicast mac address.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In original codes, the mac vlan table entries are not cleared when
unload hns3 driver. The dirty mac vlan table entries will make the
result of looking up mac vlan table being unexpected.
When doing core reset or global reset, the firmware will clear all
the tables for driver, and driver shouldn't send any commands to
firmware during reset. But when doing function reset, the driver
needs to clear the tables itself.
This patch clears the mac vlan table entries for each client when
unload driver or reset.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default mask configuration has been done by firmware, so the driver
doesn't need to do it any more.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In cxgb_extension_ioctl(), the command of the ioctl is firstly copied from
the user-space buffer 'useraddr' to 'cmd' and checked through the
switch statement. If the command is not as expected, an error code
EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
switch statement, the whole buffer of 'useraddr' is copied again to a
specific data structure, according to what kind of command is requested.
However, after the second copy, there is no re-check on the newly-copied
command. Given that the buffer 'useraddr' is in the user space, a malicious
user can race to change the command between the two copies. By doing so,
the attacker can supply malicious data to the kernel and cause undefined
behavior.
This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL is returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>