Code logic is similar with logic right shift except we also need to set
PREV_ALU result properly, the MSB of which is the bit that will be
replicated to fill in all the vacant positions.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For indirect shifts, shift amount is not specified as constant, NFP needs
to get the shift amount through the low 5 bits of source A operand in
PREV_ALU, therefore extra instructions are needed compared with shifts by
constants.
Because NFP is 32-bit, so we are using register pair for 64-bit shifts and
therefore would need different instruction sequences depending on whether
shift amount is less than 32 or not.
NFP branch-on-bit-test instruction emitter is added by this patch and is
used for efficient runtime check on shift amount. We'd think the shift
amount is less than 32 if bit 5 is clear and greater or equal than 32
otherwise. Shift amount is greater than or equal to 64 will result in
undefined behavior.
This patch also use range info to avoid generating unnecessary runtime code
if we are certain shift amount is less than 32 or not.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF has access to all internal FW datapath structures. Including
the structure containing RX queue selection. With little coordination
with the datapath we can let the offloaded BPF select the RX queue.
We just need a way to tell the datapath that queue selection has already
been done and it shouldn't overwrite it. Define a bit to tell datapath
BPF already selected a queue (QSEL_SET), if the selected queue is not
enabled (>= number of enabled queues) datapath will perform normal RSS.
BPF queue selection on the NIC can be used to replace standard
datapath RSS with fully programmable BPF/XDP RSS.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP has a prng register, which we can read to obtain a u32 worth
of pseudo random data. Generate code for it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allow atomic add to be used even when the value is not guaranteed
to fit into a 16 bit immediate. This requires the value to be pulled
as data, and therefore use of a transfer register and a context swap.
Track the information about possible lengths of the value, if it's
guaranteed to be larger than 16bits don't generate the code for the
optimized case at all.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement atomic add operation for 32 and 64 bit values. Depend
on the verifier to ensure alignment. Values have to be kept in
big endian and swapped upon read/write. For now only support
atomic add of a constant.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch assumes there is a packet data cache, and would try to read
packet data from the cache instead of from memory.
This patch only implements the optimisation "backend", it doesn't build
the packet data cache, so this optimisation is not enabled.
This patch has only enabled aligned packet data read, i.e. when the read
offset to the start of cache is REG_WIDTH aligned.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds signed jump instructions (jsgt, jsge, jslt, jsle)
to the nfp jit. As well as adding the additional required raw
assembler branch mask to nfp_asm.h
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for better handling of relocations move existing
helper for setting branch offset to nfp_asm.c and add two more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support bpf_xdp_adjust_head(). We need to check whether the
packet offset after adjustment is within datapath's limits.
We also check if the frame is at least ETH_HLEN long (similar
to the kernel implementation).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- The spec defines CSR address ranges for indirect ME CSRs. For Each TLV
chunk in the spec, dump a chunk that includes the spec and the data
over the defined address range.
- Each indirect CSR has 8 contexts. To read one context, first write the
context to a specific derived address, read it back, and then read the
register value.
- For each address, read and dump all 8 contexts in this manner.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For NFP, we want to re-group a sequence of load/store pairs lowered from
memcpy/memmove into single memory bulk operation which then could be
accelerated using NFP CPP bus.
This patch extends the existing load/store auxiliary information by adding
two new fields:
struct bpf_insn *paired_st;
s16 ldst_gather_len;
Both fields are supposed to be carried by the the load instruction at the
head of the sequence. "paired_st" is the corresponding store instruction at
the head and "ldst_gather_len" is the gathered length.
If "ldst_gather_len" is negative, then the sequence is doing memory
load/store in descending order, otherwise it is in ascending order. We need
this information to detect overlapped memory access.
This patch then optimize memory bulk copy when the copy length is within
32-bytes.
The strategy of read/write used is:
* Read.
Use read32 (direct_ref), always.
* Write.
- length <= 8-bytes
write8 (direct_ref).
- length <= 32-bytes and is 4-byte aligned
write32 (direct_ref).
- length <= 32-bytes but is not 4-byte aligned
write8 (indirect_ref).
NOTE: the optimization should not change program semantics. The destination
register of the last load instruction should contain the same value before
and after this optimization.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add support for emitting commands with field overwrites.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The current ALU_OP_NEG is Op encoding 0x4 for NPF ALU instruction. It is
actually performing "~B" operation which is bitwise NOT.
The using naming ALU_OP_NEG is misleading as NEG is -B which is not the
same as ~B.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To access beyond 64th byte of the stack we need to set a new
stack pointer register (LMEM is accessed indirectly through
those pointers). Add a function for encoding local CSR access
instruction. Use stack pointer number 3.
Note that stack pointer registers allow us to index into 32
bytes of LMEM (with shift operations i.e. when operands are
restricted). This means if access is crossing 32 byte boundary
we must not use offsetting, we have to set the pointer to the
exact address and move it with post-increments.
We depend on the datapath placing the stack base address in
GPR A22 for our use.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today's -next build encountered an error due to a missing definition of
WARN_ON(), caused by some header reorganization removing an implicit
inclusion of linux/bug.h. Fix this with an explicit inclusion.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ability to write packet contents using pre-validated
packet pointers (direct packet access).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In direct packet access bound checks are already done, we can
simply dereference the packet pointer.
Verifier/parser logic needs to record pointer type. Note that
although verifier does protect us from CTX vs other pointer
changes we will also want to differentiate between PACKET vs
MAP_VALUE or STACK, so we can add the check already.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to append up to 8 nops after last instruction to make
sure the CPU will not fetch garbage instructions with invalid
ECC if the code store was not initialized.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the initial PoC firmware I simply disabled ECC on the instruction
store. Do the ECC calculation for generated instructions in the driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most instructions have special fields which allow switching
between base and extended Local Memory pointers. Introduce
those to register encoding, we will use the extra LM pointers
to access high addresses of the stack.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP LMEM is a large, indirectly accessed register file. There
are two basic indirect access registers. Each access operation
may either use offset (up to 8 or 16 words) or perform post
decrement/increment.
Add encodings of LMEM indexes as instruction operands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to add longer OP_* defines, move the values away.
Purely whitespace commit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the software reg helpers and some static data to nfp_asm.c.
They are related to the previous patch, but move is done in a separate
commit for ease of review.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define a new __bitwise type for software representation of registers.
This will allow us to catch incorrect parameter types using sparse.
Accessors we define also allow us to return correct enum type and
therefore ensure all switches handle all register types.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pure move of eBPF offload files to BPF app directory,
only change the names and relative header location.
nfp_asm.h stays in the main dir and it doesn't really
have to include nfp_bpf.h.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add translator for JITing eBPF to operations which
can be executed on NFP's programmable engines.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>