Following steps are taken in the driver to offload an XDP program:
XDP_SETUP_PROG:
* prepare:
- allocate program state;
- run verifier (bpf_analyzer());
- run translation;
* load:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
* clean up:
- free program image.
With new infrastructure the flow will look like this:
BPF_OFFLOAD_VERIFIER_PREP:
- allocate program state;
BPF_OFFLOAD_TRANSLATE:
- run translation;
XDP_SETUP_PROG:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
- free program image.
Take advantage of the new infrastructure. Allocation of driver
metadata has to be moved from jit.c to offload.c since it's now
done at a different stage. Since there is no separate driver
private data for verification step, move temporary nfp_meta
pointer into nfp_prog. We will now use user space context
offsets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct nfp_prog is currently only used internally by the translator.
This means there is a lot of parameter passing going on, between
the translator and different stages of offload. Simplify things
by allocating nfp_prog in offload.c already.
We will now use kmalloc() to allocate the program area and only
DMA map it for the time of loading (instead of allocating DMA
coherent memory upfront).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of offload/translation prepare logic will be moved to
offload.c. To help git generate more reasonable diffs
move nfp_prog_prepare() and nfp_prog_free() functions
there as a first step.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware supports live replacement of programs for quite some
time now. Remove the software-fallback related logic and
depend on the FW for program replace. Seamless reload will
become a requirement if maps are present, anyway.
Load and start stages have to be split now, since replace
only needs a load, start has already been done on add.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently create a fake cls_bpf offload object when we want
to offload XDP. Simplify and clarify the code by moving the
TC/XDP specific logic out of common offload code. This is easy
now that we don't support legacy TC actions. We only need the
bpf program and state of the skip_sw flag.
Temporarily set @code to NULL in nfp_net_bpf_offload(), compilers
seem to have trouble recognizing it's always initialized. Next
patches will eliminate that variable.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF offload's main header does not need to include nfp_net.h.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register renumbering was removed and will not be coming back
in its old, naive form, given that it would be fundamentally
incompatible with calling functions. Remove the leftovers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only support BPF_PROG_TYPE_SCHED_CLS programs in direct
action mode. This simplifies preparing the offload since
there will now be only one mode of operation for that type
of program. We need to know the attachment mode type of
cls_bpf programs, because exit codes are interpreted
differently for legacy vs DA mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If TC program is loaded with skip_sw flag, we should allow
the device-specific programs to be accepted.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the netdev pointer to bpf_prog_get_type(). This way
BPF code can decide whether the device matches what the
code was loaded/translated for.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If program is bound to a device, print the name of the relevant
interface or unknown if the netdev has since been removed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend struct bpf_prog_info to contain information about program
being bound to a device. Since the netdev may get destroyed while
program still exists we need a flag to indicate the program is
loaded for a device, even if the device is gone.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fact that we don't know which device the program is going
to be used on is quite limiting in current eBPF infrastructure.
We have to reverse or limit the changes which kernel makes to
the loaded bytecode if we want it to be offloaded to a networking
device. We also have to invent new APIs for debugging and
troubleshooting support.
Make it possible to load programs for a specific netdev. This
helps us to bring the debug information closer to the core
eBPF infrastructure (e.g. we will be able to reuse the verifer
log in device JIT). It allows device JITs to perform translation
on the original bytecode.
__bpf_prog_get() when called to get a reference for an attachment
point will now refuse to give it if program has a device assigned.
Following patches will add a version of that function which passes
the expected netdev in. @type argument in __bpf_prog_get() is
renamed to attach_type to make it clearer that it's only set on
attachment.
All calls to ndo_bpf are protected by rtnl, only verifier callbacks
are not. We need a wait queue to make sure netdev doesn't get
destroyed while verifier is still running and calling its driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ndo_xdp is a control path callback for setting up XDP in the
driver. We can reuse it for other forms of communication
between the eBPF stack and the drivers. Rename the callback
and associated structures and definitions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the swap macro and remove unnecessary variables skb and cnt.
This makes the code easier to read and maintain.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using l2tp_tunnel_find() in l2tp_ip_recv() is wrong for two reasons:
* It doesn't take a reference on the returned tunnel, which makes the
call racy wrt. concurrent tunnel deletion.
* The lookup is only based on the tunnel identifier, so it can return
a tunnel that doesn't match the packet's addresses or protocol.
For example, a packet sent to an L2TPv3 over IPv6 tunnel can be
delivered to an L2TPv2 over UDPv4 tunnel. This is worse than a simple
cross-talk: when delivering the packet to an L2TP over UDP tunnel, the
corresponding socket is UDP, where ->sk_backlog_rcv() is NULL. Calling
sk_receive_skb() will then crash the kernel by trying to execute this
callback.
And l2tp_tunnel_find() isn't even needed here. __l2tp_ip_bind_lookup()
properly checks the socket binding and connection settings. It was used
as a fallback mechanism for finding tunnels that didn't have their data
path registered yet. But it's not limited to this case and can be used
to replace l2tp_tunnel_find() in the general case.
Fix l2tp_ip6 in the same way.
Fixes: 0d76751fad ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec70 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114893
Addresses-Coverity-ID: 114894
Addresses-Coverity-ID: 114895
Addresses-Coverity-ID: 114896
Addresses-Coverity-ID: 114897
Addresses-Coverity-ID: 114898
Addresses-Coverity-ID: 114899
Addresses-Coverity-ID: 114900
Addresses-Coverity-ID: 114901
Addresses-Coverity-ID: 114902
Addresses-Coverity-ID: 114903
Addresses-Coverity-ID: 114904
Addresses-Coverity-ID: 114905
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_init_nondata_skb() is fed with freshly allocated skbs.
They already have a cleared csum field, no need to clear it again.
This is based on Neal review on commit 3b11775033 ("tcp: do not mangle
skb->cb[] in tcp_make_synack()"), noticing I did not clear skb->csum.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce one indentation level to make code more readable.
tcp_sync_mss() can be factorized.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can now build this driver on ARM, so I ran into a randconfig build
warning that presumably had existed on powerpc already.
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c: In function 'sg_fd_to_skb':
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:1712:18: error: 'skb' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I'm slightly changing the logic here, to make it obvious to the
compiler that 'skb' is always initialized.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner says:
====================
Allow openvswitch to query ports in another netns.
Today Open vSwitch users are moving internal ports to other namespaces and
although packets are flowing OK, the userspace daemon can't find out basic
information like if the port is UP or DOWN, for instance.
This patchset extends openvswitch API to retrieve the current netnsid of
a port. It will be used by the userspace daemon to find out in which netns
the port is located.
This patchset also extends the rtnetlink getlink call to accept and operate
on a given netnsid. More details are available in each patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when an application gets netnsid from the kernel (for example as
the result of RTM_GETLINK call on one end of the veth pair), it's not much
useful. There's no reliable way to get to the netns fd from the netnsid, nor
does any kernel API accept netnsid.
Extend the RTM_GETLINK call to also accept netnsid. It will operate on the
netns with the given netnsid in such case. Of course, the calling process
needs to have enough capabilities in the target name space; for now, require
CAP_NET_ADMIN. This can be relaxed in the future.
To signal to the calling process that the kernel understood the new
IFLA_IF_NETNSID attribute in the query, it will include it in the response.
This is needed to detect older kernels, as they will just ignore
IFLA_IF_NETNSID and query in the current name space.
This patch implemetns IFLA_IF_NETNSID only for get and dump. For set
operations, this can be extended later.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows reliable identification of netdevice interfaces connected
to openvswitch bridges. In particular, user space queries the netdev
interfaces belonging to the ports for statistics, up/down state, etc.
Datapath dump needs to provide enough information for the user space to be
able to do that.
Currently, only interface names are returned. This is not sufficient, as
openvswitch allows its ports to be in different name spaces and the
interface name is valid only in its name space. What is needed and generally
used in other netlink APIs, is the pair ifindex+netnsid.
The solution is addition of the ifindex+netnsid pair (or only ifindex if in
the same name space) to vport get/dump operation.
On request side, ideally the ifindex+netnsid pair could be used to
get/set/del the corresponding vport. This is not implemented by this patch
and can be added later if needed.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There have been some cases where external tooling (e.g., kpatch-build)
creates a corrupt relocation which targets the wrong address. This is a
silent failure which can corrupt memory in unexpected places.
On x86, the bytes of data being overwritten by relocations are always
initialized to zero beforehand. Use that knowledge to add sanity checks
to detect such cases before they corrupt memory.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jeyu@kernel.org
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/37450d6c6225e54db107fba447ce9e56e5f758e9.1509713553.git.jpoimboe@redhat.com
[ Restructured the messages, as it's unclear whether the relocation or the target is corrupted. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
By using CQE based moderation on TX CQ we can reduce the number of TX
interrupt rate. Besides the benefit of less interrupts, this also
allows the kernel to better utilize TSO. Since TSO has some CPU overhead,
it might not aggregate when CPU is under high stress. By reducing the
interrupt rate and the CPU utilization, we can get better aggregation
and better overall throughput.
The feature is enabled by default and has a private flag in ethtool
for control.
Throughput, interrupt rate and TSO utilization improvements:
(ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets)
---------------------------------------------------------
Metric | Streams | CQE Based | EQE Based | improvement
---------------------------------------------------------
BW | 1 | 2.4Gb/s | 2.15Gb/s | +11.6%
IR | 1 | 27Kips | 50.6Kips | -46.7%
TSO Util | 1 | 74.6% | 71% | +5%
BW | 16 | 29Gb/s | 25.85Gb/s | +12.2%
IR | 16 | 482Kips | 745Kips | -35.3%
TSO Util | 16 | 69.1% | 49% | +41.1%
*BW = Bandwidth, IR = Interrupt rate, ips = interrupt per second.
TSO Util = bytes in TSO sessions / all bytes transferred
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This is needed in order to enlarge it with more members that will get
value of 0 when not set.
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The NIC TC offload table size was hard coded to 1k. Change it to be
min(max NIC RX table size,
min(max flow counters, 64k) * num flow groups)
where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.
We don't know upfront the division of flows to groups (== different masks).
This setup allows each group to be of size up to the where we want to go
(when supported, all offloaded flows use counters). Thus, we don't expect
multiple occurences for a group which in turn would add steering hops.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add debug print when changing the configuration of QoS through dcbnl.
Use ethtool -s <devname> msglvl hw on/off to toggle debug messages.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If the port is in DSCP trust state, packets are placed in the right
priority queue based on the dscp value. This is done by selecting
the transmit queue based on the dscp of the skb.
Until now select_queue honors priority only from the vlan header.
However that is not sufficient in cases where port trust state is DSCP
mode as packet might not even contain vlan header. Therefore if the port
is in dscp trust state and vport's min inline mode is not NONE,
copy the IP header to the eseg's inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features such
as xdpsq, icosq are not modified.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch implements dcbnl hooks to set and delete DSCP to priority map
as defined by the DCB subsystem. Device maintains internal trust state
which needs to be set to DSCP state for performing DSCP to priority mapping.
When the first dscp to priority APP entry is added by the user, the
trust state is changed to dscp.
When the last dscp to priority APP entry is deleted by the user, the
trust state is changed to pcp.
If user sends multiple dscp to priority APP entries on the same dscp,
the last sent one will take effect. All the previous sent will be
deleted.
The dscp to priority APP entries are added and deleted in the net/dcb
APP database using dcb_ieee_setapp/getapp.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The QPTS register allows changing the priority trust state between pcp and
dscp. Add support to get/set trust state from device. When the port is
in pcp/dscp trust state, packet is routed by hardware to matching priority
based on its pcp/dscp value respectively.
The QPDPM register allow channing the dscp to priority mapping. Add support
to get/set dscp to priority mapping from device.
Note that to change a dscp mapping, the "e" bit of this dscp structure
must be set in the QPDPM firmware command.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The QCAM register provides capability bit for all the QoS registers
using ACCESS_REG command.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid
confusion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
IPv6 address.
Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
bug is hitting us.
Fixes: 3f27fb2321 ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM fixes from Russell King:
- omit EFI memory map sorting, which was recently introduced, but
caused problems with the decompressor due to additional sections
being emitted.
- avoid unaligned load fault-generating instructions in the
decompressor by switching to a private unaligned implementation.
- add a symbol into the decompressor to further debug non-boot
situations (ld's documentation is extremely poor for how "." works,
ld doesn't seem to follow its own documentation!)
- parse endian information to sparse
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: add debug ".edata_real" symbol
ARM: 8716/1: pass endianness info to sparse
efi/libstub: arm: omit sorting of the UEFI memory map
ARM: 8715/1: add a private asm/unaligned.h
x86 KVM guest fix.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZ/fZuAAoJEL/70l94x66DHVkH/i99gyP/BoFaNfooesXpy89o
VcjuHzp4XYvUmhP1rCGYqYQEVZYrgsqKAsxL5cyN1nF5SWxebpM8cD96yM7lQx2Y
Ap5rxYWldn41ZmRRLQzCRKgwPG+V+yMlVTDM8FG/PKJyRTG7fMUEN6IBlRZF2yZr
DNmy2s//JafEUL3TDq2IXCvfZ1d5VEsCfI2xiYsIzQxwKZ1bHFNqbTqWJZr3Xns1
xL9e0VjMtNaGtyyCs0ZDjco3kAVQp58Q5+BhnL4/P+uqThjFDrpjQ3RmF0mtC95n
TKQuUP7QpLUoq74RwHa8tP4IpWj2EZLjefOw/s1Uv2XtieJrRmNIHT0OOGBj9O8=
=uYvL
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Fixes for interrupt controller emulation in ARM/ARM64 and x86, plus a
one-liner x86 KVM guest fix"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Update APICv on APIC reset
KVM: VMX: Do not fully reset PI descriptor on vCPU reset
kvm: Return -ENODEV from update_persistent_clock
KVM: arm/arm64: vgic-its: Check GITS_BASER Valid bit before saving tables
KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling the ITS
KVM: arm/arm64: vgic-its: Fix vgic_its_restore_collection_table returned value
KVM: arm/arm64: vgic-its: Fix return value for device table restore
arm/arm64: kvm: Disable branch profiling in HYP code
arm/arm64: kvm: Move initialization completion message
arm/arm64: KVM: set right LR register value for 32 bit guest when inject abort
KVM: arm64: its: Fix missing dynamic allocation check in scan_its_table
Only two patches came in over the last two weeks: Uniphier USB support
needs additional clocks enabled (on both 32-bit and 64-bit ARM), and
a Marvell MVEBU stability issue has been fixed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAWfz7gGCrR//JCVInAQLpeQ//SI+l8egWQCpBVF57oW3Y+PdNcYvAmfqv
h4fPl6if0VXYKPdGoiIOLO5uk+SL2MxoX46dSmqOVnBVj7CvHZzmlCjvVk8UKzJI
svfU3x1YwHdFf+brIoQxrdCI3iVV/6LgtHgjF2jxxatHqLpnjQRqLmY/kTV99I19
IXSTBS49H0X4QaXt+l6AUdn5f/fauX0cN8EIh3e8bPIBHZWkrXEbJb7Zx0tGMtlz
jKb0vw4RTms7BS7R5iZIvUzD5WvgRXEMeiTVbBXlB7Tp6Pet4+zdP98J3TBO7GYD
Dq/vhj2rLw6C2sbmLNCdghWi7urZIuWWdJAEDU6hijvoDqidGUjtmSobGToW8B5n
rb42NbfeOleDzFCXN+0mjE2dH/coqe3FPfG3MkppdLc8AM70wvYMpguAAkGWp+DI
FTJvqybrPZ0/YCy9x5UDRe4VsBp015lUdRzZx/kfZ0olvE12wuLRiQ4+d26nHrry
Y08EKY8pYJ9BMVTWYqB4XVaP5axuDa4tLr+hsuHEwW21fziyZ/IvkYTbwfmmxxCG
bF9alE/H5bp20I8j3taZUhpdAg4f/Cl+sZBHMPfyo+oeQ2Dmx1XOtk9nXqcvroa3
8ls9BK1ySJSAREpIADPa8OESeSWOHuGDmbzcw0KtVVcraeLfEl1m1L+zHqPsHPjB
Ii+uUzsmg0M=
=pn9T
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Only two patches came in over the last two weeks: Uniphier USB support
needs additional clocks enabled (on both 32-bit and 64-bit ARM), and a
Marvell MVEBU stability issue has been fixed"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: mvebu: pl310-cache disable double-linefill
arm64: dts: uniphier: add STDMAC clock to EHCI nodes
ARM: dts: uniphier: add STDMAC clock to EHCI nodes
A selection of important MIPS fixes for 4.14, and some MAINTAINERS /
email address updates:
- Update imgtec.com -> mips.com email addresses (this trivially updates
comments in quite a few files, as well as MAINTAINERS)
- Update Pistachio SoC maintainership
- Fix NI 169445 build (new platform in 4.14)
- Fix EVA regression (4.14)
- Fix SMP-CPS build & preemption regressions (4.14)
- Fix SMP/hotplug deadlock & race (deadlock reintroduced 4.13)
- Fix ebpf_jit error return (4.13)
- Fix SMP-CMP build regressions (4.11 and 4.14)
- Fix bad UASM microMIPS encoding (3.16)
- Fix CM definitions (3.15)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAln8oQ0ACgkQbAtpk944
dnoYohAAlw4Ui09K7fgpwGcmquwoo5h4FYRE2dkd3RvITl73m7GI+3rML6RzxINV
o5DG6WWJWWROKBVkfXMkJ+lksoSSMfBlSSE+MdvWdWUbWm+Oh15rXOtzZSbNTqWG
Y+pghJq2XGpOBe8Bp4EVDJjPnsQMLYK+tmw/jWxtCYLzp+j4I40WHnNPgtT/Tci8
NzAn7J9jTft3rWd4/dqQFhAnZrNAv/Udx2QGNPPRfe1TOrMVP/2T5gSLW+yjuOA6
NG10DennLfTjtpGFlCeF/pGHXqRJ629AzOq5nKJthGjc/QPk6T+vnSpH/wcgZra/
sw4bAqEo769x6KHgJzTqvES2j6rTAqJYeqxc2/GaH9HGcTzvqDCwynUr860BIv+7
MCiLhyf73ivZNqh5ntJfXCXkKex0oDDFu9eze1wZ76qJYlyEKPx6cSCvvUIWBOU5
7/jzQ3wIiHcPIp48uqZ5c6vxuY0ppbD5feilMDuXcNTDVGOJIFr43agB94Ynr+yM
KnKlgosrRsvTHvcTYRsgG3qC/0pllRmlsNKKUrwtlu2gfIIvpWBFAJWR3pqHiZB6
UYu0AIHg0ctMOCLoOAT9jD0iUS3sCzqdvFufTrLwak0UZgw/nvFAYNWZqN0ZRmpB
6NR6U6o2r8ld6cGfzSi8BQtIZvxqwNl0qlVUVBcxUeR8OUUkBJA=
=nqbK
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fixes from James Hogan:
"A selection of important MIPS fixes for 4.14, and some MAINTAINERS /
email address updates:
Maintainership updates:
- imgtec.com -> mips.com email addresses (this trivially updates
comments in quite a few files, as well as MAINTAINERS)
- Pistachio SoC maintainership update
Fixes:
- NI 169445 build (new platform in 4.14)
- EVA regression (4.14)
- SMP-CPS build & preemption regressions (4.14)
- SMP/hotplug deadlock & race (deadlock reintroduced 4.13)
- ebpf_jit error return (4.13)
- SMP-CMP build regressions (4.11 and 4.14)
- bad UASM microMIPS encoding (3.16)
- CM definitions (3.15)"
[ I had taken the email address updates separately, because I didn't
expect James to send a pull request, so those got applied twice. - Linus]
* tag 'mips_fixes_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: Update email address for Marcin Nowakowski
MIPS: smp-cmp: Fix vpe_id build error
MAINTAINERS: Update Pistachio platform maintainers
MIPS: smp-cmp: Use right include for task_struct
MIPS: Update Goldfish RTC driver maintainer email address
MIPS: Update RINT emulation maintainer email address
MIPS: CPS: Fix use of current_cpu_data in preemptible code
MIPS: SMP: Fix deadlock & online race
MIPS: bpf: Fix a typo in build_one_insn()
MIPS: microMIPS: Fix incorrect mask in insn_table_MM
MIPS: Fix CM region target definitions
MIPS: generic: Fix compilation error from include asm/mips-cpc.h
MIPS: Fix exception entry when CONFIG_EVA enabled
MIPS: generic: Fix NI 169445 its build
Update MIPS email addresses
After commit 674e75411f (sched: cpufreq: Allow remote cpufreq
callbacks) we stopped to always read the utilization for the CPU we
are running the governor on, and instead we read it for the CPU
which we've been told has updated utilization. This is stored in
sugov_cpu->cpu.
The value is set in sugov_register() but we clear it in sugov_start()
which leads to always looking at the utilization of CPU0 instead of
the correct one.
Fix this by consolidating the initialization code into sugov_start().
Fixes: 674e75411f (sched: cpufreq: Allow remote cpufreq callbacks)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This fixes the following warning with GCC 4.6:
mm/migrate.o: warning: objtool: migrate_misplaced_transhuge_page()+0x71: unreachable instruction
The problem is that the compiler merged identical annotate_unreachable()
inline asm blocks, resulting in a missing 'unreachable' annotation.
This problem happened before, and was partially fixed with:
3d1e236022 ("objtool: Prevent GCC from merging annotate_unreachable()")
That commit tried to ensure that each instance of the
annotate_unreachable() inline asm statement has a unique label. It used
the __LINE__ macro to generate the label number. However, even the line
number isn't necessarily unique when used in an inline function with
multiple callers (in this case, __alloc_pages_node()'s use of
VM_BUG_ON).
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Fixes: 3d1e236022 ("objtool: Prevent GCC from merging annotate_unreachable()")
Link: http://lkml.kernel.org/r/20171103221941.cajpwszir7ujxyc4@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit 43858b4f25.
The reason I removed the leave_mm() calls in question is because the
heuristic wasn't needed after that patch. With the original version
of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running
kernel thread) due a flush on the loaded mm.
Unfortunately, that caused architectural issues, so now I've
reinstated these flushes on non-PCID systems in:
commit b956575bed ("x86/mm: Flush more aggressively in lazy TLB mode").
That, in turn, gives us a power management and occasionally
performance regression as compared to old kernels: a process that
goes into a deep idle state on a given CPU and gets its mm flushed
due to activity on a different CPU will wake the idle CPU.
Reinstate the old ugly heuristic: if a CPU goes into ACPI C3 or an
intel_idle state that is likely to cause a TLB flush gets its mm
switched to init_mm before going idle.
FWIW, this heuristic is lousy. Whether we should change CR3 before
idle isn't a good hint except insofar as the performance hit is a bit
lower if the TLB is getting flushed by the idle code anyway. What we
really want to know is whether we anticipate being idle long enough
that the mm is likely to be flushed before we wake up. This is more a
matter of the expected latency than the idle state that gets chosen.
This heuristic also completely fails on systems that don't know
whether the TLB will be flushed (e.g. AMD systems?). OTOH it may be a
bit obsolete anyway -- PCID systems don't presently benefit from this
heuristic at all.
We also shouldn't do this callback from innermost bit of the idle code
due to the RCU nastiness it causes. All the information need is
available before rcu_idle_enter() needs to happen.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43858b4f25 "x86/mm: Stop calling leave_mm() in idle code"
Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently n->flags is being operated on by a logical && operator rather
than a bitwise & operator. This looks incorrect as these should be bit
flag operations. Fix this.
Detected by CoverityScan, CID#1460398 ("Logical vs. bitwise operator")
Fixes: 245dc5121a ("net: sched: cls_u32: call block callbacks for offload")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When run ipvs in two different network namespace at the same host, and one
ipvs transport network traffic to the other network namespace ipvs.
'ipvs_property' flag will make the second ipvs take no effect. So we should
clear 'ipvs_property' when SKB network namespace changed.
Fixes: 621e84d6f3 ("dev: introduce skb_scrub_packet()")
Signed-off-by: Ye Yin <hustcat@gmail.com>
Signed-off-by: Wei Zhou <chouryzhou@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Average RTT is 32-bit thus full 64-bit division is redundant.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>