Pull x86 fixlet from Thomas Gleixner.
Remove a warning about lack of compiler support for retpoline that most
people can't do anything about, so it just annoys them needlessly.
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Remove compile time warning
One fix for an oops at boot if we take a hotplug interrupt before we are ready
to handle it.
The bulk is patches to implement mitigation for Meltdown, see the change logs
for more details.
Thanks to:
Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon Masters, Jose Ricardo
Ziviani, David Gibson.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJaWXZ+ExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBe
ew/+KMp80VAIGSyFeYHLg8oClnQjKcEDMzlNoHo8WU6NwwNzk2XXBXNaGT7Osg7B
Ny69R3/VRrGmi/2uule9OzF2eZt8BUmRn3cHDcUB5OwFjMysv4M2jc+OK0STQxLU
F769rX+ZlkyAvQn4UxtNCUob+JDfVbE5Lurrx475ZGL1YHk6QqBGhlN/niXKtkBz
upeBEpu4JgwsLLD9H7c4QQha9PkhxY51jtvxL24cgwDuKJKhTdxLahPlA3QmLafu
FJxR9R0YzTj9Ivuae5yM7xRKr5wROTPtTRbG9sN7XOgFNrQ+LSgemXGrlV/fzdhV
6iPn2hHs4bu45SOIM+Hhf7OwHTtnMetXmKo6NldlKHVCw1HwBLigRohnOTPjiHfZ
P6brR4T/cglakBiE29+8T0UFqOPizEJiQRHnsoWzwIa/aTqCGEE3m3rgTxU6ovt5
3JggC51ZRDFsjGvj+JFnxGxCjYC4tDbfBI28M+GuWZZPBKLkFnEk8PlSzYyHPW2p
5uVA/UN4KHVGYUfJVUC+LMP+ZIOSEyEISKk7RuK0C0mwHSXcvxbeEefXbc18Xjai
J/pl/c9/4CRnEbdPm305xBxKKmU6gWxWRK3KbRFsoTI/wp+Ryx5TjPpgAa5GnaLv
nrtrfKs8skWfGMoDWH7+KILt0fgRR2x+91GTSxc5aaHtYBI=
=UZHv
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for an oops at boot if we take a hotplug interrupt before we
are ready to handle it.
The bulk is patches to implement mitigation for Meltdown, see the
change logs for more details.
Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon
Masters, Jose Ricardo Ziviani, David Gibson"
* tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Check device-tree for RFI flush settings
powerpc/pseries: Query hypervisor for RFI flush settings
powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
powerpc/64s: Add support for RFI flush of L1-D cache
powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
powerpc/64s: Simple RFI macro conversions
powerpc/64: Add macros for annotating the destination of rfid/hrfid
powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
Jakub Kicinski says:
====================
This set adds support for creating maps on networking devices. BPF is
programs+maps, the pure program offload has been around for quite some
time, this patchset adds the map part of the equation.
Maps are allocated on the target device from the start. There is no
host copy when map is created on the device. Device maps are represented
by struct bpf_offloaded_map, regardless of type. Host programs can't
access such maps, access is only possible from a program also loaded
to the same device and/or via the BPF syscall.
Offloaded programs are currently only allowed to perform lookups,
control plane is responsible for populating the maps.
For brevity only infrastructure and basic NFP patches are included.
Target device reporting, netdevsim and tests will follow up as well as
some further optimizations to the NFP code.
v2:
- leave out the array maps, we will add them trivially later to avoid
merge conflicts with ongoing spectere&meltdown mitigations.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Map memory needs to use 40 bit addressing. Add handling of such
accesses. Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.
New relocation types have to be added - for helpers and return
addresses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.
Control messages are tagged with a 16 bit ID. Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To be able to split code into reasonable chunks we need to add
the map data structures already. Later patches will add code
piece by piece.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF map offload follow similar path to program offload. At creation
time users may specify ifindex of the device on which they want to
create the map. Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation. Map will have an empty set of operations
associated with it (save for alloc and free callbacks). The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures. Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.
Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held. Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns). Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.
All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a helper to check if netdev could be found and whether it
has .ndo_bpf callback. There is no need to check the callback
every time it's invoked, ndos can't reasonably be swapped for
a set without .ndp_bpf while program is loaded.
bpf_dev_offload_check() will also be used by map offload.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With map offload coming, we need to call program offload structure
something less ambiguous. Pure rename, no functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
All map types reimplement the field-by-field copy of union bpf_attr
members into struct bpf_map. Add a helper to perform this operation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use the new callback to perform allocation checks for hash maps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Number of attribute checks are currently performed after hashtab
is already allocated. Move them to be able to split them out to
the check function later on. Checks have to now be performed on
the attr union directly instead of the members of bpf_map, since
bpf_map will be allocated later. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
.map_alloc callbacks contain a number of checks validating user-
-provided map attributes against constraints of a particular map
type. For offloaded maps we will need to check map attributes
without actually allocating any memory on the host. Add a new
callback for validating attributes before any memory is allocated.
This callback can be selectively implemented by map types for
sharing code with offloads, or simply to separate the logical
steps of validation and allocation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the
asm updates - and return stack clearing - that are independent of the
compiler are likely the most important parts because they are likely the
ones easiest to target.
And it's annoying because most people won't be able to do anything about
it. The number of people building their own compiler? Very small. So if
their distro hasn't got a compiler yet (and pretty much nobody does), the
warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The
compile-time warning only encourages bad things.
Fixes: 76b043848f ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
Pull NVMe fix from Jens Axboe:
"Just a single fix for nvme over fabrics that should go into 4.15"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-fabrics: initialize default host->id in nvmf_host_default()
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-12
This series contains updates to ixgbe, fm10k and net core.
Alex updates the driver to remove a duplicate MAC address check and
verifies that we have not run out of resources to configure a MAC rule
in our filter table. Also do not assume that dev->num_tc was populated
and configured with the driver, since it can be configured via mqprio
without any hardware coordination. Fixed the recording of stats for
MACVLAN in ixgbe and fm10k instead of recording the receive queue on
MACVLAN offloaded frames. When handling a MACVLAN offload, we should
be stopping/starting traffic on our own queues instead of the upper
devices transmit queues. Fixed possible race conditions with the
MACVLAN cleanup with the interface cleanup on shutdown. With the
recent fixes to ixgbe, we can cap the number of queues regardless of
accel_priv being in use or not, since the actual number of queues are
being reported via real_num_tx_queues.
Tony fixes up the kernel documentation for ixgbe and ixgbevf to resolve
warnings when W=1 is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: Offload PRIO qdisc
Nogah says:
Add an offload support for PRIO qdisc for mlxsw driver.
PRIO qdisc is being offloaded by using ndo_setup_tc. It has three
commands, to set or tune the qdisc, to remove it and to get its stats.
Like RED offloading, offloading this qdisc is not enforced on the driver
and determining its offload state is done in the dump action, when the
stats are being updated.
In the driver, offloading of PRIO is supported as root qdisc only. It
supports only priorities 0-7 (the range that is used by the current static
mapping of DSCP to skb prio and by 1:1 PCP values mapping) and up to 8
bands.
Patches 1-2 offload DSCP to priority mapping in the mlxsw_sp driver.
Patch 3 adds offload support for PRIO qdisc.
Patches 4-5 Add PRIO offload support in the mlxsw_sp driver.
---
v1->v2:
- Patch 1/5:
- Rewrite patch msg
- Patch 3/5:
- Send all the qstats in the replace command (and not just backlog)
- Patch 5/5:
- Align with the changes from 3/5
- Move backlog to the generic qdisc stats struct
- Delete extra newline
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Support basic stats for PRIO qdisc, which includes tx packets and bytes
count, drops count and backlog size. The rest of the stats are irrelevant
for this qdisc offload.
Since backlog is not only incremental but reflecting momentary value, in
case of a qdisc that stops being offloaded but is not destroyed, backlog
value needs to be updated about the un-offloading.
For that reason an unoffload function is being added to the ops struct.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for offloading PRIO qdisc as root qdisc.
The support is for up to 8 bands.
Routed packets priority is determined by the DSCP field with the default
translations. Bridged packets priority is determined by the PCP field, if
exist, otherwise it is set to 0.
Since both options have only priorities 0-7, higher priorities mapping are
being ignored.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to offload PRIO qdisc by using ndo_setup_tc.
There are three commands for PRIO offloading:
* TC_PRIO_REPLACE: handles set and tune
* TC_PRIO_DESTROY: handles qdisc destroy
* TC_PRIO_STATS: updates the qdiscs counters (given as reference)
Like RED qdisc, the indication of whether PRIO is being offloaded is being
set and updated as part of the dump function. It is so because the driver
could decide to offload or not based on the qdisc parent, which could
change without notifying the qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.
This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
mv88e6xxx: ATU and VTU interrupts
Both the ATU and VTU of Mavell switches can generate interrupts when
violations occur. Trap this interrupts and print what violation
occurred.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is a problem with the VTU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is a problem with the ATU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit eb789980d0 ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.
Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.
This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On targets that have different sizes for phys_addr_t and dma_addr_t,
we get a type mismatch error:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_alloc_dring':
drivers/net/ethernet/socionext/netsec.c:970:9: error: passing argument 3 of 'dma_zalloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
The code is otherwise correct, as the address is never actually used as a
physical address but only passed into a DMA register. For consistently,
I'm changing the variable name as well, to clarify that this is a DMA
address.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-01-13
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Follow-up fix to the recent BPF out-of-bounds speculation
fix that prevents max_entries overflows and an undefined
behavior on 32 bit archs on index_mask calculation, from
Daniel.
2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that
was otherwise throwing an unknown opcode warning in the
interpreter, from Daniel.
3) Typo fix in one of the user facing verbose() messages that
was added during the BPF out-of-bounds speculation fix,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.
This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.
As a quick fix disable this driver when PTI is enabled to prevent
malfunction.
Fixes: 385ce0ea4c ("x86/mm/pti: Add Kconfig")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: greg@kroah.com
Cc: hughd@google.com
Cc: luto@amacapital.net
Cc: Vince Weaver <vince@deater.net>
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.
This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.
While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.
This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
the reserved bits.
Make sure that on non PCID machines bit 11 is not set by the page table
switching code.
Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.
That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.
Fixes: 6fd166aae7 ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
Here are some small USB fixes and device ids for 4.15-rc8
Nothing major, small fixes for various devices, some resolutions for
bugs found by fuzzers, and the usual handful of new device ids.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWlpsFw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylJKQCgsPvjc28iHycd//TY0XyXIAqd3VoAn0CsuoGz
evubxegyFv0f2XcRbiaQ
=WIUz
-----END PGP SIGNATURE-----
Merge tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and device ids for 4.15-rc8
Nothing major, small fixes for various devices, some resolutions for
bugs found by fuzzers, and the usual handful of new device ids.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Documentation: usb: fix typo in UVC gadgetfs config command
usb: misc: usb3503: make sure reset is low for at least 100us
uas: ignore UAS for Norelsys NS1068(X) chips
USB: UDC core: fix double-free in usb_add_gadget_udc_release
USB: fix usbmon BUG trigger
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
usbip: remove kernel addresses from usb device and urb debug msgs
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
USB: serial: cp210x: add new device ID ELV ALC 8xxx
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
Here is a single android ashmem bugfix that resolves a reported issue in
that interface. It's been in linux-next this week with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWlprFg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykYswCgqA08XoDPBQTSIHDamGy2thT2C+UAn3DGDjbk
3Rm+GI4DZugzjKh15pU0
=3aj6
-----END PGP SIGNATURE-----
Merge tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
"Here is a single android ashmem bugfix that resolves a reported issue
in that interface. It's been in linux-next this week with no reported
issues"
* tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
Here are two bugfixes for some driver bugs for 4.15-rc8
The first is a bluetooth security bug that has been ignored by the
Bluetooth developers for months for no obvious reason at all, so I've
taken it through my tree.
The second is a simple double-free bug in the mux subsystem.
Both have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWlppww8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylnwgCeOrW4MKzAG9nc+ZWKRw5CeWVqx9AAoLyQeiF6
KyLdQ6C2GiSRHtz7memv
=Zbvd
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are two bugfixes for some driver bugs for 4.15-rc8
The first is a bluetooth security bug that has been ignored by the
Bluetooth developers for months for no obvious reason at all, so I've
taken it through my tree.
The second is a simple double-free bug in the mux subsystem.
Both have been in linux-next for a while with no reported issues"
* tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mux: core: fix double get_device()
Bluetooth: Prevent stack info leak from the EFS element.
- fix cross-compilation for architectures that setup CROSS_COMPILE
in their arch Makefile
- fix Kconfig rational operators for bool / tristate
- drop a gperf-generated file from .gitignore
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJaWgIlAAoJED2LAQed4NsGniEQAKFEhwU3XErlJHhfz1o84oJT
ycfulcz1VDUrxpfYi7/uCiBNB7B5rajnH0fclkMwt4oV7wJgWbtDTzlLNIhX5vQI
e+zDc8GMv/9rBks8OBypsUwlW/3etP6PjL1uC2KgG8F/2gNoodJs9Y/5Jfb3esKF
Lh8A1LMqSKNG4B9pjRBQv1k2KX6K5R011UGKLg6qcek1lj8r9NpdgKo/jH5tjrTb
y5weRpIkofb4sqBjls+7H24DUWg2GVSunEIBBNyxqwn52UhcSNcC2s+jdKqwmS50
R2jP8ENXyiATCJfVdKguhiTQJ4xLbTbHrL1K9vGpimj+3PAf37VbRlhXPJ6FVyCm
vuxv6HHL9a7Pm7o/sQxWmHD6GQa6/DCD+j8LPR5ro3Imkh0zqTqvA8R3mnX3NnVz
lj2Bu+Ii+OaSgoN7B2lLgIkr8uc99CErEcqjI2fxKm5hVbuqGF9nciiGMLc0fXJW
9alfkdi911LR3SjmwvFngGtq0SXOTG830J2ERfoD4zCKVg5ZffrFyMPLPgdze3Uv
BKDynomNfCxciz1h4/MZunUOjrViUFfHXDwkSnkfXAmOGfCM1XoE7/aSOuTzvTnl
CT8Sk9RIa90AogVnkA92Zza93Itophpdqdnw8acJIDDrNaToNprzTLB4HNDcKWy9
3k7yoa63AXBvE1k+ofJK
=py+X
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix cross-compilation for architectures that setup CROSS_COMPILE in
their arch Makefile
- fix Kconfig rational operators for bool / tristate
- drop a gperf-generated file from .gitignore
* tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
genksyms: drop *.hash.c from .gitignore
kconfig: fix relational operators for bool and tristate symbols
kbuild: move cc-option and cc-disable-warning after incl. arch Makefile
- apparmor: Fix regression in profile conflict logic
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaWWwtAAoJEAUvNnAY1cPY/0gP/iHnqNR705LOuLzy7Iya0Iy+
n8J9Tba9dQMGpiR8W5EIMklP0Hj+Wt50ngwmemJrjNpALphGeET16Qg01iYQcPNh
fR/oc9ZA5vpn7fFTnQZzlvcxRZCBJtuuqBcvM6oNjVVqLh6t6EN+viqh97XFZyc+
XFFYdCTRQ1JSgW7RIw/h8rFAvfPXIuXA7SwAQ/+PXlxOMi0fbZGbsVAiXPI1+pmX
wXj3Vtgisb/i4cSJtG9wujGs/SNjmN/2Yyzj2ZjKh3Y4lxug2bko3BfrrbD7uAeL
XOARp9ELjUk5m9tI9NuGpr9iIT392BaVvJDKyN8byK8skMOOsomG/zypVFLAbsBc
w0dN5IxLvyk3lH7amSBj+aJKQV0cthss0L11WlMuftLiqsDzRMec2md35+T3eSt3
AQ132tTHcQ3SiMd4b/yxXe1CjJEo62dBQzppLpV27lcpurOJdhQYEAC0vfXGpkgJ
5UIn7t5wNpb5GC+CBSQ+qFhBTtGEY9MdhcM/mZJrBTRWmId7umm3UdUtihVz8qlJ
uue+S8h3L7TUFkdIRcGgViJrcQsvDFhabkHvsKppl5CYRrT6Lu46XBTZV6ioZcsm
maIuVY7agFVk4Z0Fyc+zYM7e/IDoVVBdBwUKDWNnEkFmRKE7RrWsBq59PBksLUtz
G/8zxZ//8MRhlm5uyuid
=Myva
-----END PGP SIGNATURE-----
Merge tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor regression fixes from John Johansen:
"This fixes a couple bugs I have been working with Matthew Garrett on
this week. Specifically a regression in the handling of a conflicting
profile attachment and label match restrictions for ptrace when
profiles are stacked.
Summary:
- fix ptrace label match when matching stacked labels
- fix regression in profile conflict logic"
* tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix regression in profile conflict logic
apparmor: fix ptrace label match when matching stacked labels
Merge misc fixlets from Andrew Morton:
"4 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/objtool/Makefile: don't assume sync-check.sh is executable
kdump: write correct address of mem_section into vmcoreinfo
kmemleak: allow to coexist with fault injection
MAINTAINERS, nilfs2: change project home URLs
patch(1) loses the x bit. So if a user follows our patching
instructions in Documentation/admin-guide/README.rst, their kernel will
not compile.
Fixes: 3bd51c5a37 ("objtool: Move kernel headers/code sync check to a script")
Reported-by: Nicolas Bock <nicolasbock@gentoo.org>
Reported-by Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Depending on configuration mem_section can now be an array or a pointer
to an array allocated dynamically. In most cases, we can continue to
refer to it as 'mem_section' regardless of what it is.
But there's one exception: '&mem_section' means "address of the array"
if mem_section is an array, but if mem_section is a pointer, it would
mean "address of the pointer".
We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section)
writes down address of pointer into vmcoreinfo, not array as we wanted.
Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
situation correctly for both cases.
Link: http://lkml.kernel.org/r/20180112162532.35896-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kmemleak does one slab allocation per user allocation. So if slab fault
injection is enabled to any degree, kmemleak instantly fails to allocate
and turns itself off. However, it's useful to use kmemleak with fault
injection to find leaks on error paths. On the other hand, checking
kmemleak itself is not so useful because (1) it's a debugging tool and
(2) it has a very regular allocation pattern (basically a single
allocation site, so it either works or not).
Turn off fault injection for kmemleak allocations.
Link: http://lkml.kernel.org/r/20180109192243.19316-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The domain of NILFS project home was changed to "nilfs.sourceforge.io"
to enable https access (the previous domain "nilfs.sourceforge.net" is
redirected to the new one). Modify URLs of the project home to reflect
this change and to replace their protocol from http to https.
Link: http://lkml.kernel.org/r/1515416141-5614-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a left-over of commit bb3290d916 ("Remove gperf usage from
toolchain").
We do not generate a hash function any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This tests that the vsyscall entries do what they're expected to do.
It also confirms that attempts to read the vsyscall page behave as
expected.
If changes are made to the vsyscall code or its memory map handling,
running this test in all three of vsyscall=none, vsyscall=emulate,
and vsyscall=native are helpful.
(Because it's easy, this also compares the vsyscall results to their
vDSO equivalents.)
Note to KAISER backporters: please test this under all three
vsyscall modes. Also, in the emulate and native modes, make sure
that test_vsyscall_64 agrees with the command line or config
option as to which mode you're in. It's quite easy to mess up
the kernel such that native mode accidentally emulates
or vice versa.
Greg, etc: please backport this to all your Meltdown-patched
kernels. It'll help make sure the patches didn't regress
vsyscalls.
CSigned-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>