Pull x86 apic updates from Thomas Gleixner:
"This update provides:
- Cleanup of the IDT management including the removal of the extra
tracing IDT. A first step to clean up the vector management code.
- The removal of the paravirt op adjust_exception_frame. This is a
Xen-specific issue, but merged through this branch to avoid nasty
merge collisions.
- Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
disabled the TSC DEADLINE timer in CPUID.
- Adjust a debug message in the ioapic code to print out the
information correctly"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
x86/idt: Fix the X86_TRAP_BP gate
x86/xen: Get rid of paravirt op adjust_exception_frame
x86/eisa: Add missing include
x86/idt: Remove superfluous ALIGNment
x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
x86/idt: Remove the tracing IDT leftovers
x86/idt: Hide set_intr_gate()
x86/idt: Simplify alloc_intr_gate()
x86/idt: Deinline setup functions
x86/idt: Remove unused functions/inlines
x86/idt: Move interrupt gate initialization to IDT code
x86/idt: Move APIC gate initialization to tables
x86/idt: Move regular trap init to tables
x86/idt: Move IST stack based traps to table init
x86/idt: Move debug stack init to table based
x86/idt: Switch early trap init to IDT tables
x86/idt: Prepare for table based init
x86/idt: Move early IDT setup out of 32-bit asm
x86/idt: Move early IDT handler setup to IDT code
x86/idt: Consolidate IDT invalidation
...
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()
Remove now useless invalidate_page callback.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org (moderated for non-subscribers)
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit aba831a69632 ("xen: remove tests for pvh mode in pure pv paths")
removed the XENFEAT_auto_translated_physmap test in xen_alloc_p2m_entry()
since it is assumed that the routine is never called by non-PV guests.
However, alloc_xenballooned_pages() may make this call on a PVH guest.
Prevent this from happening by adding a XENFEAT_auto_translated_physmap
check there.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Fixes: aba831a69632 ("xen: remove tests for pvh mode in pure pv paths")
When booting Linux as a Xen guest with CONFIG_DEBUG_ATOMIC, the following
splat appears:
[ 0.002323] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[ 0.019717] ASID allocator initialised with 65536 entries
[ 0.020019] xen:grant_table: Grant tables using version 1 layout
[ 0.020051] Grant table initialized
[ 0.020069] BUG: sleeping function called from invalid context at /data/src/linux/mm/page_alloc.c:4046
[ 0.020100] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[ 0.020123] no locks held by swapper/0/1.
[ 0.020143] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc5 #598
[ 0.020166] Hardware name: FVP Base (DT)
[ 0.020182] Call trace:
[ 0.020199] [<ffff00000808a5c0>] dump_backtrace+0x0/0x270
[ 0.020222] [<ffff00000808a95c>] show_stack+0x24/0x30
[ 0.020244] [<ffff000008c1ef20>] dump_stack+0xb8/0xf0
[ 0.020267] [<ffff0000081128c0>] ___might_sleep+0x1c8/0x1f8
[ 0.020291] [<ffff000008112948>] __might_sleep+0x58/0x90
[ 0.020313] [<ffff0000082171b8>] __alloc_pages_nodemask+0x1c0/0x12e8
[ 0.020338] [<ffff00000827a110>] alloc_page_interleave+0x38/0x88
[ 0.020363] [<ffff00000827a904>] alloc_pages_current+0xdc/0xf0
[ 0.020387] [<ffff000008211f38>] __get_free_pages+0x28/0x50
[ 0.020411] [<ffff0000086566a4>] evtchn_fifo_alloc_control_block+0x2c/0xa0
[ 0.020437] [<ffff0000091747b0>] xen_evtchn_fifo_init+0x38/0xb4
[ 0.020461] [<ffff0000091746c0>] xen_init_IRQ+0x44/0xc8
[ 0.020484] [<ffff000009128adc>] xen_guest_init+0x250/0x300
[ 0.020507] [<ffff000008083974>] do_one_initcall+0x44/0x130
[ 0.020531] [<ffff000009120df8>] kernel_init_freeable+0x120/0x288
[ 0.020556] [<ffff000008c31ca8>] kernel_init+0x18/0x110
[ 0.020578] [<ffff000008083710>] ret_from_fork+0x10/0x40
[ 0.020606] xen:events: Using FIFO-based ABI
[ 0.020658] Xen: initializing cpu0
[ 0.027727] Hierarchical SRCU implementation.
[ 0.036235] EFI services will not be available.
[ 0.043810] smp: Bringing up secondary CPUs ...
This is because get_cpu() in xen_evtchn_fifo_init() will disable
preemption, but __get_free_page() might sleep (GFP_ATOMIC is not set).
xen_evtchn_fifo_init() will always be called before SMP is initialized,
so {get,put}_cpu() could be replaced by a simple smp_processor_id().
This also avoids modifying evtchn_fifo_alloc_control_block(), which is
called in other contexts.
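The difference between the two patterns can be illustrated with a small
user-space model. This is only a sketch: every model_* name below is
hypothetical and merely stands in for the kernel helper of similar name,
and the "allocation" just records whether it was called while the model's
preemption counter was raised.

```c
/* User-space model only: every model_* name is hypothetical, not a kernel API. */
static int model_preempt_count;   /* > 0 means "preemption disabled" */
static int model_violations;      /* sleeping calls made while atomic */

static int model_get_cpu(void) { model_preempt_count++; return 0; }
static void model_put_cpu(void) { model_preempt_count--; }
static int model_smp_processor_id(void) { return 0; }

/* Stands in for an allocation that may sleep. */
static void model_sleeping_alloc(int cpu)
{
	(void)cpu;
	if (model_preempt_count > 0)
		model_violations++;
}

/* Old pattern: allocate between get_cpu() and put_cpu(). */
int model_init_with_get_cpu(void)
{
	int cpu;

	model_violations = 0;
	cpu = model_get_cpu();
	model_sleeping_alloc(cpu);
	model_put_cpu();
	return model_violations;
}

/* New pattern: only read the CPU number, preemption stays enabled. */
int model_init_with_smp_processor_id(void)
{
	model_violations = 0;
	model_sleeping_alloc(model_smp_processor_id());
	return model_violations;
}
```

In this model the first function reports one violation and the second
reports none, which mirrors why the splat disappears.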
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Fixes: 1fe565517b ("xen/events: use the FIFO-based ABI if available")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
__WARN() is an internal helper that is only available on
some architectures, but causes a build error e.g. on ARM64
in some configurations:
drivers/xen/pvcalls-back.c: In function 'set_backend_state':
drivers/xen/pvcalls-back.c:1097:5: error: implicit declaration of function '__WARN' [-Werror=implicit-function-declaration]
Unfortunately, there is no equivalent of BUG() that takes no
arguments, but WARN_ON(1) is commonly used in other drivers
and works on all configurations.
Fixes: 7160378206b2 ("xen/pvcalls: xenbus state handling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
pci_device_id structures are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When the other end notifies us that there is data to be written
(pvcalls_back_conn_event), increment the io and write counters, and
schedule the ioworker.
Implement the write function called by ioworker by reading the data from
the data ring, writing it to the socket by calling inet_sendmsg.
Set out_error on error.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When an active socket has data available, increment the io and read
counters, and schedule the ioworker.
Implement the read function by reading from the socket, writing the data
to the data ring.
Set in_error on error.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
We have one ioworker per socket. Each ioworker goes through the list of
outstanding read/write requests. Once all requests have been dealt with,
it returns.
We use one atomic counter per socket for "read" operations and one
for "write" operations to keep track of the reads/writes to do.
We also use one atomic counter ("io") per ioworker to keep track of how
many outstanding requests we have in total assigned to the ioworker. The
ioworker finishes when there are none.
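A minimal user-space sketch of this counter scheme, using C11 atomics in
place of the kernel's atomic_t. All names are hypothetical, and the worker
here handles exactly one request per loop iteration so that "io" always
equals "read" plus "write"; the driver's real loop may differ.

```c
#include <stdatomic.h>

/* Simplified model of one socket's bookkeeping; all names are hypothetical. */
struct model_sock {
	atomic_int read;   /* pending "read" operations */
	atomic_int write;  /* pending "write" operations */
	atomic_int io;     /* total outstanding requests for the ioworker */
	int reads_done;
	int writes_done;
};

void model_sock_init(struct model_sock *s)
{
	atomic_init(&s->read, 0);
	atomic_init(&s->write, 0);
	atomic_init(&s->io, 0);
	s->reads_done = 0;
	s->writes_done = 0;
}

/* Event side: account for the request, then the worker would be scheduled. */
void model_notify_read(struct model_sock *s)
{
	atomic_fetch_add(&s->read, 1);
	atomic_fetch_add(&s->io, 1);
}

void model_notify_write(struct model_sock *s)
{
	atomic_fetch_add(&s->write, 1);
	atomic_fetch_add(&s->io, 1);
}

/* Worker side: run until no outstanding requests remain. */
void model_ioworker(struct model_sock *s)
{
	while (atomic_load(&s->io) > 0) {
		if (atomic_load(&s->read) > 0) {
			atomic_fetch_sub(&s->read, 1);
			s->reads_done++;
		} else if (atomic_load(&s->write) > 0) {
			atomic_fetch_sub(&s->write, 1);
			s->writes_done++;
		}
		atomic_fetch_sub(&s->io, 1);
	}
}
```

Keeping a separate total lets the worker decide when to return without
inspecting both per-direction counters.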
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release both active and passive sockets. For active sockets, make sure
to avoid possible conflicts with the ioworker reading/writing to those
sockets concurrently. Set map->release to let the ioworker know
atomically that the socket will be released soon, then wait until the
ioworker finishes (flush_work).
Unmap indexes pages and data rings.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Implement poll on passive sockets by requesting a delayed response with
mappass->reqcopy, and reply back when there is data on the passive
socket.
Poll on active sockets is unimplemented as per the spec, as the frontend
should just wait for events and check the indexes on the indexes page.
Only support one outstanding poll (or accept) request for every passive
socket at any given time.
[ boris: fixed long lines ]
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Implement the accept command by calling inet_accept. To avoid blocking
in the kernel, call inet_accept(O_NONBLOCK) from a workqueue, which gets
scheduled on sk_data_ready (for a passive socket, it means that there
are connections to accept).
Use the reqcopy field to store the request. Accept the new socket from
the delayed work function, create a new sock_mapping for it, map
the indexes page and data ring, and reply to the other end. Allocate an
ioworker for the socket.
Only support one outstanding blocking accept request for every socket at
any time.
Add a field to sock_mapping to remember the passive socket from which an
active socket was created.
[ boris: fixed whitespaces ]
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Allocate a socket. Track the allocated passive sockets with a new data
structure named sockpass_mapping. It contains an unbound workqueue to
schedule delayed work for the accept and poll commands. It also has a
reqcopy field to be used to store a copy of a request for delayed work.
Reads/writes to it are protected by a lock (the "copy_lock" spinlock).
Initialize the workqueue in pvcalls_back_bind.
Implement the bind command with inet_bind.
The pass_sk_data_ready event handler will be added later.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Allocate a socket. Keep track of socket <-> ring mappings with a new data
structure, called sock_mapping. Implement the connect command by calling
inet_stream_connect, and mapping the new indexes page and data ring.
Allocate a workqueue and a work_struct, called ioworker, to perform
reads and writes to the socket.
When an active socket is closed (sk_state_change), set in_error to
-ENOTCONN and notify the other end, as specified by the protocol.
sk_data_ready and pvcalls_back_ioworker will be implemented later.
[ boris: fixed whitespaces ]
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Just reply with success to the other end for now. Delay the allocation
of the actual socket to bind and/or connect.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When the other end notifies us that there are commands to be read
(pvcalls_back_event), wake up the backend thread to parse the command.
The command ring works like most other Xen rings, so use the usual
ring macros to read and write to it. The functions implementing the
commands are empty stubs for now.
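For readers unfamiliar with such rings, the general producer/consumer
index idea can be sketched as follows. This is a simplified user-space
model with hypothetical names, not the Xen ring macros: it has no shared
memory, no memory barriers and no event channel notification, and the
slot payload is a plain int.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RING_SIZE 8u  /* must be a power of two */

/* Hypothetical single-direction ring with free-running indices. */
struct model_ring {
	uint32_t req_prod;             /* next slot the producer fills */
	uint32_t req_cons;             /* next slot the consumer reads */
	int slots[MODEL_RING_SIZE];
};

/* Producer (frontend in the model): returns false when the ring is full. */
bool model_ring_put(struct model_ring *r, int cmd)
{
	if (r->req_prod - r->req_cons == MODEL_RING_SIZE)
		return false;
	r->slots[r->req_prod & (MODEL_RING_SIZE - 1)] = cmd;
	r->req_prod++;
	return true;
}

/* Consumer (backend in the model): returns false when nothing is pending. */
bool model_ring_get(struct model_ring *r, int *cmd)
{
	if (r->req_cons == r->req_prod)
		return false;
	*cmd = r->slots[r->req_cons & (MODEL_RING_SIZE - 1)];
	r->req_cons++;
	return true;
}
```

The indices are never wrapped explicitly; masking on access is enough
because the ring size is a power of two.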
[ boris: fixed whitespaces ]
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Introduce a per-frontend data structure named pvcalls_fedata. It
contains pointers to the command ring, its event channel, a list of
active sockets and a tree of passive sockets (passive sockets need to be
looked up from the id on listen, accept and poll commands, while active
sockets only on release).
It also has an unbound workqueue to schedule the work of parsing and
executing commands on the command ring. socket_lock protects the two
lists. In pvcalls_back_global, keep a list of connected frontends.
[ boris: fixed whitespaces/long lines ]
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Introduce the code to handle xenbus state changes.
Implement the probe function for the pvcalls backend. Write the
supported versions, max-page-order and function-calls nodes to xenstore,
as required by the protocol.
Introduce stub functions for disconnecting/connecting to a frontend.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Keep a list of connected frontends. Use a semaphore to protect list
accesses.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Introduce a xenbus backend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.
This patch only adds the stubs; the code will be added by the following
patches.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The only users of alloc_intr_gate() are hypervisors, both of which check
the used_vectors bitmap to see whether they have already allocated the
gate. Move that check into alloc_intr_gate() and simplify the users.
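The effect of moving the check can be sketched in user space as below.
This is an illustration under assumptions, not the kernel code: the
model_* names, the byte array standing in for the used_vectors bitmap and
the pointer table standing in for the IDT are all hypothetical.

```c
#define MODEL_NR_VECTORS 256

/* Hypothetical stand-ins for the used_vectors bitmap and the IDT. */
static unsigned char model_used_vectors[MODEL_NR_VECTORS];
static const void *model_idt[MODEL_NR_VECTORS];
static int model_gate_writes;

/*
 * The "already allocated?" test lives inside the allocator, so callers
 * no longer need to consult the bitmap themselves.
 */
void model_alloc_intr_gate(unsigned int n, const void *handler)
{
	if (n >= MODEL_NR_VECTORS)
		return;
	if (model_used_vectors[n])
		return;                 /* gate already set up: nothing to do */
	model_used_vectors[n] = 1;
	model_idt[n] = handler;
	model_gate_writes++;
}

const void *model_gate(unsigned int n)
{
	return n < MODEL_NR_VECTORS ? model_idt[n] : 0;
}

int model_gate_write_count(void)
{
	return model_gate_writes;
}
```

With this shape a second call for the same vector is a no-op, so each
caller can simply call the allocator unconditionally.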
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.580830286@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's better to be explicit and use the DRIVER_ATTR_RW() and
DRIVER_ATTR_RO() macros when defining a driver's sysfs file.
Bonus is this fixes up a checkpatch.pl warning.
This is part of a series to drop DRIVER_ATTR() from the tree entirely.
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix linker script regression caused by dead code elimination support
- fix typos and outdated comments
- specify kselftest-clean as a PHONY target
- fix "make dtbs_install" when $(srctree) includes shell special
characters like '~'
- Move -fshort-wchar to the global option list because defining it
partially emits warnings
* tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: update comments of Makefile.asm-generic
kbuild: Do not use hyphen in exported variable name
Makefile: add kselftest-clean to PHONY target list
Kbuild: use -fshort-wchar globally
fixdep: trivial: typo fix and correction
kbuild: trivial cleanups on the comments
kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
Commit 971a69db7d ("Xen: don't warn about 2-byte wchar_t in efi")
added the --no-wchar-size-warning to the Makefile to avoid this
harmless warning:
arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
Changing kbuild to use thin archives instead of recursive linking
unfortunately brings the same warning back during the final link.
The kernel does not use wchar_t string literals at this point, and
xen does not use wchar_t at all (only efi_char16_t), so the flag
has no effect, but as pointed out by Jan Beulich, adding a wchar_t
string literal would be bad here.
Since wchar_t is always defined as u16, independent of the toolchain
default, always passing -fshort-wchar is correct and lets us
remove the Xen specific hack along with fixing the warning.
Link: https://patchwork.kernel.org/patch/9275217/
Fixes: 971a69db7d ("Xen: don't warn about 2-byte wchar_t in efi")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull xen block changes from Konrad:
Two fixes, both of them spotted by Amazon:
1) Fix in Xen-blkfront caused by the re-write in 4.8 time-frame.
2) Fix in xen_biovec_phys_mergeable, which allowed guest requests
(when using NVMe) to slurp up more data than allowed, leading to
an XSA (which has been made public today).
The current test for bio vec merging is not fully accurate and can be
tricked into merging bios when certain grant combinations are used.
The result of these malicious bio merges is a bio that extends past
the memory page used by any of the originating bios.
Consider the following scenario, where a guest creates two grant
references that point to the same mfn, i.e. grant 1 -> mfn A,
grant 2 -> mfn A.
These references are then used in a PV block request, and mapped by
the backend domain, thus obtaining two different pfns that point to
the same mfn, pfn B -> mfn A, pfn C -> mfn A.
If those grants happen to be used in two consecutive sectors of a disk
IO operation becoming two different bios in the backend domain, the
checks in xen_biovec_phys_mergeable will succeed, because bfn1 == bfn2
(they both point to the same mfn). However due to the bio merging,
the backend domain will end up with a bio that expands past mfn A into
mfn A + 1.
Fix this by making sure the check in xen_biovec_phys_mergeable takes
into account the offset and the length of the bio. This basically
replicates what's done in __BIOVEC_PHYS_MERGEABLE using mfns (bus
addresses). While there, also remove the usage of
__BIOVEC_PHYS_MERGEABLE, since that's already checked by the callers
of xen_biovec_phys_mergeable.
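The scenario above can be reproduced with a small user-space model. It is
a sketch under assumptions: struct model_vec, the 4 KiB page size and both
function names are hypothetical, and the "frames only" check is modelled
from the description in this message (it accepts equal or adjacent
frames), not copied from the driver.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12  /* assumes 4 KiB pages */

/* Hypothetical stand-in for a bio_vec already translated to a machine frame. */
struct model_vec {
	unsigned long mfn;    /* machine frame backing the page */
	unsigned int offset;  /* byte offset inside the page */
	unsigned int len;     /* length in bytes */
};

/* Frame-only comparison: ignores where the first vector ends. */
bool model_mergeable_frames_only(const struct model_vec *a,
				 const struct model_vec *b)
{
	return a->mfn == b->mfn || a->mfn + 1 == b->mfn;
}

/* Extent-aware comparison: the second vector must start in the frame
 * where the first one ends. */
bool model_mergeable_with_extent(const struct model_vec *a,
				 const struct model_vec *b)
{
	unsigned long end_frame =
		a->mfn + ((a->offset + a->len) >> MODEL_PAGE_SHIFT);

	return end_frame == b->mfn;
}
```

For two full-page vectors that both map mfn A, the frame-only check
accepts the merge while the extent-aware check rejects it, because the
first vector ends in mfn A + 1.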
CC: stable@vger.kernel.org
Reported-by: "Jan H. Schönherr" <jschoenh@amazon.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
A device with a xen-pirq-MSI interrupt may lose an interrupt in Dom0
during the driver's irq_disable/irq_enable sequence. The scenario is:
1. irq_disable -> disable_dynirq -> mask_evtchn(irq channel)
2. The device interrupt is raised by HW and Xen marks its evtchn as pending
3. irq_enable -> startup_pirq -> eoi_pirq ->
clear_evtchn(channel of irq) -> clear pending status
4. consume_one_event processes the irq event without the pending bit
asserted, so the interrupt is lost once
5. The HW raises no further interrupts.
Now use enable_dynirq for enable_pirq of xen_pirq_chip so that eoi_pirq
is no longer called on irq_enable.
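The sequence can be replayed with a tiny user-space model. Everything
here is hypothetical: the struct, the model_* names, and in particular
the assumption that the replacement enable path only unmasks the channel
without touching the pending bit.

```c
#include <stdbool.h>

/* Hypothetical model of one event channel. */
struct model_evtchn {
	bool masked;
	bool pending;
	int handled;   /* number of interrupts the driver actually saw */
};

/* An event is consumed only if it is pending and not masked. */
static void model_deliver(struct model_evtchn *c)
{
	if (c->pending && !c->masked) {
		c->pending = false;
		c->handled++;
	}
}

void model_irq_disable(struct model_evtchn *c)
{
	c->masked = true;
}

void model_hw_interrupt(struct model_evtchn *c)
{
	c->pending = true;
	model_deliver(c);
}

/* Old enable path: the EOI step clears the pending bit before unmasking. */
void model_enable_with_eoi(struct model_evtchn *c)
{
	c->pending = false;
	c->masked = false;
	model_deliver(c);
}

/* New enable path (assumed): unmask only, pending state is preserved. */
void model_enable_unmask_only(struct model_evtchn *c)
{
	c->masked = false;
	model_deliver(c);
}
```

Running disable, interrupt, enable through the first enable path leaves
the handled count at zero; the second path delivers the interrupt.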
Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
When starting the xenwatch thread a theoretical deadlock situation is
possible:
xs_init() contains:
    task = kthread_run(xenwatch_thread, NULL, "xenwatch");
    if (IS_ERR(task))
        return PTR_ERR(task);
    xenwatch_pid = task->pid;
And xenwatch_thread() does:
    mutex_lock(&xenwatch_mutex);
    ...
    event->handle->callback();
    ...
    mutex_unlock(&xenwatch_mutex);
The callback could call unregister_xenbus_watch() which does:
    ...
    if (current->pid != xenwatch_pid)
        mutex_lock(&xenwatch_mutex);
    ...
In case a watch is firing before xenwatch_pid could be set and the
callback of that watch unregisters a watch, then a self-deadlock would
occur.
Avoid this by setting xenwatch_pid in xenwatch_thread().
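The race can be modelled without threads by tracking who holds the mutex.
This is a sketch with hypothetical names (model_state, the pid value 42);
it only checks whether the unregister path would try to take a
non-recursive mutex that the calling thread already holds.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the unregister path looks at. */
struct model_state {
	int xenwatch_pid;   /* 0 until somebody records the thread's pid */
	bool mutex_held;
	int mutex_owner;
};

/* True if the unregister path would lock a mutex the caller already owns. */
bool model_unregister_would_deadlock(const struct model_state *s,
				     int current_pid)
{
	if (current_pid != s->xenwatch_pid)
		return s->mutex_held && s->mutex_owner == current_pid;
	return false;   /* recognised as the watch thread: lock is skipped */
}

/*
 * A watch fires before the creator of the thread has stored the pid.
 * pid_set_by_thread selects whether the thread recorded its own pid
 * before handling events.
 */
bool model_callback_deadlocks(bool pid_set_by_thread)
{
	const int thread_pid = 42;
	struct model_state s = { 0, false, 0 };

	if (pid_set_by_thread)
		s.xenwatch_pid = thread_pid;

	/* The watch thread takes the mutex, then runs the callback. */
	s.mutex_held = true;
	s.mutex_owner = thread_pid;
	return model_unregister_would_deadlock(&s, thread_pid);
}
```

In the model the deadlock is only possible while the pid is still unset,
which is the window closed by letting the thread set it itself.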
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Instead of fiddling with masking the event channels during suspend
and resume handling, let the irq subsystem do its job. It will do
the mask and unmask operations as needed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Remove the unnecessary static on local variables last_frontswap_pages and
tgt_frontswap_pages. Such variables are initialized before being used,
on every execution path throughout the function. The statics have no
benefit, and removing them reduces the code size.
This issue was detected using Coccinelle and the following semantic patch:
@bad exists@
position p;
identifier x;
type T;
@@
static T x@p;
...
x = <+...x...+>
@@
identifier x;
expression e;
type T;
position p != bad.p;
@@
-static
T x@p;
... when != x
when strict
?x = e;
You can see a significant difference in the code size after executing
the size command, before and after the code change:
before:
   text    data     bss     dec     hex filename
   5633    3452     384    9469    24fd drivers/xen/xen-selfballoon.o
after:
   text    data     bss     dec     hex filename
   5576    3308     256    9140    23b4 drivers/xen/xen-selfballoon.o
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
On systems that are not booted as a Xen domain, the xenfs driver prints
the following message during boot.
[ 3.460595] xenfs: not registering filesystem on non-xen platform
As the user chose not to boot a Xen domain, this message does not
provide useful information. Drop this message.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
When setting up the Xenstore watch for the memory target size the new
watch will fire at once. Don't try to reach the configured target size
by onlining new memory in this case, as the current memory size will
be smaller in almost all cases due to e.g. BIOS reserved pages.
Onlining new memory will lead to more problems e.g. undesired conflicts
with NVMe devices meant to be operated as block devices.
Instead remember the difference between target size and current size
when the watch fires for the first time and apply it to any further
size changes, too.
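The offset handling can be sketched as below. This is a user-space
illustration with hypothetical names and page counts; it assumes the
balloon's own target starts out equal to the current size and is left
untouched by the first watch event.

```c
#include <stdbool.h>

/* Hypothetical model of the balloon bookkeeping, in pages. */
struct model_balloon {
	long current_pages;  /* memory the guest currently has */
	long target_pages;   /* what the balloon driver aims for */
	bool watch_fired;
	long target_diff;    /* configured target minus size at first event */
};

/* Called for every Xenstore update of the memory target. */
void model_watch_target(struct model_balloon *b, long new_target)
{
	if (!b->watch_fired) {
		/* First event: remember the offset, do not resize. */
		b->watch_fired = true;
		b->target_diff = new_target - b->current_pages;
		return;
	}
	/* Later events: apply the remembered offset to the new target. */
	b->target_pages = new_target - b->target_diff;
}
```

For example, with 1000 pages present and a configured target of 1024, the
first event records a difference of 24 and a later target of 2048 becomes
2024.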
In order to avoid races between balloon.c and xen-balloon.c init calls
do the xen-balloon.c initialization from balloon.c.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Log a message when we enter this situation:
1) we already allocated the max number of available grants from the
hypervisor
and
2) we still need more (but the request fails because of 1)).
Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log would help debugging.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull SCSI target updates from Nicholas Bellinger:
"It's been unusually busy for summer, with most of the efforts centered
around TCMU developments and various target-core + fabric driver bug
fixing activities. Not particularly large in terms of LoC, but lots of
smaller patches from many different folks.
The highlights include:
- ibmvscsis logical partition manager support (Michael Cyr + Bryant
Ly)
- Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch +
nab)
- Add support for TMR percpu LUN reference counting (nab)
- Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown
(Bart)
- Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi)
- Fix TCMU module removal (Xiubo Li)
- Fix iser-target OOPs during login failure (Andrea Righi + Sagi)
- Breakup target-core free_device backend driver callback (mnc)
- Perform TCMU add/delete/reconfig synchronously (mnc)
- Fix TCMU multiple UIO open/close sequences (mnc)
- Fix TCMU CHECK_CONDITION sense handling (mnc)
- Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab)
- Introduce TYPE_ZBC support in PSCSI (Damien Le Moal)
- Fix possible TCMU memory leak + OOPs when recalculating cmd base
size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc)
- Add login_keys_workaround attribute for non RFC initiators (Robert
LeBlanc + Arun Easi + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (68 commits)
iscsi-target: Add login_keys_workaround attribute for non RFC initiators
Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT"
tcmu: clean up the code and with one small fix
tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
target: export lio pgr/alua support as device attr
target: Fix return sense reason in target_scsi3_emulate_pr_out
target: Fix cmd size for PR-OUT in passthrough_parse_cdb
tcmu: Fix dev_config_store
target: pscsi: Introduce TYPE_ZBC support
target: Use macro for WRITE_VERIFY_32 operation codes
target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
target: remove transport_complete
pscsi: finish cmd processing from pscsi_req_done
tcmu: fix sense handling during completion
target: add helper to copy sense to se_cmd buffer
target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
target: make device_mutex and device_list static
tcmu: Fix flushing cmd entry dcache page
tcmu: fix multiple uio open/close sequences
tcmu: drop configured check in destroy
...
Target drivers must guarantee that struct se_cmd and struct se_tmr_req
exist as long as target_tmr_work() is in progress. Since the last
access by the LIO core is a call to .check_stop_free() and since the
Xen scsiback .check_stop_free() drops a reference to the TMF, it is
already guaranteed that the struct se_cmd that corresponds to the TMF
exists as long as target_tmr_work() is in progress. Hence change the
second argument of transport_generic_free_cmd() from 1 to 0.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch simplifies the implementation of the scsiback driver
but does not change its behavior.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
scsiback_release_cmd() must not dereference se_cmd->se_tmr_req
because that memory is freed by target_free_cmd_mem() before
scsiback_release_cmd() is called. Fix this use-after-free by
inlining struct scsiback_tmr into struct vscsibk_pend.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Cc: xen-devel@lists.xenproject.org
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping infrastructure from Christoph Hellwig:
"This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls to
->mapping_error so that the dma_map_ops instances are more self
contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)"
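As an illustration of the first item, the sketch below contrasts a global error sentinel with a per-ops mapping_error callback. It is a simplified userspace model, not the kernel code: all sketch_* names, the reduced ops structure and the all-ones error value are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;

/* Hypothetical, reduced stand-in for a dma_map_ops instance. */
struct sketch_dma_ops {
    /* Returns nonzero if addr signals a failed mapping for this ops. */
    int (*mapping_error)(sketch_dma_addr_t addr);
};

/* Example backend that reports failure with an all-ones address. */
static int sketch_allones_mapping_error(sketch_dma_addr_t addr)
{
    return addr == ~(sketch_dma_addr_t)0;
}

static const struct sketch_dma_ops sketch_allones_ops = {
    .mapping_error = sketch_allones_mapping_error,
};

/* Example backend whose mappings cannot fail: no callback at all. */
static const struct sketch_dma_ops sketch_nofail_ops = {
    .mapping_error = NULL,
};

/* Callers ask the ops instance instead of comparing against a macro. */
static int sketch_dma_mapping_error(const struct sketch_dma_ops *ops,
                                    sketch_dma_addr_t addr)
{
    assert(ops != NULL);
    if (ops->mapping_error)
        return ops->mapping_error(addr);
    return 0;
}
```

Because the error encoding lives behind the callback, two ops instances with different failure conventions can coexist without sharing a single architecture-wide constant.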
* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
ARM: dma-mapping: Remove traces of NOMMU code
ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
ARM: NOMMU: Introduce dma operations for noMMU
drivers: dma-mapping: allow dma_common_mmap() for NOMMU
drivers: dma-coherent: Introduce default DMA pool
drivers: dma-coherent: Account dma_pfn_offset when used with device tree
dma: Take into account dma_pfn_offset
dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
dma-mapping: remove dmam_free_noncoherent
crypto: qat - avoid an uninitialized variable warning
au1100fb: remove a bogus dma_free_nonconsistent call
MAINTAINERS: add entry for dma mapping helpers
powerpc: merge __dma_set_mask into dma_set_mask
dma-mapping: remove the set_dma_mask method
powerpc/cell: use the dma_supported method for ops switching
powerpc/cell: clean up fixed mapping dma_ops initialization
tile: remove dma_supported and mapping_error methods
xen-swiotlb: remove xen_swiotlb_set_dma_mask
arm: implement ->dma_supported instead of ->set_dma_mask
mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
...
Merge tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Other than fixes and cleanups it contains:
- support > 32 VCPUs at domain restore
- support for new sysfs nodes related to Xen
- some performance tuning for Linux running as Xen guest"
* tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: allow userspace access during hypercalls
x86: xen: remove unnecessary variable in xen_foreach_remap_area()
xen: allocate page for shared info page from low memory
xen: avoid deadlock in xenbus driver
xen: add sysfs node for hypervisor build id
xen: sync include/xen/interface/version.h
xen: add sysfs node for guest type
doc,xen: document hypervisor sysfs nodes for xen
xen/vcpu: Handle xen_vcpu_setup() failure at boot
xen/vcpu: Handle xen_vcpu_setup() failure in hotplug
xen/pv: Fix OOPS on restore for a PV, !SMP domain
xen/pvh*: Support > 32 VCPUs at domain restore
xen/vcpu: Simplify xen_vcpu related code
xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU
xen: avoid type warning in xchg_xen_ulong
xen: fix HYPERVISOR_dm_op() prototype
xen: don't print error message in case of missing Xenstore entry
arm/xen: Adjust one function call together with a variable assignment
arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi()
arm/xen: Improve a size determination in __set_phys_to_machine_multi()
Pull RAS updates from Thomas Gleixner:
"The RAS updates for the 4.13 merge window:
- Cleanup of the MCE injection facility (Borislav Petkov)
- Rework of the AMD/SMCA handling (Yazen Ghannam)
- Enhancements for ACPI/APEI to handle new notification types (Shiju
Jose)
- atomic_t to refcount_t conversion (Elena Reshetova)
- A few fixes and enhancements all over the place"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
RAS/CEC: Check the correct variable in the debugfs error handling
x86/mce: Always save severity in machine_check_poll()
x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise
x86/mce: Update bootlog description to reflect behavior on AMD
x86/mce: Don't disable MCA banks when offlining a CPU on AMD
x86/mce/mce-inject: Preset the MCE injection struct
x86/mce: Clean up include files
x86/mce: Get rid of register_mce_write_callback()
x86/mce: Merge mce_amd_inj into mce-inject
x86/mce/AMD: Use saved threshold block info in interrupt handler
x86/mce/AMD: Use msr_stat when clearing MCA_STATUS
x86/mce/AMD: Carve out SMCA bank configuration
x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers
x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t
RAS: Make local function parse_ras_param() static
ACPI/APEI: Handle GSIV and GPIO notification types
Pull irq updates from Thomas Gleixner:
"The irq department delivers:
- Expand the generic infrastructure handling the irq migration on CPU
hotplug and convert X86 over to it. (Thomas Gleixner)
Aside of consolidating code this is a preparatory change for:
- Finalizing the affinity management for multi-queue devices. The
main change here is to shut down interrupts which are affine to an
outgoing CPU and reenable them when the CPU comes online again.
That avoids moving interrupts pointlessly around and breaking and
reestablishing affinities for no value. (Christoph Hellwig)
Note: This also contains the BLOCK-MQ and NVME changes which depend
on the rework of the irq core infrastructure. Jens acked them and
agreed that they should go with the irq changes.
- Consolidation of irq domain code (Marc Zyngier)
- State tracking consolidation in the core code (Jeffy Chen)
- Add debug infrastructure for hierarchical irq domains (Thomas
Gleixner)
- Infrastructure enhancement for managing generic interrupt chips via
devmem (Bartosz Golaszewski)
- Constification work all over the place (Tobias Klauser)
- Two new interrupt controller drivers for MVEBU (Thomas Petazzoni)
- The usual set of fixes, updates and enhancements all over the
place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
irqchip/or1k-pic: Fix interrupt acknowledgement
irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
nvme: Allocate queues for all possible CPUs
blk-mq: Create hctx for each present CPU
blk-mq: Include all present CPUs in the default queue mapping
genirq: Avoid unnecessary low level irq function calls
genirq: Set irq masked state when initializing irq_desc
genirq/timings: Add infrastructure for estimating the next interrupt arrival time
genirq/timings: Add infrastructure to track the interrupt timings
genirq/debugfs: Remove pointless NULL pointer check
irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID
irqchip/gic-v3-its: Add ACPI NUMA node mapping
irqchip/gic-v3-its-platform-msi: Make of_device_ids const
irqchip/gic-v3-its: Make of_device_ids const
irqchip/irq-mvebu-icu: Add new driver for Marvell ICU
irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP
dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU
genirq/irqdomain: Remove auto-recursive hierarchy support
irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Add the SYSTEM_SCHEDULING bootup state to move various scheduler
debug checks earlier into the bootup. This turns silent and
sporadically deadly bugs into nice, deterministic splats. Fix some
of the splats that triggered. (Thomas Gleixner)
- A round of restructuring and refactoring of the load-balancing and
topology code (Peter Zijlstra)
- Another round of consolidating ~20 years of incremental scheduler code
history: this time in terms of wait-queue nomenclature. (I didn't
get much feedback on these renaming patches, and we can still
easily change any names I might have misplaced, so if anyone hates
a new name, please holler and I'll fix it.) (Ingo Molnar)
- sched/numa improvements, fixes and updates (Rik van Riel)
- Another round of x86/tsc scheduler clock code improvements, in hope
of making it more robust (Peter Zijlstra)
- Improve NOHZ behavior (Frederic Weisbecker)
- Deadline scheduler improvements and fixes (Luca Abeni, Daniel
Bristot de Oliveira)
- Simplify and optimize the topology setup code (Lauro Ramos
Venancio)
- Debloat and decouple scheduler code some more (Nicolas Pitre)
- Simplify code by making better use of llist primitives (Byungchul
Park)
- ... plus other fixes and improvements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
sched/cputime: Refactor the cputime_adjust() code
sched/debug: Expose the number of RT/DL tasks that can migrate
sched/numa: Hide numa_wake_affine() from UP build
sched/fair: Remove effective_load()
sched/numa: Implement NUMA node level wake_affine()
sched/fair: Simplify wake_affine() for the single socket case
sched/numa: Override part of migrate_degrades_locality() when idle balancing
sched/rt: Move RT related code from sched/core.c to sched/rt.c
sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
sched/fair: Spare idle load balancing on nohz_full CPUs
nohz: Move idle balancer registration to the idle path
sched/loadavg: Generalize "_idle" naming to "_nohz"
sched/core: Drop the unused try_get_task_struct() helper function
sched/fair: WARN() and refuse to set buddy when !se->on_rq
sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
...
Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid
Pull uuid subsystem from Christoph Hellwig:
"This is the new uuid subsystem, in which Amir, Andy and I have started
consolidating our uuid/guid helpers and improving the types used for
them. Note that various other subsystems have pulled in this tree, so
I'd like it to go in early.
UUID/GUID summary:
- introduce the new uuid_t/guid_t types that are going to replace the
somewhat confusing uuid_be/uuid_le types and make the terminology
fit the various specs, as well as the userspace libuuid library.
(me, based on a previous version from Amir)
- consolidated generic uuid/guid helper functions lifted from XFS and
libnvdimm (Amir and me)
- conversions to the new types and helpers (Amir, Andy and me)"
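To make the distinction between the two types concrete, the sketch below builds a big-endian UUID and a little-endian GUID from the same field values. It is a userspace illustration only: the sketch_* types and initializer functions are hypothetical stand-ins, and the assumed layout is that a UUID stores its first three fields most significant byte first while a GUID stores them least significant byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; both are plain 16-byte arrays. */
typedef struct { uint8_t b[16]; } sketch_uuid_t; /* big-endian fields */
typedef struct { uint8_t b[16]; } sketch_guid_t; /* little-endian fields */

static sketch_uuid_t sketch_uuid_init(uint32_t a, uint16_t b, uint16_t c,
                                      const uint8_t d[8])
{
    sketch_uuid_t u;

    assert(d != NULL);
    /* Most significant byte first for the three leading fields. */
    u.b[0] = (uint8_t)(a >> 24);
    u.b[1] = (uint8_t)(a >> 16);
    u.b[2] = (uint8_t)(a >> 8);
    u.b[3] = (uint8_t)a;
    u.b[4] = (uint8_t)(b >> 8);
    u.b[5] = (uint8_t)b;
    u.b[6] = (uint8_t)(c >> 8);
    u.b[7] = (uint8_t)c;
    memcpy(&u.b[8], d, 8);
    return u;
}

static sketch_guid_t sketch_guid_init(uint32_t a, uint16_t b, uint16_t c,
                                      const uint8_t d[8])
{
    sketch_guid_t g;

    assert(d != NULL);
    /* Least significant byte first for the three leading fields. */
    g.b[0] = (uint8_t)a;
    g.b[1] = (uint8_t)(a >> 8);
    g.b[2] = (uint8_t)(a >> 16);
    g.b[3] = (uint8_t)(a >> 24);
    g.b[4] = (uint8_t)b;
    g.b[5] = (uint8_t)(b >> 8);
    g.b[6] = (uint8_t)c;
    g.b[7] = (uint8_t)(c >> 8);
    memcpy(&g.b[8], d, 8);
    return g;
}
```

Keeping the two as distinct struct types means the compiler rejects passing one where the other is expected, which is the kind of confusion the new naming is meant to avoid.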
* tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits)
ACPI: hns_dsaf_acpi_dsm_guid can be static
mmc: sdhci-pci: make guid intel_dsm_guid static
uuid: Take const on input of uuid_is_null() and guid_is_null()
thermal: int340x_thermal: fix compile after the UUID API switch
thermal: int340x_thermal: Switch to use new generic UUID API
acpi: always include uuid.h
ACPI: Switch to use generic guid_t in acpi_evaluate_dsm()
ACPI / extlog: Switch to use new generic UUID API
ACPI / bus: Switch to use new generic UUID API
ACPI / APEI: Switch to use new generic UUID API
acpi, nfit: Switch to use new generic UUID API
MAINTAINERS: add uuid entry
tmpfs: generate random sb->s_uuid
scsi_debug: switch to uuid_t
nvme: switch to uuid_t
sysctl: switch to use uuid_t
partitions/ldm: switch to use uuid_t
overlayfs: use uuid_t instead of uuid_be
fs: switch ->s_uuid to uuid_t
ima/policy: switch to use uuid_t
...
Update the effective affinity mask when an interrupt was successfully
targeted to a CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.799944725@linutronix.de
When running under Xen as dom0, /dev/mcelog is being provided by Xen
instead of the normal mcelog character device of the MCE core. Convert
an error message being issued by the MCE core in this case to an
informative message that Xen has registered the device.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgross@suse.com
Conflicts:
kernel/sched/Makefile
Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
DMA_ERROR_CODE is going to go away, so don't rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
ARM and x86 had duplicated versions of the dma_ops structure, the
only difference is that x86 hasn't wired up the set_dma_mask,
mmap, and get_sgtable ops yet. On x86 all of them are identical
to the generic version, so they aren't needed but harmless.
All the symbols used only for xen_swiotlb_dma_ops can now be marked
static as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
For support of Xen hypervisor live patching the hypervisor build id is
needed. Add a node /sys/hypervisor/properties/buildid containing the
information.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Currently there is no reliable user interface inside a Xen guest to
determine its type (e.g. HVM, PV or PVH). Instead of letting user mode
try to determine this by various rather hacky mechanisms (parsing of
boot messages before they are gone, trying to make use of known subtle
differences in behavior of some instructions), add a sysfs node
/sys/hypervisor/guest_type to explicitly deliver this information as
it is known to the kernel.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
A booting HVM domain generates around 200K (evtchn:qemu-dm xen-dyn)
interrupts in a short period of time. All of these evtchn:qemu-dm
interrupts are bound to VCPU 0 until irqbalance sees them and moves them
to a different VCPU. In one configuration irqbalance runs every 10
seconds, which means it usually does not see this burst of interrupts
and does not rebalance them, so all evtchn:qemu-dm interrupts end up
being processed by VCPU 0. This causes VCPU 0 to spend most of its time
processing hardirqs and very little time on softirqs. Moreover, if
kernel preemption is disabled in dom0, VCPU 0 never runs the watchdog
(process context), so the softlockup detection code triggers a panic.
Binding evtchn:qemu-dm to the next online VCPU spreads hardirq
processing evenly across CPUs. Later, irqbalance will rebalance
evtchn:qemu-dm if required.
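The selection policy can be sketched as a round-robin walk over the online VCPUs. This is a userspace illustration under stated assumptions, not the driver code: NR_VCPUS, the boolean online map and next_online_vcpu() are hypothetical names for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS 8  /* hypothetical fixed VCPU count for the sketch */

/*
 * Return the next online VCPU after `last`, wrapping around.
 * Pass last == -1 for the first binding. Returns -1 when no VCPU
 * is online.
 */
static int next_online_vcpu(const bool online[NR_VCPUS], int last)
{
    assert(last >= -1 && last < NR_VCPUS);
    for (int step = 1; step <= NR_VCPUS; step++) {
        int cpu = (last + step) % NR_VCPUS;

        if (online[cpu])
            return cpu;
    }
    return -1;
}
```

Each new binding passes the VCPU chosen for the previous one, so consecutive event channels land on different VCPUs instead of all starting on VCPU 0.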
Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Merge tag 'for-linus-4.12b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A fix for Xen on ARM when dealing with 64kB page size of a guest"
* tag 'for-linus-4.12b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/privcmd: Support correctly 64KB page granularity when mapping memory
Commit 5995a68 "xen/privcmd: Add support for Linux 64KB page granularity" did
not go far enough to support 64KB in mmap_batch_fn.
The variable 'nr' is the number of 4KB chunks to map. However, when Linux
is using 64KB page granularity the array of pages (vma->vm_private_data)
contains one page per 64KB. Fix it by incrementing st->index correctly.
Furthermore, st->va is not correctly incremented as PAGE_SIZE !=
XEN_PAGE_SIZE.
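The arithmetic behind both increments can be sketched as follows. This is a userspace model with hypothetical names (sketch_batch_state, advance_batch, SKETCH_*), assuming a 64KB guest page and a 4KB Xen page; it is not the privcmd code itself.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_XEN_PAGE_SIZE   4096UL   /* Xen granularity: 4KB */
#define SKETCH_GUEST_PAGE_SIZE 65536UL  /* Linux page size: 64KB (assumed) */
#define SKETCH_XEN_PFN_PER_PAGE \
    (SKETCH_GUEST_PAGE_SIZE / SKETCH_XEN_PAGE_SIZE)

struct sketch_batch_state {
    unsigned long va;  /* next virtual address to map */
    size_t index;      /* next entry in the array of 64KB pages */
};

/* Advance the state after mapping nr chunks of 4KB each. */
static void advance_batch(struct sketch_batch_state *st, size_t nr)
{
    /* The sketch only handles whole guest pages. */
    assert(nr % SKETCH_XEN_PFN_PER_PAGE == 0);
    st->va += SKETCH_XEN_PAGE_SIZE * nr;         /* bytes actually mapped */
    st->index += nr / SKETCH_XEN_PFN_PER_PAGE;   /* 64KB pages consumed */
}
```

With these values, 32 chunks of 4KB advance the address by 128KB but consume only two entries of the page array; advancing the index by 32 would run far past the pages actually used.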
Fixes: 5995a68 ("xen/privcmd: Add support for Linux 64KB page granularity")
CC: stable@vger.kernel.org
Reported-by: Feng Kan <fkan@apm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
When registering for the Xenstore watch of the node control/sysrq the
handler will be called at once. Don't issue an error message if the
Xenstore node isn't there, as it will be created only when an event
is being triggered.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
For some file systems we still memcpy into it, but in various places this
already allows us to use the proper uuid helpers. More to come..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
might_sleep() debugging and smp_processor_id() debugging should be active
right after the scheduler starts working. The init task can invoke
smp_processor_id() from preemptible context as it is pinned on the boot cpu
until sched_smp_init() removes the pinning and lets it schedule on all non
isolated cpus.
Add a new state which allows those checks to be enabled earlier, and add
it to the xen do_poweroff() function.
No functional change.
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170516184736.196214622@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
pile, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/affs: add rename exchange
fs/affs: add rename2 to prepare multiple methods
Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
fs: don't set *REFERENCED on single use objects
fs: compat: Remove warning from COMPATIBLE_IOCTL
remove pointless extern of atime_need_update_rcu()
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
fs: remove _submit_bh()
fs: constify tree_descr arrays passed to simple_fill_super()
fs: drop duplicate header percpu-rwsem.h
fs/affs: bugfix: Write files greater than page size on OFS
fs/affs: bugfix: enable writes on OFS disks
fs/affs: remove node generation check
fs/affs: import amigaffs.h
fs/affs: bugfix: make symbolic links work again
There are many code paths opencoding kvmalloc. Let's use the helper
instead. The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator. E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation. This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fall back to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously. There is no guarantee something like that happens
though.
This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.
Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741)
established that commit 72a9b18629 ("xen: Remove event channel
notification through Xen PCI platform device") (and thus commit
da72ff5bfc ("partially revert "xen: Remove event channel
notification through Xen PCI platform device"")) are unnecessary and,
in fact, prevent HVM guests from booting on Xen releases prior to 4.0.
Therefore we revert both of those commits.
The summary of that discussion is below:
Here is the brief summary of the current situation:
Before the offending commit (72a9b18629):
1) INTx does not work because of the reset_watches path.
2) The reset_watches path is only taken if you have Xen > 4.0
3) The Linux Kernel by default will use vector inject if the hypervisor
supports it. So even though INTx does not work, nobody running the
kernel with Xen > 4.0 would notice, unless they explicitly disabled this
feature either in the kernel or in Xen (and this can only be disabled by
modifying the code; there is no user-supported way to do it).
After the offending commit (+ partial revert):
1) INTx is no longer supported for HVM (only for PV guests).
2) Any HVM guest kernel will not boot on Xen < 4.0, which does
not have vector injection support, since the only other mode
supported is INTx, which is no longer supported for HVM.
So based on this summary, I think before commit (72a9b18629) we were
in a much better position from a user point of view.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Now that __generic_dma_ops is a xen specific function, rename it to
xen_get_dma_ops. Change all the call sites appropriately.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: linux@armlinux.org.uk
CC: catalin.marinas@arm.com
CC: will.deacon@arm.com
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: Julien Grall <julien.grall@arm.com>
The balloon driver uses several PV-only concepts (xen_start_info,
xen_extra_mem, ...) and it seems the simplest solution to make the
HVM-only build happy is to decorate these parts with #ifdefs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory. Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.
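A minimal sketch of the pattern, using a simplified stand-in for tree_descr: the sketch_* names, the reduced field set and the empty-name terminator convention shown here are assumptions for illustration and do not reproduce the real structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced descriptor: a file name and its mode. */
struct sketch_tree_descr {
    const char *name;
    int mode;
};

/*
 * Declared const, so the compiler may place the table in read-only
 * data and will reject any attempt to write to it through this name.
 */
static const struct sketch_tree_descr sketch_files[] = {
    { "status",  0444 },
    { "control", 0644 },
    { "",        0    },  /* terminator: empty name ends the table */
};

/* Consumers take a pointer to const, matching the table's type. */
static size_t sketch_count_files(const struct sketch_tree_descr *files)
{
    size_t n = 0;

    assert(files != NULL);
    while (files[n].name[0] != '\0')
        n++;
    return n;
}
```

Changing the consumer's parameter to a pointer to const is what lets every caller declare its table const without casts.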
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
After allocation the item is being placed on the list right away.
Consequently it needs to be taken off the list before freeing in the
case xenbus_dev_request_and_reply() failed, as in that case the
callback (xenbus_dev_queue_reply()) is not being called (and if it
was called, it should do both).
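The error path described above can be sketched in userspace as follows. The list helpers, sketch_submit() and the two send stand-ins are hypothetical names for this example; they only model the ordering "queue, send, and on failure unlink before freeing".

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list node; names are hypothetical. */
struct sketch_node {
    struct sketch_node *prev, *next;
};

static void sketch_list_init(struct sketch_node *head)
{
    head->prev = head->next = head;
}

static void sketch_list_add(struct sketch_node *node, struct sketch_node *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

static void sketch_list_del(struct sketch_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

/* Stand-ins for a request submission that fails or succeeds. */
static int sketch_send_fail(struct sketch_node *req) { (void)req; return -1; }
static int sketch_send_ok(struct sketch_node *req) { (void)req; return 0; }

static int sketch_submit(struct sketch_node *head,
                         int (*send)(struct sketch_node *))
{
    struct sketch_node *req;
    int rc;

    assert(head != NULL && send != NULL);
    req = malloc(sizeof(*req));
    if (!req)
        return -1;
    sketch_list_add(req, head);  /* queued right after allocation */
    rc = send(req);
    if (rc) {
        /* No completion callback will run: unlink, then free. */
        sketch_list_del(req);
        free(req);
    }
    return rc;
}
```

Freeing without the unlink would leave the list pointing at released memory, which the next traversal would dereference.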
Fixes: 5584ea250a
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
This was broken in commit cd979883b9 ("xen/acpi-processor:
fix enabling interrupts on syscore_resume"). do_suspend (from
xen/manage.c) and thus xen_resume_notifier never get called on
the initial-domain at resume (they are called when running as a guest).
The rationale for the breaking change was that upload_pm_data()
potentially does blocking work in syscore_resume(). This patch
addresses the original issue by scheduling upload_pm_data() to
execute in workqueue context.
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: stable@vger.kernel.org
Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Replace hard coded "ACPI0007" with ACPI_PROCESSOR_DEVICE_HID
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This avoids accidental
refcounter overflows that might lead to use-after-free
situations.
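The idea of a counter that saturates instead of wrapping can be sketched as follows. This is a single-threaded userspace model with hypothetical sketch_* names; it is not the kernel's refcount_t implementation and leaves out the atomic operations entirely.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_REFCOUNT_SATURATED UINT_MAX

struct sketch_refcount {
    unsigned int refs;
};

/* Take a reference; once saturated, the counter stays pinned. */
static void sketch_refcount_inc(struct sketch_refcount *r)
{
    assert(r != NULL);
    if (r->refs == SKETCH_REFCOUNT_SATURATED)
        return;
    r->refs++;
}

/*
 * Drop a reference; returns true when the last one is gone.
 * A saturated counter never reaches zero, so the object is
 * leaked rather than freed while stale references may exist.
 */
static bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
    assert(r != NULL);
    if (r->refs == SKETCH_REFCOUNT_SATURATED)
        return false;
    assert(r->refs > 0);
    return --r->refs == 0;
}
```

A plain wrapping counter would pass through zero after an overflow and trigger a premature free; saturating trades that for a bounded memory leak.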
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Merge tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix and cleanup from Juergen Gross:
"This contains one fix for MSIX handling under Xen and a trivial
cleanup patch"
* tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Remove duplicate inclusion of linux/init.h
xen: do not re-use pirq number cached in pci device msi msg data
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"Two tiny implementations of the DMA API for callback in ARM (for Xen)"
* 'stable/for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb-xen: implement xen_swiotlb_get_sgtable callback
swiotlb-xen: implement xen_swiotlb_dma_mmap callback
Add #include <linux/cred.h> dependencies to all .c files that rely on
sched.h doing that for them.
Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
The APIs that are going to be moved first are:
mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch removes the duplicate inclusion of linux/init.h in
xenbus_dev_frontend.c.
Confirmed that the file compiles successfully after removing the line.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.
Remove the vma parameter to simplify things.
[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The purpose of this ioctl is to allow a user of privcmd to restrict its
operation such that it will no longer service arbitrary hypercalls via
IOCTL_PRIVCMD_HYPERCALL, and will check for a matching domid when
servicing IOCTL_PRIVCMD_DM_OP or IOCTL_PRIVCMD_MMAP*. The aim of this
is to limit the attack surface for a compromised device model.
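The intended behaviour can be sketched in plain C as below. This is a
user-space model only: the demo_* names, the DEMO_DOMID_INVALID marker,
the error codes and the rule that a restriction cannot be changed once
set are all assumptions made for illustration, not taken from the patch:

```c
#include <errno.h>

#define DEMO_DOMID_INVALID 0xFFFFu	/* hypothetical "not restricted" marker */

struct demo_privcmd_data {
	unsigned int domid;	/* DEMO_DOMID_INVALID until restricted */
};

/* Restrict the handle to one domain (assumed to be a one-way operation). */
int demo_restrict(struct demo_privcmd_data *d, unsigned int domid)
{
	if (d->domid == DEMO_DOMID_INVALID)
		d->domid = domid;
	else if (d->domid != domid)
		return -EINVAL;
	return 0;
}

/* Arbitrary hypercalls are refused once a restriction is in place. */
int demo_hypercall(const struct demo_privcmd_data *d)
{
	if (d->domid != DEMO_DOMID_INVALID)
		return -EPERM;
	return 0;
}

/* dm_op and mmap style requests must target the restricted domain. */
int demo_dm_op(const struct demo_privcmd_data *d, unsigned int target)
{
	if (d->domid != DEMO_DOMID_INVALID && d->domid != target)
		return -EPERM;
	return 0;
}
```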
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Recently a new dm_op[1] hypercall was added to Xen to provide a mechanism
for restricting device emulators (such as QEMU) to a limited set of
hypervisor operations, and being able to audit those operations in the
kernel of the domain in which they run.
This patch adds IOCTL_PRIVCMD_DM_OP as gateway for __HYPERVISOR_dm_op.
NOTE: There is no requirement for user-space code to bounce data through
locked memory buffers (as with IOCTL_PRIVCMD_HYPERCALL) since
privcmd has enough information to lock the original buffers
directly.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=524a98c2
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The code sets the default return code to -ENOSYS but then overrides this
to -EINVAL in the switch() statement's default case, which is clearly
silly.
This patch removes the override and sets the default return code to
-ENOTTY, which is the conventional return for an unimplemented ioctl.
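The resulting pattern looks roughly like the sketch below. demo_ioctl()
and the two command numbers are hypothetical; only the handling of the
default return code mirrors the description above:

```c
#include <errno.h>

enum { DEMO_CMD_A = 1, DEMO_CMD_B = 2 };	/* hypothetical commands */

long demo_ioctl(unsigned int cmd)
{
	long ret = -ENOTTY;	/* default: unimplemented ioctl */

	switch (cmd) {
	case DEMO_CMD_A:
		ret = 0;
		break;
	case DEMO_CMD_B:
		ret = 1;
		break;
	default:
		/* no override here: ret keeps its default value */
		break;
	}
	return ret;
}
```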
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Handling of multiple concurrent Xenstore accesses through the xenbus
driver, either from the kernel or from user land, is rather lame today:
xenbus can have only one access active at any point in time.
Rewrite xenbus to handle multiple requests concurrently by making use
of the request id of the Xenstore protocol. This requires the following:
- Instead of blocking inside xb_read() when trying to read data from
the xenstore ring buffer do so only in the main loop of
xenbus_thread().
- Instead of doing writes to the xenstore ring buffer in the context of
the caller just queue the request and do the write in the dedicated
xenbus thread.
- Instead of just forwarding the request id specified by the caller of
xenbus to xenstore use a xenbus internal unique request id. This will
allow multiple outstanding requests.
- Modify the locking scheme in order to allow multiple requests being
active in parallel.
- Instead of waiting for the reply of a user's xenstore request after
writing the request to the xenstore ring buffer return directly to
the caller and do the waiting in the read path.
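The request id handling listed above can be modelled as in the sketch
below. It is a simplified single-threaded illustration: the table, its
size and all demo_* names are assumptions, and the locking, the queue
and the xenbus thread itself are left out:

```c
#include <stdint.h>

#define DEMO_MAX_REQS 8	/* hypothetical limit on outstanding requests */

struct demo_req {
	int in_use;
	uint32_t caller_id;	/* id specified by the caller of xenbus */
	uint32_t internal_id;	/* xenbus internal id sent to xenstore */
};

static struct demo_req demo_reqs[DEMO_MAX_REQS];
static uint32_t demo_next_id = 1;

/* Queue a request; returns the internal id to send, or 0 if the table is full. */
uint32_t demo_queue_request(uint32_t caller_id)
{
	for (int i = 0; i < DEMO_MAX_REQS; i++) {
		if (!demo_reqs[i].in_use) {
			demo_reqs[i].in_use = 1;
			demo_reqs[i].caller_id = caller_id;
			demo_reqs[i].internal_id = demo_next_id++;
			if (demo_next_id == 0)	/* keep 0 as the failure value */
				demo_next_id = 1;
			return demo_reqs[i].internal_id;
		}
	}
	return 0;
}

/* Match a reply by internal id and hand back the caller's original id. */
int demo_complete_reply(uint32_t internal_id, uint32_t *caller_id)
{
	for (int i = 0; i < DEMO_MAX_REQS; i++) {
		if (demo_reqs[i].in_use &&
		    demo_reqs[i].internal_id == internal_id) {
			*caller_id = demo_reqs[i].caller_id;
			demo_reqs[i].in_use = 0;
			return 0;
		}
	}
	return -1;	/* no such outstanding request */
}
```

Because replies are matched on the internal id, two callers may use the
same request id without their replies being mixed up.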
Additionally signal handling was optimized by avoiding waking up the
xenbus thread or sending an event to Xenstore in case the addressed
entity is known to be running already.
As a result communication with Xenstore is sped up by a factor of up
to 5: depending on the request type (read or write) and the amount of
data transferred the gain was at least 20% (small reads) and went up to
a factor of 5 for large writes.
In the end some more rough edges of xenbus have been smoothed:
- Handling of memory shortage when reading from xenstore ring buffer in
the xenbus driver was not optimal: it was busy looping and issuing a
warning in each loop.
- In case of xenstore not running in dom0 but in a stubdom we end up
with two xenbus threads running as the initialization of xenbus in
dom0 expecting a local xenstored will be redone later when connecting
to the xenstore domain. Up to now this was no problem as locking
would prevent the two xenbus threads interfering with each other, but
this was just a waste of kernel resources.
- An out of memory situation while writing to or reading from the
xenstore ring buffer no longer will lead to a possible loss of
synchronization with xenstore.
- The user read and write parts are now interruptible by signals.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Today a Xenstore watch event is delivered via a callback function
declared as:
void (*callback)(struct xenbus_watch *,
const char **vec, unsigned int len);
As all watch events only ever come with two parameters (path and token)
changing the prototype to:
void (*callback)(struct xenbus_watch *,
const char *path, const char *token);
is the natural thing to do.
Apply this change and adapt all users.
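A sketch of what adapting a user looks like. The struct below is a
stand-in for the kernel structure with a demo-only "last" field, and
demo_watch_cb()/demo_deliver() are hypothetical. The adapter assumes the
old vec[] carried the path at index 0 and the token at index 1:

```c
#include <stdio.h>

/* Stand-in for the kernel structure; only what the sketch needs. */
struct xenbus_watch {
	const char *node;
	void (*callback)(struct xenbus_watch *,
			 const char *path, const char *token);
	char last[64];		/* demo-only: records the last event */
};

/* Hypothetical handler using the new two-string prototype. */
void demo_watch_cb(struct xenbus_watch *w,
		   const char *path, const char *token)
{
	snprintf(w->last, sizeof(w->last), "%s|%s", path, token);
}

/* Maps an old-style vec[] event onto the new callback form. */
void demo_deliver(struct xenbus_watch *w, const char **vec,
		  unsigned int len)
{
	if (len >= 2)
		w->callback(w, vec[0], vec[1]);
}
```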
Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The xenbus driver has an awful mixture of internally and globally
visible headers: some of the internally used only stuff is defined in
the global header include/xen/xenbus.h while some stuff defined in
internal headers is used by other drivers, too.
Clean this up by moving the externally used symbols to
include/xen/xenbus.h and the symbols used internally only to a new
header drivers/xen/xenbus/xenbus.h replacing xenbus_comms.h and
xenbus_probe.h
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
This function's error path can be simplified, so do so.
Remove the fail: label and the somewhat obfuscating, used-once
"error_path" function.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
PVH guests don't (yet) receive ACPI hotplug interrupts and therefore
need to monitor xenstore for CPU hotplug events.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Like PV guests, PVH does not have PCI devices and therefore cannot
use MMIO space to store grants. Instead it balloons out memory and
keeps grants there.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
We are replacing existing PVH guests with a new implementation.
We are keeping the xen_pvh_domain() macro (for now set to zero) because
when we introduce the new PVH implementation later in this series we will
reuse current PVH-specific code (xen_pvh_gnttab_setup()), and that
code is conditioned by 'if (xen_pvh_domain())'. (We will also need
a noop xen_pvh_domain() for !CONFIG_XEN_PVH).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
A negative return value indicates an error; in fact the function at
present won't ever return zero.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull swiotlb fix from Konrad Rzeszutek Wilk:
"An ARM fix in the Xen SWIOTLB - mainly the translation of physical to
bus addresses was done just a tad too late"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb-xen: update dev_addr after swapping pages
In xen_swiotlb_map_page and xen_swiotlb_map_sg_attrs, if the original
page is not suitable, we swap it for another page from the swiotlb
pool.
In these cases, we don't update the previously calculated dma address
for the page before calling xen_dma_map_page. Thus, we end up calling
xen_dma_map_page passing the wrong dev_addr, resulting in
xen_dma_map_page mistakenly assuming that the page is foreign when it is
local.
Fix the bug by updating dev_addr appropriately.
This change has no effect on x86, because xen_dma_map_page is a stub
there.
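The shape of the fix can be shown with a small user-space model. All
names, the bus offset and the suitability limit below are hypothetical
stand-ins; only the ordering (swap the page, then recompute dev_addr
before it is used) reflects the description above:

```c
#include <stdint.h>

typedef uint64_t demo_addr_t;

#define DEMO_BUS_OFFSET	0x80000000ULL	/* hypothetical phys to bus offset */
#define DEMO_DMA_LIMIT	0x100000000ULL	/* hypothetical device limit */

demo_addr_t demo_phys_to_bus(demo_addr_t phys)
{
	return phys + DEMO_BUS_OFFSET;
}

/* A page is usable directly only if its bus address is below the limit. */
int demo_page_ok(demo_addr_t dev_addr)
{
	return dev_addr < DEMO_DMA_LIMIT;
}

/* Stand-in for taking a replacement page from the swiotlb pool. */
demo_addr_t demo_bounce_alloc(void)
{
	return 0x1000;
}

/* Returns the dev_addr that would be handed to the mapping hook. */
demo_addr_t demo_map_page(demo_addr_t phys)
{
	demo_addr_t dev_addr = demo_phys_to_bus(phys);

	if (demo_page_ok(dev_addr))
		return dev_addr;

	/* Original page unsuitable: swap in a page from the pool ... */
	phys = demo_bounce_alloc();
	/* ... and recompute dev_addr so that it matches the new page. */
	dev_addr = demo_phys_to_bus(phys);
	return dev_addr;
}
```

Without the recomputation the function would return the address of the
original, unsuitable page.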
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Pooya Keshavarzi <Pooya.Keshavarzi@de.bosch.com>
Tested-by: Pooya Keshavarzi <Pooya.Keshavarzi@de.bosch.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Commit 72a9b18629 ("xen: Remove event channel notification through Xen
PCI platform device") broke Linux when booting as Dom0 on Xen in a
nested Xen environment (Xen installed inside a Xen VM). In this
scenario, Linux is a PV guest, but at the same time it uses the
platform-pci driver to receive notifications from L0 Xen. Vector
callbacks are not available because L1 Xen doesn't allow them.
Partially revert the offending commit, by restoring IRQ based
notifications for PV guests only. I restored only the code which is
strictly needed and replaced the xen_have_vector_callback checks within
it with xen_pv_domain() checks.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"This has one fix to make i915 work when using Xen SWIOTLB, and a
feature from Geert to aid in debugging of devices that can't do DMA
outside the 32-bit address space.
The feature from Geert is on top of v4.10 merge window commit
(specifically you pulling my previous branch), as his changes were
dependent on the Documentation/ movement patches.
I figured it would just be easier than me trying to cherry-pick the
Documentation patches to satisfy git.
The patches have been soaking since 12/20, albeit I updated the last
patch due to linux-next catching a compiler error and adding a
Tested-and-Reported-by tag"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Export swiotlb_max_segment to users
swiotlb: Add swiotlb=noforce debug option
swiotlb: Convert swiotlb_force from int to enum
x86, swiotlb: Simplify pci_swiotlb_detect_override()
So they can figure out the optimal number of pages that can be
contiguously stitched together without fear of bounce buffering.
We also expose a mechanism for sub-users of the SWIOTLB API, such
as Xen-SWIOTLB, to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.
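A hedged model of the getter/setter pair described above. The demo_*
names, the rounding down to a page multiple and the way the forced case
is encoded are assumptions for illustration; the actual patch may
represent the forced case differently:

```c
#define DEMO_PAGE_SIZE 4096u

static unsigned int demo_max_segment;
static int demo_force;	/* nonzero models swiotlb=force */

void demo_set_force(int on)
{
	demo_force = on;
}

/* A sub-user such as a Xen-style backend sets the value ... */
void demo_set_max_segment(unsigned int val)
{
	if (demo_force)
		demo_max_segment = DEMO_PAGE_SIZE;	/* one 4K page */
	else
		demo_max_segment = val - (val % DEMO_PAGE_SIZE);
}

/* ... and drivers query it to size their segments. */
unsigned int demo_get_max_segment(void)
{
	return demo_max_segment;
}
```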
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
Merge tag 'for-linus-4.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes and cleanups from Juergen Gross:
- small fixes for xenbus driver
- one fix for xen dom0 boot on huge system
- small cleanups
* tag 'for-linus-4.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Xen: ARM: Zero reserved fields of xatp before making hypervisor call
xen: events: Replace BUG() with BUG_ON()
xen: remove stale xs_input_avail() from header
xen: return xenstore command failures via response instead of rc
xen: xenbus driver must not accept invalid transaction ids
xen/evtchn: use rb_entry()
xen/setup: Don't relocate p2m over existing one
Ensure all reserved fields of xatp are zero before making
hypervisor call to XEN in xen_map_device_mmio().
xenmem_add_to_physmap_one() in XEN fails the mapping request if
extra.res reserved field in xatp is not zero for XENMAPSPACE_dev_mmio
request.
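The idea can be sketched as follows. struct demo_xatp is a simplified
stand-in whose layout is illustrative only, and demo_hypervisor_check()
merely models the rejection described above. Clearing the whole struct
with memset() is one possible way to do it, not necessarily the one
used in the patch:

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the hypercall argument. */
struct demo_xatp {
	uint16_t domid;
	uint16_t space;
	union {
		uint16_t foreign_domid;
		uint16_t res;	/* reserved: must be zero for dev_mmio */
	} extra;
	uint64_t idx;
	uint64_t gpfn;
};

/* Models the hypervisor side: refuse a request with nonzero reserved bits. */
int demo_hypervisor_check(const struct demo_xatp *x)
{
	return x->extra.res == 0 ? 0 : -1;
}

/* Clear everything first so reserved fields cannot hold stack garbage. */
void demo_prepare(struct demo_xatp *x, uint64_t idx, uint64_t gpfn)
{
	memset(x, 0, sizeof(*x));
	x->idx = idx;
	x->gpfn = gpfn;
}
```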
Signed-off-by: Jiandi An <anjiandi@codeaurora.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>