The code for switching to irq_stack stores three pieces of information on
the stack, fp+lr, as a fake stack frame (that lets us walk back onto the
interrupted tasks stack frame), and the address of the struct pt_regs that
contains the register values from kernel entry. (which dump_backtrace()
will print in any stack trace).
To reduce this, we store fp, and the pointer to the struct pt_regs.
unwind_frame() can recognise this as the irq_stack dummy frame, (as it only
appears at the top of the irq_stack), and use the struct pt_regs values
to find the missing interrupted link-register.
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In paging_init, we allocate the zero page, memset it to zero and then
point TTBR0 to it in order to avoid speculative fetches through the
identity mapping.
In order to guarantee that the freshly zeroed page is indeed visible to
the page table walker, we need to execute a dsb instruction prior to
writing the TTBR.
Cc: <stable@vger.kernel.org> # v3.14+, for older kernels need to drop the 'ishst'
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's not immediately obvious which hardware errata are worked around in
the Linux kernel for an arbitrary kernel tree, so add a file to keep
track of what we're working around.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We drop __cpu_setup in .text.init, which ends up being part of .text.
The .text.init section was a legacy section name which has been unused
elsewhere for a long time.
The ".text.init" name is misleading if read as a synonym for
".init.text". Any CPU may execute __cpu_setup before turning the MMU on,
so it should simply live in .text.
Remove the pointless section assignment. This will leave __cpu_setup in
the .text section.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 asm/cmpxchg.h includes linux/mmdebug.h but doesn't so far as I
can tell actually use anything from it. Removing the inclusion reduces
spurious header dependency rebuilds and also avoids issues with
recursive inclusions of headers causing build breaks due to attempts to
use things before they are defined if linux/mmdebug.h starts pulling in
more low level headers.
Such errors have happened in -next recently, for example:
In file included from include/linux/completion.h:11:0,
from include/linux/rcupdate.h:43,
from include/linux/tracepoint.h:19,
from include/linux/mmdebug.h:6,
from ./arch/arm64/include/asm/cmpxchg.h:22,
from ./arch/arm64/include/asm/atomic.h:41,
from include/linux/atomic.h:4,
from include/linux/spinlock.h:406,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from arch/arm64/kernel/asm-offsets.c:21:
include/linux/wait.h: In function 'wait_on_atomic_t':
include/linux/wait.h:1218:2: error: implicit declaration of function 'atomic_read' [-Werror=implicit-function-declaration]
if (atomic_read(val) == 0)
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently we treat the alternatives separately from other data that's
only used during initialisation, using separate .altinstructions and
.altinstr_replacement linker sections. These are freed for general
allocation separately from .init*. This is problematic as:
* We do not remove execute permissions, as we do for .init, leaving the
memory executable.
* We pad between them, making the kernel Image bianry up to PAGE_SIZE
bytes larger than necessary.
This patch moves the two sections into the contiguous region used for
.init*. This saves some memory, ensures that we remove execute
permissions, and allows us to remove some code made redundant by this
reorganisation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently we place an ALIGN_DEBUG_RO between text and data for the .text
and .init sections, and depending on configuration each of these may
result in up to SECTION_SIZE bytes worth of padding (for
DEBUG_RODATA_ALIGN).
We make no distinction between the text and data in each of these
sections at any point when creating the initial page tables in head.S.
We also make no distinction when modifying the tables; __map_memblock,
fixup_executable, mark_rodata_ro, and fixup_init only work at section
granularity. Thus this padding is unnecessary.
For the spit between init text and data we impose a minimum alignment of
16 bytes, but this is also unnecessary. The init data is output
immediately after the padding before any symbols are defined, so this is
not required to keep a symbol for linker a section array correctly
associated with the data. Any objects within the section will be given
at least their usual alignment regardless.
This patch removes the redundant padding.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As pgd_offset{,_k} shift the input address by PGDIR_SHIFT, the sub-page
bits will always be shifted out. There is no need to apply PAGE_MASK
before this.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
On entry from el0, we save all the registers on the kernel stack, and
restore them before returning. x29 remains unchanged when we call out
to C code, which will store x29 as the frame-pointer on the stack.
Instead, write 0 into x29 after entry from el0, to avoid any risk of
tracing into user space.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When unwind_frame() reaches the bottom of the irq_stack, the last fp
points to the original task stack. unwind_frame() uses
IRQ_STACK_TO_TASK_STACK() to find the sp value. If either values is
wrong, we may end up walking a corrupt stack.
Check these values are sane by testing if they are both on the stack
pointed to by current->stack.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
irq_stack is a per_cpu variable, that needs to be access from entry.S.
Use an assembler macro instead of the unreadable details.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This refactors the EFI init and runtime code that will be shared
between arm64 and ARM so that it can be built for both archs.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This splits off the early EFI init and runtime code that
- discovers the EFI params and the memory map from the FDT, and installs
the memblocks and config tables.
- prepares and installs the EFI page tables so that UEFI Runtime Services
can be invoked at the virtual address installed by the stub.
This will allow it to be reused for 32-bit ARM.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Change the EFI memory reservation logic to use memblock_mark_nomap()
rather than memblock_reserve() to mark UEFI reserved regions as
occupied. In addition to reserving them against allocations done by
memblock, this will also prevent them from being covered by the linear
mapping.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Take the new memblock attribute MEMBLOCK_NOMAP into account when
deciding whether a certain region is or should be covered by the
kernel direct mapping.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This introduces the MEMBLOCK_NOMAP attribute and the required plumbing
to make it usable as an indicator that some parts of normal memory
should not be covered by the kernel direct mapping. It is up to the
arch to actually honor the attribute when laying out this mapping,
but the memblock code itself is modified to disregard these regions
for allocations and other general use.
Cc: linux-mm@kvack.org
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Running with CONFIG_DEBUG_SPINLOCK=y can trigger a BUG with the new IRQ
stack code:
BUG: spinlock lockup suspected on CPU#1
This is due to the IRQ_STACK_TO_TASK_STACK macro incorrectly retrieving
the task stack pointer stashed at the top of the IRQ stack.
Sayeth James:
| Yup, this is what is happening. Its an off-by-one due to broken
| thinking about how the stack works. My broken thinking was:
|
| > top ------------
| > | dummy_lr | <- irq_stack_ptr
| > ------------
| > | x29 |
| > ------------
| > | x19 | <- irq_stack_ptr - 0x10
| > ------------
| > | xzr |
| > ------------
|
| But the stack-pointer is decreased before use. So it actually looks
| like this:
|
| > ------------
| > | | <- irq_stack_ptr
| > top ------------
| > | dummy_lr |
| > ------------
| > | x29 | <- irq_stack_ptr - 0x10
| > ------------
| > | x19 |
| > ------------
| > | xzr | <- irq_stack_ptr - 0x20
| > ------------
|
| The value being used as the original stack is x29, which in all the
| tests is sp but without the current frames data, hence there are no
| missing frames in the output.
|
| Jungseok Lee picked it up with a 32bit user space because aarch32
| can't use x29, so it remains 0 forever. The fix he posted is correct.
This patch fixes the macro and adds some of this wisdom to a comment,
so that the layout of the IRQ stack is well understood.
Cc: James Morse <james.morse@arm.com>
Reported-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
entry.S is modified to switch to the per_cpu irq_stack during el{0,1}_irq.
irq_count is used to detect recursive interrupts on the irq_stack, it is
updated late by do_softirq_own_stack(), when called on the irq_stack, before
__do_softirq() re-enables interrupts to process softirqs.
do_softirq_own_stack() is added by this patch, but does not yet switch
stack.
This patch adds the dummy stack frame and data needed by the previous
stack tracing patches.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch allows unwind_frame() to traverse from interrupt stack to task
stack correctly. It requires data from a dummy stack frame, created
during irq_stack_entry(), added by a later patch.
A similar approach is taken to modify dump_backtrace(), which expects to
find struct pt_regs underneath any call to functions marked __exception.
When on an irq_stack, the struct pt_regs is stored on the old task stack,
the location of which is stored in the dummy stack frame.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[james.morse: merged two patches, reworked for per_cpu irq_stacks, and
no alignment guarantees, added irq_stack definitions]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is need for figuring out how to manage struct thread_info data when
IRQ stack is introduced. struct thread_info information should be copied
to IRQ stack under the current thread_info calculation logic whenever
context switching is invoked. This is too expensive to keep supporting
the approach.
Instead, this patch pays attention to sp_el0 which is an unused scratch
register in EL1 context. sp_el0 utilization not only simplifies the
management, but also prevents text section size from being increased
largely due to static allocated IRQ stack as removing masking operation
using THREAD_SIZE in many places.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Make sure to clear out any ptrace singlestep state when a ptrace(2)
PTRACE_DETACH call is made on arm64 systems.
Otherwise, the previously ptraced task will die off with a SIGTRAP
signal if the debugger just previously singlestepped the ptraced task.
Cc: <stable@vger.kernel.org>
Signed-off-by: John Blackwood <john.blackwood@ccur.com>
[will: added comment to justify why this is in the arch code]
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is no need to worry about module and __init text disappearing
case, because that ftrace has a module notifier that is called when
a module is being unloaded and before the text goes away and this
code grabs the ftrace_lock mutex and removes the module functions
from the ftrace list, such that it will no longer do any
modifications to that module's text, the update to make functions
be traced or not is done under the ftrace_lock mutex as well.
And by now, __init section codes should not been modified
by ftrace, because it is black listed in recordmcount.c and
ignored by ftrace.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For ftrace on arm64, kstop_machine which is hugely disruptive
to a running system is not needed to convert nops to ftrace calls
or back, because that to be modified instrucions, that NOP, B or BL,
are all safe instructions which called "concurrent modification
and execution of instructions", that can be executed by one
thread of execution as they are being modified by another thread
of execution without requiring explicit synchronization.
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Boqun Feng reported a rather nasty ordering issue with spin_unlock_wait
on architectures implementing spin_lock with LL/SC sequences and acquire
semantics:
| CPU 1 CPU 2 CPU 3
| ================== ==================== ==============
| spin_unlock(&lock);
| spin_lock(&lock):
| r1 = *lock; // r1 == 0;
| o = READ_ONCE(object); // reordered here
| object = NULL;
| smp_mb();
| spin_unlock_wait(&lock);
| *lock = 1;
| smp_mb();
| o->dead = true;
| if (o) // true
| BUG_ON(o->dead); // true!!
The crux of the problem is that spin_unlock_wait(&lock) can return on
CPU 1 whilst CPU 2 is in the process of taking the lock. This can be
resolved by upgrading spin_unlock_wait to a LOCK operation, forcing it
to serialise against a concurrent locker and giving it acquire semantics
in the process (although it is not at all clear whether this is needed -
different callers seem to assume different things about the barrier
semantics and architectures are similarly disjoint in their
implementations of the macro).
This patch implements spin_unlock_wait using an LL/SC sequence with
acquire semantics on arm64. For v8.1 systems with the LSE atomics, the
exclusive writeback is omitted, since the spin_lock operation is
indivisible and no intermediate state can be observed.
Signed-off-by: Will Deacon <will.deacon@arm.com>
arm64 relies on the arm_arch_timer for sched_clock, so we can select
HAVE_IRQ_TIME_ACCOUNTING and have the core sched-clock code enable the
feature at runtime based on the rate.
Reported-by: Mario Smarduch <m.smarduch@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
ARM glibc uses (4 * __getpagesize()) for SHMLBA, which is correct for
4KB pages and works fine for 64KB pages, but the kernel uses a hardcoded
16KB that is too small for 64KB page based kernels. This changes the
definition to what user space sees when using 64KB pages.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
These functions/variables are not needed after booting, so mark them
as __init or __initdata.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch implements the pte_accessible() macro, which can be used to
test whether or not a given pte is a candidate for allocation in the
TLB.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Callees of __create_mapping may decide to create section mappings if
sufficient low bits of the physical and virtual addresses they were
passed are zero. While __create_mapping rounds the virtual base address
down, it does not similarly round the physical base address down, and
hence non-zero bits in the physical address can prevent use of a section
mapping, even where a whole next-level table would be used instead.
Round down the physical base address in __create_mapping to enable all
callees to always create section mappings when such a mapping is
possible.
Cc: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If a caller of __create_mapping provides a PA and VA which have
different sub-page offsets, it is not clear which offset they expect to
apply to the mapping, and is indicative of a bad caller.
In some cases, the region we wish to map may validly have a sub-page
offset in the physical and virtual addresses. For example, EFI runtime
regions have 4K granularity, yet may be mapped by a 64K page kernel. So
long as the physical and virtual offsets are the same, the region will
be mapped at the expected VAs.
Disallow calls with differing sub-page offsets, and WARN when they are
encountered, so that we can detect and fix such cases.
Cc: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull nouveau and radeon fixes from Dave Airlie:
"Just some nouveau and radeon/amdgpu fixes.
The nouveau fixes look large as the firmware context files are
regenerated, but the actual change is quite small"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: make some dpm errors debug only
drm/nouveau/volt/pwm/gk104: fix an off-by-one resulting in the voltage not being set
drm/nouveau/nvif: allow userspace access to its own client object
drm/nouveau/gr/gf100-: fix oops when calling zbc methods
drm/nouveau/gr/gf117-: assume no PPC if NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK is zero
drm/nouveau/gr/gf117-: read NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK from correct GPC
drm/nouveau/gr/gf100-: split out per-gpc address calculation macro
drm/nouveau/bios: return actual size of the buffer retrieved via _ROM
drm/nouveau/instmem: protect instobj list with a spinlock
drm/nouveau/pci: enable c800 magic for some unknown Samsung laptop
drm/nouveau/pci: enable c800 magic for Clevo P157SM
drm/radeon: make rv770_set_sw_state failures non-fatal
drm/amdgpu: move dependency handling out of atomic section v2
drm/amdgpu: optimize scheduler fence handling
drm/amdgpu: remove vm->mutex
drm/amdgpu: add mutex for ba_va->valids/invalids
drm/amdgpu: adapt vce session create interface changes
drm/amdgpu: vce use multiple cache surface starting from stoney
drm/amdgpu: reset vce trap interrupt flag
Two fixes for the ds1307 alarm and wakeup.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJWW0BzAAoJEKbNnwlvZCyziZ8P/3cvV1g8TAOsORZfHt8D5S6u
IWQrfkTtdfGKvPCAnY4TF/dKTeIZs3hI0/cG9RekciFOmEQ5Vmj9KlyZxzJB5aaI
FGJIFSBIYFVbZGyE8TKsayjrlB2D8/cr9OlrlsIcgqYmsVi8izwzWWfJKj89pVu3
qFptHRHRhTdSimmeyaJ9pmfCJy59jiueTG9sOHLJBPj98vOFWJPwTN0fABRHBbd4
R7KC6N5EjEXJFLXTsyFcu+cNAx/gmTRXJwo9jFpBTFGdSUZDddir9oXXhsrk+86j
4NO/Xa1VawQIz/nStgiZ2FV2L3Y9Hl9wtoz1s8dtG0syqrgbn6yaId7QFrrtHX48
q6aVT6vVBwx/Im2B/4bcw/XF0aSw3NYlVFxHZszIeWTuNfm7KkcQAGeLa47jzTGl
GOJOpdtldPQECii6jlYoURd5pH8FANpzRXQ8AYyVsl6gnNwjf8OBBhDEfv7O4wW9
1yeg0E/5XoaGJ6NdniRcHW3Wixf9b72htytOB/+r1nSJljA4cN0ojczVIpAoQSxt
sNKbE/Eo96v3qrxvDZ1z41J+V2CxKxary1DLlXvMAFnqMFOnF8rK5wI8jow8Xjsf
CACPFDCB0KxoLC5hgbbhBGkZd7eTIq30F1FjP8v0ypc8see/8g4H8+SIKy3R5EqV
wyyt+revuVKibz2NB7Ik
=6IeZ
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Two fixes for the ds1307 alarm and wakeup"
* tag 'rtc-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: ds1307: fix alarm reading at probe time
rtc: ds1307: fix kernel splat due to wakeup irq handling
Pull MIPS fix from Ralf Baechle:
"Just a fix for empty loops that may be removed by non-antique GCC"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Fix delay loops which may be removed by GCC.
Pull m68k fixes from Geert Uytterhoeven:
"Summary:
- Add missing initialization of max_pfn, which is needed to make
selftests/vm/mlock2-tests succeed,
- Wire up new mlock2 syscall"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Wire up mlock2
m68knommu: Add missing initialization of max_pfn and {min,max}_low_pfn
m68k/mm: sun3 - Add missing initialization of max_pfn and {min,max}_low_pfn
m68k/mm: m54xx - Add missing initialization of max_pfn
m68k/mm: motorola - Add missing initialization of max_pfn
Pull ARM fixes from Russell King:
"Just two changes this time around:
- wire up the new mlock2 syscall added during the last merge window
- fix a build problem with certain configurations provoked by making
CONFIG_OF user selectable"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8454/1: OF implies OF_FLATTREE
ARM: wire up mlock2 syscall
Pull SCSI target fixes from Nicholas Bellinger:
- fix tcm-user backend driver expired cmd time processing (agrover)
- eliminate kref_put_spinlock_irqsave() for I/O completion (bart)
- fix iscsi login kthread failure case hung task regression (nab)
- fix COMPARE_AND_WRITE completion use-after-free race (nab)
- fix COMPARE_AND_WRITE with SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC non zero
SGL offset data corruption. (Jan + Doug)
- fix >= v4.4-rc1 regression for tcm_qla2xxx enable configfs attribute
(Himanshu + HCH)
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target/stat: print full t10_wwn.model buffer
target: fix COMPARE_AND_WRITE non zero SGL offset data corruption
qla2xxx: Fix regression introduced by target configFS changes
kref: Remove kref_put_spinlock_irqsave()
target: Invoke release_cmd() callback without holding a spinlock
target: Fix race for SCF_COMPARE_AND_WRITE_POST checking
iscsi-target: Fix rx_login_comp hang after login failure
iscsi-target: return -ENOMEM instead of -1 in case of failed kmalloc()
target/user: Do not set unused fields in tcmu_ops
target/user: Fix time calc in expired cmd processing
Pull thermal management fixes from Zhang Rui:
"Specifics:
- several fixes and cleanups on Rockchip thermal drivers.
- add the missing support of RK3368 SoCs in Rockchip driver.
- small fixes on of-thermal, power_allocator, rcar driver, IMX, and
QCOM drivers, and also compilation fixes, on thermal.h, when thermal
is not selected"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
imx: thermal: use CPU temperature grade info for thresholds
thermal: fix thermal_zone_bind_cooling_device prototype
Revert "thermal: qcom_spmi: allow compile test"
thermal: rcar_thermal: remove redundant operation
thermal: of-thermal: Reduce log level for message when can't fine thermal zone
thermal: power_allocator: Use temperature reading from tz
thermal: rockchip: Support the RK3368 SoCs in thermal driver
thermal: rockchip: consistently use int for temperatures
thermal: rockchip: Add the sort mode for adc value increment or decrement
thermal: rockchip: improve the conversion function
thermal: rockchip: trivial: fix typo in commit
thermal: rockchip: better to compatible the driver for different SoCs
dt-bindings: rockchip-thermal: Support the RK3368 SoCs compatible
Cut 'n paste error saw it only process sizeof(t10_wwn.vendor) characters.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
target_core_sbc's compare_and_write functionality suffers from taking
data at the wrong memory location when writing a CAW request to disk
when a SGL offset is non-zero.
This can happen with loopback and vhost-scsi fabric drivers when
SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC is used to map existing user-space
SGL memory into COMPARE_AND_WRITE READ/WRITE payload buffers.
Given the following sample LIO subtopology,
% targetcli ls /loopback/
o- loopback ................................. [1 Target]
o- naa.6001405ebb8df14a ....... [naa.60014059143ed2b3]
o- luns ................................... [2 LUNs]
o- lun0 ................ [iblock/ram0 (/dev/ram0)]
o- lun1 ................ [iblock/ram1 (/dev/ram1)]
% lsscsi -g
[3:0:1:0] disk LIO-ORG IBLOCK 4.0 /dev/sdc /dev/sg3
[3:0:1:1] disk LIO-ORG IBLOCK 4.0 /dev/sdd /dev/sg4
the following bug can be observed in Linux 4.3 and 4.4~rc1:
% perl -e 'print chr$_ for 0..255,reverse 0..255' >rand
% perl -e 'print "\0" x 512' >zero
% cat rand >/dev/sdd
% sg_compare_and_write -i rand -D zero --lba 0 /dev/sdd
% sg_compare_and_write -i zero -D rand --lba 0 /dev/sdd
Miscompare reported
% hexdump -Cn 512 /dev/sdd
00000000 0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00000200
Rather than writing all-zeroes as instructed with the -D file, it
corrupts the data in the sector by splicing some of the original
bytes in. The page of the first entry of cmd->t_data_sg includes the
CDB, and sg->offset is set to a position past the CDB. I presume that
sg->offset is also the right choice to use for subsequent sglist
members.
Signed-off-by: Jan Engelhardt <jengelh@netitwork.de>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
this patch fixes following regression
# targetcli
[Errno 13] Permission denied: '/sys/kernel/config/target/qla2xxx/21:00:00:0e:1e:08:c7:20/tpgt_1/enable'
Fixes: 2eafd72939 ("target: use per-attribute show and store methods")
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The last user is gone. Hence remove this function.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch addresses a race + use after free where the first
stage of COMPARE_AND_WRITE in compare_and_write_callback()
is rescheduled after the backend sends the secondary WRITE,
resulting in second stage compare_and_write_post() callback
completing in target_complete_ok_work() before the first
can return.
Because current code depends on checking se_cmd->se_cmd_flags
after return from se_cmd->transport_complete_callback(),
this results in first stage having SCF_COMPARE_AND_WRITE_POST
set, which incorrectly falls through into second stage CAW
processing code, eventually triggering a NULL pointer
dereference due to use after free.
To address this bug, pass in a new *post_ret parameter into
se_cmd->transport_complete_callback(), and depend upon this
value instead of ->se_cmd_flags to determine when to return
or fall through into ->queue_status() code for CAW.
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch addresses a case where iscsi_target_do_tx_login_io()
fails sending the last login response PDU, after the RX/TX
threads have already been started.
The case centers around iscsi_target_rx_thread() not invoking
allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
from the failure path, resulting in RX thread hanging
indefinately on iscsi_conn->rx_login_comp.
Note this bug is a regression introduced by:
commit e54198657b
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Wed Jul 22 23:14:19 2015 -0700
iscsi-target: Fix iscsit_start_kthreads failure OOPs
To address this bug, complete ->rx_login_complete for good
measure in the failure path, and immediately return from
RX thread context if connection state did not actually reach
full feature phase (TARG_CONN_STATE_LOGGED_IN).
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Smatch complains about returning hard coded error codes, silence this
warning.
drivers/target/iscsi/iscsi_target_parameters.c:211
iscsi_create_default_params() warn: returning -1 instead of -ENOMEM is sloppy
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
TCMU sets TRANSPORT_FLAG_PASSTHROUGH, so INQUIRY commands will not be
emulated by LIO but passed up to userspace. Therefore TCMU should not
set these, just like pscsi doesn't.
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>