I recently came upon a scenario where I would get a double fault
machine check exception tiriggered by a kernel module.
However the ensuing crash stacktrace (ksym lookup) was not working
correctly.
Turns out that machine check auto-disables MMU while modules are allocated
in kernel vaddr spapce.
This patch re-enables the MMU before start printing the stacktrace
making stacktracing of modules work upon a fatal exception.
Cc: stable@kernel.org
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: moved code into low level handler to avoid in 2 places]
[Needed for HSDK]
Currently the first page of system (hence RAM base) is assumed to be
@ CONFIG_LINUX_LINK_BASE, where kernel itself is linked.
However is case of HSDK platform, for reasons explained in that patch,
this is not true. kernel needs to be linked @ 0x9000_0000 while DDR
is still wired at 0x8000_0000. To properly account for this 256M of RAM,
we need to introduce a new option and base page frame accountiing off of
it.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: renamed CONFIG_KERNEL_RAM_BASE_ADDRESS => CONFIG_LINUX_RAM_BASE
: simplified changelog]
[Needed for HSDK]
- Currently IOC base is hardcoded to 0x8000_0000 which is default value
of LINUX_LINK_BASE, but may not always be the case
- IOC programming model imposes the constraint that IOC aperture size
needs to be aligned to IOC base address, which we were not checking
so far.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked the changelog]
Some of the boot printing code had printk() w/o explicit log level.
This patch introduces consistency allowing platforms to switch to less
verbose console logging using cmdline.
NPS400 with 4K CPUs needs to avoid the cpu info printing for faster
bootup.
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Due to a HW bug in NPS400 we get from time to time false TLB miss.
Workaround this by validating each miss.
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZmw0DAAoJEGnX8d3iisJeoogP/R78jWBmVIRr7YBvOMEbNZAz
r5dJHnd2jCUtqQ/rmCndwzOqLv2j77RRdH3nlwfM7YaS8LZmK0Dkz9zUGReCalQU
z1sZfBSZuOfodWbDzfXdJVNEGGu8eSG7/s87E6K9UxOuaZJIPNMyU9qGxjDb3BTo
dzgNmT0xgiqZYtv9Y3uciPgkddJLOxE+eMuEpxzbzejLerUUc/jRV8m5qAL8ja9w
NanzLjo7Ec0FiczyYf1DtiONXBVl556IPQoFJtXIbsfZww8kJxFSZ+qemvmFXJOF
cxSfZeRBOCWV9mRW36kGAeKVE9EWqFHFn/UiCfvhTCpoFXPX63Hz+nVRLEE1lhrQ
ZaiQSuu0QgUkWP39ZpAPQjmdBlVxzv9Jsz5Dh72l3C00Hf9yw3jVCsKX/nZLpLM2
pS8pFVnJqttOLX/6wU1JjIQDhPvzqn0V21SwCXBt2DwyXd1zuce82ioPY7K2Uefc
4Unrso+YpIJNh8NlIe7Pvn8kEitNF7MViybofjhKPXlFXT4FqSJIV0Q/iq/L6lh8
RAfJO3GCQQymkB03aVmfRWq+xgCS0v3K2vP50T3+XEyix98ZwH+D4ViaXs9egmLk
EO323ebCKp8AJxICTme5qtmXs+k/CH+KeCzwSc90Mtf1SD3ohvyUeJvA5BGCCyP5
NP5sxiH9cNKwrMIBdvuE
=mOLm
-----END PGP SIGNATURE-----
Merge tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
* tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: Mask individual IRQ lines during core INTC init
ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
ARC: dma: implement dma_unmap_page and sg variant
ARCv2: SLC: Make sure busy bit is set properly for region ops
ARC: [plat-sim] Include this platform unconditionally
ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103
ARC: defconfig: Cleanup from old Kconfig options
PAE40 confiuration in hardware extends some of the address registers
for TLB/cache ops to 2 words.
So far kernel was NOT setting the higher word if feature was not enabled
in software which is wrong. Those need to be set to 0 in such case.
Normally this would be done in the cache flush / tlb ops, however since
these registers only exist conditionally, this would have to be
conditional to a flag being set on boot which is expensive/ugly -
specially for the more common case of PAE exists but not in use.
Optimize that by zero'ing them once at boot - nobody will write to
them afterwards
Cc: stable@vger.kernel.org #4.4+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1
which hold MSB bits of the physical address correspondingly of region start
and end otherwise SLC region operation is executed in unpredictable manner
Without this patch, SLC flushes on HSDK (IOC disabled) were taking
seconds.
Cc: stable@vger.kernel.org #4.4+
Reported-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: PAR40 regs only written if PAE40 exist]
c70c473396 "ARCv2: SLC: Make sure busy bit is set properly on SLC flushing"
fixes problem for entire SLC operation where the problem was initially
caught. But given a nature of the issue it is perfectly possible for
busy bit to be read incorrectly even when region operation was started.
So extending initial fix for regional operation as well.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Christoph noticed [1] that default DMA pool in current form overload
the DMA coherent infrastructure. In reply, Robin suggested [2] to
split the per-device vs. global pool interfaces, so allocation/release
from default DMA pool is driven by dma ops implementation.
This patch implements Robin's idea and provide interface to
allocate/release/mmap the default (aka global) DMA pool.
To make it clear that existing *_from_coherent routines work on
per-device pool rename them to *_from_dev_coherent.
[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- AXS10x platform clk updates for I2S, PGU
- Adding region based cache flush operation for ARCv2 cores
- Enforcing PAE40 dependency on HIGHMEM
- ptrace support for additional regs in ARCv2 cores
- Fix build failure in linux-next dut to a header include ordering change
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZEeokAAoJEGnX8d3iisJe/zgP/jpqauxaulNK4BvjJ3pI0+Fn
Pgz5Mgd/S7MzmiDPsb3H0YSWQMlxcSop+EPk8V4jXcXyTuDDaC7NzWLuKau2rjiB
jdbTycsJ4xSS5A8NrnG6AeE8SYAubJmtEwD8JASx+trtLGGHCKdTsRIOQpm5faUn
W2IFaXs9Jof5ZYL5btOzmIOOTxdR6f3Aegq/Zqb6rxqD8ZmtVapjPdIlqQEB/xw4
JSbTH9+HxI1+vpzCuOb1Ba4sglfFG9kH0xxWvMN5HuoYsdXG5KfjoXVsxUIEl/RC
m+bNdDOg5nlCS0U5PngnJnm5beBsAS8BApTTTM3ALNWGrsn+a6/1PraLsIgfa/yI
XgcXp80/rGiFqAA3+fUw+ly1oJb06V9p4SRCOaOTErqcxofS4upbJ7oTu5z0HwMV
HM2aw8vxZ2j4Wqrl5U3fGE9KvifUMJKuhcumpzfcmDsTtklCis4z13nbF0texNF2
tj0yqxYlOSdx4PoVzQG9Audf0sDeKUJ6Of+aGAXW55Zwq+QQJoCCzMSvTEV/xH4P
RUIFkkhCQzshVmUDgHnWAjccrxz/LUrO1zLpwmOD0QWjfJHhyhegrlXGuKxNiRH5
rijMkOIe+Z8yG/UKqW0nvtB9svS0y9vqh3kEG1DJIRnNg7U2Fmk7rB404fGuvh4B
T8uxDhsWP60hMZX1rw6h
=zJ9B
-----END PGP SIGNATURE-----
Merge tag 'arc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- AXS10x platform clk updates for I2S, PGU
- add region based cache flush operation for ARCv2 cores
- enforce PAE40 dependency on HIGHMEM
- ptrace support for additional regs in ARCv2 cores
- fix build failure in linux-next dut to a header include ordering
change
* tag 'arc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
Revert "ARCv2: Allow enabling PAE40 w/o HIGHMEM"
ARC: mm: fix build failure in linux-next for UP builds
ARCv2: ptrace: provide regset for accumulator/r30 regs
elf: Add ARCv2 specific core note section
ARCv2: mm: micro-optimize region flush generated code
ARCv2: mm: Merge 2 updates to DC_CTRL for region flush
ARCv2: mm: Implement cache region flush operations
ARC: mm: Move full_page computation into cache version agnostic wrapper
arc: axs10x: Fix ARC PGU default clock frequency
arc: axs10x: Add DT bindings for I2S audio playback
Region Flush has a weird programming model.
1. Flush or Invalidate is selected by DC_CTRL.RGN_OP
2 Flush-n-Invalidate is done by DC_CTRL.IM
Given the code structuring before, case #2 above was generating two
seperate updates to DC_CTRL which was pointless.
| 80a342b0 <__dma_cache_wback_inv_l1>:
| 80a342b0: clri r4
| 80a342b4: lr r2,[dc_ctrl]
| 80a342b8: bset_s r2,r2,0x6
| 80a342ba: sr r2,[dc_ctrl] <-- FIRST
|
| 80a342be: bmskn r3,r0,0x5
|
| 80a342c2: lr r2,[dc_ctrl]
| 80a342c6: and r2,r2,0xfffff1ff
| 80a342ce: bset_s r2,r2,0x9
| 80a342d0: sr r2,[dc_ctrl] <-- SECOND
|
| 80a342d4: add_s r1,r1,0x3f
| 80a342d6: bmsk_s r0,r0,0x5
| 80a342d8: add_s r0,r0,r1
| 80a342da: add_s r0,r0,r3
| 80a342dc: sr r0,[78]
| 80a342e0: sr r3,[77]
|...
|...
So move setting of DC_CTRL.RGN_OP into __before_dc_op() and combine with
any other update.
| 80b63324 <__dma_cache_wback_inv_l1>:
| 80b63324: clri r3
| 80b63328: lr r2,[dc_ctrl]
| 80b6332c: and r2,r2,0xfffff1ff
| 80b63334: or r2,r2,576
| 80b63338: sr r2,[dc_ctrl]
|
| 80b6333c: add_s r1,r1,0x3f
| 80b6333e: bmskn r2,r0,0x5
| 80b63342: add_s r0,r0,r1
| 80b63344: sr r0,[78]
| 80b63348: sr r2,[77]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Pull uaccess unification updates from Al Viro:
"This is the uaccess unification pile. It's _not_ the end of uaccess
work, but the next batch of that will go into the next cycle. This one
mostly takes copy_from_user() and friends out of arch/* and gets the
zero-padding behaviour in sync for all architectures.
Dealing with the nocache/writethrough mess is for the next cycle;
fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
sold on access_ok() in there, BTW; just not in this pile), same for
reducing __copy_... callsites, strn*... stuff, etc. - there will be a
pile about as large as this one in the next merge window.
This one sat in -next for weeks. -3KLoC"
* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
m32r: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
microblaze: switch to RAW_COPY_USER
get rid of padding, switch to RAW_COPY_USER
ia64: get rid of copy_in_user()
ia64: sanitize __access_ok()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: add extable.h
powerpc: get rid of zeroing, switch to RAW_COPY_USER
esas2r: don't open-code memdup_user()
alpha: fix stack smashing in old_adjtimex(2)
don't open-code kernel_setsockopt()
mips: switch to RAW_COPY_USER
mips: get rid of tail-zeroing in primitives
mips: make copy_from_user() zero tail explicitly
mips: clean and reorder the forest of macros...
mips: consolidate __invoke_... wrappers
...
As reported in STAR 9001165532, an SLC control reg read (for checking
busy state) right after SLC invalidate command may incorrectly return
NOT busy causing software to NOT spin-wait while operation is underway.
(and for some reason this only happens if L1 cache is also disabled - as
required by IOC programming model)
Suggested workaround is to do an additional Control Reg read, which
ensures the 2nd read gets the right status.
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: reworte changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
... and switch to generic out of line version in lib/usercopy.c
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Update code that relied on sched.h including various MM types for them.
This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split more MM APIs out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.
The APIs that we are going to move are:
arch_pick_mmap_layout()
arch_get_unmapped_area()
arch_get_unmapped_area_topdown()
mm_update_next_owner()
Include the header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes
it was possible to remove the IB DMA mapping code entirely and
switch the RDMA stack to use the core DMA mapping code. This resulted
in a nice set of cleanups, but touched the entire tree. This branch
will be submitted separately to Linus at the end of the merge window
as per normal practice for tree wide changes like this.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
3e7mgpcwC+3XfA/6vU3F
=e0Si
-----END PGP SIGNATURE-----
Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes it
was possible to remove the IB DMA mapping code entirely and switch the
RDMA stack to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
This file was only including module.h for exception table related
functions. We've now separated that content out into its own file
"extable.h" so now move over to that and avoid all the extra header
content in module.h that we don't really need to compile this file.
Since the file does have some EXPORT_SYMBOL, we add export.h include.
Cc: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The programming model has been fixed with prev patches so re-enable it
by default
This reverts commit 23cb1f6440.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
arc_cache_init() is called for each core so can't be tagged __init.
However bulk of it is only executed by master core and thus is candidate
for __init reaping.
So split it up to allow that.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
vs. fixed 512M before.
But this still assumes that all of memory is under IOC which may not be
true for the SoC. Improve that later when this becomes a real issue, by
specifying this from DT.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
On AXS103 release bitfiles, DMA data corruptions were seen because IOC
setup was not following the recommended way in documentation.
Flipping IOC on when caches are enabled or coherency transactions are in
flight, might cause some of the memory operations to not observe
coherency as expected.
So strictly follow the programming model recommendations as documented
in comment header above arc_ioc_setup()
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
An ARC700 customer reported linux boot crashes when upgrading to bigger
L1 dcache (64K from 32K). Turns out they had an aliasing VIPT config and
current code only assumed 2 colours, while theirs had 4. So default to 4
colours and complain if there are fewer. Ideally this needs to be a
Kconfig option, but heck that's too much of hassle for a single user.
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Historical MMU revisions have been paired with Cache revision updates
which are captured in MMU and Cache Build Configuration Registers respectively.
This was used in boot code to check for configurations mismatches,
speically in simulations (such as running with non existent caches,
non pairing MMU and Cache version etc). This can instead be inferred
from other cache params such as line size. So remove @ver from post
processed @cpuinfo which could be used later to save soem other
interesting info.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Patch series "Add support for DMA writable pages being writable by the
network stack", v3.
The first 19 patches in the set add support for the DMA attribute
DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures. This is
needed so that we can flag the calls to dma_map/unmap_page so that we do
not invalidate cache lines that do not currently belong to the device.
Instead we have to take care of this in the driver via a call to
sync_single_range_for_cpu prior to freeing the Rx page.
Patch 20 adds support for dma_map_page_attrs and dma_unmap_page_attrs so
that we can unmap and map a page using the DMA_ATTR_SKIP_CPU_SYNC
attribute.
Patch 21 adds support for freeing a page that has multiple references
being held by a single caller. This way we can free page fragments that
were allocated by a given driver.
The last 2 patches use these updates in the igb driver, and lay the
groundwork to allow for us to reimplement the use of build_skb.
This patch (of 23):
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.
Link: http://lkml.kernel.org/r/20161110113419.76501.38491.stgit@ahduyck-blue-test.jf.intel.com
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to use generic implementation of dma_map_ops.mmap which is
dma_common_mmap() but that only worked for simpler cached mappings when
vaddr = paddr.
If a driver requests uncached DMA buffer kernel maps it to virtual
address so that MMU gets involved and page uncached status takes into
account. In that case usage of dma_common_mmap() lead to mapping of
vaddr to vaddr for user-space which is obviously wrong. For more detals
please refer to verbose explanation here [1].
So here we implement our own version of mmap() which always deals
with dma_addr and maps underlying memory to user-space properly
(note that DMA buffer mapped to user-space is always uncached
because there's no way to properly manage cache from user-space).
[1] https://lkml.org/lkml/2016/10/26/973
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: <stable@vger.kernel.org> #4.5+
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Previously we would not print the case when IOC existed but was not
enabled.
And while at it, reduce one line off boot printing by consolidating
the Peripheral address space and IO-Coherency which in a way
applies to them
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
if user disables IOC from debugger at startup (by clearing @ioc_enable),
@ioc_exists is cleared too. This means boot prints don't capture the
fact that IOC was present but disabled which could be misleading.
So invert how we use @ioc_enable and @ioc_exists and make it more
canonical. @ioc_exists represent whether hardware is present or not and
stays same whether enabled or not. @ioc_enable is still user driven,
but will be auto-disabled if IOC hardware is not present, i.e. if
@ioc_exist=0. This is opposite to what we were doing before, but much
clearer.
This means @ioc_enable is now the "exported" toggle in rest of code such
as dma mapping API.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
HS release 3.0 provides for even more flexibility in specifying the
volatile address space for mapping peripherals.
With HS 2.1 @start was made flexible / programmable - with HS 3.0 even
@end can be setup (vs. fixed to 0xFFFF_FFFF before).
So add code to reflect that and while at it remove an unused struct
defintion
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
For resources shared by all cores such as SLC and IOC, only the master
core needs to do any setups / enabling / disabling etc.
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was only one use of __initdata_refok and __exit_refok
__init_refok was used 46 times against 82 for __ref.
Those definitions are obsolete since commit 312b1485fb ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")
This patch removes the following compatibility definitions and replaces
them treewide.
/* compatibility defines */
#define __init_refok __ref
#define __initdata_refok __refdata
#define __exit_refok __ref
I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Things have been calm here - nothing much except for a few fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXm7MUAAoJEGnX8d3iisJeTbkP/3z5vrczERECRsaHGU6KpaUP
F9+SpDG+BmCzcKhz60GzY64p6gl2KWtbE3es66lQf45yHD7s3UhAH1Zxc6pFgZjn
JTF8z3LFRDlKQ0H4gVxWsDp4+QXn3LOklckUpgqTocLxNpg9qXVXCVZhoUb12A3d
AxPtOUGsFTtC1e6wnYofGknJjApRls8f11CYmJdQ8aS9lLC/pA+fC0U0fM7LUzx5
DE+KLB3LithYxQ9TBfVrFSbCTbyxxDmYE59v9DEZQftn9pwVMtZLpGEs4BVO31fw
bQLvROfx5xn1yElN4yH2XT6q+N47XA6MtJ3qDvjPN59yWYwP9mOCWlfoIJ0q8UY0
sduU+9K1qZ5WQkXjh66+tPUKpm01YrZy2vghkCJ6YWXnOg9WzbYZQtkwia5TyU8h
lQ36ri72ncK9gPfOSxQGmxY19o7ujX9our9T+bQ4JyjtMicCVKlxSGJLruyTd2ma
LqOjqpLuZv2Ryf/2UbpoOmCsynfjqSLLc2CR+jlGJvH1vD9ycvfjMS1dcTcpcJ3Z
AcsEDBoviMRbM2mWCybtT5gs35vzWWCpgsi+hG4lg0kYtrclGWsc2/uG13BFNK3w
N9/aKgy6a8hYWWOEpKzqzuonR13oob91pDbkUD/m8uS9PSg/5W4WFlvgAjOIog9G
pfL3M6oF/gEZ4CByLS7J
=/rhI
-----END PGP SIGNATURE-----
Merge tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
"Things have been calm here - nothing much except for a few fixes"
* tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: don't loose PTE_SPECIAL in pte_modify()
ARC: dma: fix address translation in arc_dma_free
ARC: typo fix in mm/ioremap.c
ARC: fix linux-next build breakage
page should be calculated using physical address.
If platform uses non-trivial dma-to-phys memory translation,
dma_handle should be converted to physicval address before
calculation of page.
Failing to do so results in struct page * pointing to
wrong or non-existent memory.
Fixes: f2e3d55397 ("ARC: dma: reintroduce platform specific dma<->phys")
Cc: stable@vger.kernel.org #4.6+
Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
NPS use special mapping right below TASK_SIZE.
Hence we need to lower STACK_TOP so that user stack won't
overlap NPS special mapping.
Signed-off-by: Noam Camus <noamc@ezchip.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
On ARC, lower 2G of address space is translated and used for
- user vaddr space (region 0 to 5)
- unused kernel-user gutter (region 6)
- kernel vaddr space (region 7)
where each region simply represents 256MB of address space.
The kernel vaddr space of 256MB is used to implement vmalloc, modules
So far this was enough, but not on EZChip system with 4K CPUs (given
that per cpu mechanism uses vmalloc for allocating chunks)
So allow VMALLOC_SIZE to be configurable by expanding down into the unused
kernel-user gutter region which at default 256M was excessive anyways.
Also use _BITUL() to fix a build error since PGDIR_SIZE cannot use "1UL"
as called from assembly code in mm/tlbex.S
Signed-off-by: Noam Camus <noamc@ezchip.com>
[vgupta: rewrote changelog, debugged bootup crash due to int vs. hex]
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Initial HIGHMEM support on ARC was introduced for PAE40 where the low
memory (0x8000_0000 based) and high memory (0x1_0000_0000) were
physically contiguous. So CONFIG_FLATMEM sufficed (despite a peipheral
hole in the middle, which wasted a bit of struct page memory, but things
worked).
However w/o PAE, highmem was not possible and we could only reach
~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40
The idea is to have low memory at canonical 0x8000_0000 and highmem
at 0 so enire 4GB address space is available for physical addressing
This needs additional platform/interconnect mapping to convert
the non contiguous physical addresses into linear bus adresses.
From Linux point of view, non contiguous divide means FLATMEM no
longer works and DISCONTIGMEM is needed to track the pfns in the 2
regions.
This scheme would also work for PAE40, only better in that we don't
waste struct page memory for the peripheral hole.
The DT description will be something like
memory {
...
reg = <0x80000000 0x200000000 /* 512MB: lowmem */
0x00000000 0x10000000>; /* 256MB: highmem */
}
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The peripheral address space is architectural address window which is
uncached and typically used to wire up peripherals.
For ARC700 cores (ARCompact ISA based) this was fixed to 1GB region
0xC000_0000 - 0xFFFF_FFFF.
For ARCv2 based HS38 cores the start address is flexible and can be
0xC, 0xD, 0xE, 0xF 000_000 by programming AUX_NON_VOLATILE_LIMIT reg
(typically done in bootloader)
Further in cas of PAE, the physical address can extend beyond 4GB so
need to confine this check, otherwise all pages beyond 4GB will be
treated as uncached
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Previously a non-coherent page (hardware IOC or simply driver needs)
could be handled by cpu with paddr alone (kvaddr used to be needed for
coherent mappings to enforce uncached semantics via a MMU mapping).
Now however such a page might still require a V-P mapping if it was in
physical address space > 32bits due to PAE40, which the CPU can't access
directly with a paddr
So decouple decision of kvaddr allocation from type of alloc request
(coh/non-coh)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
vs. the ones which reutne void *, so that we can handle pages > 4GB
in subsequent patches
Also plug a potential page leak in case ioremap fails
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Let's define page_mapped() to be true for compound pages if any
sub-pages of the compound page is mapped (with PMD or PTE).
On other hand page_mapcount() return mapcount for this particular small
page.
This will make cases like page_get_anon_vma() behave correctly once we
allow huge pages to be mapped with PTE.
Most users outside core-mm should use page_mapcount() instead of
page_mapped().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| WARNING: vmlinux.o(.text+0xd6c2): Section mismatch in reference from the function alloc_kmap_pgtable() to the function
| .init.text:__alloc_bootmem_low()
The function alloc_kmap_pgtable() references the function __init __alloc_bootmem_low().
This is often because alloc_kmap_pgtable lacks a __init annotation or the annotation of __alloc_bootmem_low is wrong.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
HIGHMEM support bumped the default memory size for nsim platform to 1G.
Thus total memory ended at the very edge of start of peripherals address
space. With linux link base shifted, memory started bleeding into
peripheral space which caused early boot bad_page spew !
Fixes: 29e332261d ("ARC: mm: HIGHMEM: populate high memory from DT")
Reported-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARCompact and ARCv2 only have ASL, while binutils used to support LSL as
a alias mnemonic.
Newer binutils (upstream) don't want to do that so replace it.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This is the first working implementation of 40-bit physical address
extension on ARCv2.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
That way a single flip of phys_addr_t to 64 bit ensures all places
dealing with physical addresses get correct data
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Implement kmap* API for ARC.
This enables
- permanent kernel maps (pkmaps): :kmap() API
- fixmap : kmap_atomic()
We use a very simple/uniform approach for both (unlike some of the other
arches). So fixmap doesn't use the customary compile time address stuff.
The important semantic is sleep'ability (pkmap) vs. not (fixmap) which
the API guarantees.
Note that this patch only enables highmem for subsequent PAE40 support
as there is no real highmem for ARC in pure 32-bit paradigm as explained
below.
ARC has 2:2 address split of the 32-bit address space with lower half
being translated (virtual) while upper half unstranslated
(0x8000_0000 to 0xFFFF_FFFF). kernel itself is linked at base of
unstranslated space (i.e. 0x8000_0000 onwards), which is mapped to say
DDR 0x0 by external Bus Glue logic (outside the core). So kernel can
potentially access 1.75G worth of memory directly w/o need for highmem.
(the top 256M is taken by uncached peripheral space from 0xF000_0000 to
0xFFFF_FFFF)
In PAE40, hardware can address memory beyond 4G (0x1_0000_0000) while
the logical/virtual addresses remain 32-bits. Thus highmem is required
for kernel proper to be able to access these pages for it's own purposes
(user space is agnostic to this anyways).
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Before we plug in highmem support, some of code needs to be ready for it
- copy_user_highpage() needs to be using the kmap_atomic API
- mk_pte() can't assume page_address()
- do_page_fault() can't assume VMALLOC_END is end of kernel vaddr space
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- Move the verbosity knob from .data to .bss by using inverted logic
- No need to readout PD1 descriptor
- clip the non pfn bits of PD0 to avoid clipping inside the loop
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This frees up some bits to hold more high level info such as PAE being
present, w/o increasing the size of already bloated cpuinfo struct
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Implement the TLB flush routine to evict a sepcific Super TLB entry,
vs. moving to a new ASID on every such flush.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
MMUv4 in HS38x cores supports Super Pages which are basis for Linux THP
support.
Normal and Super pages can co-exist (ofcourse not overlap) in TLB with a
new bit "SZ" in TLB page desciptor to distinguish between them.
Super Page size is configurable in hardware (4K to 16M), but fixed once
RTL builds.
The exact THP size a Linx configuration will support is a function of:
- MMU page size (typical 8K, RTL fixed)
- software page walker address split between PGD:PTE:PFN (typical
11:8:13, but can be changed with 1 line)
So for above default, THP size supported is 8K * 256 = 2M
Default Page Walker is 2 levels, PGD:PTE:PFN, which in THP regime
reduces to 1 level (as PTE is folded into PGD and canonically referred
to as PMD).
Thus thp PMD accessors are implemented in terms of PTE (just like sparc)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
In case of ARCv2 CPU there're could be following configurations
that affect cache handling for data exchanged with peripherals
via DMA:
[1] Only L1 cache exists
[2] Both L1 and L2 exist, but no IO coherency unit
[3] L1, L2 caches and IO coherency unit exist
Current implementation takes care of [1] and [2].
Moreover support of [2] is implemented with run-time check
for SLC existence which is not super optimal.
This patch introduces support of [3] and rework of DMA ops
usage. Instead of doing run-time check every time a particular
DMA op is executed we'll have 3 different implementations of
DMA ops and select appropriate one during init.
As for IOC support for it we need:
[a] Implement empty DMA ops because IOC takes care of cache
coherency with DMAed data
[b] Route dma_alloc_coherent() via dma_alloc_noncoherent()
This is required to make IOC work in first place and also
serves as optimization as LD/ST to coherent buffers can be
srviced from caches w/o going all the way to memory
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta:
-Added some comments about IOC gains
-Marked dma ops as static,
-Massaged changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
alloc_pages_exact() get gfp flags and handle zero'ing already
And while it, fix the case where ioremap fails: return rightaway.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
SLC maintenance ops need to be serialized by software as there is no
inherent buffering / quequing of aux commands. It can silently ignore a
new aux operation if previous one is still ongoing (SLC_CTRL_BUSY)
So gaurd the SLC op using a spin lock
The spin lock doesn't seem to be contended even in heavy workloads such
as iperf. On FPGA @ 75 MHz.
[1] Before this change:
============================================================
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.110 port 38935 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 48.4 MBytes 40.6 Mbits/sec
============================================================
[2] After this change:
============================================================
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.243 port 60248 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 47.5 MBytes 39.8 Mbits/sec
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.243 port 60249 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 54.9 MBytes 46.0 Mbits/sec
============================================================
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARCv2 is the next generation ISA from Synopsys and basis for the
HS3{4,6,8} families of processors which retain the traditional ARC mantra of
low power and configurability and are now more performant and feature rich.
HS38x is a 10 stage pipeline core which supports MMU (with huge pages) and
SMP (upto 4 cores) among other features.
+ www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor
+ http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications
+ http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps
- Support for ARC SDP (Software Development platform): Main Board + CPU Cards
= AXS101: CPU Card with ARC700 in silicon @ 700 MHz
= AXS103: CPU Card with HS38x in FPGA
- Refactoring of ARCompact port to accomodate new ARCv2 ISA
- Miscll updates/cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVk0g8AAoJEGnX8d3iisJecqsQAI6gvBC4GSNYDrmgGJJK1uLQ
uf6ZXQRLBtyxwa6VMvaNFe91i5XV5WvEXDuNBQX4FdYbp7Fs+Jz5VK79xFtbVEdU
H6mgKcs9HBwQvrHBxl54XxxXfX7kD1kxrlV7cL4b7bXTEX0XyH5ROUj600/YP+B4
8t+XdYcfgFK0HpeFGXVP+Xmv/e+hBbzCpOjOd2ZFqEwymvSpZDc4KZ2yDvV2+Ybn
JNZ421urQOrxR27njvvPvtpeN7uuJKfRYq7IuIR8+Ad72S19EDdw+DZHp2XoUMXA
wgydWrrOaX2Dr2CmXHGA1C4nWEG7+Yo9I1WitjJct0tkOQyDR2OIDGmvKGBd1uoS
QsihtoKBRvns+2gpXBEOmOHmF6ggpHNN0ppIwCp+AK5kX3fmxBtyUekyYmVpg8oQ
xgFIuJgmiAvW7QB7xIO6SFFt18De2ifDRrKWJwVauvfW/PvUIwuUBEcbh0OHAn54
ebUUWu2ZdVNe0XCsZOAQGwYHZRWBk8Bn3bhFpNnOliRiF77e9GsKeGYeIswYFy7I
42Gp35ftEj1pLLFZ1vIsAo72N6ErmHwPOcJkaBYaTbPGPcTEO2aR6b8WOcCjsPxK
DUeUV3H2HV+6V4jw/96lnsaRqsaj4TsJxEAFRR3wT1DLoRudCIDubaXTdvvDie77
RgKn4ZdxgmXD97+deBqc
=KwNo
-----END PGP SIGNATURE-----
Merge tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC architecture updates from Vineet Gupta:
- support for HS38 cores based on ARCv2 ISA
ARCv2 is the next generation ISA from Synopsys and basis for the
HS3{4,6,8} families of processors which retain the traditional ARC mantra of
low power and configurability and are now more performant and feature rich.
HS38x is a 10 stage pipeline core which supports MMU (with huge pages) and
SMP (upto 4 cores) among other features.
+ www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor
+ http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications
+ http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps
- support for ARC SDP (Software Development platform): Main Board + CPU Cards
= AXS101: CPU Card with ARC700 in silicon @ 700 MHz
= AXS103: CPU Card with HS38x in FPGA
- refactoring of ARCompact port to accomodate new ARCv2 ISA
- misc updates/cleanups
* tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (72 commits)
ARC: Fix build failures for ARCompact in linux-next after ARCv2 support
ARCv2: Allow older gcc to cope with new regime of ARCv2/ARCompact support
ARCv2: [vdk] dts files and defconfig for HS38 VDK
ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores
ARC: [axs101] Prepare for AXS103
ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores
ARCv2: All bits in place, allow ARCv2 builds
ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency)
ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
ARC: Reduce bitops lines of code using macros
ARCv2: barriers
arch: conditionally define smp_{mb,rmb,wmb}
ARC: add smp barriers around atomics per Documentation/atomic_ops.txt
ARC: add compiler barrier to LLSC based cmpxchg
ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution
ARCv2: SMP: clocksource: Enable Global Real Time counter
ARCv2: SMP: ARConnect debug/robustness
ARCv2: SMP: Support ARConnect (MCIP) for Inter-Core-Interrupts et al
ARC: make plat_smp_ops weak to allow over-rides
ARCv2: clocksource: Introduce 64bit local RTC counter
...
- CONFIG_ARC_UBOOT_SUPPORT to handle arguments passed in r0, r1, r2
- CONFIG_DEVTMPFS_MOUNT for mouting rootfs since it uses external cpio
for rootfs
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Ruud Derwig <rderwig@synopsys.com>
[vgupta: folded the Main baord DT files for smp/up into one]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
L2 cache on ARCHS processors is called SLC (System Level Cache)
For working DMA (in absence of hardware assisted IO Coherency) we need
to manage SLC explicitly when buffers transition between cpu and
controllers.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Caveats about cache flush on ARCv2 based cores
- dcache is PIPT so paddr is sufficient for cache maintenance ops (no
need to setup PTAG reg
- icache is still VIPT but only aliasing configs need PTAG setup
So basically this is departure from MMU-v3 which always need vaddr in
line ops registers (DC_IVDL, DC_FLDL, IC_IVIL) but paddr in DC_PTAG,
IC_PTAG respectively.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- Remove the ifdef'ery and write distinct versions for each mmu ver even
if there is some code duplication
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
That is because __after_dc_op() already reads it for status check, so it
is better anyways to use that "newer" value.
Also reduces the clutter in callers for passing from/to these routines.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).
In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().
Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.
faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.
This patch is based on a patch from Thomas Gleixner.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 609838cfed ("mm: invoke oom-killer from remaining unconverted page
fault handlers") converted arc to call pagefault_out_of_memory(), so remove
the comment about future conversion.
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
INV cmd for dcache provides 2 modes discard or wback-before-discard.
One is default and other needs to be set, if so desired. This is common
for line-op/entire-cache-op. So refactor them out into a helper
Doesn't affect generated code but paves way for any common
micro-optimization.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* print aliasing or not, VIPT/PIPT etc
* compress param storage using bitfields
* more use of IS_ENABLED to de-uglify code
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
icaches are not snooped hence not cohrent in SMP setups which means
kernel has to do cross core calls to ensure the same.
The leaf routine __ic_line_inv_vaddr() now does cross core calls.
__sync_icache_dcache() is affected due to this:
* local dcache line flushed ahead of remote icache inv requests
* can't disable interrupts anymore, since
__ic_line_inv_vaddr()->on_each_cpu() can deadlock.
| WARNING: CPU: 0 PID: 1 at kernel/smp.c:374
| smp_call_function_many+0x25a/0x2c4()
|
| init_kprobes+0x90/0xc8
| register_kprobe+0x1d6/0x510
| __sync_icache_dcache+0x28/0x80
|
| DISABLE IRQ
|
| __ic_line_inv_vaddr
| on_each_cpu
| smp_call_function_many+0x25a/0x2c4 --> WARN
| __ic_line_inv_vaddr_local
| __dc_line_op
* TODO: Needs to use mask of relevant CPUs to avoid broadcasting
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With commit 9df62f0544 "arch: use ASM_NL instead of ';'" the generic
macros can handle the arch specific newline quirk. Hence we can get rid
of ARC asm macros and use the "C" style macros.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This fixes a subtle issue with cache flush which could potentially cause
random userspace crashes because of stale icache lines.
This error crept in when consolidating the cache flush code
Fixes: bd12976c36 (ARC: cacheflush refactor #3: Unify the {d,i}cache)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 3.13
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
- Add mm_cpumask setting (aggregating only, unlike some other arches)
used to restrict the TLB flush cross-calling
- cross-calling versions of TLB flush routines (thanks to Noam)
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
------------------>8----------------------
arch/arc/mm/tlb.c: In function ‘do_tlb_overlap_fault’:
arch/arc/mm/tlb.c:688:13: warning: array subscript is above array bounds
[-Warray-bounds]
(pd0[n] & PAGE_MASK)) {
^
------------------>8----------------------
While at it, remove the usless last iteration of outer loop when reading
a TLB SET for duplicate entries.
Suggested-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
switch the args (address, pt_regs) to match with all the other "C"
exception handlers.
This removes the awkwardness in EV_ProtV for page fault vs. unaligned
access.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Line op needs vaddr (indexing) and paddr (tag match). For page sized
flushes (V-P const), each line op will need a different index, but the
tag bits wil remain constant, hence paddr can be setup once outside the
loop.
This improves select LMBench numbers for Aliasing dcache where we have
more "preventive" cache flushing.
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.11-rc7- Linux 3.11.0- 80 4.66 8.88 69.7 112. 268. 8.60 28.0 3489 13.K 27.K # Non alias ARC700
3.11-rc7- Linux 3.11.0- 80 4.64 8.51 68.6 98.5 271. 8.58 28.1 4160 15.K 32.K # Aliasing
3.11-rc7- Linux 3.11.0- 80 4.64 8.51 69.8 99.4 270. 8.73 27.5 3880 15.K 31.K # PTAG loop Inv
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With Line length being constant now, we can fold the 2 helpers into 1.
This allows applying any optimizations (forthcoming) to single place.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC dcache supports 3 ops - Inv, Flush, Flush-n-Inv.
The programming model however provides 2 commands FLUSH, INV.
INV will either discard or flush-n-discard (based on DT_CTRL bit)
The leaf helper __dc_line_loop() used to take the AUX register
(corresponding to the 2 commands). Now we push that to within the
helper, paving way for code consolidations to follow.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current
task's "active_mm". ARC vmalloc fault handler however was using mm.
A vmalloc fault for non user task context (actually pre-userland, from
init thread's open for /dev/console) caused the handler to deref NULL mm
(for mm->pgd)
The reasons it worked so far is amazing:
1. By default (!SMP), vmalloc fault handler uses a cached value of PGD.
In SMP that MMU register is repurposed hence need for mm pointer deref.
2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in
pre-userland code path - it was introduced with commit 20bafb3d23
"n_tty: Move buffers into n_tty_data"
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: stable@vger.kernel.org #3.10 and 3.11
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All arches do essentially the same thing now for
early_init_dt_setup_initrd_arch, so it can now be removed.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.
Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved. They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.
This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.
Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.
Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM. OOM
handling is restricted to user triggered faults that have no other
option.
Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.
Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.
This patch:
Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.
Now that all fault handlers call into the generic OOM killer (see commit
609838cfed97: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding the
entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSL12LAAoJEEFnBt12D9kB64gP/RBipnYbo3RPanHg+lE/J1V7
KSVFNGKWJHxTg47VVC1YJGIG21jqxAilpdS2MQL5FP7iyd+IzvtHpQiJgp+2G+pq
di06yrdyrYErxRgZgGQi8IpR538ZzOEVLCKJGdb09YelkRzPT5au7CC1MAsX3qco
yba7PHk0/Nc4hZE4aGbgR1DlRmn86ob7mM0KFE/LORaSN2BueMgWcwKhQXYNGyoh
assX4yNhAbUG6Bgw7paBLDGqHh8c5Ei5AppU8yPb+N094jgYHBJryUoDlzzUHD23
qqiEqHhUKT0TpgHNs8KH0WZFugcmjKvYEbzdzadBxqfXnJN4fKSEcdfF3iz4T14j
U6EZks89GoHwA523OghUZkKNOqlsUdWfdKz+8/grQqKisYwDcf3fCxEYk/4weDCQ
b6fFlOv6+AI3btjXp6F511ZKxyT4ZZzkHjp/ZSrhBygyamNZfax0ma0j+ZS9AZql
kPxQS0nOve6NKaP7vXxMmW5sGMnL19ER/Hm31wthGcWI43GVebUdklnzfGaEeSjs
pmP8oiCNemceqVpiPKxcOxiguf/eyIjP1SFXbguASygUmQeTDbbJ8n1FYznCitue
xJgWttKWsEf/aMR3eJtQ3aBmHR3rijAV4E28Wlq8XMkocwvpQm2zMocS2Z5BJ80S
hi1kQVy8+RxNX96tOSp1
=GSWl
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree core updates from Grant Likely:
"Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding
the entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either"
Tim Bird questions whether the boot time cost of the random feeding may
be noticeable. And "add_device_randomness()" is definitely not some
speed deamon of a function.
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
of/platform: add error reporting to of_amba_device_create()
irq/of: Fix comment typo for irq_of_parse_and_map
of: Feed entire flattened device tree into the random pool
of/fdt: Clean up casting in unflattening path
of/fdt: Remove duplicate memory clearing on FDT unflattening
gpio: implement gpio-ranges binding document fix
of: call __of_parse_phandle_with_args from of_parse_phandle
of: introduce of_parse_phandle_with_fixed_args
of: move of_parse_phandle()
of: move documentation of of_parse_phandle_with_args
of: Fix missing memory initialization on FDT unflattening
of: consolidate definition of early_init_dt_alloc_memory_arch()
of: Make of_get_phy_mode() return int i.s.o. const int
include: dt-binding: input: create a DT header defining key codes.
of/platform: Staticize of_platform_device_create_pdata()
of: Specify initrd location using 64-bit
dt: Typo fix
OF: make of_property_for_each_{u32|string}() use parameters if OF is not enabled
This helps remove asid-to-mm reverse map
While mm->context.id contains the ASID assigned to a process, our ASID
allocator also used asid_mm_map[] reverse map. In a new allocation
cycle (mm->ASID >= @asid_cache), the Round Robin ASID allocator used this
to check if new @asid_cache belonged to some mm2 (from prev cycle).
If so, it could locate that mm using the ASID reverse map, and mark that
mm as unallocated ASID, to force it to refresh at the time of switch_mm()
However, for SMP, the reverse map has to be maintained per CPU, so
becomes 2 dimensional, hence got rid of it.
With reverse map gone, it is NOT possible to reach out to current
assignee. So we track the ASID allocation generation/cycle and
on every switch_mm(), check if the current generation of CPU ASID is
same as mm's ASID; If not it is refreshed.
(Based loosely on arch/sh implementation)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ASID allocation changes/1
This patch does 2 things:
(1) get_new_mmu_context() NOW moves mm->ASID to a new value ONLY if it
was from a prev allocation cycle/generation OR if mm had no ASID
allocated (vs. before would unconditionally moving to a new ASID)
Callers desiring unconditional update of ASID, e.g.local_flush_tlb_mm()
(for parent's address space invalidation at fork) need to first force
the parent to an unallocated ASID.
(2) get_new_mmu_context() always sets the MMU PID reg with unchanged/new
ASID value.
The gains are:
- consolidation of all asid alloc logic into get_new_mmu_context()
- avoiding code duplication in switch_mm() for PID reg setting
- Enables future change to fold activate_mm() into switch_mm()
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-Asm code already has values of SW and HW ASID values, so they can be
passed to the printing routine.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This reorganizes the current TLB operations into psuedo-ops to better
pair with MMUv4's native Insert/Delete operations
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>