Commit Graph

157 Commits

Author SHA1 Message Date
Kirill A. Shutemov 0ebd744615 arm, thp: remove infrastructure for handling splitting PMDs
With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.

[arnd@arndb.de: fix unterminated ifdef in header file]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Russell King 598bcc6ea6 Merge branches 'misc' and 'misc-rc6' into for-linus 2016-01-05 11:07:28 +00:00
Nicolas Pitre 42f25bddd0 ARM: 8477/1: runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()
The ARM compiler inserts calls to __aeabi_idiv() and
__aeabi_uidiv() when it needs to perform division on signed and
unsigned integers. If a processor has support for the sdiv and
udiv instructions, the kernel may overwrite the beginning of those
functions with those instructions and a "bx lr" to get better
performance.

To ensure that those functions are aligned to a 32-bit word for easier
patching (which might not always be the case in Thumb mode) and that
the two patched instructions end up in the same cache line, a 8-byte
alignment is enforced when ARM_PATCH_IDIV is selected.

This was heavily inspired by a previous patch from Stephen Boyd.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-12-17 10:29:01 +00:00
Russell King c014953d84 ARM: fix uaccess_with_memcpy() with SW_DOMAIN_PAN
The uaccess_with_memcpy() code is currently incompatible with the SW
PAN code: it takes locks within the region that we've changed the DACR,
potentially sleeping as a result.  As we do not save and restore the
DACR across co-operative sleep events, can lead to an incorrect DACR
value later in this code path.

Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-12-15 11:51:02 +00:00
Stephen Boyd 6f56a68d0b ARM: 8438/1: Add unwinding to __clear_user_std()
Add unwinding annotations so that unwinding from this function
works properly.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-10-03 16:36:45 +01:00
Russell King 40d3f02851 Merge branches 'cleanup', 'fixes', 'misc', 'omap-barrier' and 'uaccess' into for-linus 2015-09-03 15:28:37 +01:00
Russell King a5e090acbf ARM: software-based priviledged-no-access support
Provide a software-based implementation of the priviledged no access
support found in ARMv8.1.

Userspace pages are mapped using a different domain number from the
kernel and IO mappings.  If we switch the user domain to "no access"
when we enter the kernel, we can prevent the kernel from touching
userspace.

However, the kernel needs to be able to access userspace via the
various user accessor functions.  With the wrapping in the previous
patch, we can temporarily enable access when the kernel needs user
access, and re-disable it afterwards.

This allows us to trap non-intended accesses to userspace, eg, caused
by an inadvertent dereference of the LIST_POISON* values, which, with
appropriate user mappings setup, can be made to succeed.  This in turn
can allow use-after-free bugs to be further exploited than would
otherwise be possible.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-26 20:34:24 +01:00
Russell King 3fba7e23f7 ARM: uaccess: provide uaccess_save_and_enable() and uaccess_restore()
Provide uaccess_save_and_enable() and uaccess_restore() to permit
control of userspace visibility to the kernel, and hook these into
the appropriate places in the kernel where we need to access
userspace.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-25 16:14:43 +01:00
Nicolas Pitre 0f64b247e6 ARM: 8414/1: __copy_to_user_memcpy: fix mmap semaphore usage
The mmap semaphore should not be taken when page faults are disabled.
Since pagefault_disable() no longer disables preemption, we now need
to use faulthandler_disabled() in place of in_atomic().

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-18 13:59:59 +01:00
Russell King 1bd46782d0 ARM: avoid unwanted GCC memset()/memcpy() optimisations for IO variants
We don't want GCC optimising our memset_io(), memcpy_fromio() or
memcpy_toio() variants, so we must not call one of the standard
functions.  Provide a separate name for our assembly memcpy() and
memset() functions, and use that instead, thereby bypassing GCC's
ability to optimise these operations.

GCCs optimisation may introduce unaligned accesses which are invalid
for device mappings.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-07-03 20:46:15 +01:00
Linus Torvalds e8a0b37d28 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "Bigger items included in this update are:

   - A series of updates from Arnd for ARM randconfig build failures
   - Updates from Dmitry for StrongARM SA-1100 to move IRQ handling to
     drivers/irqchip/
   - Move ARMs SP804 timer to drivers/clocksource/
   - Perf updates from Mark Rutland in preparation to move the ARM perf
     code into drivers/ so it can be shared with ARM64.
   - MCPM updates from Nicolas
   - Add support for taking platform serial number from DT
   - Re-implement Keystone2 physical address space switch to conform to
     architecture requirements
   - Clean up ARMv7 LPAE code, which goes in hand with the Keystone2
     changes.
   - L2C cleanups to avoid unlocking caches if we're prevented by the
     secure support to unlock.
   - Avoid cleaning a potentially dirty cache containing stale data on
     CPU initialisation
   - Add ARM-only entry point for secondary startup (for machines that
     can only call into a Thumb kernel in ARM mode).  Same thing is also
     done for the resume entry point.
   - Provide arch_irqs_disabled via asm-generic
   - Enlarge ARMv7M vector table
   - Always use BFD linker for VDSO, as gold doesn't accept some of the
     options we need.
   - Fix an incorrect BSYM (for Thumb symbols) usage, and convert all
     BSYM compiler macros to a "badr" (for branch address).
   - Shut up compiler warnings provoked by our cmpxchg() implementation.
   - Ensure bad xchg sizes fail to link"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (75 commits)
  ARM: Fix build if CLKDEV_LOOKUP is not configured
  ARM: fix new BSYM() usage introduced via for-arm-soc branch
  ARM: 8383/1: nommu: avoid deprecated source register on mov
  ARM: 8391/1: l2c: add options to overwrite prefetching behavior
  ARM: 8390/1: irqflags: Get arch_irqs_disabled from asm-generic
  ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap
  ARM: 8388/1: tcm: Don't crash when TCM banks are protected by TrustZone
  ARM: 8384/1: VDSO: force use of BFD linker
  ARM: 8385/1: VDSO: group link options
  ARM: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
  ARM: remove __bad_xchg definition
  ARM: 8369/1: ARMv7M: define size of vector table for Vybrid
  ARM: 8382/1: clocksource: make ARM_TIMER_SP804 depend on GENERIC_SCHED_CLOCK
  ARM: 8366/1: move Dual-Timer SP804 driver to drivers/clocksource
  ARM: 8365/1: introduce sp804_timer_disable and remove arm_timer.h inclusion
  ARM: 8364/1: fix BE32 module loading
  ARM: 8360/1: add secondary_startup_arm prototype in header file
  ARM: 8359/1: correct secondary_startup_arm mode
  ARM: proc-v7: sanitise and document registers around errata
  ARM: proc-v7: clean up MIDR access
  ...
2015-06-26 12:20:00 -07:00
Linus Torvalds cb8a4deaf9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "As usual, mostly comment, kerneldoc and printk() fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  lpfc: Grammar s/an negative/a negative/
  ARM: lib/lib1funcs.S: fix typo s/substractions/subtractions/
  cx25821: cx25821-medusa-reg.h: fix 0x0x prefix
  lib: crc-itu-t.[ch] fix 0x0x prefix in integer constants
  rapidio: Fix kerneldoc and comment
  qla4xxx: Fix printk() in qla4_83xx_read_reset_template() and qla4_83xx_pre_loopback_config()
  treewide: Kconfig: fix wording / spelling
  usb/serial: fix grammar in Kconfig help text for FTDI_SIO
  megaraid_sas: fix kerneldoc
  netfilter: ebtables: fix comment grammar
  drm/radeon: fix comment
  isdn: fix grammar in comment
  ARM: KVM: fix comment
2015-06-23 14:08:54 -07:00
Antonio Ospite 82350ab167 ARM: lib/lib1funcs.S: fix typo s/substractions/subtractions/
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-05-26 15:28:34 +02:00
Russell King 14327c6628 ARM: replace BSYM() with badr assembly macro
BSYM() was invented to allow us to work around a problem with the
assembler, where local symbols resolved by the assembler for the 'adr'
instruction did not take account of their ISA.

Since we don't want BSYM() used elsewhere, replace BSYM() with a new
macro 'badr', which is like the 'adr' pseudo-op, but with the BSYM()
mechanics integrated into it.  This ensures that the BSYM()-ification
is only used in conjunction with 'adr'.

Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-05-08 17:33:50 +01:00
Russell King 57ca654bef ARM: ensure delay timer has sufficient accuracy for delays
We have recently had an example of someone wanting to use a 90kHz timer
for the software delay loop.

udelay() needs to have at least microsecond resolution to allow drivers
access to a delay mechanism with a reasonable chance of delaying the
period they requested within at least a 50% marging of error, especially
for small delays.

Discussion about the udelay() accuracy can be found at:
	https://lkml.org/lkml/2011/1/9/37

Reject timers which are unable to supply this level of resolution.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:07 +01:00
Ard Biesheuvel c4a84ae39b ARM: 8322/1: keep .text and .fixup regions closer together
This moves all fixup snippets to the .text.fixup section, which is
a special section that gets emitted along with the .text section
for each input object file, i.e., the snippets are kept much closer
to the code they refer to, which helps prevent linker failure on
large kernels.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-03-29 23:11:56 +01:00
Nicolas Pitre c25630381c ARM: 8285/1: remove ARMv3 user access code again
This code was restored with commit 080fc66fb5 ("ARM: Bring back ARMv3 IO
and user access code") because the RiscPC memory bus does not understand
half-word load/stores.  However only the IO code needed restoring since
the alternative user access code contains no half-word accesses, is
already used when CONFIG_PREEMPT is set and runs faster on a StrongARM.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-16 14:49:08 +00:00
Lin Yongting 279f487e0b ARM: 8225/1: Add unwinding support for memory copy functions
The memory copy functions(memcpy, __copy_from_user, __copy_to_user)
never had unwinding annotations added. Currently, when accessing
invalid pointer by these functions occurs the backtrace shown will
stop at these functions or some completely unrelated function.
Add unwinding annotations in hopes of getting a more useful backtrace
in following cases:
1. die on accessing invalid pointer by these functions
2. kprobe trapped at any instruction within these functions
3. interrupted at any instruction within these functions

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-11-27 16:00:25 +00:00
Lin Yongting 207a6cb069 ARM: 8224/1: Add unwinding support for memmove function
The memmove function never had unwinding annotations added.
Currently, when accessing invalid pointer by memmove occurs the
backtrace shown will stop at memmove or some completely unrelated
function. Add unwinding annotations in hopes of getting a more
useful backtrace in following cases:
1. die on accessing invalid pointer by memmove
2. kprobe trapped at any instruction within memmove
3. interrupted at any instruction within memmove

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-11-27 16:00:24 +00:00
Lin Yongting 20cb6abfe0 ARM: 8223/1: Add unwinding support for __memzero function
The __memzero function never had unwinding annotations added.
Currently, when accessing invalid pointer by __memzero occurs the
backtrace shown will stop at __memzero or some completely unrelated
function. Add unwinding annotations in hopes of getting a more
useful backtrace in following cases:
1. die on accessing invalid pointer by __memzero
2. kprobe trapped at any instruction within __memzero
3. interrupted at any instruction within __memzero

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-11-27 16:00:23 +00:00
Lin Yongting c2459d35f5 ARM: 8204/1: Add unwinding support for memset function
The memset function never had unwinding annotations added.
Currently, when accessing NULL pointer by memset occurs the
backtrace shown will stop at memset or some completely unrelated
function. Add unwinding annotations in hopes of getting a more
useful backtrace when accessing NULL pointer by memset, kprobe
or interrupt.

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-11-21 15:24:49 +00:00
Victor Kamensky d9981380b4 ARM: 8137/1: fix get_user BE behavior for target variable with size of 8 bytes
e38361d 'ARM: 8091/2: add get_user() support for 8 byte types' commit
broke V7 BE get_user call when target var size is 64 bit, but '*ptr' size
is 32 bit or smaller. e38361d changed type of __r2 from 'register
unsigned long' to 'register typeof(x) __r2 asm("r2")' i.e before the change
even when target variable size was 64 bit, __r2 was still 32 bit.
But after e38361d commit, for target var of 64 bit size, __r2 became 64
bit and now it should occupy 2 registers r2, and r3. The issue in BE case
that r3 register is least significant word of __r2 and r2 register is most
significant word of __r2. But __get_user_4 still copies result into r2 (most
significant word of __r2). Subsequent code copies from __r2 into x, but
for situation described it will pick up only garbage from r3 register.

Special __get_user_64t_(124) functions are introduced. They are similar to
corresponding __get_user_(124) function but result stored in r3 register
(lsw in case of 64 bit __r2 in BE image). Those function are used by
get_user macro in case of BE and target var size is 64bit.

Also changed __get_user_lo8 name into __get_user_32t_8 to get consistent
naming accross all cases.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-09-12 17:38:59 +01:00
Linus Torvalds b3345d7c57 ARM: SoC platform changes for 3.17
This is the bulk of new SoC enablement and other platform changes for 3.17:
 
 * Samsung S5PV210 has been converted to DT and multiplatform
 * Clock drivers and bindings for some of the lower-end i.MX 1/2 platforms
 * Kirkwood, one of the popular Marvell platforms, is folded into the
   mvebu platform code, removing mach-kirkwood.
 * Hwmod data for TI AM43xx and DRA7 platforms.
 * More additions of Renesas shmobile platform support
 * Removal of plat-samsung contents that can be removed with S5PV210 being
   multiplatform/DT-enabled and the other two old platforms being removed.
 
 New platforms (most with only basic support right now):
 
 * Hisilicon X5HD2 settop box chipset is introduced
 * Mediatek MT6589 (mobile chipset) is introduced
 * Broadcom BCM7xxx settop box chipset is introduced
 
 + as usual a lot other pieces all over the platform code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJT5Dp+AAoJEIwa5zzehBx3w1sP/0vjT/LQOmC8Lv8RW2Ley2ua
 hNu3HcNPnT/N40JEdU9YNv3q0fdxGgcfKj011CNN+49zPSUf1xduk2wfCAk9yV50
 8Sbt1PfDGm1YyUugGN420CzI431pPoM1OGXHZHkAmg+2J286RtUi3NckB//QDbCY
 QhEjhpYc9SXhAOCGwmB4ab7thOljOFSPzKTLMTu3+PNI5zRPRgkDkt6w9XlsAYmB
 nuR271BnzsROkMzAjycwaJ3kdim7wqrMRfk8g96o0jHSF5qf4zsT5uWYYAjTxdUQ
 8Ajz6zjeHe4+95TwTDcq+lCX6rDLZgwkvCAc6hFbeg0uR7Dyek0h6XMEYtwdjaiU
 KNPwOENrYdENNDAGRpkFp1x4h/rY9Plfru0bBo5o6t7aPBvmNeCDzRtlTtLiUNDV
 dG8sfDMtrS/wFHVjylDSQ60Mb+wuW0XneC8D7chY/iRhIllUYi6YXXvt+/tH5C20
 oYDOWqqcDFSb0sJhE5pn4KBV82ZaHx9jMBWGLl+erg2sDX/SK8SxOkLqKYZKtKB5
 0leOGE3Y+C70xt3G9HftLz2sAvvt+C8UPsApPT+dHNE401TWJOYx6LphPkQKjeeK
 P1iwKi+It3l+FaBypgJy/LeMQRy7EyvDBK2I5WoVL/R2qq14EmP1ui3Tthjj0bhq
 tBBof6P9c8OnRVj1Lz3R
 =5TJ6
 -----END PGP SIGNATURE-----

Merge tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC platform changes from Olof Johansson:
 "This is the bulk of new SoC enablement and other platform changes for
  3.17:

   - Samsung S5PV210 has been converted to DT and multiplatform
   - Clock drivers and bindings for some of the lower-end i.MX 1/2
     platforms
   - Kirkwood, one of the popular Marvell platforms, is folded into the
     mvebu platform code, removing mach-kirkwood
   - Hwmod data for TI AM43xx and DRA7 platforms
   - More additions of Renesas shmobile platform support
   - Removal of plat-samsung contents that can be removed with S5PV210
     being multiplatform/DT-enabled and the other two old platforms
     being removed

  New platforms (most with only basic support right now):

   - Hisilicon X5HD2 settop box chipset is introduced
   - Mediatek MT6589 (mobile chipset) is introduced
   - Broadcom BCM7xxx settop box chipset is introduced

  + as usual a lot other pieces all over the platform code"

* tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
  ARM: hisi: remove smp from machine descriptor
  power: reset: move hisilicon reboot code
  ARM: dts: Add hix5hd2-dkb dts file.
  ARM: debug: Rename Hi3716 to HIX5HD2
  ARM: hisi: enable hix5hd2 SoC
  ARM: hisi: add ARCH_HISI
  MAINTAINERS: add entry for Broadcom ARM STB architecture
  ARM: brcmstb: select GISB arbiter and interrupt drivers
  ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
  ARM: configs: enable SMP in bcm_defconfig
  ARM: add SMP support for Broadcom mobile SoCs
  Documentation: arm: misc updates to Marvell EBU SoC status
  Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
  ARM: mvebu: fix build without platforms selected
  ARM: mvebu: add cpuidle support for Armada 38x
  ARM: mvebu: add cpuidle support for Armada 370
  cpuidle: mvebu: add Armada 38x support
  cpuidle: mvebu: add Armada 370 support
  cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
  ARM: mvebu: export the SCU address
  ...
2014-08-08 11:14:29 -07:00
Daniel Thompson e38361d032 ARM: 8091/2: add get_user() support for 8 byte types
Recent contributions, including to DRM and binder, introduce 64-bit
values in their interfaces. A common motivation for this is to allow
the same ABI for 32- and 64-bit userspaces (and therefore also a shared
ABI for 32/64 hybrid userspaces). Anyhow, the developers would like to
avoid gotchas like having to use copy_from_user().

This feature is already implemented on x86-32 and the majority of other
32-bit architectures. The current list of get_user_8 hold out
architectures are: arm, avr32, blackfin, m32r, metag, microblaze,
mn10300, sh.

Credit:

    My name sits rather uneasily at the top of this patch. The v1 and
    v2 versions of the patch were written by Rob Clark and to produce v4
    I mostly copied code from Russell King and H. Peter Anvin. However I
    have mangled the patch sufficiently that *blame* is rightfully mine
    even if credit should more widely shared.

Changelog:

v5: updated to use the ret macro (requested by Russell King)
v4: remove an inlined add on big endian systems (spotted by Russell King),
    used __ARMEB__ rather than BIG_ENDIAN (to match rest of file),
    cleared r3 on EFAULT during __get_user_8.
v3: fix a couple of checkpatch issues
v2: pass correct size to check_uaccess, and better handling of narrowing
    double word read with __get_user_xb() (Russell King's suggestion)
v1: original

Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-07-18 12:29:34 +01:00
Russell King 6ebbf2ce43 ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+
ARMv6 and greater introduced a new instruction ("bx") which can be used
to return from function calls.  Recent CPUs perform better when the
"bx lr" instruction is used rather than the "mov pc, lr" instruction,
and this sequence is strongly recommended to be used by the ARM
architecture manual (section A.4.1.1).

We provide a new macro "ret" with all its variants for the condition
code which will resolve to the appropriate instruction.

Rather than doing this piecemeal, and miss some instances, change all
the "mov pc" instances to use the new macro, with the exception of
the "movs" instruction and the kprobes code.  This allows us to detect
the "mov pc, lr" case and fix it up - and also gives us the possibility
of deploying this for other registers depending on the CPU selection.

Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
Tested-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-07-18 12:29:04 +01:00
Peter De Schrijver 5930c1a1f7 ARM: choose highest resolution delay timer
In case there are several possible delay timers, choose the one with the
highest resolution. This code relies on the fact secondary CPUs have not yet
been brought online when register_current_timer_delay() is called. This is
ensured by implementing calibration_delay_done(),

Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
2014-06-16 12:48:07 -06:00
Victor Kamensky d98b90ea22 ARM: 7990/1: asm: rename logical shift macros push pull into lspush lspull
Renames logical shift macros, 'push' and 'pull', defined in
arch/arm/include/asm/assembler.h, into 'lspush' and 'lspull'.
That eliminates name conflict between 'push' logical shift macro
and 'push' instruction mnemonic. That allows assembler.h to be
included in .S files that use 'push' instruction.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-02-25 11:33:57 +00:00
Will Deacon c32ffce0f6 ARM: 7984/1: prefetch: add prefetchw invocations for barriered atomics
After a bunch of benchmarking on the interaction between dmb and pldw,
it turns out that issuing the pldw *after* the dmb instruction can
give modest performance gains (~3% atomic_add_return improvement on a
dual A15).

This patch adds prefetchw invocations to our barriered atomic operations
including cmpxchg, test_and_xxx and futexes.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-02-25 11:30:20 +00:00
Kim Phillips 017f161a55 ARM: 7877/1: use built-in byte swap function
Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, and has a very modest benefit on vmlinux size (Linaro gcc
4.8):

text data bss dec hex filename
2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap

6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap

7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-12-29 12:32:45 +00:00
Russell King ef41b5c924 ARM: make kernel oops easier to read
We don't need the offset for the first function name in each backtrace
entry; this needlessly consumes screen space.  This is virtually always
the first or second instruction in the called function.

Also, recognise stmfd instructions which include r10 as a valid stack
saving instruction, and when dumping the registers, dump six registers
per line rather than five, and fix the wrapping.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-12-29 12:32:30 +00:00
Fabio Estevam 11d4bb1bd0 ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation
Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c <__loop_delay>:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c <__loop_delay>
  44:	e1a0f00e 	mov	pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 <__loop_delay>:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 <__loop_delay>
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

, which now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-11-30 22:21:03 +00:00
Will Deacon b7ec699405 ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP
Uwe reported a build failure when targetting a NOMMU platform with my
recent prefetch changes:

  arch/arm/lib/changebit.S: Assembler messages:
  arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
			not allowed for the current base architecture

This is due to use of the .arch_extension mp directive immediately prior
to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
nothing if !CONFIG_SMP, gas will still choke on the directive.

This patch fixes the issue by only emitting the sequence (including the
directive) if CONFIG_SMP=y.

Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-11-20 23:05:53 +00:00
Linus Torvalds f47671e2d8 Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM updates from Russell King:
 "Included in this series are:

   1. BE8 (modern big endian) changes for ARM from Ben Dooks
   2. big.Little support from Nicolas Pitre and Dave Martin
   3. support for LPAE systems with all system memory above 4GB
   4. Perf updates from Will Deacon
   5. Additional prefetching and other performance improvements from Will.
   6. Neon-optimised AES implementation fro Ard.
   7. A number of smaller fixes scattered around the place.

  There is a rather horrid merge conflict in tools/perf - I was never
  notified of the conflict because it originally occurred between Will's
  tree and other stuff.  Consequently I have a resolution which Will
  forwarded me, which I'll forward on immediately after sending this
  mail.

  The other notable thing is I'm expecting some build breakage in the
  crypto stuff on ARM only with Ard's AES patches.  These were merged
  into a stable git branch which others had already pulled, so there's
  little I can do about this.  The problem is caused because these
  patches have a dependency on some code in the crypto git tree - I
  tried requesting a branch I can pull to resolve these, and all I got
  each time from the crypto people was "we'll revert our patches then"
  which would only make things worse since I still don't have the
  dependent patches.  I've no idea what's going on there or how to
  resolve that, and since I can't split these patches from the rest of
  this pull request, I'm rather stuck with pushing this as-is or
  reverting Ard's patches.

  Since it should "come out in the wash" I've left them in - the only
  build problems they seem to cause at the moment are with randconfigs,
  and since it's a new feature anyway.  However, if by -rc1 the
  dependencies aren't in, I think it'd be best to revert Ard's patches"

I resolved the perf conflict roughly as per the patch sent by Russell,
but there may be some differences.  Any errors are likely mine.  Let's
see how the crypto issues work out..

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
  ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
  ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
  ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
  ARM: 7871/1: amba: Extend number of IRQS
  ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
  ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
  ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
  ARM: 7878/1: nommu: Implement dummy early_paging_init()
  ARM: 7876/1: clear Thumb-2 IT state on exception handling
  ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
  ARM: footbridge: fix build warnings for netwinder
  ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
  ARM: fix misplaced arch_virt_to_idmap()
  ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
  ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
  ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
  ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
  ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
  ARM: 7862/1: pcpu: replace __get_cpu_var_uses
  ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
  ...
2013-11-14 08:51:29 +09:00
Russell King df762eccba Merge branch 'devel-stable' into for-next
Conflicts:
	arch/arm/include/asm/atomic.h
	arch/arm/include/asm/hardirq.h
	arch/arm/kernel/smp.c
2013-11-12 10:58:59 +00:00
Steven Capper a3a9ea656d ARM: 7858/1: mm: make UACCESS_WITH_MEMCPY huge page aware
The memory pinning code in uaccess_with_memcpy.c does not check
for HugeTLB or THP pmds, and will enter an infinite loop should
a __copy_to_user or __clear_user occur against a huge page.

This patch adds detection code for huge pages to pin_page_for_write.
As this code can be executed in a fast path it refers to the actual
pmds rather than the vma. If a HugeTLB or THP is found (they have
the same pmd representation on ARM), the page table spinlock is
taken to prevent modification whilst the page is pinned.

On ARM, huge pages are only represented as pmds, thus no huge pud
checks are performed. (For huge puds one would lock the page table
in a similar manner as in the pmd case).

Two helper functions are introduced; pmd_thp_or_huge will check
whether or not a page is huge or transparent huge (which have the
same pmd layout on ARM), and pmd_hugewillfault will detect whether
or not a page fault will occur on write to the page.

Running the following test (with the chunking from read_zero
removed):
 $ dd if=/dev/zero of=/dev/null bs=10M count=1024
Gave:  2.3 GB/s backed by normal pages,
       2.9 GB/s backed by huge pages,
       5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.

After some discussion, it was decided not to adopt the HPAGE_MASK,
as this would have a significant detrimental effect on the overall
system latency due to page_table_lock being held for too long.
This could be revisited if split huge page locks are adopted.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-10-29 11:06:15 +00:00
Will Deacon d779c07dd7 ARM: bitops: prefetch the destination word for write prior to strex
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch prefixes our atomic bitops implementation with prefetchw,
to try and grab the line in exclusive state from the start. The testop
macro is left alone, since the barrier semantics limit the usefulness
of prefetching data.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2013-09-30 16:42:56 +01:00
Linus Walleij 136dfa5eda ARM: delete mach-shark
The Shark machine sub-architecture (also known as DNARD, the
DIGITAL Network Appliance Reference Design) lacks a maintainer
able to apply and test patches to modernize the architecture.

It is suspected that the current kernel, while it compiles,
does not even boot on this machine. The listed maintainer has
expressed that he will not be able to spend any time on the
maintenance for the coming year.

So let's delete it from the kernel for now. It can always be
resurrected with git revert if maintenance is resumed.

As the VIA82c505 PCI adapter was only used by this
architecture, that gets deleted too.

Cc: arm@kernel.org
Cc: Alexander Schulz <alex@shark-linux.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-09-17 12:34:36 +02:00
Ard Biesheuvel 9319206d71 ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled
Commit 0195659 introduced a NEON accelerated version of the xor_blocks()
function, but it needs the changes in this patch to allow it to be built
as a module rather than statically into the kernel.

This patch creates a separate module xor-neon.ko which exports the NEON
inner xor_blocks() functions depended upon by the regular xor.ko if it
is built with CONFIG_KERNEL_MODE_NEON=y

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-09-09 15:24:47 +01:00
Russell King b4f656eea6 Pull branch 'for-rmk' of git://git.linaro.org/people/ardbiesheuvel/linux-arm into devel-stable
Comments from Ard Biesheuvel:

I have included two use cases that I have been using, XOR and RAID-6
checksumming. The former gets a 60% performance boost on the NEON, the
latter over 400%.

ARM: add support for kernel mode NEON

Adds kernel_neon_begin/end (renamed from kernel_vfp_begin/end in the
previous version to de-emphasize the VFP part as VFP code that needs
software assistance is not supported currently.)

Introduces <asm/neon.h> and the Kconfig symbol KERNEL_MODE_NEON. This
has been aligned with Catalin for arm64, so any NEON code that does
not use assembly but intrinsics or the GCC vectorizer (such as my
examples) can potentially be shared between arm and arm64 archs.

ARM: move VFP init to an earlier boot stage

This is needed so the NEON is enabled when the XOR and RAID-6 algo
boot time benchmarks are run.

ARM: be strict about FP exceptions in kernel mode

This adds a check to vfp_support_entry() to flag unsupported uses of
the NEON/VFP in kernel mode. FP exceptions (bounces) are flagged as
a bug, this is because of their potentially intermittent nature.
Exceptions caused by the fact that kernel_neon_begin has not been
called are just routed through the undef handler.

ARM: crypto: add NEON accelerated XOR implementation

This is the xor_blocks() implementation built with -ftree-vectorize,
60% faster than optimized ARM code. It calls in_interrupt() to check
whether the NEON flavor can be used: this should really not be
necessary, but due to xor_blocks'squite generic nature, there is no
telling how exactly people may be using it in the real world.

lib/raid6: add ARM-NEON accelerated syndrome calculation

This is a port of the RAID-6 checksumming code in altivec.uc ported
to use NEON intrinsics. It is about 4x faster than the sequential
code.
2013-07-22 17:46:40 +01:00
Paul Gortmaker 8bd26e3a7e arm: delete __cpuinit/__CPUINIT usage from all ARM users
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
and are flagged as __cpuinit  -- so if we remove the __cpuinit from
the arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
related content into no-ops as early as possible, since that will get
rid of these warnings.  In any case, they are temporary and harmless.

This removes all the ARM uses of the __cpuinit macros from C code,
and all __CPUINIT from assembly code.  It also had two ".previous"
section statements that were paired off against __CPUINIT
(aka .section ".cpuinit.text") that also get removed here.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:52 -04:00
Ard Biesheuvel 01956597cb ARM: crypto: add NEON accelerated XOR implementation
Add a source file xor-neon.c (which is really just the reference
C implementation passed through the GCC vectorizer) and hook it
up to the XOR framework.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
2013-07-08 22:09:06 +01:00
Will Deacon 6f3d90e556 ARM: 7685/1: delay: use private ticks_per_jiffy field for timer-based delay ops
Commit 70264367a2 ("ARM: 7653/2: do not scale loops_per_jiffy when
using a constant delay clock") fixed a problem with our timer-based
delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly
by the timer delay ops.

This patch fixes the problem in a more elegant way by keeping a private
ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy
and therefore not subject to scaling. The loop-based delay continues to
use loops_per_jiffy directly, as it should.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-04-03 16:45:50 +01:00
Nicolas Pitre 418df63ada ARM: 7670/1: fix the memset fix
Commit 455bd4c430 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value.  However the memset itself became broken
by that patch for misaligned pointers.

This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.

Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.

While at it, the entry instructions are slightly reworked to help dual
issue pipelines.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-03-12 12:18:47 +00:00
Ivan Djelic 455bd4c430 ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-03-07 16:14:22 +00:00
Nicolas Pitre 70264367a2 ARM: 7653/2: do not scale loops_per_jiffy when using a constant delay clock
When udelay() is implemented using an architected timer, it is wrong
to scale loops_per_jiffy when changing the CPU clock frequency since
the timer clock remains constant.

The lpj should probably become an implementation detail relevant to
the CPU loop based delay routine only and more confined to it. In the
mean time this is the minimal fix needed to have expected delays with
the timer based implementation when cpufreq is also in use.

Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-02-21 13:25:36 +00:00
Arnd Bergmann f3accb122f ARM: export default read_current_timer
read_current_timer is used by get_cycles since "ARM: 7538/1: delay:
add registration mechanism for delay timer sources", and get_cycles
can be used by device drivers in loadable modules, so it has to
be exported.

Without this patch, building imote2_defconfig fails with

ERROR: "read_current_timer" [crypto/tcrypt.ko] undefined!

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Jonathan Austin <jonathan.austin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
2012-10-09 20:24:36 +02:00
Russell King ceaa1a13c0 Merge branch 'arch-timers' into for-linus
Conflicts:
	arch/arm/include/asm/timex.h
	arch/arm/lib/delay.c
2012-10-04 23:02:26 +01:00
Jonathan Austin 56942fec06 ARM: 7538/1: delay: add registration mechanism for delay timer sources
The current timer-based delay loop relies on the architected timer to
initiate the switch away from the polling-based implementation. This is
unfortunate for platforms without the architected timers but with a
suitable delay source (that is, constant frequency, always powered-up
and ticking as long as the CPUs are online).

This patch introduces a registration mechanism for the delay timer
(which provides an unconditional read_current_timer implementation) and
updates the architected timer code to use the new interface.

Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-09-26 22:57:52 +01:00
Will Deacon beafa0de3d ARM: 7529/1: delay: set loops_per_jiffy when moving to timer-based loop
The delay functions may be called by some platforms between switching to
the timer-based delay loop but before calibration. In this case, the
initial loops_per_jiffy may not be suitable for the timer (although a
compromise may be achievable) and delay times may be considered too
inaccurate.

This patch updates loops_per_jiffy when switching to the timer-based
delay loop so that delays are consistent prior to calibration.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-09-09 17:28:48 +01:00
Russell King 8404663f81 ARM: 7527/1: uaccess: explicitly check __user pointer when !CPU_USE_DOMAINS
The {get,put}_user macros don't perform range checking on the provided
__user address when !CPU_HAS_DOMAINS.

This patch reworks the out-of-line assembly accessors to check the user
address against a specified limit, returning -EFAULT if is is out of
range.

[will: changed get_user register allocation to match put_user]
[rmk: fixed building on older ARM architectures]

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-09-09 17:28:47 +01:00