linux/arch/arm/lib/Makefile

#
# linux/arch/arm/lib/Makefile
#
# Copyright (C) 1995-2000 Russell King
#

lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
		   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
		   memmove.o memset.o memzero.o setbit.o              \
		   strchr.o strrchr.o                                 \
		   testchangebit.o testclearbit.o testsetbit.o        \
		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
		   ucmpdi2.o lib1funcs.o div64.o                      \
		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
		   call_with_stack.o bswapsdi2.o

mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o

# the code in uaccess.S is not preemption safe and
# probably faster on ARMv3 only
ifeq ($(CONFIG_PREEMPT),y)
  mmu-y	+= copy_from_user.o copy_to_user.o
else
ifneq ($(CONFIG_CPU_32v3),y)
  mmu-y	+= copy_from_user.o copy_to_user.o
else
  mmu-y	+= uaccess.o
endif
endif

# using lib_ here won't override already available weak symbols
obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o

lib-$(CONFIG_MMU) += $(mmu-y)

ifeq ($(CONFIG_CPU_32v3),y)
  lib-y	+= io-readsw-armv3.o io-writesw-armv3.o
else
  lib-y	+= io-readsw-armv4.o io-writesw-armv4.o
endif

lib-$(CONFIG_ARCH_RPC)		+= ecard.o io-acorn.o floppydma.o

$(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
$(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S

ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
  NEON_FLAGS			:= -mfloat-abi=softfp -mfpu=neon
  CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
  obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
endif
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-17 06:20:36 +08:00			`#`
			`# linux/arch/arm/lib/Makefile`
			`#`
			`# Copyright (C) 1995-2000 Russell King`
			`#`

			`lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \`
			`csumpartialcopy.o csumpartialcopyuser.o clearbit.o \`
ARM: 7452/1: delay: allow timer-based delay implementation to be selected This patch allows a timer-based delay implementation to be selected by switching the delay routines over to use get_cycles, which is implemented in terms of read_current_timer. This further allows us to skip the loop calibration and have a consistent delay function in the face of core frequency scaling. To avoid the pain of dealing with memory-mapped counters, this implementation uses the co-processor interface to the architected timers when they are available. The previous loop-based implementation is kept around for CPUs without the architected timers and we retain both the maximum delay (2ms) and the corresponding conversion factors for determining the number of loops required for a given interval. Since the indirection of the timer routines will only work when called from C, the sa1100 sleep routines are modified to branch to the loop-based delay functions directly. Tested-by: Shinya Kuribayashi <shinya.kuribayashi.px@renesas.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2012-07-06 22:47:17 +08:00			`delay.o delay-loop.o findbit.o memchr.o memcpy.o \`
[ARM] 2947/1: copy template with new memcpy/memmove Patch from Nicolas Pitre This patch provides a new implementation for optimized memory copy functions on ARM. It is made of two levels: a template that consists of the core copy code and separate files that define macros to be used with the core code depending on the type of copy needed. This allows for best performances while sharing the same core for implementing memcpy(), copy_from_user() and copy_to_user() for instance. Two reasons for this work: 1) the current copy_to_user/copy_from_user implementation assumes no task switch will ever occur in the middle of each copied page making it completely unsafe with CONFIG_PREEMPT=y. 2) current copy implementations are measurably suboptimal and optimizing different implementations separately is a pain and more opportunities for bugs. The reason for (1) is the fact that copy inside user pages are performed with the ldm instruction which has no mean for testing user protections and could possibly race with process preemption bypassing the COW mechanism for example. This is a longstanding issue that we said ought to be fixed for about two years now. The solution is to substitute those ldm insns with a series of ldrt or strt insns to enforce user memory protection. At least on StrongARM and XScale cores the ldm is not faster than the equivalent ldr/str insns with a warm i-cache so there is no measurable performance degradation with that change. The fact that the copy code is a template makes it pretty easy to reuse the same core code as for memcpy and benefit from the same performance optimizations. Now (2) is best demonstrated with actual throughput measurements. First, here is a summary of memcopy tests performed on a StrongARM core: PTR alignment buffer size kernel version this version ------------------------------------------------------------ aligned 32 59.73 107.43 unaligned 32 61.31 74.72 aligned 100 132.47 136.15 unaligned 100 103.84 123.76 aligned 4096 130.67 130.80 unaligned 4096 130.68 130.64 aligned 1048576 68.03 68.18 unaligned 1048576 68.03 68.18 The buffer size is in bytes and the measured speed in MB/s. The copy was performed repeatedly with given buffer and throughput averaged over 3 seconds. Here we can see that the current kernel version has a higher entry cost that shows up with small buffers. As buffer size grows both implementation converge to the same throughput. Now here's the exact same test performed on an XScale core (PXA255): PTR alignment buffer size kernel version this version ------------------------------------------------------------ aligned 32 46.99 77.58 unaligned 32 53.61 59.59 aligned 100 107.19 136.59 unaligned 100 83.61 97.58 aligned 4096 129.13 129.98 unaligned 4096 128.36 128.53 aligned 1048576 53.76 59.41 unaligned 1048576 33.67 56.96 Again we can see the entry setup cost being higher for the current kernel before getting to the main copy loop. Then throughput results converge as long as the buffer remains in the cache. Then the 1MB case shows more differences probably due to better pld placement and/or less instruction interlocks in this proposed implementation. Disclaimer: The PXA system was running with slower clocks than the StrongARM system so trying to infer any conclusion by comparing those separate sets of results side by side would be completely inappropriate. So... What this patch does is to replace both memcpy and memmove with an implementation based on the provided copy code template. The memmove code is kept separate since it is used only if the memory areas involved do overlap in which case the code is a transposition of the template but with the copy occurring in the opposite direction (trying to fit that mode into the template turned it into a mess not worth it for memmove alone). And obviously both memcpy and memmove were tested with all kinds of pointer alignments and buffer sizes to exercise all code paths for correctness. The next patch will provide the now trivial replacement implementation copy_to_user and copy_from_user. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2005-11-02 03:52:23 +08:00			`memmove.o memset.o memzero.o setbit.o \`
[ARM] 2948/1: new preemption safe copy_{to\|from}_user implementation Patch from Nicolas Pitre This patch provides a preemption safe implementation of copy_to_user and copy_from_user based on the copy template also used for memcpy. It is enabled unconditionally when CONFIG_PREEMPT=y. Otherwise if the configured architecture is not ARMv3 then it is enabled as well as it gives better performances at least on StrongARM and XScale cores. If ARMv3 is not too affected or if it doesn't matter too much then uaccess.S could be removed altogether. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2005-11-02 03:52:24 +08:00			`strchr.o strrchr.o \`
			`testchangebit.o testclearbit.o testsetbit.o \`
[ARM] 2946/2: split --arch_clear_user() out of lib/uaccess.S Patch from Nicolas Pitre Required for future enhancement patches. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2005-11-02 03:52:22 +08:00			`ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \`
arm: remove "optimized" SHA1 routines Since commit 1eb19a12bd22 ("lib/sha1: use the git implementation of SHA-1"), the ARM SHA1 routines no longer work. The reason? They depended on the larger 320-byte workspace, and now the sha1 workspace is just 16 words (64 bytes). So the assembly version would overwrite the stack randomly. The optimized asm version is also probably slower than the new improved C version, so there's no reason to keep it around. At least that was the case in git, where what appears to be the same assembly language version was removed two years ago because the optimized C BLK_SHA1 code was faster. Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2011-08-08 05:07:03 +08:00			`ucmpdi2.o lib1funcs.o div64.o \`
ARM: lib: add call_with_stack function for safely changing stack When disabling the MMU, it is necessary to take out a 1:1 identity map of the reset code so that it can safely be executed with and without the MMU active. To avoid the situation where the physical address of the reset code aliases with the virtual address of the active stack (which cannot be included in the 1:1 mapping), it is desirable to change to a new stack at a location which is less likely to alias. This code adds a new lib function, call_with_stack: void call_with_stack(void (fn)(void ), void arg, void sp); which changes the stack to point at the sp parameter, before invoking fn(arg) with the new stack selected. Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <dave.martin@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> 2011-06-08 22:29:00 +08:00			`io-readsb.o io-writesb.o io-readsl.o io-writesl.o \`
ARM: 7877/1: use built-in byte swap function Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, and has a very modest benefit on vmlinux size (Linaro gcc 4.8): text data bss dec hex filename 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2013-11-06 12:15:24 +08:00			`call_with_stack.o bswapsdi2.o`
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-17 06:20:36 +08:00
[ARM] nommu: uaccess tweaks MMUless systems have only one address space for all threads, so both the usual access_ok() checks, and the exception handling do not make much sense. Hence, discard the fixup and exception tables at link time, use memcpy/memset for the user copy/clearing functions, and define the permission check macros to be constants. Some of this patch was derived from the equivalent patch by Hyok S. Choi. Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2006-06-22 03:38:17 +08:00			`mmu-y := clear_user.o copy_page.o getuser.o putuser.o`
ARM: Bring back ARMv3 IO and user access code This partially reverts 357c9c1f07d4546bc3fbc0fd1044d96b114d14ed (ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs). Although we only support StrongARM on the RiscPC, we need to keep the ARMv3 user access code for this platform because the bus does not understand half-word load/stores. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2012-08-13 18:44:13 +08:00
			`# the code in uaccess.S is not preemption safe and`
			`# probably faster on ARMv3 only`
			`ifeq ($(CONFIG_PREEMPT),y)`
			`mmu-y += copy_from_user.o copy_to_user.o`
			`else`
			`ifneq ($(CONFIG_CPU_32v3),y)`
			`mmu-y += copy_from_user.o copy_to_user.o`
			`else`
			`mmu-y += uaccess.o`
			`endif`
			`endif`
[ARM] 2948/1: new preemption safe copy_{to\|from}_user implementation Patch from Nicolas Pitre This patch provides a preemption safe implementation of copy_to_user and copy_from_user based on the copy template also used for memcpy. It is enabled unconditionally when CONFIG_PREEMPT=y. Otherwise if the configured architecture is not ARMv3 then it is enabled as well as it gives better performances at least on StrongARM and XScale cores. If ARMv3 is not too affected or if it doesn't matter too much then uaccess.S could be removed altogether. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2005-11-02 03:52:24 +08:00
[ARM] alternative copy_to_user/clear_user implementation This implements {copy_to,clear}_user() by faulting in the userland pages and then using the regular kernel mem{cpy,set}() to copy the data (while holding the page table lock). This is a win if the regular mem{cpy,set}() implementations are faster than the user copy functions, which is the case e.g. on Feroceon, where 8-word STMs (which memcpy() uses under the right conditions) give significantly higher memory write throughput than a sequence of individual 32bit stores. Here are numbers for page sized buffers on some Feroceon cores: - copy_to_user on Orion5x goes from 51 MB/s to 83 MB/s - clear_user on Orion5x goes from 89MB/s to 314MB/s - copy_to_user on Kirkwood goes from 240 MB/s to 356 MB/s - clear_user on Kirkwood goes from 367 MB/s to 1108 MB/s - copy_to_user on Disco-Duo goes from 248 MB/s to 398 MB/s - clear_user on Disco-Duo goes from 328 MB/s to 1741 MB/s Because the setup cost is non negligible, this is worthwhile only if the amount of data to copy is large enough. The operation falls back to the standard implementation when the amount of data is below a certain threshold. This threshold was determined empirically, however some targets could benefit from a lower runtime determined value for optimal results eventually. In the copy_from_user() case, this technique does not provide any worthwhile performance gain due to the fact that any kind of read access allocates the cache and subsequent 32bit loads are just as fast as the equivalent 8-word LDM. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com> Tested-by: Martin Michlmayr <tbm@cyrius.com> 2009-03-10 02:30:09 +08:00			`# using lib_ here won't override already available weak symbols`
			`obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o`

ARM: Bring back ARMv3 IO and user access code This partially reverts 357c9c1f07d4546bc3fbc0fd1044d96b114d14ed (ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs). Although we only support StrongARM on the RiscPC, we need to keep the ARMv3 user access code for this platform because the bus does not understand half-word load/stores. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2012-08-13 18:44:13 +08:00			`lib-$(CONFIG_MMU) += $(mmu-y)`

			`ifeq ($(CONFIG_CPU_32v3),y)`
			`lib-y += io-readsw-armv3.o io-writesw-armv3.o`
			`else`
			`lib-y += io-readsw-armv4.o io-writesw-armv4.o`
			`endif`

Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-17 06:20:36 +08:00			`lib-$(CONFIG_ARCH_RPC) += ecard.o io-acorn.o floppydma.o`

			`$(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S`
			`$(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S`
ARM: crypto: add NEON accelerated XOR implementation Add a source file xor-neon.c (which is really just the reference C implementation passed through the GCC vectorizer) and hook it up to the XOR framework. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> 2013-05-18 00:51:23 +08:00
			`ifeq ($(CONFIG_KERNEL_MODE_NEON),y)`
			`NEON_FLAGS := -mfloat-abi=softfp -mfpu=neon`
			`CFLAGS_xor-neon.o += $(NEON_FLAGS)`
ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled Commit 0195659 introduced a NEON accelerated version of the xor_blocks() function, but it needs the changes in this patch to allow it to be built as a module rather than statically into the kernel. This patch creates a separate module xor-neon.ko which exports the NEON inner xor_blocks() functions depended upon by the regular xor.ko if it is built with CONFIG_KERNEL_MODE_NEON=y Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> 2013-09-09 22:08:38 +08:00			`obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o`
ARM: crypto: add NEON accelerated XOR implementation Add a source file xor-neon.c (which is really just the reference C implementation passed through the GCC vectorizer) and hook it up to the XOR framework. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> 2013-05-18 00:51:23 +08:00			`endif`