7 Commits

Author SHA1 Message Date
Nicolas Pitre 73e592f3bc ARM: 8504/1: __arch_xprod_64(): small optimization
The tmp variable is used twice: first to pose as a register containing
a value of zero, and then to provide a temporary register that is
initially zero and has some value added to it. But somehow gcc decides
to split those two usages into different registers.

Example code:

u64 div64const1000(u64 x)
{
	u32 y = 1000;
	do_div(x, y);
	return x;
}

Result:

div64const1000:
	push	{r4, r5, r6, r7, lr}
	mov	lr, #0
	mov	r6, r0
	mov	r7, r1
	adr	r5, .L8
	ldrd	r4, [r5]
	mov	r1, lr
	umull	r2, r3, r4, r6
	cmn	r2, r4
	adcs	r3, r3, r5
	adc	r2, lr, #0
	umlal	r3, r2, r5, r6
	umlal	r3, r1, r4, r7
	mov	r3, #0
	adds	r2, r1, r2
	adc	r3, r3, #0
	umlal	r2, r3, r5, r7
	lsr	r0, r2, #9
	lsr	r1, r3, #9
	orr	r0, r0, r3, lsl #23
	pop	{r4, r5, r6, r7, pc}
	.align	3
.L8:
	.word	-1924145349
	.word	-2095944041
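
For readers following the listing, the arithmetic it performs can be sketched in portable C. This is only an illustration, not the kernel's code: the names RECIP_1000, xprod_hi64_biased() and div64const1000_sketch() are made up here. It assumes the two .word values at .L8 form the 64-bit constant 0x83126e978d4fdf3b (low word first, as on a little-endian build), that the cmn/adcs pair adds that constant once as a bias, and that the final shift is 9 bits, as in the listing.

```c
#include <stdint.h>

/* Assumed from the .L8 literal pool: low word 0x8d4fdf3b, high word
 * 0x83126e97, i.e. floor(2^73 / 1000). */
#define RECIP_1000 0x83126e978d4fdf3bULL

/* Hypothetical helper: bits 64..127 of (m * n + m), built from 32-bit
 * partial products so that no intermediate sum overflows 64 bits. */
static uint64_t xprod_hi64_biased(uint64_t m, uint64_t n)
{
	uint32_t m_lo = (uint32_t)m, m_hi = (uint32_t)(m >> 32);
	uint32_t n_lo = (uint32_t)n, n_hi = (uint32_t)(n >> 32);

	/* Lowest partial product plus the low half of the bias. */
	uint64_t low = (uint64_t)m_lo * n_lo + m_lo;

	/* Middle column: carry from below, high half of the bias and the
	 * first cross product. */
	uint64_t mid1 = (uint64_t)m_lo * n_hi + (low >> 32) + m_hi;

	/* Second cross product plus the low 32 bits of the sum so far. */
	uint64_t mid2 = (uint64_t)m_hi * n_lo + (uint32_t)mid1;

	/* Top partial product plus the carries out of the middle column. */
	return (uint64_t)m_hi * n_hi + (mid1 >> 32) + (mid2 >> 32);
}

/* x / 1000 as a multiplication by the scaled reciprocal and a shift. */
static uint64_t div64const1000_sketch(uint64_t x)
{
	return xprod_hi64_biased(RECIP_1000, x) >> 9;
}
```

Under those assumptions, div64const1000_sketch(x) should agree with x / 1000 for every 64-bit x.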

Full kernel build size:

   text	   data	    bss	    dec	    hex	filename
13663814	1553940	 351368	15569122	 ed90e2	vmlinux

Here the two instances of 'tmp' are assigned to r1 and lr.

To avoid that, let's mark the first 'tmp' usage in __arch_xprod_64()
with a "+r" constraint even if the register is not written to, so as to
create a dependency for the second usage, with the effect of enforcing
a single temporary register throughout.
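
The constraint syntax can be sketched as follows. This is a hypothetical illustration, not the actual __arch_xprod_64() code: the function name single_tmp_sketch() is made up, and the asm templates are left empty so that the snippet builds with GCC or Clang on any target, whereas the real code holds umull/umlal instructions there.

```c
#include <stdint.h>

/* Hypothetical sketch of chaining two asm statements through one
 * temporary. The empty templates emit no instructions. */
static uint32_t single_tmp_sketch(uint32_t a)
{
	uint32_t tmp = 0;
	uint32_t res = a;

	/* First use: tmp would only be read here (as a zero register),
	 * but "+r" declares it read-write, so its value afterwards is
	 * formally an output of this statement. */
	__asm__ ("" : "+r" (res), "+r" (tmp));

	/* Second use: tmp is genuinely read-write. Its input is the
	 * output of the statement above, which creates the dependency
	 * between the two uses. */
	__asm__ ("" : "+r" (res), "+r" (tmp));

	return res + tmp;
}
```

With an "r" input constraint on the first statement instead, the compiler would be free to use a separate zero-valued register for each statement.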

Result:

div64const1000:
	push	{r4, r5, r6, r7}
	movs	r3, #0
	adr	r5, .L8
	ldrd	r4, [r5]
	umull	r6, r7, r4, r0
	cmn	r6, r4
	adcs	r7, r7, r5
	adc	r6, r3, #0
	umlal	r7, r6, r5, r0
	umlal	r7, r3, r4, r1
	mov	r7, #0
	adds	r6, r3, r6
	adc	r7, r7, #0
	umlal	r6, r7, r5, r1
	lsr	r0, r6, #9
	lsr	r1, r7, #9
	orr	r0, r0, r7, lsl #23
	pop	{r4, r5, r6, r7}
	bx	lr
	.align	3
.L8:
	.word	-1924145349
	.word	-2095944041

   text	   data	    bss	    dec	    hex	filename
13663438	1553940	 351368	15568746	 ed8f6a	vmlinux

This time 'tmp' is assigned to r3 and used throughout. However, having
it in r3 blocks usage of the r2-r3 double register slot for 64-bit
values, forcing more registers to be spilled to the stack. Let's try to
help gcc by forcing 'tmp' into the caller-saved ip register.

Result:

div64const1000:
	stmfd	sp!, {r4, r5}
	mov	ip, #0
	adr	r5, .L8
	ldrd	r4, [r5]
	umull	r2, r3, r4, r0
	cmn	r2, r4
	adcs	r3, r3, r5
	adc	r2, ip, #0
	umlal	r3, r2, r5, r0
	umlal	r3, ip, r4, r1
	mov	r3, #0
	adds	r2, ip, r2
	adc	r3, r3, #0
	umlal	r2, r3, r5, r1
	mov	r0, r2, lsr #9
	mov	r1, r3, lsr #9
	orr	r0, r0, r3, asl #23
	ldmfd	sp!, {r4, r5}
	bx	lr
	.align	3
.L8:
	.word	-1924145349
	.word	-2095944041

   text	   data	    bss	    dec	    hex	filename
13662838	1553940	 351368	15568146	 ed8d12	vmlinux

We could make the code marginally smaller still by forcing 'tmp' to lr
instead, but that would have a negative impact on branch prediction, for
which "bx lr" is optimal.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:37 +00:00
Nicolas Pitre 040b323b50 ARM: asm/div64.h: adjust to generic code
Now that the constant divisor optimization is made generic, adapt the
ARM case to it.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
2015-11-19 20:00:43 -05:00
Xiangyu Lu 80bb3ef109 ARM: 8027/1: fix do_div() bug in big-endian systems
In big-endian systems, "%1" gets the most significant part of the value, causing the instruction to produce the wrong result.

When viewing ftrace records on big-endian ARM systems, we found
timestamp errors:

swapper-0   [001] 1325.970000:   0:120:R ==> [001]    16:120:R events/1
events/1-16 [001] 1325.970000:   16:120:S ==> [001]    0:120:R swapper
swapper-0   [000] 1325.1000000:  0:120:R   + [000]    15:120:R events/0
swapper-0   [000] 1325.1000000:  0:120:R ==> [000]    15:120:R events/0
swapper-0   [000] 1326.030000:   0:120:R   + [000]  1150:120:R sshd
swapper-0   [000] 1326.030000:   0:120:R ==> [000]  1150:120:R sshd

Viewing ftrace records calls the do_div(n, base) function, which is implemented in arch/arm/include/asm/div64.h. When n = 10000000 and base = 1000000, do_div(n, base) executes "umull %Q0, %R0, %1, %Q2".
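
The effect can be modeled in plain C. This is a simplified, hypothetical model rather than compiler output: struct reg_pair, load_pair() and low_word() are made-up names. It assumes that a 64-bit value occupies two consecutive 32-bit registers, that a bare operand reference names the first of them, and that a %Q-style reference always selects the least significant word.

```c
#include <stdint.h>

/* Model of a 64-bit value held in two consecutive 32-bit registers. */
struct reg_pair {
	uint32_t first;   /* what a bare operand reference would name */
	uint32_t second;
};

/* Assumed layout: the first register holds the low word on
 * little-endian and the high word on big-endian. */
static struct reg_pair load_pair(uint64_t v, int big_endian)
{
	uint32_t lo = (uint32_t)v;
	uint32_t hi = (uint32_t)(v >> 32);
	struct reg_pair p;

	p.first  = big_endian ? hi : lo;
	p.second = big_endian ? lo : hi;
	return p;
}

/* Endian-independent selection of the least significant word, in the
 * spirit of the %Q operand modifier. */
static uint32_t low_word(struct reg_pair p, int big_endian)
{
	return big_endian ? p.second : p.first;
}
```

In this model, load_pair(10000000, 1).first is the (zero) high word, while low_word() returns 10000000 for either byte order.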

Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Cc: <stable@vger.kernel.org> # 2.6.20+
Signed-off-by: Alex Wu <wuquanming@huawei.com>
Signed-off-by: Xiangyu Lu <luxiangyu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-04-22 22:23:57 +01:00
Arnd Bergmann 049f3e84d3 ARM: 7705/1: use optimized do_div only for EABI
In OABI configurations, some uses of the do_div function
cause gcc to run out of registers. To work around that,
we can force the use of the out-of-line version for
configurations that build an OABI kernel.

Without this patch, building netx_defconfig results in:

net/core/pktgen.c: In function 'pktgen_if_show':
net/core/pktgen.c:682:2775: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
net/core/pktgen.c:682:3153: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
net/core/pktgen.c:682:2775: error: 'asm' operand has impossible constraints
net/core/pktgen.c:682:3153: error: 'asm' operand has impossible constraints

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-05-15 19:35:53 +01:00
David Howells 9f97da78bf Disintegrate asm/system.h for ARM
Disintegrate asm/system.h for ARM.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Russell King <linux@arm.linux.org.uk>
cc: linux-arm-kernel@lists.infradead.org
2012-03-28 18:30:01 +01:00
Nicolas Pitre 884afaab44 [ARM] 5320/1: fix assembly constraints in implementation of do_div()
Those inline assembly segments using the umlal instruction must have
the & modifier so as to be sure that a purely input register won't alias
one of the registers used as input+output.  In most cases, the inputs
are still used after the outputs are touched, and most binutils versions
insist on "rdhi, rdlo and rm must all be different" even for ARMv6+.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-10-23 12:53:32 +01:00
Russell King 4baa992243 [ARM] move include/asm-arm to arch/arm/include/asm
Move platform independent header files to arch/arm/include/asm, leaving
those in asm/arch* and asm/plat* alone.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-08-02 21:32:35 +01:00