Make fallthru be TLB hit and branch be TLB miss. Doing this
both improves branch prediction and will allow further cleanup.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Splitting out these functions will allow further cleanups.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Include it in the opcode as an extension, as with P_EXT
or the REX bits in the x86-64 port.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(1) The output registers were not marked call-clobbered, even though
they can be modified by called functions.
(2) The thread pointer was not marked reserved.
(3) R4-R6 are call-saved, but not saved by the prologue. Rather than
save them, mark them reserved so that we don't use them.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that the prologue is generated after GUEST_BASE is fixed,
we can load it as an immediate, and also avoid reserving the
register if it isn't necessary.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This will allow backends to make intelligent choices about how
to implement GUEST_BASE.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The result is shorter than the mov+add that TCG would
otherwise generate for us.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement full modrm+sib addressing mode processing.
Use that in qemu_ld/st to output the LEA.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_GRP3 and EXT3_FOO to match. Use them instead of
bare constants.
Define OPC_GRP5 and rename the existing EXT_BAR to EXT5_BAR to
make it clear which extension should be used with which opcode.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_CALL_Jz, generated by tcg_out_calli; use the later
throughout. Unify the calls within qemu_st; adjust the stack
with a single pop if applicable.
Define and use EXT_CALLN_Ev for indirect calls.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Move tcg_out_push/pop up in the file so that they can be used
by qemu_ld/st. Define a tcg_out_pushi to be used as well.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add more OPC values, and tgen_arithr. Use the later throughout.
Note that normal reg/reg arithmetic now uses the Gv,Ev opcode form
instead of the Ev,Gv opcode form used previously. Both forms
disassemble properly, and so there's no visible change when diffing
log files before and after the change. This change makes the operand
ordering within the output routines more natural, and avoids the need
to define an OPC_ARITH_EvGv since a read-modify-write with memory is
not needed within TCG.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_ARITH_EvI[bz]; use throughout. Use tcg_out_ext8u
directly in setcond. Use tgen_arithi in qemu_ld/st.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_MOVSBL and OPC_MOVSWL. Factor opcode emission to
separate functions.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_MOVZBL and OPC_MOVZWL. Factor opcode emission to
separate functions.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_JCC*, OC_JMP*, and EXT_JMPN_Ev. Use them throughout.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If the address register overlaps one of the output registers
simply issue the clobbering load last, rather than emitting
an extra move of the address register.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define OPC_MOVB* and OPC_MOVL*; use them throughout.
Use tcg_out_ld/st instead of bare tcg_out_modrm_offset
when it makes sense.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Setting the registers one by one is easier to read, and gets
optimized by the compiler just the same.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
tcg_out_reloc is only used locally (in */target.c which is
included in tcg.c).
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Branch offsets should only be overwritten during relocation,
to support partial retranslation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Issue the tlb load as early as possible and perform the address
masking while the load is completing.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Along the tlb hit path, we were modifying the variables holding the input
register numbers, which lead to incorrect expansion of the tlb miss path.
Fix this by extracting the tlb hit path to separate functions with their
own local variables. This also makes the difference between softmmu and
user-only easier to read.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Load from the guest_base variable rather than embed a constant.
Always reserve TCG_GUEST_BASE_REG if guest base support enabled.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Define "M" constraint for and_mask_p and "O" constraint for or_mask_p.
Assume that inputs are correct in tcg_out_ori and tcg_out_andi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The beginning of the register allocation order list on the TCG arm
target matches the list of clobbered registers. This means that when an
helper is called, there is almost always clobbered registers that have
to be spilled.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
64-bit arguments should be aligned on an even register as specified
by the "Procedure Call Standard for the ARM Architecture".
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
addr_reg, data_reg and data_reg2 can't be register r0 or r1 du to the
constraints. Don't check if they equals these registers.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On big endian targets, data arguments of qemu_ld/st ops have to be
byte swapped. Two temporary registers are needed for qemu_st to do
the bswap. r0 and r1 are used in system mode, do the same in user
mode, which implies reworking the constraints.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While it make sense to pass a conditional argument to tcg_out_*()
functions as the ARM architecture allows that, it doesn't make sense
for qemu_ld/st functions. These functions use comparison instructions
and conditional execution already, so it is not possible to use a
second level of conditional execution.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add an bswap16 and bswap32 ops, either using the rev and rev16
instructions on ARMv6+ or shifts and logical operations on previous
ARM versions. In both cases the result use less instructions than
the pure TCG version.
These ops are also needed by the qemu_ld/st functions.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add an ext16u op, either using the uxth instruction on ARMv6+ or two
shifts on previous ARM versions. In both cases the result use the same
number or less instructions than the pure TCG version.
Also move all sign extension code to separate functions, so that they
can be reused in other parts of the code.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use a set of variables to define the allowed ARM instructions, depending
on the __ARM_ARCH_*__ GCC defines.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG ARM backends uses integer values to refer to both immediate
values and register number. This makes the code difficult to read.
The patch below replaces all (if I haven't miss any ;-) integer values
representing register number by TCG_REG_* enum values.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of writing very compact code, declare all registers that are
clobbered or reserved one by one. This makes the code easier to read.
Also declare all the 16 registers to TCG, and mark pc as reserved.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There is no need to save the LR register (r14) before a call to a
subroutine. According to the "Procedure Call Standard for the ARM
Architecture", it is the job of the callee to save this register.
Moreover, this register is already saved in the prologue/epilogue.
This patch removes the disabled SAVE_LR code, as there is no need to
reenable later.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There's a header file inclusion ordering problem between cpu-all.h
and qemu-timer.h, such that cpu_get_real_ticks is not defined when
we attempt to use it in profile_getclock.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Register fp (frame pointer) is a bad choice for compilations
without optimisation, because the compiler makes heavy use
of this register (so the resulting code crashes).
Register s0 had been used for TCG_AREG1 in earlier releases,
but was no longer used and is now free for TCG_AREG0.
The resulting code works for compilations without
optimisation (tested with qemu mips in qemu mips
on x86 host).
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
PA-RISC uses procedure descriptors. We'd need to emit a call to
the millicode routine $$dyncall. However, this situation doesn't
actually arise, since we always have the descriptor available at
TCG code generation time.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Handle the output log part overlapping the input high parts.
Also, improve sub2 to handle some constants the second input low part.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Delete inline functions from tcg-target.h that don't need to be there,
move the others to tcg-target.c. Add 'Z', 'I', 'J' constraints for
0, signed 11-bit, and signed 5-bit respectively. Add GUEST_BASE support
similar to ppc64, with the value stored in a register. Add missing
registers to reg_alloc_order. Add support for 12-bit branch relocations.
Add functions for synthetic operations: addi, mtctl, dep, shd, vshd, ori,
andi, shifts, rotates, multiply, branches, setcond. Split out TLB reads
from qemu_ld and qemu_st; fix argument loading for tlb external calls.
Generate the prologue.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Historically the qemu tlb "addend" field was used for both RAM and IO accesses,
so needed to be able to hold both host addresses (unsigned long) and guest
physical addresses (target_phys_addr_t). However since the introduction of
the iotlb field it has only been used for RAM accesses.
This means we can change the type of addend to unsigned long, and remove
associated hacks in the big-endian TCG backends.
We can also remove the host dependence from target_phys_addr_t.
Signed-off-by: Paul Brook <paul@codesourcery.com>
A few words about design choices:
* On IA64, instructions should be grouped by bundle, and dependencies
between instructions declared. A first version of this code tried to
schedule instructions automatically, but was very complex and too
invasive for the current common TCG code (ops not ending at
instruction boundaries, code retranslation breaking already generated
code, etc.) It was also not very efficient, as dependencies between
TCG ops is not available.
Instead the option taken by the current implementation does not try
to fill the bundle by scheduling instructions, but by providing ops
not available as an ia64 instruction, and by offering 22-bit constant
loading for most of the instructions. With both options the bundle are
filled at approximately the same level.
* Up to 128 registers can be affected to a function on IA64, but TCG
limits this number to 64, which is actually more than enough. The
register affectation is the following:
- r0: used to map a constant argument with value 0
- r1: global pointer
- r2, r3: internal use
- r4 to r6: not used to avoid saving them
- r7: env structure
- r8 to r11: free for TCG (call clobbered)
- r12: stack pointer
- r13: thread pointer
- r14 to r31: free for TCG (call clobbered)
- r32: reserved (return address)
- r33: reserved (PFS)
- r33 to r63: free for TCG
* The IA64 architecture has only 64-bit registers and no 32-bit
instructions (the only exception being cmp4). Therefore 64-bit
registers and instructions are used for 32-bit ops. The adopted
strategy is the same as the ABI, that is the higher 32 bits are
undefined. Most ops (and, or, add, shl, etc.) can directly use
the 64-bit registers, while some others have to sign-extend (sar,
div, etc.) or zero-extend (shr, divu, etc.) the register first.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 86feb1c860
did not change all occurrences of INDEX_op_qemu_ld32u
for tcg/arm.
Please note that I could not test this patch
(I have currently no arm system available).
Cc: Richard Henderson <rth@twiddle.net>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Some targets (e.g. Alpha and MIPS64) need to keep 32-bit operands
sign-extended in 64-bit registers (regardless of the "real" sign
of the operand). For that, we need to be able to distinguish
between a 32-bit load with a 32-bit result and a 32-bit load with
a given extension to a 64-bit result. This distinction already
exists for the ld* loads, but not the qemu_ld* loads.
Reserve qemu_ld32u for 64-bit outputs and introduce qemu_ld32 for
32-bit outputs. Adjust all code generators to match.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCGType name was already used consistently. Changing it
to an enumeration instead of a set of defines aids debugging.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use the TCGCond enumeration type in the brcond and setcond
related prototypes in tcg-op.h and each code generator.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Give the enumeration formed from tcg-opc.h a name: TCGOpcode.
Use that enumeration type instead of "int" whereever appropriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There is no need to save r7, it is used to store the address
of the env structure and is not modified by GCC.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
TCG internal helpers only access to the values passed in arguments, and
do not modify the CPU internal state. Thus they can be declared as
const and pure.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Some targets like ARM would benefit to use 32-bit helpers for
div/rem/divu/remu.
Create a #define for div2 so that targets can select between
div, div2 and helper implementation. Use the helper version if none
of the #define are present.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Since commit 6113d6d316 QEMU crashes
on ARM hosts. This is not a bug of this commit, but a latent bug
revealed by this commit.
The TCG code is called through a procedure call using the prologue
and epilogue code. This code does not save and restore enough registers.
The "Procedure Call Standard for the ARM Architecture" says:
A subroutine must preserve the contents of the registers r4-r8, r10,
r11 and SP (and r9 in PCS variants that designate r9 as v6).
The current code only saves and restores r9 to r11, and misses r4 to
r8. The patch fixes that by saving r4 to r12. Theoretically there is
no need to save and restore r12, but an even number of registers have
to be saved as per EABI.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix error:
CC sparc-bsd-user/op_helper.o
In file included from /src/qemu/tcg/tcg.c:158:
/src/qemu/tcg/sparc/tcg-target.c:728:5: "TARGET_PHYS_ADDR_BITS" is not defined
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When restoring register values, increase the stack register for skipped
values.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
On 32-bit hosts op_qemu_ld32s is unused. Remove it to fix the
following assertion failure:
qemu-alpha: tcg/tcg.c:1055:
tcg_add_target_add_op_defs: Assertion `tcg_op_defs[op].used' failed.
Signed-off-by: Jay Foad <jay.foad@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Previously ORC was always implemented by tcg-op.h with
an explicit NOT opcode. Allow a target implementation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Previously ANDC was always implemented by tcg-op.h with
an explicit NOT opcode. Allow a target implementation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The fallback implementation of "ret = arg1 ^ -1" isn't ideal
because of the extra tcg op to load the minus one.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The fallback implementation of "ret = 0 - arg1" isn't ideal,
first because of the extra tcg op to load the zero, and second
because we fail to handle zero as %g0 for arg1 of the sub.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
New version after malc's comments. (This avoids having to do
#if defined __linux__ || defined __FreeBSD__ || defined __FreeBSD_kernel__
for the third case.)
Submitted by: Andreas Tobler <andreast@fgznet.ch> (original version)
Signed-off-by: Juergen Lock <nox@jelal.kn-bremen.de>
Signed-off-by: malc <av1474@comtv.ru>
When compiled with CONFIG_DEBUG_TCG, this code looks
for missing, duplicate and wrong entries in the
op definitions.
Errors will raise an assertion at program start
(all checks are done in the initial phase).
The current code contains such errors, at least for
i386 guest on i386 host.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Returns the condition as if with swapped comparison operands.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The address register is overriden when it corresponds to v0 and the fast
path is taken, which leads to a crash. Fix that by using the a0 register
instead.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There are places, like brcond2, where we know that the destination
of a forward branch will be within 127 bytes.
Add the R_386_PC8 relocation type to support this. Add a flag to
tcg_out_jxx and tcg_out_brcond* to enable it. Set the flag in the
brcond2 label_next branches; pass along the input flag otherwise.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
It is very handy to have a reliable mapping of a condition to its inverse.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Defines setcond_{i32,i64} and setcond2_i32 for 64-on-32-bit.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The internal opcodes brcond2, add2, sub2, mulu2 were undocumented.
Place these in a new section that clearly indicates that they are
not to be emitted by translators.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The existing P_REXB internal opcode flag unconditionally emits
the REX prefix. Technically it's not needed if the register in
question is %al, %bl, %cl, %dl.
Eliding the prefix requires splitting the P_REXB flag into two,
in order to indicate whether the byte register in question is
in the REG or the R/M field. Within TCG, the byte register is
in the REG field only for stores.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This avoids an unnecessary REX.W prefix when dealing with AND
operands that fit into a 32-bit quantity. The most common change
actually seen is movz[wb]q -> movz[wb]l.
Similarly, avoid REXW in ext{8,16}u_i64 tcg opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The 32-bit right-shift instructions is defined to extend the shifted
output to 64-bits. A shift count of zero therefore is a simple
extension without actually shifting.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The {div,divu}2 opcodes are intended for systems for which the
division instruction produces both quotient and remainder. Sparc
is not such a system. Indeed, the remainder must be computed as
quot = a / b
rem = a - (quot * b)
Split out a tcg_out_div32 function that properly initializes Y
with the extension of the input to 64-bits. Discard the code
that used the 64-bit DIVX on sparc9/sparcv8plus without extending
the inputs to 64-bits. Implement remainders in terms of division
followed by multiplication.
Signed-off-by: Richard Henderson <rth@twiddle.net>
[blauwirbel@gmail.com: applied rth's typo fix in tcg_out_div32]
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add a function to handle the register-vs-immediate test for arithmetic.
Also, adjust the OP_32_64 macro so that it auto-indents properly.
Rename the gen_arith32 label to gen_arith, since it handles 64-bit
arithmetic as well.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Returns an unsigned version of a signed condition;
returns the original condition otherwise.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Split out tcg_out_cmp and properly handle immediate arguments.
Fix constraints on brcond to match what SUBCC accepts.
Add tcg_out_brcond2_i32 for 32-bit host.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The test TCG_TARGET_REG_BITS==64 is exactly the feature that we
are checking for, whereas something involving __sparc_v9__ or
__sparc_v8plus__ should be reserved for something ISA related,
as with SMULX.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Generate sign-extended 32-bit constants with SETHI+XOR.
Otherwise tidy the routine to avoid the need for
conditional compilation and code duplication with movi_imm32.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We were unnecessarily restricting imm13 constants to 12 bits.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Qemu won't let us run a KVM target without having host TCG support. Well, for
now we don't have any so let's implement a fake target that only stubs out
everything.
I tried to keep the patch as close to Uli's source as possible, so whenever
he feels like it he can easily diff his version against this one.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Based on a patch from Arnaud Patard (Rtp) <arnaud.patard@rtp-net.org>
A few words about design choices:
* Two registers, at and t0, are reserved for TCG internal use. They are
useful for bswap and 64-bit ops.
* Most ops supports a constant argument with value 0, which is actually
mapped to the zero register.
* While the at register is available for constant loading, ops only
support a limited range of constants. TCG does a better job doing the
register allocation and constant loading by itself. There are plenty of
registers available anyway.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Similarly to what is already done in tcg_liveness_analysis() when
USE_LIVENESS_ANALYSIS is not set.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Currently zero extensions ops are implemented by a and op with a
constant. This is then catched in some backend, and replaced by
a zero extension instruction. While this works well on RISC
machines, this adds a useless register move on non-RISC machines.
Example on x86:
ext16u_i32 r1, r2
is translated into
mov %eax,%ebx
movzwl %bx, %ebx
while the optimized version should be:
movzwl %ax, %ebx
This patch adds ext{8,16,32}u_i{32,64} TCG ops that can be
implemented in the backends to avoid emitting useless register
moves.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch uses sxtb for ext8s_i32 and sxth for ext16s_i32 in ARM back-end.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
the encode_imm function in tcg/arm/tcg-target.c lacks shift declaration.
Laurent
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
this patch handles all possible constants for immediate operand of ALU ops.
I'm not very satisfied by the implementation.
Laurent
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
- use movw/movt to load immediate values for ARMv7-A
- implement add/sub/and/or/xor with immediate (only 8-bit)
Laurent
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
Re-implement GUEST_BASE support.
Offset guest ddress space by default if the guest binary contains
regions below the host mmap_min_addr.
Implement support for i386, x86-64 and arm hosts.
Signed-off-by: Riku Voipio <riku.voipio@iki.fi>
Signed-off-by: Paul Brook <paul@codesourcery.com>
Include assert.h from qemu-common.h and remove other direct uses.
cpu-all.h still need to include it because of the dyngen-exec.h hacks
Signed-off-by: Paul Brook <paul@codesourcery.com>
This patch allows DEBUG_TCGV to be defined (and also prevents NDEBUG
from being defined) when passing an option to the configure script.
This should help to prevent any accidental changes that enable
DEBUG_TCGV in tcg/tcg.h from being committed in future, and may
help to encourage testing with DEBUG_TCGV enabled.
Signed-off-by: Stuart Brady <stuart.brady@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7105 c046a42c-6fe2-441c-8c8c-71466251a162
A const function only reads its arguments and does not use TCG
globals variables. Hence a call to such a function does not
save TCG globals variabes back to their canonical location.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7008 c046a42c-6fe2-441c-8c8c-71466251a162
The beginning of the register allocation order list on the TCG x86_64
target matches the list of clobbered registers. This means that when an
helper is called, there is almost always clobbered registers that have
to be spilled.
The same way register %rsi and %rdi are at the top of the register
allocation order list, while they can't be used for load/store
operations. This means the data and/or address registers are very often
%rsi and %rdi, and their values have to be spilled, and then moved back
to another register.
This patches changes to the allocation order to avoid those effects.
It results in a 8% gain speed in qemu-x86_64 to compress a bzip2 file,
and a 6% gain in qemu-system-mips64 to compile a small application.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7003 c046a42c-6fe2-441c-8c8c-71466251a162
Thanks to Stuart Brady for the notice.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6920 c046a42c-6fe2-441c-8c8c-71466251a162
Rename bswap_i32 into bswap32_i32 and bswap_i64 into bswap64_i64
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6829 c046a42c-6fe2-441c-8c8c-71466251a162
Emulating fldl on arm doesn't seem to work too well. It's the way
qemu_ld64 is translated to arm instructions.
tcg_out_ld32_12(s, COND_AL, data_reg, addr_reg, 0);
tcg_out_ld32_12(s, COND_AL, data_reg2, addr_reg, 4);
Consider case where data_reg==0, data_reg2==1, and addr_reg==0. First load
overwrited addr_reg. So let's put an if (data_ref==addr_reg).
(Pablo Virolainen)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6808 c046a42c-6fe2-441c-8c8c-71466251a162
Simplify nand/nor/eqv and move their optimizations to and/or/xor
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6805 c046a42c-6fe2-441c-8c8c-71466251a162
The introduction of TCGV_EQUAL and not op is slightly broken.
The definition of DEBUG_TCGV shows that.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6802 c046a42c-6fe2-441c-8c8c-71466251a162
Remove definitions for TCG_AREGs corresponding to AREG definitions
removed in r6778.
Signed-off-by: Stuart Brady <stuart.brady@gmail.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6779 c046a42c-6fe2-441c-8c8c-71466251a162
Since we don't generate any "old op" anymore, the old_op_count
is unneeded.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6614 c046a42c-6fe2-441c-8c8c-71466251a162
These are references to 'loglevel' that aren't on a simple 'if (loglevel &
X) qemu_log()' statement.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6340 c046a42c-6fe2-441c-8c8c-71466251a162
This is a large patch that changes all occurrences of logfile/loglevel
global variables to use the new qemu_log*() macros.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6338 c046a42c-6fe2-441c-8c8c-71466251a162
According to $GCC/gcc/config/rs6000/rs6000-c.c _ARCH_PPC is the
ubiquitous define which should be used to test whether gcc targets
PowerPC, on 64bit platforms _ARCH_PPC64 will be also defined.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6301 c046a42c-6fe2-441c-8c8c-71466251a162
This patch removes the unused tcg_global_reg2_new_hack() function, which
was added in r4438 to work around a register shortage problem regarding
dyngen. The only ever user of this function was removed in r4577.
Signed-off-by: Stuart Brady <stuart.brady@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6150 c046a42c-6fe2-441c-8c8c-71466251a162
The functions defined in tcg/tcg-op.h have no _i32 or _i64 suffix,
qemu_ld64 and qemu_st64 were missing from the list, and there are
no 'plain' qemu_ld/qemu_st ops.
Signed-off-by: Stuart Brady <stuart.brady@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6148 c046a42c-6fe2-441c-8c8c-71466251a162
This patch removes useless type information in some calls to
tcg_temp_local_new. It also removes the parameter from the
macro declaration; if a target has to use a specific non-default
size then it should use tcg_temp_local_new_{i32,i64}.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6146 c046a42c-6fe2-441c-8c8c-71466251a162
Change from v1:
Avoid changing the existing coding style in certain files.
Signed-off-by: Stuart Brady <stuart.brady@gmail.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6120 c046a42c-6fe2-441c-8c8c-71466251a162
Starting with version 4.3, gcc returns the result of a function in
rax/eax/ax/al instead of rax/eax, depending of the return type. As
a consequence we should use a zero extend moe in TCG loads/stores.
See http://gcc.gnu.org/ml/gcc/2008-01/msg00052.html for more details.
A big thanks to malc who founds the problem and wrote the x86 patch.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6011 c046a42c-6fe2-441c-8c8c-71466251a162
Looking at tcg/tcg.c:828, the bug that the warning indicated would show up as
incorrect PC shown in log, only on 32 bit big endian host emulating a 64 bit
target, -d op flag enabled. Now that dyngen is gone, the patch can be applied.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5998 c046a42c-6fe2-441c-8c8c-71466251a162
this patch removes some now unused things after dyngen removal.
1. dyngen-exec.h: op_param, op _jmp and some associated macros
are now unused;
2. Makefile.target: tcg-dyngen is not needed anymore
2. tcg/tcg-op.h, tcg/tcg-opc.h: gen-op.h is dead
3. tcg.c:
- INDEX_op_end is now the first op
- CONFIG_DYNGEN_OP is never defined
4. tcg.h: dyngen_op not needed anymore
5. exec-all.h: remove some ASM macros.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5922 c046a42c-6fe2-441c-8c8c-71466251a162
This fixes a few things after Paul's improvements for TCG debugging:
- change TCGv_i64 field name to something different from
TCGv_i32
- fix things in tcg that the above change made visible.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5919 c046a42c-6fe2-441c-8c8c-71466251a162
According to malc TCG will often genereate an add2/sub2/mul2 with low
half of the output in the same register as high half of one of the
inputs, so account for that.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5847 c046a42c-6fe2-441c-8c8c-71466251a162
Otherwise when tb_exit generates a jump beyond the pc-relative range,
tcg_out_goto() spans two/three instructions and we load the tb return
value from a wrong address. This is #ifdefed out currently because
we take care for the jumps to be local.
Problem spotted by Steffen Liebergeld.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5845 c046a42c-6fe2-441c-8c8c-71466251a162
Calling the clear cache syscall directly generates an illegal instruction
on some (armv4) kernels.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5843 c046a42c-6fe2-441c-8c8c-71466251a162
LD/STD/LWA require displacement to be multiple of 4, provide
tcg_out_ldsta which checks the supplied displacement and falls
back on indexed variant when the check fails. All uses of
LD/STD/LWA outside of tcg_out_ldst appear to be safe.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5670 c046a42c-6fe2-441c-8c8c-71466251a162
Use the same order as the _i32 version (pure code move). Suggested by
Laurent Laurent Desnogues.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5606 c046a42c-6fe2-441c-8c8c-71466251a162
Add a subfi (subtract from immediate) wrapper, useful for the PPC target.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5599 c046a42c-6fe2-441c-8c8c-71466251a162
Move addi_i64, muli_i64 and subi_i64 out of #if TCG_TARGET_REG_BITS
as both implementations are strictly identical. Use the same
optimisation (ie when imm == 0) for addi_i64 and subi_64 than the
32-bit version.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5598 c046a42c-6fe2-441c-8c8c-71466251a162
Use rldimi instead of rldicr/or pair, saves us one instruction.
Suggested by Hollis Blanchard.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5404 c046a42c-6fe2-441c-8c8c-71466251a162
Avoids nasty warnings about flush_icache_range from gcc4 and inability
to compile [cpu-]exec.c with gcc3 and -O, also the function is much
too large to be candidate for inlining anyway.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4974 c046a42c-6fe2-441c-8c8c-71466251a162
This (along with previous 2 commits) makes X86_64 work on ppc64 too.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4960 c046a42c-6fe2-441c-8c8c-71466251a162
a. Use 32bit arithmetics in TARGET_LONG_BITS == 32 case
b. Optimize byte access case in TARGET_LONG_BITS == 64 case
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4955 c046a42c-6fe2-441c-8c8c-71466251a162
That should teach one to not blindly trust documentation that will
(out of the blue) use decimal and binary representation for numbers in
the same chart for no apparent reason and without any prefixes, sigh..
Unbreaks MIPS.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4954 c046a42c-6fe2-441c-8c8c-71466251a162
ppc64.ld from Heikki Lindholm's patch
http://marc.info/?l=qemu-devel&m=114086179024634&w=2
Issues:
x86_64 tripple faults shortly after decompressing the kernel
No immediate versions of most 64 bit operations
More...
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4932 c046a42c-6fe2-441c-8c8c-71466251a162
Move reserved/volatile registers down. Currently qemu_ld/stXX are
marked with TCG_OPF_CALL_CLOBBER and since memory accesses are
frequent and R3 through R12 are volatile moving this down results in
less spills and tighter generated code.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4778 c046a42c-6fe2-441c-8c8c-71466251a162
Previous code assummed 32 by 32 bit divmod operation, and survived
x86_64 test only by sheer luck. MIPS wasn't so forgiving.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4705 c046a42c-6fe2-441c-8c8c-71466251a162
* Fix typo in aliased div2
* "Optimize" aliased div2/divu2
* Fix two remaining branch retranslation problems
(Kudos to Andrzej Zaborowski)
* Rework goto_tb and set_jmp_target1
* Use correct size when flushing icache
* Use correct register selection for ORI
(Was harmless since in both cases srcreg was equal to dstreg)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4691 c046a42c-6fe2-441c-8c8c-71466251a162
When a branch to label is translated it generates a reloc that is filled in
when the label is translated. However, when handling an exception and
searching for the pc we abort the translation early and we sometimes
translate the branch but not the corresponding label and so no relocation
is done. When the block is executed again the branch points to no-where.
It seems tcg/sparc/ is going to suffer from the same issue.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4547 c046a42c-6fe2-441c-8c8c-71466251a162
offsetof(CPUState, tlb_table[mem_index][0].addr_read) with mem_index > 0
was larger than max immediate offset for ldr and str (12-bit) so insert an
additional insn to add the mem_index offset.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4542 c046a42c-6fe2-441c-8c8c-71466251a162
Updated from previous version to use the tcg prologue/epilogue mechanism, may be slower than direct call.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4500 c046a42c-6fe2-441c-8c8c-71466251a162