# AArch32 VFP instruction descriptions (conditional insns)
#
#  Copyright (c) 2019 Linaro, Ltd
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, see <http://www.gnu.org/licenses/>.

#
# This file is processed by scripts/decodetree.py
#
# Encodings for the conditional VFP instructions are here:
# generally anything matching A32
#  cccc 11.. .... .... .... 101. .... ....
# and T32
#  1110 110. .... .... .... 101. .... ....
#  1110 1110 .... .... .... 101. .... ....
# (but those patterns might also cover some Neon instructions,
# which do not live in this file.)
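
The coverage patterns in the header comment can be checked mechanically. The following is a minimal Python sketch, not part of decodetree itself: `compile_pattern` and `matches` are hypothetical helper names, and the sketch assumes that every character other than `0` or `1` (including the `c` condition bits) is a don't-care bit.

```python
# Illustrative only: turn a 32-character bit pattern into a (mask, value)
# pair and test an instruction word against it. Helper names are hypothetical.

A32_VFP = "cccc 11.. .... .... .... 101. .... ...."
T32_VFP_1 = "1110 110. .... .... .... 101. .... ...."
T32_VFP_2 = "1110 1110 .... .... .... 101. .... ...."


def compile_pattern(pattern):
    """Return (mask, value) for a pattern written MSB first."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    mask = 0
    value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch in "01":
            # Fixed bit: it takes part in the comparison.
            mask |= 1
            value |= int(ch)
        # Anything else is treated as a don't-care bit (an assumption).
    return mask, value


def matches(insn, pattern):
    """True if the 32-bit word insn matches the fixed bits of pattern."""
    mask, value = compile_pattern(pattern)
    return (insn & mask) == value
```

For example, the word `0xEE300A00` (the A32 encoding of `vadd.f32 s0, s0, s0` with condition AL) matches `A32_VFP`, while an integer data-processing word such as `0xE0800000` does not.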

# VFP registers have an odd encoding with a four-bit field
# and a one-bit field which are assembled in different orders
# depending on whether the register is double or single precision.
# Each individual instruction function must do the checks for
# "double register selected but CPU does not have double support"
# and "double register number has bit 4 set but CPU does not
# support D16-D31" (which should UNDEF).
%vm_dp  5:1 0:4
%vm_sp  0:4 5:1
%vn_dp  7:1 16:4
%vn_sp  16:4 7:1
%vd_dp  22:1 12:4
%vd_sp  12:4 22:1
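
The two orderings described above can be made concrete with a small Python sketch. It assumes the decodetree convention that the subfields of a `%field` are concatenated with the first one most significant; `extract`, `field` and the per-register helpers are hypothetical names used only for illustration.

```python
# Illustrative model of how the %vm_* and %vd_* fields are assembled.
# Helper names are hypothetical; this is not the generated decoder.

def extract(insn, pos, length):
    """Return the length-bit unsigned field of insn starting at bit pos."""
    return (insn >> pos) & ((1 << length) - 1)


def field(insn, *subfields):
    """Concatenate (pos, length) subfields, first one most significant."""
    value = 0
    for pos, length in subfields:
        value = (value << length) | extract(insn, pos, length)
    return value


def vm_dp(insn):
    return field(insn, (5, 1), (0, 4))    # one-bit field on top: M:Vm


def vm_sp(insn):
    return field(insn, (0, 4), (5, 1))    # one-bit field at bottom: Vm:M


def vd_dp(insn):
    return field(insn, (22, 1), (12, 4))  # D:Vd


def vd_sp(insn):
    return field(insn, (12, 4), (22, 1))  # Vd:D
```

With bit 5 set and bits [3:0] equal to `0b0011`, `vm_dp` yields 19 (d19) while `vm_sp` yields 7 (s7), which shows why the same instruction bits name different registers depending on precision.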

%vmov_idx_b     21:1 5:2
%vmov_idx_h     21:1 6:1

%vmov_imm 16:4 0:4

# VMOV scalar to general-purpose register; note that this does
# include some Neon cases.
VMOV_to_gp   ---- 1110 u:1 1. 1 .... rt:4 1011 ... 1 0000 \
             vn=%vn_dp size=0 index=%vmov_idx_b
VMOV_to_gp   ---- 1110 u:1 0. 1 .... rt:4 1011 ..1 1 0000 \
             vn=%vn_dp size=1 index=%vmov_idx_h
VMOV_to_gp   ---- 1110 0 0 index:1 1 .... rt:4 1011 .00 1 0000 \
             vn=%vn_dp size=2 u=0

VMOV_from_gp ---- 1110 0 1. 0 .... rt:4 1011 ... 1 0000 \
             vn=%vn_dp size=0 index=%vmov_idx_b
VMOV_from_gp ---- 1110 0 0. 0 .... rt:4 1011 ..1 1 0000 \
             vn=%vn_dp size=1 index=%vmov_idx_h
VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
             vn=%vn_dp size=2

VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
             vn=%vn_dp

VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
             vn=%vn_sp

VMOV_64_sp   ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
             vm=%vm_sp
VMOV_64_dp   ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
             vm=%vm_dp

# Note that the half-precision variants of VLDR and VSTR are
# not part of this decodetree at all because they have bits [9:8] == 0b01
VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
             vd=%vd_sp
VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
             vd=%vd_dp
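
As a small illustration of the note about bits [9:8], the Python sketch below reads the precision selector from a VLDR/VSTR word. The values `0b10` and `0b11` are read off the `1010` and `1011` fixed bits of the two patterns above, and `0b01` comes from the comment; the function name is hypothetical and the remaining value `0b00` is simply reported as unknown.

```python
# Illustrative only: classify a VLDR/VSTR word by bits [9:8].
# vldr_vstr_precision is a hypothetical helper name.

def vldr_vstr_precision(insn):
    """Return 'half', 'single' or 'double', or None for bits [9:8] == 0b00."""
    size = (insn >> 8) & 0b11
    return {
        0b01: "half",    # not decoded by this file, per the comment above
        0b10: "single",  # matches VLDR_VSTR_sp (bits [11:8] == 1010)
        0b11: "double",  # matches VLDR_VSTR_dp (bits [11:8] == 1011)
    }.get(size)
```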

# We split the load/store multiple up into two patterns to avoid
# overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
# grouping:
#   P=0 U=0 W=0 is 64-bit VMOV
#   P=1 W=0 is VLDR/VSTR
#   P=U W=1 is UNDEF
# leaving P=0 U=1 W=x and P=1 U=0 W=1 for load/store multiple.
# These include FSTM/FLDM.
VLDM_VSTM_sp ---- 1100 1 . w:1 l:1 rn:4 .... 1010 imm:8 \
             vd=%vd_sp p=0 u=1
VLDM_VSTM_dp ---- 1100 1 . w:1 l:1 rn:4 .... 1011 imm:8 \
             vd=%vd_dp p=0 u=1

VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
             vd=%vd_sp p=1 u=0 w=1
VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
             vd=%vd_dp p=1 u=0 w=1
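
The P/U/W split described in the comment above can be written out as a tiny classifier. This Python sketch only restates that table; the function names are hypothetical, and the bit positions (P = bit 24, U = bit 23, W = bit 21) are read off the patterns above.

```python
# Illustrative only: which group a word in the "Advanced SIMD load/store
# and 64-bit move" space falls into, following the P/U/W table above.

def classify_puw(p, u, w):
    if p == 0 and u == 0 and w == 0:
        return "64-bit VMOV"
    if p == 1 and w == 0:
        return "VLDR/VSTR"
    if p == u and w == 1:
        return "UNDEF"
    # What is left: P=0 U=1 W=x and P=1 U=0 W=1.
    return "VLDM/VSTM"


def classify_insn(insn):
    """Extract P, U and W from an instruction word and classify it."""
    p = (insn >> 24) & 1
    u = (insn >> 23) & 1
    w = (insn >> 21) & 1
    return classify_puw(p, u, w)
```

Enumerating all eight P/U/W combinations gives exactly three that land on load/store multiple, which is why two pattern pairs (one with W free, one with W fixed to 1) are enough to cover them.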

target/arm: Convert VFP VMLA to decodetree

Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank
but the second operand is in a scalar bank. For example
  vmla.f64 d10, d1, d16   with length 2 stride 2
is equivalent to the pair of scalar operations
  vmla.f64 d10, d1, d16
  vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.
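
The clobbering described above can be modelled in a few lines of Python. This is a deliberately simplified sketch and not the QEMU code: plain variables stand in for the cpu_F0/cpu_F1 globals and for TCG temporaries, the register file is a dictionary, and the per-iteration (vd, vn) register pairs are passed in explicitly instead of being computed. All function names are hypothetical.

```python
# Simplified model of the mixed scalar/vector VMLA loop (illustration only).

def vmla_old_model(regs, pairs, vm):
    """Old-style loop: f1 holds the scalar operand AND the product."""
    f1 = regs[vm]              # scalar second operand, loaded only once
    for vd, vn in pairs:
        f0 = regs[vn]
        f1 = f0 * f1           # product overwrites the second operand
        f0 = regs[vd]
        regs[vd] = f0 + f1
    return regs


def vmla_new_model(regs, pairs, vm):
    """New-style loop: the product lives in its own temporary."""
    for vd, vn in pairs:
        tmp = regs[vn] * regs[vm]
        regs[vd] = regs[vd] + tmp
    return regs


def example_regs():
    # Register names follow the example in the text above.
    return {"d10": 1.0, "d8": 2.0, "d1": 3.0, "d3": 5.0, "d16": 7.0}


PAIRS = [("d10", "d1"), ("d8", "d3")]
```

In this model the first iteration agrees in both versions, but the second iteration of `vmla_old_model` multiplies by the previous product rather than by d16, so d8 ends up different from the value `vmla_new_model` computes.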

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
VMLA_sp      ---- 1110 0.00 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VMLA_dp      ---- 1110 0.00 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VMLS_sp      ---- 1110 0.00 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VMLS_dp      ---- 1110 0.00 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VNMLS_sp     ---- 1110 0.01 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VNMLS_dp     ---- 1110 0.01 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VNMLA_sp     ---- 1110 0.01 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VNMLA_dp     ---- 1110 0.01 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VMUL_sp      ---- 1110 0.10 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VMUL_dp      ---- 1110 0.10 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VNMUL_sp     ---- 1110 0.10 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VNMUL_dp     ---- 1110 0.10 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VADD_sp      ---- 1110 0.11 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VADD_dp      ---- 1110 0.11 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VSUB_sp      ---- 1110 0.11 .... .... 1010 .1.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VSUB_dp      ---- 1110 0.11 .... .... 1011 .1.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp

VDIV_sp      ---- 1110 1.00 .... .... 1010 .0.0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp
VDIV_dp      ---- 1110 1.00 .... .... 1011 .0.0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp
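
The operation selector for the 3-register patterns from VMLA through VDIV can be summarised as a lookup on bits [23, 21:20, 6]. The Python sketch below is transcribed from the fixed bits of those patterns only; the table and function names are hypothetical, and any combination not listed there is returned as `None` rather than guessed.

```python
# Illustrative only: map (bit 23, bits 21:20, bit 6) to the operation name,
# as read off the VMLA..VDIV patterns above.

THREE_REG_OPS = {
    (0, 0b00, 0): "VMLA",
    (0, 0b00, 1): "VMLS",
    (0, 0b01, 0): "VNMLS",
    (0, 0b01, 1): "VNMLA",
    (0, 0b10, 0): "VMUL",
    (0, 0b10, 1): "VNMUL",
    (0, 0b11, 0): "VADD",
    (0, 0b11, 1): "VSUB",
    (1, 0b00, 0): "VDIV",
}


def three_reg_op(insn):
    """Return the operation name, or None if it is not in the table."""
    key = ((insn >> 23) & 1, (insn >> 20) & 0b11, (insn >> 6) & 1)
    return THREE_REG_OPS.get(key)
```

The sketch assumes the word has already matched the surrounding fixed bits (`1110` in bits [27:24], `101` in bits [11:9], bit 4 clear); it does not check them.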

VFM_sp       ---- 1110 1.01 .... .... 1010 . o2:1 . 0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=1
VFM_dp       ---- 1110 1.01 .... .... 1011 . o2:1 . 0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=1
VFM_sp       ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
             vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
VFM_dp       ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
             vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2

VMOV_imm_sp  ---- 1110 1.11 .... .... 1010 0000 .... \
             vd=%vd_sp imm=%vmov_imm
VMOV_imm_dp  ---- 1110 1.11 .... .... 1011 0000 .... \
             vd=%vd_dp imm=%vmov_imm

VMOV_reg_sp  ---- 1110 1.11 0000 .... 1010 01.0 .... \
             vd=%vd_sp vm=%vm_sp
VMOV_reg_dp  ---- 1110 1.11 0000 .... 1011 01.0 .... \
             vd=%vd_dp vm=%vm_dp

VABS_sp      ---- 1110 1.11 0000 .... 1010 11.0 .... \
             vd=%vd_sp vm=%vm_sp
VABS_dp      ---- 1110 1.11 0000 .... 1011 11.0 .... \
             vd=%vd_dp vm=%vm_dp

VNEG_sp      ---- 1110 1.11 0001 .... 1010 01.0 .... \
             vd=%vd_sp vm=%vm_sp
VNEG_dp      ---- 1110 1.11 0001 .... 1011 01.0 .... \
             vd=%vd_dp vm=%vm_dp

VSQRT_sp     ---- 1110 1.11 0001 .... 1010 11.0 .... \
             vd=%vd_sp vm=%vm_sp
VSQRT_dp     ---- 1110 1.11 0001 .... 1011 11.0 .... \
             vd=%vd_dp vm=%vm_dp

VCMP_sp      ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
             vd=%vd_sp vm=%vm_sp
VCMP_dp      ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
             vd=%vd_dp vm=%vm_dp

# VCVTT and VCVTB from f16: Vd format depends on size bit; Vm is always vm_sp
VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
             vd=%vd_sp vm=%vm_sp
VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
             vd=%vd_dp vm=%vm_sp

# VCVTB and VCVTT to f16: Vd format is always vd_sp; Vm format depends on size bit
VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
             vd=%vd_sp vm=%vm_sp
VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
             vd=%vd_sp vm=%vm_dp

VRINTR_sp    ---- 1110 1.11 0110 .... 1010 01.0 .... \
             vd=%vd_sp vm=%vm_sp
VRINTR_dp    ---- 1110 1.11 0110 .... 1011 01.0 .... \
             vd=%vd_dp vm=%vm_dp

VRINTZ_sp    ---- 1110 1.11 0110 .... 1010 11.0 .... \
             vd=%vd_sp vm=%vm_sp
VRINTZ_dp    ---- 1110 1.11 0110 .... 1011 11.0 .... \
             vd=%vd_dp vm=%vm_dp

VRINTX_sp    ---- 1110 1.11 0111 .... 1010 01.0 .... \
             vd=%vd_sp vm=%vm_sp
VRINTX_dp    ---- 1110 1.11 0111 .... 1011 01.0 .... \
             vd=%vd_dp vm=%vm_dp

# VCVT between single and double: Vm precision depends on size; Vd is its reverse
VCVT_sp      ---- 1110 1.11 0111 .... 1010 11.0 .... \
             vd=%vd_dp vm=%vm_sp
VCVT_dp      ---- 1110 1.11 0111 .... 1011 11.0 .... \
             vd=%vd_sp vm=%vm_dp

# VCVT from integer to floating point: Vm always single; Vd depends on size
VCVT_int_sp  ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
             vd=%vd_sp vm=%vm_sp
VCVT_int_dp  ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
             vd=%vd_dp vm=%vm_sp

# VJCVT is always dp to sp
VJCVT        ---- 1110 1.11 1001 .... 1011 11.0 .... \
             vd=%vd_sp vm=%vm_dp

# VCVT between floating-point and fixed-point. The immediate value
# is in the same format as a Vm single-precision register number.
# We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
# for the convenience of the trans_VCVT_fix functions.
%vcvt_fix_op 18:1 16:1 7:1
VCVT_fix_sp  ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
             vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
VCVT_fix_dp  ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
             vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
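
The `opc` and `imm` fields of the fixed-point VCVT patterns can be illustrated with a short Python sketch. It assumes the decodetree convention that subfields are concatenated with the first one most significant, so `opc` is op:u:sx; the function names are hypothetical, and how the trans_VCVT_fix functions interpret the resulting values is not modelled here.

```python
# Illustrative only: assemble the opc and imm fields used by VCVT_fix_*.

def vcvt_fix_opc(insn):
    """Concatenate bit 18 (op), bit 16 (u) and bit 7 (sx) into 3 bits."""
    op = (insn >> 18) & 1
    u = (insn >> 16) & 1
    sx = (insn >> 7) & 1
    return (op << 2) | (u << 1) | sx


def vcvt_fix_imm(insn):
    """Same layout as %vm_sp: bits [3:0] on top, bit 5 at the bottom."""
    return ((insn & 0xF) << 1) | ((insn >> 5) & 1)
```

Packing the three selector bits into one value lets a single field distinguish all eight op/u/sx combinations instead of carrying three separate one-bit arguments.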

# VCVT float to integer (VCVT and VCVTR): Vd always single; Vm depends on size
VCVT_sp_int  ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
             vd=%vd_sp vm=%vm_sp
VCVT_dp_int  ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
             vd=%vd_sp vm=%vm_dp