Intel accidentally made this dead code in 2010 with commit
2bef93cc20, and no one's ever noticed.
Since it went unnoticed for so long, it evidently doesn't matter,
so let's just kill the supposedly optimized code.
Change-Id: Id5b37056cb8884c20bfe2db362e19b46f02e337d
GGLAssembler assumes addresses to be 32-bit and uses ARM 32-bit
instructions to load/store/manipulate addresses. To support 64-bit
architectures, the following changes have been made:
1. ARMAssemblerInterface has been extended to support four new
operations: ADDR_LDR, ADDR_STR, ADDR_SUB and ADDR_ADD. The base class
implements these virtual functions using the equivalent 32-bit
functions. This saves existing 32-bit assembler backend
implementations like ARMAssembler and MIPSAssembler from having to
map the new functions to the existing equivalent routines.
It also allows 64-bit architectures like AArch64 to override
the functions in their assembler backend implementations.
2. GGLAssembler code (spread over GGLAssembler.cpp, GGLAssembler.h
and texturing.cpp) has been changed to use the new operations
for address operations.
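The dispatch scheme described in point 1 can be sketched roughly as
follows. This is a minimal illustration only: the class names
(AssemblerInterfaceSketch, Backend32Sketch, Backend64Sketch), the
simplified signatures and the emitted strings are all hypothetical
and are not the real Pixelflinger declarations.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for ARMAssemblerInterface.
class AssemblerInterfaceSketch {
public:
    virtual ~AssemblerInterfaceSketch() = default;

    // 32-bit operations that every backend already implements.
    virtual void LDR(int rd, int rn, int32_t offset) = 0;
    virtual void STR(int rd, int rn, int32_t offset) = 0;
    virtual void ADD(int rd, int rn, int rm) = 0;
    virtual void SUB(int rd, int rn, int rm) = 0;

    // New address operations: by default they forward to the 32-bit
    // equivalents, so 32-bit backends need no changes.
    virtual void ADDR_LDR(int rd, int rn, int32_t offset) { LDR(rd, rn, offset); }
    virtual void ADDR_STR(int rd, int rn, int32_t offset) { STR(rd, rn, offset); }
    virtual void ADDR_ADD(int rd, int rn, int rm) { ADD(rd, rn, rm); }
    virtual void ADDR_SUB(int rd, int rn, int rm) { SUB(rd, rn, rm); }
};

// Hypothetical 32-bit backend: records a mnemonic instead of emitting code.
class Backend32Sketch : public AssemblerInterfaceSketch {
public:
    std::vector<std::string> emitted;
    void LDR(int, int, int32_t) override { emitted.push_back("ldr32"); }
    void STR(int, int, int32_t) override { emitted.push_back("str32"); }
    void ADD(int, int, int) override { emitted.push_back("add32"); }
    void SUB(int, int, int) override { emitted.push_back("sub32"); }
};

// Hypothetical 64-bit backend: overrides only the address operations.
// Deriving from Backend32Sketch is a shortcut to keep the sketch small.
class Backend64Sketch : public Backend32Sketch {
public:
    void ADDR_LDR(int, int, int32_t) override { emitted.push_back("ldr64"); }
    void ADDR_STR(int, int, int32_t) override { emitted.push_back("str64"); }
    void ADDR_ADD(int, int, int) override { emitted.push_back("add64"); }
    void ADDR_SUB(int, int, int) override { emitted.push_back("sub64"); }
};

// Generator-style code calls only the ADDR_* entry points for addresses.
std::vector<std::string> emitAddressOps(Backend32Sketch& a) {
    a.ADDR_LDR(0, 1, 0);
    a.ADDR_ADD(0, 0, 2);
    a.ADDR_STR(0, 1, 0);
    return a.emitted;
}
```

With this shape, the same generator code yields 32-bit operations on
the 32-bit backend and 64-bit operations on the 64-bit one, without
the generator knowing which backend it is driving.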
Change-Id: I3d7eace4691e3e47cef737d97ac67ce6ef4fb18d
Signed-off-by: Ashok Bhat <ashok.bhat@arm.com>
I was fed up with the constant conflicts in Eclipse
with the "libutils" version.
Also fix a few copyright notices.
Change-Id: I8ffcb845af4b5d0d178f5565f64dfcfbfa27fcd6
* Add support for UXTB16 to the disassembler
* Add encoding of the UXTB16 instruction to the Pixelflinger JIT.
Introducing the UXTB16 instruction allows removal of some masking code and is
beneficial from a pipeline point of view, as there are lots of
UXTB16-followed-by-MUL sequences.
Also, further rescheduling and use of SMULWB brings extra performance
improvements.
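For reference, the ARM (A32) encoding of UXTB16 can be sketched as
below. The helper name encodeUxtb16 and its signature are assumptions
made for illustration, not the function added to the JIT. The bit
layout follows the ARM architecture manual:
cond 0110 1100 1111 Rd rotate 00 0111 Rm.

```cpp
#include <cstdint>

// Hypothetical helper: builds the 32-bit opcode for
//   UXTB16<cc> Rd, Rm, ROR #rotate
// cc     : 4-bit condition code (0xE = always)
// rd, rm : register numbers 0..15
// rotate : rotation in bits, must be 0, 8, 16 or 24
constexpr uint32_t encodeUxtb16(uint32_t cc, uint32_t rd, uint32_t rm,
                                uint32_t rotate) {
    return (cc << 28)               // condition field, bits 31:28
         | 0x06CF0070u              // fixed opcode bits, Rn field = 1111
         | (rd << 12)               // destination register, bits 15:12
         | ((rotate >> 3) << 10)    // rotate / 8 goes into bits 11:10
         | rm;                      // source register, bits 3:0
}
```

For example, encodeUxtb16(0xE, 0, 1, 0) corresponds to
"uxtb16 r0, r1" executed unconditionally.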
* Use UXTB16 in bilinear filtered texturing
Uses UXTB16 to extract channels for SIMD operations, rather than creating and
ANDing with masks. This saves a register and is faster on A8, as the UXTB16
result can feed into the first stage of a multiply, unlike AND.
Also uses SMULWB rather than SMULBB, which allows removal of the MOVs used to
rescale results.
Code has been scheduled for A8 pipeline, specifically aiming to allow
multiplies to issue in pipeline 0, for efficient dual issue operation.
Testing on SpriteMethodTest (http://code.google.com/p/apps-for-android/) gives
an 8% improvement (12.7 vs. 13.7 fps).
The SMULBB-to-SMULWB trick could be used in the <v6 code path, but this hasn't
been implemented.
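The instruction semantics relied on above can be modelled in plain
C++ as follows. This is only a sketch of what UXTB16, SMULBB and
SMULWB compute according to the ARM architecture manual; the function
names are made up for illustration, and it does not reproduce the
operand arrangement used in the actual texturing code.

```cpp
#include <cstdint>

// Rotate a 32-bit value right by n bits.
constexpr uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

// UXTB16 Rd, Rm, ROR #rot: rotate, then keep bytes 0 and 2, each
// zero-extended into a 16-bit lane. No mask register is needed,
// which is where the saved register comes from.
constexpr uint32_t uxtb16(uint32_t rm, unsigned rot = 0) {
    return ror32(rm, rot) & 0x00FF00FFu;
}

// Reinterpret the low 16 bits of a register as a signed halfword.
constexpr int32_t lowHalfSigned(uint32_t r) {
    return static_cast<int16_t>(static_cast<uint16_t>(r & 0xFFFFu));
}

// SMULBB: signed 16 x 16 -> 32 multiply of the two low halfwords.
constexpr int32_t smulbb(uint32_t rn, uint32_t rm) {
    return lowHalfSigned(rn) * lowHalfSigned(rm);
}

// SMULWB: signed 32 x 16 multiply that keeps the top 32 bits of the
// 48-bit product, i.e. an implicit ">> 16". That built-in shift is
// what can replace a separate rescaling MOV.
// (Assumes arithmetic right shift of negative values, which is what
// mainstream compilers do and what C++20 guarantees.)
constexpr int32_t smulwb(uint32_t rn, uint32_t rm) {
    int64_t product = static_cast<int64_t>(static_cast<int32_t>(rn))
                    * lowHalfSigned(rm);
    return static_cast<int32_t>(product >> 16);
}
```

For instance, uxtb16(0xAABBCCDD) gives 0x00BB00DD, and
uxtb16(0xAABBCCDD, 8) gives 0x00AA00CC, so two calls split a packed
32-bit pixel into two pairs of channels.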