Commit Graph

60 Commits

Author SHA1 Message Date
Emilio G. Cota 37b995f6e7 target-i386: remove helper_lock()
It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota ae03f8de45 target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
  BAR
} else {
  FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Paolo Bonzini 0f70ed4759 target-i386: implement PKE for TCG
Tested with kvm-unit-tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-24 14:01:08 +01:00
Richard Henderson 07929f2ab2 target-i386: Implement FSGSBASE
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15 14:50:00 +11:00
Richard Henderson 7d117ce81e target-i386: Clear bndregs during legacy near jumps
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15 14:50:00 +11:00
Richard Henderson bdd87b3b59 target-i386: Implement BNDLDX, BNDSTX
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15 14:50:00 +11:00
Richard Henderson 523e28d761 target-i386: Implement BNDCL, BNDCU, BNDCN
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15 14:50:00 +11:00
Richard Henderson 7f0b7141b4 target-i386: Perform set/reset_inhibit_irq inline
With helpers that can be reused for other things.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13 07:59:59 +11:00
Richard Henderson c9cfe8f9fb target-i386: Implement XSAVEOPT
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13 07:59:59 +11:00
Richard Henderson 19dc85dba2 target-i386: Add XSAVE extension
This includes XSAVE, XRSTOR, XGETBV, XSETBV, which are all related,
as well as the associate cpuid bits.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13 07:59:59 +11:00
Richard Henderson 64dbaff09b target-i386: Split fxsave/fxrstor implementation
We will be able to reuse these pieces for XSAVE/XRSTOR.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13 07:59:59 +11:00
Richard Henderson 743e398e2f target-i386: Rewrite gen_enter inline
Use gen_lea_v_seg for centralized segment base knowledge.  Unify
code across 32- and 64-bit.  Fix note about "must save state"
before using the out-of-line helpers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1450379966-28198-8-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09 15:46:54 +01:00
Richard Henderson d005233923 target-i386: Check CR4[DE] for processing DR4/DR5
Introduce helper_get_dr so that we don't have to put CR4[DE]
into the scarce HFLAGS resource.  At the same time, rename
helper_movl_drN_T0 to helper_set_dr and set the helper flags.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Eduardo Habkost 5223a9423c target-i386: Handle I/O breakpoints
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Pavel Dovgalyuk 100ec09919 target-i386: exception handling for seg_helper functions
This patch fixes exception handling for seg_helper functions.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-09-15 12:31:59 -07:00
Paolo Bonzini 3f7d846486 target-i386: Use correct memory attributes for ioport accesses
In order to do this, stop using the cpu_in*/out* helpers, and instead
access address_space_io directly.

cpu_in* and cpu_out* remain for usage in the monitor, in qtest, and
in Xen.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:10:00 +02:00
Richard Henderson 2ef6175aa7 tcg: Invert the inclusion of helper.h
Rather than include helper.h with N values of GEN_HELPER, include a
secondary file that sets up the macros to include helper.h.  This
minimizes the files that must be rebuilt when changing the macros
for file N.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:54 -07:00
Paolo Bonzini 81f3053b77 target-i386: yield to another VCPU on PAUSE
After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify
or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown.

The problem shows up as soon as FreeBSD turns out its periodic (~1 ms)
tick, but the timers are only the trigger for a pre-existing problem.

Before the offending patch, setting a timer did a timer_settime system call.

After, setting the timer exits the event loop (which uses poll) and
reenters it with a new deadline.  This does not cause any slowdown; the
difference is between one system call (timer_settime and a signal
delivery (SIGALRM) before the patch, and two system calls afterwards
(write to a pipe or eventfd + calling poll again when re-entering the
event loop).

Unfortunately, the exit/enter causes the main loop to grab the iothread
lock, which in turns kicks the VCPU thread out of execution.  This
causes TCG to execute the next VCPU in its round-robin scheduling of
VCPUS.  When the second VCPU is mostly unused, FreeBSD runs a "pause"
instruction in its idle loop which only burns cycles without any
progress.  As soon as the timer tick expires, the first VCPU runs
the interrupt handler but very soon it sets it again---and QEMU
then goes back doing nothing in the second VCPU.

The fix is to make the pause instruction do "cpu_loop_exit".

Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Luigi Rizzo <rizzo@iet.unipi.it>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1384948442-24217-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2013-11-21 07:55:45 -08:00
Richard Henderson a4bcea3d67 target-i386: Use mulu2 and muls2
These correspond very closely to the insns that we're emulating.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-27 19:06:28 +00:00
Richard Henderson 321c535105 target-i386: Implement tzcnt and fix lzcnt
We weren't computing flags for lzcnt at all.  At the same time,
adjust the implementation of bsf/bsr to avoid the local branch,
using movcond instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:18 -08:00
Richard Henderson f1300734cb target-i386: Use clz/ctz for bsf/bsr helpers
And mark the helpers as NO_RWG_SE.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:18 -08:00
Richard Henderson 0592f74a75 target-i386: Implement PDEP, PEXT
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson 5f1f4b1771 target-i386: Implement MULX
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson 988c3eb0d6 target-i386: Use CC_SRC2 for ADC and SBB
Add another slot in ENV and store two of the three inputs.  This lets us
do less work when carry-out is not needed, and avoids the unpredictable
CC_OP after translating these insns.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:09 -08:00
Richard Henderson db9f259772 target-i386: Make helper_cc_compute_{all,c} const
Pass the data in explicitly, rather than indirectly via env.
This avoids all sorts of unnecessary register spillage.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:25:55 -08:00
Paolo Bonzini 022c62cbbc exec: move include files to include/exec/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Aurelien Jarno 95b638a292 target-i386: rename helper flags
Rename helper flags to the new ones. This is purely a mechanical change,
it's possible to use better flags by looking at the helpers.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:23 +01:00
H. Peter Anvin a9321a4d49 x86: Implement SMEP and SMAP
This patch implements Supervisor Mode Execution Prevention (SMEP) and
Supervisor Mode Access Prevention (SMAP) for x86.  The purpose of the
patch, obviously, is to help kernel developers debug the support for
those features.

A fair bit of the code relates to the handling of CPUID features.  The
CPUID code probably would get greatly simplified if all the feature
bit words were unified into a single vector object, but in the
interest of producing a minimal patch for SMEP/SMAP, and because I had
very limited time for this project, I followed the existing style.

[ v2: don't change the definition of the qemu64 CPU shorthand, since
  that breaks loading old snapshots.  Per Anthony Liguori this can be
  fixed once the CPU feature set is snapshot.

  Change the coding style slightly to conform to checkpatch.pl. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-01 08:04:22 -05:00
Blue Swirl 92fc4b586f x86: switch to AREG0 free mode
Add an explicit CPUX86State parameter instead of relying on AREG0.

Remove temporary wrappers and switch to AREG0 free mode.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:26 +00:00
Blue Swirl 2999a0b200 x86: avoid AREG0 in segmentation helpers
Add an explicit CPUX86State parameter instead of relying on AREG0.

Rename remains of op_helper.c to seg_helper.c.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:26 +00:00
Blue Swirl 4a7443be52 x86: avoid AREG0 for misc helpers
Add an explicit CPUX86State parameter instead of relying on AREG0.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:26 +00:00
Blue Swirl 608badfc66 x86: avoid AREG0 for SMM helpers
Add an explicit CPUX86State parameter instead of relying on AREG0.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:25 +00:00
Blue Swirl 052e80d5e0 x86: avoid AREG0 for SVM helpers
Add an explicit CPUX86State parameter instead of relying on AREG0.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:25 +00:00
Blue Swirl 7923057bae x86: avoid AREG0 for integer helpers
Add an explicit CPUX86State parameter instead of relying on AREG0.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:25 +00:00
Blue Swirl f0967a1add x86: avoid AREG0 for condition code helpers
Add an explicit CPUX86State parameter instead of relying on AREG0.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:25 +00:00
Blue Swirl d3eb5eaeb5 x86: avoid AREG0 for FPU helpers
Make FPU helpers take a parameter for CPUState instead
of relying on global env.

Introduce temporary wrappers for FPU load and store ops. Remove
wrappers for non-AREG0 code. Don't call unconverted helpers
directly.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14 19:01:25 +00:00
Blue Swirl 77b2bc2c09 x86: avoid AREG0 for exceptions
Add an explicit CPUX86State parameter instead of relying on AREG0.

Merge raise_exception_env() to raise_exception(), likewise with
raise_exception_err_env() and raise_exception_err().

Introduce cpu_svm_check_intercept_param() and cpu_vmexit()
as wrappers.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-06-28 20:28:08 +00:00
Aurelien Jarno 2355c16e74 target-i386: fix SSE rounding and flush to zero
SSE rounding and flush to zero control has never been implemented. However
given that softfloat-native was using a single state for FPU and SSE and
given that glibc is setting both FPU and SSE state in fesetround(), this
was working correctly up to the switch to softfloat.

Fix that by adding an update_sse_status() function similar to
update_fpu_status(), and callin git on write to mxcsr.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-01-11 09:55:28 +01:00
Andre Przywara 31501a714b target-i386: implement lzcnt emulation
lzcnt is a AMD Phenom/Barcelona added instruction returning the
number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. There need to be some more changes, though, as lzcnt always
returns a valid value (in opposite to bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX_5).

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-10-23 17:10:36 +02:00
Andre Przywara 1b050077d2 target-i386: add RDTSCP support
RDTSCP reads the time stamp counter and atomically also the content
of a 32-bit MSR, which can be freely set by the OS. This allows CPU
local data to be queried by userspace.
Linux uses this to allow a fast implementation of the getcpu()
syscall, which uses the vsyscall page to avoid a context switch.
AMD CPUs since K8RevF and Intel CPUs since Nehalem support this
instruction.
RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]).

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-10-04 14:46:34 +02:00
Jan Kiszka a23978077b x86: Add support for resume flag
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2009-05-22 10:50:37 -05:00
pbrook a7812ae412 TCG variable type checking.
Signed-off-by: Paul Brook <paul@codesourcery.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5729 c046a42c-6fe2-441c-8c8c-71466251a162
2008-11-17 14:43:54 +00:00
balrog 2436b61a6b SYSENTER/SYSEXIT IA-32e implementation (Alexander Graf).
On Intel CPUs, sysenter and sysexit are valid in 64-bit mode. This patch
makes both 64-bit aware and enables them for Intel CPUs.
Add cpu save/load for 64-bit wide sysenter variables.

Signed-off-by: Alexander Graf <agraf@suse.de>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5318 c046a42c-6fe2-441c-8c8c-71466251a162
2008-09-25 18:16:18 +00:00
blueswir1 79383c9c08 Fix some warnings that would be generated by gcc -Wredundant-decls
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5115 c046a42c-6fe2-441c-8c8c-71466251a162
2008-08-30 09:51:20 +00:00
bellard 94451178b6 HLT, MWAIT and MONITOR insn fixes (initial patch by Alexander Graf)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4746 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-18 09:32:32 +00:00
bellard db620f46a8 reworked SVM interrupt handling logic - fixed vmrun EIP saved value - reworked cr8 handling - added CPUState.hflags2
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4662 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-04 17:02:19 +00:00
bellard 914178d34b 32 bit SVM fixes - INVLPG and INVLPGA updates
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4660 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-04 13:53:05 +00:00
bellard 872929aa59 SVM rework
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4605 c046a42c-6fe2-441c-8c8c-71466251a162
2008-05-28 16:16:54 +00:00
bellard 437a88a51c proper helper definition registering (all targets must do that)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4530 c046a42c-6fe2-441c-8c8c-71466251a162
2008-05-22 16:11:04 +00:00
bellard 1b9d9ebb8a cmpxchg8b fix - added cmpxchg16b
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4522 c046a42c-6fe2-441c-8c8c-71466251a162
2008-05-22 09:52:38 +00:00