Commit Graph

8 Commits

Author SHA1 Message Date
Emilio G. Cota bd224fce60 qht-bench: add -p flag to precompute hash values
Precomputing the hash values allows us to perform more frequent
accesses to the hash table, thereby reaching higher throughputs.

We keep the old behaviour by default, since (1) we might confuse
users if they measured a speedup without changing anything in
the QHT implementation, and (2) benchmarking the hash function
"on line" is also valuable.

Before:
$ taskset -c 0 tests/qht-bench -n 1
 Throughput:        38.18 MT/s

After:
$ taskset -c 0 tests/qht-bench -n 1
 Throughput:        38.16 MT/s

After (with precomputing):
$ taskset -c 0 tests/qht-bench -n 1 -p
 Throughput:        50.87 MT/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 08:55:54 -07:00
Emilio G. Cota 32359d529f qht: return existing entry when qht_insert fails
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by:  Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 61b8cef1d4 qht: require a default comparison function
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is a cmp function is explicitly provided.

qht_insert will gain use of the default cmp in the next patch.

Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
0000000000010750 <qht_lookup>:
   10750:       89 d1                   mov    %edx,%ecx
   10752:       48 89 f2                mov    %rsi,%rdx
   10755:       48 8b 77 08             mov    0x8(%rdi),%rsi
   10759:       e9 22 ff ff ff          jmpq   10680 <qht_lookup_custom>
   1075e:       66 90                   xchg   %ax,%ax

We get:
0000000000010740 <qht_lookup>:
   10740:       48 8b 4f 08             mov    0x8(%rdi),%rcx
   10744:       e9 37 ff ff ff          jmpq   10680 <qht_lookup_custom>
   10749:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 4e2ca83e71 tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:41 -07:00
Lluís Vilanova 61a67f71dd exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change on the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-07-17 13:11:05 +01:00
Emilio G. Cota 977ec47de0 qht-bench: relax test_start/stop atomic accesses
test_start/stop are used only as flags to loop on. Barriers are unnecessary,
since no dependent data is transferred among threads apart from the flags
themselves.

This commit relaxes the three accesses to test_start/stop that were
not yet relaxed.

Signed-off-by: Emilio G. Cota <cota@braap.org>
2016-10-24 15:27:19 +02:00
Paolo Bonzini e9abfcb57f clean-includes: run it once more
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Emilio G. Cota 515864a0d7 qht: add qht-bench, a performance benchmark
This serves as a performance benchmark as well as a stress test
for QHT. We can tweak quite a number of things, including the
number of resize threads and how frequently resizes are triggered.

A performance comparison of QHT vs CLHT[1] and ck_hs[2] using
this same benchmark program can be found here:
  http://imgur.com/a/0Bms4

The tests are run on a 64-core AMD Opteron 6376, pinning threads
to cores favoring same-socket cores. For each run, qht-bench is
invoked with:
  $ tests/qht-bench -d $duration -n $n -u $u -g $range
, where $duration is in seconds, $n is the number of threads,
$u is the update rate (0.0 to 100.0), and $range is the number
of keys.

Note that ck_hs's performance drops significantly as writes go
up, since it requires an external lock (I used a ck_spinlock)
around every write.

Also, note that CLHT instead of using a seqlock, relies on an
allocator that does not ever return the same address during the
same read-critical section. This gives it a slight performance
advantage over QHT on read-heavy workloads, since the seqlock
writes aren't there.

[1] CLHT: https://github.com/LPD-EPFL/CLHT
          https://infoscience.epfl.ch/record/207109/files/ascy_asplos15.pdf

[2] ck_hs: http://concurrencykit.org/
           http://backtrace.io/blog/blog/2015/03/13/workload-specialization/

A few of those plots are shown in text here, since that site
might not be online forever. Throughput is on Mops/s on the Y axis.

                             200K keys, 0 % updates

  450 ++--+------+------+-------+-------+-------+-------+------+-------+--++
      |   +      +      +       +       +       +       +      +      +N+  |
  400 ++                                                           ---+E+ ++
      |                                                       +++----      |
  350 ++          9 ++------+------++                       --+E+    -+H+ ++
      |             |      +H+-     |                 -+N+----   ---- +++  |
  300 ++          8 ++     +E+     ++             -----+E+  --+H+         ++
      |             |      +++      |         -+N+-----+H+--               |
  250 ++          7 ++------+------++  +++-----+E+----                    ++
  200 ++                    1         -+E+-----+H+                        ++
      |                           ----                     qht +-E--+      |
  150 ++                      -+E+                        clht +-H--+     ++
      |                   ----                              ck +-N--+      |
  100 ++               +E+                                                ++
      |            ----                                                    |
   50 ++       -+E+                                                       ++
      |   +E+E+  +      +       +       +       +       +      +       +   |
    0 ++--E------+------+-------+-------+-------+-------+------+-------+--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

                             200K keys, 1 % updates

  350 ++--+------+------+-------+-------+-------+-------+------+-------+--++
      |   +      +      +       +       +       +       +      +     -+E+  |
  300 ++                                                         -----+H+ ++
      |                                                       +E+--        |
      |           9 ++------+------++                  +++----             |
  250 ++            |      +E+   -- |                 -+E+                ++
      |           8 ++         --  ++             ----                     |
  200 ++            |      +++-     |  +++  ---+E+                        ++
      |           7 ++------N------++ -+E+--               qht +-E--+      |
      |                     1  +++----                    clht +-H--+      |
  150 ++                      -+E+                          ck +-N--+     ++
      |                   ----                                             |
  100 ++               +E+                                                ++
      |            ----                                                    |
      |        -+E+                                                        |
   50 ++    +H+-+N+----+N+-----+N+------                                  ++
      |   +E+E+  +      +       +      +N+-----+N+-----+N+----+N+-----+N+  |
    0 ++--E------+------+-------+-------+-------+-------+------+-------+--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

                             200K keys, 20 % updates

  300 ++--+------+------+-------+-------+-------+-------+------+-------+--++
      |   +      +      +       +       +       +       +      +       +   |
      |                                                              -+H+  |
  250 ++                                                         ----     ++
      |           9 ++------+------++                       --+H+  ---+E+  |
      |           8 ++     +H+--   ++                 -+H+----+E+--        |
  200 ++            |      +E+    --|             -----+E+--  +++         ++
      |           7 ++      + ---- ++       ---+H+---- +++ qht +-E--+      |
  150 ++          6 ++------N------++ -+H+-----+E+        clht +-H--+     ++
      |                     1     -----+E+--                ck +-N--+      |
      |                       -+H+----                                     |
  100 ++                  -----+E+                                        ++
      |                +E+--                                               |
      |            ----+++                                                 |
   50 ++       -+E+                                                       ++
      |     +E+ +++                                                        |
      |   +E+N+-+N+-----+       +       +       +       +      +       +   |
    0 ++--E------+------N-------N-------N-------N-------N------N-------N--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

                            200K keys, 100 % updates       qht +-E--+
                                                          clht +-H--+
  160 ++--+------+------+-------+-------+-------+-------+---ck-+-N-----+--++
      |   +      +      +       +       +       +       +      +   ----H   |
  140 ++                                                      +H+--  -+E+ ++
      |                                                +++----   ----      |
  120 ++          8 ++------+------++                 -+H+    +E+         ++
      |           7 ++     +H+---- ++             ---- +++----             |
  100 ++            |      +E+      |  +++  ---+H+    -+E+                ++
      |           6 ++     +++     ++ -+H+--   +++----                     |
   80 ++          5 ++------N----------+E+-----+E+                        ++
      |                     1 -+H+---- +++                                 |
      |                   -----+E+                                         |
   60 ++               +H+---- +++                                        ++
      |            ----+E+                                                 |
   40 ++        +H+----                                                   ++
      |       --+E+                                                        |
   20 ++    +E+                                                           ++
      |  +EE+    +      +       +       +       +       +      +       +   |
    0 ++--+N-N---N------N-------N-------N-------N-------N------N-------N--++
          1      8      16      24      32      40      48     56      64
                                Number of threads

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-13-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-11 17:11:16 -07:00