### Background
A program that runs normally in standalone mode may hit errors after
migrating to cluster mode, because some cross-slot commands cannot run
in cluster mode. We should provide a way to detect this issue while
still running in standalone mode, and expose a metric that indicates
the usage of incompatible commands.
### Solution
To avoid a performance impact, we introduce a new config
`cluster-compatibility-sample-ratio`, which defines the sampling ratio
(0-100) for checking command compatibility with cluster mode. When a
command is executed, it is sampled at the specified ratio to determine
whether it complies with Redis cluster constraints, such as cross-slot
restrictions.
A new metric, `cluster_incompatible_ops`, is exposed in the `info stats`
output.
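The sampling decision described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the function name `should_sample` and the use of `rand()` are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: decides whether the current command should be
 * checked for cluster compatibility. `ratio` is a percentage in [0, 100],
 * mirroring the documented range of cluster-compatibility-sample-ratio. */
static int should_sample(int ratio) {
    if (ratio <= 0) return 0;     /* sampling disabled: never check */
    if (ratio >= 100) return 1;   /* check every command */
    return (rand() % 100) < ratio;
}
```

With a ratio of 0 the check is skipped entirely, so the only remaining cost on the hot path is the comparison itself.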
The following operations are considered incompatible:
- cross-slot command
If a command has multiple keys that hash to different slots, it is
incompatible.
- `swapdb, copy, move, select` command
These commands involve multiple databases in some cases. Multiple
databases are not allowed in cluster mode, so they are incompatible.
- Module command with `no-cluster` flag
If a module command has the `no-cluster` flag, loading the module fails
with an error when cluster mode is enabled, so this is incompatible.
- Script/function with `no-cluster` flag
As with module commands, a script/function that declares `no-cluster`
in its shebang cannot run in cluster mode.
- `sort` command with by/get pattern
When the `sort` command has a `by/get` pattern option, the slot of the
pattern must equal the slot of the keys; otherwise it is incompatible
in cluster mode.
- Script/function command whose accessed keys and declared keys have
different slots
For script/function commands, we check not only the slots of the
declared keys but also the slots of the keys actually accessed. If they
differ, the command is considered incompatible.
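To make the cross-slot rule concrete, here is a minimal sketch of how a key maps to a slot (CRC16 of the key, or of its hash tag, modulo 16384, as in the Redis cluster specification) and how a multi-key command could be tested. The function names are hypothetical and this is not the code added by the PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC16 (XMODEM variant: polynomial 0x1021, initial value 0). */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Maps a key to one of 16384 slots. If the key contains a non-empty
 * {hash tag}, only the text inside the first such tag is hashed. */
static unsigned key_hash_slot(const char *key, size_t len) {
    size_t s, e;
    for (s = 0; s < len; s++)
        if (key[s] == '{') break;
    if (s == len) return crc16_xmodem(key, len) & 0x3FFF;   /* no '{' */
    for (e = s + 1; e < len; e++)
        if (key[e] == '}') break;
    if (e == len || e == s + 1)                 /* no '}' or empty tag */
        return crc16_xmodem(key, len) & 0x3FFF;
    return crc16_xmodem(key + s + 1, e - s - 1) & 0x3FFF;
}

/* Hypothetical check: returns 1 if the keys span more than one slot. */
static int is_cross_slot(const char **keys, size_t nkeys) {
    if (nkeys < 2) return 0;
    unsigned first = key_hash_slot(keys[0], strlen(keys[0]));
    for (size_t i = 1; i < nkeys; i++)
        if (key_hash_slot(keys[i], strlen(keys[i])) != first) return 1;
    return 0;
}
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot, so a command using only those keys would not count as cross-slot under this check.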
**Besides**, commands like `keys, scan, flushall, script/function
flush`, which in standalone mode iterate over all data to perform the
operation, only affect the server that executes the command in cluster
mode and are not broadcast to other nodes. However, this does not lead
to errors, so we do not consider them incompatible commands.
### Performance impact test
**cross slot test**
Below are the test commands and results. When using MSET with 8 keys,
performance drops by approximately 3%.
**single key test**
Single-key commands may see a 1-2% performance drop, possibly due to
the overhead of the sampling function.
Since https://github.com/redis/redis/pull/11884, input that was accepted
as valid before 8.0 (a hexadecimal string) has returned an error. This
PR fixes that. To avoid a performance penalty, it hints to the compiler
that the fallbacks are unlikely to be taken.
Furthermore, we were ignoring `std::errc::result_out_of_range` results
from fast_float. This PR addresses that as well and includes tests for
both scenarios.
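The fallback pattern can be sketched in plain C as follows. This is only an illustration under stated assumptions: `fast_parse_decimal` is a hypothetical stand-in for the fast parser (fast_float itself is a C++ library and is not used here), the fallback is `strtod`, and treating an out-of-range result as an error is an assumption about the desired behavior.

```c
#include <errno.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Hypothetical stand-in for the fast path: accepts only
 * [+-]digits[.digits] and rejects everything else (hex, exponents). */
static int fast_parse_decimal(const char *s, double *out) {
    double sign = 1.0, mantissa = 0.0, scale = 1.0;
    int digits = 0, seen_dot = 0;
    if (*s == '-') { sign = -1.0; s++; }
    else if (*s == '+') s++;
    for (; *s; s++) {
        if (*s == '.' && !seen_dot) { seen_dot = 1; continue; }
        if (*s < '0' || *s > '9') return 0;
        mantissa = mantissa * 10.0 + (*s - '0');
        if (seen_dot) scale *= 10.0;
        digits++;
    }
    if (digits == 0) return 0;
    *out = sign * mantissa / scale;
    return 1;
}

/* Returns 1 and stores the value on success, 0 on invalid or
 * out-of-range input. The slow path is marked as unlikely. */
static int parse_double(const char *s, double *out) {
    if (unlikely(!fast_parse_decimal(s, out))) {
        char *end;
        errno = 0;
        double v = strtod(s, &end);   /* C99 strtod also accepts hex */
        if (end == s || *end != '\0' || errno == ERANGE) return 0;
        *out = v;
    }
    return 1;
}
```

The branch hint keeps the common decimal case on the straight-line path, while hexadecimal strings such as `0x10` still parse through the fallback.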
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Fix https://github.com/redis/redis/pull/13853#pullrequestreview-2675227138
This PR ensures that the client's current command is not reset by
`unblockClient()` when the command still needs to be handled after
`unblockClient()` returns.
The FLUSH command still requires reprocessing (updating the replication
offset) after `unblockClient()`. Therefore, we mark such blocked clients
with the `CLIENT_PENDING_COMMAND` flag to prevent the command from being
reset during `unblockClient()`.
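The idea can be modeled with a toy client structure. Everything in this sketch is a simplified assumption: the struct layout, the flag values, and the helper names are illustrative and do not match the server's real definitions.

```c
#include <stddef.h>

/* Illustrative flag values, not the real ones. */
#define CLIENT_BLOCKED         (1u << 0)
#define CLIENT_PENDING_COMMAND (1u << 1)

typedef struct client {
    unsigned flags;
    const char *cmd;   /* current command name, NULL once reset */
} client;

static void reset_client(client *c) { c->cmd = NULL; }

/* Toy unblock: clears the blocked state, but keeps the current command
 * when the client is marked as still having a command to process. */
static void unblock_client(client *c) {
    c->flags &= ~CLIENT_BLOCKED;
    if (!(c->flags & CLIENT_PENDING_COMMAND)) reset_client(c);
}

/* Toy reprocessing step that runs after the client was unblocked. */
static void process_pending_command(client *c) {
    if (!(c->flags & CLIENT_PENDING_COMMAND)) return;
    /* ... command-specific follow-up work would happen here ... */
    c->flags &= ~CLIENT_PENDING_COMMAND;
    reset_client(c);
}
```

A client without the pending flag is reset immediately on unblock, while a flagged client keeps its command until the follow-up step has run.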
After https://github.com/redis/redis/pull/13167, when a client calls
the `FLUSHDB` command, the database is still emptied asynchronously,
and the client is blocked until the lazyfree completes.
1) If another client calls the `SLAVEOF` command during this time, the
server unblocks all blocked clients, including those blocked by the
lazyfree. However, when unblocking a lazyfree-blocked client, we forgot
to call `updateStatsOnUnblock()`, which ultimately triggered an
assertion.
2) If a client blocked by lazyfree is unblocked midway, and at that
point the `bio_comp_list` has already received the completion
notification for the bio, we might end up processing a client that has
already been unblocked in `flushallSyncBgDone()`. Therefore, we need to
filter it out.
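The filtering in point 2 can be sketched with a toy model. The type `lazy_client`, the enum values, and the handler `flush_done` are hypothetical names for illustration only, not the actual implementation.

```c
/* Illustrative block states, not the server's real constants. */
enum block_type { BLOCK_NONE = 0, BLOCK_LAZYFREE = 1 };

typedef struct lazy_client {
    enum block_type btype;
    int replies_sent;
} lazy_client;

/* Toy completion handler: returns 1 if the completion was handled,
 * 0 if the client was filtered out because it is no longer blocked
 * on the lazyfree (for example, it was unblocked midway). */
static int flush_done(lazy_client *c) {
    if (c->btype != BLOCK_LAZYFREE) return 0;   /* already unblocked */
    c->btype = BLOCK_NONE;
    c->replies_sent++;
    return 1;
}
```

A late completion notification for an already unblocked client is then a no-op instead of a second unblock.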
---------
Co-authored-by: oranagra <oran@redislabs.com>
https://github.com/redis/redis/pull/13816 added a new API to defrag a
RedisModuleDict.
Currently, only the dictionary itself is defragmented incrementally;
the defragmentation of values is still not incremental. If the values
are very large, this can lead to significant blocking.
Therefore, this PR adds incremental defragmentation for the values.
The main change is to `RedisModuleDefragDictValueCallback`: the meaning
of its return value has been modified.
When the callback returns 1, we save the key of the current unfinished
node in `seekTo`, and the next time we enter, we continue defragmenting
this node.
When the callback returns 0, we proceed to the next node.
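The resume behavior can be illustrated with a toy loop. This sketch does not use the module API: nodes are array entries, `seek_to` is an index rather than a key, and the work budget is a hypothetical stand-in for the time limit a real defrag step would honor.

```c
#include <stddef.h>

typedef struct node {
    int remaining;   /* work units left to defrag this node's value */
} node;

/* Hypothetical value callback: performs up to `budget` units of work.
 * Returns 1 if the value is not finished yet, 0 when it is done. */
static int defrag_value(node *n, int budget) {
    int step = n->remaining < budget ? n->remaining : budget;
    n->remaining -= step;
    return n->remaining > 0;
}

/* One incremental step: starts at *seek_to and stops at the first node
 * whose callback reports unfinished work, remembering where to resume.
 * Returns 1 if more work remains, 0 once every node is finished. */
static int defrag_step(node *nodes, size_t count, size_t *seek_to, int budget) {
    for (size_t i = *seek_to; i < count; i++) {
        if (defrag_value(&nodes[i], budget)) {
            *seek_to = i;   /* resume at this same node next time */
            return 1;
        }
    }
    *seek_to = count;
    return 0;
}
```

Calling `defrag_step` repeatedly until it returns 0 visits each node as many times as its value needs, without ever holding the caller for more than one budget of work per node.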
## Test
Each dictionary in the global dict originally contained only 10
strings. It is now a nested dictionary: each dictionary has 10
sub-dictionaries, and each sub-dictionary contains 10 strings. This has
led to a corresponding reduction in the defragmentation time obtained
from other tests.
Therefore, the other tests have been modified to always wait for
defragmentation to be turned off before the test begins, then start it
after creating fragmentation, ensuring that they always run for a full
defragmentation cycle.
---------
Co-authored-by: ephraimfeldblum <ephraim.feldblum@redis.com>
Attributes are now counted as well. Moreover, the code no longer uses
the first node to guess the size of the items, but averages the first
few items/attributes found. This is still O(1) but more precise.
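A minimal sketch of this kind of estimate, assuming a fixed sample size: `SAMPLE_ITEMS` and `estimate_total_size` are hypothetical names, and the real code may sample and weigh items differently.

```c
#include <stddef.h>

#define SAMPLE_ITEMS 5   /* hypothetical number of items to sample */

/* Estimates the total size of `count` items by averaging the sizes of
 * the first few and scaling up. The loop is bounded by SAMPLE_ITEMS,
 * so the cost stays constant regardless of `count`. */
static size_t estimate_total_size(const size_t *item_sizes, size_t count) {
    size_t n = count < SAMPLE_ITEMS ? count : SAMPLE_ITEMS;
    if (n == 0) return 0;
    size_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += item_sizes[i];
    return sum * count / n;   /* average of the sample times item count */
}
```

Compared with extrapolating from the first item alone, a single unusually small or large first element has less influence on the result.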