Commit Graph

1536 Commits

Author SHA1 Message Date
Vitah Lin 47505c3533
Fix 'Client output buffer hard limit is enforced' test causing infinite loop (#13934)
This PR fixes an issue in the CI test for client-output-buffer-limit,
which was causing an infinite loop when running on macOS 15.4.

### Problem

This test start two clients, R and R1:
```c
R1 subscribe foo
R publish foo bar
```

When R executes `PUBLISH foo bar`, the server first stores the message
`bar` in R1‘s buf. Only when the space in buf is insufficient does it
call `_addReplyProtoToList`.
Inside this function, `closeClientOnOutputBufferLimitReached` is invoked
to check whether the client’s R1 output buffer has reached its
configured limit.
On macOS 15.4, because the server writes to the client at a high speed,
R1’s buf never gets full. As a result,
`closeClientOnOutputBufferLimitReached` in the test is never triggered,
causing the test to never exit and fall into an infinite loop.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-05-06 10:44:16 +08:00
Pieter Cailliau d65102861f
Adding AGPLv3 as a license option to Redis! (#13997)
Read more about [the new license option](http://redis.io/blog/agplv3/)
and [the Redis 8 release](http://redis.io/blog/redis-8-ga/).
2025-05-01 14:04:22 +01:00
YaacovHazan de16bee70a
Limiting output buffer for unauthenticated client (CVE-2025-21605) (#13993)
CI / build-centos-jemalloc (push) Failing after 18s Details
CI / build-32bit (push) Failing after 30s Details
CI / test-ubuntu-latest (push) Failing after 32s Details
CI / build-old-chain-jemalloc (push) Failing after 41s Details
CI / build-libc-malloc (push) Successful in 1m16s Details
CI / test-sanitizer-address (push) Failing after 1m39s Details
Codecov / code-coverage (push) Failing after 2m47s Details
Spellcheck / Spellcheck (push) Failing after 4m31s Details
CI / build-debian-old (push) Failing after 6m49s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 1m53s Details
External Server Tests / test-external-nodebug (push) Failing after 1m57s Details
External Server Tests / test-external-cluster (push) Failing after 1m59s Details
CI / build-macos-latest (push) Has been cancelled Details
For unauthenticated clients the output buffer is limited to prevent them
from abusing it by not reading the replies
2025-04-30 09:58:51 +03:00
Yuan Wang 14dd59ab12
Remove io-threads-do-reads from normal config list (#13987)
CI / test-ubuntu-latest (push) Failing after 30s Details
CI / test-sanitizer-address (push) Failing after 31s Details
CI / build-32bit (push) Failing after 31s Details
CI / build-libc-malloc (push) Failing after 31s Details
CI / build-old-chain-jemalloc (push) Failing after 31s Details
Codecov / code-coverage (push) Failing after 31s Details
Spellcheck / Spellcheck (push) Failing after 30s Details
CI / build-debian-old (push) Failing after 43s Details
CI / build-centos-jemalloc (push) Failing after 1m31s Details
CI / build-macos-latest (push) Has been cancelled Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 1m52s Details
External Server Tests / test-external-nodebug (push) Failing after 1m58s Details
External Server Tests / test-external-cluster (push) Failing after 3m3s Details
Since after https://github.com/redis/redis/pull/13695,
`io-threads-do-reads` config is deprecated, we should remove it from
normal config list and only keep it in deprecated config list, but we
forgot to do this, this PR fixes this.

thanks @YaacovHazan for reporting this
2025-04-28 12:55:47 +03:00
Vitah Lin 9f99dd5f6d
Fix tls port update not reflected in CLUSTER SLOTS (#13966)
### Problem 

A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP
port issue in CLUSTER SLOTS, but it seems the handling of the TLS port
was overlooked.

There is this comment in the `addNodeToNodeReply` function in the
`cluster.c` file:
```c
    /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */
    addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo()));
    addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN);
```

### Fixed 

This PR fixes the TLS port issue and adds relevant tests.
2025-04-24 09:36:45 +08:00
nesty92 8468ded667
Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958)
CI / build-libc-malloc (push) Failing after 1s Details
CI / build-centos-jemalloc (push) Failing after 1s Details
CI / test-ubuntu-latest (push) Failing after 31s Details
CI / build-debian-old (push) Failing after 31s Details
CI / build-32bit (push) Failing after 31s Details
CI / build-old-chain-jemalloc (push) Failing after 31s Details
Codecov / code-coverage (push) Failing after 32s Details
CI / test-sanitizer-address (push) Failing after 1m22s Details
Spellcheck / Spellcheck (push) Failing after 31s Details
CI / build-macos-latest (push) Has been cancelled Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
External Server Tests / test-external-cluster (push) Failing after 31s Details
External Server Tests / test-external-nodebug (push) Failing after 31s Details
This PR fix the lag calculation by ensuring that when consumer group's last_id
is behind the first entry, the consumer group's entries read is considered
invalid and recalculated from the start of the stream

Supplement to PR #13473 

Close #13957

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
2025-04-22 10:11:10 +08:00
Stav-Levi a257b6b4ba
Fix port update not reflected in CLUSTER SLOTS (#13932)
Close https://github.com/redis/redis/issues/13892 
config set port cmd updates server.port. cluster slot retrieves
information about cluster slots and their associated nodes. the fix
updates this info when config set port cmd is done, so cluster slots cmd
returns the right value.
2025-04-21 17:13:55 +08:00
Oran Agra d3a0d95dab
Avoid using debug log level in tests that produce many keys (#13942)
CI / test-ubuntu-latest (push) Failing after 31s Details
CI / test-sanitizer-address (push) Failing after 31s Details
CI / build-debian-old (push) Failing after 31s Details
CI / build-32bit (push) Failing after 31s Details
CI / build-libc-malloc (push) Failing after 31s Details
CI / build-centos-jemalloc (push) Failing after 31s Details
CI / build-old-chain-jemalloc (push) Failing after 31s Details
Codecov / code-coverage (push) Failing after 31s Details
Spellcheck / Spellcheck (push) Failing after 31s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
External Server Tests / test-external-cluster (push) Failing after 31s Details
External Server Tests / test-external-nodebug (push) Failing after 32s Details
CI / build-macos-latest (push) Has been cancelled Details
if the test fails, and there are per-key log prints, this can flood the
CI due to --dump-logs
2025-04-14 14:15:50 +03:00
Ozan Tezcan c9be4fbd72
Fix order of KSN for hgetex command (#13931)
If HGETEX command deletes the only field due to lazy expiry, Redis
currently sends `del` KSN (Keyspace Notification) first, followed by
`hexpired` KSN. The order should be reversed, `hexpired` should be sent
first and `del` later.

Additonal changes: More test coverage for HGETDEL KSN

---------

Co-authored-by: hristosko <hristosko.chaushev@redis.com>
2025-04-14 13:31:31 +03:00
debing.sun b33a405bf1
Fix timing issue in lazyfree test (#13926)
This test was introduced by https://github.com/redis/redis/issues/13853
We determine if the client is in blocked status, but if async flushdb is
completed before checking the blocked status, the test will fail.
So modify the test to only determine if `lazyfree_pending_objects` is
correct to ensure that flushdb is async, that is, the client must be
blocked.
2025-04-13 20:32:16 +08:00
Vitah Lin 057f039c4b
Fix 'RESTORE can set LFU' test (#13896)
CI / test-ubuntu-latest (push) Failing after 31s Details
CI / test-sanitizer-address (push) Failing after 31s Details
CI / build-32bit (push) Failing after 31s Details
CI / build-libc-malloc (push) Failing after 32s Details
CI / build-centos-jemalloc (push) Failing after 32s Details
CI / build-debian-old (push) Failing after 48s Details
CI / build-old-chain-jemalloc (push) Failing after 31s Details
Codecov / code-coverage (push) Failing after 31s Details
Spellcheck / Spellcheck (push) Failing after 31s Details
CodeQL / Analyze (cpp) (push) Failing after 31s Details
CI / build-macos-latest (push) Has been cancelled Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
External Server Tests / test-external-cluster (push) Failing after 31s Details
External Server Tests / test-external-nodebug (push) Failing after 31s Details
When the `restore foo 0 $encoded freq 100` command and `set freq [r
object freq foo]` run in different minute timestamps (i.e., when
server.unixtime/60 changes between these operations), the assertion may
fail due to the LFU decay.

This PR updates the “RESTORE can set LFU” test to verify the actual freq
value based on minute timestamps.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-03-28 13:33:58 +08:00
Cong Chen 981aa5c12f
Fix timing issue in HEXPIREAT test (#13873)
CI / build-macos-latest (push) Waiting to run Details
CI / build-debian-old (push) Failing after 7s Details
CI / build-centos-jemalloc (push) Failing after 3s Details
CI / build-old-chain-jemalloc (push) Failing after 3s Details
CI / build-32bit (push) Failing after 21s Details
Codecov / code-coverage (push) Failing after 8s Details
CI / build-libc-malloc (push) Successful in 50s Details
CI / test-ubuntu-latest (push) Failing after 2m9s Details
CI / test-sanitizer-address (push) Failing after 2m40s Details
Spellcheck / Spellcheck (push) Successful in 9m2s Details
External Server Tests / test-external-standalone (push) Failing after 32s Details
External Server Tests / test-external-cluster (push) Failing after 32s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-nodebug (push) Failing after 31s Details
This fixes an error that occurs in the job
[test-valgrind-no-malloc-usable-size-test](https://github.com/redis/redis/actions/runs/13912357739/job/38929051397)
of the Daily workflow:

```
*** [err]: HEXPIREAT - Set time and then get TTL (listpackex) in tests/unit/type/hash-field-expire.tcl
Expected '999' to be between to '1000' and '2000' (context: type eval line 6 cmd {assert_range [r hpttl myhash FIELDS 1 field1] 1000 2000} proc ::test)
```
2025-03-26 10:00:38 +08:00
Oran Agra 2a189709e0
avoid possible use-after-free with module KSN changes (#13875)
CI / build-debian-old (push) Failing after 4s Details
CI / build-centos-jemalloc (push) Failing after 3s Details
CI / build-old-chain-jemalloc (push) Failing after 3s Details
CI / build-32bit (push) Failing after 18s Details
CI / build-libc-malloc (push) Successful in 53s Details
CI / test-sanitizer-address (push) Failing after 1m6s Details
CI / test-ubuntu-latest (push) Failing after 2m57s Details
Spellcheck / Spellcheck (push) Successful in 9m5s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-cluster (push) Failing after 31s Details
External Server Tests / test-external-standalone (push) Failing after 6m35s Details
External Server Tests / test-external-nodebug (push) Failing after 15m1s Details
CI / build-macos-latest (push) Has been cancelled Details
in #13505, we changed the code to use the string value of the key rather
than the integer value on the stack, but we have a test in
unit/moduleapi/keyspace_events that uses keyspace notification hook to
modify the value with RM_StringDMA, which can cause this value to be
released before used. the reason it didn't happen so far is because we
were using shared integers, so releasing the object doesn't free it.
2025-03-24 12:24:52 +02:00
Benson-li 427c36888e
Fix potential infinite loop of RANDOMKEY during client pause (#13863)
CI / test-ubuntu-latest (push) Failing after 31s Details
CI / build-debian-old (push) Failing after 32s Details
CI / build-libc-malloc (push) Failing after 31s Details
CI / build-centos-jemalloc (push) Failing after 31s Details
CI / build-old-chain-jemalloc (push) Failing after 32s Details
Codecov / code-coverage (push) Failing after 32s Details
Spellcheck / Spellcheck (push) Failing after 32s Details
CI / test-sanitizer-address (push) Failing after 4m35s Details
CI / build-32bit (push) Failing after 5m35s Details
CI / build-macos-latest (push) Has been cancelled Details
CodeQL / Analyze (cpp) (push) Failing after 32s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
External Server Tests / test-external-cluster (push) Failing after 31s Details
External Server Tests / test-external-nodebug (push) Failing after 6m47s Details
The bug mentioned in this
[#13862](https://github.com/redis/redis/issues/13862) has been fixed.

---------

Signed-off-by: li-benson <1260437731@qq.com>
Signed-off-by: youngmore1024 <youngmore1024@outlook.com>
Co-authored-by: youngmore1024 <youngmore1024@outlook.com>
2025-03-20 21:32:12 +08:00
debing.sun cb02bd190b
Fix timing issue in module defrag test (#13870)
After #13840, the data we populate becomes more complex and slower, we
always wait for a defragmentation cycle to end before verifying that the
test is okay.
However, in some slow environments, an entire defragmentation cycle can
exceed 5 seconds, and in my local test using 'taskset -c 0' it can reach
6 seconds, so increase the threshold to avoid test failures.
2025-03-20 21:22:47 +08:00
Yuan Wang 951ec79654
Cluster compatibility check (#13846)
CI / build-macos-latest (push) Waiting to run Details
CI / build-32bit (push) Failing after 31s Details
CI / build-libc-malloc (push) Failing after 31s Details
CI / build-debian-old (push) Failing after 1m32s Details
CI / build-old-chain-jemalloc (push) Failing after 31s Details
Codecov / code-coverage (push) Failing after 31s Details
CI / test-ubuntu-latest (push) Failing after 3m21s Details
Spellcheck / Spellcheck (push) Failing after 31s Details
CI / test-sanitizer-address (push) Failing after 6m36s Details
CI / build-centos-jemalloc (push) Failing after 6m36s Details
External Server Tests / test-external-standalone (push) Failing after 2m10s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-nodebug (push) Failing after 2m12s Details
External Server Tests / test-external-cluster (push) Failing after 2m16s Details
### Background
The program runs normally in standalone mode, but migrating to cluster
mode may cause errors, this is because some cross slot commands can not
run in cluster mode. We should provide an approach to detect this issue
when running in standalone mode, and need to expose a metric which
indicates the usage of no incompatible commands.

### Solution
To avoid perf impact, we introduce a new config
`cluster-compatibility-sample-ratio` which define the sampling ratio
(0-100) for checking command compatibility in cluster mode. When a
command is executed, it is sampled at the specified ratio to determine
if it complies with Redis cluster constraints, such as cross-slot
restrictions.

A new metric is exposed: `cluster_incompatible_ops` in `info stats`
output.

The following operations will be considered incompatible operations.

- cross-slot command
   If a command has multiple cross slot keys, it is incompatible
- `swap, copy, move, select` command
These commands involve multi databases in some cases, we don't allow
multiple DB in cluster mode, so there are not compatible
- Module command with `no-cluster` flag
If a module command has `no-cluster` flag, we will encounter an error
when loading module, leading to fail to load module if cluster is
enabled, so this is incompatible.
- Script/function with `no-cluster` flag
Similar with module command, if we declare `no-cluster` in shebang of
script/function, we also can not run it in cluster mode
- `sort` command by/get pattern
When `sort` command has `by/get` pattern option, we must ask that the
pattern slot is equal with the slot of keys, otherwise it is
incompatible in cluster mode.

- The script/function command accesses the keys and declared keys have
different slots
For the script/function command, we not only check the slot of declared
keys, but only check the slot the accessing keys, if they are different,
we think it is incompatible.

**Besides**, commands like `keys, scan, flushall, script/function
flush`, that in standalone mode iterate over all data to perform the
operation, are only valid for the server that executes the command in
cluster mode and are not broadcasted. However, this does not lead to
errors, so we do not consider them as incompatible commands.

### Performance impact test
**cross slot test**
Below are the test commands and results. When using MSET with 8 keys,
performance drops by approximately 3%.

**single key test**
It may be due to the overhead of the sampling function, and single-key
commands could cause a 1-2% performance drop.
2025-03-20 10:35:53 +08:00
Filipe Oliveira (Redis) 3e012c9260
Fix string2d usage in case of hexadecimal strings parsing and overflow (#13845)
CI / build-macos-latest (push) Waiting to run Details
CI / build-debian-old (push) Failing after 6s Details
CI / build-centos-jemalloc (push) Failing after 5s Details
CI / build-old-chain-jemalloc (push) Failing after 3s Details
Codecov / code-coverage (push) Failing after 7s Details
CI / build-libc-malloc (push) Successful in 56s Details
CI / test-sanitizer-address (push) Failing after 1m8s Details
CI / test-ubuntu-latest (push) Failing after 2m13s Details
CI / build-32bit (push) Failing after 3m28s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-nodebug (push) Failing after 1m48s Details
External Server Tests / test-external-standalone (push) Failing after 2m9s Details
External Server Tests / test-external-cluster (push) Failing after 2m14s Details
Spellcheck / Spellcheck (push) Successful in 9m3s Details
Since https://github.com/redis/redis/pull/11884, what was previously
accepted as a valid input (hexadecimal string) before 8.0 returned an
error. This PR addresses it. To avoid performance penalties if hints the
compiler that the fallbacks are not likely to happen.
Furthermore, we were ignoring std::result_out_of_range outputs from
fast_float. This PR addresses it as well and includes tests for both
identified scenarios.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-03-19 20:08:45 +08:00
debing.sun a5a3afd923
Fix crash during SLAVEOF when clients are blocked on lazyfree (#13853)
CI / build-debian-old (push) Failing after 7s Details
CI / build-centos-jemalloc (push) Failing after 6s Details
CI / build-old-chain-jemalloc (push) Failing after 4s Details
Codecov / code-coverage (push) Failing after 7s Details
CI / build-libc-malloc (push) Successful in 53s Details
CI / test-sanitizer-address (push) Failing after 1m4s Details
CI / test-ubuntu-latest (push) Failing after 2m9s Details
CI / build-32bit (push) Failing after 9m50s Details
Spellcheck / Spellcheck (push) Successful in 9m0s Details
Coverity Scan / coverity (push) Has been skipped Details
External Server Tests / test-external-standalone (push) Failing after 31s Details
External Server Tests / test-external-cluster (push) Failing after 6m36s Details
External Server Tests / test-external-nodebug (push) Failing after 9m54s Details
CI / build-macos-latest (push) Has been cancelled Details
After https://github.com/redis/redis/pull/13167, when a client calls
`FLUSHDB` command, we still async empty database, and the client was
blocked until the lazyfree completes.

1) If another client calls `SLAVEOF` command during this time, the
server will unblock all blocked clients, including those blocked by the
lazyfree. However, when unblocking a lazyfree blocked client, we forgot
to call `updateStatsOnUnblock()`, which ultimately triggered the
following assertion.

2) If a client blocked by Lazyfree is unblocked midway, and at this
point the `bio_comp_list` has already received the completion
notification for the bio, we might end up processing a client that has
already been unblocked in `flushallSyncBgDone()`. Therefore, we need to
filter it out.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-03-17 20:27:05 +08:00
debing.sun f364dcca2d
Make RM_DefragRedisModuleDict API support incremental defragmentation for dict leaf (#13840)
After https://github.com/redis/redis/pull/13816, we make a new API to
defrag RedisModuleDict.
Currently, we only support incremental defragmentation of the dictionary
itself, but the defragmentation of values is still not incremental. If
the values are very large, it could lead to significant blocking.
Therefore, in this PR, we have added incremental defragmentation for the
values.

The main change is to the `RedisModuleDefragDictValueCallback`, we
modified the return value of this callback.
When the callback returns 1, we will save the `seekTo` as the key of the
current unfinished node, and the next time we enter, we will continue
defragmenting this node.
When the return value is 0, we will proceed to the next node.

## Test
Since each dictionary in the global dict originally contained only 10
strings, but now it has been changed to a nested dictionary, each
dictionary now has 10 sub-dictionaries, with each sub-dictionary
containing 10 strings, this has led to a corresponding reduction in the
defragmentation time obtained from other tests.
Therefore, the other tests have been modified to always wait for
defragmentation to be turned off before the test begins, then start it
after creating fragmentation, ensuring that they can always run for a
full defragmentation cycle.

---------

Co-authored-by: ephraimfeldblum <ephraim.feldblum@redis.com>
2025-03-04 17:19:41 +08:00
debing.sun 7939ba031d
Enable the callback to be NULL for RM_DefragRedisModuleDict() and reduce the system calls of RM_DefragShouldStop() (#13830)
1) Enable the callback to be NULL for RM_DefragRedisModuleDict()
    Because the dictionary may store only the key without the value.

2) Reduce the system calls of RM_DefragShouldStop()
The API checks the following thresholds before performing a time check:
over 512 defrag hits, or over 1024 defrag misses, and performs the time
judgment if any of these thresholds are reached.

3) Added defragmentation statistics for dictionary items to cover the
associated code for RM_DefragRedisModuleDict().

4) Removed `module_ctx` from `defragModuleCtx` struct, which can be
replaced by a temporary variable.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-02-26 20:04:29 +08:00
Yuan Wang f1d6542b1a
Stabilize tcl test cases (#13829)
Recently encountered some errors as bellow,

HGETEX/HSETEX with PXAT/EXAT options, after getting ttl, we calculate
current time by `[clock seconds]` that may have a delay that causes
results greater than expected.

Dismiss memory test error, now we introduced rdb-channel replication,
the full synchronization might finish before the child process exits. So
we may fail if calling `bgsave` immediately after full sync.
2025-02-25 16:31:53 +08:00
Denis Nevmerzhitskii 33f03f6fc8
Fix wrong behavior of XREAD + after last entry of stream have been removed (#13632)
Close #13628

This PR changes behavior of special `+` id of XREAD command. Now it uses
`streamLastValidID` to find last entry instead of `last_id` field of
stream object.
This PR adds test for the issue.

**Notes**

Initial idea to update `last_id` while executing XDEL seems to be wrong.
`last_id` is used to strore last generated id and not id of last entry.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2025-02-25 13:40:24 +08:00
Moti Cohen 0200e8ada6
Fix multiple issues with "INFO KEYSIZES" (#13825)
This commit addresses several issues related to the `INFO KEYSIZES` feature:
- HyperLogLog commands: `KEYSIZES` hooks were not properly set or tested.
- HFE lazy expiration: `KEYSIZES` hooks were not properly set or tested.
- Empty DB & SYNC flow: On `blocking_async=0` flow, global `keysizes`
  histogram were not reset (can reproduced using `DEBUG RELOAD`).
- Empty string handling: Fix histogram for strings of size 0. Not 
  relevant to other data-types.
2025-02-25 00:38:44 +02:00
debing.sun ee933d9e2b
Fixed passing incorrect endtime value for module context (#13822)
1) Fix a bug that passing an incorrect endtime to module.
   This bug was found by @ShooterIT.
After #13814, all endtime will be monotonic time, and we should no
longer convert it to ustime relative.
Add assertions to prevent endtime from being much larger thatn the
current time.

2) Fix a race in test `Reduce defrag CPU usage when module data can't be
defragged`

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
2025-02-23 12:58:48 +08:00
debing.sun 032357ec0f
Add RM_DefragRedisModuleDict module API (#13816)
After #13815, we introduced incremental defragmentation for global data
for module.
Now we added a new module API `RM_DefragRedisModuleDict` to incremental
defrag `RedisModuleDict`.

This PR adds a new APIs and a new defrag callback:
```c
RedisModuleDict *RM_DefragRedisModuleDict(RedisModuleDefragCtx *ctx, RedisModuleDict *dict, RedisModuleDefragDictValueCallback valueCB, RedisModuleString **seekTo);

typedef void *(*RedisModuleDefragDictValueCallback)(RedisModuleDefragCtx *ctx, void *data, unsigned char *key, size_t keylen);
```

Usage:
```c
RedisModuleString *seekTo = NULL;
RedisModuleDict *dict = = RedisModule_CreateDict(ctx);
... populate the dict code ...
/* Defragment a dictionary completely */
do {
    RedisModuleDict *new = RedisModule_DefragRedisModuleDict(ctx, dict, defragGlobalDictValueCB, &seekTo);
    if (new != NULL) {
        dict = new;
    }
} while (seekTo);
```

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-02-20 21:09:29 +08:00
debing.sun 695126ccce
Add support for incremental defragmentation of global module data (#13815)
## Description

Currently, when performing defragmentation on non-key data within the
module, we cannot process the defragmentation incrementally. This
limitation affects the efficiency and flexibility of defragmentation in
certain scenarios.
The primary goal of this PR is to introduce support for incremental
defragmentation of global module data.

## Interface Change
New module API `RegisterDefragFunc2`

This is a more advanced version of `RM_RegisterDefragFunc`, in that it
takes a new callbacks(`RegisterDefragFunc2`) that has a return value,
and can use RM_DefragShouldStop in and indicate that it should be called
again later, or is it done (returned 0).

## Note
The `RegisterDefragFunc` API remains available.

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-02-20 00:28:16 +08:00
debing.sun 725cd268e6
Refactor of ActiveDefrag to reduce latencies (#13814)
This PR is based on: https://github.com/valkey-io/valkey/pull/1462

## Issue/Problems

Duty Cycle: Active Defrag has configuration values which determine the
intended percentage of CPU to be used based on a gradient of the
fragmentation percentage. However, Active Defrag performs its work on
the 100ms serverCron timer. It then computes a duty cycle and performs a
single long cycle. For example, if the intended CPU is computed to be
10%, Active Defrag will perform 10ms of work on this 100ms timer cron.

* This type of cycle introduces large latencies on the client (up to
25ms with default configurations)
* This mechanism is subject to starvation when slow commands delay the
serverCron

Maintainability: The current Active Defrag code is difficult to read &
maintain. Refactoring of the high level control mechanisms and functions
will allow us to more seamlessly adapt to new defragmentation needs.
Specific examples include:

* A single function (activeDefragCycle) includes the logic to
start/stop/modify the defragmentation as well as performing one "step"
of the defragmentation. This should be separated out, so that the actual
defrag activity can be performed on an independent timer (see duty cycle
above).
* The code is focused on kvstores, with other actions just thrown in at
the end (defragOtherGlobals). There's no mechanism to break this up to
reduce latencies.
* For the main dictionary (only), there is a mechanism to set aside
large keys to be processed in a later step. However this code creates a
separate list in each kvstore (main dict or not), bleeding/exposing
internal defrag logic. We only need 1 list - inside defrag. This logic
should be more contained for the main key store.
* The structure is not well suited towards other non-main-dictionary
items. For example, pub-sub and pub-sub-shard was added, but it's added
in such a way that in CMD mode, with multiple DBs, we will defrag
pub-sub repeatedly after each DB.

## Description of the feature

Primarily, this feature will split activeDefragCycle into 2 functions.

1. One function will be called from serverCron to determine if a defrag
cycle (a complete scan) needs to be started. It will also determine if
the CPU expenditure needs to be adjusted.
2. The 2nd function will be a timer proc dedicated to performing defrag.
This will be invoked independently from serverCron.

Once the functions are split, there is more control over the latency
created by the defrag process. A new configuration will be used to
determine the running time for the defrag timer proc. The default for
this will be 500us (one-half of the current minimum time). Then the
timer will be adjusted to achieve the desired CPU. As an example, 5% of
CPU will run the defrag process for 500us every 10ms. This is much
better than running for 5ms every 100ms.

The timer function will also adjust to compensate for starvation. If a
slow command delays the timer, the process will run proportionately
longer to ensure that the configured CPU is achieved. Given the presence
of slow commands, the proportional extra time is insignificant to
latency. This also addresses the overload case. At 100% CPU, if the
event loop slows, defrag will run proportionately longer to achieve the
configured CPU utilization.

Optionally, in low CPU situations, there would be little impact in
utilizing more than the configured CPU. We could optionally allow the
timer to pop more often (even with a 0ms delay) and the (tail) latency
impact would not change.

And we add a time limit for the defrag duty cycle to prevent excessive
latency. When latency is already high (indicated by a long time between
calls), we don't want to make it worse by running defrag for too long.

Addressing maintainability:

* The basic code structure can more clearly be organized around a
"cycle".
* Have clear begin/end functions and a set of "stages" to be executed.
* Rather than stages being limited to "kvstore" type data, a cycle
should be more flexible, incorporating the ability to incrementally
perform arbitrary work. This will likely be necessary in the future for
certain module types. It can be used today to address oddballs like
defragOtherGlobals.
* We reduced some of the globals, and reduce some of the coupling.
defrag_later should be removed from serverDb.
* Each stage should begin on a fresh cycle. So if there are
non-time-bounded operations like kvstoreDictLUTDefrag, these would be
less likely to introduce additional latency.


Signed-off-by: Jim Brunner
[brunnerj@amazon.com](mailto:brunnerj@amazon.com)
Signed-off-by: Madelyn Olson
[madelyneolson@gmail.com](mailto:madelyneolson@gmail.com)
Co-authored-by: Madelyn Olson
[madelyneolson@gmail.com](mailto:madelyneolson@gmail.com)

---------

Signed-off-by: Jim Brunner brunnerj@amazon.com
Signed-off-by: Madelyn Olson madelyneolson@gmail.com
Co-authored-by: Madelyn Olson madelyneolson@gmail.com
Co-authored-by: ShooterIT <wangyuancode@163.com>
2025-02-20 00:05:24 +08:00
Ozan Tezcan e2608478b6
Add HGETDEL, HGETEX and HSETEX hash commands (#13798)
This PR adds three new hash commands: HGETDEL, HGETEX and HSETEX. These
commands enable user to do multiple operations in one step atomically
e.g. set a hash field and update its TTL with a single command.
Previously, it was only possible to do it by calling hset and hexpire
commands subsequently.

- **HGETDEL command**

  ```
  HGETDEL <key> FIELDS <numfields> field [field ...]
  ```
  
  **Description**  
  Get and delete the value of one or more fields of a given hash key
  
  **Reply**  
Array reply: list of the value associated with each field or nil if the
field doesn’t exist.

- **HGETEX command**

  ```
   HGETEX <key>  
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | PERSIST]
     FIELDS <numfields> field [field ...]
  ```

  **Description**
Get the value of one or more fields of a given hash key, and optionally
set their expiration

  **Options:**
  EX seconds: Set the specified expiration time, in seconds.
  PX milliseconds: Set the specified expiration time, in milliseconds.
EXAT timestamp-seconds: Set the specified Unix time at which the field
will expire, in seconds.
PXAT timestamp-milliseconds: Set the specified Unix time at which the
field will expire, in milliseconds.
  PERSIST: Remove the time to live associated with the field.

  **Reply** 
Array reply: list of the value associated with each field or nil if the
field doesn’t exist.

- **HSETEX command**

  ```
  HSETEX <key>
     [FNX | FXX]
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | KEEPTTL]
     FIELDS <numfields> field value [field value...]
  ```
  **Description**
Set the value of one or more fields of a given hash key, and optionally
set their expiration

  **Options:**
  FNX: Only set the fields if all do not already exist.
  FXX: Only set the fields if all already exist.

  EX seconds: Set the specified expiration time, in seconds.
  PX milliseconds: Set the specified expiration time, in milliseconds.
EXAT timestamp-seconds: Set the specified Unix time at which the field
will expire, in seconds.
PXAT timestamp-milliseconds: Set the specified Unix time at which the
field will expire, in milliseconds.
  KEEPTTL: Retain the time to live associated with the field.

  
Note: If no option is provided, any associated expiration time will be
discarded similar to how SET command behaves.

  **Reply**
  Integer reply: 0 if no fields were set
  Integer reply: 1 if all the fields were set
2025-02-14 17:13:35 +03:00
jonathan keinan 7a40fd630d * fix comments 2025-02-06 13:16:33 +02:00
jonathan keinan b9361ad5fe * only use new api if override-default was provided as an argument 2025-02-06 13:16:33 +02:00
kei-nan f164012c19 Update tests/unit/moduleapi/moduleconfigs.tcl
Co-authored-by: nafraf <nafraf@users.noreply.github.com>
2025-02-06 13:16:33 +02:00
jonathan keinan 49455c43ae * change foo to goo so test will be correct and pass 2025-02-06 13:16:33 +02:00
jonathan keinan c2694fb696 * change config value in test to be different than overwritten value 2025-02-06 13:16:33 +02:00
jonathan keinan f7353db7eb * fix test
* cleanup strval2 on if an error during the OnLoad was encountered.
2025-02-06 13:16:33 +02:00
jonathan keinan 294492dbf2 * fix tests
* add some logging to test module
2025-02-06 13:16:33 +02:00
jonathan keinan fd5c325886 * initial commit 2025-02-06 13:16:33 +02:00
YaacovHazan 0aeb86d78d Revert "Improve GETRANGE command behavior (#12272)"
Although the commit #6ceadfb58 improves GETRANGE command behavior,
we can't accept it as we should avoid breaking changes for non-critical bug fixes.

This reverts commit 6ceadfb580.
2025-02-05 20:49:42 +02:00
YaacovHazan 8afb72a326 Revert "improve performance for scan command when matching data type (#12395)"
Although the commit #7f0a7f0a6 improves the performance of the SCAN command,
we can't accept it as we should avoid breaking changes for non-critical bug fixes.

This reverts commit 7f0a7f0a69.
2025-02-05 20:49:42 +02:00
Raz Monsonego 04589f90d7
Add internal connection and command mechanism (#13740)
# PR: Add Mechanism for Internal Commands and Connections in Redis

This PR introduces a mechanism to handle **internal commands and
connections** in Redis. It includes enhancements for command
registration, internal authentication, and observability.

## Key Features

1. **Internal Command Flag**:
   - Introduced a new **module command registration flag**: `internal`.
- Commands marked with `internal` can only be executed by **internal
connections**, AOF loading flows, and master-replica connections.
- For any other connection, these commands will appear as non-existent.

2. **Support for internal authentication added to `AUTH`**:
- Used by depicting the special username `internal connection` with the
right internal password, i.e.,: `AUTH "internal connection"
<internal_secret>`.
- No user-defined ACL username can have this name, since spaces are not
aloud in the ACL parser.
   - Allows connections to authenticate as **internal connections**.
- Authenticated internal connections can execute internal commands
successfully.

4. **Module API for Internal Secret**:
- Added the `RedisModule_GetInternalSecret()` API, that exposes the
internal secret that should be used as the password for the new `AUTH
"internal connection" <password>` command.
- This API enables the modules to authenticate against other shards as
local connections.

## Notes on Behavior

- **ACL validation**:
- Commands dispatched by internal connections bypass ACL validation, to
give the caller full access regardless of the user with which it is
connected.

- **Command Visibility**:
- Internal commands **do not appear** in `COMMAND <subcommand>` and
`MONITOR` for non-internal connections.
- Internal commands **are logged** in the slow log, latency report and
commands' statistics to maintain observability.

- **`RM_Call()` Updates**:
  - **Non-internal connections**:
- Cannot execute internal commands when the command is sent with the `C`
flag (otherwise can).
- Internal connections bypass ACL validations (i.e., run as the
unrestricted user).

- **Internal commands' success**:
- Internal commands succeed upon being sent from either an internal
connection (i.e., authenticated via the new `AUTH "internal connection"
<internal_secret>` API), an AOF loading process, or from a master via
the replication link.
Any other connections that attempt to execute an internal command fail
with the `unknown command` error message raised.

- **`CLIENT LIST` flags**:
  - Added the `I` flag, to indicate that the connection is internal.

- **Lua Scripts**:
   - Prevented internal commands from being executed via Lua scripts.

---------

Co-authored-by: Meir Shpilraien <meir@redis.com>
2025-02-05 11:48:08 +02:00
Meir Shpilraien (Spielrein) 870b6bd487
Added a shared secret over Redis cluster. (#13763)
The PR introduces a new shared secret that is shared over all the nodes
on the Redis cluster. The main idea is to leverage the cluster bus to
share a secret between all the nodes such that later the nodes will be
able to authenticate using this secret and send internal commands to
each other (see #13740 for more information about internal commands).

The way the shared secret is chosen is the following:
1. Each node, when start, randomly generate its own internal secret.
2. Each node share its internal secret over the cluster ping messages.
3. If a node gets a ping message with secret smaller then his current
secret, it embrace it.
4. Eventually all nodes should embrace the minimal secret

The converges of the secret is as good as the topology converges.

To extend the ping messages to contain the secret, we leverage the
extension mechanism. Nodes that runs an older Redis version will just
ignore those extensions.

Specific tests were added to verify that eventually all nodes see the
secrets. In addition, a verification was added to the test infra to
verify the secret on `cluster_config_consistent` and to
`assert_cluster_state`.
2025-02-03 09:54:37 +02:00
Raz Monsonego c688537d49
Add flag for ability of a module context to execute debug commands (#13774)
This PR adds a flag to the `RM_GetContextFlags` module-API function that
depicts whether the context may execute debug commands, according to
redis's standards.
2025-02-03 09:52:41 +02:00
debing.sun f86575f210
Gradually reduce defrag CPU usage when defragmentation is ineffective (#13752)
This PR addresses an issue where if a module does not provide a
defragmentation callback, we cannot defragment the fragmentation it
generates. However, the defragmentation process still considers a large
amount of fragmentation to be present, leading to more aggressive
defragmentation efforts that ultimately have no effect.

To mitigate this, the PR introduces a mechanism to gradually reduce the
CPU consumption for defragmentation when the defragmentation
effectiveness is poor. This occurs when the fragmentation rate drops
below 2% and the hit ratio is less than 1%, or when the fragmentation
rate increases by no more than 2%. The CPU consumption will be gradually
decreased until it reaches the minimum threshold defined by
`active-defrag-cycle-min`.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-01-24 11:35:32 +08:00
debing.sun 0f65806b5b
Update info.tcl test to revert client output limits sooner (#13738)
This PR is based on: https://github.com/valkey-io/valkey/pull/1462

We set the client output buffer limits to 10 bytes, and then execute
info stats which produces more than 10 bytes of output, which can cause
that command to throw an error.

I'm not sure why it wasn't consistently erroring before, might have been
some change related to the ubuntu upgrade though.

failed CI:
https://github.com/redis/redis/actions/runs/12738281720/job/35500381299

------
Co-authored-by: Madelyn Olson
[madelyneolson@gmail.com](mailto:madelyneolson@gmail.com)
2025-01-14 17:30:18 +08:00
YaacovHazan 4a95b3005a Fix Read/Write key pattern selector (CVE-2024-51741)
The '%' rule must contain one or both of R/W
2025-01-13 21:20:19 +02:00
Ozan Tezcan 73a9b916c9
Rdb channel replication (#13732)
This PR is based on:

https://github.com/redis/redis/pull/12109
https://github.com/valkey-io/valkey/pull/60

Closes: https://github.com/redis/redis/issues/11678

**Motivation**

During a full sync, when master is delivering RDB to the replica,
incoming write commands are kept in a replication buffer in order to be
sent to the replica once RDB delivery is completed. If RDB delivery
takes a long time, it might create memory pressure on master. Also, once
a replica connection accumulates replication data which is larger than
output buffer limits, master will kill replica connection. This may
cause a replication failure.

The main benefit of the rdb channel replication is streaming incoming
commands in parallel to the RDB delivery. This approach shifts
replication stream buffering to the replica and reduces load on master.
We do this by opening another connection for RDB delivery. The main
channel on replica will be receiving replication stream while rdb
channel is receiving the RDB.

This feature also helps to reduce master's main process CPU load. By
opening a dedicated connection for the RDB transfer, the bgsave process
has access to the new connection and it will stream RDB directly to the
replicas. Before this change, due to TLS connection restriction, the
bgsave process was writing RDB bytes to a pipe and the main process was
forwarding
it to the replica. This is no longer necessary, the main process can
avoid these expensive socket read/write syscalls. It also means RDB
delivery to replica will be faster as it avoids this step.

In summary, replication will be faster and master's performance during
full syncs will improve.


**Implementation steps**

1. When replica connects to the master, it sends 'rdb-channel-repl' as
part of capability exchange to let master to know replica supports rdb
channel.
2. When replica lacks sufficient data for PSYNC, master sends
+RDBCHANNELSYNC reply with replica's client id. As the next step, the
replica opens a new connection (rdb-channel) and configures it against
the master with the appropriate capabilities and requirements. It also
sends given client id back to master over rdbchannel, so that master can
associate these channels. (initial replica connection will be referred
as main-channel) Then, replica requests fullsync using the RDB channel.
3. Prior to forking, master attaches the replica's main channel to the
replication backlog to deliver replication stream starting at the
snapshot end offset.
4. The master main process sends replication stream via the main
channel, while the bgsave process sends the RDB directly to the replica
via the rdb-channel. Replica accumulates replication stream in a local
buffer, while the RDB is being loaded into the memory.
5. Once the replica completes loading the rdb, it drops the rdb channel
and streams the accumulated replication stream into the db. Sync is
completed.

**Some details**
- Currently, rdbchannel replication is supported only if
`repl-diskless-sync` is enabled on master. Otherwise, replication will
happen over a single connection as in before.
- On replica, there is a limit to replication stream buffering. Replica
uses a new config `replica-full-sync-buffer-limit` to limit number of
bytes to accumulate. If it is not set, replica inherits
`client-output-buffer-limit <replica>` hard limit config. If we reach
this limit, replica stops accumulating. This is not a failure scenario
though. Further accumulation will happen on master side. Depending on
the configured limits on master, master may kill the replica connection.

**API changes in INFO output:**

1. New replica state: `send_bulk_and_stream`. Indicates full sync is
still in progress for this replica. It is receiving replication stream
and rdb in parallel.
```
slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0
```
Replica state changes in steps:
- First, replica sends psync and receives +RDBCHANNELSYNC
:`state=wait_bgsave`
- After replica connects with rdbchannel and delivery starts:
`state=send_bulk_and_stream`
 - After full sync: `state=online`

2. On replica side, replication stream buffering metrics:
- replica_full_sync_buffer_size: Currently accumulated replication
stream data in bytes.
- replica_full_sync_buffer_peak: Peak number of bytes that this instance
accumulated in the lifetime of the process.

```
replica_full_sync_buffer_size:20485             
replica_full_sync_buffer_peak:1048560
```

**API changes in CLIENT LIST**

In `client list` output, rdbchannel clients will have 'C' flag in
addition to 'S' replica flag:
```
id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0
```

**Config changes:**
- `replica-full-sync-buffer-limit`: Controls how much replication data
replica can accumulate during rdbchannel replication. If it is not set,
a value of 0 means replica will inherit `client-output-buffer-limit
<replica>` hard limit config to limit accumulated data.
- `repl-rdb-channel` config is added as a hidden config. This is mostly
for testing as we need to support both rdbchannel replication and the
older single connection replication (to keep compatibility with older
versions and rdbchannel replication will not be enabled if
repl-diskless-sync is not enabled). it affects both the master (not to
respond to rdb channel requests), and the replica (not to declare
capability)

**Internal API changes:**
Changes that were introduced to Redis replication:
- New replication capability is added to replconf command: `capa
rdb-channel-repl`. Indicates replica is capable of rdb channel
replication. Replica sends it when it connects to master along with
other capabilities.
- If replica needs fullsync, master replies `+RDBCHANNELSYNC
<client-id>` to the replica's PSYNC request.
- When replica opens rdbchannel connection, as part of replconf command,
it sends `rdb-channel 1` to let master know this is rdb channel. Also,
it sends `main-ch-client-id <client-id>` as part of replconf command so
master can associate channels.
  
**Testing:**
As rdbchannel replication is enabled by default, we run whole test suite
with it. Though, as we need to support both rdbchannel and single
connection replication, we'll be running some tests twice with
`repl-rdb-channel yes/no` config.

**Replica state diagram**
```
* * Replica state machine *
 *
 * Main channel state
 * ┌───────────────────┐
 * │RECEIVE_PING_REPLY │
 * └────────┬──────────┘
 *          │ +PONG
 * ┌────────▼──────────┐
 * │SEND_HANDSHAKE     │                     RDB channel state
 * └────────┬──────────┘            ┌───────────────────────────────┐
 *          │+OK                ┌───► RDB_CH_SEND_HANDSHAKE         │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_AUTH_REPLY │        │    REPLCONF main-ch-client-id <clientid>
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │ RDB_CH_RECEIVE_AUTH_REPLY     │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_PORT_REPLY │        │                  │ +OK
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │  RDB_CH_RECEIVE_REPLCONF_REPLY│
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_IP_REPLY   │        │                  │ +OK
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │ RDB_CH_RECEIVE_FULLRESYNC     │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_CAPA_REPLY │        │                  │+FULLRESYNC
 * └────────┬──────────┘        │                  │Rdb delivery
 *          │                   │   ┌──────────────▼────────────────┐
 * ┌────────▼──────────┐        │   │ RDB_CH_RDB_LOADING            │
 * │SEND_PSYNC         │        │   └──────────────┬────────────────┘
 * └─┬─────────────────┘        │                  │ Done loading
 *   │PSYNC (use cached-master) │                  │
 * ┌─▼─────────────────┐        │                  │
 * │RECEIVE_PSYNC_REPLY│        │    ┌────────────►│ Replica streams replication
 * └─┬─────────────────┘        │    │             │ buffer into memory
 *   │                          │    │             │
 *   │+RDBCHANNELSYNC client-id │    │             │
 *   ├──────┬───────────────────┘    │             │
 *   │      │ Main channel           │             │
 *   │      │ accumulates repl data  │             │
 *   │   ┌──▼────────────────┐       │     ┌───────▼───────────┐
 *   │   │ REPL_TRANSFER     ├───────┘     │    CONNECTED      │
 *   │   └───────────────────┘             └────▲───▲──────────┘
 *   │                                          │   │
 *   │                                          │   │
 *   │  +FULLRESYNC    ┌───────────────────┐    │   │
 *   ├────────────────► REPL_TRANSFER      ├────┘   │
 *   │                 └───────────────────┘        │
 *   │  +CONTINUE                                   │
 *   └──────────────────────────────────────────────┘
 */
 ```
 -----
 This PR also contains changes and ideas from: 
https://github.com/valkey-io/valkey/pull/837
https://github.com/valkey-io/valkey/pull/1173
https://github.com/valkey-io/valkey/pull/804
https://github.com/valkey-io/valkey/pull/945
https://github.com/valkey-io/valkey/pull/989
---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: naglera <anagler123@gmail.com>
Co-authored-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
2025-01-13 15:09:52 +03:00
debing.sun 21aee83abd
Fix issue with argv not being shrunk (#13698)
Found by @ShooterIT 

## Describe
If a client first creates a command with a very large number of
parameters, such as 10,000 parameters, the argv will be expanded to
accommodate 10,000. If the subsequent commands have fewer than 10,000
parameters, this argv will continue to be reused and will never be
shrunk.

## Solution
When determining whether it is necessary to rebuild argv, if the length
of the previous argv has already exceeded 1024, we will progressively
create argv regardless.

## Free argv in cron
Add a new condition to determine whether argv needs to be resized in
cron. When the number of parameters exceeds 128, we will resize it
regardless to avoid a single client consuming too much memory. It will
now occupy a maximum of (128 * 8 bytes).

---------

Co-authored-by: Yuan Wang <wangyuancode@163.com>
2025-01-08 16:12:52 +08:00
debing.sun 08d714d0e5
Fix crash due to cron argv release (#13725)
Introduced by https://github.com/redis/redis/issues/13521

If the client argv was released due to a timeout before sending the
complete command, `argv_len` will be reset to 0.
When argv is parsed again and resized, requesting a length of 0 may
result in argv being NULL, then leading to a crash.

And fix a bug that `argv_len` is not updated correctly in
`replaceClientCommandVector()`.

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
2025-01-08 09:57:23 +08:00
raffertyyu 04f63d4af7
Fix index error of CRLF when replying with integer-encoded strings (#13711)
close #13709

Fix the index error of CRLF character for integer-encoded strings
in addReplyBulk function

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-12-31 21:41:10 +08:00
Yuan Wang 64a40b20d9
Async IO Threads (#13695)
## Introduction
Redis introduced IO Thread in 6.0, allowing IO threads to handle client
request reading, command parsing and reply writing, thereby improving
performance. The current IO thread implementation has a few drawbacks.
- The main thread is blocked during IO thread read/write operations and
must wait for all IO threads to complete their current tasks before it
can continue execution. In other words, the entire process is
synchronous. This prevents the efficient utilization of multi-core CPUs
for parallel processing.

- When the number of clients and requests increases moderately, it
causes all IO threads to reach full CPU utilization due to the busy wait
mechanism used by the IO threads. This makes it challenging for us to
determine which part of Redis has reached its bottleneck.

- When IO threads are enabled with TLS and io-threads-do-reads, a
disconnection of a connection with pending data may result in it being
assigned to multiple IO threads simultaneously. This can cause race
conditions and trigger assertion failures. Related issue:
redis#12540

Therefore, we designed an asynchronous IO threads solution. The IO
threads adopt an event-driven model, with the main thread dedicated to
command processing, meanwhile, the IO threads handle client read and
write operations in parallel.

## Implementation
### Overall
As before, we did not change the fact that all client commands must be
executed on the main thread, because Redis was originally designed to be
single-threaded, and processing commands in a multi-threaded manner
would inevitably introduce numerous race and synchronization issues. But
now each IO thread has independent event loop, therefore, IO threads can
use a multiplexing approach to handle client read and write operations,
eliminating the CPU overhead caused by busy-waiting.

the execution process can be briefly described as follows:
the main thread assigns clients to IO threads after accepting
connections, IO threads will notify the main thread when clients
finish reading and parsing queries, then the main thread processes
queries from IO threads and generates replies, IO threads handle
writing reply to clients after receiving clients list from main thread,
and then continue to handle client read and write events.

### Each IO thread has independent event loop
We now assign each IO thread its own event loop. This approach
eliminates the need for the main thread to perform the costly
`epoll_wait` operation for handling connections (except for specific
ones). Instead, the main thread processes requests from the IO threads
and hands them back once completed, fully offloading read and write
events to the IO threads.

Additionally, all TLS operations, including handling pending data, have
been moved entirely to the IO threads. This resolves the issue where
io-threads-do-reads could not be used with TLS.

### Event-notified client queue
To facilitate communication between the IO threads and the main thread,
we designed an event-notified client queue. Each IO thread and the main
thread have two such queues to store clients waiting to be processed.
These queues are also integrated with the event loop to enable handling.
We use pthread_mutex to ensure the safety of queue operations, as well
as data visibility and ordering, and race conditions are minimized, as
each IO thread and the main thread operate on independent queues,
avoiding thread suspension due to lock contention. And we implemented an
event notifier based on `eventfd` or `pipe` to support event-driven
handling.

### Thread safety
Since the main thread and IO threads can execute in parallel, we must
handle data race issues carefully.

**client->flags**
The primary tasks of IO threads are reading and writing, i.e.
`readQueryFromClient` and `writeToClient`. However, IO threads and the
main thread may concurrently modify or access `client->flags`, leading
to potential race conditions. To address this, we introduced an io-flags
variable to record operations performed by IO threads, thereby avoiding
race conditions on `client->flags`.

**Pause IO thread**
In the main thread, we may want to operate data of IO threads, maybe
uninstall event handler, access or operate query/output buffer or resize
event loop, we need a clean and safe context to do that. We pause IO
thread in `IOThreadBeforeSleep`, do some jobs and then resume it. To
avoid thread suspended, we use busy waiting to confirm the target
status. Besides we use atomic variable to make sure memory visibility
and ordering. We introduce these functions to pause/resume IO Threads as
below.
```
pauseIOThread, resumeIOThread
pauseAllIOThreads, resumeAllIOThreads
pauseIOThreadsRange, resumeIOThreadsRange
```
Testing has shown that `pauseIOThread` is highly efficient, allowing the
main thread to execute nearly 200,000 operations per second during
stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can
handle up to nearly 56,000 operations per second. But operations
performed between pausing and resuming IO threads must be quick;
otherwise, they could cause the IO threads to reach full CPU
utilization.

**freeClient and freeClientAsync**
The main thread may need to terminate a client currently running on an
IO thread, for example, due to ACL rule changes, reaching the output
buffer limit, or evicting a client. In such cases, we need to pause the
IO thread to safely operate on the client.

**maxclients and maxmemory-clients updating**
When adjusting `maxclients`, we need to resize the event loop for all IO
threads. Similarly, when modifying `maxmemory-clients`, we need to
traverse all clients to calculate their memory usage. To ensure safe
operations, we pause all IO threads during these adjustments.

**Client info reading**
The main thread may need to read a client’s fields to generate a
descriptive string, such as for the `CLIENT LIST` command or logging
purposes. In such cases, we need to pause the IO thread handling that
client. If information for all clients needs to be displayed, all IO
threads must be paused.

**Tracking redirect**
Redis supports the tracking feature and can even send invalidation
messages to a connection with a specified ID. But the target client may
be running on IO thread, directly manipulating the client’s output
buffer is not thread-safe, and the IO thread may not be aware that the
client requires a response. In such cases, we pause the IO thread
handling the client, modify the output buffer, and install a write event
handler to ensure proper handling.

**clientsCron**
In the `clientsCron` function, the main thread needs to traverse all
clients to perform operations such as timeout checks, verifying whether
they have reached the soft output buffer limit, resizing the
output/query buffer, or updating memory usage. To safely operate on a
client, the IO thread handling that client must be paused.
If we were to pause the IO thread for each client individually, the
efficiency would be very low. Conversely, pausing all IO threads
simultaneously would be costly, especially when there are many IO
threads, as clientsCron is invoked relatively frequently.
To address this, we adopted a batched approach for pausing IO threads.
At most, 8 IO threads are paused at a time. The operations mentioned
above are only performed on clients running in the paused IO threads,
significantly reducing overhead while maintaining safety.

### Observability
In the current design, the main thread always assigns clients to the IO
thread with the least clients. To clearly observe the number of clients
handled by each IO thread, we added the new section in INFO output. The
`INFO THREADS` section can show the client count for each IO thread.
```
# Threads
io_thread_0:clients=0
io_thread_1:clients=2
io_thread_2:clients=2
```

Additionally, in the `CLIENT LIST` output, we also added a field to
indicate the thread to which each client is assigned.

`id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name=
lib-ver= io-thread=1`

## Trade-off
### Special Clients
For certain special types of clients, keeping them running on IO threads
would result in severe race issues that are difficult to resolve.
Therefore, we chose not to offload these clients to the IO threads.

For replica, monitor, subscribe, and tracking clients, main thread may
directly write them a reply when conditions are met. Race issues are
difficult to resolve, so we have them processed in the main thread. This
includes the Lua debug clients as well, since we may operate connection
directly.

For blocking client, after the IO thread reads and parses a command and
hands it over to the main thread, if the client is identified as a
blocking type, it will be remained in the main thread. Once the blocking
operation completes and the reply is generated, the client is
transferred back to the IO thread to send the reply and wait for event
triggers.

### Clients Eviction
To support client eviction, it is necessary to update each client’s
memory usage promptly during operations such as read, write, or command
execution. However, when a client operates on an IO thread, it is not
feasible to update the memory usage immediately due to the risk of data
races. As a result, memory usage can only be updated either in the main
thread while processing commands or in the `ClientsCron` periodically.
The downside of this approach is that updates might experience a delay
of up to one second, which could impact the precision of memory
management for eviction.

To avoid incorrectly evicting clients. We adopted a best-effort
compensation solution, when we decide to eviction a client, we update
its memory usage again before evicting, if the memory used by the client
does not decrease or memory usage bucket is not changed, then we will
evict it, otherwise, not evict it.

However, we have not completely solved this problem. Due to the delay in
memory usage updates, it may lead us to make incorrect decisions about
the need to evict clients.

### Defragment
In the majority of cases we do NOT use the data from argv directly in
the db.
1. key names
We store a copy that we allocate in the main thread, see `sdsdup()` in
`dbAdd()`.
2. hash key and value
We store key as hfield and store value as sds, see `hfieldNew()` and
`sdsdup()` in `hashTypeSet()`.
3. other datatypes
   They don't even use SDS, so there is no reference issues.

But in some cases client the data from argv may be retain by the main
thread.
As a result, during fragmentation cleanup, we need to move allocations
from the IO thread’s arena to the main thread’s arena. We always
allocate new memory in the main thread’s arena, but the memory released
by IO threads may not yet have been reclaimed. This ultimately causes
the fragmentation rate to be higher compared to creating and allocating
entirely within a single thread.
The following cases below will lead to memory allocated by the IO thread
being kept by the main thread.
1. string related command: `append`, `getset`, `mset` and `set`.
If `tryObjectEncoding()` does not change argv, we will keep it directly
in the main thread, see the code in `tryObjectEncoding()`(specifically
`trimStringObjectIfNeeded()`)
2. block related command.
    the key names will be kept in `c->db->blocking_keys`.
3. watch command
    the key names will be kept in `c->db->watched_keys`.
4. [s]subscribe command
    channel name will be kept in `serverPubSubChannels`.
5. script load command
    script will be kept in `server.lua_scripts`.
7. some module API: `RM_RetainString`, `RM_HoldString`

Those issues will be handled in other PRs.

## Testing
### Functional Testing
The commit with enabling IO Threads has passed all TCL tests, but we did
some changes:
**Client query buffer**: In the original code, when using a reusable
query buffer, ownership of the query buffer would be released after the
command was processed. However, with IO threads enabled, the client
transitions from an IO thread to the main thread for processing. This
causes the ownership release to occur earlier than the command
execution. As a result, when IO threads are enabled, the client's
information will never indicate that a shared query buffer is in use.
Therefore, we skip the corresponding query buffer tests in this case.
**Defragment**: Add a new defragmentation test to verify the effect of
io threads on defragmentation.
**Command delay**: For deferred clients in TCL tests, due to clients
being assigned to different threads for execution, delays may occur. To
address this, we introduced conditional waiting: the process proceeds to
the next step only when the `client list` contains the corresponding
commands.

### Sanitizer Testing
The commit passed all TCL tests and reported no errors when compiled
with the `fsanitizer=thread` and `fsanitizer=address` options enabled.
But we made the following modifications: we suppressed the sanitizer
warnings for clients with watched keys when updating `client->flags`, we
think IO threads read `client->flags`, but never modify it or read the
`CLIENT_DIRTY_CAS` bit, main thread just only modifies this bit, so
there is no actual data race.

## Others
### IO thread number
In the new multi-threaded design, the main thread is primarily focused
on command processing to improve performance. Typically, the main thread
does not handle regular client I/O operations but is responsible for
clients such as replication and tracking clients. To avoid breaking
changes, we still consider the main thread as the first IO thread.

When the io-threads configuration is set to a low value (e.g., 2),
performance does not show a significant improvement compared to a
single-threaded setup for simple commands (such as SET or GET), as the
main thread does not consume much CPU for these simple operations. This
results in underutilized multi-core capacity. However, for more complex
commands, having a low number of IO threads may still be beneficial.
Therefore, it’s important to adjust the `io-threads` based on your own
performance tests.

Additionally, you can clearly monitor the CPU utilization of the main
thread and IO threads using `top -H -p $redis_pid`. This allows you to
easily identify where the bottleneck is. If the IO thread is the
bottleneck, increasing the `io-threads` will improve performance. If the
main thread is the bottleneck, the overall performance can only be
scaled by increasing the number of shards or replicas.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
2024-12-23 14:16:40 +08:00
Nugine 684077682e
Fix bug in PFMERGE command (#13672)
The bug was introduced in #13558 . 

When merging dense hll structures, `hllDenseCompress` writes to wrong
location and the result will be zero. The unit tests didn't cover this
case.

This PR
+ fixes the bug
+ adds `PFDEBUG SIMD (ON|OFF)` for unit tests
+ adds a new TCL test to cover the cases

Synchronized from https://github.com/valkey-io/valkey/pull/1293

---------

Signed-off-by: Xuyang Wang <xuyangwang@link.cuhk.edu.cn>
Co-authored-by: debing.sun <debing.sun@redis.com>
2024-12-18 14:41:04 +08:00