Commit Graph

12600 Commits

Author SHA1 Message Date
debing.sun e750c619b2
Fix some test failures caused by key being deleted due to premature expiration (#13453)
1. Fix fuzzer test failure when the key was deleted due to expiration
before sending random traffic for the key.

After HFE, when all fields in a hash are expired, the hash might be
deleted due to expiration.

If the key was expired in the mid of `RESTORE` command and sending rand
trafic, `fuzzer` test will fail in the following code because the 'TYPE
key' will return `none` and then throw an exception because it cannot be
found in `$commands`

94b9072e44/tests/support/util.tcl (L712-L713)

This PR adds a `None` check for the reply of `KEY TYPE` command and adds
a print of `err` to avoid false positives in the future.

failed CI:
https://github.com/redis/redis/actions/runs/10127334121/job/28004985388

2. Fix the issue where key was deleted due to expiration before the
`scan.scan_key` command could execute, caused by premature enabling of
`set-active-expire`.

failed CI:
https://github.com/redis/redis/actions/runs/10153722715/job/28077610552

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-31 08:15:39 +08:00
debing.sun 93fb83b4cb
Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group (#13338)
Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
2024-07-30 22:31:31 +08:00
Lior Kogan 94b9072e44
Rename to "Redis Community Edition" (#13448) 2024-07-28 20:54:28 +03:00
Oran Agra e74550dd10
solve races in replication lpop tests (#13445)
* some tests didn't wait for replication offset sync
* tests that used deferring client, didn't wait for it to get blocked.
an in some cases, the replication offset sync ended before the deferring
client finished, so the digest match failed.
* some tests used deferring clients excessively
* the tests didn't read the client response
* the tests didn't close the client (fd leak)
2024-07-25 14:06:40 +03:00
Moti Cohen d0c64d78d4
On active expire, factor maxToExpire based on Hertz (#13439) 2024-07-25 13:22:02 +03:00
Moti Cohen 82f00f5179
Optimize RDB_TYPE_HASH_METADATA to keep relative expiration time (#13438)
Modify RDB_TYPE_HASH_METADATA layout to store expiration times relative
to the minimum expiration time, which is written at the start as absolute time.
2024-07-24 08:39:10 +03:00
Oran Agra 447ce11a64
solve race conditions in tests (#13433)
[exception]: Executing test client: ERR FAILOVER target replica is not
online.. ERR FAILOVER target replica is not online.
    while executing
"$node_0 failover to $node_1_host $node_1_port"
    ("uplevel" body line 16)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test {failover command to specific replica works} {

[err]: client evicted due to percentage of maxmemory in
tests/unit/client-eviction.tcl
Expected 33622 >= 220200 && 33622 < 440401 (context: type eval line 17
cmd {assert {$tot_mem >= $n && $tot_mem < $maxmemory_clients_actual}}
proc ::test)
2024-07-22 10:11:56 +03:00
Oran Agra 13d227fa46
Different fix for the race in #13361 (#13434)
Recently in #13361, i attempted to fix a race between FLUSHALL and
BGSAVE, where despite calling killRDBChild, the
backgroundSaveDoneHandler will terminate with success.
Turns out that even if the child didn't yet exit, there's a chance it'll
still miss our signal and exit with success.
in that case, we will still mess up the dirty counter (deducting
dirty_before_bgsave) which is reset by FLUSHALL, and override the
synchronous rdb file we saved.

instead, we'll set a flag to treat the next done handler as a failed
one.
2024-07-22 10:11:30 +03:00
Oran Agra a331978583
Fix external test hang in redis-cli test when run in a certain order (#13423)
When the tests are run against an external server in this order:
`--single unit/introspection --single unit/moduleapi/blockonbackground
--single integration/redis-cli`
the test would hang when the "ASK redirect test" test attempts to create
a listening socket (it fails, and then redis-cli itself hangs waiting
for a non-responsive socket created by the introspection test).

the reasons are:
1. the blockedbackground test includes util.tcl and resets the
`::last_port_attempted` variable
2. the test in introspection didn't close the listening server, so it's
still alive.
3. find_available_port doesn't properly detect the busy port, and it
thinks that the port is free even though it's busy.

fixing all 3 of these problems, even though fixing just one would be
enough to let the test pass.
2024-07-17 15:42:28 +03:00
Oran Agra fa46aa4d85
Test infra adjustments for external CI runs (#13421)
- when uploading server logs, make sure they don't overwrite each other.
- sort the test units to get consistent order between them (following
#13220)
- backup and restore the entire server configuration, to protect one
unit from config changes another unit performs
2024-07-16 11:38:20 +03:00
debing.sun 88af96c7a2
Trigger Lua GC after script loading (#13407)
Nowdays we do not trigger LUA GC after loading lua script. This means
that when a large number of scripts are loaded, such as when functions
are propagating from the master to the replica, if the LUA scripts are
never touched on the replica, the garbage might remain there
indefinitely.

Before this PR, we would share a gc_count between scripts and functions.
This means that, under certain circumstances, the GC trigger for scripts
and functions was not fair.
For example, loading a large number of scripts followed by a small
number of functions could result in the functions triggering GC.
In this PR, we assign a unique `gc_count` to each of them, so the GC
triggers between them will no longer affect each other.

on the other hand, this PR will to bring regession for script loading
commands(`FUNCTION LOAD` and `SCRIPT LOAD`), but they are not hot path,
we can ignore it, and it will be replaced
https://github.com/redis/redis/pull/13375 in the future.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-07-16 09:28:47 +08:00
debing.sun 76415fa2cf
Prevent deleting RDB read event after restarting RDB saving for other diskless replicas (#13410)
When we terminate the diskless RDB saving child process and, at the same
time, we start a new BGSAVE for new replicas, we should not delete the
RDB read event. Otherwise, these replicas will never receive a response.
this is a result of the recent change in
https://github.com/redis/redis/pull/13361

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-16 09:22:43 +08:00
debing.sun 7d3545cb16
Reduce redundant call of prepareClientToWrite when call addReply* continuously (#13412)
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/670.

## Description

While exploring hotspots with profiling some benchmark workloads, we
noticed the high cycles ratio of `prepareClientToWrite`, taking about 9%
of the CPU of `smembers`, `lrange` commands. After deep dive the code
logic, we thought we can gain the performance by reducing the redundant
call of `prepareClientToWrite` when call addReply* continuously.

For example: In

https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082,
`prepareClientToWrite` is called three times in a row.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-07-15 23:19:19 +08:00
Oran Agra 880e147d52
solve redis-cli test failures due to local history file (#13419)
test failure:
```
[err]: Interactive CLI: should find second search result if user presses ctrl+s in tests/integration/redis-cli.tcl
Expected '1' to be equal to '0' (context: type eval line 10 cmd {assert_equal 1 [regexp {\(i-search\): \x1B\[0mk\x1B\[1mey\x1B\[0ms one} $result]} proc ::test)
```

this test (introduced in #12543) depends on the local history file, so
it can fail if there's some match there.
the fix is to use a different history file, and delete it before each
run.
2024-07-15 15:24:43 +03:00
guybe7 b10e19e3d6
Crash report: Use more chars for argv (#13413)
128 is not enough chars when we're talking about commands like RESTORE.
Of course, it's impossible to find the perfect number, but 1024 is
better than 128, and it's not obscenely large.
2024-07-14 11:37:44 +08:00
debing.sun d39548c854
Avoid starting defrag after config resetstat for defrag test (#13399)
If `config resetstat` is executed and a defrag is started after it, the
`total_active_defrag_time` will not be 0.
When we start the defrag again, we will skip the following steps:
1. waiting for the defrag to start. (s total_active_defrag_time is equal
0)
2. waiting for the test to complete. (active_defrag_running is euqal 0)
which result in the test failed.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-12 10:07:09 +08:00
debing.sun ad9277238f
Disable aof for `Scripts do not block on waitaof` test (#13409)
The`Scripts do not block on waitaof` test keeps failing in the external
CI:
https://github.com/redis/redis/actions/runs/9875670156/job/27272955900?pr=13407

The main reason is that if AOF happens to be enabled, we get an acklocal
of 1.
2024-07-11 20:25:57 +08:00
guybe7 915a9e4b93
KSN: HEXPIRE-like commands should emit hdel if expire-time is in the past (#13408)
To be more similar to EXPIRE-like commands, which emit a "del"
notification if the expire-time is in the past
2024-07-11 16:24:58 +08:00
guybe7 81440a333d
Module unblock on keys: updateStatsOnUnblock is called twice (#13405)
This commit reverts the deletion of the condition `!bc->blocked_on_keys`
that was accidentally introduced by
https://github.com/redis/redis/pull/12817.
In case a blocked-on-keys module client is unblocked both
`moduleUnblockClientOnKey` and `moduleHandleBlockedClients` are called
which resulted in `updateStatsOnUnblock` being called twice

Now, that `moduleHandleBlockedClients` doesn't call
`updateStatsOnUnblock` in case of unblocked module key-blocked clients,
in the unlikely event that the module decides to call `RM_UnblockClient`
on a key-blocked client, we need to call `updateStatsOnUnblock` from
within `moduleBlockedClientTimedOut`, but since
`moduleBlockedClientTimedOut` is not tread-safe we can't call it
directly from withing `RM_UnblockClient`.
Added a new flag `blocked_on_keys_explicit_unblock` for that specific
case, which will cause `moduleBlockedClientTimedOut` to be called from
`moduleHandleBlockedClients` (which is only called from the main thread)

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-07-11 16:13:38 +08:00
debing.sun ffff7fea7c
Rebuild function engines for function flush command (#13383)
### Issue
The current implementation of `FUNCTION FLUSH` command uses
`lua_unref()` to unreference script closures in Lua vm. However,
invoking `lua_unref()` during lazy free (`ASYNC` argument) is risky
since it is not thread-safe.

Another issue is that using `lua_unref()` to unreference references does
not trigger GC, This can result in the Lua VM leaves a significant
amount of garbage, which may never be cleaned up if not properly GC.

### Solution
The proposed solution is to completely rebuild the engines, resulting in
a brand new Lua VM.

---------

Co-authored-by: meir <meir@redis.com>
2024-07-10 17:35:36 +08:00
debing.sun 69b480cb7a
Hide user data from log (#13400)
This PR is based on the commits from PR #11747.

In the event of an assertion failure, hide command arguments from the
operator.

In some cases, private client information can be voluntarily exposed
when a redis instance crashes due to an assertion failure.
This commit prevent וnintentional client info exposure.
Operators can still access the hidden data, but they must actively
request it.
Any of the client info commands remains the unchanged.

### Config
Add a new config `hide-user-data-from-log` to turn this feature on and
off, default off.

---------

Co-authored-by: naglera <anagler123@gmail.com>
Co-authored-by: naglera <58042354+naglera@users.noreply.github.com>
2024-07-09 18:54:18 +08:00
Vitah Lin b699e8bfe0
Upgrade action/checkout version and add old-chain CI actions to test gcc4.8 (#13394)
https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

Due to GitHub removing support for CentOS 7 in GitHub Actions, all
actions utilizing CentOS 7 need to be upgraded, upgrade the centos
version from `contos:7` to `quay.io/centos/centos:stream9` which is the
official RedHat centos container.

Create some new actions named `old-chain` to verify support for gcc 4.8.

This PR also includes the upgrade of actions/checkout from version 3 to
version 4.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-07-08 21:34:14 +08:00
Moti Cohen dcf0229811
HFE - RDB serialize also hash min expiration (For ROF flow) (#13391)
* Following this feature, Redis (ROF) may implement flow that allows hashes to be
  dumped directly from RDB to FLUSH without parsing. In this scenario, it is still
  essential to determine when to update hashes due to expired fields. By writing
  and reading the next minimum hash-field expiration before serializing objects
  to and from RDB, we can effectively track and expire hash fields without the need
  to parse the hash during loading.

    Before:
    #define RDB_TYPE_HASH_METADATA 22
    #define RDB_TYPE_HASH_LISTPACK_EX 23
    
    After:
    /* Hash with HFEs. Doesn't attach min TTL at start */
    #define RDB_TYPE_HASH_METADATA_PRE_GA 22      
    /* Hash LP with HFEs. Doesn't attach min TTL at start */
    #define RDB_TYPE_HASH_LISTPACK_EX_PRE_GA 23   
    /* Hash with HFEs. Attach min TTL at start */
    #define RDB_TYPE_HASH_METADATA 24             
    /* Hash LP with HFEs. Attach min TTL at start */
    #define RDB_TYPE_HASH_LISTPACK_EX 25          


* Manually test loading RDB file before the change and verify hash and its HFEs 
  is as expected.
* Added `subexpires` counter to `redis-check-rdb`
2024-07-07 17:00:19 +03:00
debing.sun 58edea7385
Add help for `DEBUG SCRIPT` command (#13385)
1. Add help for `DEBUG SCRIPT` command.
2. Remove a duplicate `getLuaScripts()` which is same as
`evalScriptsDict()`.
2024-07-04 14:44:42 +08:00
Filipe Oliveira (Redis) 26a2dcb936
Reduce getNodeByQuery overhead (#13221)
The following PR does the following changes based upon on CPU profile
info. The `getNodeByQuery` function represents 8.2% of an overhead of
12.3% when comparing single shard cluster with standalone.
Proposed changes:
- inlinging keyHashSlot to reduce overhead of that function call
- Reduce duplicate calls to getCommandFlags within getNodeByQuery

The above changes represent an improvement of approximately 5% on the achievable ops/sec.

Co-authored-by: filipecosta90 <filipecosta.90@gmail.com>
2024-07-03 18:23:14 +08:00
Moti Cohen a84cc20aef
HFE - Fix statistic to count also lazy expired and rename INFO params (#13372)
* INFO command : rename `hashes_with_expiry_fields` to `subexpiry`
* INFO command : rename `expired_hash_fields` to `expired_subkeys`
* Fix statistic of `expired_subkeys` to count also lazy expired
* Remove TODOs comments leftover in TCL
* Fix potential flaky test of rdb load of hash-field-expiration
2024-07-02 18:22:10 +03:00
debing.sun 76d179c64f
Update CentOS7 repo URL for EOL and macOS11 GitHub Action for deprecation (#13365)
1. Update macOS version in GitHub Actions due to deprecation of macOS 11
Because the macOS 11 runner image will be removed by 06/28/2024.

2. CentOS has arrived at EOL, we need to update the yum repo to new url.
2024-07-02 21:53:32 +08:00
David Dougherty c6ceea9a23
DOC-3932: remove frontmatter attributes from module API spec generator (#13377)
This PR removes the Hugo frontmatter from this utility.
2024-07-02 08:25:42 +03:00
Oran Agra 799c5e5f51
Solve a race between BGSAVE and FLUSHALL messing up the dirty counter (#13361)
If we run FLUSHALL when the 'save' config is set, and there's a fork
child ding BGSAVE, there's a chance the child is already finished, and
the parent process is unaware of it. in that case the child will not get
the kill signal and will finish successfully, but the parent process
thinks it killed it and will reset the dirty counter to 0, then the
backgroundSaveDoneHandlerDisk method can set the dirty counter to a
negative value.
2024-07-01 09:04:52 +03:00
Oran Agra 69b7137d32
Fix possible crash due to OOM panic on invalid command (#13380)
getKeysUsingKeySpece had the range check AFTER the allocation, of the
keys buffer, which could lead to an OOM panic when invalid arguments are
provided, leading to an overflow.
The allocated memory is only used after the range check, so there's no
risk of buffer overrun.
The OOM panic can happen on 32bit builds, or 64 builds running on
systems with less than 4GB of RAM, and is reachable via the COMMAND
GETKEYSANDFLAGS, and ACL key name validation.
2024-07-01 09:04:20 +03:00
Moti Cohen a9267137ee
HFE - count in command must match actual number of fields (#13369)
There was wrong preliminary assumption that we can optionally provide
vector of arguments more than count.
This is error-prone approach that leaded to actual error in that case.
This PR enforce that vector of argument match count.

Also fixed flaky HRANDFIELD test.
2024-06-26 14:12:06 +03:00
debing.sun 52e12d8bac
Don't keep global replication buffer reference for replicas marked CLIENT_CLOSE_ASAP (#13363)
In certain situations, we might generate a large number of propagates
(e.g., multi/exec, Lua script, or a single command generating tons of
propagations) within an event loop.
During the process of propagating to a replica, if the replica is
disconnected(marked as CLIENT_CLOSE_ASAP) due to exceeding the output
buffer limit, we should remove its reference to the global replication
buffer to avoid the global replication buffer being unable to be
properly trimmed due to being referenced.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-06-26 08:26:23 +08:00
Moti Cohen 5eac99c312
Fix H(P)EXPIREAT command to propagate HDEL as well (#13364)
H(P)EXPIREAT command might delete fields in case the absolute time is in the 
past. Those HDELs need to be propagated as well.
 
In general, as we need to propagate H(P)EXPIRE(AT) command to the replica, each 
field that is mentioned in the command should be categorized into one of the four
options:
1. Managed to update field’s expiration time - propagate it to replica as part 
   of the HPEXPIREAT command.
2. Deleted the field because the time is in the past - propagate also HDEL command
   to delete the field and remove the field from the propagated HPEXPIREAT.
3. Condition not met for the field - Remove the field from the propagated
   HPEXPIREAT command.
4. Field does not exists - Remove the field from the propagated HPEXPIREAT command.

If none of the provided fields match option number 1, then avoid also propagating 
the HPEXPIREAT command to the replica.

This approach is aligned with the EXPIRE command:
If a given key has already expired, then DEL will be propagated instead of
EXPIRE command. If condition not met, then command will be rejected. Otherwise, 
EXPIRE command will be propagated for given key.
2024-06-25 13:15:09 +03:00
Moti Cohen e26ea35cd4
Adapt HRANDFIELD to HFE feature (#13348)
Considerations for the selected imp of HRANDFIELD & HFE feature:

HRANDFIELD might access any of the fields in the hash as some of them
might be expired. And so the Implementation of HRANDFIELD along with HFEs
might be one of the two options:
1. Expire hash-fields before diving into handling HRANDFIELD.
2. Refine HRANDFIELD cases to deal with expired fields.

Regarding the first option, as reference, the command RANDOMKEY also
declareson O(1) complexity, yet might be stuck on a very long (but not infinite)
loop trying to find non-expired keys. Furthermore RANDOMKEY also evicts expired 
keys along the way even though it is categorized as a read-only command. Note
that the case of HRANDFIELD is more lightweight versus RANDOMKEY since 
HFEs have much more effective and aggressive active-expiration for fields behind.

The second option introduces additional implementation complexity to HRANDFIELD.
We could further refine HRANDFIELD cases to differentiate between scenarios
with many expired fields versus few expired fields, and adjust based on the
percentage of expired fields. However, this approach could still lead to long
loops or necessitate expiring fields before selecting them. For the “lightweight”
cases it is also expected to have a lightweight expiration.

Considering the pros and cons, and the fact that HRANDFIELD is an infrequent
command (particularly with HFEs) and the fact we have effective active-expiration
behind for hash-fields, it is better to keep it simple and choose option number 1.

Other changes:
* Don't mark command dirty by internal hashTypeExpire(). It causes to read 
  only command of HRANDFIELD to be accidently propagated (This flag
  should be indicated at higher level, by the command functions).
* Align `hashTypeExpireIfNeeded()` and `hashTypeGetValue()` to be more
  aligned with `expireIfNeeded()` logic of keyspace.
2024-06-24 18:11:53 +03:00
AcherTT 811c5d7aeb
Add debug script command (#13289)
Add two new debug commands for outputing script.
1. `DEBUG SCRIPT LIST`
   Output all scripts.
2. `DEBUG SCRIPT <sha1>`
    Output a specific script.
    
Close #3846
2024-06-21 18:21:25 +08:00
debing.sun a03b6e29a9
Sort out mess document for RM_Replicate and RM_ReplicateVerbatim (#13323)
Related to #12647

1. Make clear that `RM_Replicate` and `RM_ReplicateVerbatim` are
non-thread safe.
2. Make clear that `RM_Replicate` and `RM_ReplicateVerbatim` are alwarys
wrapped into MULTI in any case.
2024-06-21 17:59:49 +08:00
Moti Cohen e18a173a81
Fix rdbLoadObject() empty hash (#13347)
As part of HFE feature, the logic of rdbLoadObject() was wrongly
modified to indicate of loaded empty hash from RDB as hash that all its
fields got expired. Rollback to `emptykey` logic. This function should
load blindly all fields, expired or not. Manually verified.

Few more minor fixes:
- remove hash double check of emptyKey
- Fix from `sds` to `hfield` in rdbLoadObject() (not really a bug. Both
are of type char*)
- Revert code rdbLoadObject() to get dbid instead of db
2024-06-20 13:53:47 +03:00
Filipe Oliveira (Redis) 24c85cc368
reduce getNodeByQuery CPU time by using less cache lines (from 2064 Bytes struct to 64 Bytes): reduces LLC misses and Memory Loads (#13296)
The following PR goes from 33 cacheline on getKeysResult struct (by
default has 256 static buffer)

```
root@hpe10:~/redis# pahole -p   ./src/server.o -C getKeysResult
typedef struct {
	keyReference               keysbuf[256];         /*     0  2048 */
	/* --- cacheline 32 boundary (2048 bytes) --- */
	/* typedef keyReference */ struct {
		int                pos;
		int                flags;
	} *keys; /*  2048     8 */
	int                        numkeys;              /*  2056     4 */
	int                        size;                 /*  2060     4 */

	/* size: 2064, cachelines: 33, members: 4 */
	/* last cacheline: 16 bytes */
} getKeysResult;
```


to 1 cacheline with a static buffer of 6 keys per command):
```
root@hpe10:~/redis# pahole -p   ./src/server.o -C getKeysResult
typedef struct {
	int                        numkeys;              /*     0     4 */
	int                        size;                 /*     4     4 */
	keyReference               keysbuf[6];           /*     8    48 */
	/* typedef keyReference */ struct {
		int                pos;
		int                flags;
	} *keys; /*    56     8 */

	/* size: 64, cachelines: 1, members: 4 */
} getKeysResult; 
```

we get around 1.5% higher ops/sec, and a confirmation of around 15% less
LLC loads on getNodeByQuery and 37% less Stores.

Function / Call Stack | CPU Time: Difference | CPU Time:
9462436fa4 | CPU Time: this PR | Loads:
Difference | Loads: 9462436fa4 | Loads:
this PR | Stores: Difference | Stores:
9462436fa4 | Stores: This PR
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
getNodeByQuery | 0.753767 | 1.57118 | 0.817416 | 144297829 (15% less
loads) | 920575969 | 776278140 | 367607824 (37% less stores) | 991642384
| 624034560

## results on client side

### baseline 
```
taskset -c 2,3 memtier_benchmark -s 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram 
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 180 secs]  0 threads:   110333450 ops,  604992 (avg:  612942) ops/sec, 84.75MB/sec (avg: 85.86MB/sec),  0.82 (avg:  0.81) msec latency

2         Threads
25        Connections per thread
180       Seconds


ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       612942.14          ---          ---         0.00         0.00         0.81332         0.80700         1.26300         2.92700     87924.12 
Gets            0.00         0.00         0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          --- 
Totals     612942.14         0.00         0.00         0.00         0.00         0.81332         0.80700         1.26300         2.92700     87924.12 
```

### comparison 
```
taskset -c 2,3 memtier_benchmark -s 192.168.1.200 --port 6379 --authenticate perf --cluster-mode --pipeline 10 --data-size 100 --ratio 1:0 --key-pattern P:P --key-minimum=1 --key-maximum 1000000 --test-time 180 -c 25 -t 2 --hide-histogram 
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 180 secs]  0 threads:   111731310 ops,  610195 (avg:  620707) ops/sec, 85.48MB/sec (avg: 86.95MB/sec),  0.82 (avg:  0.80) msec latency

2         Threads
25        Connections per thread
180       Seconds


ALL STATS
======================================================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    MOVED/sec      ASK/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------------------------------------------------------
Sets       620707.72          ---          ---         0.00         0.00         0.80312         0.79900         1.23900         2.87900     89037.78 
Gets            0.00         0.00         0.00         0.00         0.00             ---             ---             ---             ---         0.00 
Waits           0.00          ---          ---          ---          ---             ---             ---             ---             ---          --- 
Totals     620707.72         0.00         0.00         0.00         0.00         0.80312         0.79900         1.23900         2.87900     89037.78
```

Co-authored-by: filipecosta90 <filipecosta.90@gmail.com>
2024-06-18 18:00:47 +08:00
Ozan Tezcan 4aa25d042c
Reply with array of return codes if the key does not exist for HFE commands (#13343)
Currently, HFE commands reply with empty array if the key does not
exist. Though, non-existing key and empty key is the same thing. 
It means fields given in the command do not exist in the empty key. 
So, replying with an array of 'no field' error codes (-2) suits better 
to Redis logic. e.g. Similarly, `hmget` returns array of nulls if the 
key does not exist.

After this PR:
```
127.0.0.1:6379> hpersist missingkey fields 2 a b
1) (integer) -2
2) (integer) -2
```
2024-06-14 09:35:05 +03:00
Jo 871c985919
Update `FIELDS` argument to block type for HFE commands schema (#13339)
I reviewed `XREAD` command syntax:
```
XREAD [COUNT count] [BLOCK milliseconds] STREAMS key [key ...] id [id ...]
```

Here’s the structure for `XREAD`:
```json
"arguments": [
            {
                "token": "COUNT",
                "name": "count",
                "type": "integer",
                "optional": true
            },
            {
                "token": "BLOCK",
                "name": "milliseconds",
                "type": "integer",
                "optional": true
            },
            {
                "name": "streams",
                "token": "STREAMS",
                "type": "block",
                "arguments": [
                    {
                        "name": "key",
                        "type": "key",
                        "key_spec_index": 0,
                        "multiple": true
                    },
                    {
                        "name": "ID",
                        "type": "string",
                        "multiple": true
                    }
                ]
            }
]
```

Now, consider the `HEXPIRE` syntax:
```
HEXPIRE key seconds [NX | XX | GT | LT] FIELDS numfields field [field ...]
```

Since the `FIELDS` token functions similarly to `STREAMS`, and given that `STREAMS` is defined as a block, I believe the `FIELDS` in `hepxire` should also be defined as a block.
2024-06-14 13:51:49 +08:00
debing.sun ed10f737b8
Add new hexpired notification for HFE (#13329)
When the hash field expired, we will send a new `hexpired` notification.
It mainly includes the following three cases:
1. When field expired by active expiration.
2. When field expired by lazy expiration.
3. When the user uses the `h(p)expire(at)` command, the user will also
get a `hexpired` notification if the field expires during the command.

## Improvement
1. Now if more than one field expires in the hmget command, we will only
send a `hexpired` notification.
2. When a field with TTL is deleted by commands like hdel without
updating the global DS, active expire will not send a notification.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Moti Cohen <moti.cohen@redis.com>
2024-06-11 21:42:34 +08:00
Moti Cohen f01fdc3960
Reserve 2 bits out of EB_EXPIRE_TIME_MAX for possible future use (#13331)
Reserve 2 bits out of hash-field expiration time (`EB_EXPIRE_TIME_MAX`)
for possible future lightweight indexing/categorizing of fields. It can
be achieved by hacking HFE as follows:
```
HPEXPIREAT key [ 2^47 + USER_INDEX ] FIELDS numfields field [field …]
```

Redis will also need to expose kind of `HEXPIRESCAN` and `HEXPIRECOUNT`
for this idea. Yet to be better defined.

`HFE_MAX_ABS_TIME_MSEC` constraint must be enforced only at API level.
Internally, the expiration time can be up to `EB_EXPIRE_TIME_MAX` for
future readiness.
2024-06-10 16:57:26 +03:00
Moti Cohen ce121b9261
HFE - Avoid lazy expire if called by modules + cleanup (#13326)
Need to be carefull if called by modules since modules API allow to open
and close key handler. We don't want to invalidate the handler
underneath.

* hashTypeExists(), hashTypeGetValueObject() - will return the logical
state of the field. A flag will indicate noExpire.
* RM_HashGet() - Will get NULL if the field expired. Fields won’t be
deleted.
* RM_ScanKey() - might return 0 items if all fields got expired. Fields
won’t be deleted.
* RM_HashSet() - If set, then override expired field. If delete, we can
either delete or leave it to active-expiration. XX/NX - logically
correct (Verify with tests).

Nice to have (not implemented):
* RedisModule_CloseKey() - We can local active-expire up-to 100 items. 

Note:
Length will be wrong to modules just like redis (Count expired fields).
2024-06-10 11:24:26 +03:00
debing.sun 690ef36330
Fix HFE notification test timing caused by field expiration in hexpire command (#13325)
In the old test, we give the `hexpire` a very short expire time, which
caused the filed to be deleted by the time `hpersist` command was
executed. As a result, the `hpersist` command won't be able to give a
`hpersist` notification, leading to test stuck.

fail CI:
https://github.com/redis/redis/actions/runs/9342175887/job/25709886471
2024-06-05 14:49:45 +08:00
debing.sun 9a2c6ba4e7
Prevent negative expire parameter in HEXPIRE/HEXPIREAT/HPEXPIRE/HPEXPIREAT commands (#13310)
1. Don't allow HEXPIRE/HEXPIREAT/HPEXPIRE/HPEXPIREAT command expire
parameters is negative

2. Remove a dead code reported from Coverity.
when `unit` is not `UNIT_SECONDS`, the second `if (expire > (long long)
EB_EXPIRE_TIME_MAX)` will be dead code.
```c
# t_hash.c
2988    /* Check expire overflow */
      	cond_at_most: Condition expire > 281474976710655LL, taking false branch. Now the value of expire is at most 281474976710655.
2989    if (expire > (long long) EB_EXPIRE_TIME_MAX) {
2990        addReplyErrorExpireTime(c);
2991        return;
2992    }

2994    if (unit == UNIT_SECONDS) {
2995        if (expire > (long long) EB_EXPIRE_TIME_MAX / 1000) {
2996            addReplyErrorExpireTime(c);
2997            return;
2998        }
2999        expire *= 1000;
3000    } else {
      	at_most: At condition expire > 281474976710655LL, the value of expire must be at most 281474976710655.
      	dead_error_condition: The condition expire > 281474976710655LL cannot be true.
3001        if (expire > (long long) EB_EXPIRE_TIME_MAX) {
      	
CID 494223: (#1 of 1): Logically dead code (DEADCODE)
dead_error_begin: Execution cannot reach this statement: addReplyErrorExpireTime(c);.
3002            addReplyErrorExpireTime(c);
3003            return;
3004        }
3005    }
```

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2024-06-04 20:35:26 +08:00
gms f36b5a8586
Fix crash due to unblock client during slot migration (#13311)
In #13224, we found a crash during cluster slot migration but don't know
why. So i check all the return C_OK in processCommand to see if we are
missing some duration reset and see this.

This fix is like #12247, when we reject the command, we should reset the
duration. I test it and verify it can fix #13224.

So the reason may because we are using stream block and then during the
slot migration, it got a redirect and then crash the server.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-06-04 20:16:36 +08:00
Ozan Tezcan 293a68afdf
Use lookupKeyWrite() for hpersist command (#13321)
As hpersist is a write command, we should use lookupKeyWrite() instead
of lookupKeyRead() to fetch the key.
2024-06-04 12:04:37 +03:00
Ozan Tezcan 44352beefa
Fix crash in RM_ScanKey() when used with hexpire (#13320)
RM_ScanKey() was overlooked while introducing hash field expiration. 
An assert is triggered when it is called on a hash key with
OBJ_ENCODING_LISTPACK_EX encoding.

I've changed to code to handle listpackex encoding properly.
2024-06-04 11:42:02 +03:00
Moti Cohen 56169112e3
Fix returned value nextExpireTime by ebExpire() (#13313)
At `ebuckets` structure, On `ebExpire()`, if the callback indicated to update 
the item expiration time and return it back to ebuckets (`ACT_UPDATE_EXP_ITEM`), 
then returned value `nextExpireTime` should be updated, if needed. 
Invalid value of `nextExpireTime` was modified from 0 to `EB_EXPIRE_TIME_INVALID`.
2024-06-03 13:54:44 +03:00
Valentino Geron 50569a906c
Free current client asynchronously after user permissions changes (#13274)
The crash happens when the user that triggers the permission changes
should be affected (and should be disconnected eventually).

To handle such a scenario, we should use the
`CLIENT_CLOSE_AFTER_COMMAND` flag.

This commit encapsulates all the places that should be handled in the
same way in `deauthenticateAndCloseClient`

Also:
* bugfix: during the ACL LOAD we ignore clients that are marked as
`CLIENT MASTER`
2024-05-30 22:09:30 +08:00