In the old test, we give the `hexpire` a very short expire time, which
caused the filed to be deleted by the time `hpersist` command was
executed. As a result, the `hpersist` command won't be able to give a
`hpersist` notification, leading to test stuck.
fail CI:
https://github.com/redis/redis/actions/runs/9342175887/job/25709886471
1. Don't allow HEXPIRE/HEXPIREAT/HPEXPIRE/HPEXPIREAT command expire
parameters is negative
2. Remove a dead code reported from Coverity.
when `unit` is not `UNIT_SECONDS`, the second `if (expire > (long long)
EB_EXPIRE_TIME_MAX)` will be dead code.
```c
# t_hash.c
2988 /* Check expire overflow */
cond_at_most: Condition expire > 281474976710655LL, taking false branch. Now the value of expire is at most 281474976710655.
2989 if (expire > (long long) EB_EXPIRE_TIME_MAX) {
2990 addReplyErrorExpireTime(c);
2991 return;
2992 }
2994 if (unit == UNIT_SECONDS) {
2995 if (expire > (long long) EB_EXPIRE_TIME_MAX / 1000) {
2996 addReplyErrorExpireTime(c);
2997 return;
2998 }
2999 expire *= 1000;
3000 } else {
at_most: At condition expire > 281474976710655LL, the value of expire must be at most 281474976710655.
dead_error_condition: The condition expire > 281474976710655LL cannot be true.
3001 if (expire > (long long) EB_EXPIRE_TIME_MAX) {
CID 494223: (#1 of 1): Logically dead code (DEADCODE)
dead_error_begin: Execution cannot reach this statement: addReplyErrorExpireTime(c);.
3002 addReplyErrorExpireTime(c);
3003 return;
3004 }
3005 }
```
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
In #13224, we found a crash during cluster slot migration but don't know
why. So i check all the return C_OK in processCommand to see if we are
missing some duration reset and see this.
This fix is like #12247, when we reject the command, we should reset the
duration. I test it and verify it can fix#13224.
So the reason may because we are using stream block and then during the
slot migration, it got a redirect and then crash the server.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
RM_ScanKey() was overlooked while introducing hash field expiration.
An assert is triggered when it is called on a hash key with
OBJ_ENCODING_LISTPACK_EX encoding.
I've changed to code to handle listpackex encoding properly.
At `ebuckets` structure, On `ebExpire()`, if the callback indicated to update
the item expiration time and return it back to ebuckets (`ACT_UPDATE_EXP_ITEM`),
then returned value `nextExpireTime` should be updated, if needed.
Invalid value of `nextExpireTime` was modified from 0 to `EB_EXPIRE_TIME_INVALID`.
The crash happens when the user that triggers the permission changes
should be affected (and should be disconnected eventually).
To handle such a scenario, we should use the
`CLIENT_CLOSE_AFTER_COMMAND` flag.
This commit encapsulates all the places that should be handled in the
same way in `deauthenticateAndCloseClient`
Also:
* bugfix: during the ACL LOAD we ignore clients that are marked as
`CLIENT MASTER`
**Related issue**
https://github.com/redis/redis/issues/13219
**Motivation**
Currently we have to manually update the all_tests variable when
introducing new test files.
**Modification**
I have modified it to list test files dynamically, but instead of
modifying it to add all test files, I have modified it to only add only
test files from the following 4 paths
- unit
- unit/type
- unit/cluster
- integration
so that it doesn't deviate too much from what we already do
**Result**
- dynamically list test files to all_tests variable
- close issue https://github.com/redis/redis/issues/13219
**Additional information**
- removed `list-common.tcl` file and added
`generate_largevalue_test_array` proc in `util.tcl`. because
`list-common.tcl` is not a test file
- There is an order dependency. So I added a code to the "Is a ziplist
encoded Hash promoted on big payload?" test that resets
hash-max-listpack-value to the default (64).
---------
Signed-off-by: jonghoonpark <dev@jonghoonpark.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
## Background
This PR introduces support for field-level expiration in Redis hashes. Previously, Redis supported expiration only at the key level, but this enhancement allows setting expiration times for individual fields within a hash.
## New commands
* HEXPIRE
* HEXPIREAT
* HEXPIRETIME
* HPERSIST
* HPEXPIRE
* HPEXPIREAT
* HPEXPIRETIME
* HPTTL
* HTTL
## Short example
from @moticless
```sh
127.0.0.1:6379> hset myhash f1 v1 f2 v2 f3 v3
(integer) 3
127.0.0.1:6379> hpexpire myhash 10000 NX fields 2 f2 f3
1) (integer) 1
2) (integer) 1
127.0.0.1:6379> hpttl myhash fields 3 f1 f2 f3
1) (integer) -1
2) (integer) 9997
3) (integer) 9997
127.0.0.1:6379> hgetall myhash
1) "f3"
2) "v3"
3) "f2"
4) "v2"
5) "f1"
6) "v1"
... after 10 seconds ...
127.0.0.1:6379> hgetall myhash
1) "f1"
2) "v1"
127.0.0.1:6379>
```
## Expiration strategy
1. Integrate active
Redis periodically performs active expiration and deletion of hash keys that contain expired fields, with a maximum attempt limit.
3. Lazy expiration
When a client touches fields within a hash, Redis checks if the fields are expired. If a field is expired, it will be deleted. However, we do not delete expired fields during a traversal, we implicitly skip over them.
## RDB changes
Add two new rdb type s`RDB_TYPE_HASH_METADATA` and `RDB_TYPE_HASH_LISTPACK_EX`.
## Notification
1. Add `hpersist` notification for `HPERSIST` command.
5. Add `hexpire` notification for `HEXPIRE`, `HEXPIREAT`, `HPEXPIRE` and `HPEXPIREAT` commands.
## Internal
1. Add new data structure `ebuckets`, which is used to store TTL and keys, enabling quick retrieval of keys based on TTL.
2. Add new data structure `mstr` like sds, which is used to store a string with TTL.
This work was done by @moticless, @tezc, @ronen-kalish, @sundb, I just release it.
* For replica sake, rewrite commands `H*EXPIRE*` , `HSETF`, `HGETF` to
have absolute unix time in msec.
* On active-expiration of field, propagate HDEL to replica
(`propagateHashFieldDeletion()`)
* On lazy-expiration, propagate HDEL to replica (`hashTypeGetValue()`
now calls `hashTypeDelete()`. It also takes care to call
`propagateHashFieldDeletion()`).
* Fix `H*EXPIRE*` command such that if it gets flag `LT` and it doesn’t
have any expiration on the field then it will considered as valid
condition.
Note, replicas doesn’t make any active expiration, and should avoid lazy
expiration. On `hashTypeGetValue()` it doesn't check expiration (As long
as the master didn’t request to delete the field, it is valid)
TODO:
* Attach `dbid` to HASH metadata. See
[here](https://github.com/redis/redis/pull/13209#discussion_r1593385850)
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
In the last step of hscan, while replying to client, we assume all items
in the result list are keys which are mstr instances. Though, there
might be values which are sds instances.
Added a check to avoid calling mstrlen() for value objects.
To reproduce:
```
127.0.0.1:6379> hset myhash1 a 11111111111111111111111111111111111111111111111111111111111111111
(integer) 0
127.0.0.1:6379> hscan myhash1 0
1) "0"
2) 1) "a"
2) "11111111111111111111111111111111111111111111111111111111111111111\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
```
Added `--keystats` and `--keystats-samples <n>` commands.
The new commands combines memkeys and bigkeys with additional
distribution data.
We often run memkeys and bigkeys one after the other. It will be
convenient to just have one command.
Distribution and top 10 key sizes are useful when we have multiple keys
taking a lot of memory.
Like for memkeys and bigkeys, we can use `-i <n>` and Ctrl-C to
interrupt the scan loop. It will still show the statistics on the keys
sampled so far.
We can use two new optional parameters:
- `--cursor <n>` to continue from the last cursor after an interruption
of the scan.
- `--top <n>` to change the number of top key sizes (10 by default).
Implemented a fix for the `--memkeys-samples 0` issue in order to use
`--keystats-samples 0`.
Used hdr_histogram for the key size distribution.
For key length, hdr_histogram seemed overkill and preset ranges were
used.
The memory used by keystats with hdr_histogram is around 7MB (compared
to 3MB for memkeys or bigkeys).
Execution time is somewhat equivalent to adding memkeys and bigkeys
time. Each scan loop is having more commands (key type, key size, key
length/cardinality).
We can redirect the output to a file. In that case, no color or text
refresh will happen.
Limitation:
- As the information printed during the loop is refreshed (moving cursor
up), stderr and information not fitting in the terminal window (width or
height) might create some refresh issues.
Comments:
- config.top_sizes_limit could be used globally like config.count, but
it is passed as parameter to be consistent with config.memkeys_samples.
- Not sure if we should move some utility functions to cli-common.c.
Got some tips and help from @ofirluzon.
---------
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: debing.sun <debing.sun@redis.com>
Added hashes_with_expiry_fields.
Optimially it would better to have statistic of that counts all fields
with expiry. But it requires careful logic and computation to follow and
deep dive listpacks and hashes. This statistics is trivial to achieve
and reflected by global HFE DS that has builtin enumeration of all the
hashes that are registered in it.
on active-expire of buckets, at function `ebExpire()` ->
`ebSegExpire()`, if callback `onExpireItem()` indicated to stop, Yet
iterator (iter) was wrongly already got updated to point to next item.
In turn the segment will be updated without current item.
This PR contains a few optimizations for hfe listpack.
- Hfe fields are ordered by TTL in the listpack. There are two cases
that we want to search listpack according to TTLs:
- As part of active-expiry, we need to find the fields that are expired.
e.g. find fields that have smaller TTLs than given timestamp.
- When we want to add a new field, we need to find the correct position
to maintain the order by TTL. e.g. find the field that has a higher TTL
than the one we want to insert.
Iterating with lpNext() to compare TTLs has a performance cost as
lpNext() calls lpValidateIntegrity() for each entry. Instead, this PR
adds `lpFindCb()` to the listpack which accepts a comparator callback.
It preserves same validation logic of lpFind() which is faster than
search with lpNext().
- We have field name, value, ttl for a single hfe field. Inserting these
items one by one to listpack is costly. Especially, as we place fields
according to TTL, most additions will end up in the middle of the
listpack. Each insert causes realloc + memmove. This PR introduces
`lpBatchInsert()` to add multiple items in one go.
- For hsetf, if we are going to update value and TTL at the same time,
currently, we update the value first and later update the TTL (two
distinct listpack operation). This PR improves it by doing it with a
single update operation.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Add the following validations:
1. Get TTL using the lpGetIntegerValue() method instead of lpGetValue(),
Ref https://github.com/redis/redis/pull/13209#discussion_r1602569422
2. The TTL of listpackex is a number in the valid range
(0~EB_EXPIRE_TIME_MAX) and ordered.
3. The TTL fields of listpackex are ordered.
4. The TTL of hashtable is within the valid range
(0~EB_EXPIRE_TIME_MAX).
Other:
Fix the missing of handling OBJ_ENCODING_LISTPACK_EX in
dismissHashObject().
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
`reclaimFilePageCache` did not set `errno` but `rdbSaveInternal` which
is logging the error assumed it did. This makes sure `errno` is set.
Fixes#13245
Signed-off-by: Ted Lyngmo <ted@lyncon.se>
This PR is based on the commits from PR #12944.
Allow SPUBLISH command within multi/exec on replica
Behavior on unstable:
```
127.0.0.1:6380> CLUSTER NODES
39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected
8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
(error) MOVED 866 127.0.0.1:6379
```
With this change:
```
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
1) (integer) 0
```
---------
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: oranagra <oran@redislabs.com>
Add RDB de/serialization for HFE
This PR adds two new RDB types: `RDB_TYPE_HASH_METADATA` and
`RDB_TYPE_HASH_LISTPACK_TTL` to save HFE data.
When the hash RAM encoding is dict, it will be saved in the former, and
when it is listpack it will be saved in the latter.
Both formats just add the TTL value for each field after the data that
was previously saved, i.e HASH_METADATA will save the number of entries
and, for each entry, key, value and TTL, whereas listpack is saved as a
blob.
On read, the usual dict <--> listpack conversion takes place if
required.
In addition, when reading a hash that was saved as a dict fields are
actively expired if expiry is due. Currently this slao holds for
listpack encoding, but it is supposed to be removed.
TODO:
Remove active expiry on load when loading from listpack format (unless
we'll decide to keep it)
This test was introducted by #13251.
Normally we auto transform the reply format of XREADGROUP to array under
RESP3 (see trasformer_funcs).
But when we execute XREADGROUP command in multi it can't work, which
cause the new test failed.
The solution is to verity the reply of XREADGROUP in advance rather than
in MULTI.
Failed validate schema CI:
https://github.com/redis/redis/actions/runs/9025128323/job/24800285684
---------
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
## Background
1. All hash objects that contain HFE are referenced by db->hexpires.
2. All fields in a dict hash object with HFE are referenced by an
ebucket.
So when we defrag the hash object or the field in a dict with HFE, we
also need to update the references in them.
## Interface
1. Add a new interface `ebDefragItem`, which can accept a defrag
callback to defrag items in ebuckets, and simultaneously update their
references in the ebucket.
## Mainly changes
1. The key type of dict of hash object is no longer sds, so add new
`activeDefragHfieldDict()` to defrag the dict instead of
`activeDefragSdsDict()`.
2. When we defrag the dict of hash object by using `dictScanDefrag()`,
we always set the defrag callback `defragKey` of `dictDefragFunctions`
to NULL, because we can't reallocate a field with out updating it's
reference in ebuckets.
Instead, we will defrag the field of the dict and update its reference
in the callback `dictScanDefrag` of dictScanFunction().
3. When we defrag the hash robj with HFE, we will use `ebDefragItem` to
defrag the robj and update the reference in db->hexpires.
## TODO:
Defrag ebucket structure incremently, which will be handler in a future
PR.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Moti Cohen <moti.cohen@redis.com>
If encoding is listpack, hgetf and hsetf commands reply field value type
as integer.
This PR fixes it by returning string.
Problematic cases:
```
127.0.0.1:6379> hset hash one 1
(integer) 1
127.0.0.1:6379> hgetf hash fields 1 one
1) (integer) 1
127.0.0.1:6379> hsetf hash GETOLD fvs 1 one 2
1) (integer) 1
127.0.0.1:6379> hsetf hash DOF GETNEW fvs 1 one 2
1) (integer) 2
```
Additional fixes:
- hgetf/hsetf command description text
Fixes#13261, #13262
added reverse history search to redis-cli, use it with the following:
* CTRL+R : enable search backward mode, and search next one when
pressing CTRL+R again until reach index 0.
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search): # press CTRL+R
(reverse-i-search): keys two # input `keys`
(reverse-i-search): keys one # press CTRL+R again
(reverse-i-search): keys one # press CTRL+R again, still `keys one` due to reaching index 0
(i-search): keys two # press CTRL+S, enable search forward
(i-search): keys two # press CTRL+S, still `keys one` due to reaching index 1
```
* CTRL+S : enable search forward mode, and search next one when pressing
CTRL+S again until reach index 0.
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(i-search): # press CTRL+S
(i-search): keys one # input `keys`
(i-search): keys two # press CTRL+S again
(i-search): keys two # press CTRL+R again, still `keys two` due to reaching index 0
(reverse-i-search): keys one # press CTRL+R, enable search backward
(reverse-i-search): keys one # press CTRL+S, still `keys one` due to reaching index 1
```
* CTRL+G : disable
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search): # press CTRL+R
(reverse-i-search): keys two # input `keys`
127.0.0.1:6379> # press CTRL+G
```
* CTRL+C : disable
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search): # press CTRL+R
(reverse-i-search): keys two # input `keys`
127.0.0.1:6379> # press CTRL+G
```
* TAB : use the current search result and exit search mode
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search): # press CTRL+R
(reverse-i-search): keys two # input `keys`
127.0.0.1:6379> keys two # press TAB
```
* ENTER : use the current search result and execute the command
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search): # press CTRL+R
(reverse-i-search): keys two # input `keys`
127.0.0.1:6379> keys two # press ENTER
(empty array)
127.0.0.1:6379>
```
* any arrow key will disable reverse search
your result will have the search match bolded, you can press enter to
execute the full result
note: I have _only added this for multi-line mode_, as it seems to be
forced that way when `repl` is called
Closes: https://github.com/redis/redis/issues/8277
---------
Co-authored-by: Clayton Northey <clayton@knowbl.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
**Changes:**
- Adds listpack support to hash field expiration
- Implements hgetf/hsetf commands
**Listpack support for hash field expiration**
We keep field name and value pairs in listpack for the hash type. With
this PR, if one of hash field expiration command is called on the key
for the first time, it converts listpack layout to triplets to hold
field name, value and ttl per field. If a field does not have a TTL, we
store zero as the ttl value. Zero is encoded as two bytes in the
listpack. So, once we convert listpack to hold triplets, for the fields
that don't have a TTL, it will be consuming those extra 2 bytes per
item. Fields are ordered by ttl in the listpack to find the field with
minimum expiry time efficiently.
**New command implementations as part of this PR:**
- HGETF command
For each specified field get its value and optionally set the field's
expiration time in sec/msec /unix-sec/unix-msec:
```
HGETF key
[NX | XX | GT | LT]
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | PERSIST]
<FIELDS count field [field ...]>
```
- HSETF command
For each specified field value pair: set field to value and optionally
set the field's expiration time in sec/msec /unix-sec/unix-msec:
```
HSETF key
[DC]
[DCF | DOF]
[NX | XX | GT | LT]
[GETNEW | GETOLD]
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | KEEPTTL]
<FVS count field value [field value …]>
```
Todo:
- Performance improvement.
- rdb load/save
- aof
- defrag
Because it does not cause any propagation (arguably it should, see the
comment in the tcl file)
The motivation for this fix is that in 6.2 if dirty changed without
propagation inside MULTI/EXEC it would cause propagation of EXEC only,
which would result in the replica sending errors to its master
Unify infra of `HSETF`, `HEXPIRE`, `HSET` and provide API for RDB load
as well. Whereas setting plain fields is rather straightforward, setting
expiration time to fields might be time-consuming and complex since each
update of expiration time, not only updates `ebuckets` of corresponding hash,
but also might update `ebuckets` of global HFE DS. It is required to opt
sequence of field updates with expirartion for a given hash, such that only once
done, the global HFE DS will get updated.
To do so, follow the scheme:
1. Call `hashTypeSetExInit()` to initialize the HashTypeSetEx struct.
2. Call `hashTypeSetEx()` one time or more, for each field/expiration update.
3. Call `hashTypeSetExDone()` for notification and update of global HFE.
If expiration is not required, then avoid this API and use instead hashTypeSet().
For my mistake, in the last revert commit in #13231, I originally wanted
to revert the last one, but reverted the penultimate fix.
Now that we have fix another potential memory read issue in [`743f1dd`
(#13231)](743f1dde79),
now it just seems to avoid confusion, i will verify in the future
whether it will have any impact, if so we will add this PR to backport.
Failed CI: https://github.com/sundb/redis/actions/runs/8826731960
Introducted by #13013
After defragmenting the dictionary in the kvstore, if the dict is
reallocated, the value of its node in the kvstore rehashing list must be
updated.
- Add ebuckets & mstr data structures
- Integrate active & lazy expiration
- Add most of the commands
- Add support for dict (listpack is missing)
TODOs: RDB, notification, listpack, HSET, HGETF, defrag, aof
## Background
1. Currently Lua memory control does not pass through Redis's zmalloc.c.
Redis maxmemory cannot limit memory problems caused by users abusing lua
since these lua VM memory is not part of used_memory.
2. Since jemalloc is much better (fragmentation and speed), and also we
know it and trust it. we are
going to use jemalloc instead of libc to allocate the Lua VM code and
count it used memory.
## Process:
In this PR, we will use jemalloc in lua.
1. Create an arena for all lua vm (script and function), which is
shared, in order to avoid blocking defragger.
2. Create a bound tcache for the lua VM, since the lua VM and the main
thread are by default in the same tcache, and if there is no isolated
tcache, lua may request memory from the tcache which has just been freed
by main thread, and vice versa
On the other hand, since lua vm might be release in bio thread, but
tcache is not thread-safe, we need to recreate
the tcache every time we recreate the lua vm.
3. Remove lua memory statistics from memory fragmentation statistics to
avoid the effects of lua memory fragmentation
## Other
Add the following new fields to `INFO DEBUG` (we may promote them to
INFO MEMORY some day)
1. allocator_allocated_lua: total number of bytes allocated of lua arena
2. allocator_active_lua: total number of bytes in active pages allocated
in lua arena
3. allocator_resident_lua: maximum number of bytes in physically
resident data pages mapped in lua arena
4. allocator_frag_bytes_lua: fragment bytes in lua arena
This is oranagra's idea, and i got some help from sundb.
This solves the third point in #13102.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
When using Redis Cloud, sendReadOnly() exit with `Error: ERR unknown
command 'READONLY'`.
It is impacting `--memkeys`, `--bigkeys`, `--hotkeys`, and will impact
`--keystats`.
Added one line to ignore this error.
issue introduced in #12735 (not yet released).
when the main thread and the module thread are in the same thread,
sched_yield() can work well.
when they are both bind to different cpus, sched_yield() will look for
the thread with the highest priority, and if the module thread is always
the highest priority on a cpu, it will take a long time to let the main
thread to reacquire the GIL.
ref https://man7.org/linux/man-pages/man2/sched_yield.2.html
```
If the calling thread is the only thread in the highest priority
list at that time, it will continue to run after a call to
sched_yield().
```
# Overview
Users utilize the `FLUSHDB SYNC` and `FLUSHALL SYNC` commands for a variety of
reasons. The main issue with this command is that if the database becomes
substantial in size, the server will be unresponsive for an extended period.
Other than freezing application traffic, this may also lead some clients making
incorrect judgments about the server's availability. For instance, a watchdog may
erroneously decide to terminate the process, resulting in potential adverse
outcomes. While a `FLUSH* ASYNC` can address these issues, it might not be used
for two reasons: firstly, it's not the default, and secondly, in some cases, the
client issuing the flush wants to wait for its completion before repopulating the
database.
Between the option of triggering FLUSH* asynchronously in the background without
indication for completion versus running it synchronously in the foreground by
the main thread, there is another more appealing option. We can block the
client that requested the flush, execute the flush command in the background, and
once done, unblock the client and return notification for completion. This approach
ensures the server remains responsive to other clients, and the blocked client
receives the expected response only after the flush operation has been successfully
carried out.
# Implementation details
Instead of defining yet another flavor to the flush command, we can modify
`FLUSHALL SYNC` and `FLUSHDB SYNC` always run in this new mode.
## Extending BIO Threads capabilities
Today jobs that are carried out by BIO threads don't have the capability to
indicate completion to the main thread. We can add this infrastructure by having
an additional dummy job, coined as completion-job, that eventually will be written
by BIO threads to a response-queue. The main thread will take care to consume items
from the response-queue and call the provided callback function of each
completion-job.
## FLUSH* SYNC to run as blocking ASYNC
Command `FLUSH* SYNC` will be modified to create one or more async jobs to flush
DB(s) and afterward will push additional completion-job request. By sending the
completion job request only at the end, the main thread will be called back only
after all the preceding jobs completed their task in the background. During that
time, the client of the command is suspended and marked as `BLOCKED_LAZYFREE`
whereas any other client will be able to communicate with the server without any
issue.
It calls kvstoreIteratorNextDict() which eventually calls dictResumeRehashing()
And then, on return, it calls dictResetIterator(iter) which calls dictResumeRehashing().
We end up with pauserehash value decremented twice instead of once.
In ASAN CI, we find server may crash because of NULL ptr in `kvstoreIncrementallyRehash`.
the reason is that we use two phase unlink in `dbGenericDelete`. After `kvstoreDictTwoPhaseUnlinkFind`,
the dict may be in rehashing and only have one element in ht[0] of `db->keys`.
When we delete the last element in `db->keys` meanwhile `db->keys` is in rehashing, we may free the
dict in `kvstoreDictTwoPhaseUnlinkFree` without deleting the node in `kvs->rehashing`. Then we may
use this freed ptr in `kvstoreIncrementallyRehash` in the `serverCron` and cause the crash.
This is indeed a use-after-free problem.
The fix is to call rehashingCompleted in dictRelease and dictEmpty, so that every call for
rehashingStarted is always matched with a rehashingCompleted.
Adding a test in the unit test to catch it consistently
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.
This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
Users who abuse lua error_reply will generate a new error object on each
error call, which can make server.errors get bigger and bigger. This
will
cause the server to block when calling INFO (we also return errorstats
by
default).
To prevent the damage it can cause, when a misuse is detected, we will
print a warning log and disable the errorstats to avoid adding more new
errors. It can be re-enabled via CONFIG RESETSTAT.
Because server.errors may be very large (it may be better now since we
have the limit), config resetstat may block for a while. So in
resetErrorTableStats, we will try to lazyfree server.errors.
See the related discussion at the end of #8217.