Commit Graph

12297 Commits

Author SHA1 Message Date
Moti Cohen 33fc0fbfae
HFE to support AOF and replicas (#13285)
* For replica sake, rewrite commands `H*EXPIRE*` , `HSETF`, `HGETF` to
have absolute unix time in msec.
* On active-expiration of field, propagate HDEL to replica
(`propagateHashFieldDeletion()`)
* On lazy-expiration, propagate HDEL to replica (`hashTypeGetValue()`
now calls `hashTypeDelete()`. It also takes care to call
`propagateHashFieldDeletion()`).
* Fix `H*EXPIRE*` command such that if it gets flag `LT` and it doesn’t
have any expiration on the field then it will considered as valid
condition.

Note, replicas doesn’t make any active expiration, and should avoid lazy
expiration. On `hashTypeGetValue()` it doesn't check expiration (As long
as the master didn’t request to delete the field, it is valid)

TODO: 
* Attach `dbid` to HASH metadata. See
[here](https://github.com/redis/redis/pull/13209#discussion_r1593385850)

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-05-29 19:47:48 +08:00
Ozan Tezcan 6a11d458be
Fix hscan return value (#13297)
In the last step of hscan, while replying to client, we assume all items
in the result list are keys which are mstr instances. Though, there 
might be values which are sds instances. 

Added a check to avoid calling mstrlen() for value objects.

To reproduce:
```
127.0.0.1:6379> hset myhash1 a 11111111111111111111111111111111111111111111111111111111111111111
(integer) 0
127.0.0.1:6379> hscan myhash1 0
1) "0"
2) 1) "a"
   2) "11111111111111111111111111111111111111111111111111111111111111111\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
```
2024-05-28 11:59:57 +03:00
Ozan Tezcan e2918705c8
Fix hfe reply schemas (#13295)
In https://github.com/redis/redis/pull/13291, we've changed that hfe
commands to return empty array if the key does not exist.
Forgot to update json schemas.
2024-05-27 16:07:01 +03:00
debing.sun 2d1bb42cba
Update version references from 8.0 to 7.4 for upcoming release (#13294) 2024-05-27 16:47:23 +08:00
Ozan Tezcan 2f34f6f0b9
Delete hsetf and hgetf (#13291)
Changes:
- Delete hsetf and hgetf commands
- Hfe commands will return empty array instead of nil.
---------

Co-authored-by: Moti Cohen <moticless@gmail.com>
2024-05-26 13:30:45 +03:00
Moti Cohen 60e1582ddb
Fix statistics test (#13293) 2024-05-26 03:34:15 +03:00
Yves LeBras 6801a3cec8
redis-cli --keystats and --keystats-samples with --top and --cursor (#12826)
Added `--keystats` and `--keystats-samples <n>` commands.

The new commands combines memkeys and bigkeys with additional
distribution data.
We often run memkeys and bigkeys one after the other. It will be
convenient to just have one command.
Distribution and top 10 key sizes are useful when we have multiple keys
taking a lot of memory.

Like for memkeys and bigkeys, we can use `-i <n>` and Ctrl-C to
interrupt the scan loop. It will still show the statistics on the keys
sampled so far.
We can use two new optional parameters:
- `--cursor <n>` to continue from the last cursor after an interruption
of the scan.
- `--top <n>` to change the number of top key sizes (10 by default).

Implemented a fix for the `--memkeys-samples 0` issue in order to use
`--keystats-samples 0`.

Used hdr_histogram for the key size distribution.

For key length, hdr_histogram seemed overkill and preset ranges were
used.

The memory used by keystats with hdr_histogram is around 7MB (compared
to 3MB for memkeys or bigkeys).

Execution time is somewhat equivalent to adding memkeys and bigkeys
time. Each scan loop is having more commands (key type, key size, key
length/cardinality).

We can redirect the output to a file. In that case, no color or text
refresh will happen.

Limitation:
- As the information printed during the loop is refreshed (moving cursor
up), stderr and information not fitting in the terminal window (width or
height) might create some refresh issues.

Comments:
- config.top_sizes_limit could be used globally like config.count, but
it is passed as parameter to be consistent with config.memkeys_samples.
- Not sure if we should move some utility functions to cli-common.c.

Got some tips and help from @ofirluzon.

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: debing.sun <debing.sun@redis.com>
2024-05-25 15:59:27 +08:00
Moti Cohen f34f2ade85
Add Statistics hashes_with_expiry_fields to INFO (#13275)
Added hashes_with_expiry_fields.
Optimially it would better to have statistic of that counts all fields
with expiry. But it requires careful logic and computation to follow and
deep dive listpacks and hashes. This statistics is trivial to achieve
and reflected by global HFE DS that has builtin enumeration of all the
hashes that are registered in it.
2024-05-23 17:29:45 +03:00
Moti Cohen ae6df30eb1
Fix ebuckets stop indication during ebExpire() (#13287)
on active-expire of buckets, at function `ebExpire()` ->
`ebSegExpire()`, if callback `onExpireItem()` indicated to stop, Yet
iterator (iter) was wrongly already got updated to point to next item.
In turn the segment will be updated without current item.
2024-05-23 10:46:50 +03:00
Ozan Tezcan a25b15392c
Improve performance of hfe listpack (#13279)
This PR contains a few optimizations for hfe listpack.
- Hfe fields are ordered by TTL in the listpack. There are two cases
that we want to search listpack according to TTLs:
- As part of active-expiry, we need to find the fields that are expired.
e.g. find fields that have smaller TTLs than given timestamp.
- When we want to add a new field, we need to find the correct position
to maintain the order by TTL. e.g. find the field that has a higher TTL
than the one we want to insert.
  
Iterating with lpNext() to compare TTLs has a performance cost as
lpNext() calls lpValidateIntegrity() for each entry. Instead, this PR
adds `lpFindCb()` to the listpack which accepts a comparator callback.
It preserves same validation logic of lpFind() which is faster than
search with lpNext().
  
- We have field name, value, ttl for a single hfe field. Inserting these
items one by one to listpack is costly. Especially, as we place fields
according to TTL, most additions will end up in the middle of the
listpack. Each insert causes realloc + memmove. This PR introduces
`lpBatchInsert()` to add multiple items in one go.

- For hsetf, if we are going to update value and TTL at the same time,
currently, we update the value first and later update the TTL (two
distinct listpack operation). This PR improves it by doing it with a
single update operation.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-05-22 23:05:32 +03:00
debing.sun 95cbe879e5
sanitize dump payload for HFE (#13278)
Add the following validations:
1. Get TTL using the lpGetIntegerValue() method instead of lpGetValue(),
Ref https://github.com/redis/redis/pull/13209#discussion_r1602569422
2. The TTL of listpackex is a number in the valid range
(0~EB_EXPIRE_TIME_MAX) and ordered.
3. The TTL fields of listpackex are ordered. 
4. The TTL of hashtable is within the valid range
(0~EB_EXPIRE_TIME_MAX).

Other:
Fix the missing of handling OBJ_ENCODING_LISTPACK_EX in
dismissHashObject().

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
2024-05-22 10:53:30 +08:00
Ted Lyngmo e92363e248
Log the real reason for why posix_fadvise failed (#13246)
`reclaimFilePageCache` did not set `errno` but `rdbSaveInternal` which
is logging the error assumed it did. This makes sure `errno` is set.

Fixes #13245

Signed-off-by: Ted Lyngmo <ted@lyncon.se>
2024-05-21 10:33:14 +08:00
debing.sun 9ffc35c98e
Have consistent behavior of SPUBLISH within multi/exec like regular command (#13276)
This PR is based on the commits from PR #12944.

Allow SPUBLISH command within multi/exec on replica

Behavior on unstable:

```
127.0.0.1:6380> CLUSTER NODES
39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected
8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
(error) MOVED 866 127.0.0.1:6379
```

With this change:

```
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
1) (integer) 0
```

---------

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: oranagra <oran@redislabs.com>
2024-05-21 09:25:13 +08:00
Ozan Tezcan 36c3cec6d1
Fix hfe RDB tests by adding FIELDS keyword to hexpire commands (#13277)
FIELDS keyword was added as part of
[#13270](https://github.com/redis/redis/pull/13270). 

It was missing in
[#13243](https://github.com/redis/redis/pull/13243)
2024-05-18 11:34:13 +03:00
Ronen Kalish 323be4d699
Hfe serialization listpack (#13243)
Add RDB de/serialization for HFE

This PR adds two new RDB types: `RDB_TYPE_HASH_METADATA` and
`RDB_TYPE_HASH_LISTPACK_TTL` to save HFE data.
When the hash RAM encoding is dict, it will be saved in the former, and
when it is listpack it will be saved in the latter.
Both formats just add the TTL value for each field after the data that
was previously saved, i.e HASH_METADATA will save the number of entries
and, for each entry, key, value and TTL, whereas listpack is saved as a
blob.
On read, the usual dict <--> listpack conversion takes place if
required.
In addition, when reading a hash that was saved as a dict fields are
actively expired if expiry is due. Currently this slao holds for
listpack encoding, but it is supposed to be removed.

TODO:
Remove active expiry on load when loading from listpack format (unless
we'll decide to keep it)
2024-05-17 18:27:02 +08:00
Moti Cohen 71676513dd
Fix commands H*EXPIRE* and H*TTL to include `FIELDS` constant (#13270)
The same goes to: HPEXPIRE, HEXPIREAT, HPEXPIREAT, HEXPIRETIME,
HPEXPIRETIME, HPTTL, HTTL, HPERSIST
2024-05-16 19:35:58 +03:00
debing.sun f1b0212917
Revert "Change mmap rand bits as a temporary mitigation to resolve asan bug (#13150)" (#13266)
The kernel config `vm.mmap_rnd_bits` had been revert in
https://github.com/actions/runner-images/issues/9491, so we can revert
the changes from #13150.

CI only with ASAN passed:
https://github.com/sundb/redis/actions/runs/9058263634
2024-05-15 14:12:22 +08:00
debing.sun ffbdf2f6f3
Fix test failure due to differing reply format of XREADGROUP under RESP3 in MULTI (#13255)
This test was introducted by #13251.
Normally we auto transform the reply format of XREADGROUP to array under
RESP3 (see trasformer_funcs).
But when we execute XREADGROUP command in multi it can't work, which
cause the new test failed.
The solution is to verity the reply of XREADGROUP in advance rather than
in MULTI.

Failed validate schema CI:
https://github.com/redis/redis/actions/runs/9025128323/job/24800285684

---------

Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2024-05-14 20:08:32 +08:00
debing.sun 80be2cc291
Add defragment support for HFE (#13229)
## Background
1. All hash objects that contain HFE are referenced by db->hexpires.
2. All fields in a dict hash object with HFE are referenced by an
ebucket.

So when we defrag the hash object or the field in a dict with HFE, we
also need to update the references in them.

## Interface
1. Add a new interface `ebDefragItem`, which can accept a defrag
callback to defrag items in ebuckets, and simultaneously update their
references in the ebucket.

## Mainly changes
1. The key type of dict of hash object is no longer sds, so add new
`activeDefragHfieldDict()` to defrag the dict instead of
`activeDefragSdsDict()`.
2. When we defrag the dict of hash object by using `dictScanDefrag()`,
we always set the defrag callback `defragKey` of `dictDefragFunctions`
to NULL, because we can't reallocate a field with out updating it's
reference in ebuckets.
Instead, we will defrag the field of the dict and update its reference
in the callback `dictScanDefrag` of dictScanFunction().
3. When we defrag the hash robj with HFE, we will use `ebDefragItem` to
defrag the robj and update the reference in db->hexpires.

## TODO:
Defrag ebucket structure incremently, which will be handler in a future
PR.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Moti Cohen <moti.cohen@redis.com>
2024-05-14 17:32:33 +08:00
Ozan Tezcan 5066e6e9cd
Fix hgetf/hsetf reply type by returning string (#13263)
If encoding is listpack, hgetf and hsetf commands reply field value type
as integer.
This PR fixes it by returning string.

Problematic cases:
```
127.0.0.1:6379> hset hash one 1
(integer) 1
127.0.0.1:6379> hgetf hash fields 1 one
1) (integer) 1
127.0.0.1:6379> hsetf hash GETOLD fvs 1 one 2
1) (integer) 1
127.0.0.1:6379> hsetf hash DOF GETNEW fvs 1 one 2
1) (integer) 2
```

Additional fixes:
- hgetf/hsetf command description text

Fixes #13261, #13262
2024-05-13 11:09:49 +03:00
ClaytonNorthey92 8a05f0092b
Add reverse history search in redis-cli (linenoise) (#12543)
added reverse history search to redis-cli, use it with the following:

* CTRL+R : enable search backward mode, and search next one when
pressing CTRL+R again until reach index 0.
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                   # press CTRL+R
(reverse-i-search): keys two          # input `keys`
(reverse-i-search): keys one          # press CTRL+R again
(reverse-i-search): keys one          # press CTRL+R again, still `keys one` due to reaching index 0
(i-search): keys two                  # press CTRL+S, enable search forward
(i-search): keys two                  # press CTRL+S, still `keys one` due to reaching index 1
```

* CTRL+S : enable search forward mode, and search next one when pressing
CTRL+S again until reach index 0.
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(i-search):                       # press CTRL+S
(i-search): keys one              # input `keys`
(i-search): keys two              # press CTRL+S again
(i-search): keys two              # press CTRL+R again, still `keys two` due to reaching index 0
(reverse-i-search): keys one      # press CTRL+R, enable search backward
(reverse-i-search): keys one      # press CTRL+S, still `keys one` due to reaching index 1
```

* CTRL+G : disable
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                   # press CTRL+R
(reverse-i-search): keys two          # input `keys`
127.0.0.1:6379>                       # press CTRL+G
```

* CTRL+C : disable
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                   # press CTRL+R
(reverse-i-search): keys two          # input `keys`
127.0.0.1:6379>                       # press CTRL+G
```

* TAB : use the current search result and exit search mode
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                # press CTRL+R
(reverse-i-search): keys two       # input `keys`
127.0.0.1:6379> keys two           # press TAB
```

* ENTER : use the current search result and execute the command
```
127.0.0.1:6379> keys one
127.0.0.1:6379> keys two
(reverse-i-search):                 # press CTRL+R
(reverse-i-search): keys two        # input `keys`
127.0.0.1:6379> keys two            # press ENTER
(empty array)
127.0.0.1:6379>
```

* any arrow key will disable reverse search

your result will have the search match bolded, you can press enter to
execute the full result

note: I have _only added this for multi-line mode_, as it seems to be
forced that way when `repl` is called

Closes: https://github.com/redis/redis/issues/8277

---------

Co-authored-by: Clayton Northey <clayton@knowbl.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
2024-05-10 11:10:14 +08:00
debing.sun 7010f41c96
Add notification support for HFE (#13237)
1. Add `hpersist` notification for `hpersist` command.
2. Add `pexpire` notification for `hexpire`, `hexpireat` and `hpexpire`.
2024-05-09 22:23:00 +08:00
Ozan Tezcan ca4ed48db6
Add listpack support, hgetf and hsetf commands (#13209)
**Changes:**
- Adds listpack support to hash field expiration 
- Implements hgetf/hsetf commands

**Listpack support for hash field expiration**

We keep field name and value pairs in listpack for the hash type. With
this PR, if one of hash field expiration command is called on the key
for the first time, it converts listpack layout to triplets to hold
field name, value and ttl per field. If a field does not have a TTL, we
store zero as the ttl value. Zero is encoded as two bytes in the
listpack. So, once we convert listpack to hold triplets, for the fields
that don't have a TTL, it will be consuming those extra 2 bytes per
item. Fields are ordered by ttl in the listpack to find the field with
minimum expiry time efficiently.

**New command implementations as part of this PR:** 

- HGETF command

For each specified field get its value and optionally set the field's
expiration time in sec/msec /unix-sec/unix-msec:
  ```
  HGETF key 
    [NX | XX | GT | LT]
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | PERSIST]
    <FIELDS count field [field ...]>
  ```

- HSETF command

For each specified field value pair: set field to value and optionally
set the field's expiration time in sec/msec /unix-sec/unix-msec:
  ```
  HSETF key 
    [DC] 
    [DCF | DOF] 
    [NX | XX | GT | LT] 
    [GETNEW | GETOLD] 
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | KEEPTTL]
    <FVS count field value [field value …]>
  ```

Todo:
- Performance improvement.
- rdb load/save
- aof
- defrag
2024-05-08 23:11:32 +03:00
Moti Cohen 13401f8bc1
ebuckets: Add test for ACT_UPDATE_EXP_ITEM (#13249)
- On ebExpire() verify the logic of update expired value to a new time
rather than remove it.
- Refine ebuckets benchmark
2024-05-08 18:38:45 +03:00
guybe7 0e1de78fca
XREADGROUP from PEL should not affect server.dirty (#13251)
Because it does not cause any propagation (arguably it should, see the
comment in the tcl file)

The motivation for this fix is that in 6.2 if dirty changed without
propagation inside MULTI/EXEC it would cause propagation of EXEC only,
which would result in the replica sending errors to its master
2024-05-06 16:55:42 +08:00
debing.sun 03cd525ffa
Fix reply schema for hfe related commands (#13238) 2024-05-03 11:11:41 +08:00
Moti Cohen c33c91dbce
Support HSET+expire in one command, at infra level (#13230)
Unify infra of `HSETF`, `HEXPIRE`, `HSET` and provide API for RDB load
as well. Whereas setting plain fields is rather straightforward, setting
expiration time to fields might be time-consuming and complex since each 
update of expiration time, not only updates `ebuckets` of corresponding hash, 
but also might update `ebuckets` of global HFE DS. It is required to opt 
sequence of field updates with expirartion for a given hash, such that only once
done, the global HFE DS will get updated.

To do so, follow the scheme:
1. Call `hashTypeSetExInit()` to initialize the HashTypeSetEx struct.
2. Call `hashTypeSetEx()` one time or more, for each field/expiration update.
3. Call `hashTypeSetExDone()` for notification and update of global HFE.

If expiration is not required, then avoid this API and use instead hashTypeSet().
2024-04-25 18:29:03 +03:00
debing.sun f95031c473
Fix CI failure caused by PR #13231 (#13233)
For my mistake, in the last revert commit in #13231, I originally wanted
to revert the last one, but reverted the penultimate fix.
Now that we have fix another potential memory read issue in [`743f1dd`
(#13231)](743f1dde79),
now it just seems to avoid confusion, i will verify in the future
whether it will have any impact, if so we will add this PR to backport.

Failed CI: https://github.com/sundb/redis/actions/runs/8826731960
2024-04-25 14:11:45 +08:00
debing.sun 772564fc9e
Fix forget to update the dict's node in the kvstore's rehashing list after defragment (#13231)
Introducted by #13013

After defragmenting the dictionary in the kvstore, if the dict is
reallocated, the value of its node in the kvstore rehashing list must be
updated.
2024-04-24 16:15:42 +08:00
Moti Cohen c18ff05665
Hash Field Expiration - Basic support
- Add ebuckets & mstr data structures
- Integrate active & lazy expiration
- Add most of the commands 
- Add support for dict (listpack is missing)
TODOs:  RDB, notification, listpack, HSET, HGETF, defrag, aof
2024-04-18 16:06:30 +03:00
Binbin 804110a487
Allocate Lua VM code with jemalloc instead of libc, and count it used memory (#13133)
## Background
1. Currently Lua memory control does not pass through Redis's zmalloc.c.
Redis maxmemory cannot limit memory problems caused by users abusing lua
since these lua VM memory is not part of used_memory.

2. Since jemalloc is much better (fragmentation and speed), and also we
know it and trust it. we are
going to use jemalloc instead of libc to allocate the Lua VM code and
count it used memory.

## Process:
In this PR, we will use jemalloc in lua. 
1. Create an arena for all lua vm (script and function), which is
shared, in order to avoid blocking defragger.
2. Create a bound tcache for the lua VM, since the lua VM and the main
thread are by default in the same tcache, and if there is no isolated
tcache, lua may request memory from the tcache which has just been freed
by main thread, and vice versa
On the other hand, since lua vm might be release in bio thread, but
tcache is not thread-safe, we need to recreate
    the tcache every time we recreate the lua vm.
3. Remove lua memory statistics from memory fragmentation statistics to
avoid the effects of lua memory fragmentation

## Other
Add the following new fields to `INFO DEBUG` (we may promote them to
INFO MEMORY some day)
1. allocator_allocated_lua: total number of bytes allocated of lua arena
2. allocator_active_lua: total number of bytes in active pages allocated
in lua arena
3. allocator_resident_lua: maximum number of bytes in physically
resident data pages mapped in lua arena
4. allocator_frag_bytes_lua: fragment bytes in lua arena

This is oranagra's idea, and i got some help from sundb.

This solves the third point in #13102.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2024-04-16 12:43:33 +03:00
Yves LeBras e3550f01dd
redis-cli - sendReadOnly() to work with Redis Cloud (#13195)
When using Redis Cloud, sendReadOnly() exit with `Error: ERR unknown
command 'READONLY'`.
It is impacting `--memkeys`, `--bigkeys`, `--hotkeys`, and will impact
`--keystats`.
Added one line to ignore this error.

issue introduced in #12735 (not yet released).
2024-04-08 11:12:57 +03:00
debing.sun f4481e657f
Use usleep() instead of sched_yield() to yield cpu (#13183)
when the main thread and the module thread are in the same thread,
sched_yield() can work well.
when they are both bind to different cpus, sched_yield() will look for
the thread with the highest priority, and if the module thread is always
the highest priority on a cpu, it will take a long time to let the main
thread to reacquire the GIL.

ref https://man7.org/linux/man-pages/man2/sched_yield.2.html
```
If the calling thread is the only thread in the highest priority
list at that time, it will continue to run after a call to
sched_yield().
```
2024-04-07 20:59:36 +08:00
debing.sun 4581d43230
Fix daylight race condition and some thread leaks (#13191)
fix some issues that come from sanitizer thread report.

1. when the main thread is updating daylight_active, other threads (bio,
module thread) may be writing logs at the same time.
```
WARNING: ThreadSanitizer: data race (pid=661064)
  Read of size 4 at 0x55c9a4d11c70 by thread T2:
    #0 serverLogRaw /home/sundb/data/redis_fork/src/server.c:116 (redis-server+0x8d797) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #1 _serverLog.constprop.2 /home/sundb/data/redis_fork/src/server.c:146 (redis-server+0x2a3b14) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #2 bioProcessBackgroundJobs /home/sundb/data/redis_fork/src/bio.c:329 (redis-server+0x1c24ca) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)

  Previous write of size 4 at 0x55c9a4d11c70 by main thread (mutexes: write M0, write M1, write M2, write M3):
    #0 updateCachedTimeWithUs /home/sundb/data/redis_fork/src/server.c:1102 (redis-server+0x925e7) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #1 updateCachedTimeWithUs /home/sundb/data/redis_fork/src/server.c:1087 (redis-server+0x925e7)
    #2 updateCachedTime /home/sundb/data/redis_fork/src/server.c:1118 (redis-server+0x925e7)
    #3 afterSleep /home/sundb/data/redis_fork/src/server.c:1811 (redis-server+0x925e7)
    #4 aeProcessEvents /home/sundb/data/redis_fork/src/ae.c:389 (redis-server+0x85ae0) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #5 aeProcessEvents /home/sundb/data/redis_fork/src/ae.c:342 (redis-server+0x85ae0)
    #6 aeMain /home/sundb/data/redis_fork/src/ae.c:477 (redis-server+0x85ae0)
    #7 main /home/sundb/data/redis_fork/src/server.c:7211 (redis-server+0x7168c) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
```

2. thread leaks in module tests
```
WARNING: ThreadSanitizer: thread leak (pid=668683)
  Thread T13 (tid=670041, finished) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1036 (libtsan.so.2+0x3d179) (BuildId: 28a9f70061dbb2dfa2cef661d3b23aff4ea13536)
    #1 HelloBlockNoTracking_RedisCommand /home/sundb/data/redis_fork/tests/modules/blockonbackground.c:200 (blockonbackground.so+0x97fd) (BuildId: 9cd187906c57e88cdf896d121d1d96448b37a136)
    #2 HelloBlockNoTracking_RedisCommand /home/sundb/data/redis_fork/tests/modules/blockonbackground.c:169 (blockonbackground.so+0x97fd)
    #3 call /home/sundb/data/redis_fork/src/server.c:3546 (redis-server+0x9b7fb) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #4 processCommand /home/sundb/data/redis_fork/src/server.c:4176 (redis-server+0xa091c) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #5 processCommandAndResetClient /home/sundb/data/redis_fork/src/networking.c:2468 (redis-server+0xd2b8e) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #6 processInputBuffer /home/sundb/data/redis_fork/src/networking.c:2576 (redis-server+0xd2b8e)
    #7 readQueryFromClient /home/sundb/data/redis_fork/src/networking.c:2722 (redis-server+0xd358f) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #8 callHandler /home/sundb/data/redis_fork/src/connhelpers.h:58 (redis-server+0x288a7b) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #9 connSocketEventHandler /home/sundb/data/redis_fork/src/socket.c:277 (redis-server+0x288a7b)
    #10 aeProcessEvents /home/sundb/data/redis_fork/src/ae.c:417 (redis-server+0x85b45) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
    #11 aeProcessEvents /home/sundb/data/redis_fork/src/ae.c:342 (redis-server+0x85b45)
    #12 aeMain /home/sundb/data/redis_fork/src/ae.c:477 (redis-server+0x85b45)
    #13 main /home/sundb/data/redis_fork/src/server.c:7211 (redis-server+0x7168c) (BuildId: dca0b1945ba30010e36129bdb296e488dd2b32d0)
```
2024-04-04 13:49:51 +03:00
Moti Cohen 4df037962d
Change FLUSHALL/FLUSHDB SYNC to run as blocking ASYNC (#13167)
# Overview
Users utilize the `FLUSHDB SYNC` and `FLUSHALL SYNC` commands for a variety of 
reasons. The main issue with this command is that if the database becomes 
substantial in size, the server will be unresponsive for an extended period. 
Other than freezing application traffic, this may also lead some clients making 
incorrect judgments about the server's availability. For instance, a watchdog may 
erroneously decide to terminate the process, resulting in potential adverse 
outcomes. While a `FLUSH* ASYNC` can address these issues, it might not be used 
for two reasons: firstly, it's not the default, and secondly, in some cases, the 
client issuing the flush wants to wait for its completion before repopulating the 
database.

Between the option of triggering FLUSH* asynchronously in the background without 
indication for completion versus running it synchronously in the foreground by 
the main thread, there is another more appealing option. We can block the
client that requested the flush, execute the flush command in the background, and 
once done, unblock the client and return notification for completion. This approach 
ensures the server remains responsive to other clients, and the blocked client 
receives the expected response only after the flush operation has been successfully 
carried out.

# Implementation details
Instead of defining yet another flavor to the flush command, we can modify
`FLUSHALL SYNC` and `FLUSHDB SYNC` always run in this new mode.

## Extending BIO Threads capabilities
Today jobs that are carried out by BIO threads don't have the capability to 
indicate completion to the main thread. We can add this infrastructure by having
an additional dummy job, coined as completion-job, that eventually will be written 
by BIO threads to a response-queue. The main thread will take care to consume items
from the response-queue and call the provided callback function of each 
completion-job.

## FLUSH* SYNC to run as blocking ASYNC
Command `FLUSH* SYNC` will be modified to create one or more async jobs to flush
DB(s) and afterward will push additional completion-job request. By sending the
completion job request only at the end, the main thread will be called back only
after all the preceding jobs completed their task in the background. During that
time, the client of the command is suspended and marked as `BLOCKED_LAZYFREE`
whereas any other client will be able to communicate with the server without any
issue.
2024-04-02 15:09:52 +03:00
Moti Cohen ce47834309
kvstoreIteratorNext() wrongly reset iterator twice (#13178)
It calls kvstoreIteratorNextDict() which eventually calls dictResumeRehashing()
And then, on return, it calls dictResetIterator(iter) which calls dictResumeRehashing().
We end up with pauserehash value decremented twice instead of once.
2024-04-01 18:08:55 +03:00
Pieter Cailliau 0b34396924
Change license from BSD-3 to dual RSALv2+SSPLv1 (#13157)
[Read more about the license change
here](https://redis.com/blog/redis-adopts-dual-source-available-licensing/)
Live long and prosper 🖖
2024-03-20 22:38:24 +00:00
Yanqi Lv e64d91c371
Fix dict use-after-free problem in kvs->rehashing (#13154)
In ASAN CI, we find server may crash because of NULL ptr in `kvstoreIncrementallyRehash`.
the reason is that we use two phase unlink in `dbGenericDelete`. After `kvstoreDictTwoPhaseUnlinkFind`,
the dict may be in rehashing and only have one element in ht[0] of `db->keys`.

When we delete the last element in `db->keys` meanwhile `db->keys` is in rehashing, we may free the
dict in `kvstoreDictTwoPhaseUnlinkFree` without deleting the node in `kvs->rehashing`. Then we may
use this freed ptr in `kvstoreIncrementallyRehash` in the `serverCron` and cause the crash.
This is indeed a use-after-free problem.

The fix is to call rehashingCompleted in dictRelease and dictEmpty, so that every call for
rehashingStarted is always matched with a rehashingCompleted.

Adding a test in the unit test to catch it consistently

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2024-03-20 22:44:28 +02:00
Yanqi Lv bad33f8738
fix wrong data type conversion in zrangeResultBeginStore (#13148)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
2024-03-19 08:52:55 +02:00
Binbin e04d41d78d
Prevent lua error_reply abuse from causing errorstats to become larger (#13141)
Users who abuse lua error_reply will generate a new error object on each
error call, which can make server.errors get bigger and bigger. This
will
cause the server to block when calling INFO (we also return errorstats
by
default).

To prevent the damage it can cause, when a misuse is detected, we will
print a warning log and disable the errorstats to avoid adding more new
errors. It can be re-enabled via CONFIG RESETSTAT.

Because server.errors may be very large (it may be better now since we
have the limit), config resetstat may block for a while. So in
resetErrorTableStats, we will try to lazyfree server.errors.

See the related discussion at the end of #8217.
2024-03-19 08:18:22 +02:00
Chen Tianjie aeada20140
Avoid unnecessary dict shrink in zremrangeGenericCommand (#13143)
If the skiplist is emptied, there is no need to shrink the dict in
skiplist, it can be deleted directly.
2024-03-19 10:14:19 +08:00
Binbin 7b070423b8
Fix dictionary use-after-free in active expire and make kvstore iter to respect EMPTY flag (#13135)
After #13072, there is an use-after-free error. In expireScanCallback, we
will delete the dict, and then in dictScan we will continue to use the dict,
like we will doing `dictResumeRehashing(d)` in the end, this casued an error.

In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, don't
delete the dict yet, and then when scan returns try to delete it again.

At the same time, we noticed that there will be similar problems in iterator.
We may also delete elements during the iteration process, causing the dict
to be deleted, so the part related to iter in the PR has also been modified.
dictResetIterator was also missing from the previous kvstoreIteratorNextDict,
we currently have no scenario that elements will be deleted in kvstoreIterator
process, deal with it together to avoid future problems. Added some simple
tests to verify the changes.

In addition, the modification in #13072 omitted initTempDb and emptyDbAsync,
and they were also added. This PR also remove the slow flag from the expire
test (consumes 1.3s) so that problems can be found in CI in the future.
2024-03-18 17:41:54 +02:00
Alexander Mahone 98a6e55d4e
Add missing REDIS_STATIC in quicklist (#13147)
Compiler complained when I tried to compile only quicklist.c.
Since static keyword is needed when a static function declaration is
placed before its implementation.

```
#ifndef REDIS_STATIC
#define REDIS_STATIC static
#endif
```

[How to solve static declaration follows non-static declaration in GCC C
code?](https://stackoverflow.com/questions/3148244/how-to-solve-static-declaration-follows-non-static-declaration-in-gcc-c-code)
2024-03-18 08:22:19 +02:00
Madelyn Olson 3f725b8619
Change mmap rand bits as a temporary mitigation to resolve asan bug (#13150)
There is a new change in linux kernel 6.6.6 that uses randomization of
address space to harden security, but it interferes with the way ASAN
works. Folks are working on a fix, but this is a temporary mitigation
for us to get our CI to be green again.

See https://github.com/google/sanitizers/issues/1716 for more
information

See
https://github.com/redis/redis/actions/runs/8305126288/job/22731614306?pr=13149
for a recent failure
2024-03-17 09:06:51 +02:00
Viktor Söderqvist 1d77a8e2c5
Makefile respect user's REDIS_CFLAGS and OPT (#13073)
This change to the Makefile makes it possible to opt out of
`-fno-omit-frame-pointer` added in #12973 and `-flto` (#11350). Those
features were implemented by conditionally modifying the `REDIS_CFLAGS`
and `REDIS_LDFLAGS` variables. Historically, those variables provided a
way for users to pass options to the compiler and linker unchanged.

Instead of conditionally appending optimization flags to REDIS_CFLAGS
and REDIS_LDFLAGS, I want to append them to the OPTIMIZATION variable.

Later in the Makefile, we have `OPT=$(OPTIMIZATION)` (meaning
OPTIMIZATION is only a default for OPT, but OPT can be overridden by the
user), and later the flags are combined like this:

FINAL_CFLAGS=$(STD) $(WARN) $(OPT) $(DEBUG) $(CFLAGS) $(REDIS_CFLAGS)
    FINAL_LDFLAGS=$(LDFLAGS) $(OPT) $(REDIS_LDFLAGS) $(DEBUG)

This makes it possible for the the user to override all optimization
flags with e.g. `make OPT=-O1` or just `make OPT=`.

For some reason `-O3` was also already added to REDIS_LDFLAGS by default
in #12339, so I added OPT to FINAL_LDFLAGS to avoid more complex logic
(such as introducing a separate LD_OPT variable).
2024-03-13 17:02:00 +02:00
Binbin 3b3d16f748
Add KVSTORE_FREE_EMPTY_DICTS to cluster mode keys / expires kvstore (#13072)
Currently (following #11695, and #12822), keys kvstore and expires
kvstore both flag with ON_DEMAND, it means that a cluster node will
only allocate a dict when the slot is assigned to it and populated,
but on the other hand, when the slot is unassigned, the dict will
remain allocated.

We considered releasing the dict when the slot is unassigned, but it
causes complications on replicas. On the other hand, from benchmarks
we conducted, it looks like the performance impact of releasing the
dict when it becomes empty and re-allocate it when a key is added
again, isn't huge.

This PR add KVSTORE_FREE_EMPTY_DICTS to cluster mode keys / expires
kvstore.

The impact is about about 2% performance drop, for this hopefully
uncommon scenario.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-03-13 08:30:20 +02:00
Binbin ad28d222ed
Lua eval scripts first in first out LRU eviction (#13108)
In some cases, users will abuse lua eval. Each EVAL call generates
a new lua script, which is added to the lua interpreter and cached
to redis-server, consuming a large amount of memory over time.

Since EVAL is mostly the one that abuses the lua cache, and these
won't have pipeline issues (i.e. the script won't disappear
unexpectedly,
and cause errors like it would with SCRIPT LOAD and EVALSHA),
we implement a plain FIFO LRU eviction only for these (not for
scripts loaded with SCRIPT LOAD).

### Implementation notes:
When not abused we'll probably have less than 100 scripts, and when
abused we'll have many thousands. So we use a hard coded value of 500
scripts. And considering that we don't have many scripts, then unlike
keys, we don't need to worry about the memory usage of keeping a true
sorted LRU linked list. We compute the SHA of each script anyway,
and put the script in a dict, we can store a listNode there, and use
it for quick removal and re-insertion into an LRU list each time the
script is used.

### New interfaces:
At the same time, a new `evicted_scripts` field is added to
INFO, which represents the number of evicted eval scripts. Users
can check it to see if they are abusing EVAL.

### benchmark:
`./src/redis-benchmark -P 10 -n 1000000 -r 10000000000 eval "return
__rand_int__" 0`

The simple abuse of eval benchmark test that will create 1 million EVAL
scripts. The performance has been improved by 50%, and the max latency
has dropped from 500ms to 13ms (this may be caused by table expansion
inside Lua when the number of scripts is large). And in the INFO memory,
it used to consume 120MB (server cache) + 310MB (lua engine), but now
it only consumes 70KB (server cache) + 210KB (lua_engine) because of
the scripts eviction.

For non-abusive case of about 100 EVAL scripts, there's no noticeable
change in performance or memory usage.

### unlikely potentially breaking change:
in theory, a user can maybe load a
script with EVAL and then use EVALSHA to call it (by calculating the
SHA1 value on the client side), it could be that if we read the docs
carefully we'll realized it's a valid scenario, but we suppose it's
extremely rare. So it may happen that EVALSHA acts on a script created
by EVAL, and the script is evicted and EVALSHA returns a NOSCRIPT error.
that is if you have more than 500 scripts being used in the same
transaction / pipeline.

This solves the second point in #13102.
2024-03-13 08:27:41 +02:00
Ronen Kalish a8e745117f
Xread last entry in stream (#7388) (#13117)
Allow using `+` as a special ID for last item in stream on XREAD
command.

This would allow to iterate on a stream with XREAD starting with the
last available message instead of the next one which `$` is used for.
I.e. the caller can use `BLOCK` and `+` on the first call, and change to
`$` on the next call.

Closes #7388

---------

Co-authored-by: Felipe Machado <462154+felipou@users.noreply.github.com>
2024-03-13 08:23:32 +02:00
Viktor Söderqvist 9efc6ad6a6
Add API RedisModule_ClusterKeySlot and RedisModule_ClusterCanonicalKeyNameInSlot (#13069)
Sometimes it's useful to compute a key's cluster slot in a module.

This API function is just like the command CLUSTER KEYSLOT (but faster).

A "reverse" API is also added:
`RedisModule_ClusterCanonicalKeyNameInSlot`. Given a slot, it returns a
short string that we can call a canonical key for the slot.
2024-03-12 09:26:12 -07:00
Andy Pan 9c065c417d
Enable accept4() on *BSD (#13104)
Redis enabled `accept4` on Linux after #9177, reducing extra system
calls for sockets.

`accept4` system call is also widely supported on *BSD and Solaris in
addition to Linux. This PR enables `accept4` on all platforms that
support it.

### References
- [accept4 on
FreeBSD](https://man.freebsd.org/cgi/man.cgi?query=accept4&sektion=2&n=1)
- [accept4 on
DragonFly](https://man.dragonflybsd.org/?command=accept&section=2)
- [accept4 on NetBSD](https://man.netbsd.org/accept.2)
- [accept4 on OpenBSD](https://man.openbsd.org/accept4.2)
- [accept4 on
Solaris](https://docs.oracle.com/cd/E88353_01/html/E37843/accept4-3c.html)
2024-03-12 16:35:52 +02:00