Commit Graph

12326 Commits

Author SHA1 Message Date
Filipe Oliveira (Redis) a31b516e25
Optimize the HELLO command reply (#13490)
# Overall improvement

TBD ( current is approximately 6% on the achievable ops/sec), coming
from:

- In case of no module we can skip 1.3% CPU cycles on dict Iterator
creation/deletion
- Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid
runtime strlen overhead within HELLO reply on string constants.

## Optimization 1: In case of no module we can skip 1.3% CPU cycles on
dict Iterator creation/deletion.

## Optimization 2: Use addReplyBulkCBuffer instead of
addReplyBulkCString to avoid runtime strlen overhead within HELLO reply
on string constants.
2024-09-03 17:27:35 +08:00
cyy-tag c77b8f45e9
Fixed variable parameter formatting issues in serverPanic function (#13504)
Currently aeApiPoll panic does not record error code information. Added
variable parameter formatting to _serverPanic to fix the issue

---------

Co-authored-by: yingyin.chen <15816602944@163.com>
2024-09-03 15:51:46 +08:00
Ozan Tezcan a7afd1d2b2
Reply LOADING on replica while flushing the db (#13495)
On a full sync, replica starts discarding existing db. If the existing 
db is huge and flush is happening synchronously, replica may become 
unresponsive. 

Adding a change to yield back to event loop while flushing db on 
a replica. Replica will reply -LOADING in this case. Note that while 
replica is loading the new rdb, it may get an error and start flushing
the partial db. This step may take a long time as well. Similarly, 
replica will reply -LOADING in this case. 

To call processEventsWhileBlocked() and reply -LOADING, we need to do:
- Set connSetReadHandler() null not to process further data from the master
- Set server.loading flag
- Call blockingOperationStarts()

rdbload() already does these steps and calls processEventsWhileBlocked()
while loading the rdb. Added a new call rdbLoadWithEmptyFunc() which 
accepts callback to flush db before loading rdb or when an error 
happens while loading. 

For diskless replication, doing something similar and calling emptyData()
after setting required flags.

Additional changes:
- Allow `appendonly` config change during loading. 
 Config can be changed while loading data on startup or on replication 
 when slave is loading RDB. We allow config change command to update 
 `server.aof_enabled` and then lazily apply config change after loading
 operation is completed.
 
 - Added a test for `replica-lazy-flush` config
2024-09-03 09:48:44 +03:00
Oran Agra 3fcddfb61f
testsuite --dump-logs works on servers started before the test (#13500)
so far ./runtest --dump-logs used work for servers started within the
test proc.
now it'll also work on servers started outside the test proc scope.
the downside is that these logs can be huge if they served many tests
and not just the failing one.
but for some rare failures, we rather have that than nothing.
this feature isn't enabled y default, but is used by our GH actions.
2024-08-29 07:27:23 +01:00
CoolThi 3c9f5954b5
Remove variable expired in expireSlaveKeys() to prevent confusing the compiler (#13299)
This change prevents missed optimization for some compilers:
https://godbolt.org/z/W66h86E13 (the reduced intermediate form in
optimization).
2024-08-28 21:52:23 +08:00
Raz Monsonego 3b1b1d1486
MOD-7645: Return module commands in ACL CAT (#13489)
Currently, module commands are not returned for the `ACL CAT <category>`
command, but skipped instead. Since now modules can add ACL categories
they should no longer be skipped.
2024-08-26 19:25:22 +01:00
paoloredis 60f22ca830
Add redis_docs_sync workflow (#13426)
This workflow executes on releases. It invokes the `redis_docs_sync`
workflow on `redis/docs`.
2024-08-21 09:41:14 +01:00
Zihao Lin 6ceadfb580
Improve GETRANGE command behavior (#12272)
Fixed the issue about GETRANGE and SUBSTR command
return unexpected result caused by the `start` and `end` out of
definition range of string.

---
## break change
Before this PR, when negative `end` was out of range (i.e., end <
-strlen), we would fix it to 0 to get the substring, which also resulted
in the first character still being returned for this kind of out of
range.
After this PR, we ensure that `GETRANGE` returns an empty bulk when the
negative end index is out of range.

Closes #11738

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-08-20 12:34:43 +08:00
judeng 7f0a7f0a69
improve performance for scan command when matching data type (#12395)
Move the TYPE filtering to the scan callback so that avoided the
`lookupKey` operation. This is the follow-up to #12209 . In this thread
we introduced two breaking changes:
1. we will not attempt to do lazy expire (delete) a key that was
filtered by not matching the TYPE (like we already do for MATCH
pattern).
2. when the specified key TYPE filter is an unknown type, server will
reply a error immediately instead of doing a full scan that comes back
empty handed.
2024-08-20 10:47:51 +08:00
Meir Shpilraien (Spielrein) 3264deb24e
Avoid used_memory contention when update from multiple threads. (#13431)
The PR attempt to avoid contention on the `used_memory` global variable
when allocate or free memory from multiple threads at the same time.

Each time a thread is allocating or releasing a memory, it needs to
update the `used_memory` global variable. This update might cause a
contention when done aggressively from multiple threads.

### The solution

Instead of having a single global variable that need to be updated from
multiple thread. We create an array of used_memory, each entry in the
array is updated by a single thread and the main thread summarizes all
the values to accumulate the memory usage.

This solution, though reduces the contention between threads on updating
the `used_memory` global variable, it adds work to the main thread that
need to summarize all the entries at the `used_memory` array. To avoid
increasing the work done by the main thread by too much, we limit the
size of the used memory array to 16. This means that up to 16 threads
can run without any contention between them. If there are more than 16
threads, we will reuse entries on the used_memory array, in this case we
might still have contention between threads, but it will be much less
significant.

Notice, that in order to really avoid contention, the entries in the
`used_memory` array must reside on different cache lines. To achieve
that we create a struct with padding such that its size will be exactly
cache_line size. In addition we make sure the address of the
`used_memory` array will be aligned to cache_line size.

### Benchmark

Some benchmark shows improvement (up to 15%):

| Test Case |Baseline unstable (median obs. +- std.dev)|Comparison
test_used_memory_per_thread_array (median obs. +- std.dev)|% change
(higher-better)| Note |

|-------------------------------------------------------------------------------|------------------------------------------|--------------------------------------------------------------------:|------------------------|------------------------------------|
|memtier_benchmark-1key-list-100-elements-lrange-all-elements | 92657 +-
2.0% (2 datapoints) | 101445|9.5% |IMPROVEMENT |
|memtier_benchmark-1key-list-1K-elements-lrange-all-elements | 14965 +-
1.3% (2 datapoints) | 16296|8.9% |IMPROVEMENT |
|memtier_benchmark-1key-set-10-elements-smembers-pipeline-10 | 431019 +-
5.2% (2 datapoints) | 461039|7.0% |waterline=5.2%. IMPROVEMENT |
|memtier_benchmark-1key-set-100-elements-smembers | 74367 +- 0.0% (2
datapoints) | 80190|7.8% |IMPROVEMENT |
|memtier_benchmark-1key-set-1K-elements-smembers | 11730 +- 0.4% (2
datapoints) | 13519|15.3% |IMPROVEMENT |


Full results:

| Test Case |Baseline unstable (median obs. +- std.dev)|Comparison
test_used_memory_per_thread_array (median obs. +- std.dev)|% change
(higher-better)| Note |

|-------------------------------------------------------------------------------|------------------------------------------|--------------------------------------------------------------------:|------------------------|------------------------------------|
|memtier_benchmark-10Mkeys-load-hash-5-fields-with-1000B-values | 88613
+- 1.0% (2 datapoints) | 88688|0.1% |No Change |

|memtier_benchmark-10Mkeys-load-hash-5-fields-with-1000B-values-pipeline-10
| 124786 +- 1.2% (2 datapoints) | 123671|-0.9% |No Change |
|memtier_benchmark-10Mkeys-load-hash-5-fields-with-100B-values | 122460
+- 1.4% (2 datapoints) | 122990|0.4% |No Change |

|memtier_benchmark-10Mkeys-load-hash-5-fields-with-100B-values-pipeline-10
| 333384 +- 5.1% (2 datapoints) | 319221|-4.2% |waterline=5.1%.
potential REGRESSION|
|memtier_benchmark-10Mkeys-load-hash-5-fields-with-10B-values | 137354
+- 0.3% (2 datapoints) | 138759|1.0% |No Change |

|memtier_benchmark-10Mkeys-load-hash-5-fields-with-10B-values-pipeline-10
| 401261 +- 4.3% (2 datapoints) | 398524|-0.7% |No Change |
|memtier_benchmark-1Mkeys-100B-expire-use-case | 179058 +- 0.4% (2
datapoints) | 180114|0.6% |No Change |
|memtier_benchmark-1Mkeys-10B-expire-use-case | 180390 +- 0.2% (2
datapoints) | 180401|0.0% |No Change |
|memtier_benchmark-1Mkeys-1KiB-expire-use-case | 175993 +- 0.7% (2
datapoints) | 175147|-0.5% |No Change |
|memtier_benchmark-1Mkeys-4KiB-expire-use-case | 165771 +- 0.0% (2
datapoints) | 164434|-0.8% |No Change |
|memtier_benchmark-1Mkeys-bitmap-getbit-pipeline-10 | 931339 +- 2.1% (2
datapoints) | 929487|-0.2% |No Change |
|memtier_benchmark-1Mkeys-generic-exists-pipeline-10 | 999462 +- 0.4% (2
datapoints) | 963226|-3.6% |potential REGRESSION |
|memtier_benchmark-1Mkeys-generic-expire-pipeline-10 | 905333 +- 1.4% (2
datapoints) | 896673|-1.0% |No Change |
|memtier_benchmark-1Mkeys-generic-expireat-pipeline-10 | 885015 +- 1.0%
(2 datapoints) | 865010|-2.3% |No Change |
|memtier_benchmark-1Mkeys-generic-pexpire-pipeline-10 | 897115 +- 1.2%
(2 datapoints) | 887544|-1.1% |No Change |
|memtier_benchmark-1Mkeys-generic-scan-pipeline-10 | 451103 +- 3.2% (2
datapoints) | 465571|3.2% |potential IMPROVEMENT |
|memtier_benchmark-1Mkeys-generic-touch-pipeline-10 | 996809 +- 0.6% (2
datapoints) | 984478|-1.2% |No Change |
|memtier_benchmark-1Mkeys-generic-ttl-pipeline-10 | 979570 +- 1.7% (2
datapoints) | 958752|-2.1% |No Change |

|memtier_benchmark-1Mkeys-hash-hget-hgetall-hkeys-hvals-with-100B-values
| 180888 +- 0.5% (2 datapoints) | 182295|0.8% |No Change |

|memtier_benchmark-1Mkeys-hash-hmget-5-fields-with-100B-values-pipeline-10
| 717881 +- 1.0% (2 datapoints) | 724814|1.0% |No Change |
|memtier_benchmark-1Mkeys-hash-transactions-multi-exec-pipeline-20 |
1055447 +- 0.4% (2 datapoints) | 1065836|1.0% |No Change |
|memtier_benchmark-1Mkeys-lhash-hexists | 164332 +- 0.1% (2 datapoints)
| 163636|-0.4% |No Change |
|memtier_benchmark-1Mkeys-lhash-hincbry | 171674 +- 0.3% (2 datapoints)
| 172737|0.6% |No Change |
|memtier_benchmark-1Mkeys-list-lpop-rpop-with-100B-values | 180904 +-
1.1% (2 datapoints) | 179467|-0.8% |No Change |
|memtier_benchmark-1Mkeys-list-lpop-rpop-with-10B-values | 181746 +-
0.8% (2 datapoints) | 182416|0.4% |No Change |
|memtier_benchmark-1Mkeys-list-lpop-rpop-with-1KiB-values | 182004 +-
0.7% (2 datapoints) | 180237|-1.0% |No Change |
|memtier_benchmark-1Mkeys-load-hash-5-fields-with-1000B-values | 105191
+- 0.9% (2 datapoints) | 105058|-0.1% |No Change |

|memtier_benchmark-1Mkeys-load-hash-5-fields-with-1000B-values-pipeline-10
| 150683 +- 0.9% (2 datapoints) | 153597|1.9% |No Change |
|memtier_benchmark-1Mkeys-load-hash-hmset-5-fields-with-1000B-values |
104122 +- 0.7% (2 datapoints) | 105236|1.1% |No Change |
|memtier_benchmark-1Mkeys-load-list-with-100B-values | 149770 +- 0.9% (2
datapoints) | 150510|0.5% |No Change |
|memtier_benchmark-1Mkeys-load-list-with-10B-values | 165537 +- 1.9% (2
datapoints) | 164329|-0.7% |No Change |
|memtier_benchmark-1Mkeys-load-list-with-1KiB-values | 113315 +- 0.5% (2
datapoints) | 114110|0.7% |No Change |
|memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values | 131201
+- 0.7% (2 datapoints) | 129545|-1.3% |No Change |

|memtier_benchmark-1Mkeys-load-stream-1-fields-with-100B-values-pipeline-10
| 352891 +- 2.8% (2 datapoints) | 348338|-1.3% |No Change |
|memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values | 104386
+- 0.7% (2 datapoints) | 105796|1.4% |No Change |

|memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10
| 227593 +- 5.5% (2 datapoints) | 218783|-3.9% |waterline=5.5%.
potential REGRESSION|
|memtier_benchmark-1Mkeys-load-string-with-100B-values | 167552 +- 0.2%
(2 datapoints) | 170282|1.6% |No Change |
|memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 |
646888 +- 0.5% (2 datapoints) | 639680|-1.1% |No Change |
|memtier_benchmark-1Mkeys-load-string-with-10B-values | 174891 +- 0.7%
(2 datapoints) | 174382|-0.3% |No Change |
|memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10 |
749988 +- 5.1% (2 datapoints) | 769986|2.7% |waterline=5.1%. No Change |
|memtier_benchmark-1Mkeys-load-string-with-1KiB-values | 155929 +- 0.1%
(2 datapoints) | 156387|0.3% |No Change |
|memtier_benchmark-1Mkeys-load-zset-with-10-elements-double-score |
92241 +- 0.2% (2 datapoints) | 92189|-0.1% |No Change |
|memtier_benchmark-1Mkeys-load-zset-with-10-elements-int-score | 114328
+- 1.3% (2 datapoints) | 113154|-1.0% |No Change |
|memtier_benchmark-1Mkeys-string-get-100B | 180685 +- 0.2% (2
datapoints) | 180359|-0.2% |No Change |
|memtier_benchmark-1Mkeys-string-get-100B-pipeline-10 | 991291 +- 3.1%
(2 datapoints) | 1020086|2.9% |No Change |
|memtier_benchmark-1Mkeys-string-get-10B | 181183 +- 0.3% (2 datapoints)
| 177868|-1.8% |No Change |
|memtier_benchmark-1Mkeys-string-get-10B-pipeline-10 | 1032554 +- 0.8%
(2 datapoints) | 1023120|-0.9% |No Change |
|memtier_benchmark-1Mkeys-string-get-1KiB | 180479 +- 0.9% (2
datapoints) | 182215|1.0% |No Change |
|memtier_benchmark-1Mkeys-string-get-1KiB-pipeline-10 | 979286 +- 0.9%
(2 datapoints) | 989888|1.1% |No Change |
|memtier_benchmark-1Mkeys-string-mget-1KiB | 121950 +- 0.4% (2
datapoints) | 120996|-0.8% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geodist | 179404 +- 1.0% (2
datapoints) | 181232|1.0% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geodist-pipeline-10 | 1023797
+- 0.5% (2 datapoints) | 1014980|-0.9% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geohash | 180808 +- 1.2% (2
datapoints) | 180606|-0.1% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geohash-pipeline-10 | 1056458
+- 1.6% (2 datapoints) | 1040050|-1.6% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geopos | 181808 +- 0.2% (2
datapoints) | 175945|-3.2% |potential REGRESSION |
|memtier_benchmark-1key-geo-60M-elements-geopos-pipeline-10 | 1038180 +-
3.4% (2 datapoints) | 1033005|-0.5% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat | 142614
+- 0.3% (2 datapoints) | 144259|1.2% |No Change |
|memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat-bybox |
141008 +- 0.4% (2 datapoints) | 139602|-1.0% |No Change |

|memtier_benchmark-1key-geo-60M-elements-geosearch-fromlonlat-pipeline-10
| 560698 +- 0.8% (2 datapoints) | 548806|-2.1% |No Change |
|memtier_benchmark-1key-list-10-elements-lrange-all-elements | 166132 +-
0.9% (2 datapoints) | 170259|2.5% |No Change |
|memtier_benchmark-1key-list-100-elements-lrange-all-elements | 92657 +-
2.0% (2 datapoints) | 101445|9.5% |IMPROVEMENT |
|memtier_benchmark-1key-list-1K-elements-lrange-all-elements | 14965 +-
1.3% (2 datapoints) | 16296|8.9% |IMPROVEMENT |
|memtier_benchmark-1key-pfadd-4KB-values-pipeline-10 | 264156 +- 0.2% (2
datapoints) | 262582|-0.6% |No Change |
|memtier_benchmark-1key-set-10-elements-smembers | 138916 +- 1.7% (2
datapoints) | 138016|-0.6% |No Change |
|memtier_benchmark-1key-set-10-elements-smembers-pipeline-10 | 431019 +-
5.2% (2 datapoints) | 461039|7.0% |waterline=5.2%. IMPROVEMENT |
|memtier_benchmark-1key-set-10-elements-smismember | 173545 +- 1.1% (2
datapoints) | 173488|-0.0% |No Change |
|memtier_benchmark-1key-set-100-elements-smembers | 74367 +- 0.0% (2
datapoints) | 80190|7.8% |IMPROVEMENT |
|memtier_benchmark-1key-set-100-elements-smismember | 155682 +- 1.6% (2
datapoints) | 151367|-2.8% |No Change |
|memtier_benchmark-1key-set-1K-elements-smembers | 11730 +- 0.4% (2
datapoints) | 13519|15.3% |IMPROVEMENT |
|memtier_benchmark-1key-set-200K-elements-sadd-constant | 181070 +- 1.1%
(2 datapoints) | 180214|-0.5% |No Change |
|memtier_benchmark-1key-set-2M-elements-sadd-increasing | 166364 +- 0.1%
(2 datapoints) | 166944|0.3% |No Change |
|memtier_benchmark-1key-zincrby-1M-elements-pipeline-1 | 46071 +- 0.6%
(2 datapoints) | 44979|-2.4% |No Change |
|memtier_benchmark-1key-zrank-1M-elements-pipeline-1 | 48429 +- 0.4% (2
datapoints) | 49265|1.7% |No Change |
|memtier_benchmark-1key-zrem-5M-elements-pipeline-1 | 48528 +- 0.4% (2
datapoints) | 48869|0.7% |No Change |
|memtier_benchmark-1key-zrevrangebyscore-256K-elements-pipeline-1 |
100580 +- 1.5% (2 datapoints) | 101782|1.2% |No Change |
|memtier_benchmark-1key-zrevrank-1M-elements-pipeline-1 | 48621 +- 2.0%
(2 datapoints) | 48473|-0.3% |No Change |
|memtier_benchmark-1key-zset-10-elements-zrange-all-elements | 83485 +-
0.6% (2 datapoints) | 83095|-0.5% |No Change |

|memtier_benchmark-1key-zset-10-elements-zrange-all-elements-long-scores
| 118673 +- 0.8% (2 datapoints) | 118006|-0.6% |No Change |
|memtier_benchmark-1key-zset-100-elements-zrange-all-elements | 19009 +-
1.1% (2 datapoints) | 19293|1.5% |No Change |
|memtier_benchmark-1key-zset-100-elements-zrangebyscore-all-elements |
18957 +- 0.5% (2 datapoints) | 19419|2.4% |No Change |

|memtier_benchmark-1key-zset-100-elements-zrangebyscore-all-elements-long-scores|
171693 +- 0.5% (2 datapoints) | 172432|0.4% |No Change |
|memtier_benchmark-1key-zset-1K-elements-zrange-all-elements | 3566 +-
0.6% (2 datapoints) | 3672|3.0% |No Change |
|memtier_benchmark-1key-zset-1M-elements-zcard-pipeline-10 | 1067713 +-
0.4% (2 datapoints) | 1071550|0.4% |No Change |
|memtier_benchmark-1key-zset-1M-elements-zrevrange-5-elements | 169195
+- 0.7% (2 datapoints) | 169620|0.3% |No Change |
|memtier_benchmark-1key-zset-1M-elements-zscore-pipeline-10 | 914338 +-
0.2% (2 datapoints) | 905540|-1.0% |No Change |
|memtier_benchmark-2keys-lua-eval-hset-expire | 88346 +- 1.7% (2
datapoints) | 87259|-1.2% |No Change |
|memtier_benchmark-2keys-lua-evalsha-hset-expire | 103273 +- 1.2% (2
datapoints) | 102393|-0.9% |No Change |
|memtier_benchmark-2keys-set-10-100-elements-sdiff | 15418 +- 10.9%
UNSTABLE (2 datapoints) | 14369|-6.8% |UNSTABLE (very high variance) |
|memtier_benchmark-2keys-set-10-100-elements-sinter | 83601 +- 3.6% (2
datapoints) | 82508|-1.3% |No Change |
|memtier_benchmark-2keys-set-10-100-elements-sunion | 14942 +- 11.2%
UNSTABLE (2 datapoints) | 14001|-6.3% |UNSTABLE (very high variance) |
|memtier_benchmark-2keys-stream-5-entries-xread-all-entries | 75938 +-
0.4% (2 datapoints) | 76565|0.8% |No Change |
|memtier_benchmark-2keys-stream-5-entries-xread-all-entries-pipeline-10
| 120781 +- 1.1% (2 datapoints) | 119142|-1.4% |No Change |
2024-08-19 13:22:16 +03:00
debing.sun 6c6489280c
Fix a race condition issue in the cache_memory of functionsLibCtx (#13476)
This is a missing of the PR https://github.com/redis/redis/pull/13383.
We will call `functionsLibCtxClear()` in bio, so we shouldn't touch
`curr_functions_lib_ctx` in it.
2024-08-19 10:11:45 +08:00
debing.sun 2b88db90aa
Fix incorrect lag due to trimming stream via XTRIM command (#13473)
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR https://github.com/redis/redis/pull/13338
2024-08-16 23:13:31 +08:00
debing.sun b94b714f81
Fix error message for XREAD command with wrong parameter (#13474)
Fixed a missing from #13117.
When the number of streams is incorrect, the error message for `XREAD`
needs to include the '+' symbol.
2024-08-14 21:40:43 +08:00
Moti Cohen 806459f481
On HDEL last field with expiry, update global HFE DS (#13470)
Hash field expiration is optimized to avoid frequent update global HFE DS for
each field deletion. Eventually active-expiration will run and update or remove
the hash from global HFE DS gracefully. Nevertheless, statistic "subexpiry"
might reflect wrong number of hashes with HFE to the user if HDEL deletes
the last field with expiration in hash (yet there are more fields without expiration).

Following this change, if HDEL the last field with expiration in the hash then
take care to remove the hash from global HFE DS as well.
2024-08-11 16:39:03 +03:00
LuMingYinDetect 3a08819f51
Fix some memory leaks in redis-cli (#13258)
Fix memory leak related to variable slot_nodes  in the
clusterManagerFixSlotsCoverage() function.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-08-08 10:51:33 +08:00
debing.sun 6f0ddc9d92
Pass extensions to node if extension processing is handled by it (#13465)
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/52.
Ref: https://github.com/redis/redis/pull/12760
Close https://github.com/redis/redis/issues/13401
This PR will replace https://github.com/redis/redis/pull/13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
2024-08-08 10:48:03 +08:00
Anurag Bandyopadhyay 731f2dc5c7
Make some commets more friendly (#13319)
Close  #13316

---------

Co-authored-by: Anuragkillswitch <70265851+Anuragkillswitch@users.noreply.github.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2024-08-07 01:20:06 +08:00
debing.sun bf643a63c8
Ensure validity of myself as master or replica when loading cluster config (#13443)
First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
82f00f5179/src/cluster_legacy.c (L2320)
otherwise, it will crash in the
82f00f5179/src/cluster_legacy.c (L2395)

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
2024-08-06 20:40:46 +08:00
YaacovHazan e4ddc34463
Keep cluster shards command implementation generic (#13440)
Make the clusterCommandShards function use only cluster API functions
instead of accessing cluster implementation details.
This way the cluster API implementation doesn't have to have intimate
knowledge of the command reply format, and doesn't need to interact with
the client directly (the addReply function family).
The PR has two commits, one moves the function from cluster_legacy.c to
cluster.c, and the other modifies it's implementation.


**better merge without squashing.**
2024-08-05 11:08:48 +03:00
Josh Hershberg 6d5d754119 Make cluster shards cmd implementation generic
This and the previous commit make the cluster
shards command a generic implementation instead of a
specific implementation for each cluster API implementation.
This commit (a) adds functions to the cluster API
and (b) modifies the cluster shards cmd implementation
to use cluster API functions instead of directly
accessing the legacy clustering implementation.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2024-08-05 10:31:40 +03:00
Josh Hershberg e3e631f394 Prep to make cluster shards cmd generic
This and the next following commit makes the cluster
shards command a generic implementation instead of a
specific implementation for each cluster API implementation.
This commit simply moves the cluster shards implementation
from cluster_legacy.c to cluster.c without changing the
implementation at all. The reason for doing so was to
help with reviewing the changes in the diff.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2024-08-05 10:27:03 +03:00
Zhongxian Pan 6263823e54
Replace bit shift with __builtin_ctzll in HyperLogLog (#13218)
## Replace bit shift with `__builtin_ctzll` in HyperLogLog

Builtin function `__builtin_ctzll` is more effective than bit shift even
though "in the average case there are high probabilities to find a 1
after a few iterations" mentioned in the source file comment.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-08-05 10:51:43 +08:00
Moti Cohen 4dd8b1faa9
Fix HTTL/HPTTL to be NONDETERMINISTIC_OUTPUT (#13461)
H[P]TTL should be marked as NONDETERMINISTIC_OUTPUT just like [P]TTL.
2024-08-04 17:42:50 +03:00
Vitah Lin 8038eb3147
Fix wrong dbnum showed in redis-cli after client reconnected (#13411)
When the server restarts while the CLI is connecting, the reconnection
does not automatically select the previous db.
This may lead users to believe they are still in the previous db, in
fact, they are in db0.

This PR will automatically reset the current dbnum and `cliSelect()`
again when reconnecting.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-08-03 12:06:02 +08:00
c8ef 89742a95db
Fix typo in `hyperloglog.c` (#13458) 2024-08-02 07:49:52 +08:00
debing.sun 60e9e630bd
Fix CLUSTER SHARDS command returns empty array (#13422)
Close https://github.com/redis/redis/issues/13414

When the cluster's master node fails and is switched to another node,
the first node in the shard node list (the old master) is no longer
valid.
Add a new method clusterGetMasterFromShard() to obtain the current
master.
2024-08-02 07:22:13 +08:00
debing.sun e750c619b2
Fix some test failures caused by key being deleted due to premature expiration (#13453)
1. Fix fuzzer test failure when the key was deleted due to expiration
before sending random traffic for the key.

After HFE, when all fields in a hash are expired, the hash might be
deleted due to expiration.

If the key was expired in the mid of `RESTORE` command and sending rand
trafic, `fuzzer` test will fail in the following code because the 'TYPE
key' will return `none` and then throw an exception because it cannot be
found in `$commands`

94b9072e44/tests/support/util.tcl (L712-L713)

This PR adds a `None` check for the reply of `KEY TYPE` command and adds
a print of `err` to avoid false positives in the future.

failed CI:
https://github.com/redis/redis/actions/runs/10127334121/job/28004985388

2. Fix the issue where key was deleted due to expiration before the
`scan.scan_key` command could execute, caused by premature enabling of
`set-active-expire`.

failed CI:
https://github.com/redis/redis/actions/runs/10153722715/job/28077610552

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-31 08:15:39 +08:00
debing.sun 93fb83b4cb
Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group (#13338)
Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
2024-07-30 22:31:31 +08:00
Lior Kogan 94b9072e44
Rename to "Redis Community Edition" (#13448) 2024-07-28 20:54:28 +03:00
Oran Agra e74550dd10
solve races in replication lpop tests (#13445)
* some tests didn't wait for replication offset sync
* tests that used deferring client, didn't wait for it to get blocked.
an in some cases, the replication offset sync ended before the deferring
client finished, so the digest match failed.
* some tests used deferring clients excessively
* the tests didn't read the client response
* the tests didn't close the client (fd leak)
2024-07-25 14:06:40 +03:00
Moti Cohen d0c64d78d4
On active expire, factor maxToExpire based on Hertz (#13439) 2024-07-25 13:22:02 +03:00
Moti Cohen 82f00f5179
Optimize RDB_TYPE_HASH_METADATA to keep relative expiration time (#13438)
Modify RDB_TYPE_HASH_METADATA layout to store expiration times relative
to the minimum expiration time, which is written at the start as absolute time.
2024-07-24 08:39:10 +03:00
Oran Agra 447ce11a64
solve race conditions in tests (#13433)
[exception]: Executing test client: ERR FAILOVER target replica is not
online.. ERR FAILOVER target replica is not online.
    while executing
"$node_0 failover to $node_1_host $node_1_port"
    ("uplevel" body line 16)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test {failover command to specific replica works} {

[err]: client evicted due to percentage of maxmemory in
tests/unit/client-eviction.tcl
Expected 33622 >= 220200 && 33622 < 440401 (context: type eval line 17
cmd {assert {$tot_mem >= $n && $tot_mem < $maxmemory_clients_actual}}
proc ::test)
2024-07-22 10:11:56 +03:00
Oran Agra 13d227fa46
Different fix for the race in #13361 (#13434)
Recently in #13361, i attempted to fix a race between FLUSHALL and
BGSAVE, where despite calling killRDBChild, the
backgroundSaveDoneHandler will terminate with success.
Turns out that even if the child didn't yet exit, there's a chance it'll
still miss our signal and exit with success.
in that case, we will still mess up the dirty counter (deducting
dirty_before_bgsave) which is reset by FLUSHALL, and override the
synchronous rdb file we saved.

instead, we'll set a flag to treat the next done handler as a failed
one.
2024-07-22 10:11:30 +03:00
Oran Agra a331978583
Fix external test hang in redis-cli test when run in a certain order (#13423)
When the tests are run against an external server in this order:
`--single unit/introspection --single unit/moduleapi/blockonbackground
--single integration/redis-cli`
the test would hang when the "ASK redirect test" test attempts to create
a listening socket (it fails, and then redis-cli itself hangs waiting
for a non-responsive socket created by the introspection test).

the reasons are:
1. the blockedbackground test includes util.tcl and resets the
`::last_port_attempted` variable
2. the test in introspection didn't close the listening server, so it's
still alive.
3. find_available_port doesn't properly detect the busy port, and it
thinks that the port is free even though it's busy.

fixing all 3 of these problems, even though fixing just one would be
enough to let the test pass.
2024-07-17 15:42:28 +03:00
Oran Agra fa46aa4d85
Test infra adjustments for external CI runs (#13421)
- when uploading server logs, make sure they don't overwrite each other.
- sort the test units to get consistent order between them (following
#13220)
- backup and restore the entire server configuration, to protect one
unit from config changes another unit performs
2024-07-16 11:38:20 +03:00
debing.sun 88af96c7a2
Trigger Lua GC after script loading (#13407)
Nowdays we do not trigger LUA GC after loading lua script. This means
that when a large number of scripts are loaded, such as when functions
are propagating from the master to the replica, if the LUA scripts are
never touched on the replica, the garbage might remain there
indefinitely.

Before this PR, we would share a gc_count between scripts and functions.
This means that, under certain circumstances, the GC trigger for scripts
and functions was not fair.
For example, loading a large number of scripts followed by a small
number of functions could result in the functions triggering GC.
In this PR, we assign a unique `gc_count` to each of them, so the GC
triggers between them will no longer affect each other.

on the other hand, this PR will to bring regession for script loading
commands(`FUNCTION LOAD` and `SCRIPT LOAD`), but they are not hot path,
we can ignore it, and it will be replaced
https://github.com/redis/redis/pull/13375 in the future.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-07-16 09:28:47 +08:00
debing.sun 76415fa2cf
Prevent deleting RDB read event after restarting RDB saving for other diskless replicas (#13410)
When we terminate the diskless RDB saving child process and, at the same
time, we start a new BGSAVE for new replicas, we should not delete the
RDB read event. Otherwise, these replicas will never receive a response.
this is a result of the recent change in
https://github.com/redis/redis/pull/13361

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-16 09:22:43 +08:00
debing.sun 7d3545cb16
Reduce redundant call of prepareClientToWrite when call addReply* continuously (#13412)
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/670.

## Description

While exploring hotspots with profiling some benchmark workloads, we
noticed the high cycles ratio of `prepareClientToWrite`, taking about 9%
of the CPU of `smembers`, `lrange` commands. After deep dive the code
logic, we thought we can gain the performance by reducing the redundant
call of `prepareClientToWrite` when call addReply* continuously.

For example: In

https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082,
`prepareClientToWrite` is called three times in a row.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-07-15 23:19:19 +08:00
Oran Agra 880e147d52
solve redis-cli test failures due to local history file (#13419)
test failure:
```
[err]: Interactive CLI: should find second search result if user presses ctrl+s in tests/integration/redis-cli.tcl
Expected '1' to be equal to '0' (context: type eval line 10 cmd {assert_equal 1 [regexp {\(i-search\): \x1B\[0mk\x1B\[1mey\x1B\[0ms one} $result]} proc ::test)
```

this test (introduced in #12543) depends on the local history file, so
it can fail if there's some match there.
the fix is to use a different history file, and delete it before each
run.
2024-07-15 15:24:43 +03:00
guybe7 b10e19e3d6
Crash report: Use more chars for argv (#13413)
128 is not enough chars when we're talking about commands like RESTORE.
Of course, it's impossible to find the perfect number, but 1024 is
better than 128, and it's not obscenely large.
2024-07-14 11:37:44 +08:00
debing.sun d39548c854
Avoid starting defrag after config resetstat for defrag test (#13399)
If `config resetstat` is executed and a defrag is started after it, the
`total_active_defrag_time` will not be 0.
When we start the defrag again, we will skip the following steps:
1. waiting for the defrag to start. (s total_active_defrag_time is equal
0)
2. waiting for the test to complete. (active_defrag_running is euqal 0)
which result in the test failed.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-12 10:07:09 +08:00
debing.sun ad9277238f
Disable aof for `Scripts do not block on waitaof` test (#13409)
The`Scripts do not block on waitaof` test keeps failing in the external
CI:
https://github.com/redis/redis/actions/runs/9875670156/job/27272955900?pr=13407

The main reason is that if AOF happens to be enabled, we get an acklocal
of 1.
2024-07-11 20:25:57 +08:00
guybe7 915a9e4b93
KSN: HEXPIRE-like commands should emit hdel if expire-time is in the past (#13408)
To be more similar to EXPIRE-like commands, which emit a "del"
notification if the expire-time is in the past
2024-07-11 16:24:58 +08:00
guybe7 81440a333d
Module unblock on keys: updateStatsOnUnblock is called twice (#13405)
This commit reverts the deletion of the condition `!bc->blocked_on_keys`
that was accidentally introduced by
https://github.com/redis/redis/pull/12817.
In case a blocked-on-keys module client is unblocked both
`moduleUnblockClientOnKey` and `moduleHandleBlockedClients` are called
which resulted in `updateStatsOnUnblock` being called twice

Now, that `moduleHandleBlockedClients` doesn't call
`updateStatsOnUnblock` in case of unblocked module key-blocked clients,
in the unlikely event that the module decides to call `RM_UnblockClient`
on a key-blocked client, we need to call `updateStatsOnUnblock` from
within `moduleBlockedClientTimedOut`, but since
`moduleBlockedClientTimedOut` is not tread-safe we can't call it
directly from withing `RM_UnblockClient`.
Added a new flag `blocked_on_keys_explicit_unblock` for that specific
case, which will cause `moduleBlockedClientTimedOut` to be called from
`moduleHandleBlockedClients` (which is only called from the main thread)

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-07-11 16:13:38 +08:00
debing.sun ffff7fea7c
Rebuild function engines for function flush command (#13383)
### Issue
The current implementation of `FUNCTION FLUSH` command uses
`lua_unref()` to unreference script closures in Lua vm. However,
invoking `lua_unref()` during lazy free (`ASYNC` argument) is risky
since it is not thread-safe.

Another issue is that using `lua_unref()` to unreference references does
not trigger GC, This can result in the Lua VM leaves a significant
amount of garbage, which may never be cleaned up if not properly GC.

### Solution
The proposed solution is to completely rebuild the engines, resulting in
a brand new Lua VM.

---------

Co-authored-by: meir <meir@redis.com>
2024-07-10 17:35:36 +08:00
debing.sun 69b480cb7a
Hide user data from log (#13400)
This PR is based on the commits from PR #11747.

In the event of an assertion failure, hide command arguments from the
operator.

In some cases, private client information can be voluntarily exposed
when a redis instance crashes due to an assertion failure.
This commit prevent וnintentional client info exposure.
Operators can still access the hidden data, but they must actively
request it.
Any of the client info commands remains the unchanged.

### Config
Add a new config `hide-user-data-from-log` to turn this feature on and
off, default off.

---------

Co-authored-by: naglera <anagler123@gmail.com>
Co-authored-by: naglera <58042354+naglera@users.noreply.github.com>
2024-07-09 18:54:18 +08:00
Vitah Lin b699e8bfe0
Upgrade action/checkout version and add old-chain CI actions to test gcc4.8 (#13394)
https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

Due to GitHub removing support for CentOS 7 in GitHub Actions, all
actions utilizing CentOS 7 need to be upgraded, upgrade the centos
version from `contos:7` to `quay.io/centos/centos:stream9` which is the
official RedHat centos container.

Create some new actions named `old-chain` to verify support for gcc 4.8.

This PR also includes the upgrade of actions/checkout from version 3 to
version 4.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-07-08 21:34:14 +08:00
Moti Cohen dcf0229811
HFE - RDB serialize also hash min expiration (For ROF flow) (#13391)
* Following this feature, Redis (ROF) may implement flow that allows hashes to be
  dumped directly from RDB to FLUSH without parsing. In this scenario, it is still
  essential to determine when to update hashes due to expired fields. By writing
  and reading the next minimum hash-field expiration before serializing objects
  to and from RDB, we can effectively track and expire hash fields without the need
  to parse the hash during loading.

    Before:
    #define RDB_TYPE_HASH_METADATA 22
    #define RDB_TYPE_HASH_LISTPACK_EX 23
    
    After:
    /* Hash with HFEs. Doesn't attach min TTL at start */
    #define RDB_TYPE_HASH_METADATA_PRE_GA 22      
    /* Hash LP with HFEs. Doesn't attach min TTL at start */
    #define RDB_TYPE_HASH_LISTPACK_EX_PRE_GA 23   
    /* Hash with HFEs. Attach min TTL at start */
    #define RDB_TYPE_HASH_METADATA 24             
    /* Hash LP with HFEs. Attach min TTL at start */
    #define RDB_TYPE_HASH_LISTPACK_EX 25          


* Manually test loading RDB file before the change and verify hash and its HFEs 
  is as expected.
* Added `subexpires` counter to `redis-check-rdb`
2024-07-07 17:00:19 +03:00
debing.sun 58edea7385
Add help for `DEBUG SCRIPT` command (#13385)
1. Add help for `DEBUG SCRIPT` command.
2. Remove a duplicate `getLuaScripts()` which is same as
`evalScriptsDict()`.
2024-07-04 14:44:42 +08:00