This change improves the performance of cluster slots by removing the deferring lengths that are used. Deferring lengths are used in two contexts, the first is for determining the number of replicas that serve a slot (Added in 6.2 as part of a different performance improvement) and the second is for determining the extra networking options for each node (Added in 7.0). For continuous slots, (e.g. 0-8196) this improvement is very negligible, however it becomes more significant when slots are not continuous (e.g. 0 2 4 6 etc) which can happen in production for various users.
The `cluster slots` command is deprecated in favor of `cluster shards`, but since most clients don't support the new command yet I think it's important to not degrade performance here.
Benchmarking shows about 2x improvement, however I wasn't able to get a coherent TPS number since the benchmark process was being saturated long before Redis was, so had to run with multiple benchmarks and merge results. If needed I can add this to our memtier framework. Instead the next section shows the number of usec per call from the benchmark results, which shows significant improvement as well as having a more coherent response in the CoB.
| | New Code | Old Code | % Improvements
|----|----|----- |-----
| Uniform slots| usec_per_call=10.46 | usec_per_call=11.03 | 5.7%
| Worst case (Only even slots)| usec_per_call=963.80 | usec_per_call=2950.99 | 307%
This change also removes some extra white space that I added a when making a code change for adding hostnames.
(cherry picked from commit e74a1f3bd9)
We need to honor the post-execution-unit API and call it after each KSN
Note that this is an edge case that only happens in case volatile keys were
created directly on a writable replica, and that anyway nothing is propagated to sub-replicas
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit df327b8bd5)
According to the source code, the commands can be executed with only key name,
and no GET/SET/INCR operation arguments.
change the docs to reflect that by marking these arguments as optional.
also add tests.
(cherry picked from commit fea9bbbe0f)
Authenticated users issuing specially crafted SETRANGE and SORT(_RO)
commands can trigger an integer overflow, resulting with Redis attempting
to allocate impossible amounts of memory and abort with an OOM panic.
Related to the hang reported in #11671
Currently, redis can disconnect a client due to reaching output buffer limit,
it'll also avoid feeding that output buffer with more data, but it will keep
running the loop in the command (despite the client already being marked for
disconnection)
This PR is an attempt to mitigate the problem, specifically for commands that
are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang redis.
For KEYS the caller can use the existing keyspace in redis (if big enough).
the metadata for the new arguments of XSETID,
entries-added and max-deleted-id, which have been added
in Redis 7.0 was missing.
(cherry picked from commit 44c6770372)
Turns out that a fork child calling getExpire while persisting keys (and
possibly also a result of some module fork tasks) could cause dictFind
to do incremental rehashing in the child process, which is both a waste
of time, and also causes COW harm.
(cherry picked from commit 2bec254d89)
Any value in the range of [0-1) turns to 0 when being cast from double to long long. This change rounds up instead of down for values that can't be stored precisely as long doubles.
(cherry picked from commit eef29b68a2)
TLDR: solve a problem introduced in Redis 7.0.6 (#11541) with
RM_CommandFilterArgInsert being called from scripts, which can
lead to memory corruption.
Libc realloc can return the same pointer even if the size was changed. The code in
freeLuaRedisArgv had an assumption that if the pointer didn't change, then the
allocation didn't change, and the cache can still be reused.
However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were
used, it could be that we realloced the argv array, and the pointer didn't change, then
a consecutive command being executed from Lua can use that argv cache reaching
beyond its size.
This was actually only possible with modules, since the decision to realloc was based
on argc, rather than argv_len.
(cherry picked from commit c8052122a2)
This call is introduced in #8687, but became irrelevant in #11348, and is currently a no-op.
The fact is that #11348 an unintended side effect, which is that even if the client eviction config
is enabled, there are certain types of clients for which memory consumption is not accurately
tracked, and so unlike normal clients, their memory isn't reported correctly in INFO.
(cherry picked from commit af0a4fe207)
As Sentinel supports dynamic IP only when using hostnames, there
are few leftover addess comparison logic that doesn't take into
account that the IP might get change.
Co-authored-by: moticless <moticless@github.com>
(cherry picked from commit 4a27aa4875)
Fixes a regression introduced by #11552 in 7.0.6.
it causes replies in the GEO commands to contain garbage when the
result is a very small distance (less than 1)
Includes test to confirm indeed with junk in buffer now we properly reply
(cherry picked from commit d7b4c9175e)
In replica, the key expired before master's `INCR` was arrived, so INCR
creates a new key in the replica and the test failed.
```
*** [err]: Replication of an expired key does not delete the expired key in tests/integration/replication-4.tcl
Expected '0' to be equal to '1' (context: type eval line 13 cmd {assert_equal 0 [$slave exists k]} proc ::test)
```
This test is very likely to do a false positive if the `wait_for_ofs_sync`
takes longer than the expiration time, so give it a few more chances.
The test was introduced in #9572.
(cherry picked from commit 06b577aad0)
There is a race condition in the test:
```
*** [err]: redis-cli --cluster add-node with cluster-port in tests/unit/cluster/cli.tcl
Expected '5' to be equal to '4' {assert_equal 5 [CI 0 cluster_known_nodes]} proc ::test)
```
When using cli to add node, there can potentially be a race condition
in which all nodes presenting cluster state o.k even though the added
node did not yet meet all cluster nodes.
This comment and the fix were taken from #11221. Also apply it in several
other similar places.
(cherry picked from commit a549b78c48)
When using cli to add node, there can potentially be a race condition in
which all nodes presenting cluster state o.k even though the added node
did not yet meet all cluster nodes.
this adds another utility function to wait until all cluster nodes see the same cluster size
(cherry picked from commit c0ce97facc)
A timing issue like this was reported in freebsd daily CI:
```
*** [err]: Sanity test push cmd after resharding in tests/unit/cluster/cli.tcl
Expected 'CLUSTERDOWN The cluster is down' to match '*MOVED*'
```
We additionally wait for each node to reach a consensus on the cluster
state in wait_for_condition to avoid the cluster down error.
The fix just like #10495, quoting madolson's comment:
Cluster check just verifies the the config state is self-consistent,
waiting for cluster_state to be okay is an independent check that all
the nodes actually believe each other are healthy.
At the same time i noticed that unit/moduleapi/cluster.tcl has an exact
same test, may have the same problem, also modified it.
(cherry picked from commit 5ce64ab010)
* Fixes build warning when CACHE_LINE_SIZE is already defined
* Fixes wrong CACHE_LINE_SIZE on some FreeBSD systems where it could be set to 128 (e.g. on MIPS)
* Fixes wrong CACHE_LINE_SIZE on Apple M1 (use 128 instead of 64)
Wrong cache line size in that case can some false sharing of array elements between threads, see #10892
(cherry picked from commit 871cc200a0)
Clang Address Sanitizer tests started reporting unknown-crash on these
tests due to the memcheck, disable the memcheck to avoid that noise.
(cherry picked from commit 18ff6a3269)
This test sets the master ping interval to 1 hour, in order to avoid
pings in the replicatoin stream incrementing the replication offset,
however, it didn't increase the repl-timeout so on slow machines
where the test took more than 60 seconds, the replicas would drop
and reconnect.
```
*** [err]: PSYNC2: Partial resync after restart using RDB aux fields in tests/integration/psync2.tcl
Replica didn't partial sync
```
The test would detect 4 additional partial syncs where it expects
only one.
(cherry picked from commit b0250b4508)
Our FreeBSD daily has been failing recently:
```
Config file: freebsd-13.1.conf
cd: /Users/runner/work/redis/redis: No such file or directory
gmake: *** No targets specified and no makefile found. Stop.
```
Upgrade vmactions/freebsd-vm to the latest version (0.3.0) can work.
I've tested it, but don't know why, but first let's fix it.
(cherry picked from commit 5246bf4544)
The kill above is sometimes successful and sometimes already too late.
The PING in pysnc wrong offset test got rejected by bgsaveerr because
lastbgsave_status is C_ERR.
In theory, using diskless can avoid PING being affected, because when
the replica is dropped, we will kill the child with SIGUSR1, and this
will not affect lastbgsave_status.
Anyway, this kill is not particularly needed here, dropping the kill
is the best one, since we do have the waitForBgsave, so just let it
take care of the bgsave. No need for fast termination.
(cherry picked from commit e7144693e2)
There is overhead on Redis 7.0 EXPIRE command that is not present on 6.2.7.
We could see that on the unstable profile there are around 7% of CPU cycles
spent on rewriteClientCommandVector that are not present on 6.2.7.
This was introduced in #8474.
This PR reduces the overhead by using 2X rewriteClientCommandArgument instead of
rewriteClientCommandVector. In this scenario rewriteClientCommandVector creates 4 arguments.
the above usage of rewriteClientCommandArgument reduces the overhead in half.
This PR should also improve PEXPIREAT performance by avoiding at all
rewriteClientCommandArgument usage.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit c3fb48da8b)
## Issue
During the client input/output buffer processing, the memory usage is
incrementally updated to keep track of clients going beyond a certain
threshold `maxmemory-clients` to be evicted. However, this additional
tracking activity leads to unnecessary CPU cycles wasted when no
client-eviction is required. It is applicable in two cases.
* `maxmemory-clients` is set to `0` which equates to no client eviction
(applicable to all clients)
* `CLIENT NO-EVICT` flag is set to `ON` which equates to a particular
client not applicable for eviction.
## Solution
* Disable client memory usage tracking during the read/write flow when
`maxmemory-clients` is set to `0` or `client no-evict` is `on`.
The memory usage is tracked only during the `clientCron` i.e. it gets
periodically updated.
* Cleanup the clients from the memory usage bucket when client eviction
is disabled.
* When the maxmemory-clients config is enabled or disabled at runtime,
we immediately update the memory usage buckets for all clients (tested
scanning 80000 took some 20ms)
Benchmark shown that this can improve performance by about 5% in
certain situations.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit c0267b3fa5)
There is a issue with --sentinel:
```
[root]# src/redis-server sentinel.conf --sentinel --loglevel verbose
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 352
>>> 'sentinel "--loglevel" "verbose"'
Unrecognized sentinel configuration statement
```
This is because in #10660 (Redis 7.0.1), `--` prefix change break it.
In this PR, we will handle `--sentinel` the same as we did for `--save`
in #10866. i.e. it's a pseudo config option with no value.
(cherry picked from commit 8f13ac10b4)
This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in #11535 .
- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2].
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.
On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
(cherry picked from commit e48ac075c0)
This mechanism aims to reduce calls to malloc and free when
preparing the arguments the script sends to redis commands.
This is a mechanism was originally implemented in 48c49c4
and 4f68655, and was removed in #10220 (thinking it's not needed
and that it has no impact), but it now turns out it was wrong, and it
indeed provides some 5% performance improvement.
The implementation is a little bit too simplistic, it assumes consecutive
calls use the same size in the same arg index, but that's arguably
sufficient since it's only aimed at caching very small things.
We could even consider always pre-allocating args to the full
LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument,
that would increase the chance they'll be able to be re-used.
But in some way this is already happening since we're using
sdsalloc, which in turn uses s_malloc_usable and takes ownership
of the full side of the allocation, so we are padded to the allocator
bucket size.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
(cherry picked from commit 2d80cd7840)
GEODIST used snprintf("%.4f") for the reply using addReplyDoubleDistance,
which was slow. This PR optimizes it without breaking compatibility by following
the approach of ll2string with some changes to match the use case of distance
and precision. I.e. we multiply it by 10000 format it as an integer, and then add
a decimal point. This can achieve about 35% increase in the achievable ops/sec.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 61c85a2b20)
profiling EVALSHA\ we see that luaReplyToRedisReply takes 8.73% out of the
56.90% of luaCallFunction CPU cycles.
Using addReplyStatusLength instead of directly composing the protocol to avoid
sdscatprintf and addReplySds ( which imply multiple sdslen calls ).
The new approach drops
luaReplyToRedisReply CPU cycles to 3.77%
(cherry picked from commit 68e87eb088)
As being discussed in #10981 we see a degradation in performance
between v6.2 and v7.0 of Redis on the EVAL command.
After profiling the current unstable branch we can see that we call the
expensive function evalCalcFunctionName twice.
The current "fix" is to basically avoid calling evalCalcFunctionName and
even dictFind(lua_scripts) twice for the same command.
Instead we cache the current script's dictEntry (for both Eval and Functions)
in the current client so we don't have to repeat these calls.
The exception would be when doing an EVAL on a new script that's not yet
in the script cache. in that case we will call evalCalcFunctionName (and even
evalExtractShebangFlags) twice.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 7dfd7b9197)
redis-benchmark: when trying to get the CONFIG before benchmark,
avoid printing any warning on most errors (e.g. NOPERM error).
avoid aborting the benchmark on NOPERM.
keep the warning only when we abort the benchmark on a NOAUTH error
(cherry picked from commit f0005b5328)
In scenarios in which we have large datasets and the elements are not
contained within the range we do spurious calls do sdsdup and sdsfree.
I.e. instead of pre-creating an sds before we know if we're gonna use it
or not, change the role of geoAppendIfWithinShape to just do geoWithinShape,
and let the caller create the string only when needed.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 376b689b03)
The cluster-announce-port/cluster-announce-bus-port/cluster-announce-tls-port should take effect at runtime
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
(cherry picked from commit 25ffa79b64)
Optimize geohashGetDistanceIfInRectangle when there are many misses.
It calls 3x geohashGetDistance. The first 2 times we call them to produce intermediate results.
This PR focus on optimizing for those 2 intermediate results.
1 Reduce expensive computation on intermediate geohashGetDistance with same long
2 Avoid expensive lon_distance calculation if lat_distance fails beforehand
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit ae1de54900)
Fix compile warning for SHA1Transform() method under alpine with GCC 12.
Warning:
```
In function 'SHA1Update',
inlined from 'SHA1Final' at sha1.c:187:9:
sha1.c:144:13: error: 'SHA1Transform' reading 64 bytes from a region of size 0 [-Werror=stringop-overread]
144 | SHA1Transform(context->state, &data[i]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sha1.c:144:13: note: referencing argument 2 of type 'const unsigned char[64]'
sha1.c: In function 'SHA1Final':
sha1.c:56:6: note: in a call to function 'SHA1Transform'
56 | void SHA1Transform(uint32_t state[5], const unsigned char buffer[64])
| ^~~~~~~~~~~~~
```
This warning is a false positive because it has been determined in the loop judgment that there must be 64 chars after position `i`
```c
for ( ; i + 63 < len; i += 64) {
SHA1Transform(context->state, &data[i]);
}
```
Reference: e1d7d3e40a
(cherry picked from commit fd80818552)
Now, according to the comments, if the truncated file is not the last file,
it will be considered as a fatal error.
And the return code will updated to AOF_FAILED, then server will exit
without any error message to the client.
Similar to other error situations, this PR add an explicit error message
for this case and make the client know clearly what happens.
(cherry picked from commit 6e9724cb6a)
Both functions and eval are marked as "no-monitor", since we want to explicitly feed in the script command before the commands generated by the script. Note that we want this behavior generally, so that commands can redact arguments before being added to the monitor.
(cherry picked from commit d136bf2830)
In moduleFireServerEvent we change the real client DB to 0 on freeClient in case the event is REDISMODULE_EVENT_CLIENT_CHANGE.
It results in a crash if the client is blocked on a key on other than DB 0.
The DB change is not necessary even for module-client, as we set its DB to 0 on either createClient or moduleReleaseTempClient.
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit e4eb18b303)
Fix a few issues with the recent #11463
* use exitFromChild instead of exit
* test should ignore defunct process since that's what we expect to
happen for thees child processes when the parent dies.
* fix typo
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 4c54528f0f)
During a diskless sync, if the master main process crashes, the child would
have hung in `write`. This fix closes the read fd on the child side, so that if the
parent crashes, the child will get a write error and exit.
This change also fixes disk-based replication, BGSAVE and AOFRW.
In that case the child wouldn't have been hang, it would have just kept
running until done which may be pointless.
There is a certain degree of risk here. in case there's a BGSAVE child that could
maybe succeed and the parent dies for some reason, the old code would have let
the child keep running and maybe succeed and avoid data loss.
On the other hand, if the parent is restarted, it would have loaded an old rdb file
(or none), and then the child could reach the end and rename the rdb file (data
conflicting with what the parent has), or also have a race with another BGSAVE
child that the new parent started.
Note that i removed a comment saying a write error will be ignored in the child
and handled by the parent (this comment was very old and i don't think relevant).
(cherry picked from commit ccaef5c923)
Funcion sentinelAddrEqualsHostname() of sentinel makes DNS resolve
and based on it determines if two IP addresses are equal. Now, If the
DNS resolve command fails, the function simply returns 0, even if the
hostnames are identical.
This might become an issue in case of failover such that sentinel might
receives from Redis instance, response to regular INFO query it sent,
and wrongly decide that the instance is pointing to is different leader
than the one recorded because of this function, yet hostnames are
identical. In turn sentinel disconnects the connection between sentinel
and valid slave which leads to -failover-abort-no-good-slave.
See issue #11241.
I managed to reproduce only part of the flow in which the function
return wrong result and trigger +fix-slave-config.
The fix is even if the function failed to resolve then compare based on
hostnames. That is our best effort as long as the server is unavailable
for some reason. It is fine since Redis instance cannot have multiple
hostnames for a given setup
(cherry picked from commit bd23b15ad7)
In the module, we will reuse the list iterator entry for RM_ListDelete, but `listTypeDelete` will only update
`quicklistEntry->zi` but not `quicklistEntry->node`, which will result in `quicklistEntry->node` pointing to
a freed memory address if the quicklist node is deleted.
This PR sync `key->u.list.index` and `key->u.list.entry` to list iterator after `RM_ListDelete`.
This PR also optimizes the release code of the original list iterator.
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
(cherry picked from commit 6dd213558b)
When using the MIGRATE, with a destination Redis that has the user name or password set to the string "keys",
Redis would have determine the wrong set of key names the command is gonna access.
This lead to ACL returning wrong authentication result.
Destination instance:
```
127.0.0.1:6380> acl setuser default >keys
OK
127.0.0.1:6380> acl setuser keys on nopass ~* &* +@all
OK
```
Source instance:
```
127.0.0.1:6379> set a 123
OK
127.0.0.1:6379> acl setuser cc on nopass ~a* +@all
OK
127.0.0.1:6379> auth cc 1
OK
127.0.0.1:6379> migrate 127.0.0.1 6380 "" 0 1000 auth keys keys a
(error) NOPERM this user has no permissions to access one of the keys used as arguments
127.0.0.1:6379> migrate 127.0.0.1 6380 "" 0 1000 auth2 keys pswd keys a
(error) NOPERM this user has no permissions to access one of the keys used as arguments
```
Using `acl dryrun` we know that the parameters of `auth` and `auth2` are mistaken for the `keys` option.
```
127.0.0.1:6379> acl dryrun cc migrate whatever whatever "" 0 1000 auth keys keys a
"This user has no permissions to access the 'keys' key"
127.0.0.1:6379> acl dryrun cc migrate whatever whatever "" 0 1000 auth2 keys pswd keys a
"This user has no permissions to access the 'pswd' key"
```
Fix the bug by editing db.c/migrateGetKeys function, which finds the `keys` option and all the keys following.
(cherry picked from commit 9ab873d9d3)
1. show the overcommit warning when overcommit is disabled (2),
not just when it is set to heuristic (0).
2. improve warning text to mention the issue with jemalloc causing VM
mapping fragmentation when set to 2.
(cherry picked from commit dd60c6c8d3)