An integer overflow bug in Redis version 6.0 or newer could be exploited using
the STRALGO LCS command to corrupt the heap and potentially result in remote
code execution.
(cherry picked from commit f0c5f920d0)
An integer overflow bug in Redis 6.2 could be exploited to corrupt the heap and
potentially result in remote code execution.
The vulnerability involves changing the default set-max-intset-entries
configuration value, creating a large set key that consists of integer values
and using the COPY command to duplicate it.
The integer overflow bug exists in all versions of Redis starting with 2.6,
where it could result in a corrupted RDB or DUMP payload, but could not be
exploited through COPY (which did not exist before 6.2).
(cherry picked from commit 29900d4e6b)
We sometimes see the crash report saying we were killed by a random
process even in cases where the crash was spontaneous in redis,
for instance, crashes found by the corrupt-dump test.
It looks like si_pid is sometimes left uninitialized, and a good
way to tell if the crash originated in redis or was triggered from outside is
to look at si_code: real signal codes are always > 0, and ones generated by
kill have si_code of 0 or below.
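The si_code test described above can be sketched as follows. This is a minimal illustration assuming Linux semantics, not the actual crash handler; the helper name signalOriginatedInProcess is hypothetical, and in a handler installed with SA_SIGINFO it would be called with info->si_code.

```c
/* Hypothetical helper: classifies a signal by its si_code.
 * Returns 1 when the code is > 0 (a real fault raised inside the process),
 * and 0 when it is 0 or below (sent from outside, e.g. by kill), in which
 * case si_pid identifies the sender. */
static int signalOriginatedInProcess(int si_code) {
    return si_code > 0;
}
```

Only when this returns 0 is it meaningful to report si_pid as the killing process.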
(cherry picked from commit b45b0d81bb)
listenToPort attempts to gracefully handle and ignore certain errors but does not store errno prior to logging, which in turn calls several libc functions that may overwrite errno.
This was discovered because libmusl strftime() always returns with errno set to EINVAL, which resulted in docker-library/redis#273.
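The pattern of saving errno before logging can be sketched like this. It is an illustration under stated assumptions, not the listenToPort code: logWarning and bindFailureIsIgnorable are hypothetical names, the clobbering of errno is simulated, and EADDRNOTAVAIL is used only as an example of an error one might choose to ignore.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in logger (hypothetical). Real logging code calls libc functions
 * that may overwrite errno; that is simulated here by setting it directly. */
static void logWarning(const char *msg) {
    fprintf(stderr, "# %s\n", msg);
    errno = EINVAL;
}

/* Returns 1 if the failure recorded in errno is one we choose to ignore.
 * errno is copied first, so the logging call cannot disturb the decision. */
static int bindFailureIsIgnorable(void) {
    int saved_errno = errno;
    logWarning("Could not bind address");
    return saved_errno == EADDRNOTAVAIL;
}
```

Had the function tested errno after logging, it would have seen EINVAL and treated an ignorable error as fatal.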
(cherry picked from commit df5f543b65)
Scenario:
1. A module client is blocked on keys with a timeout
2. Shortly before the timeout expires, the key is populated and signaled
as ready
3. Redis calls moduleTryServeClientBlockedOnKey (which replies to client) and
then moduleUnblockClient
4. moduleUnblockClient doesn't really unblock the client, it writes to
server.module_blocked_pipe and only marks the BC as unblocked.
5. beforeSleep kicks in; by this time the client still exists and has technically
timed out. beforeSleep replies to the timed-out client (double reply) and
only then is moduleHandleBlockedClients called, reading from module_blocked_pipe
and calling unblockClient
The solution is similar to what was done in moduleTryServeClientBlockedOnKey: we
should avoid re-processing an already-unblocked client
(cherry picked from commit e58118cda6)
* Remove linux/version.h dependency.
This introduces unnecessary dependencies, and is generally not a good idea
as the platform we build on may be different than the platform we run
on.
To determine if sync_file_range exists we can simply rely on header file
hints.
* Fix setproctitle() on libmusl.
The previous ifdef checks were a bit too strict for no apparent
reason.
* Fix test failures on Linux with no backtrace.
* Add alpine daily CI job.
(cherry picked from commit 95ea74549c)
Also adds a new daily CI test, relying on the fact that we don't use malloc_size() on alpine libmusl.
Fixes #8531
(cherry picked from commit dd885780d6)
On 32-bit systems, setting the proto-max-bulk-len config parameter to a high value may result in an integer overflow and a subsequent heap overflow when parsing an input bulk (CVE-2021-21309).
This fix has two parts:
* Set a reasonable limit to the config parameter.
* Add additional checks to prevent the problem in other potential but unknown code paths.
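As a sketch of the kind of additional check meant in the second part: on a 32-bit build size_t is 32 bits wide, so adding even a small overhead to a length close to the maximum wraps around to a tiny value. The helper below is hypothetical (its name and signature are not from the Redis source) and only shows the general guard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *out and returns 1, or returns 0 and
 * leaves *out untouched when the sum would not fit in size_t. */
static int checkedSizeAdd(size_t a, size_t b, size_t *out) {
    if (a > SIZE_MAX - b) return 0;   /* a + b would wrap around */
    *out = a + b;
    return 1;
}
```

A caller would refuse the request when the function returns 0 instead of allocating a buffer from the wrapped size.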
(cherry picked from commit d32f2e9999)
Github started shifting some repositories to use ubuntu 20.04 by default.
tcl8.5 is missing in these, but 8.6 exists in both 20.04 and 18.04.
(cherry picked from commit 018f7b7378)
Without this fix, RM_ZsetRem can leave empty sorted sets which are
not allowed to exist.
Removing from a sorted set while iterating seems to work (while
inserting causes failed assertions). RM_ZsetRangeEndReached is
modified to return 1 if the key doesn't exist, to terminate
iteration when the last element has been removed.
(cherry picked from commit aea6e71ef8)
When a database on a 64 bit build grows past 2^31 keys, the underlying hash table expands to 2^32 buckets. After this point, the algorithms for selecting random elements only return elements from half of the available buckets because they use random() which has a range of 0 to 2^31 - 1. This causes problems for eviction policies which use dictGetSomeKeys or dictGetRandomKey. Over time they cause the hash table to become unbalanced because, while new keys are spread out evenly across all buckets, evictions come from only half of the available buckets. Eventually this half of the table starts to run out of keys and it takes longer and longer to find candidates for eviction. This continues until no more evictions can happen.
This solution addresses this by using a 64 bit PRNG instead of libc random().
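The effect can be illustrated with a small sketch. It assumes a table of 2^32 buckets (size mask 0xFFFFFFFF) and uses splitmix64 purely as a compact stand-in for "a 64 bit PRNG"; it is not claimed to be the generator the fix adopted, and the function names are hypothetical.

```c
#include <stdint.h>

/* splitmix64: a small, well-known 64-bit generator, used here only as a
 * stand-in for "some 64-bit PRNG". */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Bucket choice from a 31-bit source such as libc random(): the result is
 * always below 2^31, so with 2^32 buckets the upper half is never picked. */
static uint64_t bucketFrom31Bits(uint32_t r, uint64_t sizemask) {
    return (uint64_t)(r & 0x7FFFFFFFu) & sizemask;
}

/* Bucket choice from a 64-bit source: every bit of sizemask can be set. */
static uint64_t bucketFrom64Bits(uint64_t *state, uint64_t sizemask) {
    return splitmix64(state) & sizemask;
}
```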
Co-authored-by: Greg Femec <gfemec@google.com>
(cherry picked from commit 266949c7fc)
When redis responds with a tracking-redir-broken push message (RESP3),
it was responding with a broken protocol: it declared an array of 3 elements
but only pushed 2 elements.
(cherry picked from commit f687ac0c32)
Fix the pointers to the slot hash tags in the case of prefixed commands usage
i.e. AUTH / SELECT
It adjusts the pointers to the slot hash tags in the case of prefixed commands
usage as soon as we get the 1st reply (the same as we already did for the random
strings within the command).
(cherry picked from commit efccd6353b)
At least in one case the arm64 cow kernel bug test triggers an assert, which is a problem because it cannot be ignored like cases where the bug is found.
On older systems (Linux <4.5) madvise fails because MADV_FREE is not supported. We treat these failures as an indication the system is not affected.
Fixes #8351, #8406
(cherry picked from commit 3a5049042a)
Fixes a regression introduced due to a new (safer) way of rewriting configuration files. In the past the file was simply overwritten (same inode), but now Redis creates a new temporary file and later renames it over the old one.
The temp file typically gets created with 0600 permissions so we later fchmod it to fix that. Unlike open with O_CREAT, fchmod doesn't consider umask so we have to do that explicitly.
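A minimal sketch of applying the umask by hand before fchmod follows. The helper names are hypothetical, and the requested mode is left to the caller; this is an illustration of the technique rather than the code of the fix.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Reads the process umask. umask() can only be queried by setting it, so a
 * temporary value is set and the original restored immediately. */
static mode_t currentUmask(void) {
    mode_t mask = umask(0);
    umask(mask);
    return mask;
}

/* Hypothetical helper: sets the requested permission bits on fd, minus the
 * bits the umask forbids, mimicking what open() with O_CREAT would do.
 * Returns the result of fchmod(). */
static int fchmodWithUmask(int fd, mode_t requested) {
    return fchmod(fd, requested & ~currentUmask());
}
```

Since the set-and-restore in currentUmask is not atomic, a multi-threaded program may prefer to read the umask once at startup and cache it.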
Fixes #8369
(cherry picked from commit b548ffabbe)
The output of the COMMAND command was wrong for some commands.
Clients can use firstkey, lastkey, step to find (some) key name arguments, and the
"movablekeys" flag to know that they can't know all (or any) of the key name arguments.
These commands had the wrong output:
1. GEORADIUS*_RO used to have "movablekeys" (which it doesn't really need)
2. XREAD and XREADGROUP used to have (1,1,1), but that's completely wrong.
3. Z*STORE used to have (0,0,0) but it can at least give the index of the dstkey (1,1,1)
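How a client might use the (firstkey, lastkey, step) triple can be sketched as below. This is an illustrative helper, not Redis source: the name keyIndexes and its signature are hypothetical, and it assumes the convention that a negative lastkey counts from the end of the argument list and that a firstkey of 0 means no statically known keys.

```c
/* Hypothetical helper: fills out[] with the argv positions that hold key
 * names, based on the firstkey/lastkey/step triple reported by COMMAND.
 * Returns the number of positions written (at most max). */
static int keyIndexes(int argc, int first, int last, int step,
                      int *out, int max) {
    int count = 0;
    if (first == 0 || step <= 0) return 0;   /* no statically known keys */
    if (last < 0) last = argc + last;        /* negative: relative to the end */
    for (int j = first; j <= last && j < argc && count < max; j += step)
        out[count++] = j;
    return count;
}
```

For a five-argument call such as MSET k1 v1 k2 v2 with the triple (1,-1,2), the helper reports positions 1 and 3.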
(cherry picked from commit baf92f3f1c)
clientsArePaused isn't thread safe because it has a side effect of attempting to unpause,
which may cause multiple threads to concurrently update the unblocked_clients global list.
This change resolves this issue by no longer postponing clients for threaded reads when
clients are paused, and then skipping the check for client paused for threaded reads,
in case one is postponed and then clients are paused. (I don't think this is strictly possible,
but being defensive seems better here)
c4fdf09c0 added a test that now fails with valgrind.
It fails for two reasons:
1) the test samples the used memory and then limits the maxmemory to
that value, but it turns out this is not atomic and on slow machines
the background cron process that cleans out old query buffers reduces
the memory so that the setting doesn't cause eviction.
2) the dbsize was tested late, after reading some invalidation messages;
by that time more and more keys got evicted, partially draining the
db. this is not the focus of this fix (still a known limitation)
(cherry picked from commit a102b21d17)
The test was trying to wait for the replica to start loading the rdb
from the master before it kills the master, but it was actually waiting
for ROLE to be in "sync" mode, which corresponds to REPL_STATE_TRANSFER
that starts before the actual loading starts.
Now it waits for the loading flag to be set instead.
Besides, the test was dependent on the previous configuration of the
servers, relying on the fact the replica is configured to persist
(either RDB or AOF); now it is set explicitly.
(cherry picked from commit 264953871b)
This allows using: RedisModule_CreateStringPrintf(ctx, "%s %c %s", "string1", 0, "string2");
On large strings, the previous code would incrementally retry, doubling the output buffer each time.
Now it uses the return value of snprintf and grows to the right size in one step,
and also avoids an excessive strlen in sdscat at the end.
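The idea of sizing the buffer from the return value of snprintf can be sketched with plain libc. This is a simplified variation, not the sds code: the hypothetical formatAlloc below measures with a NULL buffer first rather than trying a fixed buffer and growing, and it returns the length so no strlen is needed and embedded NUL bytes survive.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a freshly allocated buffer sized from
 * vsnprintf's return value. On success the formatted length (excluding the
 * terminator) is stored in *len if len is non-NULL. Returns NULL on failure.
 * The caller frees the result. */
static char *formatAlloc(size_t *len, const char *fmt, ...) {
    va_list ap, cpy;
    va_start(ap, fmt);
    va_copy(cpy, ap);
    int needed = vsnprintf(NULL, 0, fmt, cpy);   /* measure only */
    va_end(cpy);
    if (needed < 0) {
        va_end(ap);
        return NULL;
    }
    char *buf = malloc((size_t)needed + 1);
    if (buf != NULL) {
        vsnprintf(buf, (size_t)needed + 1, fmt, ap);   /* exact fit, one step */
        if (len != NULL) *len = (size_t)needed;
    }
    va_end(ap);
    return buf;
}
```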
(cherry picked from commit f4ca3d8757)
The bug occurs when 'callback' re-registers itself to a point
in the future and the execution time is non-negligible:
'now' refers to the time BEFORE the callback was executed and is used
to calculate 'next_period'.
We must get the actual current time when calculating 'next_period'.
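A simplified sketch of the corrected ordering is shown below. Everything in it is hypothetical (a simulated millisecond clock, a callback that takes 30 ms and re-registers itself 100 ms ahead); it only illustrates sampling the clock after the callback has run.

```c
#include <stdint.h>

/* Simulated millisecond clock (hypothetical), advanced by the callback. */
static uint64_t fake_clock_ms = 0;
static uint64_t nowMs(void) { return fake_clock_ms; }

/* Example callback: takes 30 ms to run, then re-registers itself 100 ms
 * into the future and returns the new absolute expiry time. */
static uint64_t slowCallback(void) {
    fake_clock_ms += 30;
    return nowMs() + 100;
}

/* Runs the callback and returns how long to wait for the re-registered
 * timer. The clock is read AFTER the callback, so its run time is not
 * counted as part of the next period. */
static uint64_t runAndComputeNextPeriod(uint64_t (*cb)(void)) {
    uint64_t expire_at = cb();
    uint64_t now = nowMs();
    return expire_at > now ? expire_at - now : 0;
}
```

With a 'now' sampled before the callback, the same scenario would yield 130 ms instead of 100 ms.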
(cherry picked from commit 1a91a2700b)
Turns out this was broken since version 4.0 when we added sds size
classes.
The cluster code uses sds for the receive buffer, and then casts it to a
struct and accesses a 64 bit variable.
This commit replaces the use of sds with a simple reallocated buffer.
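The alignment concern can be illustrated with a short sketch. It assumes a small, odd-sized header placed before the payload (3 bytes here, purely as an illustration), which shifts the payload off the alignment that a plain malloc or realloc result has; the helper name is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is suitably aligned for a 64-bit integer access. */
static int alignedFor64(const void *p) {
    return ((uintptr_t)p % _Alignof(uint64_t)) == 0;
}
```

A pointer returned by malloc passes this check, while the same pointer advanced by 3 bytes does not, which is why casting such a payload to a struct with a 64 bit field is unsafe on platforms that require aligned access.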
(cherry picked from commit 380f6048e0)
When client tracking is enabled, signalModifiedKey can increase memory usage.
This can cause the loop in performEvictions to keep running, since it was measuring
the memory usage impact of signalModifiedKey.
The section that measures the memory impact of the eviction should cover just dbDelete,
excluding keyspace notification, client tracking, and propagation to AOF and replicas.
This resolves part of the problem described in #8069
p.s. fix took 1 minute, test took about 3 hours to write.
(cherry picked from commit c4fdf09c05)
This PR not only fixes the problem that swapdb does not make the
transaction fail, but also optimizes the FLUSHALL and FLUSHDB command to
set the CLIENT_DIRTY_CAS flag to avoid unnecessary traversal of clients.
FLUSHDB was changed to first iterate on all watched keys, and then on the
clients watching each key, instead of iterating through all clients and,
for each one, iterating on its watched keys.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 10f94b0ab1)
Turns out that when the fork child crashes, the crash log was deleting
the pidfile from the disk (although the parent is still running).
Now we set the pidfile of the fork process to NULL so the fork process
will never delete it.
(cherry picked from commit 92a483bca2)
instead of asking for the extra new space it wanted, it asked to grow the
string by the size it already has too.
i.e. a string of 1000 bytes, needing to grow by 10 bytes, would have been
asking for an **additional** 1010 bytes.
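The arithmetic can be spelled out in a tiny sketch. The helper name is hypothetical; it only shows that the amount to request is the difference between the needed and current lengths, not the needed length itself.

```c
#include <stddef.h>

/* Hypothetical helper: how many additional bytes to request so that a buffer
 * currently holding cur_len bytes can hold needed_len bytes. */
static size_t additionalBytes(size_t cur_len, size_t needed_len) {
    return needed_len > cur_len ? needed_len - cur_len : 0;
}
```

For the 1000-byte string above, this gives 10 rather than 1010.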
(cherry picked from commit 595ecd5f4b)
This is a recent problem, introduced by 7471743 (redis 6.0)
The implications are:
The sole difference between LookupKeyRead and LookupKeyWrite is for commands
executed on a replica which are not received from its master client (for the master,
and for the master client on the replica, these two functions behave the same)!
Since SORT is a write command, this bug only affects a writable-replica.
And these are its implications:
- SORT STORE will behave as it did before the above mentioned commit (like before
redis 6.0). On a writable-replica an already logically expired key would have
appeared missing. (store dest key would be deleted, instead of being populated
with the data from the already logically expired key)
- SORT (the non store variant, which in theory could have been executed on
read-only-replica if it weren't for the write flag), will (in redis 6.0) have a new bug
and return the data from the already logically expired key instead of empty response.
(cherry picked from commit 08ad6abd04)
Turns out the RDB checksum in Redis 6.0 on bigendian is broken.
It always returned 0, so the RDB files are generated as if checksum is
disabled, and will be loaded ok on littleendian, and on bigendian.
But it'll not be able to load RDB files generated on littleendian or older versions.
Similarly DUMP and RESTORE will work on the same version (0==0),
but will be unable to exchange dump payloads with littleendian or old versions.
(cherry picked from commit 324070c8f6)
getRDB is "designed" to work in two modes: one for redis-cli --rdb and
one for redis-cli --cluster backup.
In the latter case it uses the hiredis connection from the cluster nodes
and it used to free it without nullifying the context, so a later
attempt to free the context would crash.
I suppose the reason it seems to want to free the hiredis context ASAP
is that it wants to disconnect the replica link, so that replication
buffers will not be accumulated.
(cherry picked from commit 41b2ed2bbc)
When a Lua script returns a map to redis (a feature which was added in
redis 6 together with RESP3), it would have returned the value first and
the key second.
If the client was using RESP2, it was getting them out of order, and if
the client was in RESP3, it was getting a map of value => key.
This was happening regardless of the Lua script using redis.setresp(3)
or not.
This also affects a case where the script was returning a map which it got
from redis by doing something like: redis.setresp(3); return redis.call()
This fix is a breaking change for redis 6.0 users who happened to rely
on the wrong order (either ones that used redis.setresp(3), or ones that
returned a map explicitly).
This commit also includes two other changes in the tests:
1. The test suite now handles RESP3 maps as dicts rather than nested
lists
2. Remove some redundant (duplicate) tests from tracking.tcl
(cherry picked from commit 2017407b4d)