Fix a few issues with the recent #11463
* use exitFromChild instead of exit
* the test should ignore defunct processes since that's what we expect to
happen for these child processes when the parent dies.
* fix typo
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 4c54528f0f)
During a diskless sync, if the master main process crashes, the child would
have hung in `write`. This fix closes the read fd on the child side, so that if the
parent crashes, the child will get a write error and exit.
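A minimal sketch of the idea (illustrative names, not the actual Redis code): the child drops its
inherited read end of the pipe right after fork(), so once the parent (the only remaining reader)
dies, write() returns an error instead of blocking forever on a full pipe.
```c
#include <signal.h>
#include <unistd.h>

/* Called in the child right after fork(). pipefds[0] is the read end the
 * parent uses, pipefds[1] is the write end the child streams the RDB into. */
int child_pipe_setup(int pipefds[2]) {
    close(pipefds[0]);           /* drop the inherited read end: the parent is the only reader left */
    signal(SIGPIPE, SIG_IGN);    /* write() then returns -1/EPIPE when the parent dies */
    return pipefds[1];           /* keep only the write end */
}
```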
This change also fixes disk-based replication, BGSAVE and AOFRW.
In that case the child wouldn't have hung; it would have just kept
running until done, which may be pointless.
There is a certain degree of risk here. In case there's a BGSAVE child that could
maybe succeed and the parent dies for some reason, the old code would have let
the child keep running, possibly succeed, and avoid data loss.
On the other hand, if the parent is restarted, it would have loaded an old rdb file
(or none), and then the child could reach the end and rename the rdb file (data
conflicting with what the parent has), or also have a race with another BGSAVE
child that the new parent started.
Note that I removed a comment saying a write error will be ignored in the child
and handled by the parent (this comment was very old and I don't think it's relevant).
(cherry picked from commit ccaef5c923)
The sentinel function sentinelAddrEqualsHostname() performs a DNS resolve
and based on it determines whether two IP addresses are equal. If the
DNS resolve fails, the function simply returns 0, even if the
hostnames are identical.
This might become an issue in case of failover: sentinel might receive,
from a Redis instance, the response to a regular INFO query it sent, and
because of this function wrongly decide that the instance points to a
different leader than the one recorded, even though the hostnames are
identical. In turn sentinel disconnects the connection between the sentinel
and a valid slave, which leads to -failover-abort-no-good-slave.
See issue #11241.
I managed to reproduce only part of the flow, in which the function
returns a wrong result and triggers +fix-slave-config.
The fix is that even if the function fails to resolve, it compares based on
hostnames. That is our best effort as long as the server is unavailable
for some reason. It is fine since a Redis instance cannot have multiple
hostnames for a given setup.
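A hedged sketch of the fallback (simplified, stand-alone names, not the actual
sentinelAddrEqualsHostname code): resolve the hostname and compare it to the known IP;
if the resolve fails, fall back to comparing the hostname strings directly.
```c
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>
#include <strings.h>

int addr_equals_hostname(const char *addr_ip, const char *addr_hostname,
                         const char *hostname) {
    struct addrinfo hints = {0}, *res = NULL;
    char ip[INET6_ADDRSTRLEN] = {0};

    hints.ai_family = AF_UNSPEC;
    if (getaddrinfo(hostname, NULL, &hints, &res) != 0 || res == NULL)
        return strcasecmp(addr_hostname, hostname) == 0;   /* DNS failed: compare names */

    if (res->ai_family == AF_INET)
        inet_ntop(AF_INET, &((struct sockaddr_in *)res->ai_addr)->sin_addr,
                  ip, sizeof(ip));
    else
        inet_ntop(AF_INET6, &((struct sockaddr_in6 *)res->ai_addr)->sin6_addr,
                  ip, sizeof(ip));
    freeaddrinfo(res);
    return strcmp(addr_ip, ip) == 0;                        /* normal path: compare IPs */
}
```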
(cherry picked from commit bd23b15ad7)
In the module, we reuse the list iterator entry for RM_ListDelete, but `listTypeDelete` will only update
`quicklistEntry->zi` and not `quicklistEntry->node`, which will result in `quicklistEntry->node` pointing to
a freed memory address if the quicklist node is deleted.
This PR syncs `key->u.list.index` and `key->u.list.entry` to the list iterator after `RM_ListDelete`.
This PR also optimizes the release code of the original list iterator.
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
(cherry picked from commit 6dd213558b)
When using MIGRATE with a destination Redis that has the user name or password set to the string "keys",
Redis would have determined the wrong set of key names the command is going to access.
This led to ACL returning a wrong authentication result.
Destination instance:
```
127.0.0.1:6380> acl setuser default >keys
OK
127.0.0.1:6380> acl setuser keys on nopass ~* &* +@all
OK
```
Source instance:
```
127.0.0.1:6379> set a 123
OK
127.0.0.1:6379> acl setuser cc on nopass ~a* +@all
OK
127.0.0.1:6379> auth cc 1
OK
127.0.0.1:6379> migrate 127.0.0.1 6380 "" 0 1000 auth keys keys a
(error) NOPERM this user has no permissions to access one of the keys used as arguments
127.0.0.1:6379> migrate 127.0.0.1 6380 "" 0 1000 auth2 keys pswd keys a
(error) NOPERM this user has no permissions to access one of the keys used as arguments
```
Using `acl dryrun` we can see that the parameters of `auth` and `auth2` are mistaken for the `keys` option.
```
127.0.0.1:6379> acl dryrun cc migrate whatever whatever "" 0 1000 auth keys keys a
"This user has no permissions to access the 'keys' key"
127.0.0.1:6379> acl dryrun cc migrate whatever whatever "" 0 1000 auth2 keys pswd keys a
"This user has no permissions to access the 'pswd' key"
```
Fix the bug by editing the migrateGetKeys function in db.c, which finds the `keys` option and all the keys that follow it.
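A hedged sketch of the corrected scan (illustrative, not the actual migrateGetKeys code): when
looking for the KEYS token, skip the values that belong to AUTH (one password) and AUTH2
(username + password), so a password or username equal to "keys" is not mistaken for the option.
```c
#include <strings.h>

/* argv[0..5] are: MIGRATE host port key destination-db timeout */
static int find_keys_token(int argc, char **argv) {
    for (int i = 6; i < argc; i++) {
        if (strcasecmp(argv[i], "auth") == 0) {
            i += 1;                       /* skip the password */
        } else if (strcasecmp(argv[i], "auth2") == 0) {
            i += 2;                       /* skip username and password */
        } else if (strcasecmp(argv[i], "keys") == 0) {
            return i + 1;                 /* the key names start right after */
        }
    }
    return -1;                            /* single-key form, no KEYS token */
}
```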
(cherry picked from commit 9ab873d9d3)
1. show the overcommit warning when overcommit is disabled (2),
not just when it is set to heuristic (0).
2. improve warning text to mention the issue with jemalloc causing VM
mapping fragmentation when set to 2.
(cherry picked from commit dd60c6c8d3)
As mentioned in the docs, `RM_ResetDataset` performs an operation similar to FLUSHALL.
Since FLUSHALL does not clean functions, `RM_ResetDataset` should not clean them
either.
(cherry picked from commit d2ad01ab3e)
If Redis crashes due to calling an invalid function pointer,
the `backtrace` function will try to dereference this invalid pointer
which will cause a crash inside the crash report and will kill
the process without producing the complete crash report information.
Example:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
198672:M 19 Sep 2022 18:06:12.936 # Redis 255.255.255 crashed by signal: 11, si_code: 1
198672:M 19 Sep 2022 18:06:12.936 # Accessing address: 0x1
198672:M 19 Sep 2022 18:06:12.936 # Crashed running the instruction at: 0x1
// here the process is crashing
```
This PR tries to fix this crash by:
1. Identifying the issue when it happens.
2. Replacing the invalid pointer with a pointer to a dummy function
so that `backtrace` will not crash.
The identification is done by comparing `eip` to `info->si_addr`; if they
are the same, we know that the crash happened on the same address it tried to
access, and we can conclude that it tried to call an invalid function pointer.
To replace the invalid pointer we introduce a new function, `setMcontextEip`,
which is very similar to `getMcontextEip` and knows how to set the EIP for the
different supported OSs. After printing the trace we restore the old EIP value.
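A hedged sketch of the detection and substitution (Linux x86_64 only, illustrative; the real code
goes through the getMcontextEip/setMcontextEip helpers for each supported OS). The unwinder used
by backtrace() reads the interrupted instruction pointer from the same signal frame `uc` points at,
so swapping it for a dummy function keeps the walk away from the bad address.
```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>
#include <unistd.h>

static void invalid_function_was_called(void) {}

static void log_backtrace(ucontext_t *uc, siginfo_t *info) {
    void *frames[100];
    greg_t saved_rip = uc->uc_mcontext.gregs[REG_RIP];
    int bad_call = ((void *)(uintptr_t)saved_rip == info->si_addr);

    if (bad_call)   /* crash was a call through an invalid function pointer */
        uc->uc_mcontext.gregs[REG_RIP] =
            (greg_t)(uintptr_t)invalid_function_was_called;

    int n = backtrace(frames, 100);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    if (bad_call)
        uc->uc_mcontext.gregs[REG_RIP] = saved_rip;   /* restore the original RIP */
}
```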
(cherry picked from commit 0bf90d9443)
The original idea behind auto-setting the default (first,last,step) spec was to use
the most "open" flags when the user didn't provide any key-spec flags information.
While the above idea is a good approach, it really makes no sense to set
CMD_KEY_VARIABLE_FLAGS if the user didn't provide the getkeys-api flag:
in this case there's no way to retrieve these variable flags, so what's the point?
Internally in redis there was code to ignore this already, so this fix doesn't change
redis's behavior, it only affects the output of COMMAND command.
(cherry picked from commit 3330ea1864)
SunOS/Solaris and its relatives don't support the flock() function.
While "redis" has been excluding setting up the lock using flock() on the cluster
configuration file when compiling under Solaris, it was still using flock() in the
unlock call while shutting down.
This pull request eliminates the flock() call also in the unlocking stage
for Oracle Solaris and its relatives.
Fix compilation regression from #10912
(cherry picked from commit 6aab4cb736)
* Fix CLUSTER SHARDS showing empty hostname
In #10290, we changed the clusterNode hostname from `char*`
to `sds`, and the old `node->hostname` check was changed to
`sdslen(node->hostname) != 0`.
But in `addNodeDetailsToShardReply` this check is missing.
As a result, the CLUSTER SHARDS command returns an empty string
hostname when it is unavailable.
Like this (note that we listed it as optional in the doc):
```
9) "hostname"
10) ""
```
(cherry picked from commit 1de675b3d5)
Adds a number of user management/ACL validation/command execution functions to improve a
Redis module's ability to enforce ACLs correctly and easily.
* RM_SetContextUser - sets a RedisModuleUser on the context, which RM_Call will use to both
validate ACLs (if requested and set) as well as assign to the client so that scripts executed via
RM_Call will have proper ACL validation.
* RM_SetModuleUserACLString - Enables one to pass an entire ACL string, not just a single OP,
and have it applied to the user
* RM_GetModuleUserACLString - returns a stringified version of the user's ACL (same format as dump
and list). Contains an optimization to cache the stringified version until the underlying ACL is modified.
* Slightly re-purpose the "C" flag to RM_Call from just being about ACL check before calling the
command, to actually running the command with the right user, so that it also affects commands
inside EVAL scripts. see #11231
(cherry picked from commit 6e993a5dfa)
The bug is that the server keeps on sending newlines to the client.
As a result, the receiver might not find the EOF marker since it searches
for it only at the end of each payload it reads from the socket.
The bug only affects `CLIENT_REPL_RDBONLY`.
This affects `redis-cli --rdb` (depending on timing).
The fix consists of two steps:
1. The `CLIENT_REPL_RDBONLY` client should be closed ASAP (we cannot
always call `freeClient`, so we use `freeClientAsync`)
2. Add new replication state `SLAVE_STATE_RDB_TRANSMITTED`
(cherry picked from commit e53bf65245)
The previous implementation called `snprintf` twice, with the second call effectively
used to 'memcpy' the output of the first, which could be a very large string.
The new implementation reserves space for the protocol header ahead
of the formatted double, and then prepends the string length ahead of it.
Measured improvement of simple ZADD of some 25%.
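A minimal sketch of the approach (illustrative, not the actual reply code): format the double with
room reserved in front for the RESP bulk header, then write the header immediately before the
digits, avoiding a second large copy of the formatted string. The caller is assumed to pass a
buffer comfortably larger than the reserved header plus the formatted number.
```c
#include <stdio.h>
#include <string.h>

/* Returns a pointer into 'buf' where the reply starts; *len gets its length. */
char *bulk_double(char *buf, size_t buflen, double d, size_t *len) {
    const size_t reserve = 32;                          /* room for "$<len>\r\n" */
    char *payload = buf + reserve;
    int dlen = snprintf(payload, buflen - reserve, "%.17g", d);
    char hdr[32];
    int hlen = snprintf(hdr, sizeof(hdr), "$%d\r\n", dlen);
    memcpy(payload - hlen, hdr, hlen);                  /* prepend the header in place */
    memcpy(payload + dlen, "\r\n", 2);                  /* trailing CRLF */
    *len = (size_t)hlen + (size_t)dlen + 2;
    return payload - hlen;
}
```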
(cherry picked from commit 90223759a3)
Executing an XAUTOCLAIM command on a stream key in a specific state, with a
specially crafted COUNT argument may cause an integer overflow, a subsequent
heap overflow, and potentially lead to remote code execution.
The problem affects Redis versions 7.0.0 or newer.
When RM_Call was used with `M` (reject OOM), `W` (reject writes),
as well as `S` (rejecting stale or write commands in "Script mode"),
it would have only checked the command flags, but not the declared
script flag in case it's a command that runs a script.
Refactoring: extracts out similar code in server.c's processCommand
to be usable in RM_Call as well.
(cherry picked from commit bed6d759bc)
When using `INFO ALL <section>`, where `section` is a specific module section,
Redis will not print the additional section(s).
The fix, in this case, is to also search the module info sections when the user provided additional sections to `ALL`.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit eedb8b1724)
This bug was introduced in #10344 (7.0.3), and it breaks the
redis-cli --cluster create usage in #10436 (7.0 RC3).
At the same time, the cluster-port support introduced in #10344
cannot use the DNS lookup brought by #10436.
(cherry picked from commit c2b0c13d5c)
This PR mainly deals with 2 crashes introduced in #9357,
and fix the QUICKLIST-PACKED-THRESHOLD mess in external test mode.
1. Fix crash due to deleting an entry from a compressed quicklistNode
When inserting a large element, we need to create a new quicklistNode first
and then delete its previous element; if the node where the deleted element is
located is compressed, this will cause a crash.
Now we add `dont_compress` to quicklistNode; if we want to use a quicklistNode
after some operation, we can use this flag as follows:
```c
node->dont_compress = 1; /* Prevent this node from being compressed */
some_operation(node); /* This operation might try to compress this node */
some_other_operation(node); /* We can use this node without decompressing it */
node->dont_compress = 0; /* Re-enable compression */
quicklistCompressNode(node);
```
Perhaps in the future, we could just disable the current entry from being
compressed during the iterator loop, but that would require more work.
2. Fix crash due to a wrong quicklist split
Before #9357, the offset param of _quicklistSplitNode() could not be negative.
Now, when the offset is negative, the split extent will be wrong.
For example:
```c
int orig_start = after ? offset + 1 : 0;
int orig_extent = after ? -1 : offset;
int new_start = after ? 0 : offset;
int new_extent = after ? offset + 1 : -1;
/* offset: -2, after: 1, node->count: 2
 * current wrong range: [-1,-1] [0,-1]
 * correct range: [1,-1] [0, 1] */
```
Because only `_quicklistInsert()` splits the quicklistNode, and only
`quicklistInsertAfter()` and `quicklistInsertBefore()` call `_quicklistInsert()`,
only `quicklistReplaceEntry()` and `listTypeInsert()` might hit this crash.
But the iterator of `listTypeInsert()` always goes from head to tail (iter->offset is
always positive), so it is not affected.
The final conclusion is that this crash only occurs when we insert a large element
with a negative index into a list, which affects the `LSET` command and the `RM_ListSet`
module API.
3. In external test mode, we need to restore the quicklist packed threshold at the
end of the test.
4. Show `node->count` in quicklistRepr().
5. Add new tcl proc `config_get_set` to support restoring config in tests.
(cherry picked from commit 13d25dd95e)
This bug was introduced in #7653 (Redis 6.2.0).
When `server.maxmemory_eviction_tenacity` is 100, `eviction_time_limit_us` is
`ULONG_MAX`, and if we cannot find the best key to delete (e.g. maxmemory-policy
is `volatile-lru` and all keys with a TTL have been evicted), then in `cant_free` Redis
will sleep forever if some items are being freed in the lazyfree thread.
(cherry picked from commit 464aa04188)
EVAL scripts are by default not considered `write` commands, so they were allowed on a replica.
But when adding a shebang, they become `write` commands (unless the `no-writes` flag is added).
With this change we'll handle them as write commands, and reply with MOVED instead of
READONLY when executed on a redis cluster replica.
Co-authored-by: chendianqiang <chendianqiang@meituan.com>
(cherry picked from commit e42d98ed27)
TLDR: the CLUSTER command originally had the `random` flag,
so all the sub-commands initially got that new flag, but in fact many
of them don't need it.
The only effect of this change is on the output of COMMAND INFO.
(cherry picked from commit a7da7473cb)
Redis 7.0 has #9890 which added an assertion when the propagation queue
was not flushed and we got to beforeSleep.
But it turns out that when processCommand calls getNodeByQuery and
decides to reject the command, it can lead to a key being lazy-expired
and deleted without the propagation queue later being flushed.
This change prevents lazy expiry from deleting the key at this stage
(not as part of a command being processed in `call`).
(cherry picked from commit c789fb0aa7)
Currently, we call zfree(cmd->args), but the argument array
needs to be freed recursively (there might be sub-args).
Also fixed memory leaks on cmd->tips and cmd->history.
Fixes #11145
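A minimal sketch of the recursive free (illustrative structures, not the actual Redis types,
which use zmalloc/zfree): each argument may itself hold sub-args, so a flat free of the array
leaks the nested allocations.
```c
#include <stdlib.h>

/* Illustrative structures only; the real command-arg types differ. */
typedef struct cmdArg {
    char *name;
    struct cmdArg *subargs;    /* array of sub-arguments, or NULL */
    int numsubargs;
} cmdArg;

static void freeCmdArgs(cmdArg *args, int numargs) {
    if (args == NULL) return;
    for (int j = 0; j < numargs; j++) {
        free(args[j].name);
        freeCmdArgs(args[j].subargs, args[j].numsubargs);  /* recurse into sub-args */
    }
    free(args);                /* only now free the array itself */
}
```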
(cherry picked from commit fc3956e8f4)
`bitfield` with `get` may not be readonly (it may also contain writes), yet it was treated as readonly:
```
127.0.0.1:6384> acl setuser hello on nopass %R~* +@all
OK
127.0.0.1:6384> auth hello 1
OK
127.0.0.1:6384> bitfield hello set i8 0 1
(error) NOPERM this user has no permissions to access one of the keys used as arguments
127.0.0.1:6384> bitfield hello set i8 0 1 get i8 0
1) (integer) 0
2) (integer) 1
```
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit ec5034a2e3)
The docs state that there is a new and an old argument format.
The current state of the arguments allows mixing the old and new format,
thus the need for two additional oneof blocks:
one for differentiating the new format from the old, and one to
allow setting multiple filters using the new format.
(cherry picked from commit 4ce3fd51b9)
When FLUSHDB / FLUSHALL / SWAPDB is inside MULTI / EXEC, the
client-side tracking invalidation message was interleaved with the transaction response.
(cherry picked from commit 6f0a27e38e)
Fix #11030: use lua_rawget to avoid triggering metatables.
#11030 shows how returning `_G` from a Lua script (either a function or eval) causes the
Lua interpreter to panic and the Redis process to exit with error code 1.
Though returning `_G` only panics on Redis 7 and 6.2.7, the underlying issue exists on older
versions as well (6.0 and 6.2). The underlying issue is returning a table with a metatable
such that the metatable raises an error.
The following example demonstrates the issue:
```
127.0.0.1:6379> eval "local a = {}; setmetatable(a,{__index=function() foo() end}) return a" 0
Error: Server closed the connection
```
```
PANIC: unprotected error in call to Lua API (user_script:1: Script attempted to access nonexistent global variable 'foo')
```
The Lua panic happened because when returning the result to the client, Redis needs to
introspect the returned table and transform it into a RESP reply. In order to scan the table,
Redis uses the `lua_gettable` API, which might trigger the metatable (if it exists) and might raise an error.
This code is not running inside `pcall` (a Lua protected call), so raising an error causes
Lua to panic and exit. Notice that this is not a crash, it's a Lua panic that exits with error code 1.
Returning `_G` panics on Redis 7 and 6.2.7 because on those versions `_G` has a metatable
that raises an error when trying to fetch a non-existing key.
### Solution
Instead of using `lua_gettable`, which might raise an error and cause the issue, use `lua_rawget`,
which simply returns the value from the table without triggering any metatable logic.
It is guaranteed not to raise an error.
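A minimal sketch of the difference (not the actual reply-conversion code): `lua_gettable` may
invoke an `__index` metamethod and thus raise an error outside of any `pcall`, while `lua_rawget`
reads the table slot directly and never triggers metamethods. `tbl_idx` is assumed to be a
positive (absolute) stack index.
```c
#include <lua.h>

static void raw_get_field(lua_State *L, int tbl_idx, const char *key) {
    lua_pushstring(L, key);      /* push the key */
    lua_rawget(L, tbl_idx);      /* pops the key, pushes t[key] without metamethods */
    /* the value (or nil) is now on top of the stack */
}
```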
The downside of this solution is that it might be considered a breaking change if someone
relies on metatables in the returned value. An alternative solution is to wrap this entire logic
with `pcall` (a Lua protected call), but this alternative requires a much bigger refactoring.
### Back Porting
The same fix will work on older versions as well (6.2, 6.0). Notice that on those versions,
the issue can cause Redis to crash if inside the metatable logic there is an attempt to access
Redis (`redis.call`). On 7.0, there is no crash and the `redis.call` is executed as if it was done
from inside the script itself.
### Tests
Tests were added to verify the fix.
(cherry picked from commit 020e046b42)
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.
Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).
When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.
Adding an argument to bioCreateCloseJob, a `need_fsync` flag, to
indicate that an fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.
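A minimal sketch of the background job (illustrative names, not the actual bio.c code): the close
job optionally fsyncs the old AOF before closing it, so the main thread never pays the disk
latency under the everysec policy.
```c
#include <unistd.h>

struct close_job {
    int fd;
    int need_fsync;      /* set when the file may still have unsynced writes */
};

static void run_close_job(struct close_job *job) {
    if (job->need_fsync)
        fsync(job->fd);  /* flush the old AOF to disk from the background thread */
    close(job->fd);
}
```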
As a cleanup, the job arguments become an actual union, since the free_* args and
the fd / fsync args are never used together.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 03fff10ab4)
RM_SetAbsExpire and RM_GetAbsExpire were not actually operational since
they were introduced, due to an omission in API registration.
(cherry picked from commit 39d216a326)
These are missing from the RO_ commands but present in the other ones.
Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com>
(cherry picked from commit 56828bab59)
According to the Redis functions documentation, the FCALL command format is:
FCALL function_name numberOfKeys [key1 key2 key3 ...] [arg1 arg2 arg3 ...]
So in the JSON files of fcall and fcall_ro, we should mark the key and arg parts as optional,
just like EVAL.
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 2d3240f31b)
The temporary array for the deleted-entries reply of XAUTOCLAIM was
insufficient, but also, in fact, the COUNT argument should be used to
control the size of the reply, so instead of terminating the loop by
counting only the claimed entries, we'll count deleted entries as well.
Fix #10968
Addresses CVE-2022-31144
(cherry picked from commit 2825b6057b)
If we do not use jemalloc (mostly with valgrind) and use an old compiler that does not support C11,
we will get a compilation error.
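A hedged sketch of the kind of conditional fallback involved (hypothetical names, not the actual
atomicvar.h / zmalloc.c code): use C11 atomics only when the compiler supports them, otherwise
protect the counter with a mutex.
```c
#include <pthread.h>
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
static _Atomic size_t used_memory = 0;
static void update_used_memory(size_t n) {
    atomic_fetch_add(&used_memory, n);          /* lock-free C11 path */
}
#else
static size_t used_memory = 0;
static pthread_mutex_t used_memory_mutex = PTHREAD_MUTEX_INITIALIZER;
static void update_used_memory(size_t n) {
    pthread_mutex_lock(&used_memory_mutex);     /* fallback for old compilers */
    used_memory += n;
    pthread_mutex_unlock(&used_memory_mutex);
}
#endif
```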
Co-authored-by: Valentino Geron <valentino@redis.com>
(cherry picked from commit 82b8203555)