redis

Commit Graph

Author	SHA1	Message	Date
Vitah Lin	e7cd611be1	Fix tls port update not reflected in CLUSTER SLOTS (#13966 ) ### Problem A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP port issue in CLUSTER SLOTS, but it seems the handling of the TLS port was overlooked. There is this comment in the `addNodeToNodeReply` function in the `cluster.c` file: ```c /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */ addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo())); addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN); ``` ### Fixed This PR fixes the TLS port issue and adds relevant tests.	2025-05-27 15:39:09 +03:00
nesty92	89aee9556d	Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958 ) This PR fix the lag calculation by ensuring that when consumer group's last_id is behind the first entry, the consumer group's entries read is considered invalid and recalculated from the start of the stream Supplement to PR #13473 Close #13957 Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>	2025-05-27 15:39:09 +03:00
Stav-Levi	50e91ca7db	Fix port update not reflected in CLUSTER SLOTS (#13932 ) Close https://github.com/redis/redis/issues/13892 config set port cmd updates server.port. cluster slot retrieves information about cluster slots and their associated nodes. the fix updates this info when config set port cmd is done, so cluster slots cmd returns the right value.	2025-05-27 15:39:09 +03:00
YaacovHazan	42fb340ce4	Limiting output buffer for unauthenticated client (CVE-2025-21605) For unauthenticated clients the output buffer is limited to prevent them from abusing it by not reading the replies	2025-04-23 08:09:40 +00:00
Benson-li	a26774cee1	Fix potential infinite loop of RANDOMKEY during client pause (#13863 ) The bug mentioned in this [#13862](https://github.com/redis/redis/issues/13862) has been fixed. --------- Signed-off-by: li-benson <1260437731@qq.com> Signed-off-by: youngmore1024 <youngmore1024@outlook.com> Co-authored-by: youngmore1024 <youngmore1024@outlook.com>	2025-04-22 13:49:12 +03:00
nafraf	6e819e188b	Fix module loadex command crash due to invalid config (#13653 ) Fix to https://github.com/redis/redis/issues/13650 providing an invalid config to a module with datatype crashes when redis tries to unload the module due to the invalid config --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-04-22 13:49:12 +03:00
YaacovHazan	15e212bf69	Fix Read/Write key pattern selector (CVE-2024-51741) The '%' rule must contain one or both of R/W	2025-01-06 16:03:47 +02:00
Yossi Gottlieb	ffde2cf74d	Use cross-platform-actions for FreeBSD support. (#12732 ) This change overcomes many stability issues experienced with the vmactions action. We need to limit VMs to 8GB for better stability, as the 13GB default seems to hang them occasionally. Shell code has been simplified since this action seem to use `bash -e` which will abort on non-zero exit codes anyway.	2025-01-06 16:03:47 +02:00
debing.sun	a904067f6f	Fix incorrect lag due to trimming stream via XTRIM command (#13473 ) ## Describe When using the `XTRIM` command to trim a stream, it does not update the maximal tombstone (`max_deleted_entry_id`). This leads to an issue where the lag calculation incorrectly assumes that there are no tombstones after the consumer group's last_id, resulting in an inaccurate lag. The reason XTRIM doesn't need to update the maximal tombstone is that it always trims from the beginning of the stream. This means that it consistently changes the position of the first entry, leading to the following scenarios: 1) First entry trimmed after maximal tombstone: If the first entry is trimmed to a position after the maximal tombstone, all tombstones will be before the first entry, so they won't affect the consumer group's lag. 2) First entry trimmed before maximal tombstone: If the first entry is trimmed to a position before the maximal tombstone, the maximal tombstone will not be updated. ## Solution Therefore, this PR optimizes the lag calculation by ensuring that when both the consumer group's last_id and the maximal tombstone are behind the first entry, the consumer group's lag is always equal to the number of remaining elements in the stream. Supplement to PR https://github.com/redis/redis/pull/13338	2025-01-06 16:03:47 +02:00
debing.sun	3ce29e0b11	Pass extensions to node if extension processing is handled by it (#13465 ) This PR is based on the commits from PR https://github.com/valkey-io/valkey/pull/52. Ref: https://github.com/redis/redis/pull/12760 Close https://github.com/redis/redis/issues/13401 This PR will replace https://github.com/redis/redis/pull/13449 Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>	2025-01-06 16:03:47 +02:00
debing.sun	0e06e67a36	Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group (#13338 ) Fix #13337 Ths PR fixes fixed two bugs that caused lag calculation errors. 1. When the latest tombstone is before the first entry, the tombstone may stil be after the last id of consume group. 2. When a tombstone is after the last id of consume group, the group's counter will be invalid, we should caculate the entries_read by using estimates.	2025-01-06 16:03:47 +02:00
Oran Agra	045617e10a	Fix possible crash due to OOM panic on invalid command (#13380 ) getKeysUsingKeySpece had the range check AFTER the allocation, of the keys buffer, which could lead to an OOM panic when invalid arguments are provided, leading to an overflow. The allocated memory is only used after the range check, so there's no risk of buffer overrun. The OOM panic can happen on 32bit builds, or 64 builds running on systems with less than 4GB of RAM, and is reachable via the COMMAND GETKEYSANDFLAGS, and ACL key name validation.	2025-01-06 16:03:47 +02:00
debing.sun	4a92d66ca2	Don't keep global replication buffer reference for replicas marked CLIENT_CLOSE_ASAP (#13363 ) In certain situations, we might generate a large number of propagates (e.g., multi/exec, Lua script, or a single command generating tons of propagations) within an event loop. During the process of propagating to a replica, if the replica is disconnected(marked as CLIENT_CLOSE_ASAP) due to exceeding the output buffer limit, we should remove its reference to the global replication buffer to avoid the global replication buffer being unable to be properly trimmed due to being referenced. --------- Co-authored-by: oranagra <oran@redislabs.com>	2025-01-06 16:03:47 +02:00
debing.sun	9f823685d2	Have consistent behavior of SPUBLISH within multi/exec like regular command (#13276 ) This PR is based on the commits from PR #12944. Allow SPUBLISH command within multi/exec on replica Behavior on unstable: ``` 127.0.0.1:6380> CLUSTER NODES 39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected 8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383 127.0.0.1:6380> SPUBLISH hello world (integer) 0 127.0.0.1:6380> MULTI OK 127.0.0.1:6380(TX)> SPUBLISH hello world QUEUED 127.0.0.1:6380(TX)> EXEC (error) MOVED 866 127.0.0.1:6379 ``` With this change: ``` 127.0.0.1:6380> SPUBLISH hello world (integer) 0 127.0.0.1:6380> MULTI OK 127.0.0.1:6380(TX)> SPUBLISH hello world QUEUED 127.0.0.1:6380(TX)> EXEC 1) (integer) 0 ``` --------- Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: oranagra <oran@redislabs.com>	2025-01-06 16:03:47 +02:00
sundb	1936746e63	Fix oom-score-adj test due to no permission (#12887 ) Fix #12792 On ubuntu 23(lunar), non-root users will not be allowed to change the oom_score_adj of a process to a value that is too low. Since terminal's default oom_score_adj is 200, if we run the test on terminal, we won't be able to set the oom_score_adj of the redis process to 9 or 22, which is too low. Reproduction on ubuntu 23(lunar) terminal: ```sh $ cat /proc/`pgrep redis-server`/oom_score_adj 200 $ echo 100 > /proc/`pgrep redis-server`/oom_score_adj # success without error $ echo 99 > /proc/`pgrep redis-server`/oom_score_adj echo: write error: Permission denied ``` As from the output above, we can only set the minimum oom score of redis processes to 100. By modifying the test, make oom_score_adj only increase upwards and not decrease. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
Oran Agra	c8649f8e85	Prevent pattern matching abuse (CVE-2024-31228)	2024-10-02 22:13:33 +03:00
Oran Agra	fe8de4313f	Fix lua bit.tohex (CVE-2024-31449) INT_MIN value must be explicitly checked, and cannot be negated.	2024-10-02 22:13:33 +03:00
debing.sun	2ad2548747	Fixed crashes due to missed slotToKeyInit() and missed expires_cursor reset (#13315 ) this PR fixes two crashes: 1. Fix missing slotToKeyInit() when using `flushdb async` under cluster mode. https://github.com/redis/redis/issues/13205 2. Fix missing expires_cursor reset when stopping active defrag in the middle of defragment. https://github.com/redis/redis/issues/13307 If we stop active defrag in the middle of defragging db->expires, if `expires_cursor` is not reset to 0, the next time we enable active defrag again, defragLaterStep(db, ...) will be entered. However, at this time, `db` has been reset to NULL, which results in crash. The affected code were removed by #11695 and #13058 in usntable, so we just need backport this to 7.2.	2024-06-18 18:02:22 +08:00
Yanqi Lv	464aad9ee7	fix wrong data type conversion in zrangeResultBeginStore (#13148 ) In `beginResultEmission`, -1 means the result length is not known in advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`. Although `dictExpand` won't succeed because the size overflows, I think we'd better to avoid this wrong conversion. This bug can be triggered when the source of `zrangestore` doesn't exist or we use `zrangestore` command with `byscore` or `bylex`. The impact is that dst keys will be converted to use skiplist instead of listpack. (cherry picked from commit `bad33f8738`)	2024-05-19 09:12:35 +03:00
Matthew Douglass	1caaf581f1	Fix conversion of numbers in lua args to redis args (#13115 ) Since lua_Number is not explicitly an integer or a double, we need to make an effort to convert it as an integer when that's possible, since the string could later be used in a context that doesn't support scientific notation (e.g. 1e9 instead of 100000000). Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`, which ever is shorter, this would break if we try to pass a long integer number to a command that takes integer. we'll get an implicit conversion to string in Lua, and then the parsing in getLongLongFromObjectOrReply will fail. ``` > eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0 (nil) > eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0 (error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1. ``` Switch to using ll2string if the number can be safely represented as a long long. The problem was introduced in #10587 (Redis 7.2). closes #13113. --------- Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit `5fdaa53d20`)	2024-05-19 09:12:35 +03:00
debing.sun	e46ddde28d	Check user's oom_score_adj write permission for oom-score-adj test (#13111 ) `CONFIG SET oom-score-adj handles configuration failures` test failed in some CI jobs today. Failed CI: https://github.com/redis/redis/actions/runs/8152519326 Not sure why the github action's docker image perssions have changed, but the issue is similar to #12887, where we can't assume the range of oom_score_adj that a user can change. ## Solution: Modify the way of determining whether the current user has no privileges or not, instead of relying on whether the user id is 0 or not. (cherry picked from commit `9738ba9841`)	2024-05-19 09:12:35 +03:00
Binbin	8d3a1c978a	Increase tolerance range to block reprocess tests to avoid timing issues (#13053 ) These tests have all failed in daily CI: ``` * [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) * [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) *** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) ``` Increase the range to avoid failures, and improve the comment to be clearer. tests was introduced in #13004. (cherry picked from commit `32f44da510`)	2024-05-19 09:12:35 +03:00
debing.sun	c34e64848f	Fix crash due to merge of quicklist node introduced by #12955 (#13040 ) Fix two crash introducted by #12955 When a quicklist node can't be inserted and split, we eventually merge the current node with its neighboring nodes after inserting, and compress the current node and its siblings. 1. When the current node is merged with another node, the current node may become invalid and can no longer be used. Solution: let `_quicklistMergeNodes()` return the merged nodes. 3. If the current node is a LZF quicklist node, its recompress will be 1. If the split node can be merged with a sibling node to become head or tail, recompress may cause the head and tail to be compressed, which is not allowed. Solution: always recompress to 0 after merging. (cherry picked from commit `1e8dc1da0d`)	2024-05-19 09:12:35 +03:00
debing.sun	1be66b98f5	Prevent LSET command from causing quicklist plain node size to exceed 4GB (#12955 ) Fix #12864 The main reason for this crash is that when replacing a element of a quicklist packed node with lpReplace() method, if the final size is larger than 4GB, lpReplace() will fail and returns NULL, causing `node->entry` to be incorrectly set to NULL. Since the inserted data is not a large element, we can't just replace it like a large element, first quicklistInsertAfter() and then quicklistDelIndex(), because the current node may be merged and invalidated in quicklistInsertAfter(). The solution of this PR: When replacing a node fails (listpack exceeds 4GB), split the current node, create a new node to put in the middle, and try to merge them. This is the same as inserting a large element. In the worst case, its size will not exceed 4GB. (cherry picked from commit `1f00c951c2`)	2024-05-19 09:12:35 +03:00
Binbin	1bda797dc0	Fix blocking commands timeout is reset due to re-processing command (#13004 ) In #11012, we will reprocess command when client is unblocked on keys, in some blocking commands, for example, in the XREADGROUP BLOCK scenario, because of the re-processing command, we will recalculate the block timeout, causing the blocking time to be reset. This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly let the command know that it is being re-processed, later in blockForKeys we will not reset the timeout. Affected BLOCK cases: - list / zset / stream, added test cases for each. Unaffected cases: - module (never re-process the commands). - WAIT / WAITAOF (never re-process the commands). Fixes #12998. (cherry picked from commit `492021db95`)	2024-05-19 09:12:35 +03:00
bentotten	099a2f40ec	When one shard, sole primary node marks potentially failed replica as FAIL instead of PFAIL (#12824 ) Fixes issue where a single primary cannot mark a replica as failed in a single-shard cluster. (cherry picked from commit `b3aaa0a136`)	2024-05-19 09:12:35 +03:00
Binbin	4bd614c52d	Add announced-endpoints test to all_tests and fix tls related tests (#12927 ) The test was introduced in #10745, but we forgot to add it to the test_helper.tcl, so our CI did not actually run it. This PR adds it and ensures it passes CI tests. (cherry picked from commit `b351a04b1e`)	2024-05-19 09:12:35 +03:00
Binbin	c4776cafcf	Un-register notification and server event when RedisModule_OnLoad fails (#12809 ) When we register notification or server event in RedisModule_OnLoad, but RedisModule_OnLoad eventually fails, triggering notification or server event will cause the server to crash. If the loading fails on a later stage of moduleLoad, we do call moduleUnload which handles all un-registration, but when it fails on the RedisModule_OnLoad call, we only un-register several specific things and these were missing: - moduleUnsubscribeNotifications - moduleUnregisterFilters - moduleUnsubscribeAllServerEvents Refactored the code to reuse the code from moduleUnload. Fixes #12808. (cherry picked from commit `d6f19539d2`)	2024-01-09 13:51:49 +02:00
Meir Shpilraien (Spielrein)	4cbf903083	Before evicted and before expired server events are not executed inside an execution unit. (#12733 ) Redis 7.2 (#9406) introduced a new modules event, `RedisModuleEvent_Key`. This new event allows the module to read the key data just before it is removed from the database (either deleted, expired, evicted, or overwritten). When the key is removed from the database, either by active expire or eviction. The new event was not called as part of an execution unit. This can cause an issue if the module registers a post notification job inside the event. This job will not be executed atomically with the expiration/eviction operation and will not replicated inside a Multi/Exec. Moreover, the post notification job will be executed right after the event where it is still not safe to perform any write operation, this will violate the promise that post notification job will be called atomically with the operation that triggered it and only when it is safe to write. This PR fixes the issue by wrapping each expiration/eviction of a key with an execution unit. This makes sure the entire operation will run atomically and all the post notification jobs will be executed at the end where it is safe to write. Tests were modified to verify the fix. (cherry picked from commit `0ffb9d2ea9`)	2024-01-09 13:51:49 +02:00
guybe7	ba113c8337	WAITAOF: Update fsynced_reploff_pending just before starting the initial AOFRW fork (#12620 ) If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start immediately (e.g. there's another fork active at the time), any subsequent commands will increment `server.master_repl_offset`, but will not cause a fsync (given they were executed before the fork started, they just ended up in the RDB part of it) Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out because no fsync will be executed. Release notes: ``` WAITAOF could timeout in the absence of write traffic in case a new AOF is created and an AOFRW can't immediately start. This can happen by the appendonly config is changed at runtime, but also after FLUSHALL, and replica full sync. ``` (cherry picked from commit `bfa3931a04`)	2023-10-18 10:44:10 +03:00
Binbin	0b2cbf138e	Update sort_ro reply_schema to mention the null reply (#12534 ) Also added a test to cover this case, so this can cover the reply schemas check. (cherry picked from commit `9ce8c54d74`)	2023-09-06 20:56:15 +03:00
bodong.ybd	9e505e6cd8	Fix sort_ro get-keys function return wrong key number (#12522 ) Before： ``` 127.0.0.1:6379> command getkeys sort_ro key (empty array) 127.0.0.1:6379> ``` After: ``` 127.0.0.1:6379> command getkeys sort_ro key 1) "key" 127.0.0.1:6379> ``` (cherry picked from commit `b59f53efb3`)	2023-09-06 20:56:15 +03:00
Madelyn Olson	7c179f9bf4	Fixed a bug where sequential matching ACL rules weren't compressed (#12472 ) When adding a new ACL rule was added, an attempt was made to remove any "overlapping" rules. However, there when a match was found, the search was not resumed at the right location, but instead after the original position of the original command. For example, if the current rules were `-config +config\|get` and a rule `+config` was added. It would identify that `-config` was matched, but it would skip over `+config\|get`, leaving the compacted rule `-config +config`. This would be evaluated safely, but looks weird. This bug can only be triggered with subcommands, since that is the only way to have sequential matching rules. Resolves #12470. This is also only present in 7.2. I think there was also a minor risk of removing another valid rule, since it would start the search of the next command at an arbitrary point. I couldn't find a valid offset that would have cause a match using any of the existing commands that have subcommands with another command.	2023-08-10 09:58:53 +03:00
zhaozhao.zz	8226f39fb2	do not call handleClientsBlockedOnKeys inside yielding command (#12459 ) Fix the assertion when a busy script (timeout) signal ready keys (like LPUSH), and then an arbitrary client's `allow-busy` command steps into `handleClientsBlockedOnKeys` try wake up clients blocked on keys (like BLPOP). Reproduction process: 1. start a redis with aof `./redis-server --appendonly yes` 2. exec blpop `127.0.0.1:6379> blpop a 0` 3. use another client call a busy script and this script push the blocked key `127.0.0.1:6379> eval "redis.call('lpush','a','b') while(1) do end" 0` 4. user a new client call an allow-busy command like auth `127.0.0.1:6379> auth a` BTW, this issue also break the atomicity of script. This bug has been around for many years, the old versions only have the atomic problem, only 7.0/7.2 has the assertion problem. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-05 09:52:03 +03:00
sundb	da9c2804a5	Avoid mostly harmless integer overflow in cjson (#12456 ) This PR mainly fixes a possible integer overflow in `json_append_string()`. When we use `cjson.encoding()` to encode a string larger than 2GB, at specific compilation flags, an integer overflow may occur leading to truncation, resulting in the part of the string larger than 2GB not being encoded. On the other hand, this overflow doesn't cause any read or write out-of-range or segment fault. 1) using -O0 for lua_cjson (`make LUA_DEBUG=yes`) In this case, `i` will overflow and leads to truncation. When `i` reaches `INT_MAX+1` and overflows to INT_MIN, when compared to len, `i` (1000000..00) is expanded to 64 bits signed integer (1111111.....000000) . At this point i will be greater than len and jump out of the loop, so `for (i = 0; i < len; i++)` will loop up to 2^31 times, and the part of larger than 2GB will be truncated. ```asm `i` => -0x24(%rbp) <+253>: addl $0x1,-0x24(%rbp) ; overflow if i large than 2^31 <+257>: mov -0x24(%rbp),%eax <+260>: movslq %eax,%rdx ; move a 32-bit value with sign extension into a 64-bit signed <+263>: mov -0x20(%rbp),%rax <+267>: cmp %rax,%rdx ; check `i < len` <+270>: jb 0x212600 <json_append_string+148> ``` 2) using -O2/-O3 for lua_cjson (`make LUA_DEBUG=no`, the default) In this case, because singed integer overflow is an undefined behavior, `i` will not overflow. `i` will be optimized by the compiler and use 64-bit registers for all subsequent instructions. ```asm <+180>: add $0x1,%rbx ; Using 64-bit register `rbx` for i++ <+184>: lea 0x1(%rdx),%rsi <+188>: mov %rsi,0x10(%rbp) <+192>: mov %al,(%rcx,%rdx,1) <+195>: cmp %rbx,(%rsp) ; check `i < len` <+199>: ja 0x20b63a <json_append_string+154> ``` 3) using 32bit Because `strbuf_ensure_empty_length()` preallocates memory of length (len * 6 + 2), in 32-bit `cjson.encode()` can only handle strings smaller than ((2 ^ 32) - 3 ) / 6. So 32bit is not affected. Also change `i` in `strbuf_append_string()` to `size_t`. Since its second argument `str` is taken from the `char2escape` string array which is never larger than 6, so `strbuf_append_string()` is not at risk of overflow (the bug was unreachable).	2023-08-05 07:57:06 +03:00
zhaozhao.zz	90ab91f00b	fix false success and a memory leak for ACL selector with bad parenthesis combination (#12452 ) When doing merge selector, we should check whether the merge has started (i.e., whether open_bracket_start is -1) every time. Otherwise, encountering an illegal selector pattern could succeed and also cause memory leaks, for example: ``` acl setuser test1 (+PING (+SELECT (+DEL ) ``` The above would leak memory and succeed with only DEL being applied, and would now error after the fix. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-02 10:46:06 +03:00
Diego Lopez Recas	b653c759cd	Fix race condition in tests/unit/auth.tcl (#12444 ) Changing the masterauth while turning into a replica is racy. Turn into replica after changing the masterauth instead.	2023-08-01 18:03:33 +03:00
Harkrishn Patro	42985b00ea	Test coverage for incr/decr operation on robj encoding type optimization (#12435 ) Additional test coverage for incr/decr operation. integer number could be present in raw encoding format due to operation like append. A incr/decr operation following it optimize the string to int encoding format.	2023-07-25 16:43:31 -07:00
Harkrishn Patro	34b95f752c	Add test case for APPEND command usage on integer value (#12429 ) Add test coverage to validate object encoding update on APPEND command usage on a integer value	2023-07-24 18:25:50 -07:00
Makdon	2495b90a64	redis-cli: use previous hostip when not provided by redis cluster server (#12273 ) When the redis server cluster running on cluster-preferred-endpoint-type unknown-endpoint mode, and receive a request that should be redirected to another redis server node, it does not reply the hostip, but a empty host like MOVED 3999 :6381. The redis-cli would try to connect to an address without a host, which cause the issue: ``` 127.0.0.1:7002> set bar bar -> Redirected to slot [5061] located at :7000 Could not connect to Redis at :7000: No address associated with hostname Could not connect to Redis at :7000: No address associated with hostname not connected> exit ``` In this case, the redis-cli should use the previous hostip when there's no host provided by the server. --------- Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Madelyn Olson <madelynolson@gmail.com>	2023-07-20 15:31:06 -07:00
Oran Agra	936cfa464f	Lua cjson and cmsgpack integer overflow issues (CVE-2022-24834) (#12398 ) * Fix integer overflows due to using wrong integer size. * Add assertions / panic when overflow still happens. * Deletion of dead code to avoid need to maintain it * Some changes are not because of bugs, but rather paranoia. * Improve cmsgpack and cjson test coverage. Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>	2023-07-10 10:26:09 +03:00
Binbin	d56a7d9b05	Fix and increase tollerance of event loop test, add verbose logs (#12385 ) The test fails on freebsd CI: ``` *** [err]: stats: eventloop metrics in tests/unit/info.tcl Expected '31777' to be less than '16183' (context: type eval line 17 cmd {assert_lessthan $el_sum2 [expr $el_sum1+10000] } proc ::test) ``` The test added in #11963, fails on freebsd CI which is slow, increase tollerance and also add some verbose logs, now we can see these logs in verbose mode (for better views): ``` eventloop metrics cycle1: 12, cycle2: 15 eventloop metrics el_sum1: 315, el_sum2: 411 eventloop metrics cmd_sum1: 126, cmd_sum2: 137 [ok]: stats: eventloop metrics (111 ms) instantaneous metrics instantaneous_eventloop_cycles_per_sec: 8 instantaneous metrics instantaneous_eventloop_duration_usec: 55 [ok]: stats: instantaneous metrics (1603 ms) [ok]: stats: debug metrics (112 ms) ```	2023-07-05 09:32:30 +03:00
Lior Lahav	b7559d9f3b	Fix possible crash in command getkeys (#12380 ) When getKeysUsingKeySpecs processes a command with more than one key-spec, and called with a total of more than 256 keys, it'll call getKeysPrepareResult again, but since numkeys isn't updated, getKeysPrepareResult will not bother to copy key names from the old result (leaving these slots uninitialized). Furthermore, it did not consider the keys it already found when allocating more space. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-07-03 12:45:18 +03:00
Binbin	6b57241fa8	Revert zrangeGenericCommand negative offset check (#12377 ) The negative offset check was added in #9052, we realized that this is a non-mandatory breaking change and we would like to add it only in 8.0. This reverts PR #9052, will be re-introduced later in 8.0.	2023-07-02 20:38:30 +03:00
judeng	07ed0eafa9	improve performance for scan command when matching pattern or data type (#12209 ) Optimized the performance of the SCAN command in a few ways: 1. Move the key filtering (by MATCH pattern) in the scan callback, so as to avoid collecting them for later filtering. 2. Reduce a many memory allocations and copying (use a reference to the original sds, instead of creating an robj, an excessive 2 mallocs and one string duplication) 3. Compare TYPE filter directly (as integers), instead of inefficient string compare per key. 4. fixed a small bug: when scan zset and hash types, maxiterations uses a more accurate number to avoid wrong double maxiterations. Changes postponed for a later version (8.0): 1. Prepare to move the TYPE filtering to the scan callback as well. this was put on hold since it has side effects that can be considered a breaking change, which is that we will not attempt to do lazy expire (delete) a key that was filtered by not matching the TYPE (changing it would mean TYPE filter starts behaving the same as MATCH filter already does in that respect). 2. when the specified key TYPE filter is an unknown type, server will reply a error immediately instead of doing a full scan that comes back empty handed. Benchmark result: For different scenarios, we obtained about 30% or more performance improvement. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-06-27 16:43:46 +03:00
Chen Tianjie	22a29935ff	Support TLS service when "tls-cluster" is not enabled and persist both plain and TLS port in nodes.conf (#12233 ) Originally, when "tls-cluster" is enabled, `port` is set to TLS port. In order to support non-TLS clients, `pport` is used to propagate TCP port across cluster nodes. However when "tls-cluster" is disabled, `port` is set to TCP port, and `pport` is not used, which means the cluster cannot provide TLS service unless "tls-cluster" is on. ``` typedef struct { // ... uint16_t port; /* Latest known clients port (TLS or plain). / uint16_t pport; / Latest known clients plaintext port. Only used if the main clients port is for TLS. / // ... } clusterNode; ``` ``` typedef struct { // ... uint16_t port; / TCP base port number. / uint16_t pport; / Sender TCP plaintext port, if base port is TLS */ // ... } clusterMsg; ``` This PR renames `port` and `pport` in `clusterNode` to `tcp_port` and `tls_port`, to record both ports no matter "tls-cluster" is enabled or disabled. This allows to provide TLS service to clients when "tls-cluster" is disabled: when displaying cluster topology, or giving `MOVED` error, server can provide TLS or TCP port according to client's connection type, no matter what type of connection cluster bus is using. For backwards compatibility, `port` and `pport` in `clusterMsg` are preserved, when "tls-cluster" is enabled, `port` is set to TLS port and `pport` is set to TCP port, when "tls-cluster" is disabled, `port` is set to TCP port and `pport` is set to TLS port (instead of 0). Also, in the nodes.conf file, a new aux field displaying an extra port is added to complete the persisted info. We may have `tls_port=xxxxx` or `tcp_port=xxxxx` in the aux field, to complete the cluster topology, while the other port is stored in the normal `<ip>:<port>` field. The format is shown below. ``` <node-id> <ip>:<tcp_port>@<cport>,<hostname>,shard-id=...,tls-port=6379 myself,master - 0 0 0 connected 0-1000 ``` Or we can switch the position of two ports, both can be correctly resolved. ``` <node-id> <ip>:<tls_port>@<cport>,<hostname>,shard-id=...,tcp-port=6379 myself,master - 0 0 0 connected 0-1000 ```	2023-06-26 07:43:38 -07:00
Meir Shpilraien (Spielrein)	153f8f082e	Fix use after free on blocking RM_Call. (#12342 ) blocking RM_Call was introduced on: #11568, It allows a module to perform blocking commands and get the reply asynchronously.If the command gets block, a special promise CallReply is returned that allow to set the unblock handler. The unblock handler will be called when the command invocation finish and it gets, as input, the command real reply. The issue was that the real CallReply was created using a stack allocated RedisModuleCtx which is no longer available after the unblock handler finishes. So if the module keeps the CallReply after the unblock handler finished, the CallReply holds a pointer to invalid memory and will try to access it when the CallReply will be released. The solution is to create the CallReply with a NULL context to make it totally detached and can be freed freely when the module wants. Test was added to cover this case, running the test with valgrind before the fix shows the use after free error. With the fix, there are no valgrind errors. unrelated: adding a missing `$rd close` in many tests in that file.	2023-06-25 14:12:27 +03:00
guybe7	3230199920	Modules: Unblock from within a timer coverage (#12337 ) Apart from adding the missing coverage, this PR also adds `blockedBeforeSleep` that gathers all block-related functions from `beforeSleep` The order inside `blockedBeforeSleep` is different: now `handleClientsBlockedOnKeys` (which may unblock clients) is called before `processUnblockedClients` (which handles unblocked clients). It makes sense to have this order. There are no visible effects of the wrong ordering, except some cleanups of the now-unblocked client would have happen in the next `beforeSleep` (will now happen in the current one) The reason we even got into it is because i triggers an assertion in logresreq.c (breaking the assumption that `unblockClient` is called before actually flushing the reply to the socket): `handleClientsBlockedOnKeys` is called, then it calls `moduleUnblockClientOnKey`, which calls `moduleUnblockClient`, which adds the client to `moduleUnblockedClients` back to `beforeSleep`, we call `handleClientsWithPendingWritesUsingThreads`, it writes the data of buf to the client, so `client->bufpos` became 0 On the next `beforeSleep`, we call `moduleHandleBlockedClients`, which calls `unblockClient`, which calls `reqresAppendResponse`, triggering the assert. (because the `bufpos` is 0) - see https://github.com/redis/redis/pull/12301#discussion_r1226386716	2023-06-22 23:15:16 +03:00
Madelyn Olson	73cf0243df	Make nodename test more consistent (#12330 ) To determine when everything was stable, we couldn't just query the nodename since they aren't API visible by design. Instead, we were using a proxy piece of information which was bumping the epoch and waiting for everyone to observe that. This works for making source Node 0 and Node 1 had pinged, and Node 0 and Node 2 had pinged, but did not guarantee that Node 1 and Node 2 had pinged. Although unlikely, this can cause this failure message. To fix it I hijacked hostnames and used its validation that it has been propagated, since we know that it is stable. I also noticed while stress testing this sometimes the test took almost 4.5 seconds to finish, which is really close to the current 5 second limit of the log check, so I bumped that up as well just to make it a bit more consistent.	2023-06-20 18:00:55 -07:00
guybe7	20fa156067	Align RM_ReplyWithErrorFormat with RM_ReplyWithError (#12321 ) Introduced by https://github.com/redis/redis/pull/11923 (Redis 7.2 RC2) It's very weird and counterintuitive that `RM_ReplyWithError` requires the error-code without a hyphen while `RM_ReplyWithErrorFormat` requires either the error-code with a hyphen or no error-code at all ``` RedisModule_ReplyWithError(ctx, "BLA bla bla"); ``` vs. ``` RedisModule_ReplyWithErrorFormat(ctx, "-BLA %s", "bla bla"); ``` This commit aligns RM_ReplyWithErrorFormat to behvae like RM_ReplyWithError. it's a breaking changes but it's done before 7.2 goes GA.	2023-06-20 20:44:43 +03:00

1 2 3 4 5 ...

1358 Commits