redis

Commit Graph

Author	SHA1	Message	Date
Mincho Paskalev	7916e58211	Remove string cat usage in tcl tests in order to support tcl8.5	2025-07-06 15:12:33 +03:00
debing.sun	f35b72dd17	Fix out of bounds write in hyperloglog commands (CVE-2025-32023) Co-authored-by: oranagra <oran@redislabs.com>	2025-07-06 15:12:33 +03:00
Vitah Lin	e7cd611be1	Fix tls port update not reflected in CLUSTER SLOTS (#13966 ) ### Problem A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP port issue in CLUSTER SLOTS, but it seems the handling of the TLS port was overlooked. There is this comment in the `addNodeToNodeReply` function in the `cluster.c` file: ```c /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */ addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo())); addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN); ``` ### Fixed This PR fixes the TLS port issue and adds relevant tests.	2025-05-27 15:39:09 +03:00
nesty92	89aee9556d	Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958 ) This PR fix the lag calculation by ensuring that when consumer group's last_id is behind the first entry, the consumer group's entries read is considered invalid and recalculated from the start of the stream Supplement to PR #13473 Close #13957 Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>	2025-05-27 15:39:09 +03:00
Stav-Levi	50e91ca7db	Fix port update not reflected in CLUSTER SLOTS (#13932 ) Close https://github.com/redis/redis/issues/13892 config set port cmd updates server.port. cluster slot retrieves information about cluster slots and their associated nodes. the fix updates this info when config set port cmd is done, so cluster slots cmd returns the right value.	2025-05-27 15:39:09 +03:00
YaacovHazan	42fb340ce4	Limiting output buffer for unauthenticated client (CVE-2025-21605) For unauthenticated clients the output buffer is limited to prevent them from abusing it by not reading the replies	2025-04-23 08:09:40 +00:00
Benson-li	a26774cee1	Fix potential infinite loop of RANDOMKEY during client pause (#13863 ) The bug mentioned in this [#13862](https://github.com/redis/redis/issues/13862) has been fixed. --------- Signed-off-by: li-benson <1260437731@qq.com> Signed-off-by: youngmore1024 <youngmore1024@outlook.com> Co-authored-by: youngmore1024 <youngmore1024@outlook.com>	2025-04-22 13:49:12 +03:00
nafraf	6e819e188b	Fix module loadex command crash due to invalid config (#13653 ) Fix to https://github.com/redis/redis/issues/13650 providing an invalid config to a module with datatype crashes when redis tries to unload the module due to the invalid config --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-04-22 13:49:12 +03:00
guybe7	8e72b42a9a	Modules: defrag CB should take robj, not sds (#13627 ) Added a log of the keyname in the test modules to reproduce the problem (tests crash without the fix)	2025-04-22 13:49:12 +03:00
debing.sun	30fe743638	Fix race condition issues between the main thread and module threads (#12817 ) Fix #12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: https://github.com/redis/redis/pull/12817#issuecomment-1831181220 * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: #7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually lead to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: #8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply() family API thread-safe Related discussion: https://github.com/redis/redis/pull/12817#discussion_r1408707239 Call chain: `RM_Reply()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: #12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix #12785 - Introduced: Version: 4.0 Commit: `3fcf959e60` - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race condiaions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: https://github.com/redis/redis/pull/12817#discussion_r1408706695 When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: #8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: https://github.com/redis/redis/pull/12817#issuecomment-1851899163 - Introduced: Version: 4.0 Commit: `9b01b64430` - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: #7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via #12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backpot to 7.2 1. The implement of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely for backport 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backpot. 4. (5) is harmless, no need for a backpot. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2025-04-22 12:35:44 +03:00
YaacovHazan	15e212bf69	Fix Read/Write key pattern selector (CVE-2024-51741) The '%' rule must contain one or both of R/W	2025-01-06 16:03:47 +02:00
Yossi Gottlieb	ffde2cf74d	Use cross-platform-actions for FreeBSD support. (#12732 ) This change overcomes many stability issues experienced with the vmactions action. We need to limit VMs to 8GB for better stability, as the 13GB default seems to hang them occasionally. Shell code has been simplified since this action seem to use `bash -e` which will abort on non-zero exit codes anyway.	2025-01-06 16:03:47 +02:00
debing.sun	a904067f6f	Fix incorrect lag due to trimming stream via XTRIM command (#13473 ) ## Describe When using the `XTRIM` command to trim a stream, it does not update the maximal tombstone (`max_deleted_entry_id`). This leads to an issue where the lag calculation incorrectly assumes that there are no tombstones after the consumer group's last_id, resulting in an inaccurate lag. The reason XTRIM doesn't need to update the maximal tombstone is that it always trims from the beginning of the stream. This means that it consistently changes the position of the first entry, leading to the following scenarios: 1) First entry trimmed after maximal tombstone: If the first entry is trimmed to a position after the maximal tombstone, all tombstones will be before the first entry, so they won't affect the consumer group's lag. 2) First entry trimmed before maximal tombstone: If the first entry is trimmed to a position before the maximal tombstone, the maximal tombstone will not be updated. ## Solution Therefore, this PR optimizes the lag calculation by ensuring that when both the consumer group's last_id and the maximal tombstone are behind the first entry, the consumer group's lag is always equal to the number of remaining elements in the stream. Supplement to PR https://github.com/redis/redis/pull/13338	2025-01-06 16:03:47 +02:00
debing.sun	3ce29e0b11	Pass extensions to node if extension processing is handled by it (#13465 ) This PR is based on the commits from PR https://github.com/valkey-io/valkey/pull/52. Ref: https://github.com/redis/redis/pull/12760 Close https://github.com/redis/redis/issues/13401 This PR will replace https://github.com/redis/redis/pull/13449 Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>	2025-01-06 16:03:47 +02:00
debing.sun	f4a6721dfd	Fix CLUSTER SHARDS command returns empty array (#13422 ) Close https://github.com/redis/redis/issues/13414 When the cluster's master node fails and is switched to another node, the first node in the shard node list (the old master) is no longer valid. Add a new method clusterGetMasterFromShard() to obtain the current master.	2025-01-06 16:03:47 +02:00
debing.sun	0e06e67a36	Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group (#13338 ) Fix #13337 Ths PR fixes fixed two bugs that caused lag calculation errors. 1. When the latest tombstone is before the first entry, the tombstone may stil be after the last id of consume group. 2. When a tombstone is after the last id of consume group, the group's counter will be invalid, we should caculate the entries_read by using estimates.	2025-01-06 16:03:47 +02:00
Oran Agra	045617e10a	Fix possible crash due to OOM panic on invalid command (#13380 ) getKeysUsingKeySpece had the range check AFTER the allocation, of the keys buffer, which could lead to an OOM panic when invalid arguments are provided, leading to an overflow. The allocated memory is only used after the range check, so there's no risk of buffer overrun. The OOM panic can happen on 32bit builds, or 64 builds running on systems with less than 4GB of RAM, and is reachable via the COMMAND GETKEYSANDFLAGS, and ACL key name validation.	2025-01-06 16:03:47 +02:00
debing.sun	4a92d66ca2	Don't keep global replication buffer reference for replicas marked CLIENT_CLOSE_ASAP (#13363 ) In certain situations, we might generate a large number of propagates (e.g., multi/exec, Lua script, or a single command generating tons of propagations) within an event loop. During the process of propagating to a replica, if the replica is disconnected(marked as CLIENT_CLOSE_ASAP) due to exceeding the output buffer limit, we should remove its reference to the global replication buffer to avoid the global replication buffer being unable to be properly trimmed due to being referenced. --------- Co-authored-by: oranagra <oran@redislabs.com>	2025-01-06 16:03:47 +02:00
gms	2c8ae06b67	Fix crash due to unblock client during slot migration (#13311 ) In #13224, we found a crash during cluster slot migration but don't know why. So i check all the return C_OK in processCommand to see if we are missing some duration reset and see this. This fix is like #12247, when we reject the command, we should reset the duration. I test it and verify it can fix #13224. So the reason may because we are using stream block and then during the slot migration, it got a redirect and then crash the server. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
debing.sun	9f823685d2	Have consistent behavior of SPUBLISH within multi/exec like regular command (#13276 ) This PR is based on the commits from PR #12944. Allow SPUBLISH command within multi/exec on replica Behavior on unstable: ``` 127.0.0.1:6380> CLUSTER NODES 39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected 8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383 127.0.0.1:6380> SPUBLISH hello world (integer) 0 127.0.0.1:6380> MULTI OK 127.0.0.1:6380(TX)> SPUBLISH hello world QUEUED 127.0.0.1:6380(TX)> EXEC (error) MOVED 866 127.0.0.1:6379 ``` With this change: ``` 127.0.0.1:6380> SPUBLISH hello world (integer) 0 127.0.0.1:6380> MULTI OK 127.0.0.1:6380(TX)> SPUBLISH hello world QUEUED 127.0.0.1:6380(TX)> EXEC 1) (integer) 0 ``` --------- Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: oranagra <oran@redislabs.com>	2025-01-06 16:03:47 +02:00
sundb	1936746e63	Fix oom-score-adj test due to no permission (#12887 ) Fix #12792 On ubuntu 23(lunar), non-root users will not be allowed to change the oom_score_adj of a process to a value that is too low. Since terminal's default oom_score_adj is 200, if we run the test on terminal, we won't be able to set the oom_score_adj of the redis process to 9 or 22, which is too low. Reproduction on ubuntu 23(lunar) terminal: ```sh $ cat /proc/`pgrep redis-server`/oom_score_adj 200 $ echo 100 > /proc/`pgrep redis-server`/oom_score_adj # success without error $ echo 99 > /proc/`pgrep redis-server`/oom_score_adj echo: write error: Permission denied ``` As from the output above, we can only set the minimum oom score of redis processes to 100. By modifying the test, make oom_score_adj only increase upwards and not decrease. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
Oran Agra	c8649f8e85	Prevent pattern matching abuse (CVE-2024-31228)	2024-10-02 22:13:33 +03:00
Oran Agra	fe8de4313f	Fix lua bit.tohex (CVE-2024-31449) INT_MIN value must be explicitly checked, and cannot be negated.	2024-10-02 22:13:33 +03:00
debing.sun	2ad2548747	Fixed crashes due to missed slotToKeyInit() and missed expires_cursor reset (#13315 ) this PR fixes two crashes: 1. Fix missing slotToKeyInit() when using `flushdb async` under cluster mode. https://github.com/redis/redis/issues/13205 2. Fix missing expires_cursor reset when stopping active defrag in the middle of defragment. https://github.com/redis/redis/issues/13307 If we stop active defrag in the middle of defragging db->expires, if `expires_cursor` is not reset to 0, the next time we enable active defrag again, defragLaterStep(db, ...) will be entered. However, at this time, `db` has been reset to NULL, which results in crash. The affected code were removed by #11695 and #13058 in usntable, so we just need backport this to 7.2.	2024-06-18 18:02:22 +08:00
Yanqi Lv	464aad9ee7	fix wrong data type conversion in zrangeResultBeginStore (#13148 ) In `beginResultEmission`, -1 means the result length is not known in advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`. Although `dictExpand` won't succeed because the size overflows, I think we'd better to avoid this wrong conversion. This bug can be triggered when the source of `zrangestore` doesn't exist or we use `zrangestore` command with `byscore` or `bylex`. The impact is that dst keys will be converted to use skiplist instead of listpack. (cherry picked from commit `bad33f8738`)	2024-05-19 09:12:35 +03:00
Binbin	439b8da475	Fix redis-check-aof incorrectly considering data in manifest format as MP-AOF (#12958 ) The check in fileIsManifest misjudged the manifest file. For example, if resp aof contains "file", it will be considered a manifest file and the check will fail: ``` *3 $3 set $4 file $4 file ``` In #12951, if the preamble aof also contains it, it will also fail. Fixes #12951. the bug was happening if the the word "file" is mentioned in the first 1024 lines of the AOF. and now as soon as it finds a non-comment line it'll break (if it contains "file" or doesn't) (cherry picked from commit `da727ad445`)	2024-05-19 09:12:35 +03:00
Matthew Douglass	1caaf581f1	Fix conversion of numbers in lua args to redis args (#13115 ) Since lua_Number is not explicitly an integer or a double, we need to make an effort to convert it as an integer when that's possible, since the string could later be used in a context that doesn't support scientific notation (e.g. 1e9 instead of 100000000). Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`, which ever is shorter, this would break if we try to pass a long integer number to a command that takes integer. we'll get an implicit conversion to string in Lua, and then the parsing in getLongLongFromObjectOrReply will fail. ``` > eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0 (nil) > eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0 (error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1. ``` Switch to using ll2string if the number can be safely represented as a long long. The problem was introduced in #10587 (Redis 7.2). closes #13113. --------- Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit `5fdaa53d20`)	2024-05-19 09:12:35 +03:00
debing.sun	e46ddde28d	Check user's oom_score_adj write permission for oom-score-adj test (#13111 ) `CONFIG SET oom-score-adj handles configuration failures` test failed in some CI jobs today. Failed CI: https://github.com/redis/redis/actions/runs/8152519326 Not sure why the github action's docker image perssions have changed, but the issue is similar to #12887, where we can't assume the range of oom_score_adj that a user can change. ## Solution: Modify the way of determining whether the current user has no privileges or not, instead of relying on whether the user id is 0 or not. (cherry picked from commit `9738ba9841`)	2024-05-19 09:12:35 +03:00
Binbin	8d3a1c978a	Increase tolerance range to block reprocess tests to avoid timing issues (#13053 ) These tests have all failed in daily CI: ``` * [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) * [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) *** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test) ``` Increase the range to avoid failures, and improve the comment to be clearer. tests was introduced in #13004. (cherry picked from commit `32f44da510`)	2024-05-19 09:12:35 +03:00
debing.sun	c34e64848f	Fix crash due to merge of quicklist node introduced by #12955 (#13040 ) Fix two crash introducted by #12955 When a quicklist node can't be inserted and split, we eventually merge the current node with its neighboring nodes after inserting, and compress the current node and its siblings. 1. When the current node is merged with another node, the current node may become invalid and can no longer be used. Solution: let `_quicklistMergeNodes()` return the merged nodes. 3. If the current node is a LZF quicklist node, its recompress will be 1. If the split node can be merged with a sibling node to become head or tail, recompress may cause the head and tail to be compressed, which is not allowed. Solution: always recompress to 0 after merging. (cherry picked from commit `1e8dc1da0d`)	2024-05-19 09:12:35 +03:00
debing.sun	1be66b98f5	Prevent LSET command from causing quicklist plain node size to exceed 4GB (#12955 ) Fix #12864 The main reason for this crash is that when replacing a element of a quicklist packed node with lpReplace() method, if the final size is larger than 4GB, lpReplace() will fail and returns NULL, causing `node->entry` to be incorrectly set to NULL. Since the inserted data is not a large element, we can't just replace it like a large element, first quicklistInsertAfter() and then quicklistDelIndex(), because the current node may be merged and invalidated in quicklistInsertAfter(). The solution of this PR: When replacing a node fails (listpack exceeds 4GB), split the current node, create a new node to put in the middle, and try to merge them. This is the same as inserting a large element. In the worst case, its size will not exceed 4GB. (cherry picked from commit `1f00c951c2`)	2024-05-19 09:12:35 +03:00
Binbin	1bda797dc0	Fix blocking commands timeout is reset due to re-processing command (#13004 ) In #11012, we will reprocess command when client is unblocked on keys, in some blocking commands, for example, in the XREADGROUP BLOCK scenario, because of the re-processing command, we will recalculate the block timeout, causing the blocking time to be reset. This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly let the command know that it is being re-processed, later in blockForKeys we will not reset the timeout. Affected BLOCK cases: - list / zset / stream, added test cases for each. Unaffected cases: - module (never re-process the commands). - WAIT / WAITAOF (never re-process the commands). Fixes #12998. (cherry picked from commit `492021db95`)	2024-05-19 09:12:35 +03:00
bentotten	099a2f40ec	When one shard, sole primary node marks potentially failed replica as FAIL instead of PFAIL (#12824 ) Fixes issue where a single primary cannot mark a replica as failed in a single-shard cluster. (cherry picked from commit `b3aaa0a136`)	2024-05-19 09:12:35 +03:00
Binbin	4bd614c52d	Add announced-endpoints test to all_tests and fix tls related tests (#12927 ) The test was introduced in #10745, but we forgot to add it to the test_helper.tcl, so our CI did not actually run it. This PR adds it and ensures it passes CI tests. (cherry picked from commit `b351a04b1e`)	2024-05-19 09:12:35 +03:00
Binbin	c4776cafcf	Un-register notification and server event when RedisModule_OnLoad fails (#12809 ) When we register notification or server event in RedisModule_OnLoad, but RedisModule_OnLoad eventually fails, triggering notification or server event will cause the server to crash. If the loading fails on a later stage of moduleLoad, we do call moduleUnload which handles all un-registration, but when it fails on the RedisModule_OnLoad call, we only un-register several specific things and these were missing: - moduleUnsubscribeNotifications - moduleUnregisterFilters - moduleUnsubscribeAllServerEvents Refactored the code to reuse the code from moduleUnload. Fixes #12808. (cherry picked from commit `d6f19539d2`)	2024-01-09 13:51:49 +02:00
Meir Shpilraien (Spielrein)	4cbf903083	Before evicted and before expired server events are not executed inside an execution unit. (#12733 ) Redis 7.2 (#9406) introduced a new modules event, `RedisModuleEvent_Key`. This new event allows the module to read the key data just before it is removed from the database (either deleted, expired, evicted, or overwritten). When the key is removed from the database, either by active expire or eviction. The new event was not called as part of an execution unit. This can cause an issue if the module registers a post notification job inside the event. This job will not be executed atomically with the expiration/eviction operation and will not replicated inside a Multi/Exec. Moreover, the post notification job will be executed right after the event where it is still not safe to perform any write operation, this will violate the promise that post notification job will be called atomically with the operation that triggered it and only when it is safe to write. This PR fixes the issue by wrapping each expiration/eviction of a key with an execution unit. This makes sure the entire operation will run atomically and all the post notification jobs will be executed at the end where it is safe to write. Tests were modified to verify the fix. (cherry picked from commit `0ffb9d2ea9`)	2024-01-09 13:51:49 +02:00
guybe7	ba113c8337	WAITAOF: Update fsynced_reploff_pending just before starting the initial AOFRW fork (#12620 ) If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start immediately (e.g. there's another fork active at the time), any subsequent commands will increment `server.master_repl_offset`, but will not cause a fsync (given they were executed before the fork started, they just ended up in the RDB part of it) Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out because no fsync will be executed. Release notes: ``` WAITAOF could timeout in the absence of write traffic in case a new AOF is created and an AOFRW can't immediately start. This can happen by the appendonly config is changed at runtime, but also after FLUSHALL, and replica full sync. ``` (cherry picked from commit `bfa3931a04`)	2023-10-18 10:44:10 +03:00
Binbin	0b2cbf138e	Update sort_ro reply_schema to mention the null reply (#12534 ) Also added a test to cover this case, so this can cover the reply schemas check. (cherry picked from commit `9ce8c54d74`)	2023-09-06 20:56:15 +03:00
bodong.ybd	9e505e6cd8	Fix sort_ro get-keys function return wrong key number (#12522 ) Before： ``` 127.0.0.1:6379> command getkeys sort_ro key (empty array) 127.0.0.1:6379> ``` After: ``` 127.0.0.1:6379> command getkeys sort_ro key 1) "key" 127.0.0.1:6379> ``` (cherry picked from commit `b59f53efb3`)	2023-09-06 20:56:15 +03:00
Madelyn Olson	7c179f9bf4	Fixed a bug where sequential matching ACL rules weren't compressed (#12472 ) When adding a new ACL rule was added, an attempt was made to remove any "overlapping" rules. However, there when a match was found, the search was not resumed at the right location, but instead after the original position of the original command. For example, if the current rules were `-config +config\|get` and a rule `+config` was added. It would identify that `-config` was matched, but it would skip over `+config\|get`, leaving the compacted rule `-config +config`. This would be evaluated safely, but looks weird. This bug can only be triggered with subcommands, since that is the only way to have sequential matching rules. Resolves #12470. This is also only present in 7.2. I think there was also a minor risk of removing another valid rule, since it would start the search of the next command at an arbitrary point. I couldn't find a valid offset that would have cause a match using any of the existing commands that have subcommands with another command.	2023-08-10 09:58:53 +03:00
Binbin	6abfda54c3	Fix flaky SENTINEL RESET test (#12437 ) After SENTINEL RESET, sometimes the sentinel can sense the master again, causing the test to fail. Here we give it a few more chances.	2023-08-10 08:58:52 +03:00
zhaozhao.zz	8226f39fb2	do not call handleClientsBlockedOnKeys inside yielding command (#12459 ) Fix the assertion when a busy script (timeout) signal ready keys (like LPUSH), and then an arbitrary client's `allow-busy` command steps into `handleClientsBlockedOnKeys` try wake up clients blocked on keys (like BLPOP). Reproduction process: 1. start a redis with aof `./redis-server --appendonly yes` 2. exec blpop `127.0.0.1:6379> blpop a 0` 3. use another client call a busy script and this script push the blocked key `127.0.0.1:6379> eval "redis.call('lpush','a','b') while(1) do end" 0` 4. user a new client call an allow-busy command like auth `127.0.0.1:6379> auth a` BTW, this issue also break the atomicity of script. This bug has been around for many years, the old versions only have the atomic problem, only 7.0/7.2 has the assertion problem. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-05 09:52:03 +03:00
sundb	da9c2804a5	Avoid mostly harmless integer overflow in cjson (#12456 ) This PR mainly fixes a possible integer overflow in `json_append_string()`. When we use `cjson.encoding()` to encode a string larger than 2GB, at specific compilation flags, an integer overflow may occur leading to truncation, resulting in the part of the string larger than 2GB not being encoded. On the other hand, this overflow doesn't cause any read or write out-of-range or segment fault. 1) using -O0 for lua_cjson (`make LUA_DEBUG=yes`) In this case, `i` will overflow and leads to truncation. When `i` reaches `INT_MAX+1` and overflows to INT_MIN, when compared to len, `i` (1000000..00) is expanded to 64 bits signed integer (1111111.....000000) . At this point i will be greater than len and jump out of the loop, so `for (i = 0; i < len; i++)` will loop up to 2^31 times, and the part of larger than 2GB will be truncated. ```asm `i` => -0x24(%rbp) <+253>: addl $0x1,-0x24(%rbp) ; overflow if i large than 2^31 <+257>: mov -0x24(%rbp),%eax <+260>: movslq %eax,%rdx ; move a 32-bit value with sign extension into a 64-bit signed <+263>: mov -0x20(%rbp),%rax <+267>: cmp %rax,%rdx ; check `i < len` <+270>: jb 0x212600 <json_append_string+148> ``` 2) using -O2/-O3 for lua_cjson (`make LUA_DEBUG=no`, the default) In this case, because singed integer overflow is an undefined behavior, `i` will not overflow. `i` will be optimized by the compiler and use 64-bit registers for all subsequent instructions. ```asm <+180>: add $0x1,%rbx ; Using 64-bit register `rbx` for i++ <+184>: lea 0x1(%rdx),%rsi <+188>: mov %rsi,0x10(%rbp) <+192>: mov %al,(%rcx,%rdx,1) <+195>: cmp %rbx,(%rsp) ; check `i < len` <+199>: ja 0x20b63a <json_append_string+154> ``` 3) using 32bit Because `strbuf_ensure_empty_length()` preallocates memory of length (len * 6 + 2), in 32-bit `cjson.encode()` can only handle strings smaller than ((2 ^ 32) - 3 ) / 6. So 32bit is not affected. Also change `i` in `strbuf_append_string()` to `size_t`. Since its second argument `str` is taken from the `char2escape` string array which is never larger than 6, so `strbuf_append_string()` is not at risk of overflow (the bug was unreachable).	2023-08-05 07:57:06 +03:00
zhaozhao.zz	90ab91f00b	fix false success and a memory leak for ACL selector with bad parenthesis combination (#12452 ) When doing merge selector, we should check whether the merge has started (i.e., whether open_bracket_start is -1) every time. Otherwise, encountering an illegal selector pattern could succeed and also cause memory leaks, for example: ``` acl setuser test1 (+PING (+SELECT (+DEL ) ``` The above would leak memory and succeed with only DEL being applied, and would now error after the fix. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-08-02 10:46:06 +03:00
Diego Lopez Recas	b653c759cd	Fix race condition in tests/unit/auth.tcl (#12444 ) Changing the masterauth while turning into a replica is racy. Turn into replica after changing the masterauth instead.	2023-08-01 18:03:33 +03:00
DarrenJiang13	6abb3c4038	change log match to line match in tcl sanitizer_errors_from_file. (#12446 ) In the tcl foreach loop, the function should compare line rather than the whole file.	2023-07-30 08:48:29 +03:00
Harkrishn Patro	42985b00ea	Test coverage for incr/decr operation on robj encoding type optimization (#12435 ) Additional test coverage for incr/decr operation. integer number could be present in raw encoding format due to operation like append. A incr/decr operation following it optimize the string to int encoding format.	2023-07-25 16:43:31 -07:00
Harkrishn Patro	34b95f752c	Add test case for APPEND command usage on integer value (#12429 ) Add test coverage to validate object encoding update on APPEND command usage on a integer value	2023-07-24 18:25:50 -07:00
Makdon	2495b90a64	redis-cli: use previous hostip when not provided by redis cluster server (#12273 ) When the redis server cluster running on cluster-preferred-endpoint-type unknown-endpoint mode, and receive a request that should be redirected to another redis server node, it does not reply the hostip, but a empty host like MOVED 3999 :6381. The redis-cli would try to connect to an address without a host, which cause the issue: ``` 127.0.0.1:7002> set bar bar -> Redirected to slot [5061] located at :7000 Could not connect to Redis at :7000: No address associated with hostname Could not connect to Redis at :7000: No address associated with hostname not connected> exit ``` In this case, the redis-cli should use the previous hostip when there's no host provided by the server. --------- Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Madelyn Olson <madelynolson@gmail.com>	2023-07-20 15:31:06 -07:00
Oran Agra	936cfa464f	Lua cjson and cmsgpack integer overflow issues (CVE-2022-24834) (#12398 ) * Fix integer overflows due to using wrong integer size. * Add assertions / panic when overflow still happens. * Deletion of dead code to avoid need to maintain it * Some changes are not because of bugs, but rather paranoia. * Improve cmsgpack and cjson test coverage. Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>	2023-07-10 10:26:09 +03:00

1 2 3 4 5 ...

2153 Commits