redis

Commit Graph

Author	SHA1	Message	Date
YaacovHazan	5a752e1978	Redis 7.2.10	2025-07-06 15:12:33 +03:00
Mincho Paskalev	7916e58211	Remove string cat usage in tcl tests in order to support tcl8.5	2025-07-06 15:12:33 +03:00
Ozan Tezcan	c76d618209	Retry accept() even if accepted connection reports an error (CVE-2025-48367) In case of accept4() returns an error, we should check errno value and decide if we should retry accept4() without waiting next event loop iteration.	2025-07-06 15:12:33 +03:00
debing.sun	f35b72dd17	Fix out of bounds write in hyperloglog commands (CVE-2025-32023) Co-authored-by: oranagra <oran@redislabs.com>	2025-07-06 15:12:33 +03:00
YaacovHazan	080b99d982	Redis 7.2.9	2025-05-27 15:39:09 +03:00
YaacovHazan	d0eeee6e31	Check length of AOF file name in redis-check-aof (CVE-2025-27151) Ensure that the length of the input file name does not exceed PATH_MAX	2025-05-27 15:39:09 +03:00
debing.sun	35eff3d49a	Resolve bounds checks on cluster_legacy.c (#13970 ) Based on https://github.com/valkey-io/valkey/pull/1463 and https://github.com/valkey-io/valkey/pull/1481 In the failure of fully CI(https://github.com/redis/redis/actions/runs/14595343452/job/40979173087?pr=13965) in version 7.2 we are getting a number of errors like: ``` array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’ ``` Which is basically GCC telling us that we have an object which is longer than the underlying storage of the allocation. We actually do this a lot, but GCC is generally not aware of how big the underlying allocation is, so it doesn't throw this error. We are specifically getting this error because the msgBlock can be of variable length depending on the type of message, but GCC assumes it's the longest one possible. The solution I went with here was make the message type optional, so that it wasn't included in the size. I think this also makes some sense, since it's really just a helper for us to easily cast the object around. This compilation warning only occurs in version 7.2, because in [this PR](https://github.com/redis/redis/pull/13073), we started passing `-flto` to `CFLAGS` by default. It seems that in this case, GCC is unable to detect such warnings. However, this change is not present in version 7.2. So, to reproduce this compilation warning in versions after 7.2, we can pass `OPTIMIZATION=-O2` manually. --------- Co-authored-by: madolson <34459052+madolson@users.noreply.github.com>	2025-05-27 15:39:09 +03:00
Vitah Lin	e7cd611be1	Fix tls port update not reflected in CLUSTER SLOTS (#13966 ) ### Problem A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP port issue in CLUSTER SLOTS, but it seems the handling of the TLS port was overlooked. There is this comment in the `addNodeToNodeReply` function in the `cluster.c` file: ```c /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */ addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo())); addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN); ``` ### Fixed This PR fixes the TLS port issue and adds relevant tests.	2025-05-27 15:39:09 +03:00
nesty92	89aee9556d	Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958 ) This PR fixes the lag calculation by ensuring that when the consumer group's last_id is behind the first entry, the consumer group's entries read is considered invalid and recalculated from the start of the stream. Supplement to PR #13473 Close #13957 Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>	2025-05-27 15:39:09 +03:00
Stav-Levi	50e91ca7db	Fix port update not reflected in CLUSTER SLOTS (#13932 ) Close https://github.com/redis/redis/issues/13892 The config set port command updates server.port, while the cluster slots command reports the cluster slots and their associated nodes. The fix updates that node information when config set port runs, so cluster slots returns the right value.	2025-05-27 15:39:09 +03:00
YaacovHazan	31d93c5928	Release 7.2.8 (#13965 )	2025-04-23 14:54:38 +03:00
YaacovHazan	62b766aef6	Redis 7.2.8	2025-04-23 08:11:54 +00:00
YaacovHazan	42fb340ce4	Limiting output buffer for unauthenticated client (CVE-2025-21605) For unauthenticated clients the output buffer is limited to prevent them from abusing it by not reading the replies	2025-04-23 08:09:40 +00:00
guybe7	21d5e64ace	Module unblock on keys: updateStatsOnUnblock is called twice (#13405 ) This commit reverts the deletion of the condition `!bc->blocked_on_keys` that was accidentally introduced by https://github.com/redis/redis/pull/12817. In case a blocked-on-keys module client is unblocked, both `moduleUnblockClientOnKey` and `moduleHandleBlockedClients` are called, which resulted in `updateStatsOnUnblock` being called twice. Now that `moduleHandleBlockedClients` doesn't call `updateStatsOnUnblock` in case of unblocked module key-blocked clients, in the unlikely event that the module decides to call `RM_UnblockClient` on a key-blocked client, we need to call `updateStatsOnUnblock` from within `moduleBlockedClientTimedOut`, but since `moduleBlockedClientTimedOut` is not thread-safe we can't call it directly from within `RM_UnblockClient`. Added a new flag `blocked_on_keys_explicit_unblock` for that specific case, which will cause `moduleBlockedClientTimedOut` to be called from `moduleHandleBlockedClients` (which is only called from the main thread) --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-04-22 20:56:52 +08:00
Vitah Lin	067a0dac61	Fix oldTC CI dk.archive.ubuntu.com could not connect (#13961 )	2025-04-22 20:20:44 +08:00
YaacovHazan	9af9c4deff	Avoid sanitizer warning for stable CI	2025-04-22 14:54:27 +03:00
Jason	779a20058b	Ignore shardId updates from replica nodes (#13877 ) Close https://github.com/redis/redis/issues/13868 This bug was introduced by https://github.com/redis/redis/pull/13468 To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly. However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid. Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.	2025-04-22 13:49:12 +03:00
Benson-li	a26774cee1	Fix potential infinite loop of RANDOMKEY during client pause (#13863 ) The bug mentioned in this [#13862](https://github.com/redis/redis/issues/13862) has been fixed. --------- Signed-off-by: li-benson <1260437731@qq.com> Signed-off-by: youngmore1024 <youngmore1024@outlook.com> Co-authored-by: youngmore1024 <youngmore1024@outlook.com>	2025-04-22 13:49:12 +03:00
Mingyi Kang	9c5c2dc3a8	Bump actions/upload-artifact from 3 to 4 (#13780 ) Update `upload-artifact` from v3 to v4 to avoid the failure of `External Server Tests` (I encountered this error when opening [#13779](https://github.com/redis/redis/pull/13779)): > Error: This request has been automatically failed because it uses a deprecated version of `actions/upload-artifact: v3`. Learn more: https://github.blog/changelog/2024-04-16-deprecation-notice-v3-of-the-artifact-actions/	2025-04-22 13:49:12 +03:00
nafraf	6e819e188b	Fix module loadex command crash due to invalid config (#13653 ) Fix to https://github.com/redis/redis/issues/13650 providing an invalid config to a module with datatype crashes when redis tries to unload the module due to the invalid config --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-04-22 13:49:12 +03:00
guybe7	8e72b42a9a	Modules: defrag CB should take robj, not sds (#13627 ) Added a log of the keyname in the test modules to reproduce the problem (tests crash without the fix)	2025-04-22 13:49:12 +03:00
Ping Xie	bdc78175b5	Fix PONG message processing for primary-ship tracking during failovers (#13055 ) This commit updates the processing of PONG gossip messages in the cluster. When a node (B) becomes a replica due to a failover, its PONG messages include its new primary node's (A) information and B's configuration epoch is aligned with A's. This allows observer nodes to identify changes in primary-ship, addressing issues of intermediate states and enhancing cluster state consistency during topology changes. Fix #13018	2025-04-22 13:49:12 +03:00
debing.sun	30fe743638	Fix race condition issues between the main thread and module threads (#12817 ) Fix #12785 and other race condition issues. See the following isolated comments. The following report was obtained using SANITIZER thread. ```sh make SANITIZER=thread ./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate ``` 1. Fixed thread-safe issue in RM_UnblockClient() Related discussion: https://github.com/redis/redis/pull/12817#issuecomment-1831181220 * When blocking a client in a module using `RM_BlockClientOnKeys()` or `RM_BlockClientOnKeysWithFlags()` with a timeout_callback, calling RM_UnblockClient() in module threads can lead to race conditions in `updateStatsOnUnblock()`. - Introduced: Version: 6.2 PR: #7491 - Touch: `server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Harm Level: High Potentially corrupts the memory data of `cmd->latency_histogram`, `server.slowlog`, and `server.latency_events` - Solution: Differentiate whether the call to moduleBlockedClientTimedOut() comes from the module or the main thread. Since we can't know if RM_UnblockClient() comes from module threads, we always assume it does and let `updateStatsOnUnblock()` asynchronously update the unblock status. * When error reply is called in timeout_callback(), ctx is not thread-safe, eventually leading to race conditions in `afterErrorReply`. - Introduced: Version: 6.2 PR: #8217 - Touch `server.stat_total_error_replies`, `server.errors`, - Harm Level: High Potentially corrupts the memory data of `server.errors` - Solution: Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`, and asynchronously reply errors to the client. 2. Made RM_Reply() family API thread-safe Related discussion: https://github.com/redis/redis/pull/12817#discussion_r1408707239 Call chain: `RM_Reply()` -> `_addReplyToBufferOrList()` -> touch server.current_client - Introduced: Version: 7.2.0 PR: #12326 - Harm Level: None Since the module fake client won't have the `CLIENT_PUSHING` flag, even if we touch server.current_client, we can still exit after `c->flags & CLIENT_PUSHING`. - Solution Checking `c->flags & CLIENT_PUSHING` earlier. 3. Made freeClient() thread-safe Fix #12785 - Introduced: Version: 4.0 Commit: `3fcf959e60` - Harm Level: Moderate * Trigger assertion It happens when the module thread calls freeClient while the io-thread is in progress, which just triggers an assertion, and doesn't make any race conditions. * Touch `server.current_client`, `server.stat_clients_type_memory`, and `clientMemUsageBucket->clients`. It happens between the main thread and the module threads, may cause data corruption. 1. Error reset `server.current_client` to NULL, but theoretically this won't happen, because the module has already reset `server.current_client` to old value before entering freeClient. 2. corrupts `clientMemUsageBucket->clients` in updateClientMemUsageAndBucket(). 3. Causes server.stat_clients_type_memory memory statistics to be inaccurate. - Solution: * No longer counts memory usage on fake clients, to avoid updating `server.stat_clients_type_memory` in freeClient. * No longer resetting `server.current_client` in unlinkClient, because the fake client won't be evicted or disconnected in the mid of the process. * Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake client. 4. Fixed free client args without GIL Related discussion: https://github.com/redis/redis/pull/12817#discussion_r1408706695 When freeing retained strings in the module thread (refcount decr), or using them in some way (refcount incr), we should do so while holding the GIL, otherwise, they might be simultaneously freed while the main thread is processing the unblock client state. - Introduced: Version: 6.2.0 PR: #8141 - Harm Level: Low Trigger assertion or double free or memory leak. - Solution: Documenting that module API users need to ensure any access to these retained strings is done with the GIL locked 5. Fix adding fake client to server.clients_pending_write It will incorrectly log the memory usage for the fake client. Related discussion: https://github.com/redis/redis/pull/12817#issuecomment-1851899163 - Introduced: Version: 4.0 Commit: `9b01b64430` - Harm Level: None Only result in NOP - Solution: * Don't add fake client into server.clients_pending_write * Add c->conn assertion for updateClientMemUsageAndBucket() and updateClientMemoryUsage() to avoid same issue in the future. So now it will be the responsibility of the caller of both of them to avoid passing in fake client. 6. Fix calling RM_BlockedClientMeasureTimeStart() and RM_BlockedClientMeasureTimeEnd() without GIL - Introduced: Version: 6.2 PR: #7491 - Harm Level: Low Causes inaccuracies in command latency histogram and slow logs, but does not corrupt memory. - Solution: Module API users, if they know that non-thread-safe APIs will be used in multi-threading, need to take responsibility for protecting them with their own locks instead of the GIL, as using the GIL is too expensive. ### Other issue 1. RM_Yield is not thread-safe, fixed via #12905. ### Summarize 1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and `RM_Yield`, potentially preventing memory corruption, data disorder, or assertion. 2. Updated docs and module test to clarify module API users' responsibility for locking non-thread-safe APIs in multi-threading, such as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(), RM_RetainString(), and RM_HoldString(). ### About backport to 7.2 1. The implementation of (1) is not too satisfying, would like to get more eyes. 2. (2), (3) can be safely backported 3. (4), (6) just modifying the module tests and updating the documentation, no need for a backport. 4. (5) is harmless, no need for a backport. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2025-04-22 12:35:44 +03:00
debing.sun	7278a0c26a	Make RM_Yield thread-safe (#12905 ) ## Issues and solutions from #12817 1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without GIL in afterSleep() - Introduced: Version: 7.0.0 PR: #9963 - Harm Level: Very High If the module thread calls `RM_Yield()` before the main thread enters afterSleep(), and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main thread to not wait for GIL, which can lead to all kinds of unforeseen problems, including memory data corruption. - Initial / Abandoned Solution: * Added `__thread` specifier for ProcessingEventsWhileBlocked. `ProcessingEventsWhileBlocked` is used to protect against nested event processing, but event processing in the main thread and module threads should be completely independent and unaffected, so it is safer to use TLS. * Adding a cached module count to keep track of the current number of modules, to avoid having to use `dictSize()`. - Related Warnings: ``` WARNING: ThreadSanitizer: data race (pid=1136) Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0): #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124) #1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c) #2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8) Previous read of size 4 at 0x0001045990c0 by main thread: #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98) #1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64) #2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c) #3 main server.c:7220 (redis-server:arm64+0x10003f38c) ``` 2. aeApiPoll() is not thread-safe When using RM_Yield to handle events in a module thread, if the main thread has not yet entered `afterSleep()`, both the module thread and the main thread may touch `server.el` at the same time. - Introduced: Version: 7.0.0 PR: #9963 - Old / Abandoned Solution: Adding a new mutex to protect timing between after beforeSleep() and before afterSleep(). Defect: If the main thread enters the ae loop without any IO events, it will wait until the next timeout or until there is any event again, and the module thread will always hang until the main thread leaves the event loop. - Related Warnings: ``` SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask ================== ================== WARNING: ThreadSanitizer: data race (pid=14682) Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0): #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588) #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84) #2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4) #3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c) #4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c) Previous write of size 4 at 0x000100b54000 by main thread: #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588) #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84) #2 aeMain ae.c:496 (redis-server:arm64+0x100010da8) #3 main server.c:7238 (redis-server:arm64+0x10003f51c) ``` ## The final fix as the comments: https://github.com/redis/redis/pull/12817#discussion_r1436427232 Optimized solution based on the above comment: First, we add `module_gil_acquring` to indicate whether the main thread is currently in the acquiring GIL state. When the module thread starts to yield, there are two possibilities(we assume the caller keeps the GIL): 1. The main thread is in the mid of beforeSleep() and afterSleep(), that is, `module_gil_acquring` is not 1 now. At this point, the module thread will wake up the main thread through the pipe and leave the yield, waiting for the next yield when the main thread may already be in the acquiring GIL state. 2. The main thread is in the acquiring GIL state. The module thread releases the GIL, yielding CPU to give the main thread an opportunity to start event processing, and then acquires the GIL again once the main thread releases it. This is the direction mentioned in https://github.com/redis/redis/pull/12817#discussion_r1436427232. --------- Co-authored-by: Oran Agra <oran@redislabs.com>	2025-04-22 12:35:36 +03:00
YaacovHazan	ba18105722	Redis 7.2.7	2025-01-06 16:03:47 +02:00
Vitah Lin	ee4f0c5af7	Deprecate ubuntu lunar and macos-12 in workflows (#13669 ) 1. Ubuntu Lunar reached End of Life on January 25, 2024, so upgrade the ubuntu version to plucky in action `test-ubuntu-jemalloc-fortify` to pass the daily CI 2. The macOS-12 environment is deprecated so upgrade macos-12 to macos-13 in daily CI --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
YaacovHazan	e344b2b587	Fix LUA garbage collector (CVE-2024-46981) Reset GC state before closing the lua VM to prevent user data to be wrongly freed while still might be used on destructor callbacks.	2025-01-06 16:03:47 +02:00
YaacovHazan	15e212bf69	Fix Read/Write key pattern selector (CVE-2024-51741) The '%' rule must contain one or both of R/W	2025-01-06 16:03:47 +02:00
Vitah Lin	941ca6c07f	Upgrade action/checkout version and add old-chain CI actions to test gcc4.8 (#13394 ) https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/ Due to GitHub removing support for CentOS 7 in GitHub Actions, all actions utilizing CentOS 7 need to be upgraded, upgrade the centos version from `centos:7` to `quay.io/centos/centos:stream9` which is the official RedHat centos container. Create some new actions named `old-chain` to verify support for gcc 4.8. This PR also includes the upgrade of actions/checkout from version 3 to version 4. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
Yossi Gottlieb	ffde2cf74d	Use cross-platform-actions for FreeBSD support. (#12732 ) This change overcomes many stability issues experienced with the vmactions action. We need to limit VMs to 8GB for better stability, as the 13GB default seems to hang them occasionally. Shell code has been simplified since this action seem to use `bash -e` which will abort on non-zero exit codes anyway.	2025-01-06 16:03:47 +02:00
Steve	52b6c0a27b	Avoid cluster.nodes load corruption due to shard-id generation (#13468 ) PR #13428 doesn't fully resolve an issue where corruption errors can still occur on loading of cluster.nodes file - seen on upgrade where there were no shard_ids (from old Redis), 7.2.5 loading generated new random ones, and persisted them to the file before gossip/handshake could propagate the correct ones (or some other nodes unreachable). This results in a primary/replica having differing shard_id in the cluster.nodes and then the server cannot startup - reports corruption. This PR builds on #13428 by simply ignoring the replica's shard_id in cluster.nodes (if it exists), and uses the replica's primary's shard_id. Additional handling was necessary to cover the case where the replica appears before the primary in cluster.nodes, where it will first use a generated shard_id for the primary, and then correct after it loads the primary cluster.nodes entry. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
debing.sun	a904067f6f	Fix incorrect lag due to trimming stream via XTRIM command (#13473 ) ## Describe When using the `XTRIM` command to trim a stream, it does not update the maximal tombstone (`max_deleted_entry_id`). This leads to an issue where the lag calculation incorrectly assumes that there are no tombstones after the consumer group's last_id, resulting in an inaccurate lag. The reason XTRIM doesn't need to update the maximal tombstone is that it always trims from the beginning of the stream. This means that it consistently changes the position of the first entry, leading to the following scenarios: 1) First entry trimmed after maximal tombstone: If the first entry is trimmed to a position after the maximal tombstone, all tombstones will be before the first entry, so they won't affect the consumer group's lag. 2) First entry trimmed before maximal tombstone: If the first entry is trimmed to a position before the maximal tombstone, the maximal tombstone will not be updated. ## Solution Therefore, this PR optimizes the lag calculation by ensuring that when both the consumer group's last_id and the maximal tombstone are behind the first entry, the consumer group's lag is always equal to the number of remaining elements in the stream. Supplement to PR https://github.com/redis/redis/pull/13338	2025-01-06 16:03:47 +02:00
debing.sun	3ce29e0b11	Pass extensions to node if extension processing is handled by it (#13465 ) This PR is based on the commits from PR https://github.com/valkey-io/valkey/pull/52. Ref: https://github.com/redis/redis/pull/12760 Close https://github.com/redis/redis/issues/13401 This PR will replace https://github.com/redis/redis/pull/13449 Fixes compatibility of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) . With some of the extensions enabled by default in 7.2 version, new nodes running 7.2 and above start sending out larger clusterbus message payload including the ping extensions. This caused an incompatibility with node running engine versions < 7.0. Old nodes (< 7.0) would receive the payload from new nodes (> 7.2) would observe a payload length (totlen) > (estlen) and would perform an early exit and won't process the message. This fix does the following things: 1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has. 2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length. Note: A successful `PING`/`PONG` is required as a sender for a given node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions message will be sent to it. This could cause a slight delay in receiving the extensions message(s). --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>	2025-01-06 16:03:47 +02:00
debing.sun	17be6d92e8	Ensure validity of myself as master or replica when loading cluster config (#13443 ) First, we need to ensure that `curmaster` in `clusterUpdateSlotsConfigWith()` is not NULL in the line `82f00f5179/src/cluster_legacy.c (L2320)` otherwise, it will crash in the `82f00f5179/src/cluster_legacy.c (L2395)` So when loading cluster node config, we need to ensure that the following conditions are met: 1. A node must be at least one of the master or replica. 2. If a node is a replica, its master can't be NULL.	2025-01-06 16:03:47 +02:00
debing.sun	f4a6721dfd	Fix CLUSTER SHARDS command returns empty array (#13422 ) Close https://github.com/redis/redis/issues/13414 When the cluster's master node fails and is switched to another node, the first node in the shard node list (the old master) is no longer valid. Add a new method clusterGetMasterFromShard() to obtain the current master.	2025-01-06 16:03:47 +02:00
debing.sun	0e06e67a36	Fix incorrect lag field in XINFO when tombstone is after the last_id of consumer group (#13338 ) Fix #13337 This PR fixes two bugs that caused lag calculation errors. 1. When the latest tombstone is before the first entry, the tombstone may still be after the last id of the consumer group. 2. When a tombstone is after the last id of the consumer group, the group's counter will be invalid, so we should calculate the entries_read by using estimates.	2025-01-06 16:03:47 +02:00
Oran Agra	045617e10a	Fix possible crash due to OOM panic on invalid command (#13380 ) getKeysUsingKeySpecs had the range check AFTER the allocation of the keys buffer, which could lead to an OOM panic when invalid arguments are provided, leading to an overflow. The allocated memory is only used after the range check, so there's no risk of buffer overrun. The OOM panic can happen on 32bit builds, or 64bit builds running on systems with less than 4GB of RAM, and is reachable via the COMMAND GETKEYSANDFLAGS, and ACL key name validation.	2025-01-06 16:03:47 +02:00
debing.sun	4a92d66ca2	Don't keep global replication buffer reference for replicas marked CLIENT_CLOSE_ASAP (#13363 ) In certain situations, we might generate a large number of propagates (e.g., multi/exec, Lua script, or a single command generating tons of propagations) within an event loop. During the process of propagating to a replica, if the replica is disconnected(marked as CLIENT_CLOSE_ASAP) due to exceeding the output buffer limit, we should remove its reference to the global replication buffer to avoid the global replication buffer being unable to be properly trimmed due to being referenced. --------- Co-authored-by: oranagra <oran@redislabs.com>	2025-01-06 16:03:47 +02:00
gms	2c8ae06b67	Fix crash due to unblock client during slot migration (#13311 ) In #13224, we found a crash during cluster slot migration but didn't know why. So I checked every return C_OK in processCommand to see whether we were missing a duration reset, and found this one. This fix is like #12247: when we reject the command, we should reset the duration. I tested it and verified that it fixes #13224. The cause may be that a client was blocked on a stream and then, during the slot migration, got a redirect, which crashed the server. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
Ted Lyngmo	e2aaa9096f	Log the real reason for why posix_fadvise failed (#13246 ) `reclaimFilePageCache` did not set `errno` but `rdbSaveInternal` which is logging the error assumed it did. This makes sure `errno` is set. Fixes #13245 Signed-off-by: Ted Lyngmo <ted@lyncon.se>	2025-01-06 16:03:47 +02:00
debing.sun	9f823685d2	Have consistent behavior of SPUBLISH within multi/exec like regular command (#13276 ) This PR is based on the commits from PR #12944. Allow SPUBLISH command within multi/exec on replica Behavior on unstable: ``` 127.0.0.1:6380> CLUSTER NODES 39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected 8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383 127.0.0.1:6380> SPUBLISH hello world (integer) 0 127.0.0.1:6380> MULTI OK 127.0.0.1:6380(TX)> SPUBLISH hello world QUEUED 127.0.0.1:6380(TX)> EXEC (error) MOVED 866 127.0.0.1:6379 ``` With this change: ``` 127.0.0.1:6380> SPUBLISH hello world (integer) 0 127.0.0.1:6380> MULTI OK 127.0.0.1:6380(TX)> SPUBLISH hello world QUEUED 127.0.0.1:6380(TX)> EXEC 1) (integer) 0 ``` --------- Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: oranagra <oran@redislabs.com>	2025-01-06 16:03:47 +02:00
sundb	1936746e63	Fix oom-score-adj test due to no permission (#12887 ) Fix #12792 On ubuntu 23(lunar), non-root users will not be allowed to change the oom_score_adj of a process to a value that is too low. Since terminal's default oom_score_adj is 200, if we run the test on terminal, we won't be able to set the oom_score_adj of the redis process to 9 or 22, which is too low. Reproduction on ubuntu 23(lunar) terminal: ```sh $ cat /proc/`pgrep redis-server`/oom_score_adj 200 $ echo 100 > /proc/`pgrep redis-server`/oom_score_adj # success without error $ echo 99 > /proc/`pgrep redis-server`/oom_score_adj echo: write error: Permission denied ``` As from the output above, we can only set the minimum oom score of redis processes to 100. By modifying the test, make oom_score_adj only increase upwards and not decrease. --------- Co-authored-by: debing.sun <debing.sun@redis.com>	2025-01-06 16:03:47 +02:00
Oran Agra	ae6a2aa95c	Release Redis 7.2.6	2024-10-02 22:13:33 +03:00
Oran Agra	c8649f8e85	Prevent pattern matching abuse (CVE-2024-31228)	2024-10-02 22:13:33 +03:00
Oran Agra	b351d5a321	Fix ACL SETUSER Read/Write key pattern selector (CVE-2024-31227) The '%' rule must contain one or both of R/W	2024-10-02 22:13:33 +03:00
Oran Agra	fe8de4313f	Fix lua bit.tohex (CVE-2024-31449) INT_MIN value must be explicitly checked, and cannot be negated.	2024-10-02 22:13:33 +03:00
debing.sun	2ad2548747	Fixed crashes due to missed slotToKeyInit() and missed expires_cursor reset (#13315 ) This PR fixes two crashes: 1. Fix missing slotToKeyInit() when using `flushdb async` under cluster mode. https://github.com/redis/redis/issues/13205 2. Fix missing expires_cursor reset when stopping active defrag in the middle of defragment. https://github.com/redis/redis/issues/13307 If we stop active defrag in the middle of defragging db->expires, if `expires_cursor` is not reset to 0, the next time we enable active defrag again, defragLaterStep(db, ...) will be entered. However, at this time, `db` has been reset to NULL, which results in a crash. The affected code was removed by #11695 and #13058 in unstable, so we just need to backport this to 7.2.	2024-06-18 18:02:22 +08:00
YaacovHazan	f60370ce28	Redis 7.2.5	2024-05-19 09:12:35 +03:00
Yanqi Lv	464aad9ee7	fix wrong data type conversion in zrangeResultBeginStore (#13148 ) In `beginResultEmission`, -1 means the result length is not known in advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`. Although `dictExpand` won't succeed because the size overflows, I think we'd better to avoid this wrong conversion. This bug can be triggered when the source of `zrangestore` doesn't exist or we use `zrangestore` command with `byscore` or `bylex`. The impact is that dst keys will be converted to use skiplist instead of listpack. (cherry picked from commit `bad33f8738`)	2024-05-19 09:12:35 +03:00
Binbin	439b8da475	Fix redis-check-aof incorrectly considering data in manifest format as MP-AOF (#12958 ) The check in fileIsManifest misjudged the manifest file. For example, if resp aof contains "file", it will be considered a manifest file and the check will fail: ``` *3 $3 set $4 file $4 file ``` In #12951, if the preamble aof also contains it, it will also fail. Fixes #12951. The bug happened if the word "file" was mentioned in the first 1024 lines of the AOF. Now, as soon as it finds a non-comment line, it breaks (whether or not the line contains "file"). (cherry picked from commit `da727ad445`)	2024-05-19 09:12:35 +03:00
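
The manifest detection rule described in commit 439b8da475 (decide on the first non-comment line, then stop) can be sketched roughly as follows. This is a Python illustration of the described logic only, not the C code of `fileIsManifest`. The helper name `looks_like_manifest`, the `#` comment prefix, the `file ` line prefix, and the sample manifest line are all assumptions made for the example.

```python
def looks_like_manifest(lines):
    """Hypothetical helper: classify a file from its first non-comment line only."""
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no signal.
        if not line or line.startswith("#"):
            continue
        # Stop at the first meaningful line, whether or not it matches,
        # so a later "file" token in RESP data cannot cause a false positive.
        return line.startswith("file ")
    return False


# RESP data from the commit message: "file" appears, but not on the first line.
resp_aof = ["*3", "$3", "set", "$4", "file", "$4", "file"]
# Illustrative manifest content (assumed format).
manifest = ["# comment", "file appendonly.aof.1.base.rdb seq 1 type b"]

print(looks_like_manifest(resp_aof))
print(looks_like_manifest(manifest))
```

Under these assumptions the RESP sample is rejected at its first line (`*3`), while the manifest sample is accepted after its comment line is skipped.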

11919 Commits