Remove DENYOOM flag from hexpire / hexpireat / hpexpire / hpexpireat
commands.
The h(p)expire(at) commands may allocate some memory, but not a significant amount. Similarly, the EXPIRE command does not have the DENYOOM flag; this change aligns the EXPIRE and HEXPIRE commands in this respect.
This PR adds three new hash commands: HGETDEL, HGETEX and HSETEX. These commands let users perform multiple operations in one atomic step, e.g., setting a hash field and updating its TTL with a single command. Previously, this was only possible by calling the HSET and HEXPIRE commands one after the other, as illustrated below.
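For illustration, a hypothetical session (key, field and value names invented), first the old two-step approach, then the new atomic one:
```
> HSET mykey f1 v1
(integer) 1
> HEXPIRE mykey 100 FIELDS 1 f1
1) (integer) 1

> HSETEX mykey EX 100 FIELDS 1 f1 v1
(integer) 1
```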
- **HGETDEL command**
```
HGETDEL <key> FIELDS <numfields> field [field ...]
```
**Description**
Get and delete the value of one or more fields of a given hash key
**Reply**
Array reply: a list of the values associated with each field, or nil if the field does not exist.
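An illustrative session (names invented):
```
> HSET h f1 v1 f2 v2
(integer) 2
> HGETDEL h FIELDS 2 f1 f3
1) "v1"
2) (nil)
> HGETALL h
1) "f2"
2) "v2"
```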
- **HGETEX command**
```
HGETEX <key>
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | PERSIST]
FIELDS <numfields> field [field ...]
```
**Description**
Get the value of one or more fields of a given hash key, and optionally
set their expiration
**Options:**
EX seconds: Set the specified expiration time, in seconds.
PX milliseconds: Set the specified expiration time, in milliseconds.
EXAT timestamp-seconds: Set the specified Unix time at which the field
will expire, in seconds.
PXAT timestamp-milliseconds: Set the specified Unix time at which the
field will expire, in milliseconds.
PERSIST: Remove the time to live associated with the field.
**Reply**
Array reply: a list of the values associated with each field, or nil if the field does not exist.
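Continuing the illustrative session above, reading a field while attaching a 60-second TTL to it (HTTL is shown only to inspect the result):
```
> HGETEX h EX 60 FIELDS 1 f2
1) "v2"
> HTTL h FIELDS 1 f2
1) (integer) 60
```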
- **HSETEX command**
```
HSETEX <key>
[FNX | FXX]
[EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT
unix-time-milliseconds | KEEPTTL]
FIELDS <numfields> field value [field value...]
```
**Description**
Set the value of one or more fields of a given hash key, and optionally
set their expiration
**Options:**
FNX: Only set the fields if all do not already exist.
FXX: Only set the fields if all already exist.
EX seconds: Set the specified expiration time, in seconds.
PX milliseconds: Set the specified expiration time, in milliseconds.
EXAT timestamp-seconds: Set the specified Unix time at which the field
will expire, in seconds.
PXAT timestamp-milliseconds: Set the specified Unix time at which the
field will expire, in milliseconds.
KEEPTTL: Retain the time to live associated with the field.
Note: If no option is provided, any associated expiration time will be discarded, similar to how the SET command behaves.
**Reply**
Integer reply: 0 if no fields were set
Integer reply: 1 if all the fields were set
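An illustrative use of the field-existence options (names invented):
```
> HSETEX h FXX EX 60 FIELDS 1 nosuchfield v1
(integer) 0
> HSETEX h FNX EX 60 FIELDS 1 f3 v3
(integer) 1
```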
MEMORY USAGE on a List samples quicklist entries, but does not account for how many elements are in each sampled node. This can skew the estimate when the sampled nodes are not balanced.
The fix calculates the average element size in the sampled nodes instead of the average node size.
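A simplified illustration of the skew: suppose a list has two quicklist nodes, one holding 100 elements totaling 1000 bytes and one holding a single 10-byte element, and sampling happens to pick only the small node. Averaging node sizes estimates 10 bytes * 2 nodes = 20 bytes, while averaging element sizes estimates 10 bytes * 101 elements = 1010 bytes, which matches the true size.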
### Background
AOF is often used as an effective data-recovery method, but if we have two AOFs from different nodes, it is hard to tell which one has the latest data. Generally, we determine whose data is more up-to-date by reading the latest modification time of the AOF file, but because of replication delay this can mislead us: even if both master and replica write to the AOF at the same time, the data on the master is more up-to-date (there are commands that haven't arrived at the replica yet, or a large number of commands have accumulated on the replica side), so we may make the wrong decision.
### Solution
The replication offset always increments when AOF is enabled, even if there is no replica, so we think the replication offset is a better way to determine which AOF has more up-to-date data: whichever has the larger offset has the newer data. We therefore add the start replication offset info to the AOF manifest, as below.
```
file appendonly.aof.2.base.rdb seq 2 type b
file appendonly.aof.2.incr.aof seq 2 type i startoffset 224
```
And if the AOF file is closed gracefully rather than by a crash, e.g., via `shutdown`, `kill signal 15` or `config set appendonly no`, we also add the end replication offset, as below.
```
file appendonly.aof.2.base.rdb seq 2 type b
file appendonly.aof.2.incr.aof seq 2 type i startoffset 224 endoffset 532
```
#### Things to pay attention to
- For the BASE AOF, we do not add `startoffset` and `endoffset` info, since we cannot know the starting replication offset of its data, and it would not help us determine which AOF has more up-to-date data.
- For AOFs from an old version, we also don't add `startoffset` and `endoffset` info, since we don't know their starting replication offset either. If we assumed a start offset of 0, we might make the judgment even less accurate. For example, if the master has just rewritten its AOF, its INCR AOF will inevitably be very small, whereas if the replica has not rewritten its AOF for a long time, its INCR AOF might be much larger; an offset-based comparison would then make the wrong decision. So for these files we still just check the timestamp instead of adding offset info.
- If the last INCR AOF has `startoffset` or `endoffset`, we need to restore `server.master_repl_offset` from them to avoid a rollback of the `startoffset` of the next INCR AOF. If it has `endoffset`, we just use that value as `server.master_repl_offset`; importantly, we must then remove this information from the manifest file, so that we don't load a stale `endoffset` the next time the manifest is read. If it only has `startoffset`, we calculate `server.master_repl_offset` as `startoffset` plus the file size.
### How to determine which one has more up-to-date data
If an AOF has a larger replication offset, it has more up-to-date data. Here is how to get the AOF offset:
Read the AOF manifest file to obtain information about **the last INCR AOF**:
1. If the last INCR AOF has an `endoffset` field, we can directly use `endoffset` as the replication offset of the AOF.
2. If there is no `endoffset` (e.g., Redis crashed abnormally) but the last INCR AOF has a `startoffset` field, we can get the replication offset of the AOF as `startoffset` plus the file size.
3. Finally, if the AOF has neither `startoffset` nor `endoffset` (perhaps it comes from an old version, and the new Redis version has not rewritten the AOF yet), we still need to check the modification timestamp of the last INCR AOF.
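A minimal C sketch of this decision logic, with a hypothetical `last_incr_info` struct standing in for the parsed manifest entry (-1 marking an absent field); the real implementation lives in the AOF loading code:
```c
/* Hypothetical view of the last INCR AOF's manifest entry. */
typedef struct {
    long long startoffset;  /* -1 if absent from the manifest */
    long long endoffset;    /* -1 if absent from the manifest */
    long long file_size;    /* size of the INCR AOF file in bytes */
} last_incr_info;

/* Returns the replication offset represented by the AOF set,
 * or -1 when only modification timestamps can be compared. */
static long long aof_repl_offset(const last_incr_info *info) {
    if (info->endoffset != -1)                  /* case 1: graceful close */
        return info->endoffset;
    if (info->startoffset != -1)                /* case 2: abnormal crash */
        return info->startoffset + info->file_size;
    return -1;                                  /* case 3: old-format AOF */
}
```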
### TODO
Fix, in a future PR, the inconsistency between AOF size and replication offset caused by PING: because we increment the replication offset when sending PING/REPLCONF to the replica but do not write that data to the AOF file, the starting offset of the AOF file plus its size may be inconsistent with the actual replication offset.
The reason the master sends PING is to keep the connection with the replica active, so the master need not send PING to replicas if it has already sent replication stream during the past `repl_ping_slave_period` interval.
Now the master only sends PINGs and increases `master_repl_offset` if there is no traffic, so this PR also reduces the impact of the issue in https://github.com/redis/redis/pull/13773, though of course it does not resolve it completely.
> Fix, in a future PR, the inconsistency between AOF size and replication offset caused by PING: because we increment the replication offset when sending PING/REPLCONF to the replica but do not write that data to the AOF file, the starting offset of the AOF file plus its size may be inconsistent with the actual replication offset.
```
if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
    server.aof_last_incr_fsync_offset != server.aof_last_incr_size &&
    server.mstime - server.aof_last_fsync >= 1000 &&
    !(sync_in_progress = aofFsyncInProgress())) {
    goto try_fsync;
```
In https://github.com/redis/redis/pull/12622, when appendfsync=everysec, if Redis has written some data to the AOF but not yet called `fsync`, and less than 1 second has passed since the last `fsync`, Redis won't fsync the AOF but will still update `fsynced_reploff_pending`, causing `WAITAOF` to return prematurely.
This bug was introduced by https://github.com/redis/redis/pull/12622, in 7.4.
The bug fix (1bd6688bca) is as follows:
```diff
diff --git a/src/aof.c b/src/aof.c
index 8ccd8d8f8..521b30449 100644
--- a/src/aof.c
+++ b/src/aof.c
@@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) {
* in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated
* (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on
* the higher offset (which contains data that was only propagated to replicas, and not to AOF) */
- if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO)
+ if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size &&
+ !(sync_in_progress = aofFsyncInProgress()))
+ {
atomicSet(server.fsynced_reploff_pending, server.master_repl_offset);
+ }
return;
```
Additionally, we slightly refactored the AOF fsync code to make it simpler, as in 584f008d1c.
Currently we have `RedisModule_LoadConfigs`, which the module is expected to call during `OnLoad`; it sets the configuration values from the config queue, or else the default values.
The problem is that the module might still want to support loading values from the command line. If we want to give precedence to the config-file values, the module needs to set its values before calling the load-config function, but then the API overrides the variables that were set from the module command line with default values.
The new API solves this with the following flow:
1. The module registers its configuration parameters with Redis.
2. The module calls `RedisModule_LoadDefaultConfigs`, which loads the default values for all of the module's registered configuration parameters.
3. The module sets its variables internally using the values it got from the command line.
4. The module calls `RedisModule_LoadConfigs`, which sets the values based on the Redis configuration file.
This allows the default values to be set, the module to override them, and Redis to override what the module wrote. In short, it defines a logical flow and ordering for where the parameters' values come from.
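A minimal module sketch of this ordering, assuming a single numeric config named `limit`; the command-line parsing in step 3 is left schematic, and the exact shape of the new `RedisModule_LoadDefaultConfigs` call follows the description above:
```c
#include "redismodule.h"

static long long my_limit; /* backing variable for the "limit" config */

static long long getLimit(const char *name, void *privdata) {
    (void)name; (void)privdata;
    return my_limit;
}

static int setLimit(const char *name, long long val, void *privdata,
                    RedisModuleString **err) {
    (void)name; (void)privdata; (void)err;
    my_limit = val;
    return REDISMODULE_OK;
}

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (RedisModule_Init(ctx, "mymod", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    /* 1. Register the configuration parameters. */
    if (RedisModule_RegisterNumericConfig(ctx, "limit", 100,
            REDISMODULE_CONFIG_DEFAULT, 0, 10000,
            getLimit, setLimit, NULL, NULL) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    /* 2. Load the registered defaults (the new API). */
    if (RedisModule_LoadDefaultConfigs(ctx) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    /* 3. Override internally from the module command line (argv/argc). */
    (void)argv; (void)argc; /* ... parse argv and assign my_limit ... */

    /* 4. Let redis.conf / CONFIG SET values override everything above. */
    if (RedisModule_LoadConfigs(ctx) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    return REDISMODULE_OK;
}
```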
The change done by all these previous commits:
- d9134f8f97a40fd630b9361ad5f83c034855f164
- 012c198be450f10f6e3a827de4e92ac349455c43
- ac2694fb69c88f9fe26855ec46a6f7353db7e294
- 492dbf192799539a8850a8d3f35ad8231a034773
- 49fd5c32588
Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
The comment for the `repl-ping-replica-period` option in `redis.conf`
mistakenly refers to `repl_ping_replica_period` (with underscores).
This PR corrects it to use the proper format with dashes, as per the
actual configuration option.
Although commit #6ceadfb58 improves the behavior of the GETRANGE command, we can't accept it, as we should avoid breaking changes for non-critical bug fixes.
This reverts commit 6ceadfb580.
Although commit #7f0a7f0a6 improves the performance of the SCAN command, we can't accept it, as we should avoid breaking changes for non-critical bug fixes.
This reverts commit 7f0a7f0a69.
# PR: Add Mechanism for Internal Commands and Connections in Redis
This PR introduces a mechanism to handle **internal commands and
connections** in Redis. It includes enhancements for command
registration, internal authentication, and observability.
## Key Features
1. **Internal Command Flag**:
- Introduced a new **module command registration flag**: `internal`.
- Commands marked with `internal` can only be executed by **internal
connections**, AOF loading flows, and master-replica connections.
- For any other connection, these commands will appear as non-existent.
2. **Support for internal authentication added to `AUTH`**:
- Used by supplying the special username `internal connection` together with the correct internal password, i.e.: `AUTH "internal connection" <internal_secret>` (see the example after this list).
- No user-defined ACL username can have this name, since spaces are not allowed by the ACL parser.
- Allows connections to authenticate as **internal connections**.
- Authenticated internal connections can execute internal commands successfully.
3. **Module API for Internal Secret**:
- Added the `RedisModule_GetInternalSecret()` API, which exposes the internal secret that should be used as the password for the new `AUTH "internal connection" <password>` command.
- This API enables modules to authenticate against other shards as local connections.
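An illustrative session (the module command name and secret are invented):
```
127.0.0.1:6379> MYMODULE.INTERNALCMD
(error) ERR unknown command 'MYMODULE.INTERNALCMD'
127.0.0.1:6379> AUTH "internal connection" <internal_secret>
OK
127.0.0.1:6379> MYMODULE.INTERNALCMD
OK
```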
## Notes on Behavior
- **ACL validation**:
- Commands dispatched by internal connections bypass ACL validation, to
give the caller full access regardless of the user with which it is
connected.
- **Command Visibility**:
- Internal commands **do not appear** in `COMMAND <subcommand>` and
`MONITOR` for non-internal connections.
- Internal commands **are logged** in the slow log, latency report and
commands' statistics to maintain observability.
- **`RM_Call()` Updates**:
- **Non-internal connections**:
- Cannot execute internal commands when the command is sent with the `C` flag (otherwise they can).
- Internal connections bypass ACL validations (i.e., run as the unrestricted user).
- **Internal commands' success**:
- Internal commands succeed when sent from an internal connection (i.e., one authenticated via the new `AUTH "internal connection" <internal_secret>` mechanism), from an AOF loading process, or from a master via the replication link.
Any other connection that attempts to execute an internal command fails with the `unknown command` error message.
- **`CLIENT LIST` flags**:
- Added the `I` flag, to indicate that the connection is internal.
- **Lua Scripts**:
- Prevented internal commands from being executed via Lua scripts.
---------
Co-authored-by: Meir Shpilraien <meir@redis.com>
During fullsync, before loading the RDB on the replica, we stop the AOF child to prevent a copy-on-write disaster.
Once the RDB is loaded, AOF is started again and it will trigger an AOF rewrite. With https://github.com/redis/redis/pull/13732, this behavior was changed for rdbchannel replication: currently, we start AOF after the replication buffer has been streamed to the db. This PR changes it back to starting AOF just after the RDB is loaded (before the repl buffer is streamed).
Both approaches have pros and cons. If we start AOF before streaming the repl buffer, we may still face copy-on-write issues, as the repl buffer potentially includes a large amount of changes. If we wait until the replication buffer has drained, we delay starting AOF persistence.
Additional changes are introduced as part of this PR:
- Interface change:
Added the `mem_replica_full_sync_buffer` field to the `INFO MEMORY` command reply. During full sync, it shows the total memory consumed by the accumulated replication stream buffer on the replica. The same metric was added to the `MEMORY STATS` command reply as the `replica.fullsync.buffer` field.
- Fixes:
- Count the replica's repl stream buffer size as part of the 'memory overhead' calculation for fields in the "INFO MEMORY" and "MEMORY STATS" outputs. Before this PR, the repl buffer was not counted as part of the memory overhead calculation, causing misreports for fields like `used_memory_overhead` and `used_memory_dataset` in "INFO MEMORY" and for the `overhead.total` field in the "MEMORY STATS" reply.
- Dismiss the replica's replication stream buffer memory in the fork to reduce COW impact during a fork.
- Fixed a few time-sensitive flaky tests, deleted a noop statement, and fixed some comments and failure messages in the rdbchannel tests.
The PR introduces a new shared secret that is shared across all the nodes of a Redis cluster. The main idea is to leverage the cluster bus to share a secret between all the nodes, such that later the nodes will be able to authenticate using this secret and send internal commands to each other (see #13740 for more information about internal commands).
The way the shared secret is chosen is the following:
1. Each node, when it starts, randomly generates its own internal secret.
2. Each node shares its internal secret over the cluster ping messages.
3. If a node gets a ping message with a secret smaller than its current secret, it adopts it.
4. Eventually all nodes adopt the minimal secret.
The convergence of the secret is as good as the convergence of the topology.
To extend the ping messages to carry the secret, we leverage the extension mechanism. Nodes that run an older Redis version will simply ignore those extensions.
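A minimal sketch of the adoption rule (all names hypothetical; the real code handles the extension parsing and secret length):
```c
#include <string.h>

#define SECRET_LEN 32
static char my_secret[SECRET_LEN]; /* randomly generated at startup */

/* Called when a cluster ping message carries the secret extension. */
static void on_ping_secret(const char *peer_secret) {
    /* Adopt the smaller secret; all nodes converge to the cluster-wide
     * minimum as the ping messages propagate. */
    if (memcmp(peer_secret, my_secret, SECRET_LEN) < 0)
        memcpy(my_secret, peer_secret, SECRET_LEN);
}
```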
Specific tests were added to verify that eventually all nodes see the secret. In addition, verification of the secret was added to the test infra in `cluster_config_consistent` and `assert_cluster_state`.
This PR adds a flag to the `RM_GetContextFlags` module-API function that indicates whether the context may execute debug commands, according to Redis's standards.
This PR introduces Codecov to automate code coverage tracking for our
project's tests.
For more information about the Codecov platform, please refer to
https://docs.codecov.com/docs/quick-start
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR addresses an issue where, if a module does not provide a defragmentation callback, we cannot defragment the fragmentation it generates, yet the defragmentation process still considers a large amount of fragmentation to be present, leading to more aggressive defragmentation efforts that ultimately have no effect.
To mitigate this, the PR introduces a mechanism to gradually reduce the CPU consumed by defragmentation when it is ineffective. This occurs when the fragmentation rate drops below 2% and the hit ratio is less than 1%, or when the fragmentation rate increases by no more than 2%. The CPU consumption is gradually decreased until it reaches the minimum threshold defined by `active-defrag-cycle-min`.
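A minimal sketch of the decay rule, with invented names and a simple percentage budget; the real logic is integrated into the active-defrag cycle:
```c
/* Illustrative only: gradually shrink the defrag CPU budget when the
 * effort is ineffective, per the thresholds described above. */
static int defrag_cpu_pct = 25; /* starts at the configured cycle budget */

static void maybe_decay_defrag_cpu(float frag_pct, float hit_ratio_pct,
                                   float frag_growth_pct, int cycle_min) {
    int ineffective = (frag_pct < 2.0f && hit_ratio_pct < 1.0f) ||
                      (frag_growth_pct <= 2.0f);
    if (ineffective && defrag_cpu_pct > cycle_min)
        defrag_cpu_pct--; /* decay toward active-defrag-cycle-min */
}
```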
---------
Co-authored-by: oranagra <oran@redislabs.com>
After upgrading to Ubuntu 24.04, clang 18 can detect the runtime error "call to function XXX through pointer to incorrect function type", and our daily CI reports these errors via UndefinedBehaviorSanitizer (UBSan):
https://github.com/redis/redis/actions/runs/12738281720/job/35500380251#step:6:346
We now add generic versions of some existing `free` functions to support calling them through a `(void *)` pointer; they are just wrapper functions that cast the data type and call the corresponding typed functions.
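A sketch of the pattern, using `quicklistRelease()` as the typed function; the wrapper name is illustrative:
```c
/* Opaque type and the existing typed free function (see quicklist.h). */
typedef struct quicklist quicklist;
void quicklistRelease(quicklist *ql);

/* Wrapper whose signature matches the void (*)(void *) callback slot;
 * it only casts and forwards, so the indirect call is now well-typed. */
void quicklistReleaseGeneric(void *ql) {
    quicklistRelease((quicklist *)ql);
}
```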
This PR is based on:
- https://github.com/redis/redis/pull/12109
- https://github.com/valkey-io/valkey/pull/60
Closes: https://github.com/redis/redis/issues/11678
**Motivation**
During a full sync, while the master is delivering the RDB to the replica, incoming write commands are kept in a replication buffer so they can be sent to the replica once RDB delivery is completed. If RDB delivery takes a long time, this can create memory pressure on the master. Also, once a replica connection accumulates replication data larger than the output buffer limits, the master will kill the replica connection, which may cause a replication failure.
The main benefit of rdb channel replication is streaming incoming commands in parallel with the RDB delivery. This approach shifts replication stream buffering to the replica and reduces the load on the master. We do this by opening another connection for RDB delivery: the main channel on the replica receives the replication stream while the rdb channel receives the RDB.
This feature also helps reduce the master's main-process CPU load. By opening a dedicated connection for the RDB transfer, the bgsave process has access to the new connection and streams the RDB directly to the replicas. Before this change, due to a TLS connection restriction, the bgsave process wrote RDB bytes to a pipe and the main process forwarded them to the replica. This is no longer necessary: the main process avoids these expensive socket read/write syscalls, and RDB delivery to the replica is faster as it skips this step.
In summary, replication will be faster and the master's performance during full syncs will improve.
**Implementation steps**
1. When the replica connects to the master, it sends 'rdb-channel-repl' as part of the capability exchange to let the master know the replica supports rdb channel.
2. When the replica lacks sufficient data for PSYNC, the master sends a +RDBCHANNELSYNC reply with the replica's client id. As the next step, the replica opens a new connection (rdb-channel) and configures it against the master with the appropriate capabilities and requirements. It also sends the given client id back to the master over the rdbchannel, so that the master can associate these channels. (The initial replica connection will be referred to as the main channel.) Then, the replica requests a fullsync using the RDB channel.
3. Prior to forking, the master attaches the replica's main channel to the replication backlog to deliver the replication stream starting at the snapshot end offset.
4. The master main process sends the replication stream via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. The replica accumulates the replication stream in a local buffer while the RDB is being loaded into memory.
5. Once the replica completes loading the RDB, it drops the rdb channel and streams the accumulated replication stream into the db. Sync is completed.
**Some details**
- Currently, rdbchannel replication is supported only if `repl-diskless-sync` is enabled on the master. Otherwise, replication happens over a single connection as before.
- On the replica, there is a limit to replication stream buffering. The replica uses a new config, `replica-full-sync-buffer-limit`, to limit the number of bytes to accumulate. If it is not set, the replica inherits the `client-output-buffer-limit <replica>` hard limit config. If we reach this limit, the replica stops accumulating. This is not a failure scenario, though; further accumulation will happen on the master side, and depending on the configured limits there, the master may kill the replica connection.
**API changes in INFO output:**
1. New replica state: `send_bulk_and_stream`. Indicates full sync is
still in progress for this replica. It is receiving replication stream
and rdb in parallel.
```
slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0
```
Replica state changes in steps:
- First, the replica sends PSYNC and receives +RDBCHANNELSYNC: `state=wait_bgsave`
- After the replica connects with the rdbchannel and delivery starts: `state=send_bulk_and_stream`
- After full sync: `state=online`
2. On replica side, replication stream buffering metrics:
- replica_full_sync_buffer_size: Currently accumulated replication
stream data in bytes.
- replica_full_sync_buffer_peak: Peak number of bytes that this instance
accumulated in the lifetime of the process.
```
replica_full_sync_buffer_size:20485
replica_full_sync_buffer_peak:1048560
```
**API changes in CLIENT LIST**
In `client list` output, rdbchannel clients will have the 'C' flag in addition to the 'S' replica flag:
```
id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0
```
**Config changes:**
- `replica-full-sync-buffer-limit`: Controls how much replication data the replica can accumulate during rdbchannel replication. If it is not set, a value of 0 means the replica will inherit the `client-output-buffer-limit <replica>` hard limit config to limit accumulated data.
- `repl-rdb-channel` is added as a hidden config. This is mostly for testing, as we need to support both rdbchannel replication and the older single-connection replication (to keep compatibility with older versions; rdbchannel replication will not be enabled if repl-diskless-sync is not enabled). It affects both the master (whether to respond to rdb channel requests) and the replica (whether to declare the capability). See the snippet below.
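An illustrative redis.conf snippet (values invented):
```
repl-diskless-sync yes                 # required for rdbchannel replication
replica-full-sync-buffer-limit 256mb   # 0 = inherit the replica output buffer hard limit
```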
**Internal API changes:**
Changes that were introduced to Redis replication:
- A new replication capability is added to the replconf command: `capa rdb-channel-repl`. It indicates the replica is capable of rdb channel replication. The replica sends it when it connects to the master, along with other capabilities.
- If the replica needs a fullsync, the master replies `+RDBCHANNELSYNC <client-id>` to the replica's PSYNC request.
- When the replica opens the rdbchannel connection, it sends `rdb-channel 1` as part of the replconf command to let the master know this is the rdb channel. It also sends `main-ch-client-id <client-id>` as part of the replconf command so the master can associate the channels.
**Testing:**
As rdbchannel replication is enabled by default, we run the whole test suite with it. However, as we need to support both rdbchannel and single-connection replication, we run some tests twice with the `repl-rdb-channel yes/no` config.
**Replica state diagram**
```
* * Replica state machine *
*
* Main channel state
* ┌───────────────────┐
* │RECEIVE_PING_REPLY │
* └────────┬──────────┘
* │ +PONG
* ┌────────▼──────────┐
* │SEND_HANDSHAKE │ RDB channel state
* └────────┬──────────┘ ┌───────────────────────────────┐
* │+OK ┌───► RDB_CH_SEND_HANDSHAKE │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_AUTH_REPLY │ │ REPLCONF main-ch-client-id <clientid>
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_AUTH_REPLY │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_PORT_REPLY │ │ │ +OK
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_REPLCONF_REPLY│
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_IP_REPLY │ │ │ +OK
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_FULLRESYNC │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_CAPA_REPLY │ │ │+FULLRESYNC
* └────────┬──────────┘ │ │Rdb delivery
* │ │ ┌──────────────▼────────────────┐
* ┌────────▼──────────┐ │ │ RDB_CH_RDB_LOADING │
* │SEND_PSYNC │ │ └──────────────┬────────────────┘
* └─┬─────────────────┘ │ │ Done loading
* │PSYNC (use cached-master) │ │
* ┌─▼─────────────────┐ │ │
* │RECEIVE_PSYNC_REPLY│ │ ┌────────────►│ Replica streams replication
* └─┬─────────────────┘ │ │ │ buffer into memory
* │ │ │ │
* │+RDBCHANNELSYNC client-id │ │ │
* ├──────┬───────────────────┘ │ │
* │ │ Main channel │ │
* │ │ accumulates repl data │ │
* │ ┌──▼────────────────┐ │ ┌───────▼───────────┐
* │ │ REPL_TRANSFER ├───────┘ │ CONNECTED │
* │ └───────────────────┘ └────▲───▲──────────┘
* │ │ │
* │ │ │
* │ +FULLRESYNC ┌───────────────────┐ │ │
* ├────────────────► REPL_TRANSFER ├────┘ │
* │ └───────────────────┘ │
* │ +CONTINUE │
* └──────────────────────────────────────────────┘
*/
```
-----
This PR also contains changes and ideas from:
- https://github.com/valkey-io/valkey/pull/837
- https://github.com/valkey-io/valkey/pull/1173
- https://github.com/valkey-io/valkey/pull/804
- https://github.com/valkey-io/valkey/pull/945
- https://github.com/valkey-io/valkey/pull/989
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: naglera <anagler123@gmail.com>
Co-authored-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
This update refactors prepareClientToWrite by introducing _prepareClientToWrite for inline checks within networking.c, and separates replica and non-replica handling for pending replies and writes (_clientHasPendingRepliesSlave/NonSlave and _writeToClientSlave/NonSlave).
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Found by @ShooterIT
## Describe
If a client first issues a command with a very large number of parameters, such as 10,000, argv will be expanded to accommodate all 10,000. If subsequent commands have fewer than 10,000 parameters, this argv will continue to be reused and will never be shrunk.
## Solution
When determining whether it is necessary to rebuild argv, if the length of the previous argv already exceeds 1024, we now always recreate argv instead of reusing it.
## Free argv in cron
Add a new condition in cron to determine whether argv needs to be resized: when the capacity exceeds 128 entries, we resize it regardless, to avoid a single client consuming too much memory. argv will now occupy a maximum of 128 * 8 bytes per client.
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>