The comment for the `repl-ping-replica-period` option in `redis.conf`
mistakenly refers to `repl_ping_replica_period` (with underscores).
This PR corrects it to use the proper format with dashes, as per the
actual configuration option.
Although the commit #6ceadfb58 improves GETRANGE command behavior,
we can't accept it as we should avoid breaking changes for non-critical bug fixes.
This reverts commit 6ceadfb580.
Although the commit #7f0a7f0a6 improves the performance of the SCAN command,
we can't accept it as we should avoid breaking changes for non-critical bug fixes.
This reverts commit 7f0a7f0a69.
# PR: Add Mechanism for Internal Commands and Connections in Redis
This PR introduces a mechanism to handle **internal commands and
connections** in Redis. It includes enhancements for command
registration, internal authentication, and observability.
## Key Features
1. **Internal Command Flag**:
- Introduced a new **module command registration flag**: `internal`.
- Commands marked with `internal` can only be executed by **internal
connections**, AOF loading flows, and master-replica connections.
- For any other connection, these commands will appear as non-existent.
2. **Support for internal authentication added to `AUTH`**:
- Used by depicting the special username `internal connection` with the
right internal password, i.e.,: `AUTH "internal connection"
<internal_secret>`.
- No user-defined ACL username can have this name, since spaces are not
aloud in the ACL parser.
- Allows connections to authenticate as **internal connections**.
- Authenticated internal connections can execute internal commands
successfully.
4. **Module API for Internal Secret**:
- Added the `RedisModule_GetInternalSecret()` API, that exposes the
internal secret that should be used as the password for the new `AUTH
"internal connection" <password>` command.
- This API enables the modules to authenticate against other shards as
local connections.
## Notes on Behavior
- **ACL validation**:
- Commands dispatched by internal connections bypass ACL validation, to
give the caller full access regardless of the user with which it is
connected.
- **Command Visibility**:
- Internal commands **do not appear** in `COMMAND <subcommand>` and
`MONITOR` for non-internal connections.
- Internal commands **are logged** in the slow log, latency report and
commands' statistics to maintain observability.
- **`RM_Call()` Updates**:
- **Non-internal connections**:
- Cannot execute internal commands when the command is sent with the `C`
flag (otherwise can).
- Internal connections bypass ACL validations (i.e., run as the
unrestricted user).
- **Internal commands' success**:
- Internal commands succeed upon being sent from either an internal
connection (i.e., authenticated via the new `AUTH "internal connection"
<internal_secret>` API), an AOF loading process, or from a master via
the replication link.
Any other connections that attempt to execute an internal command fail
with the `unknown command` error message raised.
- **`CLIENT LIST` flags**:
- Added the `I` flag, to indicate that the connection is internal.
- **Lua Scripts**:
- Prevented internal commands from being executed via Lua scripts.
---------
Co-authored-by: Meir Shpilraien <meir@redis.com>
During fullsync, before loading RDB on the replica, we stop aof child to
prevent copy-on-write disaster.
Once rdb is loaded, aof is started again and it will trigger aof
rewrite. With https://github.com/redis/redis/pull/13732 , for rdbchannel
replication, this behavior was changed. Currently, we start aof after
replication buffer is streamed to db. This PR changes it back to start
aof just after rdb is loaded (before repl buffer is streamed)
Both approaches may have pros and cons. If we start aof before streaming
repl buffers, we may still face with copy-on-write issues as repl
buffers potentially include large amount of changes. If we wait until
replication buffer drained, it means we are delaying starting aof
persistence.
Additional changes are introduced as part of this PR:
- Interface change:
Added `mem_replica_full_sync_buffer` field to the `INFO MEMORY` command
reply. During full sync, it shows total memory consumed by accumulated
replication stream buffer on replica. Added same metric to `MEMORY
STATS` command reply as `replica.fullsync.buffer` field.
- Fixes:
- Count repl stream buffer size of replica as part of 'memory overhead'
calculation for fields in "INFO MEMORY" and "MEMORY STATS" outputs.
Before this PR, repl buffer was not counted as part of memory overhead
calculation, causing misreports for fields like `used_memory_overhead`
and `used_memory_dataset` in "INFO STATS" and for `overhead.total` field
in "MEMORY STATS" command reply.
- Dismiss replication stream buffers memory of replica in the fork to
reduce COW impact during a fork.
- Fixed a few time sensitive flaky tests, deleted a noop statement,
fixed some comments and fail messages in rdbchannel tests.
The PR introduces a new shared secret that is shared over all the nodes
on the Redis cluster. The main idea is to leverage the cluster bus to
share a secret between all the nodes such that later the nodes will be
able to authenticate using this secret and send internal commands to
each other (see #13740 for more information about internal commands).
The way the shared secret is chosen is the following:
1. Each node, when start, randomly generate its own internal secret.
2. Each node share its internal secret over the cluster ping messages.
3. If a node gets a ping message with secret smaller then his current
secret, it embrace it.
4. Eventually all nodes should embrace the minimal secret
The converges of the secret is as good as the topology converges.
To extend the ping messages to contain the secret, we leverage the
extension mechanism. Nodes that runs an older Redis version will just
ignore those extensions.
Specific tests were added to verify that eventually all nodes see the
secrets. In addition, a verification was added to the test infra to
verify the secret on `cluster_config_consistent` and to
`assert_cluster_state`.
This PR adds a flag to the `RM_GetContextFlags` module-API function that
depicts whether the context may execute debug commands, according to
redis's standards.
This PR introduces Codecov to automate code coverage tracking for our
project's tests.
For more information about the Codecov platform, please refer to
https://docs.codecov.com/docs/quick-start
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR addresses an issue where if a module does not provide a
defragmentation callback, we cannot defragment the fragmentation it
generates. However, the defragmentation process still considers a large
amount of fragmentation to be present, leading to more aggressive
defragmentation efforts that ultimately have no effect.
To mitigate this, the PR introduces a mechanism to gradually reduce the
CPU consumption for defragmentation when the defragmentation
effectiveness is poor. This occurs when the fragmentation rate drops
below 2% and the hit ratio is less than 1%, or when the fragmentation
rate increases by no more than 2%. The CPU consumption will be gradually
decreased until it reaches the minimum threshold defined by
`active-defrag-cycle-min`.
---------
Co-authored-by: oranagra <oran@redislabs.com>
After upgrading of ubuntu 24.04, clang18 can check runtime error: call
to function XXX through pointer to incorrect function type, our daily CI
reports the errors by UndefinedBehaviorSanitizer (UBSan):
https://github.com/redis/redis/actions/runs/12738281720/job/35500380251#step:6:346
now we add generic version of some existing `free` functions to support
to call function through (void*) pointer, actually, they just are the
wrapper functions that will cast the data type and call the
corresponding functions.
This PR is based on:
https://github.com/redis/redis/pull/12109https://github.com/valkey-io/valkey/pull/60
Closes: https://github.com/redis/redis/issues/11678
**Motivation**
During a full sync, when master is delivering RDB to the replica,
incoming write commands are kept in a replication buffer in order to be
sent to the replica once RDB delivery is completed. If RDB delivery
takes a long time, it might create memory pressure on master. Also, once
a replica connection accumulates replication data which is larger than
output buffer limits, master will kill replica connection. This may
cause a replication failure.
The main benefit of the rdb channel replication is streaming incoming
commands in parallel to the RDB delivery. This approach shifts
replication stream buffering to the replica and reduces load on master.
We do this by opening another connection for RDB delivery. The main
channel on replica will be receiving replication stream while rdb
channel is receiving the RDB.
This feature also helps to reduce master's main process CPU load. By
opening a dedicated connection for the RDB transfer, the bgsave process
has access to the new connection and it will stream RDB directly to the
replicas. Before this change, due to TLS connection restriction, the
bgsave process was writing RDB bytes to a pipe and the main process was
forwarding
it to the replica. This is no longer necessary, the main process can
avoid these expensive socket read/write syscalls. It also means RDB
delivery to replica will be faster as it avoids this step.
In summary, replication will be faster and master's performance during
full syncs will improve.
**Implementation steps**
1. When replica connects to the master, it sends 'rdb-channel-repl' as
part of capability exchange to let master to know replica supports rdb
channel.
2. When replica lacks sufficient data for PSYNC, master sends
+RDBCHANNELSYNC reply with replica's client id. As the next step, the
replica opens a new connection (rdb-channel) and configures it against
the master with the appropriate capabilities and requirements. It also
sends given client id back to master over rdbchannel, so that master can
associate these channels. (initial replica connection will be referred
as main-channel) Then, replica requests fullsync using the RDB channel.
3. Prior to forking, master attaches the replica's main channel to the
replication backlog to deliver replication stream starting at the
snapshot end offset.
4. The master main process sends replication stream via the main
channel, while the bgsave process sends the RDB directly to the replica
via the rdb-channel. Replica accumulates replication stream in a local
buffer, while the RDB is being loaded into the memory.
5. Once the replica completes loading the rdb, it drops the rdb channel
and streams the accumulated replication stream into the db. Sync is
completed.
**Some details**
- Currently, rdbchannel replication is supported only if
`repl-diskless-sync` is enabled on master. Otherwise, replication will
happen over a single connection as in before.
- On replica, there is a limit to replication stream buffering. Replica
uses a new config `replica-full-sync-buffer-limit` to limit number of
bytes to accumulate. If it is not set, replica inherits
`client-output-buffer-limit <replica>` hard limit config. If we reach
this limit, replica stops accumulating. This is not a failure scenario
though. Further accumulation will happen on master side. Depending on
the configured limits on master, master may kill the replica connection.
**API changes in INFO output:**
1. New replica state: `send_bulk_and_stream`. Indicates full sync is
still in progress for this replica. It is receiving replication stream
and rdb in parallel.
```
slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0
```
Replica state changes in steps:
- First, replica sends psync and receives +RDBCHANNELSYNC
:`state=wait_bgsave`
- After replica connects with rdbchannel and delivery starts:
`state=send_bulk_and_stream`
- After full sync: `state=online`
2. On replica side, replication stream buffering metrics:
- replica_full_sync_buffer_size: Currently accumulated replication
stream data in bytes.
- replica_full_sync_buffer_peak: Peak number of bytes that this instance
accumulated in the lifetime of the process.
```
replica_full_sync_buffer_size:20485
replica_full_sync_buffer_peak:1048560
```
**API changes in CLIENT LIST**
In `client list` output, rdbchannel clients will have 'C' flag in
addition to 'S' replica flag:
```
id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0
```
**Config changes:**
- `replica-full-sync-buffer-limit`: Controls how much replication data
replica can accumulate during rdbchannel replication. If it is not set,
a value of 0 means replica will inherit `client-output-buffer-limit
<replica>` hard limit config to limit accumulated data.
- `repl-rdb-channel` config is added as a hidden config. This is mostly
for testing as we need to support both rdbchannel replication and the
older single connection replication (to keep compatibility with older
versions and rdbchannel replication will not be enabled if
repl-diskless-sync is not enabled). it affects both the master (not to
respond to rdb channel requests), and the replica (not to declare
capability)
**Internal API changes:**
Changes that were introduced to Redis replication:
- New replication capability is added to replconf command: `capa
rdb-channel-repl`. Indicates replica is capable of rdb channel
replication. Replica sends it when it connects to master along with
other capabilities.
- If replica needs fullsync, master replies `+RDBCHANNELSYNC
<client-id>` to the replica's PSYNC request.
- When replica opens rdbchannel connection, as part of replconf command,
it sends `rdb-channel 1` to let master know this is rdb channel. Also,
it sends `main-ch-client-id <client-id>` as part of replconf command so
master can associate channels.
**Testing:**
As rdbchannel replication is enabled by default, we run whole test suite
with it. Though, as we need to support both rdbchannel and single
connection replication, we'll be running some tests twice with
`repl-rdb-channel yes/no` config.
**Replica state diagram**
```
* * Replica state machine *
*
* Main channel state
* ┌───────────────────┐
* │RECEIVE_PING_REPLY │
* └────────┬──────────┘
* │ +PONG
* ┌────────▼──────────┐
* │SEND_HANDSHAKE │ RDB channel state
* └────────┬──────────┘ ┌───────────────────────────────┐
* │+OK ┌───► RDB_CH_SEND_HANDSHAKE │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_AUTH_REPLY │ │ REPLCONF main-ch-client-id <clientid>
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_AUTH_REPLY │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_PORT_REPLY │ │ │ +OK
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_REPLCONF_REPLY│
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_IP_REPLY │ │ │ +OK
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_FULLRESYNC │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_CAPA_REPLY │ │ │+FULLRESYNC
* └────────┬──────────┘ │ │Rdb delivery
* │ │ ┌──────────────▼────────────────┐
* ┌────────▼──────────┐ │ │ RDB_CH_RDB_LOADING │
* │SEND_PSYNC │ │ └──────────────┬────────────────┘
* └─┬─────────────────┘ │ │ Done loading
* │PSYNC (use cached-master) │ │
* ┌─▼─────────────────┐ │ │
* │RECEIVE_PSYNC_REPLY│ │ ┌────────────►│ Replica streams replication
* └─┬─────────────────┘ │ │ │ buffer into memory
* │ │ │ │
* │+RDBCHANNELSYNC client-id │ │ │
* ├──────┬───────────────────┘ │ │
* │ │ Main channel │ │
* │ │ accumulates repl data │ │
* │ ┌──▼────────────────┐ │ ┌───────▼───────────┐
* │ │ REPL_TRANSFER ├───────┘ │ CONNECTED │
* │ └───────────────────┘ └────▲───▲──────────┘
* │ │ │
* │ │ │
* │ +FULLRESYNC ┌───────────────────┐ │ │
* ├────────────────► REPL_TRANSFER ├────┘ │
* │ └───────────────────┘ │
* │ +CONTINUE │
* └──────────────────────────────────────────────┘
*/
```
-----
This PR also contains changes and ideas from:
https://github.com/valkey-io/valkey/pull/837https://github.com/valkey-io/valkey/pull/1173https://github.com/valkey-io/valkey/pull/804https://github.com/valkey-io/valkey/pull/945https://github.com/valkey-io/valkey/pull/989
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: naglera <anagler123@gmail.com>
Co-authored-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
This update refactors prepareClientToWrite by introducing
_prepareClientToWrite for inline checks within network.c file, and
separates replica and non-replica handling for pending replies and
writes (_clientHasPendingRepliesSlave/NonSlave and
_writeToClientSlave/NonSlave).
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Found by @ShooterIT
## Describe
If a client first creates a command with a very large number of
parameters, such as 10,000 parameters, the argv will be expanded to
accommodate 10,000. If the subsequent commands have fewer than 10,000
parameters, this argv will continue to be reused and will never be
shrunk.
## Solution
When determining whether it is necessary to rebuild argv, if the length
of the previous argv has already exceeded 1024, we will progressively
create argv regardless.
## Free argv in cron
Add a new condition to determine whether argv needs to be resized in
cron. When the number of parameters exceeds 128, we will resize it
regardless to avoid a single client consuming too much memory. It will
now occupy a maximum of (128 * 8 bytes).
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Introduced by https://github.com/redis/redis/issues/13521
If the client argv was released due to a timeout before sending the
complete command, `argv_len` will be reset to 0.
When argv is parsed again and resized, requesting a length of 0 may
result in argv being NULL, then leading to a crash.
And fix a bug that `argv_len` is not updated correctly in
`replaceClientCommandVector()`.
---------
Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
The main job of the IO thread is read queries and write replies, so
reads/writes metrics can reflect the workload of IO threads, now we also
support this metrics `io_threaded_reads/writes_processed` in detail for
each IO thread.
Of course, to avoid break changes, `io_threaded_reads/writes_processed`
is still there. But before async io thread commit, we may sum the IO
done by the main thread if IO threads are active, but now we only sum
the IO done by IO threads.
Now threads section in `info` command output is as follows:
```
# Threads
io_thread_0:clients=0,reads=0,writes=0
io_thread_1:clients=54,reads=6546940,writes=6546919
io_thread_2:clients=54,reads=6513650,writes=6513625
io_thread_3:clients=54,reads=6396571,writes=6396525
io_thread_4:clients=53,reads=6511120,writes=6511097
io_thread_5:clients=53,reads=6539302,writes=6539280
io_thread_6:clients=53,reads=6502269,writes=6502248
```