This PR is based on:
https://github.com/redis/redis/pull/12109https://github.com/valkey-io/valkey/pull/60
Closes: https://github.com/redis/redis/issues/11678
**Motivation**
During a full sync, when master is delivering RDB to the replica,
incoming write commands are kept in a replication buffer in order to be
sent to the replica once RDB delivery is completed. If RDB delivery
takes a long time, it might create memory pressure on master. Also, once
a replica connection accumulates replication data which is larger than
output buffer limits, master will kill replica connection. This may
cause a replication failure.
The main benefit of the rdb channel replication is streaming incoming
commands in parallel to the RDB delivery. This approach shifts
replication stream buffering to the replica and reduces load on master.
We do this by opening another connection for RDB delivery. The main
channel on replica will be receiving replication stream while rdb
channel is receiving the RDB.
This feature also helps to reduce master's main process CPU load. By
opening a dedicated connection for the RDB transfer, the bgsave process
has access to the new connection and it will stream RDB directly to the
replicas. Before this change, due to TLS connection restriction, the
bgsave process was writing RDB bytes to a pipe and the main process was
forwarding
it to the replica. This is no longer necessary, the main process can
avoid these expensive socket read/write syscalls. It also means RDB
delivery to replica will be faster as it avoids this step.
In summary, replication will be faster and master's performance during
full syncs will improve.
**Implementation steps**
1. When replica connects to the master, it sends 'rdb-channel-repl' as
part of capability exchange to let master to know replica supports rdb
channel.
2. When replica lacks sufficient data for PSYNC, master sends
+RDBCHANNELSYNC reply with replica's client id. As the next step, the
replica opens a new connection (rdb-channel) and configures it against
the master with the appropriate capabilities and requirements. It also
sends given client id back to master over rdbchannel, so that master can
associate these channels. (initial replica connection will be referred
as main-channel) Then, replica requests fullsync using the RDB channel.
3. Prior to forking, master attaches the replica's main channel to the
replication backlog to deliver replication stream starting at the
snapshot end offset.
4. The master main process sends replication stream via the main
channel, while the bgsave process sends the RDB directly to the replica
via the rdb-channel. Replica accumulates replication stream in a local
buffer, while the RDB is being loaded into the memory.
5. Once the replica completes loading the rdb, it drops the rdb channel
and streams the accumulated replication stream into the db. Sync is
completed.
**Some details**
- Currently, rdbchannel replication is supported only if
`repl-diskless-sync` is enabled on master. Otherwise, replication will
happen over a single connection as in before.
- On replica, there is a limit to replication stream buffering. Replica
uses a new config `replica-full-sync-buffer-limit` to limit number of
bytes to accumulate. If it is not set, replica inherits
`client-output-buffer-limit <replica>` hard limit config. If we reach
this limit, replica stops accumulating. This is not a failure scenario
though. Further accumulation will happen on master side. Depending on
the configured limits on master, master may kill the replica connection.
**API changes in INFO output:**
1. New replica state: `send_bulk_and_stream`. Indicates full sync is
still in progress for this replica. It is receiving replication stream
and rdb in parallel.
```
slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0
```
Replica state changes in steps:
- First, replica sends psync and receives +RDBCHANNELSYNC
:`state=wait_bgsave`
- After replica connects with rdbchannel and delivery starts:
`state=send_bulk_and_stream`
- After full sync: `state=online`
2. On replica side, replication stream buffering metrics:
- replica_full_sync_buffer_size: Currently accumulated replication
stream data in bytes.
- replica_full_sync_buffer_peak: Peak number of bytes that this instance
accumulated in the lifetime of the process.
```
replica_full_sync_buffer_size:20485
replica_full_sync_buffer_peak:1048560
```
**API changes in CLIENT LIST**
In `client list` output, rdbchannel clients will have 'C' flag in
addition to 'S' replica flag:
```
id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0
```
**Config changes:**
- `replica-full-sync-buffer-limit`: Controls how much replication data
replica can accumulate during rdbchannel replication. If it is not set,
a value of 0 means replica will inherit `client-output-buffer-limit
<replica>` hard limit config to limit accumulated data.
- `repl-rdb-channel` config is added as a hidden config. This is mostly
for testing as we need to support both rdbchannel replication and the
older single connection replication (to keep compatibility with older
versions and rdbchannel replication will not be enabled if
repl-diskless-sync is not enabled). it affects both the master (not to
respond to rdb channel requests), and the replica (not to declare
capability)
**Internal API changes:**
Changes that were introduced to Redis replication:
- New replication capability is added to replconf command: `capa
rdb-channel-repl`. Indicates replica is capable of rdb channel
replication. Replica sends it when it connects to master along with
other capabilities.
- If replica needs fullsync, master replies `+RDBCHANNELSYNC
<client-id>` to the replica's PSYNC request.
- When replica opens rdbchannel connection, as part of replconf command,
it sends `rdb-channel 1` to let master know this is rdb channel. Also,
it sends `main-ch-client-id <client-id>` as part of replconf command so
master can associate channels.
**Testing:**
As rdbchannel replication is enabled by default, we run whole test suite
with it. Though, as we need to support both rdbchannel and single
connection replication, we'll be running some tests twice with
`repl-rdb-channel yes/no` config.
**Replica state diagram**
```
* * Replica state machine *
*
* Main channel state
* ┌───────────────────┐
* │RECEIVE_PING_REPLY │
* └────────┬──────────┘
* │ +PONG
* ┌────────▼──────────┐
* │SEND_HANDSHAKE │ RDB channel state
* └────────┬──────────┘ ┌───────────────────────────────┐
* │+OK ┌───► RDB_CH_SEND_HANDSHAKE │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_AUTH_REPLY │ │ REPLCONF main-ch-client-id <clientid>
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_AUTH_REPLY │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_PORT_REPLY │ │ │ +OK
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_REPLCONF_REPLY│
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_IP_REPLY │ │ │ +OK
* └────────┬──────────┘ │ ┌──────────────▼────────────────┐
* │+OK │ │ RDB_CH_RECEIVE_FULLRESYNC │
* ┌────────▼──────────┐ │ └──────────────┬────────────────┘
* │RECEIVE_CAPA_REPLY │ │ │+FULLRESYNC
* └────────┬──────────┘ │ │Rdb delivery
* │ │ ┌──────────────▼────────────────┐
* ┌────────▼──────────┐ │ │ RDB_CH_RDB_LOADING │
* │SEND_PSYNC │ │ └──────────────┬────────────────┘
* └─┬─────────────────┘ │ │ Done loading
* │PSYNC (use cached-master) │ │
* ┌─▼─────────────────┐ │ │
* │RECEIVE_PSYNC_REPLY│ │ ┌────────────►│ Replica streams replication
* └─┬─────────────────┘ │ │ │ buffer into memory
* │ │ │ │
* │+RDBCHANNELSYNC client-id │ │ │
* ├──────┬───────────────────┘ │ │
* │ │ Main channel │ │
* │ │ accumulates repl data │ │
* │ ┌──▼────────────────┐ │ ┌───────▼───────────┐
* │ │ REPL_TRANSFER ├───────┘ │ CONNECTED │
* │ └───────────────────┘ └────▲───▲──────────┘
* │ │ │
* │ │ │
* │ +FULLRESYNC ┌───────────────────┐ │ │
* ├────────────────► REPL_TRANSFER ├────┘ │
* │ └───────────────────┘ │
* │ +CONTINUE │
* └──────────────────────────────────────────────┘
*/
```
-----
This PR also contains changes and ideas from:
https://github.com/valkey-io/valkey/pull/837https://github.com/valkey-io/valkey/pull/1173https://github.com/valkey-io/valkey/pull/804https://github.com/valkey-io/valkey/pull/945https://github.com/valkey-io/valkey/pull/989
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Moti Cohen <moticless@gmail.com>
Co-authored-by: naglera <anagler123@gmail.com>
Co-authored-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
By the convention of errors, there is supposed to be a space between the code and the name.
While looking at some lua stuff I noticed that interpreter errors were not adding the space,
so some clients will try to map the detailed error message into the error.
We have tests that hit this condition, but they were just checking that the string "starts" with ERR.
I updated some other tests with similar incorrect string checking. This isn't complete though, as
there are other ways we check for ERR I didn't fix.
Produces some fun output like:
```
# Errorstats
errorstat_ERR:count=1
errorstat_ERRuser_script_1_:count=1
```
This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.
This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:
* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).
This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.
The tests directory now contains a `README.md` file that describes how this
works.
This commit also includes additional cleanups and fixes:
* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
(a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
future we should have dedicated teardown code that executes even when tests
fail).
* Fixed some tests that were flaky running on external servers.
This way the behavior is very similar to the past one.
This is useful in order to remember the user she probably failed to
configure a password correctly.