Based on https://github.com/valkey-io/valkey/pull/1463 and
https://github.com/valkey-io/valkey/pull/1481
In the full-CI failure
(https://github.com/redis/redis/actions/runs/14595343452/job/40979173087?pr=13965)
on version 7.0 we are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```
This is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was to make the message type optional, so that
it wasn't included in the size. I think this also makes some sense,
since it's really just a helper for us to easily cast the object around.
This compilation warning only occurs in version 7.2, because in [this
PR](https://github.com/redis/redis/pull/13073) we started passing
`-flto` to `CFLAGS` by default, and with `-flto` GCC does not detect
this warning. That change is not present in version 7.2.
So, to reproduce this compilation warning in versions after 7.2, pass
`OPTIMIZATION=-O2` manually.
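For illustration, a minimal self-contained C sketch of the pattern (type
and field names are stand-ins, not the real clusterMsg layout):
```c
#include <stdint.h>

/* The union makes sizeof(full_msg) as large as the biggest variant, so
 * casting a short buffer to full_msg* trips -Warray-bounds under -O2. */
typedef struct {
    uint16_t type;
    union {
        char ping[2048];  /* largest variant dominates the struct size */
        char fail[16];
    } data;
} full_msg;

/* Fix: declare only the fixed header; the variable-length payload that
 * follows is no longer part of the struct's declared size. */
typedef struct {
    uint16_t type;
    /* payload of type-dependent length follows in the same buffer */
} msg_hdr;

uint16_t peek_type(char *buf /* may be shorter than sizeof(full_msg) */) {
    msg_hdr *h = (msg_hdr *)buf;  /* the header always fits: no warning */
    return h->type;
}
```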
---------
Co-authored-by: madolson <34459052+madolson@users.noreply.github.com>
### Problem
A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP
port issue in CLUSTER SLOTS, but it seems the handling of the TLS port
was overlooked.
There is this comment in the `addNodeToNodeReply` function in the
`cluster.c` file:
```c
/* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */
addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo()));
addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN);
```
### Fixed
This PR fixes the TLS port issue and adds relevant tests.
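A hedged sketch of the symmetric behavior that comment describes (the
struct here is a stand-in, not the real clusterNode):
```c
typedef struct {
    int tcp_port;
    int tls_port;
} node_ports;

/* Report the TLS port to TLS clients and the TCP port to non-TLS
 * clients; the bug was that one CLUSTER SLOTS path kept returning the
 * TCP port regardless of the client's transport. */
int clusterNodeClientPortSketch(const node_ports *n, int client_uses_tls) {
    return client_uses_tls ? n->tls_port : n->tcp_port;
}
```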
This PR fixes the lag calculation by ensuring that when the consumer
group's last_id is behind the first entry, the consumer group's
entries_read is considered invalid and recalculated from the start of
the stream.
Supplement to PR #13473
Close #13957
Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Close https://github.com/redis/redis/issues/13892
`CONFIG SET port` updates server.port. `CLUSTER SLOTS` retrieves
information about cluster slots and their associated nodes. The fix
updates this info when `CONFIG SET port` is executed, so the `CLUSTER
SLOTS` command returns the right value.
This commit reverts the deletion of the condition `!bc->blocked_on_keys`
that was accidentally introduced by
https://github.com/redis/redis/pull/12817.
In case a blocked-on-keys module client is unblocked, both
`moduleUnblockClientOnKey` and `moduleHandleBlockedClients` are called,
which resulted in `updateStatsOnUnblock` being called twice.
Now that `moduleHandleBlockedClients` doesn't call
`updateStatsOnUnblock` for unblocked module key-blocked clients,
in the unlikely event that the module decides to call `RM_UnblockClient`
on a key-blocked client, we need to call `updateStatsOnUnblock` from
within `moduleBlockedClientTimedOut`, but since
`moduleBlockedClientTimedOut` is not thread-safe we can't call it
directly from within `RM_UnblockClient`.
Added a new flag `blocked_on_keys_explicit_unblock` for that specific
case, which will cause `moduleBlockedClientTimedOut` to be called from
`moduleHandleBlockedClients` (which is only called from the main
thread).
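A self-contained sketch of that deferral pattern (the flag name comes
from this commit; the surrounding types are simplified stand-ins):
```c
#include <stdio.h>

typedef struct {
    int blocked_on_keys;                  /* blocked via RM_BlockClientOnKeys() */
    int blocked_on_keys_explicit_unblock; /* new flag added by this commit */
} BlockedClient;

/* Module thread: moduleBlockedClientTimedOut() is not thread-safe, so
 * RM_UnblockClient() only records that an explicit unblock happened. */
void unblockFromModuleThread(BlockedClient *bc) {
    if (bc->blocked_on_keys)
        bc->blocked_on_keys_explicit_unblock = 1;
}

/* Main thread (moduleHandleBlockedClients): the safe place to run the
 * timed-out path, which in turn calls updateStatsOnUnblock() once. */
void handleBlockedClients(BlockedClient *bc) {
    if (bc->blocked_on_keys_explicit_unblock)
        printf("moduleBlockedClientTimedOut() -> updateStatsOnUnblock()\n");
}

int main(void) {
    BlockedClient bc = { .blocked_on_keys = 1 };
    unblockFromModuleThread(&bc);  /* on a module thread */
    handleBlockedClients(&bc);     /* later, on the main thread */
    return 0;
}
```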
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Close https://github.com/redis/redis/issues/13868
This bug was introduced by https://github.com/redis/redis/pull/13468
To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.
However, when both the master and replica support shardid, an issue
arises: at one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master's
shardid. But if the replica later passes a different shardid, we would
then update the master's shardid again, leading to continuous changes
in shardid.
Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
The bug mentioned in
[#13862](https://github.com/redis/redis/issues/13862) has been fixed.
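A small sketch of that invariant (field names are illustrative):
```c
#include <string.h>

#define SHARD_ID_LEN 40

typedef struct node {
    char shard_id[SHARD_ID_LEN];
    struct node *master;  /* NULL if this node is itself a master */
} node;

/* The replica's shardid always follows the master's, never the other
 * way around, so gossip from a replica can no longer flip the master's
 * shardid back and forth. */
void syncShardIdFromMaster(node *replica) {
    if (replica->master)
        memcpy(replica->shard_id, replica->master->shard_id, SHARD_ID_LEN);
}
```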
---------
Signed-off-by: li-benson <1260437731@qq.com>
Signed-off-by: youngmore1024 <youngmore1024@outlook.com>
Co-authored-by: youngmore1024 <youngmore1024@outlook.com>
Fix for https://github.com/redis/redis/issues/13650:
providing an invalid config to a module with a datatype crashes when
Redis tries to unload the module due to the invalid config.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This commit updates the processing of PONG gossip messages in the
cluster. When a node (B) becomes a replica due to a failover, its PONG
messages include its new primary node's (A) information and B's
configuration epoch is aligned with A's. This allows observer nodes to
identify changes in primary-ship, addressing issues of intermediate
states and enhancing cluster state consistency during topology changes.
Fix #13018
Fix #12785 and other race condition issues.
See the following isolated comments.
The following report was obtained using `SANITIZER=thread`:
```sh
make SANITIZER=thread
./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate
```
1. Fixed a thread-safety issue in RM_UnblockClient()
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1831181220
* When blocking a client in a module using `RM_BlockClientOnKeys()` or
`RM_BlockClientOnKeysWithFlags()`
with a timeout_callback, calling RM_UnblockClient() in module threads
can lead to race conditions
in `updateStatsOnUnblock()`.
- Introduced:
Version: 6.2
PR: #7491
- Touch:
`server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`,
and `server.latency_events`
- Harm Level: High
Potentially corrupts the memory data of `cmd->latency_histogram`,
`server.slowlog`, and `server.latency_events`
- Solution:
Differentiate whether the call to moduleBlockedClientTimedOut() comes
from the module or the main thread.
Since we can't know if RM_UnblockClient() comes from module threads, we
always assume it does and
let `updateStatsOnUnblock()` asynchronously update the unblock status.
* When an error reply is issued in timeout_callback(), ctx is not
thread-safe, eventually leading to race conditions in `afterErrorReply`.
- Introduced:
Version: 6.2
PR: #8217
- Touch:
`server.stat_total_error_replies`, `server.errors`
- Harm Level: High
Potentially corrupts the memory data of `server.errors`
- Solution:
Create the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`,
and reply errors to the client asynchronously.
2. Made RM_Reply*() family API thread-safe
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408707239
Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch
server.current_client
- Introduced:
Version: 7.2.0
PR: #12326
- Harm Level: None
Since the module fake client won't have the `CLIENT_PUSHING` flag, even
if we touch server.current_client, we can still exit after checking
`c->flags & CLIENT_PUSHING`.
- Solution:
Check `c->flags & CLIENT_PUSHING` earlier.
3. Made freeClient() thread-safe
Fix #12785
- Introduced:
Version: 4.0
Commit:
3fcf959e60
- Harm Level: Moderate
* Trigger assertion
It happens when the module thread calls freeClient while the io-thread
is in progress, which just triggers an assertion and doesn't cause any
race conditions.
* Touch `server.current_client`, `server.stat_clients_type_memory`, and
`clientMemUsageBucket->clients`.
It happens between the main thread and the module threads, and may
cause data corruption.
1. Erroneously resets `server.current_client` to NULL; theoretically
this won't happen, because the module has already restored
`server.current_client` to the old value before entering freeClient.
2. Corrupts `clientMemUsageBucket->clients` in
updateClientMemUsageAndBucket().
3. Causes server.stat_clients_type_memory memory statistics to be
inaccurate.
- Solution:
* No longer count memory usage for fake clients, to avoid updating
`server.stat_clients_type_memory` in freeClient.
* No longer reset `server.current_client` in unlinkClient, because the
fake client won't be evicted or disconnected in the middle of the
process.
* Assert `io_threads_op == IO_THREADS_OP_IDLE` only if c is not a fake
client.
4. Fixed freeing client args without the GIL
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408706695
When freeing retained strings in the module thread (refcount decr), or
using them in some way (refcount incr), we should do so while holding
the GIL,
otherwise, they might be simultaneously freed while the main thread is
processing the unblock client state.
- Introduced:
Version: 6.2.0
PR: #8141
- Harm Level: Low
Triggers an assertion, double free, or memory leak.
- Solution:
Document that module API users need to ensure any access to these
retained strings is done with the GIL locked.
5. Fixed adding the fake client to server.clients_pending_write
It would incorrectly log the memory usage for the fake client.
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1851899163
- Introduced:
Version: 4.0
Commit:
9b01b64430
- Harm Level: None
Only results in a NOP
- Solution:
* Don't add fake clients to server.clients_pending_write
* Add a c->conn assertion to updateClientMemUsageAndBucket() and
updateClientMemoryUsage() to avoid the same issue in the future.
So now it will be the responsibility of the callers of both of them to
avoid passing in a fake client.
6. Fix calling RM_BlockedClientMeasureTimeStart() and
RM_BlockedClientMeasureTimeEnd() without GIL
- Introduced:
Version: 6.2
PR: #7491
- Harm Level: Low
Causes inaccuracies in command latency histogram and slow logs, but does
not corrupt memory.
- Solution:
Module API users who know that non-thread-safe APIs will be used from
multiple threads need to take responsibility for protecting them with
their own locks instead of the GIL, as using the GIL is too expensive;
see the usage sketch below.
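A usage sketch for module authors, assuming a pthreads background worker
(the RedisModule_* calls are the public module API; the lock policy is
the module's own):
```c
#include <pthread.h>
#include "redismodule.h"

static pthread_mutex_t measure_lock = PTHREAD_MUTEX_INITIALIZER;

/* Background worker: protect the non-thread-safe measurement calls with
 * the module's own mutex instead of the (much more expensive) GIL. */
void *worker(void *arg) {
    RedisModuleBlockedClient *bc = arg;

    pthread_mutex_lock(&measure_lock);
    RedisModule_BlockedClientMeasureTimeStart(bc);
    pthread_mutex_unlock(&measure_lock);

    /* ... do the actual background work here ... */

    pthread_mutex_lock(&measure_lock);
    RedisModule_BlockedClientMeasureTimeEnd(bc);
    pthread_mutex_unlock(&measure_lock);

    RedisModule_UnblockClient(bc, NULL);
    return NULL;
}
```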
### Other issue
1. RM_Yield is not thread-safe, fixed via #12905.
### Summary
1. Fixed thread-safety issues for `RM_UnblockClient()`, `freeClient()`
and `RM_Yield`, potentially preventing memory corruption, data disorder,
or assertion failures.
2. Updated docs and module test to clarify module API users'
responsibility for locking non-thread-safe APIs in multi-threading, such
as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(),
RM_RetainString(), and RM_HoldString().
### About backporting to 7.2
1. The implementation of (1) is not too satisfying; would like to get
more eyes on it.
2. (2) and (3) can safely be backported.
3. (4) and (6) just modify the module tests and update the
documentation; no need for a backport.
4. (5) is harmless; no need for a backport.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
## Issues and solutions from #12817
1. Touching ProcessingEventsWhileBlocked and calling moduleCount()
without the GIL in afterSleep()
- Introduced:
Version: 7.0.0
PR: #9963
- Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked` (+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.
- Initial / Abandoned Solution:
* Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
- Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
#0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
#1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
#2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)
Previous read of size 4 at 0x0001045990c0 by main thread:
#0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
#1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
#2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
#3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```
2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.
- Introduced:
Version: 7.0.0
PR: #9963
- Old / Abandoned Solution:
Adding a new mutex to protect the window between beforeSleep() and
afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.
- Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
#0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
#2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
#3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
#4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)
Previous write of size 4 at 0x000100b54000 by main thread:
#0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
#2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
#3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```
## The final fix, as suggested in this comment:
https://github.com/redis/redis/pull/12817#discussion_r1436427232
Optimized solution based on the above comment:
First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.
When the module thread starts to yield, there are two possibilities (we
assume the caller keeps the GIL):
1. The main thread is in the middle of beforeSleep() and afterSleep(),
that is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield, waiting for the next yield, when the main
thread may already be in the acquiring GIL state.
2. The main thread is in the acquiring GIL state.
The module thread releases the GIL, yielding the CPU to give the main
thread an opportunity to start event processing, and then acquires the
GIL again until the main thread releases it.
This is the direction mentioned in
https://github.com/redis/redis/pull/12817#discussion_r1436427232.
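A simplified, self-contained sketch of the two cases
(`module_gil_acquring` is the field described above; everything else is
a stand-in):
```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int module_gil_acquring;  /* set by main thread around its GIL wait */
static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;

static void wakeMainThreadViaPipe(void) { /* stand-in for the real pipe write */ }

/* Called from RM_Yield(); the caller is assumed to hold the GIL. */
void moduleYieldSketch(void) {
    if (!atomic_load(&module_gil_acquring)) {
        /* Case 1: main thread is between beforeSleep() and afterSleep().
         * Nudge it awake and return; a later yield may hit case 2. */
        wakeMainThreadViaPipe();
    } else {
        /* Case 2: main thread is waiting for the GIL. Hand it over so
         * event processing can start, then take it back. */
        pthread_mutex_unlock(&gil);
        sched_yield();
        pthread_mutex_lock(&gil);
    }
}
```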
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
1. Ubuntu Lunar reached End of Life on January 25, 2024, so upgrade the
Ubuntu version to plucky in the `test-ubuntu-jemalloc-fortify` action to
pass the daily CI.
2. The macOS-12 environment is deprecated, so upgrade macos-12 to
macos-13 in the daily CI.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/
Due to GitHub removing support for CentOS 7 in GitHub Actions, all
actions utilizing CentOS 7 need to be upgraded; upgrade the CentOS
image from `centos:7` to `quay.io/centos/centos:stream9`, which is the
official RedHat CentOS container.
Create some new actions named `old-chain` to verify support for gcc 4.8.
This PR also includes the upgrade of actions/checkout from version 3 to
version 4.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This change overcomes many stability issues experienced with the
vmactions action.
We need to limit VMs to 8GB for better stability, as the 13GB default
seems to hang them occasionally.
Shell code has been simplified since this action seems to use `bash -e`,
which will abort on non-zero exit codes anyway.
PR #13428 doesn't fully resolve an issue where corruption errors can
still occur on loading of the cluster.nodes file - seen on upgrade where
there were no shard_ids (from old Redis): 7.2.5 generated new random
ones on loading and persisted them to the file before gossip/handshake
could propagate the correct ones (or some other nodes were unreachable).
This results in a primary/replica having differing shard_ids in
cluster.nodes, and then the server cannot start up - it reports
corruption.
This PR builds on #13428 by simply ignoring the replica's shard_id in
cluster.nodes (if it exists), and uses the replica's primary's shard_id.
Additional handling was necessary to cover the case where the replica
appears before the primary in cluster.nodes: it will first use a
generated shard_id for the primary, and then correct it after it loads
the primary's cluster.nodes entry.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
## Description
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.
The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:
1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.
2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.
## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.
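A sketch of the resulting rule (parameter names are illustrative):
```c
/* When both the group's last_id and the maximal tombstone are behind
 * the first entry, nothing the group has read and no tombstone can
 * affect the remaining entries, so the lag is exactly the stream
 * length. */
long long groupLagSketch(long long entries_added, long long entries_read,
                         long long stream_length,
                         int last_id_and_tombstone_before_first) {
    if (last_id_and_tombstone_before_first)
        return stream_length;                /* all remaining entries */
    return entries_added - entries_read;     /* usual tombstone-free formula */
}
```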
Supplement to PR https://github.com/redis/redis/pull/13338
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/52.
Ref: https://github.com/redis/redis/pull/12760
Close https://github.com/redis/redis/issues/13401
This PR will replace https://github.com/redis/redis/pull/13449
Fixes compatibility of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled).
With some of the extensions enabled by default in the 7.2 version, new
nodes running 7.2 and above start sending out a larger clusterbus
message payload including the ping extensions. This caused an
incompatibility with nodes running engine versions < 7.0. Old nodes
(< 7.0) receiving the payload from new nodes (7.2 and above) would
observe a payload length (totlen) > (estlen), perform an early exit,
and not process the message.
This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`: during the meet phase we do
not know whether the connected node supports ext data, so we need to
make sure that it knows, and it can send back its ext data if it has
any.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.
Note: A successful `PING`/`PONG` is required for a given node to be
marked as `CLUSTERMSG_FLAG0_EXT_DATA` as a sender; only then will
extension messages be sent to it. This could cause a slight delay in
receiving the extension message(s).
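A compact sketch of the two rules (the flag name is real; the message
layout and bit value are stand-ins):
```c
#include <stdint.h>

#define CLUSTERMSG_FLAG0_EXT_DATA (1 << 2)  /* bit value illustrative */

typedef struct { uint8_t mflags0; uint32_t totlen; } msg_hdr_sketch;

uint32_t buildClusterMsg(msg_hdr_sketch *h, int peer_advertised_ext_data,
                         uint32_t base_len, uint32_t ext_len) {
    /* Rule 1: always advertise support, so even a node met for the
     * first time learns it may send its extensions back. */
    h->mflags0 |= CLUSTERMSG_FLAG0_EXT_DATA;

    /* Rule 2: only append extension payload once the peer has
     * advertised support, so old (< 7.0) receivers never see a
     * totlen larger than their estlen. */
    h->totlen = base_len + (peer_advertised_ext_data ? ext_len : 0);
    return h->totlen;
}
```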
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL at
82f00f5179/src/cluster_legacy.c (L2320);
otherwise, it will crash at
82f00f5179/src/cluster_legacy.c (L2395).
So when loading the cluster node config, we need to ensure that the
following conditions are met (see the sketch below):
1. A node must be at least one of master or replica.
2. If a node is a replica, its master can't be NULL.
Close https://github.com/redis/redis/issues/13414
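A sketch of those two load-time checks (helper and field names are
illustrative):
```c
typedef struct cnode {
    int flags_master;     /* node flagged as master */
    int flags_replica;    /* node flagged as replica */
    struct cnode *master; /* replica's master; may be NULL while loading */
} cnode;

/* Reject a cluster config line unless both conditions above hold, so
 * clusterUpdateSlotsConfigWith() can never dereference a NULL
 * curmaster. */
int nodeConfigIsSane(const cnode *n) {
    if (!n->flags_master && !n->flags_replica) return 0;   /* rule 1 */
    if (n->flags_replica && n->master == NULL) return 0;   /* rule 2 */
    return 1;
}
```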
When the cluster's master node fails and is switched to another node,
the first node in the shard node list (the old master) is no longer
valid.
Add a new method clusterGetMasterFromShard() to obtain the current
master.
Fix #13337
This PR fixes two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may still be after the last id of the consumer group.
2. When a tombstone is after the last id of the consumer group, the
group's counter will be invalid, so we should calculate the entries_read
using estimates.
getKeysUsingKeySpecs had the range check AFTER the allocation of the
keys buffer, which could lead to an OOM panic when invalid arguments
are provided that lead to an overflow.
The allocated memory is only used after the range check, so there's no
risk of buffer overrun.
The OOM panic can happen on 32-bit builds, or 64-bit builds running on
systems with less than 4GB of RAM, and is reachable via COMMAND
GETKEYSANDFLAGS and ACL key name validation.
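A sketch of the reordering (names are illustrative, not the actual
getKeysUsingKeySpecs code):
```c
#include <stdlib.h>

/* Validate the range BEFORE sizing the buffer: with the check done
 * first, arguments that would overflow the count can no longer drive a
 * huge allocation and an OOM panic. */
int *getKeysRangeChecked(long first, long last, long argc, long *count) {
    if (first < 1 || last < first || last >= argc) return NULL; /* check first */
    long n = last - first + 1;
    int *keys = malloc(sizeof(int) * (size_t)n); /* allocate only when sane */
    if (keys) *count = n;
    return keys;
}
```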
In certain situations, we might generate a large number of propagates
(e.g., multi/exec, Lua script, or a single command generating tons of
propagations) within an event loop.
During the process of propagating to a replica, if the replica is
disconnected (marked as CLIENT_CLOSE_ASAP) due to exceeding the output
buffer limit, we should remove its reference to the global replication
buffer to avoid the global replication buffer being unable to be
properly trimmed while still referenced.
---------
Co-authored-by: oranagra <oran@redislabs.com>
In #13224, we found a crash during cluster slot migration but didn't
know why. So I checked all the places that return C_OK in processCommand
to see if we were missing a duration reset, and found this.
This fix is like #12247: when we reject the command, we should reset the
duration. I tested it and verified it can fix #13224.
So the reason may be that we were using stream blocking and then, during
the slot migration, got a redirect that crashed the server.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
`reclaimFilePageCache` did not set `errno`, but `rdbSaveInternal`,
which logs the error, assumed it did. This makes sure `errno` is set.
Fixes #13245
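A sketch of the fix, assuming the posix_fadvise() path (which reports
errors via its return value and leaves errno untouched):
```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Translate posix_fadvise()'s returned error number into errno so the
 * caller's error logging (rdbSaveInternal) sees a meaningful value. */
int reclaimFilePageCacheSketch(int fd, off_t offset, off_t length) {
#ifdef POSIX_FADV_DONTNEED
    int ret = posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED);
    if (ret) { errno = ret; return -1; }  /* make errno meaningful */
#else
    (void)fd; (void)offset; (void)length;
#endif
    return 0;
}
```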
Signed-off-by: Ted Lyngmo <ted@lyncon.se>
This PR is based on the commits from PR #12944.
Allow the SPUBLISH command within multi/exec on a replica
Behavior on unstable:
```
127.0.0.1:6380> CLUSTER NODES
39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected
8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
(error) MOVED 866 127.0.0.1:6379
```
With this change:
```
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
1) (integer) 0
```
---------
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: oranagra <oran@redislabs.com>
Fix #12792
On Ubuntu 23 (lunar), non-root users are not allowed to change the
oom_score_adj of a process to a value that is too low.
Since the terminal's default oom_score_adj is 200, if we run the test
from a terminal, we won't be able to set the oom_score_adj of the redis
process to 9 or 22, which is too low.
Reproduction in an Ubuntu 23 (lunar) terminal:
```sh
$ cat /proc/`pgrep redis-server`/oom_score_adj
200
$ echo 100 > /proc/`pgrep redis-server`/oom_score_adj
# success without error
$ echo 99 > /proc/`pgrep redis-server`/oom_score_adj
echo: write error: Permission denied
```
As seen from the output above, we can only set the oom_score_adj of the
redis processes down to 100.
Modify the test so that oom_score_adj only increases and never
decreases.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR fixes two crashes:
1. Fix missing slotToKeyInit() when using `flushdb async` under cluster
mode.
https://github.com/redis/redis/issues/13205
2. Fix missing expires_cursor reset when stopping active defrag in the
middle of defragmentation.
https://github.com/redis/redis/issues/13307
If we stop active defrag in the middle of defragging db->expires and
`expires_cursor` is not reset to 0, then the next time we enable active
defrag, defragLaterStep(db, ...) will be entered. However, at this time
`db` has been reset to NULL, which results in a crash.
The affected code was removed by #11695 and #13058 in unstable, so we
just need to backport this to 7.2.
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will be converted to SIZE_MAX in `zsetTypeCreate` and try to
`dictExpand`. Although `dictExpand` won't succeed because the size
overflows, I think we'd better avoid this wrong conversion.
This bug can be triggered when the source of `zrangestore` doesn't
exist or when we use the `zrangestore` command with `byscore` or
`bylex`.
The impact is that dst keys will be converted to use a skiplist instead
of a listpack.
(cherry picked from commit bad33f8738)
The check in fileIsManifest misjudged the manifest file. For example,
if a resp aof contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```
As in #12951, if the preamble aof also contains it, it will also fail.
Fixes #12951.
The bug was happening if the word "file" is mentioned in the first 1024
lines of the AOF. Now, as soon as it finds a non-comment line, it
breaks (whether it contains "file" or not); see the sketch below.
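A simplified sketch of the tightened check (the real function also
bounds how much it reads):
```c
#include <stdio.h>
#include <string.h>

/* Only comment lines may precede the first real manifest line, so
 * decide on the first non-comment line and stop: a "file" string deep
 * inside an AOF payload can no longer be mistaken for a manifest
 * entry. */
int fileIsManifestSketch(FILE *fp) {
    char buf[1024];
    while (fgets(buf, sizeof(buf), fp) != NULL) {
        if (buf[0] == '#') continue;             /* skip comments */
        return strncmp(buf, "file", 4) == 0;     /* decide and break */
    }
    return 0;  /* empty (or comment-only) file is not a manifest */
}
```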
(cherry picked from commit da727ad445)