Commit Graph

12280 Commits

Author SHA1 Message Date
Yuan Wang 033abd6f57
Async IO threads (#13665)
## Introduction
Redis introduced IO Thread in 6.0, allowing IO threads to handle client
request reading, command parsing and reply writing, thereby improving
performance. The current IO thread implementation has a few drawbacks.
- The main thread is blocked during IO thread read/write operations and
must wait for all IO threads to complete their current tasks before it
can continue execution. In other words, the entire process is
synchronous. This prevents the efficient utilization of multi-core CPUs
for parallel processing.

- When the number of clients and requests increases moderately, it
causes all IO threads to reach full CPU utilization due to the busy wait
mechanism used by the IO threads. This makes it challenging for us to
determine which part of Redis has reached its bottleneck.

- When IO threads are enabled with TLS and io-threads-do-reads, a
disconnection of a connection with pending data may result in it being
assigned to multiple IO threads simultaneously. This can cause race
conditions and trigger assertion failures. Related issue:
https://github.com/redis/redis/issues/12540

Therefore, we designed an asynchronous IO threads solution. The IO
threads adopt an event-driven model, with the main thread dedicated to
command processing, meanwhile, the IO threads handle client read and
write operations in parallel.

## Implementation
### Overall
As before, we did not change the fact that all client commands must be
executed on the main thread, because Redis was originally designed to be
single-threaded, and processing commands in a multi-threaded manner
would inevitably introduce numerous race and synchronization issues. But
now each IO thread has independent event loop, therefore, IO threads can
use a multiplexing approach to handle client read and write operations,
eliminating the CPU overhead caused by busy-waiting.

the execution process can be briefly described as follows:
the main thread assigns clients to IO threads after accepting
connections, IO threads will notify the main thread when clients
finish reading and parsing queries, then the main thread processes
queries from IO threads and generates replies, IO threads handle
writing reply to clients after receiving clients list from main thread,
and then continue to handle client read and write events.

### Each IO thread has independent event loop
We now assign each IO thread its own event loop. This approach
eliminates the need for the main thread to perform the costly
`epoll_wait` operation for handling connections (except for specific
ones). Instead, the main thread processes requests from the IO threads
and hands them back once completed, fully offloading read and write
events to the IO threads.

Additionally, all TLS operations, including handling pending data, have
been moved entirely to the IO threads. This resolves the issue where
io-threads-do-reads could not be used with TLS.

### Event-notified client queue
To facilitate communication between the IO threads and the main thread,
we designed an event-notified client queue. Each IO thread and the main
thread have two such queues to store clients waiting to be processed.
These queues are also integrated with the event loop to enable handling.
We use pthread_mutex to ensure the safety of queue operations, as well
as data visibility and ordering, and race conditions are minimized, as
each IO thread and the main thread operate on independent queues,
avoiding thread suspension due to lock contention. And we implemented an
event notifier based on `eventfd` or `pipe` to support event-driven
handling.

### Thread safety
Since the main thread and IO threads can execute in parallel, we must
handle data race issues carefully.

**client->flags**
The primary tasks of IO threads are reading and writing, i.e.
`readQueryFromClient` and `writeToClient`. However, IO threads and the
main thread may concurrently modify or access `client->flags`, leading
to potential race conditions. To address this, we introduced an io-flags
variable to record operations performed by IO threads, thereby avoiding
race conditions on `client->flags`.

**Pause IO thread**
In the main thread, we may want to operate data of IO threads, maybe
uninstall event handler, access or operate query/output buffer or resize
event loop, we need a clean and safe context to do that. We pause IO
thread in `IOThreadBeforeSleep`, do some jobs and then resume it. To
avoid thread suspended, we use busy waiting to confirm the target
status. Besides we use atomic variable to make sure memory visibility
and ordering. We introduce these functions to pause/resume IO Threads as
below.
```
pauseIOThread, resumeIOThread
pauseAllIOThreads, resumeAllIOThreads
pauseIOThreadsRange, resumeIOThreadsRange
```
Testing has shown that `pauseIOThread` is highly efficient, allowing the
main thread to execute nearly 200,000 operations per second during
stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can
handle up to nearly 56,000 operations per second. But operations
performed between pausing and resuming IO threads must be quick;
otherwise, they could cause the IO threads to reach full CPU
utilization.

**freeClient and freeClientAsync**
The main thread may need to terminate a client currently running on an
IO thread, for example, due to ACL rule changes, reaching the output
buffer limit, or evicting a client. In such cases, we need to pause the
IO thread to safely operate on the client.

**maxclients and maxmemory-clients updating**
When adjusting `maxclients`, we need to resize the event loop for all IO
threads. Similarly, when modifying `maxmemory-clients`, we need to
traverse all clients to calculate their memory usage. To ensure safe
operations, we pause all IO threads during these adjustments.

**Client info reading**
The main thread may need to read a client’s fields to generate a
descriptive string, such as for the `CLIENT LIST` command or logging
purposes. In such cases, we need to pause the IO thread handling that
client. If information for all clients needs to be displayed, all IO
threads must be paused.

**Tracking redirect**
Redis supports the tracking feature and can even send invalidation
messages to a connection with a specified ID. But the target client may
be running on IO thread, directly manipulating the client’s output
buffer is not thread-safe, and the IO thread may not be aware that the
client requires a response. In such cases, we pause the IO thread
handling the client, modify the output buffer, and install a write event
handler to ensure proper handling.

**clientsCron**
In the `clientsCron` function, the main thread needs to traverse all
clients to perform operations such as timeout checks, verifying whether
they have reached the soft output buffer limit, resizing the
output/query buffer, or updating memory usage. To safely operate on a
client, the IO thread handling that client must be paused.
If we were to pause the IO thread for each client individually, the
efficiency would be very low. Conversely, pausing all IO threads
simultaneously would be costly, especially when there are many IO
threads, as clientsCron is invoked relatively frequently.
To address this, we adopted a batched approach for pausing IO threads.
At most, 8 IO threads are paused at a time. The operations mentioned
above are only performed on clients running in the paused IO threads,
significantly reducing overhead while maintaining safety.

### Observability
In the current design, the main thread always assigns clients to the IO
thread with the least clients. To clearly observe the number of clients
handled by each IO thread, we added the new section in INFO output. The
`INFO THREADS` section can show the client count for each IO thread.
```
# Threads
io_thread_0:clients=0
io_thread_1:clients=2
io_thread_2:clients=2
```

Additionally, in the `CLIENT LIST` output, we also added a field to
indicate the thread to which each client is assigned.

`id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name=
lib-ver= io-thread=1`

## Trade-off
### Special Clients
For certain special types of clients, keeping them running on IO threads
would result in severe race issues that are difficult to resolve.
Therefore, we chose not to offload these clients to the IO threads.

For replica, monitor, subscribe, and tracking clients, main thread may
directly write them a reply when conditions are met. Race issues are
difficult to resolve, so we have them processed in the main thread. This
includes the Lua debug clients as well, since we may operate connection
directly.

For blocking client, after the IO thread reads and parses a command and
hands it over to the main thread, if the client is identified as a
blocking type, it will be remained in the main thread. Once the blocking
operation completes and the reply is generated, the client is
transferred back to the IO thread to send the reply and wait for event
triggers.

### Clients Eviction
To support client eviction, it is necessary to update each client’s
memory usage promptly during operations such as read, write, or command
execution. However, when a client operates on an IO thread, it is not
feasible to update the memory usage immediately due to the risk of data
races. As a result, memory usage can only be updated either in the main
thread while processing commands or in the `ClientsCron` periodically.
The downside of this approach is that updates might experience a delay
of up to one second, which could impact the precision of memory
management for eviction.

To avoid incorrectly evicting clients. We adopted a best-effort
compensation solution, when we decide to eviction a client, we update
its memory usage again before evicting, if the memory used by the client
does not decrease or memory usage bucket is not changed, then we will
evict it, otherwise, not evict it.

However, we have not completely solved this problem. Due to the delay in
memory usage updates, it may lead us to make incorrect decisions about
the need to evict clients.

### Defragment
In the majority of cases we do NOT use the data from argv directly in
the db.
1. key names
We store a copy that we allocate in the main thread, see `sdsdup()` in
`dbAdd()`.
2. hash key and value
We store key as hfield and store value as sds, see `hfieldNew()` and
`sdsdup()` in `hashTypeSet()`.
3. other datatypes
   They don't even use SDS, so there is no reference issues.

But in some cases client the data from argv may be retain by the main
thread.
As a result, during fragmentation cleanup, we need to move allocations
from the IO thread’s arena to the main thread’s arena. We always
allocate new memory in the main thread’s arena, but the memory released
by IO threads may not yet have been reclaimed. This ultimately causes
the fragmentation rate to be higher compared to creating and allocating
entirely within a single thread.
The following cases below will lead to memory allocated by the IO thread
being kept by the main thread.
1. string related command: `append`, `getset`, `mset` and `set`.
If `tryObjectEncoding()` does not change argv, we will keep it directly
in the main thread, see the code in `tryObjectEncoding()`(specifically
`trimStringObjectIfNeeded()`)
2. block related command.
    the key names will be kept in `c->db->blocking_keys`.
3. watch command
    the key names will be kept in `c->db->watched_keys`.
4. [s]subscribe command
    channel name will be kept in `serverPubSubChannels`.
5. script load command
    script will be kept in `server.lua_scripts`.
7. some module API: `RM_RetainString`, `RM_HoldString`

Those issues will be handled in other PRs.

## Testing
### Functional Testing
The commit with enabling IO Threads has passed all TCL tests, but we did
some changes:
**Client query buffer**: In the original code, when using a reusable
query buffer, ownership of the query buffer would be released after the
command was processed. However, with IO threads enabled, the client
transitions from an IO thread to the main thread for processing. This
causes the ownership release to occur earlier than the command
execution. As a result, when IO threads are enabled, the client's
information will never indicate that a shared query buffer is in use.
Therefore, we skip the corresponding query buffer tests in this case.
**Defragment**: Add a new defragmentation test to verify the effect of
io threads on defragmentation.
**Command delay**: For deferred clients in TCL tests, due to clients
being assigned to different threads for execution, delays may occur. To
address this, we introduced conditional waiting: the process proceeds to
the next step only when the `client list` contains the corresponding
commands.

### Sanitizer Testing
The commit passed all TCL tests and reported no errors when compiled
with the `fsanitizer=thread` and `fsanitizer=address` options enabled.
But we made the following modifications: we suppressed the sanitizer
warnings for clients with watched keys when updating `client->flags`, we
think IO threads read `client->flags`, but never modify it or read the
`CLIENT_DIRTY_CAS` bit, main thread just only modifies this bit, so
there is no actual data race.

## Others
### IO thread number
In the new multi-threaded design, the main thread is primarily focused
on command processing to improve performance. Typically, the main thread
does not handle regular client I/O operations but is responsible for
clients such as replication and tracking clients. To avoid breaking
changes, we still consider the main thread as the first IO thread.

When the io-threads configuration is set to a low value (e.g., 2),
performance does not show a significant improvement compared to a
single-threaded setup for simple commands (such as SET or GET), as the
main thread does not consume much CPU for these simple operations. This
results in underutilized multi-core capacity. However, for more complex
commands, having a low number of IO threads may still be beneficial.
Therefore, it’s important to adjust the `io-threads` based on your own
performance tests.

Additionally, you can clearly monitor the CPU utilization of the main
thread and IO threads using `top -H -p $redis_pid`. This allows you to
easily identify where the bottleneck is. If the IO thread is the
bottleneck, increasing the `io-threads` will improve performance. If the
main thread is the bottleneck, the overall performance can only be
scaled by increasing the number of shards or replicas.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
2024-12-22 19:30:37 +08:00
Yuan Wang 779af3ab92
Dynamic event loop binding for connection structure (#13642)
The IO thread has an independent event loop, so we can no longer
hard-code the event loop to the connection, instead, we should
dynamically select the event loop for the connection.

- configure the event loop during connection creation.
- add a new interface to allow dynamic event loop binding.

For TLS connection, we need to check for any pending data on the
connection and handle it accordingly when changing connection cross IO
thread and main thread. This commit doesn't handle it, @sundb will
overall support for TLS connection later.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-11-08 15:58:10 +08:00
guybe7 ded8d993b7
Modules: defrag CB should take robj, not sds (#13627)
Added a log of the keyname in the test modules to reproduce the problem
(tests crash without the fix)
2024-10-30 17:32:51 +08:00
Moti Cohen 6437d07b03
Fix memory leak on rdbload error (#13626)
On RDB load error, if an invalid `expireAt` value is read,
`dupSearchDict` is not released.
2024-10-30 10:03:31 +02:00
debing.sun 4b29be3f36
Avoid redundant lpGet to boost quicklistCompare (#11533)
`lpCompare()` in `quicklistCompare()` will call `lpGet()` again, which
would be a waste.
The change will result in a boost for all commands that use
`quicklistCompre()`, including `linsert`, `lpos` and `lrem`.
2024-10-30 08:45:25 +08:00
Moti Cohen 2ec78d262d
Add KEYSIZES section to INFO (#13592)
This PR adds a new section to the `INFO` command output, called
`keysizes`. This section provides detailed statistics on the
distribution of key sizes for each data type (strings, lists, sets,
hashes and zsets) within the dataset. The distribution is tracked using
a base-2 logarithmic histogram.

# Motivation
Currently, Redis lacks a built-in feature to track key sizes and item
sizes per data type at a granular level. Understanding the distribution
of key sizes is critical for monitoring memory usage and optimizing
performance, particularly in large datasets. This enhancement will allow
users to inspect the size distribution of keys directly from the `INFO`
command, assisting with performance analysis and capacity planning.

# Changes
New Section in `INFO` Command: A new section called `keysizes` has been
added to the `INFO` command output. This section reports a per-database,
per-type histogram of key sizes. It provides insights into how many keys
fall into specific size ranges (represented in powers of 2).

**Example output:**
```
127.0.0.1:6379> INFO keysizes
# Keysizes
db0_distrib_strings_sizes:1=19,2=655,512=100899,1K=31,2K=29,4K=23,8K=16,16K=3,32K=2
db0_distrib_lists_items:1=5784492,32=3558,64=1047,128=676,256=533,512=218,4K=1,8K=42
db0_distrib_sets_items:1=735564=50612,8=21462,64=1365,128=974,2K=292,4K=154,8K=89,
db0_distrib_hashes_items:2=1,4=544,32=141169,64=207329,128=4349,256=136226,1K=1
```
## Future Use Cases:
The key size distribution is collected per slot as well, laying the
groundwork for future enhancements related to Redis Cluster.
2024-10-29 13:07:26 +02:00
Shockingly Good 611c950293
Fix crash in RM_GetCurrentUserName() when the user isn't accessible (#13619)
The crash happens whenever the user isn't accessible, for example, it
isn't set for the context (when it is temporary) or in some other cases
like `notifyKeyspaceEvent`. To properly check for the ACL compliance, we
need to get the user name and the user to invoke other APIs. However, it
is not possible if it crashes, and it is impossible to work that around
in the code since we don't know (and **shouldn't know**!) when it is
available and when it is not.
2024-10-28 21:26:29 +08:00
opt-m 0a8e546957
Fix get # option in sort command (#13608)
From 7.4, Redis allows `GET` options in cluster mode when the pattern maps to
the same slot as the key, but GET # pattern that represents key itself is missed.
This commit resolves it, bug report #13607.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2024-10-22 09:55:00 +08:00
debing.sun 4f8cdc2a1e
Fix compilation on compilers that do not support target attribute (#13609)
introduced by https://github.com/redis/redis/pull/13359
failure CI on ARM64:
https://github.com/redis/redis-extra-ci/actions/runs/11377893230/job/31652773710

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: ShooterIT <wangyuancode@163.com>
2024-10-18 09:11:23 +08:00
hanhui365 3788a055fe
Optimize bitcount command by using popcnt (#13359)
Nowadays popcnt instruction is almost supported by X86 machine, which is
used to calculate "Hamming weight", it can bring much performance boost
in redis bitcount comand.

---------

Signed-off-by: hanhui365(hanhui@hygon.cn)
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
Co-authored-by: Nugine <nugine@foxmail.com>
2024-10-17 09:13:19 +08:00
Yuan Wang b71a610f5c
Clean up .rediscli_history_test temporary file (#13601)
After running test in local, there will be a file named
`.rediscli_history_test`, and it is not in `.gitignore` file, so this is
considered to have changed the code base. It is a little annoying, this
commit just clean up the temporary file.

We should delete `.rediscli_history_test` in the end since the second
server tests also write somethings into it, to make it corresponding, i
put `set ::env(REDISCLI_HISTFILE) ".rediscli_history_test"` at the
beginning.

Maybe we also can add this file into `.gitignore`
2024-10-17 09:12:11 +08:00
YaacovHazan efcfffc528
Update modules with latest version (#13606)
Update redisbloom, redisjson and redistimeseries versions to 7.99.1

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-10-15 19:58:42 +03:00
paoloredis 99d09c824c
Only run redis_docs_sync.yaml on latest release (#13603)
We only want to trigger the workflow on the documentation repository for
the latest release
2024-10-15 16:02:11 +03:00
YaacovHazan 6c5e263d7b
Temporarily hide the new SFLUSH command by marking it as experimental (#13600)
- Add a new 'EXPERIMENTAL' command flag, which causes the command
generator to skip over it and make the command to be unavailable for
execution
- Skip experimental tests by default
- Move the SFLUSH tests from the old framework to the new one

---------

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-10-15 11:02:51 +03:00
debing.sun 3fc7ef8f81
Fix race in stream-cgroups test (#13593)
failed CI:
https://github.com/redis/redis/actions/runs/11171608362/job/31056659165
https://github.com/redis/redis/actions/runs/11226025974/job/31205787575
2024-10-12 09:23:19 +08:00
guybe7 a38c29b6c8
Cleanups related to expiry/eviction (#13591)
1. `dbRandomKey`: excessive call to `dbFindExpires` (will always return
1 if `allvolatile` + anyway called inside `expireIfNeeded`
2. Add `deleteKeyAndPropagate` that is used by both expiry/eviction
3. Change the order of calls in `expireIfNeeded` to save redundant calls
to `keyIsExpired`
4. `expireIfNeeded`: move `OBJ_STATIC_REFCOUNT` to
`deleteKeyAndPropagate`
5. `performEvictions` now uses `deleteEvictedKeyAndPropagate`
6. active-expire: moved `postExecutionUnitOperations` inside
`activeExpireCycleTryExpire`
7. `activeExpireCycleTryExpire`: less indentation + expire a key if `now
== t`
8. rename `lazy_expire_disabled` to `allow_access_expired`
2024-10-10 16:58:52 +08:00
Oran Agra 472d8a0df5 Prevent pattern matching abuse (CVE-2024-31228) 2024-10-08 20:55:44 +03:00
Oran Agra 8ec5da785b Fix ACL SETUSER Read/Write key pattern selector (CVE-2024-31227)
The '%' rule must contain one or both of R/W
2024-10-08 20:55:44 +03:00
Oran Agra 3a2669e8ae Fix lua bit.tohex (CVE-2024-31449)
INT_MIN value must be explicitly checked, and cannot be negated.
2024-10-08 20:55:44 +03:00
alonre24 f39e51178e
Update target module in search (#13578)
Update search target path and version from M02
2024-10-08 13:58:28 +03:00
chx9 5f7d7ce8b0
fix typo in test_helper.tcl (#13576)
fix typo in test_helper.tcl: even driven => event driven
2024-10-08 14:15:48 +08:00
Moti Cohen d092d64d7a
Add new SFLUSH command to cluster for slot-based FLUSH (#13564)
This PR introduces a new `SFLUSH` command to cluster mode that allows
partial flushing of nodes based on specified slot ranges. Current
implementation is designed to flush all slots of a shard, but future
extensions could allow for more granular flushing.

**Command Usage:**
`SFLUSH <start-slot> <end-slot> [<start-slot> <end-slot>]* [SYNC|ASYNC]`

This command removes all data from the specified slots, either
synchronously or asynchronously depending on the optional SYNC/ASYNC
argument.

**Functionality:**
Current imp of `SFLUSH` command verifies that the provided slot ranges
are valid and cover all of the node's slots before proceeding. If slots
are partially or incorrectly specified, the command will fail and return
an error, ensuring that all slots of a node must be fully covered for
the flush to proceed.

The function supports both synchronous (default) and asynchronous
flushing. In addition, if possible, SFLUSH SYNC will be run as blocking
ASYNC as an optimization.
2024-09-29 09:13:21 +03:00
Ozan Tezcan 99c40ab53d
Use hashtable as the default type of temp set object during sunion/sdiff (#13567)
This PR is based on https://github.com/valkey-io/valkey/pull/996


Currently, for operations like SUNION or SDIFF, temporary set object can
be intset or listpack. Search operation is costly for these encodings.
This patch tries to set the temporary set object as hash table by
default. It also tries to determine correct encoding for the temporary
set object to reduce the unnecessary conversation.

This change is supposed to give performance boost for tests like:
-
[memtier_benchmark-2keys-set-10-100-elements-sdiff](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml)
66.2% IMPROVEMENT
-
[memtier_benchmark-2keys-set-10-100-elements-sunion](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml)
126.5% IMPROVEMENT

-------
Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>

Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-09-25 12:41:17 +03:00
Moti Cohen 26ef28467a
Optimize ZUNION[STORE] by avoiding redundant temporary dict usage (#13566)
This PR is based on valkey-io/valkey#829

Previously, ZUNION and ZUNIONSTORE commands used a temporary accumulator dict
and at the end copied it as-is to dstzset->dict. This PR removes accumulator and directly
stores into dstzset->dict, eliminating the extra copy.

Co-authored-by: Rayacoo zisong.cw@alibaba-inc.com
2024-09-25 11:55:00 +03:00
Moti Cohen 5f28bd96db
Fix race in HFE tests (#13563)
Test 1 - give more time for expiration
Test 2 - Evaluate expiration time boundaries [+1,+2] before setting expiration [+1]
Test 3 - Avoid race on test HFEs propagated to replica
2024-09-23 10:30:29 +03:00
debing.sun 438cfed70a
Replace wrongly free with zfree in redis-cli (#13560)
#13258 Incorrect use of free instead of zfree
2024-09-23 09:40:47 +08:00
Moti Cohen 3a3cacfefa
Extend modules API to read also expired keys and subkeys (#13526)
The PR extends `RedisModule_OpenKey`'s flags to include
`REDISMODULE_OPEN_KEY_ACCESS_EXPIRED`, which allows to access expired
keys.

It also allows to access expired subkeys. Currently relevant only for
hash fields
and has its impact on `RM_HashGet` and `RM_Scan`.
2024-09-19 20:47:00 +03:00
debing.sun 617909e943
Align the offset in ASCII logo (#13557)
Since `\\` is only one character, we need to add an extra space to the right.
2024-09-18 14:42:32 +08:00
adamiBs e9cbfccec6
Support `musl` Rust Installation in Modules Makefile (#13549)
This PR introduces the installation of the `musl`-based version of Rust,
in order to support alpine-based runtime environments (Rust is used by
[RedisJSON](https://github.com/RedisJSON/RedisJSON)).
2024-09-15 20:23:05 +03:00
Filipe Oliveira (Redis) 7b69183a8d
Replace usage of _addReplyLongLongWithPrefix with specific bulk/mbulk functions to reduce condition checks in hotpath. (#13520)
Instead of adding runtime logic to decide which prefix/shared object to
use when doing the reply we can simply use an inline method to avoid runtime
overhead of condition checks, and also keep the code change small.
Preliminary data show improvements on commands that heavily rely on
bulk/mbulk replies (example of LRANGE).

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-09-15 21:40:09 +08:00
Filipe Oliveira (Personal) af7fca797a
Using fast_float library for faster parsing of 64 decimal strings. (#11884)
Fixes #8825 

We're using the fast_float library[1] in our (compiled-in)
floating-point fast_float_strtod implementation for faster and more
portable parsing of 64 decimal strings.

The single file fast_float.h is an amalgamation of the entire library,
which can be (re)generated with the amalgamate.py script (from the
fast_float repository) via the command:

```
python3 ./script/amalgamate.py --license=MIT > $REDIS_SRC/deps/fast_float/fast_float.h
```

[1]: https://github.com/fastfloat/fast_float

The used commit from fast_float library was the one from
https://github.com/fastfloat/fast_float/releases/tag/v3.10.1

---------

Co-authored-by: fcostaoliveira <filipe@redis.com>
2024-09-15 21:37:29 +08:00
Filipe Oliveira (Redis) 9146ac050b
Optimize HSCAN/ZSCAN command in case of listpack encoding: avoid the usage of intermediate list (#13531)
Similar to #13530 , applied to HSCAN and ZSCAN in case of listpack
encoding.

**Preliminary benchmark results showcase an improvement of 108% on the
achievable ops/sec for ZSCAN and 65% for HSCAN**.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-09-13 20:36:19 +08:00
debing.sun ef3a5f58a8
Fix missing initialization of EbucketsIterator->isRax (#13545)
in https://github.com/redis/redis/pull/13519, when `eb` is empty,
`isRax` is not correctly initialized to 0, which can lead to `ebStop()`
potentially entering the wrong rax branch.
2024-09-13 17:12:27 +08:00
Filipe Oliveira (Redis) f2f85ba354
Optimize SSCAN command in case of listpack or intset encoding: avoid the usage of intermediate list. From 2N to N iterations (#13530)
On SSCAN, in case of listpack and intset encoding we actually reply the
entire set, and always reply with the cursor 0.

For those cases, we don't need to accumulate the replies in a list and
can completely avoid the overhead of list appending and then iterating
over the list again -- meaning we do N iterations instead of 2N
iterations over the SET and save intermediate memory as well.

Preliminary benchmarks, `SSCAN set:100 0`, showcased an improvement of
60% as visible bellow on a SET with 100 string elements (listpack
encoded).
2024-09-12 22:36:54 +08:00
Moti Cohen c115c5230e
Add iterator capability to ebuckets DS (#13519)
Add basic iterator API for ebuckets of start, next, nextBucket and stop.
2024-09-12 15:02:32 +03:00
Moti Cohen 65a87cb773
Correct spelling error at t_hash.c comment (#13540)
spell check error : ./src/t_hash.c:1141: RESOTRE ==> RESTORE
2024-09-12 12:50:44 +03:00
Moti Cohen 9a89e32a95
HFE - Fix key ref by the hash on RENAME/MOVE/SWAPDB/RESTORE (#13539)
If the hash previously had HFEs (hash-fields with expiration) but later no longer
does, the key ref in the hash might become outdated after a MOVE, COPY,
RENAME or RESTORE operation. These commands maintain the key ref only
if HFEs are present. That is, we can only be sure that key ref is valid as long as the
hash has HFEs.
2024-09-12 12:40:12 +03:00
Oran Agra 610eb26c11
RED-129256, Fix TOUCH command from script in no-touch mode (#13512)
When a client in no-touch mode issues a TOUCH command on a key, the
key’s access time should be updated, but in scripts, and module's
RM_Call, it isn’t updated.
Command proc should be matched to the executing client, not the current
client.

Co-authored-by: Udi Ron <udi@speedb.io>
2024-09-12 11:33:26 +03:00
Steve d265a61438
Avoid cluster.nodes load corruption due to shard-id generation (#13468)
PR #13428 doesn't fully resolve an issue where corruption errors can
still occur on loading of cluster.nodes file - seen on upgrade where
there were no shard_ids (from old Redis), 7.2.5 loading generated new
random ones, and persisted them to the file before gossip/handshake
could propagate the correct ones (or some other nodes unreachable).
This results in a primary/replica having differing shard_id in the
cluster.nodes and then the server cannot startup - reports corruption.

This PR builds on #13428 by simply ignoring the replica's shard_id in
cluster.nodes (if it exists), and uses the replica's primary's shard_id.
Additional handling was necessary to cover the case where the replica
appears before the primary in cluster.nodes, where it will first use a
generated shard_id for the primary, and then correct after it loads the
primary cluster.nodes entry.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-09-12 14:01:09 +08:00
debing.sun 2dd4cca363
Increment kvstore's non_empty_dicts only on first insert (#13528)
Found by @oranagra 

Currently, when the size of dict becomes 1, we do not check whether
`delta` is positive or negative.
As a result, `non_empty_dicts` is still incremented when the size of
dict changes from 2 to 1.
We should only increment `non_empty_dicts` when `delta` is positive, as
this indicates the first time an element is inserted into the dict.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-09-11 09:36:01 +08:00
Filipe Oliveira (Redis) bcae770819
Optimize LREM, LPOS, LINSERT, LINDEX: Avoid N-1 sdslen() calls on listTypeEqual (#13529)
This is a very easy optimization, that avoids duplicate computation of
the object length for LREM, LPOS, LINSERT na LINDEX.

We can see that sdslen takes 7.7% of the total CPU cycles of the
benchmarks.

Function Stack | CPU Time: Total | CPU Time: Self | Module | Function
(Full) | Source File | Start Address
-- | -- | -- | -- | -- | -- | --
listTypeEqual | 15.50% | 2.346s | redis-server | listTypeEqual |
t_list.c | 0x845dd
sdslen | 7.70% | 2.300s | redis-server | sdslen | sds.h | 0x845e4

Preliminary data showcases 4% improvement in the achievable ops/sec of
LPOS in string elements, and 2% in int elements.
2024-09-10 20:26:36 +08:00
YaacovHazan bf802b0764
Add the option to build Redis with modules (#13524)
A new BUILD_WITH_MODULES flag was added to the Makefile to control
building the module directory.

The new module directory includes a general Makefile that iterates
over each module, fetch a specific version, and build it.

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-09-09 15:47:02 +03:00
Ozan Tezcan ac03e3721d
Fix flaky replication tests (#13518)
#13495 introduced a change to reply -LOADING while flushing existing db on a replica. Some of our tests are
 sensitive to this change and do no expect -LOADING reply.

Fixing a couple of tests that fail time to time.
2024-09-08 12:54:01 +03:00
Filipe Oliveira (Redis) 31227f4faf
Optimize client type check on reply hot code paths (#13516)
## Proposed improvement

This PR introduces the static inlined function `clientTypeIsSlave` which
is doing only 1 condition check vs 3 checks of `getClientType`, and also
uses the `unlikely` to tell the compiler that the most common outcome is
for the client not to be a slave.
Preliminary data show 3% improvement on the achievable ops/sec on the
specific LRANGE benchmark. After running the entire suite we see up to
5% improvement in 2 tests.
https://github.com/redis/redis/pull/13516#issuecomment-2331326052

## Context

This optimization efforts comes from analyzing the profile info from the
[memtier_benchmark-1key-list-1K-elements-lrange-all-elements](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1key-list-1K-elements-lrange-all-elements.yml)
benchmark.
 
By going over it, we can see that `getClientType` consumes 2% of the cpu
time, strictly to check if the client is a slave (
https://github.com/redis/redis/blob/unstable/src/networking.c#L397 , and
https://github.com/redis/redis/blob/unstable/src/networking.c#L1254 )


Function | CPU Time: Total | CPU Time: Self | Module | Function (Full)
-- | -- | -- | -- | --
_addReplyToBufferOrList->getClientType | 1.20% | 0.728s | redis-server |
getClientType
clientHasPendingReplies->getClientType | 0.80% | 0.482s | redis-server |
getClientType

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-09-06 10:24:30 +08:00
Max Malekzadeh f6f11f3ef1
Remove outdated "Try Redis" link in README.md (#13498) 2024-09-05 22:04:49 +08:00
Moti Cohen 569584d463
HFE - Simplify logic of HGETALL command (#13425) 2024-09-05 12:48:44 +03:00
debing.sun ea3e8b79a1
Introduce reusable query buffer for client reads (#13488)
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/258,
https://github.com/valkey-io/valkey/pull/593,
https://github.com/valkey-io/valkey/pull/639

This PR optimizes client query buffer handling in Redis by introducing
a reusable query buffer that is used by default for client reads. This
reduces memory usage by ~20KB per client by avoiding allocations for
most clients using short (<16KB) complete commands. For larger or
partial commands, the client still gets its own private buffer.

The primary changes are:

* Adding a reusable query buffer `thread_shared_qb` that clients use by
default.
* Modifying client querybuf initialization and reset logic.
* Freeing idle client query buffers when empty to allow reuse of the
reusable query buffer.
* Master client query buffers are kept private as their contents need to
be preserved for replication stream.
* When nested commands is executed, only the first user uses the reuse
buffer, and subsequent users will still use the private buffer.

In addition to the memory savings, this change shows a 3% improvement in
latency and throughput when running with 1000 active clients.

The memory reduction may also help reduce the need to evict clients when
reaching max memory limit, as the query buffer is the main memory
consumer per client.

This PR is different from https://github.com/valkey-io/valkey/pull/258
1. When a client is in the mid of requiring a reused buffer and
returning it, regardless of whether the query buffer has changed
(expanded), we do not update the reused query buffer in the middle, but
return the reused query buffer (expanded or with data remaining) or
reset it at the end.
2. Adding a new thread variable `thread_shared_qb_used` to avoid
multiple clients requiring the reusable query buffer at the same time.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <matolson@amazon.com>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: oranagra <oran@redislabs.com>
2024-09-04 19:10:40 +08:00
debing.sun 74609d44cd
Fix set with invalid length causes smembers to hang (#13515)
After https://github.com/redis/redis/pull/13499, If the length set by
`addReplySetLen()` does not match the actual number of elements in the
reply, it will cause protocol broken and result in the client hanging.
2024-09-04 17:35:46 +08:00
Ozan Tezcan ea05c6ac47
Fix RM_RdbLoad() to enable AOF after loading is completed (#13510)
RM_RdbLoad() disables AOF temporarily while loading RDB.
Later, it does not enable it back as it checks AOF state (disabled by then) 
rather than AOF config parameter.

Added a change to restart AOF according to config parameter.
2024-09-04 11:11:04 +03:00
Filipe Oliveira (Redis) 05aed4cab9
Optimize SET/INCR*/DECR*/SETRANGE/APPEND by reducing duplicate computation (#13505)
- Avoid addReplyLongLong (which converts back to string) the value we
already have as a robj, by using addReplyProto + addReply
- Avoid doing dbFind Twice for the same dictEntry on
INCR*/DECR*/SETRANGE/APPEND commands.
- Avoid multiple sdslen calls with the same input on setrangeCommand and
appendCommand
- Introduce setKeyWithDictEntry, which is like setKey(), but accepts an
optional dictEntry input: Avoids the second dictFind in SET command

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2024-09-04 14:51:21 +08:00