redis

Commit Graph

Author	SHA1	Message	Date
Yuan Wang	033abd6f57	Async IO threads (#13665 ) ## Introduction Redis introduced IO Thread in 6.0, allowing IO threads to handle client request reading, command parsing and reply writing, thereby improving performance. The current IO thread implementation has a few drawbacks. - The main thread is blocked during IO thread read/write operations and must wait for all IO threads to complete their current tasks before it can continue execution. In other words, the entire process is synchronous. This prevents the efficient utilization of multi-core CPUs for parallel processing. - When the number of clients and requests increases moderately, it causes all IO threads to reach full CPU utilization due to the busy wait mechanism used by the IO threads. This makes it challenging for us to determine which part of Redis has reached its bottleneck. - When IO threads are enabled with TLS and io-threads-do-reads, a disconnection of a connection with pending data may result in it being assigned to multiple IO threads simultaneously. This can cause race conditions and trigger assertion failures. Related issue: https://github.com/redis/redis/issues/12540 Therefore, we designed an asynchronous IO threads solution. The IO threads adopt an event-driven model, with the main thread dedicated to command processing, meanwhile, the IO threads handle client read and write operations in parallel. ## Implementation ### Overall As before, we did not change the fact that all client commands must be executed on the main thread, because Redis was originally designed to be single-threaded, and processing commands in a multi-threaded manner would inevitably introduce numerous race and synchronization issues. But now each IO thread has independent event loop, therefore, IO threads can use a multiplexing approach to handle client read and write operations, eliminating the CPU overhead caused by busy-waiting. the execution process can be briefly described as follows: the main thread assigns clients to IO threads after accepting connections, IO threads will notify the main thread when clients finish reading and parsing queries, then the main thread processes queries from IO threads and generates replies, IO threads handle writing reply to clients after receiving clients list from main thread, and then continue to handle client read and write events. ### Each IO thread has independent event loop We now assign each IO thread its own event loop. This approach eliminates the need for the main thread to perform the costly `epoll_wait` operation for handling connections (except for specific ones). Instead, the main thread processes requests from the IO threads and hands them back once completed, fully offloading read and write events to the IO threads. Additionally, all TLS operations, including handling pending data, have been moved entirely to the IO threads. This resolves the issue where io-threads-do-reads could not be used with TLS. ### Event-notified client queue To facilitate communication between the IO threads and the main thread, we designed an event-notified client queue. Each IO thread and the main thread have two such queues to store clients waiting to be processed. These queues are also integrated with the event loop to enable handling. We use pthread_mutex to ensure the safety of queue operations, as well as data visibility and ordering, and race conditions are minimized, as each IO thread and the main thread operate on independent queues, avoiding thread suspension due to lock contention. And we implemented an event notifier based on `eventfd` or `pipe` to support event-driven handling. ### Thread safety Since the main thread and IO threads can execute in parallel, we must handle data race issues carefully. client->flags The primary tasks of IO threads are reading and writing, i.e. `readQueryFromClient` and `writeToClient`. However, IO threads and the main thread may concurrently modify or access `client->flags`, leading to potential race conditions. To address this, we introduced an io-flags variable to record operations performed by IO threads, thereby avoiding race conditions on `client->flags`. Pause IO thread In the main thread, we may want to operate data of IO threads, maybe uninstall event handler, access or operate query/output buffer or resize event loop, we need a clean and safe context to do that. We pause IO thread in `IOThreadBeforeSleep`, do some jobs and then resume it. To avoid thread suspended, we use busy waiting to confirm the target status. Besides we use atomic variable to make sure memory visibility and ordering. We introduce these functions to pause/resume IO Threads as below. ``` pauseIOThread, resumeIOThread pauseAllIOThreads, resumeAllIOThreads pauseIOThreadsRange, resumeIOThreadsRange ``` Testing has shown that `pauseIOThread` is highly efficient, allowing the main thread to execute nearly 200,000 operations per second during stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can handle up to nearly 56,000 operations per second. But operations performed between pausing and resuming IO threads must be quick; otherwise, they could cause the IO threads to reach full CPU utilization. freeClient and freeClientAsync The main thread may need to terminate a client currently running on an IO thread, for example, due to ACL rule changes, reaching the output buffer limit, or evicting a client. In such cases, we need to pause the IO thread to safely operate on the client. maxclients and maxmemory-clients updating When adjusting `maxclients`, we need to resize the event loop for all IO threads. Similarly, when modifying `maxmemory-clients`, we need to traverse all clients to calculate their memory usage. To ensure safe operations, we pause all IO threads during these adjustments. Client info reading The main thread may need to read a client’s fields to generate a descriptive string, such as for the `CLIENT LIST` command or logging purposes. In such cases, we need to pause the IO thread handling that client. If information for all clients needs to be displayed, all IO threads must be paused. Tracking redirect Redis supports the tracking feature and can even send invalidation messages to a connection with a specified ID. But the target client may be running on IO thread, directly manipulating the client’s output buffer is not thread-safe, and the IO thread may not be aware that the client requires a response. In such cases, we pause the IO thread handling the client, modify the output buffer, and install a write event handler to ensure proper handling. clientsCron In the `clientsCron` function, the main thread needs to traverse all clients to perform operations such as timeout checks, verifying whether they have reached the soft output buffer limit, resizing the output/query buffer, or updating memory usage. To safely operate on a client, the IO thread handling that client must be paused. If we were to pause the IO thread for each client individually, the efficiency would be very low. Conversely, pausing all IO threads simultaneously would be costly, especially when there are many IO threads, as clientsCron is invoked relatively frequently. To address this, we adopted a batched approach for pausing IO threads. At most, 8 IO threads are paused at a time. The operations mentioned above are only performed on clients running in the paused IO threads, significantly reducing overhead while maintaining safety. ### Observability In the current design, the main thread always assigns clients to the IO thread with the least clients. To clearly observe the number of clients handled by each IO thread, we added the new section in INFO output. The `INFO THREADS` section can show the client count for each IO thread. ``` # Threads io_thread_0:clients=0 io_thread_1:clients=2 io_thread_2:clients=2 ``` Additionally, in the `CLIENT LIST` output, we also added a field to indicate the thread to which each client is assigned. `id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name= lib-ver= io-thread=1` ## Trade-off ### Special Clients For certain special types of clients, keeping them running on IO threads would result in severe race issues that are difficult to resolve. Therefore, we chose not to offload these clients to the IO threads. For replica, monitor, subscribe, and tracking clients, main thread may directly write them a reply when conditions are met. Race issues are difficult to resolve, so we have them processed in the main thread. This includes the Lua debug clients as well, since we may operate connection directly. For blocking client, after the IO thread reads and parses a command and hands it over to the main thread, if the client is identified as a blocking type, it will be remained in the main thread. Once the blocking operation completes and the reply is generated, the client is transferred back to the IO thread to send the reply and wait for event triggers. ### Clients Eviction To support client eviction, it is necessary to update each client’s memory usage promptly during operations such as read, write, or command execution. However, when a client operates on an IO thread, it is not feasible to update the memory usage immediately due to the risk of data races. As a result, memory usage can only be updated either in the main thread while processing commands or in the `ClientsCron` periodically. The downside of this approach is that updates might experience a delay of up to one second, which could impact the precision of memory management for eviction. To avoid incorrectly evicting clients. We adopted a best-effort compensation solution, when we decide to eviction a client, we update its memory usage again before evicting, if the memory used by the client does not decrease or memory usage bucket is not changed, then we will evict it, otherwise, not evict it. However, we have not completely solved this problem. Due to the delay in memory usage updates, it may lead us to make incorrect decisions about the need to evict clients. ### Defragment In the majority of cases we do NOT use the data from argv directly in the db. 1. key names We store a copy that we allocate in the main thread, see `sdsdup()` in `dbAdd()`. 2. hash key and value We store key as hfield and store value as sds, see `hfieldNew()` and `sdsdup()` in `hashTypeSet()`. 3. other datatypes They don't even use SDS, so there is no reference issues. But in some cases client the data from argv may be retain by the main thread. As a result, during fragmentation cleanup, we need to move allocations from the IO thread’s arena to the main thread’s arena. We always allocate new memory in the main thread’s arena, but the memory released by IO threads may not yet have been reclaimed. This ultimately causes the fragmentation rate to be higher compared to creating and allocating entirely within a single thread. The following cases below will lead to memory allocated by the IO thread being kept by the main thread. 1. string related command: `append`, `getset`, `mset` and `set`. If `tryObjectEncoding()` does not change argv, we will keep it directly in the main thread, see the code in `tryObjectEncoding()`(specifically `trimStringObjectIfNeeded()`) 2. block related command. the key names will be kept in `c->db->blocking_keys`. 3. watch command the key names will be kept in `c->db->watched_keys`. 4. [s]subscribe command channel name will be kept in `serverPubSubChannels`. 5. script load command script will be kept in `server.lua_scripts`. 7. some module API: `RM_RetainString`, `RM_HoldString` Those issues will be handled in other PRs. ## Testing ### Functional Testing The commit with enabling IO Threads has passed all TCL tests, but we did some changes: Client query buffer: In the original code, when using a reusable query buffer, ownership of the query buffer would be released after the command was processed. However, with IO threads enabled, the client transitions from an IO thread to the main thread for processing. This causes the ownership release to occur earlier than the command execution. As a result, when IO threads are enabled, the client's information will never indicate that a shared query buffer is in use. Therefore, we skip the corresponding query buffer tests in this case. Defragment: Add a new defragmentation test to verify the effect of io threads on defragmentation. Command delay: For deferred clients in TCL tests, due to clients being assigned to different threads for execution, delays may occur. To address this, we introduced conditional waiting: the process proceeds to the next step only when the `client list` contains the corresponding commands. ### Sanitizer Testing The commit passed all TCL tests and reported no errors when compiled with the `fsanitizer=thread` and `fsanitizer=address` options enabled. But we made the following modifications: we suppressed the sanitizer warnings for clients with watched keys when updating `client->flags`, we think IO threads read `client->flags`, but never modify it or read the `CLIENT_DIRTY_CAS` bit, main thread just only modifies this bit, so there is no actual data race. ## Others ### IO thread number In the new multi-threaded design, the main thread is primarily focused on command processing to improve performance. Typically, the main thread does not handle regular client I/O operations but is responsible for clients such as replication and tracking clients. To avoid breaking changes, we still consider the main thread as the first IO thread. When the io-threads configuration is set to a low value (e.g., 2), performance does not show a significant improvement compared to a single-threaded setup for simple commands (such as SET or GET), as the main thread does not consume much CPU for these simple operations. This results in underutilized multi-core capacity. However, for more complex commands, having a low number of IO threads may still be beneficial. Therefore, it’s important to adjust the `io-threads` based on your own performance tests. Additionally, you can clearly monitor the CPU utilization of the main thread and IO threads using `top -H -p $redis_pid`. This allows you to easily identify where the bottleneck is. If the IO thread is the bottleneck, increasing the `io-threads` will improve performance. If the main thread is the bottleneck, the overall performance can only be scaled by increasing the number of shards or replicas. --------- Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: oranagra <oran@redislabs.com>	2024-12-22 19:30:37 +08:00
Moti Cohen	3a3cacfefa	Extend modules API to read also expired keys and subkeys (#13526 ) The PR extends `RedisModule_OpenKey`'s flags to include `REDISMODULE_OPEN_KEY_ACCESS_EXPIRED`, which allows to access expired keys. It also allows to access expired subkeys. Currently relevant only for hash fields and has its impact on `RM_HashGet` and `RM_Scan`.	2024-09-19 20:47:00 +03:00
Ozan Tezcan	ea05c6ac47	Fix RM_RdbLoad() to enable AOF after loading is completed (#13510 ) RM_RdbLoad() disables AOF temporarily while loading RDB. Later, it does not enable it back as it checks AOF state (disabled by then) rather than AOF config parameter. Added a change to restart AOF according to config parameter.	2024-09-04 11:11:04 +03:00
Meir Shpilraien (Spielrein)	d3d94ccf2e	Added new defrag API to allocate and free raw memory. (#13509 ) All the defrag allocations API expects to get a value and replace it, leaving the old value untouchable. In some cases a value might be shared between multiple keys, in such cases we can not simply replace it when the defrag callback is called. To allow support such use cases, the PR adds two new API's to the defrag API: 1. `RM_DefragAllocRaw` - allocate memory base on a given size. 2. `RM_DefragFreeRaw` - Free the given pointer. Those API's avoid using tcache so they operate just like `RM_DefragAlloc` but allows the user to split the allocation and the memory free operations into two stages and control when those happen. In addition the PR adds new API to allow the module to receive notifications when defrag start and end: `RM_RegisterDefragCallbacks` Those callbacks are the same as `RM_RegisterDefragFunc` but promised to be called and the start and the end of the defrag process.	2024-09-03 15:03:19 +03:00
debing.sun	e750c619b2	Fix some test failures caused by key being deleted due to premature expiration (#13453 ) 1. Fix fuzzer test failure when the key was deleted due to expiration before sending random traffic for the key. After HFE, when all fields in a hash are expired, the hash might be deleted due to expiration. If the key was expired in the mid of `RESTORE` command and sending rand trafic, `fuzzer` test will fail in the following code because the 'TYPE key' will return `none` and then throw an exception because it cannot be found in `$commands` `94b9072e44/tests/support/util.tcl (L712-L713)` This PR adds a `None` check for the reply of `KEY TYPE` command and adds a print of `err` to avoid false positives in the future. failed CI: https://github.com/redis/redis/actions/runs/10127334121/job/28004985388 2. Fix the issue where key was deleted due to expiration before the `scan.scan_key` command could execute, caused by premature enabling of `set-active-expire`. failed CI: https://github.com/redis/redis/actions/runs/10153722715/job/28077610552 --------- Co-authored-by: oranagra <oran@redislabs.com>	2024-07-31 08:15:39 +08:00
Oran Agra	a331978583	Fix external test hang in redis-cli test when run in a certain order (#13423 ) When the tests are run against an external server in this order: `--single unit/introspection --single unit/moduleapi/blockonbackground --single integration/redis-cli` the test would hang when the "ASK redirect test" test attempts to create a listening socket (it fails, and then redis-cli itself hangs waiting for a non-responsive socket created by the introspection test). the reasons are: 1. the blockedbackground test includes util.tcl and resets the `::last_port_attempted` variable 2. the test in introspection didn't close the listening server, so it's still alive. 3. find_available_port doesn't properly detect the busy port, and it thinks that the port is free even though it's busy. fixing all 3 of these problems, even though fixing just one would be enough to let the test pass.	2024-07-17 15:42:28 +03:00
debing.sun	52e12d8bac	Don't keep global replication buffer reference for replicas marked CLIENT_CLOSE_ASAP (#13363 ) In certain situations, we might generate a large number of propagates (e.g., multi/exec, Lua script, or a single command generating tons of propagations) within an event loop. During the process of propagating to a replica, if the replica is disconnected(marked as CLIENT_CLOSE_ASAP) due to exceeding the output buffer limit, we should remove its reference to the global replication buffer to avoid the global replication buffer being unable to be properly trimmed due to being referenced. --------- Co-authored-by: oranagra <oran@redislabs.com>	2024-06-26 08:26:23 +08:00
Moti Cohen	ce121b9261	HFE - Avoid lazy expire if called by modules + cleanup (#13326 ) Need to be carefull if called by modules since modules API allow to open and close key handler. We don't want to invalidate the handler underneath. * hashTypeExists(), hashTypeGetValueObject() - will return the logical state of the field. A flag will indicate noExpire. * RM_HashGet() - Will get NULL if the field expired. Fields won’t be deleted. * RM_ScanKey() - might return 0 items if all fields got expired. Fields won’t be deleted. * RM_HashSet() - If set, then override expired field. If delete, we can either delete or leave it to active-expiration. XX/NX - logically correct (Verify with tests). Nice to have (not implemented): * RedisModule_CloseKey() - We can local active-expire up-to 100 items. Note: Length will be wrong to modules just like redis (Count expired fields).	2024-06-10 11:24:26 +03:00
Ozan Tezcan	44352beefa	Fix crash in RM_ScanKey() when used with hexpire (#13320 ) RM_ScanKey() was overlooked while introducing hash field expiration. An assert is triggered when it is called on a hash key with OBJ_ENCODING_LISTPACK_EX encoding. I've changed to code to handle listpackex encoding properly.	2024-06-04 11:42:02 +03:00
Viktor Söderqvist	9efc6ad6a6	Add API RedisModule_ClusterKeySlot and RedisModule_ClusterCanonicalKeyNameInSlot (#13069 ) Sometimes it's useful to compute a key's cluster slot in a module. This API function is just like the command CLUSTER KEYSLOT (but faster). A "reverse" API is also added: `RedisModule_ClusterCanonicalKeyNameInSlot`. Given a slot, it returns a short string that we can call a canonical key for the slot.	2024-03-12 09:26:12 -07:00
Binbin	4e3be944fc	Fix timing issue in blockedclient test (#13071 ) We can see that the past time here happens to be busy_time_limit, causing the test to fail: ``` [err]: RM_Call from blocked client in tests/unit/moduleapi/blockedclient.tcl Expected '50' to be more than '50' (context: type eval line 26 cmd {assert_morethan [expr [clock clicks -milliseconds]-$start] $busy_time_limit} proc ::test) ``` It is reasonable for them to be equal, so equal is added here. It should be noted that in the previous `Busy module command` test, we also used assert_morethan_equal, so this should have been missed at the time.	2024-02-20 08:43:13 +02:00
Binbin	6016973ac0	Fix module assertion crash when timer and timeout are unlocked in the same event loop (#13015 ) When we use a timer to unblock a client in module, if the timer period and the block timeout are very close, they will unblock the client in the same event loop, and it will trigger the assertion. The reason is that in moduleBlockedClientTimedOut we will protect against re-processing, so we don't actually call updateStatsOnUnblock (see #12817), so we are not able to reset the c->duration. The reason is unblockClientOnTimeout() didn't realize that bc had been unblocked. We add a function to the module to determine if bc is blocked, and then use it in unblockClientOnTimeout() to exit. There is the stack: ``` beforeSleep blockedBeforeSleep handleBlockedClientsTimeout checkBlockedClientTimeout unblockClientOnTimeout unblockClient resetClient -- assertion, crash the server 'c->duration == 0' is not true ```	2024-01-31 13:10:19 +02:00
Binbin	74a6e48a3d	Fix module unblock crash due to no timeout_callback (#13017 ) The block timeout is passed in the test case, but we do not pass in the timeout_callback, and it will crash when unlocking. In this case, in moduleBlockedClientTimedOut we will check timeout_callback. There is the stack: ``` beforeSleep blockedBeforeSleep handleBlockedClientsTimeout checkBlockedClientTimeout unblockClientOnTimeout replyToBlockedClientTimedOut moduleBlockedClientTimedOut -- timeout_callback is NULL, invalidFunctionWasCalled bc->timeout_callback(&ctx,(void**)c->argv,c->argc); ```	2024-01-31 09:28:50 +02:00
Ozan Tezcan	c5273cae18	Add RM_TryCalloc() and RM_TryRealloc() (#12985 ) Modules may want to handle allocation failures gracefully. Adding RM_TryCalloc() and RM_TryRealloc() for it. RM_TryAlloc() was added before: https://github.com/redis/redis/pull/10541	2024-01-29 20:56:03 +02:00
Madelyn Olson	8bb9a2895e	Address some failures with new tests for improving debug report (#12915 ) Fix a daily test failure because alpine doesn't support stack traces and add in an extra assertion related to making sure the stack trace was printed twice.	2024-01-08 17:56:06 -08:00
Madelyn Olson	068051e378	Handle recursive serverAsserts and provide more information for recursive segfaults (#12857 ) This change is trying to make two failure modes a bit easier to deep dive: 1. If a serverPanic or serverAssert occurs during the info (or module) printing, it will recursively panic, which is a lot of fun as it will just keep recursively printing. It will eventually stack overflow, but will generate a lot of text in the process. 2. When a segfault happens during the segfault handler, no information is communicated other than it happened. This can be problematic because `info` may help diagnose the real issue, but without fixing the recursive crash it might be hard to get at that info.	2024-01-02 18:20:22 -08:00
Binbin	c85a9b7896	Fix delKeysInSlot server events are not executed inside an execution unit (#12745 ) This is a follow-up fix to #12733. We need to apply the same changes to delKeysInSlot. Refer to #12733 for more details. This PR contains some other minor cleanups / improvements to the test suite and docs. It uses the postnotifications test module in a cluster mode test which revealed a leak in the test module (fixed).	2023-12-11 20:15:19 +02:00
Binbin	d6f19539d2	Un-register notification and server event when RedisModule_OnLoad fails (#12809 ) When we register notification or server event in RedisModule_OnLoad, but RedisModule_OnLoad eventually fails, triggering notification or server event will cause the server to crash. If the loading fails on a later stage of moduleLoad, we do call moduleUnload which handles all un-registration, but when it fails on the RedisModule_OnLoad call, we only un-register several specific things and these were missing: - moduleUnsubscribeNotifications - moduleUnregisterFilters - moduleUnsubscribeAllServerEvents Refactored the code to reuse the code from moduleUnload. Fixes #12808.	2023-11-27 17:26:33 +02:00
meiravgri	2e854bccc6	Fix async safety in signal handlers (#12658 ) see discussion from after https://github.com/redis/redis/pull/12453 was merged ---- This PR replaces signals that are not considered async-signal-safe (AS-safe) with safe calls. #### 1. serverLog() and serverLogFromHandler() `serverLog` uses unsafe calls. It was decided that we will avoid `serverLog` calls by the signal handlers when: * The signal is not fatal, such as SIGALRM. In these cases, we prefer using `serverLogFromHandler` which is the safe version of `serverLog`. Note they have different prompts: `serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>` `serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>` * The code was added recently. Calls to `serverLog` by the signal handler have been there ever since Redis exists and it hasn't caused problems so far. To avoid regression, from now we should use `serverLogFromHandler` #### 2. `snprintf` `fgets` and `strtoul`(base = 16) --------> `_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex` The safe version of `snprintf` was taken from [here](`8cfc4ca5e7/src/mc_util.c (L754)`) #### 3. fopen(), fgets(), fclose() --------> open(), read(), close() #### 4. opendir(), readdir(), closedir() --------> open(), syscall(SYS_getdents64), close() #### 5. Threads_mngr sync mechanisms * waiting for the thread to generate stack trace: semaphore --------> busy-wait * `globals_rw_lock` was removed: as we are not using malloc and the semaphore anymore we don't need to protect `ThreadsManager_cleanups`. #### 6. Stacktraces buffer The initial problem was that we were not able to safely call malloc within the signal handler. To solve that we created a buffer on the stack of `writeStacktraces` and saved it in a global pointer, assuming that under normal circumstances, the function `writeStacktraces` would complete before any thread attempted to write to it. However, if threads lag behind, they might access this global pointer after it no longer belongs to the `writeStacktraces` stack, potentially corrupting memory. To address this, various solutions were discussed [here](https://github.com/redis/redis/pull/12658#discussion_r1390442896) Eventually, we decided to create a pipe at server startup that will remain valid as long as the process is alive. We chose this solution due to its minimal memory usage, and since `write()` and `read()` are atomic operations. It ensures that stack traces from different threads won't mix. The stacktraces collection process is now as follows: * Cleaning the pipe to eliminate writes of late threads from previous runs. * Each thread writes to the pipe its stacktrace * Waiting for all the threads to mark completion or until a timeout (2 sec) is reached * Reading from the pipe to print the stacktraces. #### 7. Changes that were considered and eventually were dropped * replace watchdog timer with a POSIX timer: according to [settimer man](https://linux.die.net/man/2/setitimer) > POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending the use of the POSIX timers API ([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2), [timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.) instead. However, although it is supposed to conform to POSIX std, POSIX timers API is not supported on Mac. You can take a look here at the Linux implementation: [here](`c7562ee135`) To avoid messing up the code, and uncertainty regarding compatibility, it was decided to drop it for now. * avoid using sds (uses malloc) in logConfigDebugInfo It was considered to print config info instead of using sds, however apparently, `logConfigDebugInfo` does more than just print the sds, so it was decided this fix is out of this issue scope. #### 8. fix Signal mask check The check `signum & sig_mask` intended to indicate whether the signal is blocked by the thread was incorrect. Actually, the bit position in the signal mask corresponds to the signal number. We fixed this by changing the condition to: `sig_mask & (1L << (sig_num - 1))` #### 9. Unrelated changes both `fork.tcl `and `util.tcl` implemented a function called `count_log_message` expecting different parameters. This caused confusion when trying to run daily tests with additional test parameters to run a specific test. The `count_log_message` in `fork.tcl` was removed and the calls were replaced with calls to `count_log_message` located in `util.tcl` --------- Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2023-11-23 13:22:20 +02:00
Meir Shpilraien (Spielrein)	0ffb9d2ea9	Before evicted and before expired server events are not executed inside an execution unit. (#12733 ) Redis 7.2 (#9406) introduced a new modules event, `RedisModuleEvent_Key`. This new event allows the module to read the key data just before it is removed from the database (either deleted, expired, evicted, or overwritten). When the key is removed from the database, either by active expire or eviction. The new event was not called as part of an execution unit. This can cause an issue if the module registers a post notification job inside the event. This job will not be executed atomically with the expiration/eviction operation and will not replicated inside a Multi/Exec. Moreover, the post notification job will be executed right after the event where it is still not safe to perform any write operation, this will violate the promise that post notification job will be called atomically with the operation that triggered it and only when it is safe to write. This PR fixes the issue by wrapping each expiration/eviction of a key with an execution unit. This makes sure the entire operation will run atomically and all the post notification jobs will be executed at the end where it is safe to write. Tests were modified to verify the fix.	2023-11-08 09:28:22 +02:00
guybe7	c2a4b78491	WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync (#12622 ) The problem is that WAITAOF could have hang in case commands were propagated only to replicas. This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag. In that case, master_repl_offset would increase, but there would be nothing to fsync, so in the absence of other traffic, fsynced_reploff_pending would stay the static, and WAITAOF can hang. This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case there's nothing to fsync. i.e. in case it's behind because of the above mentions case it'll be refreshed and release the WAITAOF. Other changes: Fix a race in wait.tcl (client getting blocked vs. the fsync thread)	2023-09-28 17:19:20 +03:00
Roshan Khatri	7519960527	Allows modules to declare new ACL categories. (#12486 ) This PR adds a new Module API int RM_AddACLCategory(RedisModuleCtx ctx, const char category_name) to add a new ACL command category. Here, we initialize the ACLCommandCategories array by allocating space for 64 categories and duplicate the 21 default categories from the predefined array 'ACLDefaultCommandCategories' into the ACLCommandCategories array while ACL initialization. Valid ACL category names can only contain alphanumeric characters, underscores, and dashes. The API when called, checks for the onload flag, category name validity, and for duplicate category name if present. If the conditions are satisfied, the API adds the new category to the trailing end of the ACLCommandCategories array and assigns the acl_categories flag bit according to the index at which the category is added. If any error is encountered the errno is set accordingly by the API. --------- Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2023-08-30 13:01:24 -07:00
judeng	07ed0eafa9	improve performance for scan command when matching pattern or data type (#12209 ) Optimized the performance of the SCAN command in a few ways: 1. Move the key filtering (by MATCH pattern) in the scan callback, so as to avoid collecting them for later filtering. 2. Reduce a many memory allocations and copying (use a reference to the original sds, instead of creating an robj, an excessive 2 mallocs and one string duplication) 3. Compare TYPE filter directly (as integers), instead of inefficient string compare per key. 4. fixed a small bug: when scan zset and hash types, maxiterations uses a more accurate number to avoid wrong double maxiterations. Changes postponed for a later version (8.0): 1. Prepare to move the TYPE filtering to the scan callback as well. this was put on hold since it has side effects that can be considered a breaking change, which is that we will not attempt to do lazy expire (delete) a key that was filtered by not matching the TYPE (changing it would mean TYPE filter starts behaving the same as MATCH filter already does in that respect). 2. when the specified key TYPE filter is an unknown type, server will reply a error immediately instead of doing a full scan that comes back empty handed. Benchmark result: For different scenarios, we obtained about 30% or more performance improvement. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-06-27 16:43:46 +03:00
Meir Shpilraien (Spielrein)	153f8f082e	Fix use after free on blocking RM_Call. (#12342 ) blocking RM_Call was introduced on: #11568, It allows a module to perform blocking commands and get the reply asynchronously.If the command gets block, a special promise CallReply is returned that allow to set the unblock handler. The unblock handler will be called when the command invocation finish and it gets, as input, the command real reply. The issue was that the real CallReply was created using a stack allocated RedisModuleCtx which is no longer available after the unblock handler finishes. So if the module keeps the CallReply after the unblock handler finished, the CallReply holds a pointer to invalid memory and will try to access it when the CallReply will be released. The solution is to create the CallReply with a NULL context to make it totally detached and can be freed freely when the module wants. Test was added to cover this case, running the test with valgrind before the fix shows the use after free error. With the fix, there are no valgrind errors. unrelated: adding a missing `$rd close` in many tests in that file.	2023-06-25 14:12:27 +03:00
guybe7	3230199920	Modules: Unblock from within a timer coverage (#12337 ) Apart from adding the missing coverage, this PR also adds `blockedBeforeSleep` that gathers all block-related functions from `beforeSleep` The order inside `blockedBeforeSleep` is different: now `handleClientsBlockedOnKeys` (which may unblock clients) is called before `processUnblockedClients` (which handles unblocked clients). It makes sense to have this order. There are no visible effects of the wrong ordering, except some cleanups of the now-unblocked client would have happen in the next `beforeSleep` (will now happen in the current one) The reason we even got into it is because i triggers an assertion in logresreq.c (breaking the assumption that `unblockClient` is called before actually flushing the reply to the socket): `handleClientsBlockedOnKeys` is called, then it calls `moduleUnblockClientOnKey`, which calls `moduleUnblockClient`, which adds the client to `moduleUnblockedClients` back to `beforeSleep`, we call `handleClientsWithPendingWritesUsingThreads`, it writes the data of buf to the client, so `client->bufpos` became 0 On the next `beforeSleep`, we call `moduleHandleBlockedClients`, which calls `unblockClient`, which calls `reqresAppendResponse`, triggering the assert. (because the `bufpos` is 0) - see https://github.com/redis/redis/pull/12301#discussion_r1226386716	2023-06-22 23:15:16 +03:00
guybe7	20fa156067	Align RM_ReplyWithErrorFormat with RM_ReplyWithError (#12321 ) Introduced by https://github.com/redis/redis/pull/11923 (Redis 7.2 RC2) It's very weird and counterintuitive that `RM_ReplyWithError` requires the error-code without a hyphen while `RM_ReplyWithErrorFormat` requires either the error-code with a hyphen or no error-code at all ``` RedisModule_ReplyWithError(ctx, "BLA bla bla"); ``` vs. ``` RedisModule_ReplyWithErrorFormat(ctx, "-BLA %s", "bla bla"); ``` This commit aligns RM_ReplyWithErrorFormat to behvae like RM_ReplyWithError. it's a breaking changes but it's done before 7.2 goes GA.	2023-06-20 20:44:43 +03:00
Oran Agra	8ad8f0f9d8	Fix broken protocol when PUBLISH emits local push inside MULTI (#12326 ) When a connection that's subscribe to a channel emits PUBLISH inside MULTI-EXEC, the push notification messes up the EXEC response. e.g. MULTI, PING, PUSH foo bar, PING, EXEC the EXEC's response will contain: PONG, {message foo bar}, 1. and the second PONG will be delivered outside the EXEC's response. Additionally, this PR changes the order of responses in case of a plain PUBLISH (when the current client also subscribed to it), by delivering the push after the command's response instead of before it. This also affects modules calling RM_PublishMessage in a similar way, so that we don't run the risk of getting that push mixed together with the module command's response.	2023-06-20 20:41:41 +03:00
Shaya Potter	07316f16f0	Add ability for modules to know which client is being cmd filtered (#12219 ) Adds API - RedisModule_CommandFilterGetClientId() Includes addition to commandfilter test module to validate that it works by performing the same command from 2 different clients	2023-06-20 09:03:52 +03:00
Oran Agra	6117f28822	Fix WAIT for clients being blocked in a module command (#12220 ) So far clients being blocked and unblocked by a module command would update the c->woff variable and so WAIT was ineffective and got released without waiting for the command actions to propagate. This seems to have existed since forever, but not for RM_BlockClientOnKeys. It is problematic though to know if the module did or didn't propagate anything in that command, so for now, instead of adding an API, we'll just update the woff to the latest offset when unblocking, this will cause the client to possibly wait excessively, but that's not that bad.	2023-05-28 10:10:52 +03:00
sundb	ce5f4ea3a9	Delete empty key if fails after moduleCreateEmptyKey() in module (#12129 ) When `RM_ZsetAdd()`/`RM_ZsetIncrby()`/`RM_StreamAdd()` fails, if a new key happens to be created using `moduleCreateEmptyKey()`, we should clean up the empty key. ## Test 1) Add new module commands(`zset.add` and `zset.incrby`) to cover `RM_ZsetAdd()`/`RM_ZsetIncrby()`. 2) Add a large-memory test to cover `RM_StreamAdd()`.	2023-05-07 10:13:19 +03:00
Binbin	20533cc1d7	Tests: Do not save an RDB by default and add a SIGTERM default AOFRW test (#12064 ) In order to speed up tests, avoid saving an RDB (mostly notable on shutdown), except for tests that explicitly test the RDB mechanism In addition, use `shutdown-on-sigterm force` to prevetn shutdown from failing in case the server is in the middle of the initial AOFRW Also a a test that checks that the `shutdown-on-sigterm default` is to refuse shutdown if there's an initial AOFRW Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>	2023-04-18 16:14:26 +03:00
Binbin	bfec2d700b	Add RM_ReplyWithErrorFormat that can support format (#11923 ) * Add RM_ReplyWithErrorFormat that can support format Reply with the error create from a printf format and arguments. If the error code is already passed in the string 'fmt', the error code provided is used, otherwise the string "-ERR " for the generic error code is automatically added. The usage is, for example: RedisModule_ReplyWithErrorFormat(ctx, "An error: %s", "foo"); RedisModule_ReplyWithErrorFormat(ctx, "-WRONGTYPE Wrong Type: %s", "foo"); The function always returns REDISMODULE_OK.	2023-04-12 10:11:29 +03:00
Oran Agra	997fa41e99	Attempt to solve MacOS CI issues in GH Actions (#12013 ) The MacOS CI in github actions often hangs without any logs. GH argues that it's due to resource utilization, either running out of disk space, memory, or CPU starvation, and thus the runner is terminated. This PR contains multiple attempts to resolve this: 1. introducing pause_process instead of SIGSTOP, which waits for the process to stop before resuming the test, possibly resolving race conditions in some tests, this was a suspect since there was one test that could result in an infinite loop in that case, in practice this didn't help, but still a good idea to keep. 2. disable the `save` config in many tests that don't need it, specifically ones that use heavy writes and could create large files. 3. change the `populate` proc to use short pipeline rather than an infinite one. 4. use `--clients 1` in the macos CI so that we don't risk running multiple resource demanding tests in parallel. 5. enable `--verbose` to be repeated to elevate verbosity and print more info to stdout when a test or a server starts.	2023-04-12 09:19:21 +03:00
Ozan Tezcan	e55568edb5	Add RM_RdbLoad and RM_RdbSave module API functions (#11852 ) Add `RM_RdbLoad()` and `RM_RdbSave()` to load/save RDB files from the module API. In our use case, we have our clustering implementation as a module. As part of this implementation, the module needs to trigger RDB save operation at specific points. Also, this module delivers RDB files to other nodes (not using Redis' replication). When a node receives an RDB file, it should be able to load the RDB. Currently, there is no module API to save/load RDB files. This PR adds four new APIs: ```c RedisModuleRdbStream RM_RdbStreamCreateFromFile(const char filename); void RM_RdbStreamFree(RedisModuleRdbStream stream); int RM_RdbLoad(RedisModuleCtx ctx, RedisModuleRdbStream stream, int flags); int RM_RdbSave(RedisModuleCtx ctx, RedisModuleRdbStream stream, int flags); ``` The first step is to create a `RedisModuleRdbStream` object. This PR provides a function to create RedisModuleRdbStream from the filename. (You can load/save RDB with the filename). In the future, this API can be extended if needed: e.g., `RM_RdbStreamCreateFromFd()`, `RM_RdbStreamCreateFromSocket()` to save/load RDB from an `fd` or a `socket`. Usage: ```c / Save RDB / RedisModuleRdbStream stream = RedisModule_RdbStreamCreateFromFile("example.rdb"); RedisModule_RdbSave(ctx, stream, 0); RedisModule_RdbStreamFree(stream); /* Load RDB / RedisModuleRdbStream stream = RedisModule_RdbStreamCreateFromFile("example.rdb"); RedisModule_RdbLoad(ctx, stream, 0); RedisModule_RdbStreamFree(stream); ```	2023-04-09 12:07:32 +03:00
Roshan Khatri	6948dacaf6	Module commands to have ACL categories. (#11708 ) This allows modules to register commands to existing ACL categories and blocks the creation of [sub]commands, datatypes and registering the configs outside of the OnLoad function. For allowing modules to register commands to existing ACL categories, This PR implements a new API int RM_SetCommandACLCategories() which takes a pointer to a RedisModuleCommand and a C string aclflags containing the set of space separated ACL categories. Example, 'write slow' marks the command as part of the write and slow ACL categories. The C string aclflags is tokenized by implementing a helper function categoryFlagsFromString(). Theses tokens are matched and the corresponding ACL categories flags are set by a helper function matchAclCategoriesFlags. The helper function categoryFlagsFromString() returns the corresponding categories_flags or returns -1 if some token not processed correctly. If the module contains commands which are registered to existing ACL categories, the number of [sub]commands are tracked by num_commands_with_acl_categories in struct RedisModule. Further, the allowed command bit-map of the existing users are recomputed from the command_rules list, by implementing a function called ACLRecomputeCommandBitsFromCommandRulesAllUsers() for the existing users to have access to the module commands on runtime. ## Breaking change This change requires that registering commands and subcommands only occur during a modules "OnLoad" function, in order to allow efficient recompilation of ACL bits. We also chose to block registering configs and types, since we believe it's only valid for those to be created during onLoad. We check for this onload flag in struct RedisModule to check if the call is made from the OnLoad function. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2023-03-21 10:07:11 -07:00
Shaya Potter	6cf8fc08f5	Don't run command filter on blocked command reprocessing (#11895 ) Previously we would run the module command filters even upon blocked command reprocessing. This could modify the command, and it's args. This is irrelevant in the context of a command being reprocessed (it already went through the filters), as well as breaks the crashed command lookup that exists in the case of a reprocessed command. fixes #11894. Co-authored-by: Oran Agra <oran@redislabs.com>	2023-03-20 08:04:13 +02:00
Meir Shpilraien (Spielrein)	d0da0a6a3f	Support for RM_Call on blocking commands (#11568 ) Allow running blocking commands from within a module using `RM_Call`. Today, when `RM_Call` is used, the fake client that is used to run command is marked with `CLIENT_DENY_BLOCKING` flag. This flag tells the command that it is not allowed to block the client and in case it needs to block, it must fallback to some alternative (either return error or perform some default behavior). For example, `BLPOP` fallback to simple `LPOP` if it is not allowed to block. All the commands must respect the `CLIENT_DENY_BLOCKING` flag (including module commands). When the command invocation finished, Redis asserts that the client was not blocked. This PR introduces the ability to call blocking command using `RM_Call` by passing a callback that will be called when the client will get unblocked. In order to do that, the user must explicitly say that he allow to perform blocking command by passing a new format specifier argument, `K`, to the `RM_Call` function. This new flag will tell Redis that it is allow to run blocking command and block the client. In case the command got blocked, Redis will return a new type of call reply (`REDISMODULE_REPLY_PROMISE`). This call reply indicates that the command got blocked and the user can set the on_unblocked handler using `RM_CallReplyPromiseSetUnblockHandler`. When clients gets unblocked, it eventually reaches `processUnblockedClients` function. This is where we check if the client is a fake module client and if it is, we call the unblock callback instead of performing the usual unblock operations. Notice: `RM_CallReplyPromiseSetUnblockHandler` must be called atomically along side the command invocation (without releasing the Redis lock in between). In addition, unlike other CallReply types, the promise call reply must be released by the module when the Redis GIL is acquired. The module can abort the execution on the blocking command (if it was not yet executed) using `RM_CallReplyPromiseAbort`. the API will return `REDISMODULE_OK` on success and `REDISMODULE_ERR` if the operation is already executed. Notice that in case of misbehave module, Abort might finished successfully but the operation will not really be aborted. This can only happened if the module do not respect the disconnect callback of the blocked client. For pure Redis commands this can not happened. ### Atomicity Guarantees The API promise that the unblock handler will run atomically as an execution unit. This means that all the operation performed on the unblock handler will be wrapped with a multi exec transaction when replicated to the replica and AOF. The API do not grantee any other atomicity properties such as when the unblock handler will be called. This gives us the flexibility to strengthen the grantees (or not) in the future if we will decide that we need a better guarantees. That said, the implementation does provide a better guarantees when performing pure Redis blocking command like `BLPOP`. In this case the unblock handler will run atomically with the operation that got unblocked (for example, in case of `BLPOP`, the unblock handler will run atomically with the `LPOP` operation that run when the command got unblocked). This is an implementation detail that might be change in the future and the module writer should not count on that. ### Calling blocking commands while running on script mode (`S`) `RM_Call` script mode (`S`) was introduced on #0372. It is used for usecases where the command that was invoked on `RM_Call` comes from a user input and we want to make sure the user will not run dangerous commands like `shutdown`. Some command, such as `BLPOP`, are marked with `NO_SCRIPT` flag, which means they will not be allowed on script mode. Those commands are marked with `NO_SCRIPT` just because they are blocking commands and not because they are dangerous. Now that we can run blocking commands on RM_Call, there is no real reason not to allow such commands on script mode. The underline problem is that the `NO_SCRIPT` flag is abused to also mark some of the blocking commands (notice that those commands know not to block the client if it is not allowed to do so, and have a fallback logic to such cases. So even if those commands were not marked with `NO_SCRIPT` flag, it would not harm Redis, and today we can already run those commands within multi exec). In addition, not all blocking commands are marked with `NO_SCRIPT` flag, for example `blmpop` are not marked and can run from within a script. Those facts shows that there are some ambiguity about the meaning of the `NO_SCRIPT` flag, and its not fully clear where it should be use. The PR suggest that blocking commands should not be marked with `NO_SCRIPT` flag, those commands should handle `CLIENT_DENY_BLOCKING` flag and only block when it's safe (like they already does today). To achieve that, the PR removes the `NO_SCRIPT` flag from the following commands: * `blmove` * `blpop` * `brpop` * `brpoplpush` * `bzpopmax` * `bzpopmin` * `wait` This might be considered a breaking change as now, on scripts, instead of getting `command is not allowed from script` error, the user will get some fallback behavior base on the command implementation. That said, the change matches the behavior of scripts and multi exec with respect to those commands and allow running them on `RM_Call` even when script mode is used. ### Additional RedisModule API and changes * `RM_BlockClientSetPrivateData` - Set private data on the blocked client without the need to unblock the client. This allows up to set the promise CallReply as the private data of the blocked client and abort it if the client gets disconnected. * `RM_BlockClientGetPrivateData` - Return the current private data set on a blocked client. We need it so we will have access to this private data on the disconnect callback. * On RM_Call, the returned reply will be added to the auto memory context only if auto memory is enabled, this allows us to keep the call reply for longer time then the context lifetime and does not force an unneeded borrow relationship between the CallReply and the RedisModuleContext.	2023-03-16 14:04:31 +02:00
KarthikSubbarao	f8a5a4f70c	Custom authentication for Modules (#11659 ) This change adds new module callbacks that can override the default password based authentication associated with ACLs. With this, Modules can register auth callbacks through which they can implement their own Authentication logic. When `AUTH` and `HELLO AUTH ...` commands are used, Module based authentication is attempted and then normal password based authentication is attempted if needed. The new Module APIs added in this PR are - `RM_RegisterCustomAuthCallback` and `RM_BlockClientOnAuth` and `RedisModule_ACLAddLogEntryByUserName `. Module based authentication will be attempted for all Redis users (created through the ACL SETUSER cmd or through Module APIs) even if the Redis user does not exist at the time of the command. This gives a chance for the Module to create the RedisModule user and then authenticate via the RedisModule API - from the custom auth callback. For the AUTH command, we will support both variations - `AUTH <username> <password>` and `AUTH <password>`. In case of the `AUTH <password>` variation, the custom auth callbacks are triggered with “default” as the username and password as what is provided. ### RedisModule_RegisterCustomAuthCallback ``` void RM_RegisterCustomAuthCallback(RedisModuleCtx ctx, RedisModuleCustomAuthCallback cb) { ``` This API registers a callback to execute to prior to normal password based authentication. Multiple callbacks can be registered across different modules. These callbacks are responsible for either handling the authentication, each authenticating the user or explicitly denying, or deferring it to other authentication mechanisms. Callbacks are triggered in the order they were registered. When a Module is unloaded, all the auth callbacks registered by it are unregistered. The callbacks are attempted, in the order of most recently registered callbacks, when the AUTH/HELLO (with AUTH field is provided) commands are called. The callbacks will be called with a module context along with a username and a password, and are expected to take one of the following actions: (1) Authenticate - Use the RM_Authenticate API successfully and return `REDISMODULE_AUTH_HANDLED`. This will immediately end the auth chain as successful and add the OK reply. (2) Block a client on authentication - Use the `RM_BlockClientOnAuth` API and return `REDISMODULE_AUTH_HANDLED`. Here, the client will be blocked until the `RM_UnblockClient `API is used which will trigger the auth reply callback (provided earlier through the `RM_BlockClientOnAuth`). In this reply callback, the Module should authenticate, deny or skip handling authentication. (3) Deny Authentication - Return `REDISMODULE_AUTH_HANDLED` without authenticating or blocking the client. Optionally, `err` can be set to a custom error message. This will immediately end the auth chain as unsuccessful and add the ERR reply. (4) Skip handling Authentication - Return `REDISMODULE_AUTH_NOT_HANDLED` without blocking the client. This will allow the engine to attempt the next custom auth callback. If none of the callbacks authenticate or deny auth, then password based auth is attempted and will authenticate or add failure logs and reply to the clients accordingly. ### RedisModule_BlockClientOnAuth ``` RedisModuleBlockedClient RM_BlockClientOnAuth(RedisModuleCtx ctx, RedisModuleCustomAuthCallback reply_callback, void (free_privdata)(RedisModuleCtx,void)) ``` This API can only be used from a Module from the custom auth callback. If a client is not in the middle of custom module based authentication, ERROR is returned. Otherwise, the client is blocked and the `RedisModule_BlockedClient` is returned similar to the `RedisModule_BlockClient` API. ### RedisModule_ACLAddLogEntryByUserName ``` int RM_ACLAddLogEntryByUserName(RedisModuleCtx ctx, RedisModuleString username, RedisModuleString object, RedisModuleACLLogEntryReason reason) ``` Adds a new entry in the ACL log with the `username` RedisModuleString provided. This simplifies the Module usage because now, developers do not need to create a Module User just to add an error ACL Log entry. Aside from accepting username (RedisModuleString) instead of a RedisModuleUser, it is the same as the existing `RedisModule_ACLAddLogEntry` API. ### Breaking changes - HELLO command - Clients can now only set the client name and RESP protocol from the `HELLO` command if they are authenticated. Also, we now finish command arg validation first and return early with a ERR reply if any arg is invalid. This is to avoid mutating the client name / RESP from a command that would have failed on invalid arguments. ### Notable behaviors - Module unblocking - Now, we will not allow Modules to block the client from inside the context of a reply callback (triggered from the Module unblock flow `moduleHandleBlockedClients`). --------- Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>	2023-03-15 15:18:42 -07:00
guybe7	4ba47d2d21	Add reply_schema to command json files (internal for now) (#10273 ) Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845 Since ironing the details of the reply schema of each and every command can take a long time, we would like to merge this PR when the infrastructure is ready, and let this mature in the unstable branch. Meanwhile the changes of this PR are internal, they are part of the repo, but do not affect the produced build. ### Background In #9656 we add a lot of information about Redis commands, but we are missing information about the replies ### Motivation 1. Documentation. This is the primary goal. 2. It should be possible, based on the output of COMMAND, to be able to generate client code in typed languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like. 3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing testsuite, see the "Testing" section) ### Schema The idea is to supply some sort of schema for the various replies of each command. The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3. Note that the reply structure itself may change, depending on the arguments (e.g. `XINFO STREAM`, with and without the `FULL` modifier) We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema. Example for `BZPOPMIN`: ``` "reply_schema": { "oneOf": [ { "description": "Timeout reached and no elements were popped.", "type": "null" }, { "description": "The keyname, popped member, and its score.", "type": "array", "minItems": 3, "maxItems": 3, "items": [ { "description": "Keyname", "type": "string" }, { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } ] } ``` #### Notes 1. It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI, where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one. 2. The reply schemas will describe RESP3 replies only. even though RESP3 is structured, we want to use reply schema for documentation (and possibly to create a fuzzer that validates the replies) 3. For documentation, the description field will include an explanation of the scenario in which the reply is sent, including any relation to arguments. for example, for `ZRANGE`'s two schemas we will need to state that one is with `WITHSCORES` and the other is without. 4. For documentation, there will be another optional field "notes" in which we will add a short description of the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat array, for example) Given the above: 1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/) (given that "description" and "notes" are comprehensive enough) 2. We can generate a client in a strongly typed language (but the return type could be a conceptual `union` and the caller needs to know which schema is relevant). see the section below for RESP2 support. 3. We can create a fuzzer for RESP3. ### Limitations (because we are using the standard json-schema) The problem is that Redis' replies are more diverse than what the json format allows. This means that, when we convert the reply to a json (in order to validate the schema against it), we lose information (see the "Testing" section below). The other option would have been to extend the standard json-schema (and json format) to include stuff like sets, bulk-strings, error-string, etc. but that would mean also extending the schema-validator - and that seemed like too much work, so we decided to compromise. Examples: 1. We cannot tell the difference between an "array" and a "set" 2. We cannot tell the difference between simple-string and bulk-string 3. we cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems` compares (member,score) tuples and not just the member name. ### Testing This commit includes some changes inside Redis in order to verify the schemas (existing and future ones) are indeed correct (i.e. describe the actual response of Redis). To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands it executed and their replies. For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with `--reg-res-logfile <file> --client-default-resp 3` (the testsuite already does that if you run it with `--log-req-res --force-resp3`) You should run the testsuite with the above args (and `--dont-clean`) in order to make Redis generate `.reqres` files (same dir as the `stdout` files) which contain request-response pairs. These files are later on processed by `./utils/req-res-log-validator.py` which does: 1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c) 2. For each request-response pair, it validates the response against the request's reply_schema (obtained from the extended COMMAND DOCS) 5. In order to get good coverage of the Redis commands, and all their different replies, we chose to use the existing redis test suite, rather than attempt to write a fuzzer. #### Notes about RESP2 1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that, it's time to accept RESP3 as the future RESP) 2. Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3 so that we can validate it, we will need to know how to convert the actual reply to the one expected. - number and boolean are always strings in RESP2 so the conversion is easy - objects (maps) are always a flat array in RESP2 - others (nested array in RESP3's `ZRANGE` and others) will need some special per-command handling (so the client will not be totally auto-generated) Example for ZRANGE: ``` "reply_schema": { "anyOf": [ { "description": "A list of member elements", "type": "array", "uniqueItems": true, "items": { "type": "string" } }, { "description": "Members and their scores. Returned in case `WITHSCORES` was used.", "notes": "In RESP2 this is returned as a flat array", "type": "array", "uniqueItems": true, "items": { "type": "array", "minItems": 2, "maxItems": 2, "items": [ { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } } ] } ``` ### Other changes 1. Some tests that behave differently depending on the RESP are now being tested for both RESP, regardless of the special log-req-res mode ("Pub/Sub PING" for example) 2. Update the history field of CLIENT LIST 3. Added basic tests for commands that were not covered at all by the testsuite ### TODO - [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g. when `SET` return NULL, the condition is `arguments.get\|\|arguments.condition`, for `OK` the condition is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896 - [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode - [x] add the new tests to GH actions (i.e. compile with `-DLOG_REQ_RES`, run the tests, and run the validator) - [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output of the tests - https://github.com/redis/redis/issues/11897 - [x] (probably a separate PR) add all missing schemas - [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res - [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to fight with the tcl including mechanism a bit) - [x] issue: module API - https://github.com/redis/redis/issues/11898 - [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899 Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Shaya Potter <shaya@redislabs.com>	2023-03-11 10:14:16 +02:00
ranshid	4988b92850	Fix an issue when module decides to unblock a client which is blocked on keys (#11832 ) Currently (starting at #11012) When a module is blocked on keys it sets the CLIENT_PENDING_COMMAND flag. However in case the module decides to unblock the client not via the regular flow (eg timeout, key signal or CLIENT UNBLOCK command) it will attempt to reprocess the module command and potentially blocked again. This fix remove the CLIENT_PENDING_COMMAND flag in case blockedForKeys is issued from module context.	2023-03-08 10:08:54 +02:00
sundb	3fba3ccd96	Skip test for sdsRemoveFreeSpace when mem_allocator is not jemalloc (#11878 ) Test `trim on SET with big value` (introduced from #11817) fails under mac m1 with libc mem_allocator. The reason is that malloc(33000) will allocate 65536 bytes(>42000). This test still passes under ubuntu with libc mem_allocator. ``` *** [err]: trim on SET with big value in tests/unit/type/string.tcl Expected [r memory usage key] < 42000 (context: type source line 471 file /Users/iospack/data/redis_fork/tests/unit/type/string.tcl cmd {assert {[r memory usage key] < 42000}} proc ::test) ``` simple test under mac m1 with libc mem_allocator: ```c void *p = zmalloc(33000); printf("malloc size: %zu\n", zmalloc_size(p)); # output malloc size: 65536 ```	2023-03-07 09:06:58 +02:00
uriyage	9d336ac398	Try to trim strings only when applicable (#11817 ) As `sdsRemoveFreeSpace` have an impact on performance even if it is a no-op (see details at #11508). Only call the function when there is a possibility that the string contains free space. * For strings coming from the network, it's only if they're bigger than PROTO_MBULK_BIG_ARG * For strings coming from scripts, it's only if they're smaller than LUA_CMD_OBJCACHE_MAX_LEN * For strings coming from modules, it could be anything. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: sundb <sundbcn@gmail.com>	2023-02-28 19:38:58 +02:00
Oran Agra	233abbbe03	Cleanup around script_caller, fix tracking of scripts and ACL logging for RM_Call (#11770 ) * Make it clear that current_client is the root client that was called by external connection * add executing_client which is the client that runs the current command (can be a module or a script) * Remove script_caller that was used for commands that have CLIENT_SCRIPT to get the client that called the script. in most cases, that's the current_client, and in others (when being called from a module), it could be an intermediate client when we actually want the original one used by the external connection. bugfixes: * RM_Call with C flag should log ACL errors with the requested user rather than the one used by the original client, this also solves a crash when RM_Call is used with C flag from a detached thread safe context. * addACLLogEntry would have logged info about the script_caller, but in case the script was issued by a module command we actually want the current_client. the exception is when RM_Call is called from a timer event, in which case we don't have a current_client. behavior changes: * client side tracking for scripts now tracks the keys that are read by the script instead of the keys that are declared by the caller for EVAL other changes: * Log both current_client and executing_client in the crash log. * remove prepareLuaClient and resetLuaClient, being dead code that was forgotten. * remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot that serves all commands and is reset only when execution nesting starts. * remove code to propagate CLIENT_FORCE_REPL from the executed command to the script caller since scripts aren't propagated anyway these days and anyway this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun. * fix a module GIL violation issue in afterSleep that was introduced in #10300 (unreleased)	2023-02-16 08:07:35 +02:00
guybe7	9483ab0b8e	Minor changes around the blockonkeys test module (#11803 ) All of the POP commands must not decr length below 0. So, get_fsl will delete the key if the length is 0 (unless the caller wished to create if doesn't exist) Other: 1. Use REDISMODULE_WRITE where needed (POP commands) 2. Use wait_for_blokced_clients in test Unrelated: Use quotes instead of curly braces in zset.tcl, for variable expansion	2023-02-14 20:06:30 +02:00
Meir Shpilraien (Spielrein)	5c3938d5cc	Match REDISMODULE_OPEN_KEY_* flags to LOOKUP_* flags (#11772 ) The PR adds support for the following flags on RedisModule_OpenKey: * REDISMODULE_OPEN_KEY_NONOTIFY - Don't trigger keyspace event on key misses. * REDISMODULE_OPEN_KEY_NOSTATS - Don't update keyspace hits/misses counters. * REDISMODULE_OPEN_KEY_NOEXPIRE - Avoid deleting lazy expired keys. * REDISMODULE_OPEN_KEY_NOEFFECTS - Avoid any effects from fetching the key In addition, added `RM_GetOpenKeyModesAll`, which returns the mask of all supported OpenKey modes. This allows the module to check, in runtime, which OpenKey modes are supported by the current Redis instance.	2023-02-09 14:59:05 +02:00
Oran Agra	c8052122a2	Fix potential issue with Lua argv caching, module command filter and libc realloc (#11652 ) TLDR: solve a problem introduced in Redis 7.0.6 (#11541) with RM_CommandFilterArgInsert being called from scripts, which can lead to memory corruption. Libc realloc can return the same pointer even if the size was changed. The code in freeLuaRedisArgv had an assumption that if the pointer didn't change, then the allocation didn't change, and the cache can still be reused. However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were used, it could be that we realloced the argv array, and the pointer didn't change, then a consecutive command being executed from Lua can use that argv cache reaching beyond its size. This was actually only possible with modules, since the decision to realloc was based on argc, rather than argv_len.	2023-01-04 11:03:55 +02:00
ranshid	383d902ce6	reprocess command when client is unblocked on keys (#11012 ) TL;DR --------------------------------------- Following the discussion over the issue [#7551](https://github.com/redis/redis/issues/7551) We decided to refactor the client blocking code to eliminate some of the code duplications and to rebuild the infrastructure better for future key blocking cases. In this PR --------------------------------------- 1. reprocess the command once a client becomes unblocked on key (instead of running custom code for the unblocked path that's different than the one that would have run if blocking wasn't needed) 2. eliminate some (now) irrelevant code for handling unblocking lists/zsets/streams etc... 3. modify some tests to intercept the error in cases of error on reprocess after unblock (see details in the notes section below) 4. replace '$' on the client argv with current stream id. Since once we reprocess the stream XREAD we need to read from the last msg and not wait for new msg in order to prevent endless block loop. 5. Added statistics to the info "Clients" section to report the: * `total_blocking_keys` - number of blocking keys * `total_blocking_keys_on_nokey` - number of blocking keys which have at least 1 client which would like to be unblocked on when the key is deleted. 6. Avoid expiring unblocked key during unblock. Previously we used to lookup the unblocked key which might have been expired during the lookup. Now we lookup the key using NOTOUCH and NOEXPIRE to avoid deleting it at this point, so propagating commands in blocked.c is no longer needed. 7. deprecated command flags. We decided to remove the CMD_CALL_STATS and CMD_CALL_SLOWLOG and make an explicit verification in the call() function in order to decide if stats update should take place. This should simplify the logic and also mitigate existing issues: for example module calls which are triggered as part of AOF loading might still report stats even though they are called during AOF loading. Behavior changes --------------------------------------------------- 1. As this implementation prevents writing dedicated code handling unblocked streams/lists/zsets, since we now re-process the command once the client is unblocked some errors will be reported differently. The old implementation used to issue ``UNBLOCKED the stream key no longer exists`` in the following cases: - The stream key has been deleted (ie. calling DEL) - The stream and group existed but the key type was changed by overriding it (ie. with set command) - The key not longer exists after we swapdb with a db which does not contains this key - After swapdb when the new db has this key but with different type. In the new implementation the reported errors will be the same as if the command was processed after effect: NOGROUP - in case key no longer exists, or WRONGTYPE in case the key was overridden with a different type. 2. Reprocessing the command means that some checks will be reevaluated once the client is unblocked. For example, ACL rules might change since the command originally was executed and will fail once the client is unblocked. Another example is OOM condition checks which might enable the command to run and block but fail the command reprocess once the client is unblocked. 3. One of the changes in this PR is that no command stats are being updated once the command is blocked (all stats will be updated once the client is unblocked). This implies that when we have many clients blocked, users will no longer be able to get that information from the command stats. However the information can still be gathered from the client list. Client blocking --------------------------------------------------- the blocking on key will still be triggered the same way as it is done today. in order to block the current client on list of keys, the call to blockForKeys will still need to be made which will perform the same as it is today: * add the client to the list of blocked clients on each key * keep the key with a matching list node (position in the global blocking clients list for that key) in the client private blocking key dict. * flag the client with CLIENT_BLOCKED * update blocking statistics * register the client on the timeout table Key Unblock --------------------------------------------------- Unblocking a specific key will be triggered (same as today) by calling signalKeyAsReady. the implementation in that part will stay the same as today - adding the key to the global readyList. The reason to maintain the readyList (as apposed to iterating over all clients blocked on the specific key) is in order to keep the signal operation as short as possible, since it is called during the command processing. The main change is that instead of going through a dedicated code path that operates the blocked command we will just call processPendingCommandsAndResetClient. ClientUnblock (keys) --------------------------------------------------- 1. Unblocking clients on keys will be triggered after command is processed and during the beforeSleep 8. the general schema is: 9. For each key k in the readyList: ``` For each client c which is blocked on k: in case either: 1. k exists AND the k type matches the current client blocking type OR 2. k exists and c is blocked on module command OR 3. k does not exists and c was blocked with the flag unblock_on_deleted_key do: 1. remove the client from the list of clients blocked on this key 2. remove the blocking list node from the client blocking key dict 3. remove the client from the timeout list 10. queue the client on the unblocked_clients list 11. NEW: call processCommandAndResetClient(c); ``` NOTE: for module blocked clients we will still call the moduleUnblockClientByHandle which will queue the client for processing in moduleUnblockedClients list. Process Unblocked clients --------------------------------------------------- The process of all unblocked clients is done in the beforeSleep and no change is planned in that part. The general schema will be: For each client c in server.unblocked_clients: * remove client from the server.unblocked_clients * set back the client readHandler * continue processing the pending command and input buffer. Some notes regarding the new implementation --------------------------------------------------- 1. Although it was proposed, it is currently difficult to remove the read handler from the client while it is blocked. The reason is that a blocked client should be unblocked when it is disconnected, or we might consume data into void. 2. While this PR mainly keep the current blocking logic as-is, there might be some future additions to the infrastructure that we would like to have: - allow non-preemptive blocking of client - sometimes we can think that a new kind of blocking can be expected to not be preempt. for example lets imagine we hold some keys on disk and when a command needs to process them it will block until the keys are uploaded. in this case we will want the client to not disconnect or be unblocked until the process is completed (remove the client read handler, prevent client timeout, disable unblock via debug command etc...). - allow generic blocking based on command declared keys - we might want to add a hook before command processing to check if any of the declared keys require the command to block. this way it would be easier to add new kinds of key-based blocking mechanisms. Co-authored-by: Oran Agra <oran@redislabs.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2023-01-01 23:35:42 +02:00
guybe7	9c7c6924a0	Cleanup: Get rid of server.core_propagates (#11572 ) 1. Get rid of server.core_propagates - we can just rely on module/call nesting levels 2. Rename in_nested_call to execution_nesting and update the comment 3. Remove module_ctx_nesting (redundant, we can use execution_nesting) 4. Modify postExecutionUnitOperations according to the comment (The main purpose of this PR) 5. trackingHandlePendingKeyInvalidations: Check the nesting level inside this function	2022-12-20 09:51:50 +02:00
Binbin	20854cb610	Fix zuiFind crash / RM_ScanKey hang on SET object listpack encoding (#11581 ) In #11290, we added listpack encoding for SET object. But forgot to support it in zuiFind, causes ZINTER, ZINTERSTORE, ZINTERCARD, ZIDFF, ZDIFFSTORE to crash. And forgot to support it in RM_ScanKey, causes it hang. This PR add support SET listpack in zuiFind, and in RM_ScanKey. And add tests for related commands to cover this case. Other changes: - There is no reason for zuiFind to go into the internals of the SET. It can simply use setTypeIsMember and don't care about encoding. - Remove the `#include "intset.h"` from server.h reduce the chance of accidental intset API use. - Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux interfaces to the header. - In scanGenericCommand, use setTypeInitIterator and setTypeNext to handle OBJ_SET scan. - In RM_ScanKey, improve hash scan mode, use lpGetValue like zset, they can share code and better performance. The zuiFind part fixes #11578 Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2022-12-09 17:08:01 +02:00
Binbin	fa5474e153	Normalize NAN to a single nan type, like we do with inf (#11597 ) From https://en.wikipedia.org/wiki/NaN#Display, it says that apart from nan and -nan, we can also get NAN and even nan(char-sequence) from libc. In #11482, our conclusion was that we wanna normalize it in Redis to a single nan type, like we already normalized inf. For this, we also reverted the assert_match part of the test added in #11506, using assert_equal to validate the changes.	2022-12-08 19:29:30 +02:00

1 2 3 4 5

249 Commits